GRIMPLATE: First Steps Toward Identifying Adversarial Use of BitTorrent
Sep. 13 2017 — 6:32p.m.
UNCLASSIFIED//FOR OFFICIAL USE ONLY
GRIMPLATE
First Steps Toward Identifying Adversarial Use of BitTorrent
Network Operations Center
NSA/R4
Derived From: NSA/CSSM 1-52
Dated: 20070108
Declassify On: 20370117
The overall briefing is classified TOP SECRET//COMINT//REL FVEY

CONFIDENTIAL
Agenda
• Motivation
• BitTorrent’s TCP and UDP layers
• DHT overview
• What does it mean to crawl the DHT?
• Pilot implementation
• Collaboration

TOP SECRET//COMINT//REL FVEY
GRIMPLATE Motivation
• BitTorrent sessions are seen daily between NIPRnet hosts and adversary space (PRC, RU, etc.).
• NTOC has no way of knowing whether this is innocuous file sharing or malicious activity.
• Peer-to-peer (P2P) traffic is not allowed on NIPRnet, but most commands do not see it as harmful.
• If we can glean some indication of the type of data leaving NIPRnet, we can build a case for shutting this activity down.
• Interest is not limited to the NIPRnet scenario.

UNCLASSIFIED//FOR OFFICIAL USE ONLY
BitTorrent’s TCP and UDP Layers
• TCP – used to exchange file pieces among peers
• UDP – used to exchange DHT routing messages
– Who should I ask for file pieces?

UNCLASSIFIED//FOR OFFICIAL USE ONLY
BitTorrent DHT
• Nodes: clients participating in the DHT
• Peers: clients participating in piece exchange to share a file
• DHT: distributed key-value store
• Nodes have a 160-bit pseudo-random node ID
• Keys are the 160-bit info_hash: the SHA-1 hash of the .torrent file's info dictionary
• Values are lists of IP addresses and ports of peers mapped to that info_hash

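A minimal sketch of the two 160-bit identifiers above, assuming Python 3 and only the standard library. Per the BitTorrent specification (BEP 3), the info_hash is the SHA-1 digest of the bencoded info dictionary, and a node ID is simply 20 random bytes. The helper names (`bencode`, `info_hash`, `random_node_id`) and the sample torrent fields are hypothetical, not taken from the briefing. In practice the hash is taken over the exact info bytes as they appear in the .torrent file; re-encoding a decoded dictionary reproduces them only when the file was canonically encoded.

```python
import hashlib
import os


def bencode(value) -> bytes:
    """Minimal bencode encoder (BEP 3) for int, str, bytes, list and dict."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be emitted in sorted (raw byte) order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> bytes:
    """20-byte (160-bit) SHA-1 digest of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).digest()


def random_node_id() -> bytes:
    """160-bit pseudo-random DHT node ID."""
    return os.urandom(20)


if __name__ == "__main__":
    # Hypothetical single-file torrent; "pieces" holds one 20-byte SHA-1 per piece.
    sample_info = {
        "name": "example.txt",
        "length": 12,
        "piece length": 16384,
        "pieces": hashlib.sha1(b"hello world\n").digest(),
    }
    print(info_hash(sample_info).hex())
    print(random_node_id().hex())
```

Because dictionary keys are sorted before encoding, two dictionaries with the same contents hash identically regardless of insertion order.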
UNCLASSIFIED//FOR OFFICIAL USE ONLY
Mainline DHT Messages
ping Query = {"t":"aa", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"}}
ping Response = {"t":"aa", "y":"r", "r":{"id":"mnopqrstuvwxyz123456"}}
find_node Query = {"t":"aa", "y":"q", "q":"find_node", "a":{"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}}
find_node Response = {"t":"aa", "y":"r", "r":{"id":"0123456789abcdefghij", "nodes":"def456…"}}
get_peers Query = {"t":"aa", "y":"q", "q":"get_peers", "a":{"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456"}}
get_peers Response, with peers = {"t":"aa", "y":"r", "r":{"id":"0123456789abcdefghij", "token":"aoeusnth", "values":["axje.u", "idhtnm"]}}
get_peers Response, with closest nodes = {"t":"aa", "y":"r", "r":{"id":"0123456789abcdefghij", "token":"aoeusnth", "nodes":"def456…"}}
announce_peer Query = {"t":"aa", "y":"q", "q":"announce_peer", "a":{"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456", "port":6881, "token":"aoeusnth"}}
announce_peer Response = {"t":"aa", "y":"r", "r":{"id":"0123456789abcdefghij"}}

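The messages above are written in JSON-like notation; on the wire, BEP 5 sends each KRPC message as a single bencoded dictionary in a UDP datagram. The sketch below, assuming Python 3 and hypothetical helper names (`bencode`, `bdecode`, `krpc_query`), encodes a query and decodes a reply. The ping query it produces matches the bencoded example given in BEP 5. No sockets are involved; sending the bytes over UDP is left out.

```python
def bencode(value) -> bytes:
    """Minimal bencode encoder for int, str, bytes, list and dict."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(((k.encode() if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data: bytes):
    """Decode one complete bencoded value; dictionary keys come back as bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    c = data[i:i + 1]
    if c == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c in (b"l", b"d"):
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            if i >= len(data):
                raise ValueError("unterminated list or dict")
            item, i = _decode(data, i)
            items.append(item)
        if c == b"l":
            return items, i + 1
        if len(items) % 2:
            raise ValueError("dict has a key without a value")
        return dict(zip(items[0::2], items[1::2])), i + 1
    if c.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("truncated string")
        return data[start:end], end
    raise ValueError(f"invalid bencode at offset {i}")


def krpc_query(tid: bytes, method: str, args: dict) -> bytes:
    """KRPC query: "t" transaction ID, "y" = "q", "q" method name, "a" arguments."""
    return bencode({"t": tid, "y": "q", "q": method, "a": args})


if __name__ == "__main__":
    print(krpc_query(b"aa", "ping", {"id": b"abcdefghij0123456789"}))
    reply = bdecode(b"d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re")
    print(reply[b"r"][b"id"])
```

The transaction ID "t" lets a client match a reply to its outstanding query, since UDP gives no connection to do that.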
SECRET//REL FVEY
What does it mean to crawl the DHT?
• Goal: harvest the complete node list for the entire DHT, plus peer lists for info_hashes found in NIPRnet defensive tools or SIGINT
• A regular client's node lookup is an iterative process
– O(log n) search
– the routing table is the starting point
• Approach:
– spray find_node messages across the DHT and store the responses
– query for peers of info_hashes of interest

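A sketch of the XOR metric that drives both the O(log n) lookup and a crawl, assuming Python 3. The briefing does not say how find_node targets are "sprayed"; this assumes one target per distinct high-order prefix (a hypothetical `spray_targets` helper), so that queries land evenly across the 160-bit ID space. `closest` ranks known nodes by XOR distance, which is how a Kademlia-style lookup picks whom to ask next.

```python
import os


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia XOR metric between two 20-byte IDs, as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest(nodes, target: bytes, k: int = 8):
    """The k known node IDs closest to target under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]


def spray_targets(prefix_bits: int):
    """Hypothetical spray: one find_node target per distinct top-`prefix_bits`
    prefix, with the remaining low-order bits filled randomly."""
    low_bits = 160 - prefix_bits
    mask = (1 << low_bits) - 1
    targets = []
    for prefix in range(1 << prefix_bits):
        low = int.from_bytes(os.urandom(20), "big") & mask
        targets.append(((prefix << low_bits) | low).to_bytes(20, "big"))
    return targets


if __name__ == "__main__":
    targets = spray_targets(8)  # 256 targets spread across the ID space
    print(len(targets), targets[0].hex(), targets[-1].hex())
```

Fixing the high-order bits matters because XOR distance is dominated by the most significant differing bit: each target pulls replies from a different region of the ID space.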
SECRET//REL FVEY
What does the DHT crawler collect?
• For each node in the DHT:
– 160-bit node ID
– IP address
– Port
• For targeted info_hashes:
– List of the node ID, IP address, and port of nodes sharing the targeted file
– Entries may be stale

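The node list arrives in BEP 5's "compact" encoding: a find_node or get_peers "nodes" string is a run of 26-byte entries (20-byte node ID, 4-byte IPv4 address, 2-byte big-endian port), while each get_peers "values" entry is 6 bytes of IP address and port with no node ID. A minimal parser, assuming Python 3, IPv4 only (the IPv6 extension is not handled), and hypothetical function and type names:

```python
import ipaddress
import struct
from typing import List, NamedTuple


class NodeInfo(NamedTuple):
    node_id: bytes
    ip: str
    port: int


class PeerInfo(NamedTuple):
    ip: str
    port: int


def parse_compact_nodes(blob: bytes) -> List[NodeInfo]:
    """Split a "nodes" string into 26-byte entries: ID, IPv4 address, port."""
    if len(blob) % 26:
        raise ValueError("nodes length is not a multiple of 26")
    out = []
    for off in range(0, len(blob), 26):
        node_id = blob[off:off + 20]
        ip = str(ipaddress.IPv4Address(blob[off + 20:off + 24]))
        (port,) = struct.unpack("!H", blob[off + 24:off + 26])
        out.append(NodeInfo(node_id, ip, port))
    return out


def parse_compact_peers(values: List[bytes]) -> List[PeerInfo]:
    """Decode get_peers "values": each entry is 6 bytes (IPv4 + port)."""
    out = []
    for v in values:
        if len(v) != 6:
            raise ValueError("compact peer entry must be 6 bytes")
        ip = str(ipaddress.IPv4Address(v[:4]))
        (port,) = struct.unpack("!H", v[4:])
        out.append(PeerInfo(ip, port))
    return out


if __name__ == "__main__":
    # The slide's placeholder "values" strings, decoded as compact peer info.
    for peer in parse_compact_peers([b"axje.u", b"idhtnm"]):
        print(peer.ip, peer.port)
```

Run on the slide's sample values, `"axje.u"` decodes to 97.120.106.101 port 11893; the sample strings are placeholders, not real peers.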
SECRET//REL FVEY
What is the value of the data?
• Use “community detection” algorithms to identify swarms that are likely to be malicious
• Download files being shared by likely malicious swarms
• Build a BitTorrent mitigation case for NIPRnet
• General SIGINT reporting
• Downloading files without first identifying likely malicious swarms is impractical

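The briefing does not name its community-detection algorithm. As one illustrative choice, the sketch below uses label propagation (each peer repeatedly adopts the label most common among its neighbors) on a graph that links peers appearing in the same swarm. The input format, a mapping from info_hash to a set of peer addresses, and all function names are assumptions; deciding whether a community is "likely malicious" is out of scope here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def peer_graph(swarms):
    """Link two peers whenever they appear in the same swarm (info_hash)."""
    adj = defaultdict(set)
    for peers in swarms.values():
        for a, b in combinations(sorted(peers), 2):
            adj[a].add(b)
            adj[b].add(a)
        for p in peers:
            adj.setdefault(p, set())  # keep single-peer swarms as graph nodes
    return adj


def label_propagation(adj, max_iter=100):
    """Deterministic asynchronous label propagation.

    Nodes are visited in sorted order; each adopts the label most common
    among its neighbors (ties broken by the smallest label) until no
    label changes or max_iter passes have run.
    """
    labels = {n: n for n in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue
            counts = Counter(labels[nb] for nb in adj[node])
            best = min(counts, key=lambda lbl: (-counts[lbl], lbl))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def communities(labels):
    """Group nodes by final label, in a stable order."""
    groups = defaultdict(set)
    for node, lbl in labels.items():
        groups[lbl].add(node)
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    # Hypothetical swarms keyed by placeholder info_hash strings.
    swarms = {"h1": {"p1", "p2", "p3"}, "h2": {"p2", "p3"}, "h3": {"q1", "q2", "q3"}}
    print(communities(label_propagation(peer_graph(swarms))))
```

Label propagation is cheap (roughly linear in the number of edges per pass), which suits a graph built from millions of crawled peers, but its results depend on visiting order; a production analytic would likely compare several algorithms.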
TOP SECRET//COMINT//REL FVEY
Pilot on PACKAGEGOODS Server
• Deploy a modified version of an existing crawler on a dedicated PG server
• Run analytics on “swarm” metadata to determine malicious activity
• Experiment with subnet range, ID space, and message interval to determine server processing and bandwidth requirements
• Test whether the crawler catches info_hashes we see from the target in XKS
• Must we proactively collect peers to address “SIGINT lag”?

TOP SECRET//COMINT//REL FVEY
SIGINT Lag
• A BitTorrent “swarm” may be inactive by the time the target info_hash is reported by a SIGINT system
• May require preemptive collection of peers
– DHT has on the order of 8 million active nodes
– info_hash/DHT address space: 2^160

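A back-of-the-envelope version of the pilot's sizing question, assuming node IDs are uniformly distributed and taking the slide's figure of about 8 million active nodes. Since 2^160 cannot be enumerated, a crawler only needs enough distinct ID prefixes that each covers roughly k = 8 nodes (the number of nodes a find_node reply returns under BEP 5). The query rate and per-exchange byte count are hypothetical parameters, not measurements.

```python
import math


def prefix_bits_needed(num_nodes: int, k: int = 8) -> int:
    """Prefix length at which each prefix region holds about k nodes or
    fewer, assuming IDs are spread uniformly over the 2**160 space."""
    if num_nodes <= k:
        return 0
    return math.ceil(math.log2(num_nodes / k))


def crawl_estimate(num_nodes, k=8, queries_per_sec=1000.0, bytes_per_exchange=400):
    """One find_node per prefix; bytes_per_exchange is an assumed figure
    for query plus reply, including IPv4/UDP headers."""
    bits = prefix_bits_needed(num_nodes, k)
    queries = 2 ** bits
    seconds = queries / queries_per_sec
    total_bytes = queries * bytes_per_exchange
    return bits, queries, seconds, total_bytes


if __name__ == "__main__":
    bits, queries, seconds, total = crawl_estimate(8_000_000)
    print(f"{bits} prefix bits, {queries:,} queries, "
          f"{seconds / 60:.1f} min, {total / 2**20:.0f} MiB")
```

With the defaults this gives 20 prefix bits, 1,048,576 queries, about 17.5 minutes at 1,000 queries per second, and about 400 MiB of traffic for one pass. Duplicate replies, churn, and remote rate limiting would all push real numbers higher, and the estimate covers node discovery only, not peer lists for info_hashes.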
SECRET//REL FVEY
Next Steps
• Enhanced analytics
– Community discovery
• Distributed crawler
• Peer pre-fetch
• Target file download
– Avoid lending “utility”

TOP SECRET//COMINT//REL FVEY
Prior Work
GCHQ - SEBACIUM POC: CES – XKS schema/micro-plugin Prototype analytics POC: TAO-ROC – OGC approval for operational tests PACKAGEGOODS connection POC:

TOP SECRET//COMINT//REL FVEY
GRIMPLATE Collaboration
• CES – Digital Network Exploitation Applications
• NTOC V25 – Malicious Activity Discovery-Characterization
• V45/47 – Technology Development
• V46 – Technology Planning and Assessment
• S2B – Office of China and Korea, CNE Access Development Branch
• S2H – AP Russia Production Center, Russia SIGINT Development Division
• TAO-ROC – Production Operations Division