
SKYNET: Courier Detection via Machine Learning

May 8, 2015, 10:26 a.m.


TOP SECRET//COMINT//REL TO USA, FVEY
SKYNET: Courier Detection via Machine Learning
[author and organization lines illegible]
June 5, 2012

TOP SECRET//COMINT//REL TO USA, FVEY
Given a handful of courier selectors, can we find others that “behave similarly” by analyzing GSM metadata?
It’s worth noting that:
• we are looking for different people using phones in similar ways
• without using any call chaining techniques from known selectors
• by scanning through all selectors seen in Pakistan that have not left Af/Pak (~55M)

From GSM [text illegible] selector’s pattern-of-life and travel behavior
[Figure: map; labels largely illegible]

This presentation describes our search for AQSL couriers using behavioral profiling
• Behavioral Feature Extraction
• Cross Validation Experiment on AQSL Couriers
• Preliminary SIGINT Findings

Counting unique UCELLIDs shows that couriers travel more often than typical Pakistani selectors

By examining multiple features at once, we can see some indicative behaviors of our courier selectors

Looking at a hierarchical clustering derived from all 80 features, the AQSL groups mostly stay together
[Figure: hierarchical clustering dendrogram; leaf labels illegible]

Now, we’ll describe a cross validation experiment on the AQSL selectors that we were provided
Cross Validation Experiment on AQSL Couriers

Our initial detector uses the centroid of the AQSL couriers to “find other selectors like these”
AQSL Cross-Validation Experiment
• 7 MSISDN/IMSI pairs
• Hold each pair out and score them when training the centroid on the rest
• Assume that random draws of Pakistani selectors are nontargets
• How well do we do?
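
The hold-one-out procedure on this slide can be sketched as follows. This is a minimal illustration on synthetic two-dimensional vectors, not the presenters' implementation: the slides do not state the distance measure or any feature scaling, so the Euclidean distance, the `score` function, and all data below are assumptions.

```python
import math
import random

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def score(vec, center):
    """Higher score means closer to the centroid (negative Euclidean distance).
    The choice of distance is an assumption; the slides do not specify one."""
    return -math.dist(vec, center)

def leave_one_out_scores(targets):
    """Score each target against the centroid of the remaining targets."""
    scores = []
    for i, held_out in enumerate(targets):
        rest = targets[:i] + targets[i + 1:]
        scores.append(score(held_out, centroid(rest)))
    return scores

# Synthetic stand-in data (hypothetical): 7 target vectors near (5, 5)
# and 1000 nontarget vectors near (0, 0).
rng = random.Random(0)
targets = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(7)]
nontargets = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]

target_scores = leave_one_out_scores(targets)
nontarget_scores = [score(v, centroid(targets)) for v in nontargets]
```

Holding each target out before scoring it keeps the target from pulling the centroid toward itself, which would otherwise inflate its own score.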

Our initial detector uses the centroid of the AQSL couriers to “find other selectors like these”
AQSL Cross-Validation Experiment
• Initial experiments showed EER in 10-20% range
• Here, performance much worse against these nontargets:
  • Seen in Pakistan
  • Not seen outside Af/Pak
  • Not FVEY selectors
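
EER (equal error rate) is the operating point at which the miss rate equals the false alarm rate. A minimal sketch of how it can be read off two lists of scores is given below; the function names and the toy scores are illustrative assumptions, and the convention that a selector is flagged when its score is at or above the threshold is also assumed.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (miss rate, false alarm rate) when flagging scores >= threshold."""
    miss = sum(s < threshold for s in target_scores) / len(target_scores)
    false_alarm = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return miss, false_alarm

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep every observed score as a threshold and return the error rate
    at the point where miss rate and false alarm rate are closest."""
    best_gap, best_rate = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        miss, fa = error_rates(target_scores, nontarget_scores, t)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (miss + fa) / 2
    return best_rate

# Toy scores (hypothetical): one target scores low, one nontarget scores high.
toy_targets = [0.9, 0.8, 0.7, 0.6, 0.2]
toy_nontargets = [0.1, 0.15, 0.3, 0.4, 0.75]
eer = equal_error_rate(toy_targets, toy_nontargets)  # 0.2 at threshold 0.6
```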

Statistical algorithms are able to find the couriers at very low false alarm rates, if we’re allowed to miss half of them
Random Forest Classifier
• 7 MSISDN/IMSI pairs
• Hold each pair out and then try to find them after learning how to distinguish remaining couriers from other Pakistanis better (using 100k random selectors here)
• Assume that random draws of Pakistani selectors are nontargets
• 0.18% False Alarm Rate at 50% Miss Rate
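
The metric on this slide fixes the miss rate and reports the resulting false alarm rate. The sketch below shows one way to compute it from scores; the thresholding convention (flag scores at or above the threshold, round the number of detected targets up) and the toy data are assumptions, since the slides do not describe how the operating point was chosen. The last line is plain arithmetic on the slides' own figures, and it assumes the rate measured on 100k selectors carries over to the ~55M population mentioned earlier.

```python
import math

def false_alarm_rate_at_miss_rate(target_scores, nontarget_scores, miss_rate=0.5):
    """False alarm rate at the threshold that misses at most `miss_rate` of targets."""
    ranked = sorted(target_scores, reverse=True)
    # Number of targets that must still be detected (at least one).
    keep = max(1, math.ceil(len(ranked) * (1 - miss_rate)))
    threshold = ranked[keep - 1]
    false_alarms = sum(s >= threshold for s in nontarget_scores)
    return false_alarms / len(nontarget_scores)

# Toy data (hypothetical): 7 held-out target scores, 10,000 evenly spread nontarget scores.
toy_target_scores = [0.95, 0.90, 0.85, 0.80, 0.40, 0.30, 0.20]
toy_nontarget_scores = [i / 10_000 for i in range(10_000)]
toy_rate = false_alarm_rate_at_miss_rate(toy_target_scores, toy_nontarget_scores)

# Scale check with the slides' figures: 0.18% of ~55M selectors.
implied_false_alarms = round(0.0018 * 55_000_000)  # 99,000 selectors
```

With 7 targets an exact 50% miss rate is not reachable, so the sketch detects 4 of 7; a different rounding convention would move the threshold.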

We’ve been experimenting with several error metrics on both small and large test sets
Training Data | Classifier | Features | False Alarm Rate at 50% Miss Rate (100k Test Selectors) | Mean Reciprocal Rank | Tasked Selectors in Top 500 (55M Test Selectors) | Tasked Selectors in Top 100 (55M Test Selectors)
None | Random | None | 50% | 1/23k (simulated) | 0.64 (active/Pak) | 0.13 (active/Pak)
Known Couriers | Centroid | All | 20% | 1/18k | - | -
Known Couriers | [unclear] | [unclear] | 43% | 1/27k | - | -
Known Couriers | Random Forest | [unclear] | 0.18% | 1/9.9 | 5 | 1
+ Anchory Selectors | Random Forest | Outgoing | - | - | - | -
Random Forest:
• 0.18% false alarm rate at 50% miss rate
• 7x improvement over random performance when evaluating its tasked precision at 100
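
Mean reciprocal rank averages 1/rank over the held-out items, so a value of 1/9.9 is about 0.10. A minimal sketch follows; the helper names and the toy scores are assumptions, and the slides do not give the individual ranks behind their figures.

```python
def rank_among(held_out_score, pool_scores):
    """1-based rank of a held-out score within a pool of other scores."""
    return 1 + sum(s > held_out_score for s in pool_scores)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all held-out items."""
    return sum(1 / r for r in ranks) / len(ranks)

# Toy example (hypothetical scores).
pool = [0.9, 0.7, 0.5, 0.3, 0.1]
held_out = [0.95, 0.8, 0.4]
ranks = [rank_among(s, pool) for s in held_out]   # [1, 2, 4]
mrr = mean_reciprocal_rank(ranks)                 # (1 + 1/2 + 1/4) / 3

# Illustration only: one top-ranked item out of 7, the other six ranked very low.
skewed_mrr = mean_reciprocal_rank([1] + [1_000_000] * 6)
```

Because reciprocals are averaged, a few well-ranked items dominate the result: in the skewed example a single rank-1 item out of seven already gives an MRR of about 1/7, which is above 1/9.9.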

To get more training data we scraped selectors from S2I11 Anchory reports containing keyword “courier”
Anchory Selectors
• Searched for reports containing “S2I11” AND “courier”
• Filtered out non-mobile numbers and kept selectors with “interesting” travel patterns seen in SmartTracker

Adding selectors from Anchory reports to the training data reduced the false alarm rates even further
Anchory Selectors
• Searched for reports containing “S2I11” AND “courier”
• Filtered out non-mobile numbers and kept selectors with “interesting” travel patterns seen in SmartTracker

We’ve been experimenting with several error metrics on both small and large test sets
Training Data | Classifier | Features | False Alarm Rate at 50% Miss Rate (100k Test Selectors) | Mean Reciprocal Rank | Tasked Selectors in Top 500 (55M Test Selectors) | Tasked Selectors in Top 100 (55M Test Selectors)
None | Random | None | 50% | 1/23k (simulated) | 0.64 (active/Pak) | 0.13 (active/Pak)
Known Couriers | Centroid | All | 20% | 1/18k | - | -
Known Couriers | [unclear] | [unclear] | 43% | 1/27k | - | -
Known Couriers | Random Forest | [unclear] | 0.18% | 1/9.9 | 5 | 1
+ Anchory Selectors | Random Forest | Outgoing | 0.008% | 1/14 | 21 | 6
Random Forest trained on Known Couriers + Anchory Selectors:
• 0.008% false alarm rate at 50% miss rate
• 46x improvement over random performance when evaluating its tasked precision at 100
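
The "7x" and "46x" figures appear to be the number of tasked selectors found in the top 100 divided by the number a random ranking would be expected to place there. The sketch below redoes that arithmetic with the values from the table; the function and variable names are illustrative, and reading the slide's multipliers as truncated ratios is an assumption.

```python
def lift_over_random(observed_hits, random_hits):
    """Ratio of observed tasked selectors in the top k to the random baseline."""
    return observed_hits / random_hits

RANDOM_TOP_100 = 0.13  # random baseline, tasked selectors in top 100
RANDOM_TOP_500 = 0.64  # random baseline, tasked selectors in top 500

lift_known = lift_over_random(1, RANDOM_TOP_100)          # about 7.7
lift_anchory = lift_over_random(6, RANDOM_TOP_100)        # about 46.2
lift_anchory_500 = lift_over_random(21, RANDOM_TOP_500)   # about 32.8

# Both baselines imply roughly the same base rate of tasked selectors:
base_rate_100 = RANDOM_TOP_100 / 100   # 0.0013
base_rate_500 = RANDOM_TOP_500 / 500   # 0.00128
```

The top-100 counts behind these ratios are 1 and 6, so each multiplier rests on a very small number of hits.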

Now, we’ll investigate some findings after running these classifiers on +55M Pakistani selectors via MapReduce
Preliminary SIGINT Findings

The highest scoring selector that traveled to Peshawar and Lahore is PROB AHMED ZAIDAN
[Profile panel, partially legible:]
• TIDE Person Number: [illegible]
• MEMBER OF [illegible]
• MEMBER OF MUSLIM [illegible]
• WORKS FOR AL JAZEERA

In the top 500 scoring selectors, 21 are tasked, leading us to believe that we’re on the right track
[Figure: map; labels illegible]

We have also discovered many untasked selectors with interesting travel patterns

Preliminary results indicate that we’re on the right track, but much remains
Cross Validation Experiment:
– Random Forest classifier operating at 0.18% false alarm rate at 50% miss
– Enhancing training data with Anchory selectors reduced that to 0.008%
– Mean Reciprocal Rank is ~1/10
Preliminary SIGINT Findings:
– Behavioral features helped discover similar selectors with “courier-like” travel patterns
– High number of tasked selectors at the top is hopefully indicative of the detector performing well “in the wild”
