SKYNET: Courier Detection via Machine Learning

May. 8 2015 — 6:26 a.m.

Page 1 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
SKYNET: Courier Detection via Machine Learning
[illegible]
June 5, 2012
Page 2 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Given a handful of courier selectors, can we find others that “behave similarly” by analyzing GSM metadata?
It’s worth noting that:
• we are looking for different people using phones in similar ways
• without using any call chaining techniques from known selectors
• by scanning through all selectors seen in Pakistan that have not left Af/Pak (~55M)
Page 3 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
From GSM […] aspects […] selector’s pattern-of-life and travel behavior
Page 4 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
This presentation describes our search for AQSL couriers using behavioral profiling
• Behavioral Feature Extraction
• Cross Validation Experiment on AQSL Couriers
• Preliminary SIGINT Findings
Page 5 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Counting unique UCELLIDs shows that couriers travel more often than typical Pakistani selectors
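The slide names the feature (a count of unique UCELLIDs per selector) but not how it is computed. A minimal sketch of that kind of count, assuming the input is a list of (selector, cell ID) observations; the record layout, the function name and all values are hypothetical and not taken from the slides:

```python
from collections import defaultdict


def unique_cell_counts(records):
    """Map each selector to the number of distinct cell IDs it was seen on."""
    cells = defaultdict(set)
    for selector, cell_id in records:
        cells[selector].add(cell_id)
    return {selector: len(ids) for selector, ids in cells.items()}


# Synthetic (selector, cell ID) observations; every value here is made up.
records = [
    ("sel-A", "cell-1"), ("sel-A", "cell-1"), ("sel-A", "cell-2"),
    ("sel-B", "cell-1"), ("sel-B", "cell-3"), ("sel-B", "cell-4"),
    ("sel-B", "cell-5"),
]
counts = unique_cell_counts(records)
```

Repeated sightings on the same cell do not raise the count, so the feature reflects how many distinct cells a selector touched rather than how active it was.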
Page 6 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
By examining multiple features at once, we can see some indicative behaviors of our courier selectors
Page 7 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Looking at a hierarchical clustering derived from all 80 features, the AQSL groups mostly stay together
[Figure: hierarchical clustering of selector groups; labels illegible]
Page 8 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Now, we’ll describe a cross validation experiment on the AQSL selectors that we were provided
Cross Validation Experiment on AQSL Couriers
Page 9 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Our initial detector uses the centroid of the AQSL couriers to “find other selectors like these”
AQSL Cross-Validation Experiment
• 7 MSISDN/IMSI pairs
• Hold each pair out and score them when training the centroid on the rest
• Assume that random draws of Pakistani selectors are nontargets
• How well do we do?
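The hold-out procedure in the bullets above can be sketched as leave-one-out scoring against a centroid. This is an illustration under stated assumptions, not the presenters' implementation: the slides do not give the distance measure, so Euclidean distance is assumed, scoring nontargets against the centroid of all targets is also an assumption, and the two-dimensional feature vectors are synthetic.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance (an assumed choice of similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leave_one_out_scores(targets):
    """Score each target against the centroid of the remaining targets.

    The score is the negative distance, so higher means more similar.
    """
    scores = []
    for i, held_out in enumerate(targets):
        rest = targets[:i] + targets[i + 1:]
        scores.append(-distance(held_out, centroid(rest)))
    return scores


# Synthetic feature vectors; all values are made up for illustration.
targets = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
nontargets = [[5.0, 5.0], [4.0, 6.0], [1.1, 2.1]]

target_scores = leave_one_out_scores(targets)
full_centroid = centroid(targets)
nontarget_scores = [-distance(v, full_centroid) for v in nontargets]
```

Holding each target out before scoring it keeps a target from pulling the centroid toward itself, which would otherwise overstate how well the detector does on unseen selectors.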
Page 10 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Our initial detector uses the centroid of the AQSL couriers to “find other selectors like these”
AQSL Cross-Validation Experiment
• Initial experiments showed EER in 10-20% range
• Here, performance much worse against these nontargets:
  • Seen in Pakistan
  • Not seen outside Af/Pak
  • Not FVEY selectors
Page 11 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Statistical algorithms are able to find the couriers at very low false alarm rates, if we’re allowed to miss half of them
Random Forest Classifier
• 7 MSISDN/IMSI pairs
• Hold each pair out and then try to find them after learning how to distinguish remaining couriers from other Pakistanis better (using 100k random selectors here)
• Assume that random draws of Pakistani selectors are nontargets
0.18% False Alarm Rate, 50% Miss Rate
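The operating point quoted on this slide (a false alarm rate at a fixed 50% miss rate) can be read off two lists of detector scores. A minimal sketch, assuming higher scores mean "more courier-like"; the function name and the scores are hypothetical, and the slides do not say how ties or the odd count of 7 targets were handled:

```python
def false_alarm_rate_at_miss_rate(target_scores, nontarget_scores, miss_rate=0.5):
    """Fraction of nontargets flagged when `miss_rate` of targets are missed.

    The threshold is set at the lowest-scoring target that is still detected.
    With 7 targets and miss_rate=0.5, 3 are missed and 4 are detected
    (an assumed rounding choice).
    """
    ranked = sorted(target_scores, reverse=True)
    n_detected = len(ranked) - int(miss_rate * len(ranked))
    threshold = ranked[n_detected - 1]
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return false_alarms / len(nontarget_scores)


# Synthetic scores; all values are made up for illustration.
target_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1]
nontarget_scores = [0.95, 0.65, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01, 0.0]

rate = false_alarm_rate_at_miss_rate(target_scores, nontarget_scores)
```

In this toy data the threshold lands at 0.6, two of the ten nontargets score at or above it, and the resulting rate is 0.2.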
Page 12 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
We’ve been experimenting with several error metrics on both small and large test sets
100k Test Selectors
Training Data: None; Known Couriers; + Anchory Selectors
Classifier: Random; Centroid; Random Forest
Features: None; All; Outgoing
False Alarm Rate at 50% Miss Rate: 50%; 20%; 43%; 0.18%
Mean Reciprocal Rank: 1/23k (simulated); 1/18k; 1/27k; 1/9.9
55M Test Selectors
Tasked Selectors in Top 500: 0.64 (active/Pak); 5
Tasked Selectors in Top 100: 0.13 (active/Pak); 1
Random Forest:
• 0.18% false alarm rate at 50% miss rate
• 7x improvement over random performance when evaluating its tasked precision at 100
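The slides report Mean Reciprocal Rank without defining it. Under the standard definition (the average of 1/rank over the held-out items, where rank 1 is the top of the scored list), it can be sketched as follows. The ranks are hypothetical, and whether the presenters used exactly this definition is an assumption.

```python
from fractions import Fraction


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over 1-based ranks, kept exact with Fraction."""
    return sum(Fraction(1, r) for r in ranks) / len(ranks)


# Hypothetical ranks at which four held-out items appeared in a scored list.
ranks = [1, 2, 4, 4]
mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 1/4 + 1/4) / 4
```

A value near 1 means held-out items tend to sit at the very top of the list; a value such as 1/9.9 corresponds roughly to a typical position around tenth, although a few very high ranks can dominate the average.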
Page 13 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
To get more training data we scraped selectors from S2I11 Anchory reports containing keyword “courier”
Anchory Selectors
• Searched for reports containing “S2I11” AND “courier”
• Filtered out non-mobile numbers and kept selectors with “interesting” travel patterns seen in SmartTracker
Page 14 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Adding selectors from Anchory reports to the training data reduced the false alarm rates even further
Anchory Selectors
• Searched for reports containing “S2I11” AND “courier”
• Filtered out non-mobile numbers and kept selectors with “interesting” travel patterns seen in SmartTracker
Page 15 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
We’ve been experimenting with several error metrics on both small and large test sets
100k Test Selectors
Training Data: None; Known Couriers; + Anchory Selectors
Classifier: Random; Centroid; Random Forest
Features: None; All; Outgoing
False Alarm Rate at 50% Miss Rate: 50%; 20%; 43%; 0.18%; 0.008%
Mean Reciprocal Rank: 1/23k (simulated); 1/18k; 1/27k; 1/9.9; 1/14
55M Test Selectors
Tasked Selectors in Top 500: 0.64 (active/Pak); 5; 21
Tasked Selectors in Top 100: 0.13 (active/Pak); 1; 6
Random Forest trained on Known Couriers + Anchory Selectors:
• 0.008% false alarm rate at 50% miss rate
• 46x improvement over random performance when evaluating its tasked precision at 100
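The "improvement over random" figures appear to be ratios of tasked selectors in the top 100 against the random baseline of 0.13. The arithmetic below is one reading that is consistent with the table values. Pairing the counts 1 and 6 with the two Random Forest configurations is an inference from the slide text, and the variable names are hypothetical.

```python
# Tasked selectors in the top 100, as listed in the table.
random_top100 = 0.13        # random baseline (active/Pak)
rf_top100 = 1               # Random Forest, known couriers only (inferred pairing)
rf_anchory_top100 = 6       # Random Forest, with Anchory selectors (inferred pairing)

improvement_rf = rf_top100 / random_top100                  # about 7.7
improvement_rf_anchory = rf_anchory_top100 / random_top100  # about 46.2
```

The second ratio rounds to the quoted 46x. The first comes out nearer 7.7 than 7, so the quoted 7x is presumably rounded down or based on an unrounded baseline.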
Page 16 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Now, we’ll investigate some findings after running these classifiers on +55M Pakistani selectors via MapReduce
Preliminary SIGINT Findings
Page 17 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
The highest scoring selector that traveled to Peshawar and Lahore is PROB AHMED ZAIDAN
[…]
TIDE Person Number: […]
MEMBER OF […]
MEMBER OF MUSLIM […]
WORKS FOR AL JAZEERA
Page 18 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
In the top 500 scoring selectors, 21 are tasked, leading us to believe that we’re on the right track
Page 19 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
We have also discovered many untasked selectors with interesting travel patterns
Page 20 from SKYNET: Courier Detection via Machine Learning
TOP SECRET//COMINT//REL TO USA, FVEY
Preliminary results indicate that we’re on the right track, but much remains
Cross Validation Experiment:
– Random Forest classifier operating at 0.18% false alarm rate at 50% miss
– Enhancing training data with Anchory selectors reduced that to 0.008%
– Mean Reciprocal Rank is ~1/10
Preliminary SIGINT Findings:
– Behavioral features helped discover similar selectors with “courier-like” travel patterns
– High number of tasked selectors at the top is hopefully indicative of the detector performing well “in the wild”
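To put the quoted percentages in terms of counts, the worked arithmetic below converts a false alarm rate into an expected number of flagged nontargets. It assumes, purely for illustration, that a rate measured on the 100k-selector test set would carry over unchanged to the roughly 55M selectors; the slides do not make that claim. The function name is hypothetical, and rates are written per 100,000 to keep the arithmetic in integers.

```python
def expected_false_alarms(population, rate_per_100k):
    """Expected nontargets flagged, given a false alarm rate per 100,000."""
    return population * rate_per_100k // 100_000


# 0.18% is 180 per 100,000; 0.008% is 8 per 100,000.
on_test_set = expected_false_alarms(100_000, 180)          # 100k test selectors
at_scale_initial = expected_false_alarms(55_000_000, 180)  # 0.18% over 55M
at_scale_anchory = expected_false_alarms(55_000_000, 8)    # 0.008% over 55M
```

Under that assumption, 0.18% corresponds to 180 of the 100k test selectors, and 0.008% corresponds to 4,400 selectors out of 55M, while half of the true targets are still missed at this operating point.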