SKYNET: Courier Detection via Machine Learning
May. 8 2015 — 6:26 a.m.

TOP SECRET//COMINT//REL TO USA, FVEY
SKYNET:
Courier Detection via Machine Learning
June 5, 2012
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Given a handful of courier selectors, can we find others
that “behave similarly” by analyzing GSM metadata?
It’s worth noting that:
• we are looking for different people using phones in similar ways
• without using any call chaining techniques from known selectors
• by scanning through all selectors seen in Pakistan that have not left Af/Pak (~55M)
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
From GSM metadata, we can measure aspects of each
selector’s pattern-of-life and travel behavior
[Figure: map; legible place label: Faisalabad]
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
This presentation describes our search for
AQSL couriers using behavioral profiling
• Behavioral Feature Extraction
• Cross Validation Experiment on AQSL Couriers
• Preliminary SIGINT Findings
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Counting unique UCELLIDs shows that couriers
travel more often than typical Pakistani selectors
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
By examining multiple features at once, we can see some
indicative behaviors of our courier selectors
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Looking at a hierarchical clustering derived from all
80 features, the AQSL groups mostly stay together
[Figure: hierarchical clustering dendrogram; leaf labels include AQSL and Gen Pop groups]
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Now, we’ll describe a cross validation experiment
on the AQSL selectors that we were provided
Cross Validation Experiment
on AQSL Couriers
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Our initial detector uses the centroid of the AQSL
couriers to “find other selectors like these”
AQSL Cross-Validation Experiment
• 7 MSISDN/IMSI pairs
• Hold each pair out and score it after training the centroid on the rest
• Assume that random draws of Pakistani selectors are nontargets
• How well do we do?
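The hold-one-out procedure in the bullets above can be sketched in a few lines of Python. This is only an illustration on synthetic numbers: the slides do not state the distance measure or the feature values, so the Euclidean distance, the 4-dimensional vectors, and the names `centroid`, `distance`, and `leave_one_out_scores` are all assumptions made for the example.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def distance(a, b):
    """Euclidean distance (an assumed metric); smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leave_one_out_scores(targets):
    """Score each target against the centroid trained on the remaining targets."""
    scores = []
    for i, held_out in enumerate(targets):
        rest = targets[:i] + targets[i + 1:]
        scores.append(distance(held_out, centroid(rest)))
    return scores

# Synthetic stand-in data: 7 "target" vectors and 1000 random "nontargets".
rng = random.Random(0)
targets = [[rng.gauss(2.0, 0.5) for _ in range(4)] for _ in range(7)]
nontargets = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(1000)]

# Held-out targets are scored without ever contributing to their own centroid.
target_scores = leave_one_out_scores(targets)

# Nontargets are scored against the centroid of all targets.
full_centroid = centroid(targets)
nontarget_scores = [distance(v, full_centroid) for v in nontargets]
```

Comparing `target_scores` with `nontarget_scores` at a chosen distance threshold then gives the miss and false alarm rates that answer "How well do we do?".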
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Our initial detector uses the centroid of the AQSL
couriers to “find other selectors like these”
AQSL Cross-Validation Experiment
• Initial experiments showed EER in 10-20% range
• Here, performance much worse against these nontargets:
  • Seen in Pakistan
  • Not seen outside Af/Pak
  • Not FVEY selectors
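For readers unfamiliar with the metric, EER (equal error rate) is the operating point at which the miss rate equals the false alarm rate. The sketch below estimates it by sweeping every observed score as a threshold. It assumes that a higher score means "more target-like" and uses made-up scores; the function names are hypothetical and not taken from the slides.

```python
def rates_at(threshold, target_scores, nontarget_scores):
    """Return (miss_rate, false_alarm_rate) when flagging scores >= threshold."""
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(nontarget_scores)

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER as the point where the two rates are closest."""
    best_gap, best_rate = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        miss, fa = rates_at(t, target_scores, nontarget_scores)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (miss + fa) / 2
    return best_rate

# Made-up scores, for illustration only.
example_targets = [0.9, 0.8, 0.7, 0.2]
example_nontargets = [0.1, 0.3, 0.4, 0.75]
example_eer = equal_error_rate(example_targets, example_nontargets)
```

With only a handful of target scores the two rates move in coarse steps, so an EER estimated this way is itself coarse.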
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Statistical algorithms are able to find the couriers at very
low false alarm rates, if we’re allowed to miss half of them
Random Forest Classifier
• 7 MSISDN/IMSI pairs
• Hold each pair out and then try to find them after learning how to distinguish remaining couriers from other Pakistanis (using 100k random selectors here)
• Assume that random draws of Pakistani selectors are nontargets
• 0.18% False Alarm Rate at 50% Miss Rate
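"False alarm rate at 50% miss rate" fixes the threshold so that half of the known targets are still detected, then reports the share of nontargets scoring at or above that threshold. The following is a minimal sketch under stated assumptions: higher scores mean "more target-like", the function name is hypothetical, and the example scores are invented. With an odd number of targets (such as 7) an exact 50% miss rate is not reachable, so this sketch rounds up the number of targets that must be detected.

```python
import math

def false_alarm_rate_at_miss_rate(target_scores, nontarget_scores, miss_rate=0.5):
    """False alarm rate at the strictest threshold that still detects at
    least (1 - miss_rate) of the targets (higher score = more target-like)."""
    ranked = sorted(target_scores, reverse=True)
    # Number of targets that must score at or above the threshold.
    n_detect = max(math.ceil(len(ranked) * (1.0 - miss_rate)), 1)
    threshold = ranked[n_detect - 1]
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return false_alarms / len(nontarget_scores)

# Invented scores: 4 targets, 5 nontargets.
example_rate = false_alarm_rate_at_miss_rate(
    [0.9, 0.8, 0.6, 0.2],
    [0.1, 0.3, 0.7, 0.85, 0.5],
)
```

Here the threshold lands on the second-highest target score (0.8), and one of the five nontargets (0.85) clears it.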
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
We’ve been experimenting with several error
metrics on both small and large test sets
                                              100k Test Selectors  55M Test Selectors
Training Data        Classifier     Features  False Alarm Rate     Mean Reciprocal    Tasked Selectors   Tasked Selectors
                                              at 50% Miss Rate     Rank               in Top 500         in Top 100
None                 Random         None      50%                  1/23k (simulated)  0.64 (active/Pak)  0.13 (active/Pak)
Known Couriers       Centroid       All       20%                  1/18k
Known Couriers       Centroid       Outgoing  43%                  1/27k
Known Couriers       Random Forest  Outgoing  0.18%                1/9.9              5                  1
+ Anchory Selectors  Random Forest  Outgoing
Random Forest:
• 0.18% false alarm rate at 50% miss rate
• 7x improvement over random performance when evaluating its tasked precision at 100
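Two of the reported metrics can be made concrete with a short sketch. Mean Reciprocal Rank averages 1/rank over the known targets in the ranked output, and the "improvement over random" figure appears consistent with dividing the tasked selectors found in the top 100 by the number expected from a random ranking (1 versus 0.13 on this slide). The function names are hypothetical, and the ranks passed to `mean_reciprocal_rank` are invented for illustration.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over the known targets (rank 1 = top of the list)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def lift_over_random(observed_hits, expected_random_hits):
    """How many times more hits were found than a random ranking would give."""
    return observed_hits / expected_random_hits

# Invented ranks, for illustration only.
example_mrr = mean_reciprocal_rank([1, 2, 4])

# Figures from the slide: 1 tasked selector in the top 100 versus 0.13 expected.
lift_at_100 = lift_over_random(1, 0.13)
```

The ratio evaluates to roughly 7.7, which the slide reports as 7x.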
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
To get more training data we scraped selectors from S2I11
Anchory reports containing the keyword “courier”
Anchory Selectors
• Searched for reports containing “S2I11” AND “courier”
• Filtered out non-mobile numbers and kept selectors with “interesting” travel patterns seen in SmartTracker
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Adding selectors from Anchory reports to the training data
reduced the false alarm rates even further
Anchory Selectors
• Searched for reports containing “S2I11” AND “courier”
• Filtered out non-mobile numbers and kept selectors with “interesting” travel patterns seen in SmartTracker
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
We’ve been experimenting with several error
metrics on both small and large test sets
                                              100k Test Selectors  55M Test Selectors
Training Data        Classifier     Features  False Alarm Rate     Mean Reciprocal    Tasked Selectors   Tasked Selectors
                                              at 50% Miss Rate     Rank               in Top 500         in Top 100
None                 Random         None      50%                  1/23k (simulated)  0.64 (active/Pak)  0.13 (active/Pak)
Known Couriers       Centroid       All       20%                  1/18k
Known Couriers       Centroid       Outgoing  43%                  1/27k
Known Couriers       Random Forest  Outgoing  0.18%                1/9.9              5                  1
+ Anchory Selectors  Random Forest  Outgoing  0.008%               1/14               21                 6
Random Forest trained on Known Couriers + Anchory Selectors:
• 0.008% false alarm rate at 50% miss rate
• 46x improvement over random performance when evaluating its tasked precision at 100
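As a worked check on these figures, the sketch below reproduces the 46x ratio (6 tasked selectors in the top 100 versus 0.13 expected at random) and converts the two false alarm rates into absolute counts. The conversion assumes, purely for illustration, that a rate measured on the 100k test set would carry over unchanged to the full population of about 55M selectors; the slides do not make that claim, and the function name is hypothetical.

```python
def expected_false_alarms(false_alarm_rate, population):
    """Expected number of nontargets flagged at a given false alarm rate."""
    return false_alarm_rate * population

POPULATION = 55_000_000  # approximate selector count quoted in the deck

# Rates are given in percent on the slide, so divide by 100.
fa_known_only = expected_false_alarms(0.18 / 100, POPULATION)
fa_with_anchory = expected_false_alarms(0.008 / 100, POPULATION)

# Tasked selectors in the top 100 relative to the random baseline.
lift_at_100 = 6 / 0.13
```

Under that assumption the two rates correspond to roughly 99,000 and 4,400 flagged nontargets respectively, and the lift is about 46.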
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Now, we’ll investigate some findings after running these
classifiers on +55M Pakistani selectors via MapReduce
Preliminary SIGINT Findings
TOP SECRET//COMINT//REL TO USA, FVEY

The highest scoring selector that traveled to
Peshawar and Lahore is PROB AHMED ZAIDAN
TIDE Person Number:
MEMBER OF
MEMBER OF MUSLIM
WORKS FOR AL JAZEERA
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
In the top 500 scoring selectors, 21 are tasked,
leading us to believe that we’re on the right track
[Figure: map; legible place label: Wana]
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
We have also discovered many untasked
selectors with interesting travel patterns
TOP SECRET//COMINT//REL TO USA, FVEY

TOP SECRET//COMINT//REL TO USA, FVEY
Preliminary results indicate that we’re on the
right track, but much remains to be done
Cross Validation Experiment:
– Random Forest classifier operating at 0.18% false alarm rate at 50% miss rate
– Enhancing training data with Anchory selectors reduced that to 0.008%
– Mean Reciprocal Rank is ~1/10
Preliminary SIGINT Findings:
– Behavioral features helped discover similar selectors with “courier-like” travel patterns
– High number of tasked selectors at the top is hopefully indicative of the detector performing well “in the wild”
TOP SECRET//COMINT//REL TO USA, FVEY