XKS Tech Extractor 2010
Jul. 1 2015 — 9:52 a.m.

Tech Strings in Documents
(aka Tech Extractor)
April 2010

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL
- Overview and History of Tech Strings in Documents
- Why is it important?
- Limitations of the capability and the advance to fingerprints
- Examples and live demo

Content-based Selection
- How do you find DNI data if you don't have a strong selector like an IP or E-mail address?
- What if you only know keywords, part names, phrases, etc. expected to be used by your target?

The "Tech Extractor" is a way of finding valuable intelligence based on keywords in the content of DNI sessions, but it is a departure from traditional "soft selection", which tends to bring back a lot of junk.

What is soft selection?
- Soft selection, aka content-based selection, is an approach to targeting traffic by looking for keywords or phrases rather than specific E-mail accounts
- Content-based selection has suffered because of the poor design of content-based selection engines

Communication vs. DNI Content
- Selection engines in use today were based on designs built to handle TELEX traffic
  - TELEX is a highly formatted, content-rich type of traffic that does not resemble the raw DNI seen in Internet traffic
  - Raw Internet traffic contains HTML, web pages, raw base64-encoded documents, etc.
- When analysts think of DNI "content", they are referring more to "communication content" than to raw DNI content.
- Current DNI selection does not allow you to restrict hits to the "type" of traffic you want, e.g. E-mails (including Webmail) or Documents

Communication vs. DNI Content
- If an analyst tasks the Boolean equation "bomb" AND "chemical", they likely want to see all communications that mention "bomb" and "chemical", not all the web pages, news stories, blog posts, etc. where those two words appear
- What we need is a context-aware scanning engine that knows where it is inside the raw DNI in order to properly apply analyst tasking
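A minimal sketch of the difference, assuming each session has already been split into parts carrying a context label (the `Part` structure and the labels `email_body`, `chat_body`, `web_page` are hypothetical, not the XKS data model). A context-blind scan applies the Boolean tasking to the whole session as raw DNI; a context-aware scan applies it only inside communication bodies.

```python
from dataclasses import dataclass


@dataclass
class Part:
    context: str  # hypothetical label, e.g. "email_body", "chat_body", "web_page"
    text: str


# Contexts treated as "communication content" in this sketch (an assumption).
COMMUNICATION_CONTEXTS = {"email_body", "chat_body"}


def boolean_and(text, terms):
    """True if every term appears in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def context_blind_hit(parts, terms):
    # Treats the whole session as one blob of raw DNI, whatever its context.
    return boolean_and(" ".join(p.text for p in parts), terms)


def context_aware_hit(parts, terms):
    # Applies the tasking only inside parts whose context is a communication body.
    return any(
        p.context in COMMUNICATION_CONTEXTS and boolean_and(p.text, terms)
        for p in parts
    )


terms = ["bomb", "chemical"]
news_page = [Part("web_page", "News story: bomb disposal team called to chemical plant")]
email = [Part("email_body", "Agenda: bomb squad training and chemical safety briefing")]
print(context_blind_hit(news_page, terms), context_aware_hit(news_page, terms))  # True False
print(context_blind_hit(email, terms), context_aware_hit(email, terms))  # True True
```

Both scans hit the e-mail, but only the context-blind scan also hits the news page, which is exactly the junk the slide describes.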

Soft Selection vs. Surgical Selection
- Existing selection techniques are blunt instruments
- XKEYSCORE contextual dictionaries provide an extremely sharp knife to make accurate selection decisions
"That's not a knife!"

What is the Tech Extractor?
- The Tech Extractor was a first stab at context-aware scanning, and it focuses on only three contexts:
  - E-mail Bodies
  - Chat Bodies
  - Document Bodies:
    - Microsoft Word, Excel, PowerPoint, Project, Visio
    - Adobe PDF
    - Rich Text Format (RTF)

How does the Tech Extractor work?
- The Tech Extractor works by scanning a list of keywords (or regular expressions) against those three contexts and then tagging the results.
- This is not "Filtering and Selection", and we're not forwarding any data home
  - XKS is simply tagging sessions with meta-data, much like we do with AppIDs + fingerprints
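A minimal sketch of scan-and-tag, assuming a hypothetical session record with a `context` label and a `body`, and a hypothetical `TechTerm` entry per keyword or regular expression. The tag fields borrow the Tech Dictionary / Tech Name / Tech Value column names from the meta-data results in this deck; everything else is an assumption. The point it illustrates is that tagging only appends meta-data: every session is kept and nothing is filtered out.

```python
import re
from dataclasses import dataclass, field

# The three contexts the Tech Extractor scans (labels are hypothetical).
SCANNED_CONTEXTS = {"email_body", "chat_body", "document_body"}


@dataclass
class Session:
    context: str  # hypothetical context label assigned by earlier parsing
    body: str
    tags: list = field(default_factory=list)  # meta-data added by tagging


@dataclass
class TechTerm:
    dictionary: str         # "Tech Dictionary"
    name: str               # "Tech Name" (category)
    pattern: str            # keyword, or regular expression if is_regex
    is_regex: bool = False


def tag_sessions(sessions, terms):
    """Append meta-data tags to matching sessions; no session is dropped."""
    compiled = [
        (t, re.compile(t.pattern if t.is_regex else re.escape(t.pattern), re.IGNORECASE))
        for t in terms
    ]
    for session in sessions:
        if session.context not in SCANNED_CONTEXTS:
            continue  # other contexts (e.g. web pages) are never scanned
        for term, regex in compiled:
            match = regex.search(session.body)
            if match:
                session.tags.append({
                    "tech_dictionary": term.dictionary,
                    "tech_name": term.name,
                    "tech_value": match.group(0),  # the text that actually hit
                })
    return sessions


sessions = tag_sessions(
    [Session("email_body", "Please send the IMEI of the handset"),
     Session("web_page", "Free IMEI lookup service")],
    [TechTerm("example_dict", "IMEI", "imei")],
)
print([s.tags for s in sessions])
# [[{'tech_dictionary': 'example_dict', 'tech_name': 'IMEI', 'tech_value': 'IMEI'}], []]
```

Both sessions come back; only the one in a scanned context carries a tag, which a later query can then use.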

How does the Tech Extractor work?
- After the meta-data tag is applied, you can then use that meta-data tag as part of a compliant query for traffic
- It's important to note that, just like AppIDs + fingerprints, Tech Extractor tags aren't necessarily compliant by themselves.
  - You may need to add a valid foreign IP address, MAC address or country code before you query!

[Screenshot: example e-mail session shown in HTML / Plain Text views]

How does the Tech Extractor work?
- Also, this is not retrospective.
  - After a list is tasked, XKS will scan data collected from that point on, looking for hits.
  - Any data previously collected and stored by XKS will not be scanned.
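One way to picture why tagging is not retrospective is to assume it happens once, at ingest time. The `IngestTagger` class below is a hypothetical sketch of that assumption, not XKS code: a term tasked later changes nothing for sessions already in the store.

```python
class IngestTagger:
    """Hypothetical sketch: sessions are tagged once, when they arrive."""

    def __init__(self):
        self.terms = []  # currently tasked terms (stored lowercased)
        self.store = []  # (body, tags) pairs; tags are fixed at ingest time

    def task(self, term):
        # A new term affects only sessions ingested after this call.
        self.terms.append(term.lower())

    def ingest(self, body):
        lowered = body.lower()
        tags = [t for t in self.terms if t in lowered]  # case-insensitive substring
        self.store.append((body, tags))


tagger = IngestTagger()
tagger.ingest("old mail mentioning IMSI")  # collected before the term was tasked
tagger.task("IMSI")
tagger.ingest("new mail mentioning IMSI")  # collected after tasking
print(tagger.store)
# [('old mail mentioning IMSI', []), ('new mail mentioning IMSI', ['imsi'])]
```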

Where does XKS get its list of terms?
- Lists of terms are provided to the XKS team as "Tech Dictionaries", each of which can contain multiple category names (aka "Tech Names")
- Only after the XKS team is supplied with those terms can the system begin scanning and tagging.
  - A GUI to allow entry of tech terms is almost complete

Tech Extractor Tasking Rules
- Currently, all terms need to be classified REL FVEY
- Terms are case insensitive by default, but can be forced to be case sensitive
- Terms can hit as substrings by default
  - ex: "ricin" will hit in "pricing"
- However, terms can be forced to hit only as a whole word (either by tasking them with a space at the beginning and end, or by using a regular expression)
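The tasking rules can be sketched as a single matching function. This assumes plain terms behave as literal substrings and regular expressions are used as written; the function and parameter names are hypothetical. Under that assumption, a space-padded term only hits where the spaces actually occur, so it misses a word followed by punctuation, while a regular expression with `\b` word boundaries does not.

```python
import re


def term_hits(term, text, case_sensitive=False, is_regex=False):
    """Return True if a tasked term hits in text (sketch of the rules above).

    Plain terms are literal substrings, so a term tasked with a leading and
    trailing space only hits where those spaces actually occur in the text.
    """
    flags = 0 if case_sensitive else re.IGNORECASE  # case insensitive by default
    pattern = term if is_regex else re.escape(term)
    return re.search(pattern, text, flags) is not None


print(term_hits("ricin", "Pricing is attached"))                     # True: substring hit
print(term_hits("Ricin", "pricing", case_sensitive=True))            # False: case differs
print(term_hits(" ricin ", "Pricing is attached"))                   # False: spaces force a word
print(term_hits(" ricin ", "a report on ricin exposure"))            # True
print(term_hits(" ricin ", "a report on ricin."))                    # False: no trailing space
print(term_hits(r"\bricin\b", "a report on ricin.", is_regex=True))  # True: word boundary
```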

Language Support
- Supports full foreign-language tagging and querying
- Ex: look for common Arabic expressions in E-mails coming from Pakistan
[Screenshot: example webmail message]

Tech Extractor Limitations
- While terms tasked for the Tech Extractor are applied only to Document, E-mail and Chat bodies, that is still a lot of traffic!
  - If the term is too generic (or short), you're still likely to run into a lot of false hits.
- Also, while you can limit your results by adding more search criteria (country code, IP address, etc.), the term will still be scanned against all data looking for hits

Tech Extractor vs. Fingerprints
- Tech Extractor treats E-mail, Chat and Document bodies as a single "context"
- The XKS Fingerprint language gives you 65+ contexts that can be used together to form powerful and specific signatures
- When terms are generic and are returning too many poor results through the Tech Extractor, then it's time to make the switch to the full fingerprint language

Why use the Tech Extractor at all?
- One of the most powerful features of the Tech Extractor is that it shows you exactly which term hit in the meta-data results:
Item Type      Tech Dictionary  Tech Name  Tech Value  Tech Filename
document_body  classic gsm      HLR                    (illegible)
document_body  classic gsm      ICCID                  (illegible)
email_body     classic gsm      IMEI
document_body  classic gsm      IMSI
email_body     classic gsm      MSISDN

Why not Fingerprints?
- With fingerprints, you only see that the full equation (which can be very complex) was satisfied; you won't see which specific terms from the equation hit.
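A toy contrast, assuming a fingerprint is reduced here to an AND/OR equation over keywords (real XKS fingerprints are far richer, and this is not their syntax). The fingerprint-style function can only answer yes or no, while the Tech Extractor-style function returns the individual terms that hit, mirroring the per-term meta-data results. Both function names are hypothetical.

```python
def fingerprint_hit(text, all_of, any_of):
    """Toy fingerprint: reports only whether the whole equation is satisfied."""
    lowered = text.lower()
    return (all(t.lower() in lowered for t in all_of)
            and any(t.lower() in lowered for t in any_of))


def tech_extractor_hits(text, terms):
    """Tech Extractor-style: reports exactly which terms hit."""
    lowered = text.lower()
    return [t for t in terms if t.lower() in lowered]


text = "Send the IMSI and MSISDN for the new SIM"
print(fingerprint_hit(text, ["IMSI"], ["MSISDN", "ICCID"]))            # True
print(tech_extractor_hits(text, ["IMEI", "IMSI", "MSISDN", "ICCID"]))  # ['IMSI', 'MSISDN']
```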

Live Demo

More information:
- On the GCHQ wiki:

To submit tasking
- Please use the Excel Spreadsheet template developed by GCHQ CP
  - And then E-mail the list
- In the near future, analysts will be able to enter the terms themselves through a web-based GUI