DYNAMIC PAGE -- HIGHEST POSSIBLE CLASSIFICATION IS
TOP SECRET // SI / TK // REL TO USA AUS CAN GBR NZL
(U) Ask Raul: HTML Coding and Email
FROM: Raul, a DNI Analyst
Unknown
Run Date: 11/05/2004
(S) Hey, Raul,
I got this crazy numeric stuff the other day. Here's what it looks
like:
&#68;&#111;&#104;&#33;
Is this some sort of encryption?
Tracy
Dear Tracy,
(U) Believe it or not, that's just text. Some folks refer to it as HTML
Escape, HTML special characters, or HTML Unicode; if you want to
go by the RFC, it is a Numeric Character Reference (NCR). Here's how it
works:
Format:
&# Code ; for decimal (base 10) numbers
&#x Code ; for hexadecimal (base 16) numbers
Code:
Unicode value of the character:
English/Latin letter D = 68 decimal or 44 hex
English/Latin letter a = 97 decimal or 61 hex
Cyrillic letter A = 1040 decimal or 410 hex
Arabic letter Alef = 1575 decimal or 627 hex
Parenthesized Hangul Hieuh = 12813 decimal or 320D hex
Assembled as:
&#68; or &#x44; renders as D
&#1040; or &#x410; renders as А (Cyrillic A)
&#12813; or &#x320D; renders as ㈍ (Parenthesized Hangul Hieuh)
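The format above can be sketched in a few lines of Python. This is only an illustration: the helper name to_ncr is made up for this example, and html.unescape from the standard library is used to go the other way.

```python
import html

def to_ncr(text, hexadecimal=False):
    """Write every character of text as a numeric character reference.

    to_ncr is a hypothetical helper name, not part of any library.
    """
    if hexadecimal:
        # &#x Code ; with the Unicode value in base 16
        return "".join(f"&#x{ord(ch):X};" for ch in text)
    # &# Code ; with the Unicode value in base 10
    return "".join(f"&#{ord(ch)};" for ch in text)

print(to_ncr("D"))                         # &#68;
print(to_ncr("D", hexadecimal=True))       # &#x44;
print(to_ncr("\u0410"))                    # &#1040; (Cyrillic A)
print(to_ncr("\u320d", hexadecimal=True))  # &#x320D; (Parenthesized Hangul Hieuh)

# Decoding: the standard library turns either form back into the character.
print(html.unescape("&#1575;") == "\u0627")  # True (Arabic Alef)
```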
It is very simple, and it has been in common use for many
years, yet it seems to be a "new" thing for most NSA analysts. But
it gets better!
(S) Most email tools are set by default to send both a text and
an HTML version of a mail message, and the HTML version often
carries foreign-language text as NCRs. As a result, you can often
take advantage of this to select traffic at the front-end, search
for it on the back-end, or recover a perfectly good email from a
session that looks lost, simply by going after the NCR format.
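As a rough illustration of "going after the NCR format", a regular expression can pick out runs of consecutive NCRs in a block of text. The pattern name, the helper name, and the minimum run length of three are all assumptions made for this sketch, not anything prescribed by a particular tool.

```python
import re

# A run of three or more NCRs, decimal (&#1090;) or hexadecimal (&#x442;).
# The threshold of three is an arbitrary choice for this sketch.
NCR_RUN = re.compile(r"(?:&#(?:[0-9]+|[xX][0-9A-Fa-f]+);){3,}")

def find_ncr_runs(data):
    """Return every run of consecutive NCRs found in data (hypothetical helper)."""
    return [m.group(0) for m in NCR_RUN.finditer(data)]

sample = "<p>Subject: &#1090;&#1072;&#1085;&#1082; &amp; more</p>"
print(find_ncr_runs(sample))  # ['&#1090;&#1072;&#1085;&#1082;']
```

Named references such as &amp; are deliberately ignored, since they do not carry the foreign-language text.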
SERIES:
(U) Ask Raul - Answers to DNI Questions
1. Ask Raul: Fonts and Encoding
2. Ask Raul: Dictionary Equations
3. Ask Raul: HTML Coding and Email
4. Ask Raul: PDF Files
5. Ask Raul: Damaged Data
6. Ask Raul: Getting the Most from Metadata
(S) As you well know, all non-7-bit material going into Lionheart
gets indexed as though it were 7-bit. So, a perfectly good foreign
text term actually gets indexed as something it is not. If you are
lucky, and have a target that uses a character set such as KOI-8,
this bit stripping results in what looks like very good translit. If you
are unlucky, and your target is using something like CP-1251, then
that bit stripping produces a mess. For example, the Russian word
for tank, танк, comes out of this indexing process as 'TANK' for
KOI-8 but 'r`mj' for CP-1251. You're not going to be able to
use that CP-1251 term to do anything. However, the same term
using NCR would look like this: &#1090;&#1072;&#1085;&#1082;
(which renders as танк). Aside from being far more distinctive simply because
the pattern contains more characters (bytes), this form is not
harmed by the 7-bit indexing procedure.
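The KOI-8 versus CP-1251 example can be reproduced with a short sketch. It assumes that "bit stripping" means clearing the top bit of every byte and that KOI-8 here corresponds to Python's koi8_r codec; the function name strip_high_bit is made up for the illustration.

```python
def strip_high_bit(text, encoding):
    """Encode text, clear bit 7 of every byte, and read the result as ASCII.

    Assumption: this models 8-bit data being treated as though it were 7-bit.
    """
    raw = text.encode(encoding)
    return bytes(b & 0x7F for b in raw).decode("ascii")

word = "\u0442\u0430\u043d\u043a"  # the Russian word for tank

print(strip_high_bit(word, "koi8_r"))  # TANK
print(strip_high_bit(word, "cp1251"))  # r`mj

# The NCR form is pure 7-bit text, so the same treatment leaves it untouched.
ncr = "".join(f"&#{ord(ch)};" for ch in word)
print(ncr)                                  # &#1090;&#1072;&#1085;&#1082;
print(strip_high_bit(ncr, "ascii") == ncr)  # True
```

KOI-8 comes out readable because its Cyrillic letters were laid out so that dropping the high bit lands on a similar-sounding Latin letter; CP-1251 has no such property.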
(S) If you'd like to test this out, go to an Internet terminal or even
webworld and type in the string above. If this is the first time
you've heard of NCR and you are a linguist, you'll probably be
amazed at what you can find on the Internet using NCR strings.
There are many things that normally don't show up in your other
types of queries.
(S) Making things even better, this NCR form is absolutely beautiful
for selection and filtering or searching. What's more, since this
form uses nothing but 7-bit characters, it can go through systems
that would destroy 8-bit traffic. This means that if you know what you
are doing, you can 'recover' a perfect email message from a
session that appears to have been totally lost. Ever seen a piece
of email where the entire foreign language portion was nothing but
question marks? Chances are that same email had an html double
with it containing the NCR. Old Raul used to use this magic when
he was in the Office of Russia to recover "totally destroyed" email.
When Raul left, so did the technique. Жаль!
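The recovery idea can be sketched with Python's standard email package. Everything here is illustrative: the message is built by hand to mimic a text part reduced to question marks alongside an intact HTML double, and recover_from_html is a hypothetical helper name. The tag stripping is deliberately crude.

```python
import html
import re
from email.message import EmailMessage

# Hypothetical message: the text part is destroyed, the HTML part is intact.
msg = EmailMessage()
msg.set_content("????")
msg.add_alternative(
    "<html><body>&#1090;&#1072;&#1085;&#1082;</body></html>",
    subtype="html",
)

def recover_from_html(message):
    """Return the readable text of the HTML alternative, or None if absent."""
    part = message.get_body(preferencelist=("html",))
    if part is None:
        return None
    markup = part.get_content()
    # Crude tag removal for the sketch; real HTML deserves a proper parser.
    text = re.sub(r"<[^>]+>", "", markup)
    # html.unescape turns the NCRs back into the original characters.
    return html.unescape(text).strip()

print(recover_from_html(msg))
```

Because the HTML part contains only 7-bit characters, it survives handling that would have turned the 8-bit text part into question marks.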
(U) So, Tracy, there you go. It's usually the simple little things that
bite us in the rear. Oh, if you want a little smile, cut and paste the
text below, which includes your sample, into a text editor. Save it
as showme.htm and then open it in your web browser.
Enjoy!
Raul
<html>
<font size=+4>
&#68;&#111;&#104;&#33;
</font>
</html>
"(U//FOUO) SIDtoday articles may not be republished or reposted outside NSANet
without the consent of S0121 (DL sid_comms)."
DERIVED FROM: NSA/CSSM 1-52, DATED 08 JAN 2007 DECLASSIFY ON: 20320108