Snowden Archive
——
The SIDtoday
Files
Browse the Archive

How Do You Fix a Damaged PDF?

SUMMARY

SIDtoday readers give feedback in response to the "Ask Raul" column on repairing damaged PDFs, explaining their possible fixes.

DOCUMENT’S DATE

Jan 12, 2005

PUBLICLY AVAILABLE

Feb 05, 2018

1/1
Download
Page 1 from How Do You Fix a Damaged PDF?
DYNAMIC PAGE -- HIGHEST POSSIBLE CLASSIFICATION IS TOP SECRET // SI / TK // REL TO USA AUS CAN GBR NZL (U) How Do You Fix a Damaged PDF? FROM: SIGINT Communications Unknown Run Date: 01/12/2005 (U) Last month, SID today 's DNI columnist Raul posted an article on repairing damaged PDF files . We heard from a few analysts via the feedback mechanism that is available at the end of every article on SID today . What follows is a sampling of that feedback. (U//FOUO) There is a program that tries to "repair" a PDF file. It was published as an Informal Technical Note (ITN) S3T/TECH/11/2004 - S-249, 497 entitled "Repairing Broken PDF Files." (U)...Another resource for extracting data has been CSE. I have obtained Perl routines that have enabled me to extract information from PDF files using the PDF structure. (U//FOUO)...A comment is made in this article that NSA does not have a capability against damaged PDF files. Actually, on the DESERTFOX (go DESERTFOX) team, we have a program called KEYSTONE that contains a damaged-PDF converter...This software only handles PDF files compressed using flate. Since flate is found in the vast majority of PDF files that we see, this has been a good solution for us for the past year. Sadly, it does not handle foreign character sets due to the tremendous amount of work required to pull out the (potentially damaged) code pages. The damaged-PDF converter has been ported to SORCERER and could even be provided as a stand-alone program. I sincerely hope that some effort was made to locate a program such as KEYSTONE before making the overarching statements, "No consideration has been given..." and "We should be able to deal with broken or damaged PDFs with great ease." (S) Why not just use pdftotext with UTF-8 encoding specified for output? This handles the decompression and encoding normalization issues, from what I've seen...(NOTE: pdftotext is one of the many valuable command-line utilities that comes with the xpdf application.) "(U//FOUO) SIDtoday articles may not be republished or reposted outside NSANet without the consent of S0121 (DL sid_comms)." DYNAMIC PAGE -- HIGHEST POSSIBLE CLASSIFICATION IS TOP SECRET // SI / TK // REL TO USA AUS CAN GBR NZL DERIVED FROM: NSA/CSSM 1-52, DATED 08 JAN 2007 DECLASSIFY ON: 20320108