How We Prepared the NSA’s Sensitive Internal Reports for Release

A team of Intercept staffers subjected the documents to a careful process of research, editorial evaluation, legal review, reporting, redaction, and digital processing. Here are the details.

The 166 articles we’re releasing today from an internal NSA publication, SIDtoday, were subjected to a careful process of research, editorial evaluation, legal review, reporting, redaction, and digital processing. This article explains how that process worked.

After deciding it was in the public interest to release and write about large batches of documents supplied by whistleblower Edward Snowden, The Intercept settled on beginning with SIDtoday, the online newsletter of the NSA’s core spying division, the Signals Intelligence Directorate. Written in accessible, non-technical language, SIDtoday offers a window into the NSA’s culture and operations. Although much of its content is unclassified and often appears largely designed to burnish the agency’s self-image, SIDtoday also includes more revelatory sections that are officially “secret” and “top secret.”

Our initial goal was to release in one batch the first three months of SIDtoday, starting at the end of March 2003, as well as all 2003 installments of SIDtoday article series that began during this period. Before we could do that, we needed to build a pipeline that would allow us to securely handle these sensitive documents and prepare them for publication.

From Messy HTML to Clean PDF

In its original home at the NSA, SIDtoday is a website, but, unlike normal websites that are accessible from the internet, this one is only accessible from computers that are connected to an internal spy agency network. This means SIDtoday articles, as provided by Snowden, were in Hypertext Markup Language, or HTML, the native format of the web. To be fully readable, most webpages require HTML files as well as associated images, stylesheets, and other files, but unfortunately those other elements were not included in the Snowden archive. This meant the original webpages looked rather chaotic when viewed in a web browser. They also included broken links to the internal NSA network.

Staff technologist Micah Lee prepared software that parsed the original SIDtoday HTML files and extracted the usable content. This content was then placed into a template with minimalist design and converted to PDF format. PDF files, unlike HTML files, could be edited by multiple staffers using widely available, easy-to-use software tools with native redaction facilities.
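The extraction and templating steps described above can be illustrated with a minimal sketch. This is not the software Lee wrote, which is not shown in this article; it only assumes that the usable content is the visible text of each page, and it uses Python's standard library. The names `TextExtractor`, `extract_paragraphs`, `render_clean_html`, and `TEMPLATE` are hypothetical, chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores scripts, styles, and images."""

    SKIP = {"script", "style"}  # content of these tags is never shown to readers
    BLOCK = {"p", "div", "br", "h1", "h2", "h3", "li", "tr"}  # treated as paragraph breaks

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def paragraphs(self):
        # Split on the recorded breaks and collapse runs of whitespace.
        text = "".join(self._chunks)
        return [" ".join(line.split()) for line in text.split("\n") if line.strip()]


def extract_paragraphs(raw_html):
    """Return the visible text of a page as a list of paragraphs."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.paragraphs()


# A deliberately plain template (hypothetical, not the one used for the release).
TEMPLATE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body></html>"""


def render_clean_html(title, paragraphs):
    """Place extracted paragraphs into the minimalist template."""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return TEMPLATE.format(title=escape(title), body=body)
```

In a sketch like this, the cleaned HTML would then be handed to a separate HTML-to-PDF converter; that step is omitted here because the article does not say which tool was used.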

The PDF files don’t look like the original SIDtoday website, since they don’t use its layout or style elements, but they do contain the same text content. They also include the original clip art-style SIDtoday logo.

Research and Redaction

Once the files were formatted, Research Editor for Investigations Margot Williams combed through each article, doing several things at once: identifying names that we might need to redact, researching those individuals to see what was publicly known about them, summarizing the document, and looking for material that could be the basis of stories for The Intercept.

In keeping with The Intercept’s editorial standards, we redacted the names of covert agents and the names and contact information of government personnel who are neither high-ranking nor already publicly associated with the NSA. We also removed information if we believed its release could cause serious injury or death to innocent people.

After Williams finished her review, each article was reviewed again by Lee, by a senior editor, and by our legal counsel. Redacted articles were also shown to the NSA, which was offered a chance to respond, resulting in the redaction of two additional names.

Writing and Publishing

The Intercept’s examination of this first batch of SIDtoday content yielded an article on the NSA’s involvement with interrogations at Guantánamo Bay as well as a “highlights reel” summarizing insights gleaned from the material that struck us as most relevant for our readers. In addition, Senior Editor Peter Maass places SIDtoday in context with his story describing the publication, and co-founder Glenn Greenwald explains why we decided to begin making large batches of Snowden material available to the public. While we have focused on the parts of these documents we find most compelling, we anticipate readers will find other information of interest in the files we are sharing — which is why we decided to approach the release in the way we did.

A landing page dedicated to our SIDtoday releases is available. On this special section of The Intercept’s website, readers can download the files individually or all at once, navigate by tag or date, and view SIDtoday articles alongside a wealth of metadata, including summary, original publication date, citations pointing to prior publication of unredacted names, tags, and related links. This represents the first phase of an evolving project, which will eventually include additional features, such as a search capability, as well as a growing number of documents.
