When it comes to the National Security Agency’s recently disclosed use of automated speech recognition technology to search, index and transcribe voice communications, people in the United States may well be asking: But are they transcribing my phone calls?
The answer is maybe.
A clear-cut answer is elusive because documents in the Snowden archive describe the capability to turn speech into text, but not the extent of its use — and the U.S. intelligence community refuses to answer even the most basic questions on the topic.
Asked about the application of speech-to-text to conversations including Americans, Robert Litt, general counsel for the Office of the Director of National Intelligence, said at a Capitol Hill event in May that the NSA has “all sorts of technical capabilities” and that they are all used in a lawful manner.
“I’m not specifically acknowledging or denying the existence of any particular capability,” he said. “I’m only saying that the focus needs to be on what are the authorities the NSA is using, and what are the protections around the execution of those authorities?”
So what are those authorities? And what are the protections around their execution?
Litt wouldn’t say. But thanks to previous explorations of the Snowden archive and some documents released by the Obama administration, we know there are four major methods the NSA uses to get access to phone calls involving Americans — and only one of them technically precludes the use of speech recognition.
The only surveillance program we know does not involve speech-to-text processing is the bulk collection of metadata of domestic phone calls, commonly known as 215 collection, after the section of the Patriot Act that the NSA says makes it legal.
U.S. officials have unequivocally denied that they get any access to any content through the 215 program — information about the calls, yes, but no actual calls. So no voice means no transcripts.
As it happens, that’s also the one program that Congress has decided to eliminate in its current form. But many other far more invasive programs, many of which sweep up American content including voice communications, remain untouched.
Voice communications have been and continue to be widely intercepted and collected under both programs the NSA considers authorized by section 702 of the Foreign Intelligence Surveillance Act.
The two 702 programs are called PRISM and Upstream.
PRISM collects hundreds of millions of Internet communications (text and video, as well as voice) of “targeted individuals” from providers such as Facebook, Yahoo and Skype. But plenty of ordinary people speak to those “targets.” Those “targets” are not necessarily targeted for good reason. And the system picks up additional “incidental” communications it wasn’t technically looking for.
As a result, an unknown but not inconsiderable amount of voice communications involving ordinary Americans gets swept up by PRISM and dumped into massive, centralized databases that are widely accessible to U.S. law enforcement agencies.
There is nothing in the Snowden archive that indicates whether or not the NSA applies what it calls “human language technologies” to these huge troves of voice data, to index, tag, transcribe and/or search them by keyword. And the NSA will not say. But there’s no technological impediment, given the huge leaps forward in automatic speech recognition.
The 702 program called Upstream takes all sorts of communications, including voice, straight from the major U.S. Internet backbones run by telecommunication companies such as AT&T and Verizon.
Here, text is handled differently from telephone calls. Data in text form is searched in bulk using “selectors” such as email addresses, IP addresses and unique IDs. Any “transaction” (some are big, and some are small) that contains a “selector” is moved to an NSA database for further examination.
By contrast, according to the Privacy and Civil Liberties Oversight Board (PCLOB) report on the 702 program, traditional phone calls are collected solely if they are to or from targeted foreign individuals.
But even so: In 2013, the one year we have hard numbers for, the NSA used Upstream to collect phone calls to and from 89,000 targets — many of those calls inevitably involving U.S. persons. And after collection, these calls are dumped into the NSA’s databases, where — although we don’t know for sure — they could be transcribed, indexed, keyword-searched and stored.
An otherwise unalarmist PCLOB report on the 702 program issued this quite striking warning on that topic: “Even though U.S. persons and persons located in the United States are subject to having their telephone conversations collected only when they communicate with a targeted foreigner located abroad, the program nevertheless gains access to numerous personal conversations of U.S. persons that were carried on under an expectation of privacy.”
The PCLOB report made no reference to automated transcription. When asked about the topic, board chairman David Medine told The Intercept that the report contained only information that the intelligence community agreed could be declassified.
Most analyses of Upstream also assume that the NSA considers VoIP calls — voice communications that travel across the Internet — as entitled to the same legal protection as the ever-diminishing number of calls that travel over the old-fashioned telephone circuits. If that’s not the case — and an NSA spokesperson declined to comment on that question — then many more American voice conversations could be subject to collection and processing.
Finally, there is the vast and essentially unconstrained collection of communications that the NSA intercepts abroad, citing its authority under Executive Order 12333. The scope and scale of those programs are massive; in some cases they involve the NSA collecting the voice communications of entire countries, hacking cell networks, breaking into private data links, and tapping phone and Internet backbones throughout the world.
All the specific examples of the application of speech-to-text processes described in the Snowden documents reviewed by The Intercept appear to have involved intercepts abroad. But surveillance anywhere in the world will inevitably pull in a great many voice conversations involving Americans who call, visit or work in the country being surveilled.
The NSA responded to our inquiries with boilerplate: “Regardless of the tool, analytic technique, or technology, NSA always applies rigorous protections designed to safeguard the privacy not only of U.S. persons, but also of foreigners abroad, as directed by the President in January 2014.”
In a now-public October 2011 opinion, former Foreign Intelligence Surveillance Court presiding Judge John Bates bitterly complained that for the third time in less than three years, the government had significantly misrepresented the scope of its collection to the secret court.
It wasn’t until 2011, for instance, that the court understood that the government wasn’t just looking at a small cross-section of Upstream data, but was searching through all of it for “selectors” — and was putting into its databases not just discrete, single “transactions” that contained a given selector, but potentially huge “multi-communication transactions.” As a result, the court finally realized, an incalculable number of purely domestic communications were ending up in the NSA’s databases.
Georgetown University Law Center professor Laura Donohue has written that either “the Court was particularly slow, the government had been lying, or the government had made a mistake.”
The Bates opinion explicitly recognized that there might be more epiphanies to come — “that further revelations concerning what NSA has actually acquired through its 702 collection, together with the constant evolution of the Internet, may alter the Court’s analysis at some point in the future.”
Robert Litt at a May 8, 2015, panel discussion on government surveillance and transparency.
Asked at a public event by The Intercept if anyone had ever explicitly advised the court that the NSA was using speech-to-text processing on voice intercepts that were collected by 702 programs, Litt replied: “The FISA court orders specifically dictate what we can do and what we can’t do in conducting collection under 702. You have seen those orders. You know what they say.”
He continued: “The orders also provide what kinds of processing we can do on them. We do what those orders authorize. If the orders authorize it, we’re allowed to do it. If they don’t, we’re not. And it doesn’t matter whether we would use this speech-to-text recognition tools or whether we use 800 monkeys sitting at typewriters.”
But none of the FISA court orders appear to say anything specific about processing. And the ability to turn massive amounts of voice into text raises intense privacy concerns because of the scale involved in the collection. Assigning an analyst (or even a single monkey) to listen in on every international phone call would be impossible. Automatically transcribing them and storing the text in a searchable database is not.
The NSA’s historic mission has been to spy on foreigners, not Americans. So all the surveillance methods mentioned above — with the notable exception of the domestic collection of phone records — come with a phalanx of rules intended to “minimize” the retention and dissemination of information related to U.S. persons.
The rules are explicit, and absolute, and the government’s argument is that they are sufficient to protect Americans’ constitutional rights.
And there is indeed evidence throughout the Snowden archive of how meticulously followed those minimization rules are supposed to be; how analysts are instructed to immediately throw out “U.S. persons” information the instant they recognize it shouldn’t be there; how careful they are supposed to be about providing access to unminimized intelligence gathered under FISA.
But even if you overlook the possibility of illicit searches — for loveint, to look for sexually explicit content, and so on — the NSA’s interpretations of the rules appear to be problematic, and their application inconsistent.
Georgetown Law’s Donohue, for instance, writes that the government “has created a presumption of non-U.S. person status” and “absent evidence to the contrary, assumes that the target is located outside the United States.”
The PCLOB report on 702 programs found that, as regards the rule quoted above, “in practice, this requirement rarely results in actual purging of data.”
A recently declassified report from the Justice Department’s inspector general found that authorities had failed to comply with basic minimization requirements regarding the 215 program — for eight years.
Glenn Greenwald reported in September 2013 that the NSA routinely shares raw, unminimized “Raw Sigint” with Israel, which “includes, but is not limited to, unevaluated and unminimized transcripts, gists, facsimiles, telex, voice and Digital Network Intelligence metadata and content.”
And of course there’s no external oversight.
Bob Litt says we shouldn’t worry. But neither he nor anyone else in a position to know will provide the facts that might — or might not — reassure us.
Research on the Snowden archive was conducted by Intercept researcher Andrew Fishman.
Photo illustration: Connie Yu; Photos: Getty Images (2)