The Computers are Listening

Speech Recognition is NSA’s Best-Kept Open Secret

Illustration: Richard Mia for The Intercept

Second in a series.

Siri can understand what you say. Google can take dictation. Even your new smart TV is taking verbal orders.

So is there any doubt the National Security Agency has the ability to translate spoken words into text?

But precisely when the NSA does it, with which calls, and how often, is a well-guarded secret.

It’s not surprising that the NSA isn’t talking about it. But oddly enough, neither is anyone else: Over the years, there’s been almost no public discussion of the NSA’s use of automated speech recognition.

One minor exception was in 1999, when a young Australian cryptographer named Julian Assange stumbled across an NSA patent that mentioned “machine transcribed speech.”

Assange, who went on to found WikiLeaks, said at the time: “This patent should worry people. Everyone’s overseas phone calls are or may soon be tapped, transcribed and archived in the bowels of an unaccountable foreign spy agency.”

The most comprehensive post-Snowden descriptions of NSA’s surveillance programs are strangely silent when it comes to speech recognition. The report from the President’s Review Group on Intelligence and Communications Technologies doesn’t mention it, and neither does the October 2011 FISA Court ruling, or the detailed reports from the Privacy and Civil Liberties Oversight Board.

There is some mention of speech recognition in the “Black Budget” submitted to Congress each year. But there’s no clear sign that anybody on the Hill has ever really noticed.

As The Intercept reported on Tuesday, items from the Snowden archive document the widespread use of automated speech recognition by the NSA.

The strategic advantage, invasive potential and policy implications of being able to turn spoken words into text are not trivial: Suddenly, voice conversations, historically considered ephemeral and unsearchable, can be scanned, catalogued and archived — not perfectly, but well enough to dramatically increase the effective scope of eavesdropping.

Former senior NSA executive turned whistleblower Thomas Drake, who’s seen NSA’s automated speech recognition at work, says the silence is telling.

“You’re seeing a black hole,” Drake told The Intercept. “That means there’s something there that’s really significant. You’re seeing some of the fuzzy contours of this whole other program.”

Not Technically a Secret

The NSA’s ability to turn voice into text, interestingly enough, is not technically a secret.

And speech recognition technology has been heavily — and openly — funded by the Defense Advanced Research Projects Agency (DARPA) since the early 1970s.

The latest of DARPA’s many public research projects in that area is the Robust Automatic Transcription of Speech program, known as RATS, which focuses on “noisy or degraded speech signals that are important to military intelligence.”

Meanwhile, DARPA’s intelligence-world counterpart, IARPA, announced the Babel Program in 2011, with its goal of “developing agile and robust speech recognition technology that can be rapidly applied to any human language in order to provide effective search capability for analysts to efficiently process massive amounts of real-world recorded speech.”

Despite openly announcing its speech-to-text program, IARPA declined an interview request from The Intercept.

Robert Litt, who as general counsel for the Office of the Director of National Intelligence is the intelligence community’s chief lawyer, was asked about the NSA’s speech-to-text capabilities at a forum on transparency on Capitol Hill on Friday.

He took the opportunity to lash out at The Intercept’s reporting: “I think that story is a great example of what is wrong with a lot of media coverage of this,” he said. “That story made absolutely no distinction between technical capabilities and legal authorities. There are all sorts of technical capabilities that NSA has. I’m not commenting on the existence or nonexistence of any such authority. The question is when are they used and what are the legal authorities under which they are used. And I think that that’s something that a lot of the press reporting completely ignores, including that story you wrote.”

Asked to explain in what ways the use of speech-to-text is limited, Litt repeatedly refused to even acknowledge its existence.

“I’m not saying that the government isn’t using these techniques. I am not acknowledging that these techniques exist even.”

You won’t hear much about the use of speech recognition for surveillance in academe, either.

Researchers in the field are divided between those who don’t take NSA funding, and can only speculate about what goes on over there — and those who do take NSA funding, but won’t say what they know.

“There’s a lot of weird hush-hush that goes on,” said Bhiksha Raj, an associate professor at Carnegie Mellon University’s Language Technologies Institute, who said he does not receive NSA funding. “Academics who work for the NSA must go through various clearances. They sign several papers. They hold closed meetings that are only attended by people with clearances.”

Some non-NSA-affiliated academics were once “quite keen” on seeing how the NSA was faring in the face of the technical challenges in the field, Steve Young, a professor of information engineering at the University of Cambridge, recalled. “But unless you actually work for the NSA and you’ve been vetted, you’re not going to get close to the real data.”

Ironically, even GCHQ, NSA’s intelligence partner in the U.K., has complained about DARPA and NSA’s secrecy. A 2009 GCHQ assessment of speech-to-text technology said that “The DARPA evaluation programme, with significant steer from NSA, has been the main driving force behind technology improvements in the field. Unfortunately, the results of the evaluations are not put in the public domain, making reference difficult.”

All the secrecy has an obvious advantage for the NSA. If the NSA can keep its speech-recognition capabilities secret, nobody can tell it what to do. And if nobody knows what it is doing, then nobody can tell it to stop.

Senator Ron Wyden, D-Ore., arguably the foremost congressional critic of NSA overreach, wouldn’t comment directly on the question of speech recognition. But, he said through a spokesperson: “After 14 years on the Intelligence Committee, I’ve learned that senators must be constantly on the lookout for secret interpretations of the law and advances in surveillance that Congress isn’t aware of.”

He added: “For centuries, individual privacy was protected in part by the limited resources of governments. It simply wasn’t possible for governments to secretly collect information on every single citizen without investing in massive networks of spies and informants. But in the 21st century mass surveillance is no longer difficult and expensive — it’s increasingly cheap and easy. The only privacy protections that will matter in the future are the ones that are written into law and defended by public demand for freedom and openness.”

Research on the Snowden archive was conducted by Intercept researcher Andrew Fishman.

