As he helped plan massive demonstrations in Chicago to protest the 2012 NATO summit, Matt McLoughlin knew he was up against a formidable police force. An organizer with Chicago’s Occupy movement, he had watched as the city spent millions beefing up security. The Chicago Police Department invested in riot gear. It rolled out a controversial Long Range Acoustic Device, a sonic weapon that emits a piercing chirping sound. Police rounded up protesters who demonstrated against defunding mental health clinics. Then, shortly before the May summit began, authorities arrested three people in McLoughlin’s circle, and he discovered that the group had been infiltrated by undercover cops.
It turns out that there was more: Police also weaponized social media.
As tens of thousands of people flooded into Chicago’s streets, carrying signs that read “FOOD NOT BOMBS” and “NO TO WAR AND AUSTERITY,” McLoughlin was one of several organizers who posted updates on Occupy Chicago’s two Twitter accounts, sharing information on planned march routes and where protesters could find food and lodging. Newly discovered documents show that many of his tweets likely ended up flowing through CIA-funded data analytics software accessed by police.
According to the documents and to video presentations, CPD used a tool called Endeca Information Discovery, a product from tech giant Oracle, to merge crime records, 911 calls, and other routine police information with protesters’ tweets.
Oracle claims that Endeca helps police and other agencies make sense of mounds of big data. Like the more well-known government analytics software Palantir Gotham, the software owes its rise to “war on terror” surveillance and to backing from the CIA venture capital firm In-Q-Tel. Oracle acquired Endeca in 2011.
At the peak of the NATO protests, police reportedly used Endeca to process 20,000 tweets an hour. According to a 2012 talk given by Richard Tomlinson, who directed Endeca product management for Oracle, the tweets showed up in the software half a second after they were posted and remained there indefinitely, even if deleted. Police could then use the software to zero in on tweets that contained terms like “protest.” They could also sort tweets by sentiment, meaning that the software would single out for scrutiny negative or angry-seeming tweets.
Nine years later, police mining of social media is widespread — as is opposition to the practice. In addition to CPD, Oracle documents and SlideShare decks posted by former employees say that Endeca has been used by police in Argentina, Finland, and the United Arab Emirates, along with the California Department of Corrections and Rehabilitation.
Oracle is far from the only company in the market. During last summer’s George Floyd protests, police turned to Dataminr, another In-Q-Tel investment, to analyze demonstrators’ tweets. CPD, which works with the Federal Bureau of Investigation on a special task force charged with monitoring social media, has also used social media mining software made by Geofeedia, LexisNexis, and Pathar. Police across the United States have plugged images culled from social media into Clearview AI’s facial recognition engine.
But Oracle’s case has a twist: After promoting Endeca’s use on NATO protesters, Oracle went on to market the CIA-funded software for police use around the world — including in China, where its deployment would presumably be at odds with CIA interests and where social media users have few civil liberties protections to shield them from police abuses.
The Intercept previously reported on dozens of company documents showing that Oracle employees marketed the company’s analytics software for police surveillance projects in China and other repressive regimes. In a recent House Foreign Affairs Committee hearing on China, Rep. Tom Malinowski, D-N.J., cited Oracle as an example of how U.S. companies enable surveillance overseas. In fact, several of the products that Oracle markets to police abroad were first tested in the United States.
Among the products that Oracle pushed in the China documents was Endeca, which allows police to both visualize data and mine social media. The documents describe the software’s use by Chicago police as a pioneering event that paved the way for police adoption elsewhere.
In Chinese-language slide decks, Oracle employees touted Endeca’s use in the NATO protests and by the U.S. government to argue for the software’s adoption by Chinese authorities. The documents mention specific Chinese government policing projects and data sources, but several slides appear to have been taken from decks for an American audience, with Chinese translations superimposed on blocks of English text. Two slide decks even include CPD’s logo. One touts the use of other Oracle products by the U.S. Department of Homeland Security.
The documents that specifically market Endeca for police use in China date from 2012 to 2014. But later Chinese-language documents promote Big Data Discovery, an Oracle product that incorporates full Endeca Information Discovery functionality. A 2018 slide deck presented by an Oracle engineer at a developer conference in California describes the use of Big Data Discovery, among other software, by the Liaoning province public security bureau. One Oracle reseller with close Chinese government ties continues to sell Endeca, according to a recent listing on Oracle’s site.
Some of the marketing materials are the apparent work of a former senior employee who, while working for Oracle, held a visiting researcher position at China’s leading police academy. That man is described in documents as having led a team of people who marketed Oracle’s products to police and governments around the world. Oracle declined to comment for this article.
Oracle previously denied directly selling software to Chinese police for the explicit purpose of combing through citizens’ data. But it confirmed that the slide decks, which were hosted on its website in February, are genuine. Several of the documents have since been taken down. (The Intercept has linked to archived versions when possible.)
“Oracle wants to take the money from supplying tools to intelligence agencies, but then they want to argue that they have no culpability for the use,” said Jack Poulson, executive director of the accountability nonprofit Tech Inquiry, who compared the company’s China efforts to Google’s cloud sales to U.S. Customs and Border Protection, Israel, and Saudi Arabia. “Obviously a company would like to take all of the money they can with as little responsibility as possible. But why are we letting them do that?”
Endeca was founded in 1999 in Cambridge, Massachusetts, within a tech corridor that is sometimes compared to Silicon Valley. The company’s software excelled at analysis of unstructured data, raw information that isn’t organized into a database. Its launch was well-timed; the world soon entered the big data era. E-commerce search was an early application for the technology, and businesses have remained important customers. Endeca, whose name derives from a German word meaning “discover,” got an additional boost from the “war on terror,” which brought an infusion of money for increasing intelligence agencies’ surveillance powers.
In-Q-Tel announced a strategic partnership with Endeca in 2003, the same year that Palantir was founded. “We see significant opportunities to extend and apply their commercial technology in the pursuit of enhanced national security,” In-Q-Tel CEO Gilman Louie said in a press release at the time. According to Endeca, the deal entailed an investment of an undisclosed amount. Two years later, the CIA-funded venture capital firm announced a deal to introduce Endeca to the Defense Intelligence Agency.
Endeca became one of several tools used by government agencies to visualize different pools of data, for example, by plotting data entries on a searchable map. “This type of interface became the norm, even if Endeca didn’t become a household name,” said Poulson. “It’s something that the U.S. intelligence community has continued to invest in.”
An early success story for In-Q-Tel, Endeca eventually gained clients at the FBI, U.S. Cyber Command, and the Army, according to federal spending records. The software has even been used at Guantánamo Bay, the U.S. military base in Cuba where prisoners captured in Afghanistan and elsewhere have been held and tortured. In 2008, the Joint Task Force Guantanamo, a U.S. military unit housed at the base, contracted with a technology firm for “Endeca maintenance,” according to spending records. The unit paid for Endeca support until at least 2019. Some of the transactions are coded Operation Enduring Freedom, the official name used by the U.S. government for the war in Afghanistan. Past job listings for developers at Guantánamo Bay describe “Endeca/Oracle Guided Search” as a requirement.
As Endeca gained intelligence and defense clients in the early 2000s, Oracle was building its base of police clients. The company is known for its relational databases, used to crunch large volumes of data. Publicly available contracts show that one major project was CPD’s Citizen Law Enforcement Analysis and Reporting, or CLEAR, an Oracle data warehouse that today is at the center of a campaign by activists to erase discriminatory data on immigrants and people of color collected by law enforcement agencies.
Along with millions of arrest records, CLEAR houses data on purported gang members that activists say has been used by police to discriminate against people. “It’s racist,” said Xanat Sobrevilla, an organizer with Organized Communities Against Deportations in Chicago, one of the groups behind the campaign Erase the Database. “Ninety-five percent of the names are Black and Latinx. It doesn’t serve our purpose of reducing crime and keeping us safe. It just serves the purpose of being able to justify more policing and brutality.” Erase the Database has documented children as young as 9 who were listed in the database.
By 2011, when Oracle acquired Endeca for $1.1 billion, U.S. police departments were showing interest in data-driven policing, in which surveillance feeds and personal data are used to identify potential criminals or crime hot spots. Oracle claims to house the world’s largest collection of third-party personal data, and it became one of several large tech companies to venture into the fraught field. “What the Oracles of the world are doing is trying to make a play to be the platform of policing,” said Andrew Guthrie Ferguson, a law professor at American University and author of “The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement.” “The calculation is, ‘We have all this personal data. Police need personal data. We know how to do data analytics, and we’re doing it in the corporate realm. What if we move into the government space? Wouldn’t that be great?’”
Because large tech companies like Oracle offer a broad variety of technologies, their forays into policing are harder to track than those of more niche companies like Palantir. But their work is no less concerning, said Ferguson. “It should raise red flags when big tech enters the policing space because it takes two forms of unreflective power — big tech and policing — and puts them together.”
In Chicago, Oracle’s acquisition of Endeca came amid police paranoia about the Occupy movement. CPD appears to be one of the first departments to use the software to mine social media. An Oracle white paper calls Endeca’s adoption by CPD a “proof-of-concept” exercise and claims that Endeca helped police zero in on threats during the protests. By “melding Social Media [sic] information with departmental 911 and force deployment information,” the white paper says, “officers could be rapidly redeployed to preemptively address evolving threats.”
As the protests got underway, Tomlinson, the former Oracle employee, tweeted updates. On May 20, 2012, the same day that police drew scrutiny for striking demonstrators with batons, he tweeted that Endeca was processing 8,000 tweets an hour. One day later, he said that traffic had gone up to 20,000 tweets an hour.
A few months later, Tomlinson presented his takeaways. While appearing to click through actual data from the protests, he showed how police could use Endeca to conduct targeted searches, drill down to identify users by location, and draw on information from 15,000 incidents ingested from the National Counterterrorism Center. The software allowed police to sort posts by clout score, filter for “negative” sentiment, and zero in on specific user accounts, he said, noting, “You can start to search through the data based on anything that you might be interested in, such as the word ‘protests.’”
Tomlinson did not respond to requests for comment.
In another talk, also given in 2012, Tomlinson said that, in addition to mining social media posts, CPD could use Endeca to draw on years’ worth of geotagged crime incident reports and arrest notes. One dataset contained 4.8 million incidents in Chicago. Tomlinson showed how, in a few steps, police could home in on just 12 incidents of interest. Palantir has made similar claims about its software.
An Oracle slide deck prepared for a Chinese audience claims that CPD also used Endeca and other software to predict crimes within a three-day window, based on weather, the number of emergency calls, and arrest figures, among other data.
Scholars say that such calculations simply reinforce existing police biases. “It becomes a feedback loop, where the data tells the police to continue patrolling and stopping and surveilling the same people and the same neighborhoods that they’ve always been patrolling and stopping and surveilling,” says Andy Clarno, who coordinates the University of Illinois at Chicago’s Policing in Chicago Research Group and assists with Erase the Database. “And yet it appears as if it’s somehow objective and based in math, rather than based in a long history of racist practices.”
Oracle describes Endeca as an almost magical tool that could lead to investigative breakthroughs. One Oracle document says that Endeca helps address gaps in intelligence — what former Defense Secretary Donald Rumsfeld famously called the “unknown unknowns.” “Endeca Information Discovery is particularly powerful to address the pain of ‘we do not know what we do not know,’” the document claims. In one talk, Tomlinson describes the platform as a way for police to sniff around for suspicious activity, without any suspect or incident in mind. “They don’t know what questions they have up front, what they want to ask of the data — so they just start hitting search, they just start hitting guide nav, start drilling, and things pop up.”
That’s an alarming claim, say civil liberties advocates. “Sifting through data and social media posts in hopes of finding ‘unknown unknowns’ sounds more like dragnet surveillance than a targeted and responsible investigatory tactic,” said Matthew Guariglia, a policy analyst with the Electronic Frontier Foundation who focuses on surveillance and policing. “It’s irresponsible and careless from a civil liberties perspective to collect up a vast amount of data in hopes that something in the pile will be a precisely ‘unknown’ line of investigation.”
It is not clear how long Chicago police continued to use Endeca or whether they still use it today. CPD did not respond to a request for comment for this story.
As anger over police killings mounts, several tech companies have tried to downplay their work with U.S. police; Oracle openly boasts about it. “We proudly partner with law enforcement,” Oracle executive vice president Ken Glueck told The Intercept in an April interview, adding that in some cases Oracle takes those same products overseas. “It shouldn’t surprise anybody that if you’ve done something successful in one place that you’re going to market that success story elsewhere.”
Glueck and others have tried to distance the company from reports of collaboration with authorities in China. There, Oracle’s marketing contradicts the image that executives have tried to cultivate as patriotic servants of the U.S. government. (Last year, that image helped Oracle and Walmart edge out rivals in a bid to control U.S. operations for the Chinese-owned social media company TikTok, a deal that is now in limbo.) And yet documents show that after testing Endeca on the NATO protesters, the tech giant also marketed it for police use in China.
In China, one slide deck marked “CONFIDENTIAL” promotes Endeca for use in a provincial “police cloud” project, an effort to centrally house police data that has serious implications for human rights. The document recommends Endeca as one of several Oracle products well-suited to behavior and language analysis and to a “real-time warning” system. A second deck intended for a Chinese government audience shows social media and other data feeding into Endeca and other Oracle software.
Several presentation documents mention Siebel, another Oracle product extensively used by U.S. government and defense clients. They also tout the use of Oracle or Oracle partner technology for speech, facial, and license plate recognition and analysis of telephone metadata.
In April, The Intercept reported that Oracle works with resellers with extensive ties to the Chinese government. One such Oracle partner, Great Wall, names Endeca in a recent list of its offerings on Oracle’s site. Another close Oracle partner, Digital China, sells Siebel.
Oracle CEO Safra Catz sits on the National Security Commission on Artificial Intelligence, a Defense Department-backed initiative aimed at maintaining U.S. dominance. In an April panel, she called the Chinese government’s use of AI for surveillance “antithetical to our values.”
Endeca’s path starkly contradicts that claim. Like other big tech companies before it, Oracle has exported invasive tools first tested in the United States. The Chinese system is simply what happens when there are few checks on surveillance, a situation that serves profit-driven companies just fine.
“At the end of the day, it’s really disappointing and alarming that so much energy was spent on surveilling protesters,” said McLoughlin, the Occupy organizer. “It still boggles my mind.”