New York startup Dataminr aggressively markets itself as a tool for public safety, giving institutions from local police to the Pentagon the ability to scan the entirety of Twitter using sophisticated machine-learning algorithms. But company insiders say their surveillance efforts were often nothing more than garden-variety racial profiling, powered not primarily by artificial intelligence but by a small army of human analysts conducting endless keyword searches.
In July, The Intercept reported that Dataminr, leveraging its status as an official “Twitter Partner,” surveilled the Black Lives Matter protests that surged across the country in the wake of the police killing of George Floyd. Dataminr’s services were initially designed to help hedge funds turn the first glimmers of breaking news on social media into market-beating trades, enabling something like a supercharged version of professional Twitter dashboard TweetDeck. They have since been adopted by media outlets, the military, police departments, and various other organizations seeking real-time alerts on chaos and strife.
Dataminr’s early backers included Twitter and the CIA, and it’s not hard to see why the startup looked so promising to investors. Modern American policing hungers for vast quantities of data — leads to chase and intelligence to aggregate — and the entirety of online social media is now considered fodder. In a 2019 pitch to the FBI, Dataminr said its goal was “to integrate all publicly available data signals to create the dominant information discovery platform.” In addition to the bureau, the company has entered test programs and contracts with local and state police forces across the country.
But despite promises of advanced crime-sniffing technology, conversations with four sources directly familiar with Dataminr’s work, who asked to remain anonymous because they were not permitted to speak to the press about their employment, suggest that the company has at times relied on prejudice-prone tropes and hunches to determine who, where, and what looks dangerous. Through First Alert, its app for public sector clients, Dataminr has offered a bespoke, scariest possible version of the web: a never-ending stream of notifications of imminent or breaking catastrophes to investigate. But First Alert’s streams were assembled in ways prone to racial bias, sources said, by teams of “Domain Experts” tasked with rounding up as many “threats” as possible. Hunting social media for danger and writing alerts for cops’ iPhones and laptop screens, these staffers brought their prejudices and preconceptions along with their expertise, and were pressed to search specific neighborhoods, streets, and even housing complexes for crime, sources said.
Hunting for “Possible Gang Members” on Twitter
One significant part of Dataminr’s work for police, the sources said, has been helping flag potential gang members. Police gang databases are typically poorly regulated and have become notorious vehicles for discriminatory policing, unjust sentencing, and the criminalization of children; they’re filled with the names of thousands and thousands of young people never actually accused of any crime. Dataminr sources who spoke to The Intercept didn’t know exactly how allegedly “gang-related” tweets and other social media posts flagged via Dataminr were ultimately used by the company’s police customers. But in recent years, social media monitoring has become an important way to fill gang databases.
As part of a broader effort to feed information about crime to police under the general rubric of public “threats,” Dataminr staffers attempted to flag potential violent gang activity without the aid of any special algorithms or fancy software, sources said; instead they pored over thousands and thousands of tweets, posts, and pictures, looking for armed individuals who appeared to be affiliated with a gang. It’s an approach that was neither an art nor a science and, according to experts in the field, is also a surefire way of putting vulnerable men and women of color under police scrutiny or worse.
“It wasn’t specific,” said one Dataminr source with direct knowledge of the company’s anti-gang work. “Anything that could be tangentially described as a [gang-related] threat” could get sucked into Dataminr’s platform.
With no formal training provided on how to identify or verify gang membership, Dataminr’s army of “Domain Experts” were essentially left to use their best judgment, or to defer to ex-cops on staff. If Dataminr analysts came across, say, a tweet depicting a man with a gun and some text that appeared to be gang-related, that could be enough to put the posting in a police-bound stream as containing a “possible gang member,” this source said, adding that there was little if any attempt to ever check whether such a weapon was legally possessed or obtained.
In practice, Dataminr’s anti-gang activity amounted to “white people, tasked with interpreting language from communities that we were not familiar with” coached by predominantly white former law enforcement officials who themselves “had no experience from these communities where gangs might be prevalent,” per a second source. “The only thing we were using to identify them was hashtags, possibly showing gang signs, and if there was any kind of weapon in the photo,” according to the first source. There was “no institutional definition of ‘potential gang member,’ it was open to interpretation.” All that really mattered, these sources say, was finding as much danger as possible, real or perceived, and transmitting it to the police.
In its written comments, Dataminr stated that “First Alert does not identify indicators of violent gang association or identify whether an event is a crime.” Asked whether the company acknowledges providing any gang-related alerts or comments to customers, McGee did not directly respond, saying only that “there is no alert topic for crime or gang-related events.” Dataminr did not respond to a question about the race of former law enforcement personnel it employs.
A Dataminr source said that there never appeared to be any minimum age on who was flagged as a potential gang affiliate: “I can definitely recall kids of school-age nature, late middle school to high school” being ingested into Dataminr’s streams. Unlike Dataminr’s work identifying emerging threats in Europe or the Middle East, the company’s counter-gang violence monitoring felt slapdash by comparison, two Dataminr sources said. “There’s a great deal of latitude in determining [gang membership], it wasn’t like other kind of content, it was far more nebulous,” said the first source, who added that Dataminr staff were at times concerned that the pictures they were flagging as evidence of violent gang affiliation could be mere adolescent tough-guy posturing, completely out of context, or simply dated: “We had no idea how old they were,” the source added. “People save [and repost] photos. It was completely open to interpretation.”
While any image depicting a “possible gang member” with a weapon would immediately be flagged and transmitted to the police, Dataminr employees, tasked with finding “threats” nationwide, worried about why some armed men were subject to software surveillance while others were not. “The majority of the focus stayed toward gangs that are historically black and Latino,” said one source. “More effort was put into inner-city Chicago gangs than the Three Percenters or things related to Aryan Brotherhood,” this source continued, adding that they recalled worried conversations with colleagues about why the company spent so much time finding online images of armed black and brown people — who may have owned or possessed such a weapon legally — but not white people with guns.
Two Dataminr sources directly familiar with these operations told The Intercept that although the company’s teams of Domain Experts were untrained and generally uninformed on the subject of American street gangs, the company employed ex-law enforcement agents as in-house “gang experts” to help scan social media.
Human Stereotypes Instead of Machine Intelligence
Although Dataminr has touted itself as an “AI” firm, two company sources told The Intercept this overstated matters, and that most of the actual monitoring at the company was done by humans scrolling, endlessly, through streams of tweets. “They kept saying ‘the algorithm’ was doing everything,” said a Dataminr source, but “it was actually mostly humans.” But this large staff of human analysts was still expected to deliver the superhuman output of an actual product based on some sort of “artificial intelligence” or sophisticated machine learning. Inadequate training combined with strong pressure to crank out content to meet internal quotas and impress police clientele dazzled by “artificial intelligence” presentations led to predictable problems, the two sources said. The company’s approach to crime fighting began to resemble “creating content in their heads that isn’t there,” said the second source, “thinking Dataminr can predict the future.”
As Dataminr can’t in fact predict crime before it occurs, these sources say that analysts often fell back on stereotyped assumptions, with the company going so far as to provide specific guidance to seek crime in certain areas, apparently on the assumption that those areas were rife with criminality. Neighborhoods with large communities of color, for example, were often singled out for social media surveillance in order to drum up more threat fodder for police.
“It was never targeted towards other areas in the city, it was poor, minority-populated areas,” explained one source. “Minneapolis was more focused on urban areas downtown, but weren’t focusing on Paisley Park — always ‘downtown areas,’ areas with projects.”
The two sources told The Intercept that Dataminr had at times asked analysts to create information feeds specific to certain housing projects populated predominantly by people of color, seeming to contradict the company’s 2016 claim that it does not provide any form of “geospatial analysis.” “Any sort of housing project, bad neighborhood, bad intersection, we would definitely put those in the streams,” explained one source. “Any sort of assumed place that was dangerous. It was up to the Domain Experts. It was just trial and error to see what [keywords] brought things up. Dataminr obviously didn’t care about unconscious bias, they just wanted to get the crimes before anyone else.”
Two Dataminr sources familiar with the company’s Twitter search methodology explained that although Dataminr isn’t able to provide its clients with direct access to the locational coordinates sometimes included in tweet metadata, the company itself still uses location metadata embedded in tweets, and is able to provide workarounds when asked, offering de facto geospatial analysis. At times this was accomplished using a simple keyword search through the company’s access to the Twitter “firehose,” a data stream containing every public tweet from the moment it’s published. Keyword-based trawling would immediately alert Dataminr anytime anyone tweeted publicly about a particular place. “Any time that Malcolm X Boulevard was mentioned, we would be able to see it” in a given city, explained one source by way of a hypothetical.
Dataminr wrote in its statement to The Intercept that “First Alert identifies breaking news events without any regard to the racial or ethnic composition of an area where a breaking news event occurs. … Race, ethnicity, or any other demographic characteristic of the people posting public social media posts about events is never part of determining whether a breaking news alert is sent to First Alert clients.” It also said that “First Alert does not enable any type of geospatial analysis. First Alert provides no feature or function that allows a user to analyze the locations of specific social media posts, social media users or plot social media posts on a map.”
Asked if Dataminr domain experts look for social media leads specific to certain geographic areas, McGee did not deny that they do, writing only, “Dataminr detects events across the entire world wherever they geographically occur.”
On other occasions, according to one source, Dataminr used a “pseudo-predictive algorithm” that scrapes a user’s past tweets for clues about their location, though the source emphasized that this tool functioned with “not necessarily any degree of accuracy.” This allows Dataminr to build, for example, bespoke in-house surveillance streams of potential “threats” pegged to areas police wish to monitor (for instance, if a police department wanted more alerts about threatening tweets from or about Malcolm X Boulevard, or a public housing complex). These sources stressed that Dataminr would try to provide these customized “threat” feeds whenever asked by police clients, even as staff worried it amounted to blatant racial profiling and the propagation of law enforcement biases about where crimes were likely to be committed.
Dataminr told The Intercept in response that “First Alert provides no custom solutions for any government organizations, and the same First Alert product is used by all government organizations. All First Alert customers have access to the same breaking news alerts.”
Even if public sector customers use the same version of the First Alert app, the company itself has indicated that the alerts provided to customers could be customized: Its 2019 presentation to the FBI includes a slide stating that clients can adjust “user-defined criteria” like “topic selection” and “geographic filters” prior to “alert delivery.” Shown the below slide from the presentation, Dataminr said it was consistent with its statement.
The specially crafted searches focused on areas of interest to police were done “mainly looking for criminal incidents in those areas,” one source explained. When asked by police departments to find criminality on social media, “areas that were predominantly considered more white” were routinely overlooked, while poorer neighborhoods of color were mined for crime content.
Another source told The Intercept of an internal project they were placed on as part of a trial relationship with the city government of Chicago, for which they were instructed to scan Twitter for “Entertainment news from the North Side, crime news from the South Side.” (It is not clear if these instructions came from the city of Chicago; the Chicago Police Department did not respond to a request for comment.)
This source explained that in its efforts to live up to its self-created image as an engine of bleeding-edge “intelligence” about breaking events, “Dataminr is in a lot of ways regurgitating whatever the Domain Experts believe people want to see or hear” — those people in this case being the police. This can foster a sort of feedback loop of racial prejudice: stereotyped assumptions of what sort of keyword searches and locales might yield evidence of criminality are then used to bolster the stereotyped assumptions of American police. “In a way, Dataminr and law enforcement were perpetuating each other’s biases,” the source said, forming a sort of Twitter-based perpetual motion machine of racial confirmation bias: “We would make keyword-based streams [for police] with biased keywords, then law enforcement would tweet about the crimes, then we would pick up those tweets.”
Experts Alarmed by Techniques
Experts on criminal justice, gang violence, and social media approached for this story expressed concern that Dataminr’s surveillance services have carried racially prejudiced policing methods onto the internet. “I thought there was enough info out there to tell people to not do this,” Desmond Patton, a professor and researcher on gang violence and the web at Columbia University’s School of Social Work, told The Intercept. Social media surveillance-based counter-gang efforts routinely miss any degree of nuance or private meaning, explained Patton, instead relying on the often racist presumption that “if something looks a certain way it must mean something,” an approach that attempts “no contextual understanding of how emoji are used, how hashtags are used, [which] misses whole swaths of deep trauma and pain” in policed communities.
Babe Howell, a professor at CUNY School of Law and a criminal justice scholar, shared this concern over context-flattening Twitter surveillance and the lopsided assessment of who looks dangerous. “Most adolescents experiment with different kinds of personalities,” said Howell, explaining that using “the artistic expression, the musical expression, the posturing and bragging and representations of masculinities in marginalized communities” as a proxy for possible criminality is far worse than useless. “For better or worse we have the right to bear arms, and using photos including images of weapons to collect information about people based on speech and associations just imposes one wrong on the next and two wrongs do not make a right.”
Howell said the potential damage caused by labeling someone a “possible gang member,” whether in a formal database or not, is very real. Labeling someone as gang-affiliated leads to what Howell described as “two systems of justice that are separate and unequal,” because “if someone is accused of being a gang member on the street they will be policed with heightened levels of tension, often resulting in excessive force. In the criminal justice system they’ll be denied bail, speedy trial rights, typical due process rights, because they’re seen as more of a threat. Gang allegations carry this level of prejudicial bad character evidence that would not normally be admissible.”
All of this reflects crises of American overpolicing that far predate computers, let alone Twitter. But systematized social media surveillance will only accelerate these inequities, said Ángel Díaz, a lawyer and researcher at the Brennan Center for Justice. “Communities of color use social media in ways that are readily misunderstood by outsiders,” explained Díaz. “People also digitally brand themselves in ways that can be disconnected from reality. Online puffery about gang affiliation can be done for a variety of reasons, from chasing notoriety to deterring real-world violence. For example, a person might take photos with a borrowed gun and later post them to social media over the course of a week to create a fake persona and intimidate rivals.” Similarly fraught was Dataminr’s practice of homing in on certain geographical areas: “Geo-fencing around poor neighborhoods and communities of color only aggravates this potential by selectively looking for suspicious behavior in places they’re least equipped to understand.”
Of course, both Twitter and Dataminr vehemently maintain that the service they offer — monitoring many different social networks simultaneously for any information that might be of interest to police, including protests — does not constitute surveillance, pointing to Twitter’s strict prohibitions against surveillance by partners. “First Alert does not provide any government customers with the ability to target, monitor or profile social media users, perform geospatial, link or network analysis, or conduct any form of surveillance,” Dataminr wrote to The Intercept.
But it’s difficult to wrap one’s head around these denials, given that Twitter’s anti-surveillance policy reads like a dry, technical description of exactly what Dataminr is said to have engaged in. Twitter’s developer terms of service — which govern the use of the firehose — expressly prohibit using tweets for “conducting or providing surveillance or gathering intelligence,” and order developers to “Never derive or infer, or store derived or inferred, information about a Twitter user’s … [a]lleged or actual commission of a crime.”
Twitter spokesperson Lindsay McCallum declined to answer any questions about Dataminr’s surveillance practices, but stated “Twitter prohibits the use of our developer services for surveillance purposes. Period.” McCallum added that Twitter has “done extensive auditing of Dataminr’s tools, including First Alert, and have not seen any evidence that they’re in violation of our policies,” but declined to discuss this audit on the record.
“Twitter’s policy does not line up with its actions,” according to Díaz. “Dataminr is clearly using the Twitter API to conduct surveillance on behalf of police departments, and passing along what it finds in the form of ‘news alerts.’ This is a distinction without difference. Conducting searches of Twitter for leads about potential gang activity, much like its monitoring of Black Lives Matter protests, is surveillance. Having Dataminr analysts run searches and summarize their findings before passing it along to police doesn’t change this reality.”
Dataminr’s use of the Twitter firehose to infer gang affiliation is “totally terrifying,” said Forrest Stuart, a sociologist and head of the Stanford Ethnography Lab, who explained that even for an academic specialist with a career of research and field work spent understanding the way communities express themselves on social media, grasping the intricacies of someone else’s self-expression can be fraught. “There are neighborhoods that are less than a mile away from the neighborhoods where I have intimate knowledge, where if I open up their Twitter accounts, I trust myself to get a pretty decent sense of what their hashtags and their phrases mean,” Stuart said. “But I know that I am still inaccurate because I’m not there in that community. So, if I am concerned, as a researcher who specializes in this stuff … then you can imagine my concern and hearing that police officers are using this.”
Stuart added that “research has long shown that police officers really lack the kind of cultural competencies and knowledge that’s required for understanding the kinds of behavioral and discursive practices, aesthetic practices, taken up by urban black and brown youth,” but that “here in this Dataminr example, you’re not talking about cops, you’re now talking about private individuals [who] lack the even basic knowledge that officers are coming from, some knowledge of criminal behavior or some knowledge of gang behavior.”
Stuart believes Twitter owes its over 100 million active users, at the very least, a warning that their tweets might become fodder for a semi-automated crime dragnet, explaining that he himself uses the Twitter firehose for his ethnographic research, but had to first consent to a substantial data usage agreement aimed at minimizing harm to the people whose tweets he might study — guidelines that Dataminr doesn’t appear to have been held to. “If it doesn’t violate Twitter’s conditions by letter, doesn’t it violate them at least in the essence of what Twitter’s trying to do?” he asked. “Aren’t the terms and conditions set up so that Twitter isn’t leading to negative impacts or negative treatment of people? At minimum, if they’re gonna continue feeding stuff to Dataminr and stuff to police, don’t they have some kind of responsibility, at least an ethical obligation, to let [users] know that ‘Hey, some of your information is going to cops’?” When asked whether Twitter would ever provide such a notice to users, spokesperson McCallum provided a link to a section of the Twitter terms of service that makes no mention of police or law enforcement.