There’s widespread concern that video cameras will use facial recognition software to track our every public move. Far less remarked upon — but every bit as alarming — is the exponential expansion of “smart” video surveillance networks.
Private businesses and homes are starting to plug their cameras into police networks, and rapid advances in artificial intelligence are investing closed-circuit television, or CCTV, networks with the power for total public surveillance. In the not-so-distant future, police forces, stores, and city administrators hope to film your every move — and interpret it using video analytics.
The rise of all-seeing smart camera networks is an alarming development that threatens civil rights and liberties throughout the world. Law enforcement agencies have a long history of using surveillance against marginalized communities, and studies show surveillance chills freedom of expression — ill effects that could spread as camera networks grow larger and more sophisticated.
To understand the situation we’re facing, we have to understand the rise of the video surveillance industrial complex — its history, its power players, and its future trajectory. It begins with the proliferation of cameras for police and security, and ends with a powerful new industry imperative: complete visual surveillance of public space.
In their first decades of existence, CCTV cameras were low-resolution analog devices that recorded onto tapes. Businesses or city authorities deployed them to film a small area of interest. Few cameras were placed in public, and the power to track people was limited: If police wanted to pursue a person of interest, they had to spend hours collecting footage by foot from nearby locations.
In the late 1990s, video surveillance became more advanced. A company called Axis Communications invented the first internet-enabled surveillance camera, which converted moving images to digital data. New businesses like Milestone Systems built Video Management Systems, or VMS, to organize video information into databases. VMS providers created new features like motion sensor technology that alerted guards when a person was caught on camera in a restricted area.
As time marched on, video surveillance spread. By one account, about 50 years ago, the United Kingdom had somewhere north of 60 permanent CCTV cameras installed nationwide. Today, the U.K. has over 6 million such devices, while the U.S. has tens of millions. According to marketing firm IHS Markit, 1 billion cameras will be watching the world by the end of 2021, with the United States rivaling China’s per person camera penetration rate. Police can now track people across multiple cameras from a command-and-control center, desktop, or smartphone.
While it is possible to link thousands of cameras in a VMS, it is also expensive. To increase the number of CCTV cameras available, cities recently came up with a clever hack: encouraging businesses and residents to place privately owned cameras on their police network, an arrangement I call “plug-in surveillance networks.”
Detroit has popularized plug-in surveillance networks through its controversial Project Green Light program. With Project Green Light, businesses can purchase CCTV cameras and connect them to police headquarters. They can also place a bright green light next to the cameras to indicate they are part of the police network. The project claims to deter crime by signaling to residents: The police are watching you.
Detroit is not alone. Chicago, New Orleans, New York, and Atlanta have also deployed plug-in surveillance networks. In these cities, private businesses and/or homes provide feeds that are integrated into crime centers so that police can access live streams and recorded footage. The police department in New Haven, Connecticut, told me they are looking into plug-in surveillance, and others are likely considering it.
The number of cameras on police networks now ranges from several hundred (New Orleans) to tens of thousands (Chicago). With so many cameras in place, and only a small team of officers to watch them, law enforcement agencies face a new challenge: How do you make sense of all that footage?
The answer is video analytics.
Around 2006, a young Israeli woman was recording family videos every weekend, but as a student and parent, she didn’t have time to watch them. A computer scientist at her university, Professor Shmuel Peleg, told me he tried to create a solution for her: He would take a long video and condense the interesting activity into a short video clip.
His solution failed: It only worked on stationary cameras, and the student’s video camera was moving when she filmed her family.
Peleg soon found another use case in the surveillance industry, which relies on stationary cameras. His solution became BriefCam, a video analytics firm that can summarize video footage from a scene across time so that investigators can view all relevant footage in a short space of time.
Using a feature called Video Synopsis, BriefCam overlays footage of events happening at different times as if they are appearing simultaneously. For example, if several people walked past a camera at 12:30 p.m., 12:40 p.m., and 12:50 p.m., BriefCam will aggregate their images into a single scene. Investigators can view all footage of interest from a given day in minutes instead of hours.
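The time-shifting idea can be illustrated with a toy sketch. This is not BriefCam's algorithm, which the company has not described here; the `Track` record and `condense` function are hypothetical names invented for illustration, and the sketch simply assumes each detected person has already been reduced to a start time and a duration. It moves every track to the start of the condensed clip while keeping the original timestamp as a label.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical record for one person or object seen by a fixed camera."""
    label: str
    start_s: int      # seconds since midnight when the track begins
    duration_s: int   # how long the person stays in frame


def condense(tracks):
    """Shift every track to time zero so they all play back together.

    Returns a list of (label, original_start_s, synopsis_start_s, duration_s)
    and the length of the condensed clip in seconds.
    """
    placed = [(t.label, t.start_s, 0, t.duration_s) for t in tracks]
    synopsis_length = max((t.duration_s for t in tracks), default=0)
    return placed, synopsis_length


# The example from the text: passers-by at 12:30, 12:40, and 12:50 p.m.
tracks = [
    Track("person A", 12 * 3600 + 30 * 60, 20),
    Track("person B", 12 * 3600 + 40 * 60, 15),
    Track("person C", 12 * 3600 + 50 * 60, 25),
]
placed, synopsis_length = condense(tracks)
print(synopsis_length)  # twenty minutes of real time collapse into 25 seconds
```

A production system would also have to keep overlapping figures from colliding on screen, which this sketch ignores.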
Thanks to rapid advances in artificial intelligence, summarization is just one feature in BriefCam’s product line and the rapidly expanding video analytics industry.
Object recognition can recognize faces, animals, cars, weapons, fires, and other things, as well as human characteristics like gender, age, and hair color.
Anomalous or unusual behavior detection works by recording a fixed area for a period of time — say, 30 days — and determining “normal” behavior for that scene. If the camera sees something unusual — say, a person running down a street at 3:00 a.m. — it will flag the incident for attention.
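A minimal sketch shows the general logic, assuming the camera's output has already been reduced to (hour, activity) observations. The function names and the `min_count` threshold are assumptions made for illustration, not any vendor's actual method: the system counts what it saw during the learning period and flags anything it has rarely or never seen at that hour.

```python
from collections import Counter


def build_baseline(observations):
    """Count how often each (hour, activity) pair occurred while learning."""
    return Counter(observations)


def is_anomalous(baseline, hour, activity, min_count=3):
    """Flag an event seen fewer than min_count times in the baseline.

    min_count is an arbitrary illustrative threshold.
    """
    return baseline[(hour, activity)] < min_count


# Hypothetical 30-day history for one street scene.
history = (
    [(8, "walking")] * 200
    + [(8, "running")] * 40
    + [(3, "walking")] * 5
)
baseline = build_baseline(history)

print(is_anomalous(baseline, 8, "running"))  # morning joggers are routine
print(is_anomalous(baseline, 3, "running"))  # never seen at 3:00 a.m.
```

The sketch makes the underlying assumption visible: "unusual" means only "rare in the training data," so whatever the camera happened to record in those 30 days defines what counts as normal.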
Video analytics systems can analyze and search across real-time streams or recorded footage. They can also isolate individuals or objects as they traverse a smart camera network.
Chicago; New Orleans; Detroit; Springfield, Massachusetts; and Hartford, Connecticut, are some of the cities currently using BriefCam for policing.
With city spaces blanketed in cameras, and video analytics to make sense of them, law enforcement agencies gain the capacity to record and analyze everything, all the time. This provides authorities the power to index and search a vast database of objects, behaviors, and anomalous activity.
In Connecticut, police have used video analytics to identify or monitor known or suspected drug dealers. Sergeant Johnmichael O’Hare, former Director of the Hartford Real-Time Crime Center, recently demonstrated how BriefCam helped Hartford police reveal “where people go the most” in the space of 24 hours by viewing footage condensed and summarized in just nine minutes. Using a feature called “pathways,” he discovered hundreds of people visiting just two houses on the street and secured a search warrant to verify that they were drug houses.
Video analytics startup Voxel51 is also adding more sophisticated searching to the mix. Co-founded by Jason Corso, a professor of electrical engineering and computer science at the University of Michigan, the company offers a platform for video processing and understanding.
Corso told me his company hopes to offer the first system where people can “search based on semantic content about their data, such as, ‘I want to find all the video clips that have more than 3-way intersections … with at least 20 vehicles during daylight.’” Voxel51 “tries to make that possible” by taking video footage and “turning it into structured searchable data across different types of platforms.”
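To make the idea of "structured searchable data" concrete, here is a hedged sketch of the kind of query Corso describes. It does not use Voxel51's platform or API; the `ClipRecord` fields and the `search` function are hypothetical, and the sketch assumes an analytics model has already turned each clip into a row of metadata.

```python
from dataclasses import dataclass


@dataclass
class ClipRecord:
    """Hypothetical metadata extracted from one video clip."""
    clip_id: str
    intersection_ways: int   # number of roads meeting at the intersection
    vehicle_count: int
    daylight: bool


def search(clips, min_ways=4, min_vehicles=20):
    """Return IDs of daylight clips at busy, larger-than-3-way intersections."""
    return [
        c.clip_id
        for c in clips
        if c.intersection_ways >= min_ways
        and c.vehicle_count >= min_vehicles
        and c.daylight
    ]


clips = [
    ClipRecord("clip-1", 3, 35, True),    # only a 3-way intersection
    ClipRecord("clip-2", 4, 22, True),    # matches every condition
    ClipRecord("clip-3", 4, 50, False),   # filmed at night
    ClipRecord("clip-4", 5, 12, True),    # too few vehicles
]
print(search(clips))
```

Once footage is reduced to rows like these, searching a city's worth of video becomes as simple as filtering a table.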
Unlike BriefCam, which analyzes video using nothing but its own software, Voxel51 offers an open platform which allows third parties to add their own analytics models. If the platform succeeds, it will supercharge the ability to search and surveil public spaces.
Corso told me his company is working on a pilot project with the Baltimore police for their CitiWatch surveillance program and plans to trial the software with the Houston Police Department.
As cities start deploying a wide range of monitoring devices from the so-called internet of things, researchers are also developing a technique known as video analytics and sensor fusion, or VA/SF, for police intelligence. With VA/SF, multiple streams from sensors are combined with video analytics to reduce uncertainties and make inferences about complex situations. As one example, Peleg told me BriefCam is developing in-camera audio analytics that uses microphones to discern actions that may confuse AI systems, such as whether people are fighting or dancing.
VMSs also offer smart integration across technologies. Former New Haven Chief of Police Anthony Campbell told me how ShotSpotters, controversial devices that listen for gunshots, integrate with specialized software so when a gun is fired, nearby swivel cameras instantly alter their direction to the location of the weapons discharge.
Officers can also use software to lock building doors from a control center, and companies are developing analytics to alert security if one car is being followed by another.
Video analytics captures a wide variety of data about the areas covered by smart camera networks. Not surprisingly, the information captured is now being proposed for predictive policing: the use of data to predict and police crime before it happens.
In 2002, the dystopian film “Minority Report” depicted a society using “pre-crime” analytics for police to intervene in lawbreaking before it occurs. In the end, the officers in charge tried to manipulate the system for their own interests.
A real-world version of “Minority Report” is emerging through real-time crime centers used to analyze crime patterns for police. In these centers, law enforcement agencies ingest information from sources like social media networks, data brokers, public databases, criminal records, and ShotSpotters. Weather data is even included for its impact on crime (because “bad guys don’t like to get wet”).
In a 2018 document, the data storage firm Western Digital and the consultancy Accenture predicted mass smart camera networks would be deployed “across three tiers of maturity.” This multi-stage adoption, they contended, would “allow society” to gradually abandon “concerns about privacy” and instead “accept and advocate” for mass police and government surveillance in the interest of “public safety.”
Tier 1 encompasses the present where police use CCTV networks to investigate crimes after-the-fact.
By 2025, society will reach Tier 2 as municipalities transform into “smart” cities, the document said. Businesses and public institutions, like schools and hospitals, will plug camera feeds into government and law enforcement agencies to inform centralized, AI-enabled analytics systems.
Tier 3, the most predictive-oriented surveillance system, will arrive by 2035. Some residents will voluntarily donate their camera feeds, while others will be “encouraged to do so by tax-break incentives or nominal compensation.” A “public safety ecosystem” will centralize data “pulled from disparate databases such as social media, driver’s licenses, police databases, and dark data.” An AI-enabled analytics unit will let police assess “anomalies in real time and interrupt a crime before it is committed.”
That is to say, to catch pre-crime.
While CCTV surveillance began as a simple tool for criminal justice, it has grown into a multibillion-dollar industry that spans multiple verticals. From policing and smart cities to schools, health care facilities, and retail, society is moving toward near-complete visual surveillance of commercial and urban spaces.
Denmark-based Milestone Systems, a top VMS provider with half its revenues in the U.S., had fewer than 10 employees in 1999. Today it is a major corporation that claims offices in over 20 countries.
Axis Communications used to be a network printer outfit. It has since become a leading camera provider pushing over $1 billion in sales per year.
BriefCam began as a university project. Now it is among the world’s top video analytics providers, with clients, it says, spanning over 40 countries.
Over the past six years, Canon purchased all three, giving the imaging conglomerate ownership of industry giants in video management software, CCTV cameras, and video analytics. Motorola recently acquired a top VMS provider, Avigilon, for $1 billion. In turn, Avigilon and other large firms have purchased their own companies.
Familiar big tech giants are also in on the action. Lieutenant Patrick O’Donnell of the Chicago police force told me his department is working on a non-disclosure agreement with Google for a video analytics pilot project that would detect people reacting to gunfire, including whether they are in the prone position, so that police can receive real-time alerts. (Google did not respond to a request for comment.)
Video monitoring networks inevitably entangle and implicate a whole ecosystem of vendors, some of whom have offered, or may yet offer, services specifically targeted at such systems. Microsoft, Amazon, IBM, Comcast, Verizon, and Cisco are among those enabling the networks with technologies like cloud services, broadband connectivity, or video surveillance software.
In the public sector, the National Institute of Standards and Technology is funding “public analytics” and communications networks like the First Responder Network Authority, or FirstNet, for real-time video and other surveillance technologies. FirstNet will cost $46.5 billion, and is being built by AT&T.
Voxel51 is another NIST-backed venture. The public is thus paying for their own high-tech surveillance three times over: first, through taxes for university research; second, through grant money for the formation of a for-profit startup (Voxel51); and third, through the purchase of Voxel51’s services by city police departments using public funds.
With the private and public sector looking to expand the presence of cameras, video surveillance has become a new cash cow. As Corso put it, “there will be something like 45 billion cameras in the world within a few decades. That’s a lot of (video) pixels. For the most part, most of those pixels go unused.” Corso’s estimate mirrors a 2017 forecast from New York venture capital firm LDV, which believes smartphones will evolve to have even more cameras than they do today, contributing to the growth.
Companies that began with markets for police and security are now diversifying their offerings to the commercial sector. BriefCam, Milestone, and Axis advertise the use of video analytics for retailers, where they can monitor foot traffic, queue length, shopping patterns, floor layouts, and conduct A/B testing. Voxel51 has an option built for the fashion industry and plans to expand across industry verticals. Motionloft offers analytics for smart cities, retailers, commercial real estate, and entertainment venues. Other examples abound.
Public and private sector actors are pressing for a world full of smart video surveillance. Peleg, for example, told me of a use case for smart cities: If you drive into the city, you could “just park and go home” without using a parking meter. The city would send a bill to your house at the end of the month. “Of course, you lose your privacy,” he added. “The question is, do you really care about Big Brother knows where you are, what you do, etc.? Some people may not like it.”
Those who do not like new forms of Big Brother surveillance are presently fixated on facial recognition. Yet they have largely ignored the shift to smart camera networks — and the industrial complex driving it.
Thousands of cameras are now set to scrutinize our every move, informing city authorities whether we are walking, running, riding a bike, or doing anything “suspicious.” With video analytics, artificial intelligence is used to identify our sex, age, and type of clothes, and could potentially be used to categorize us by race or religious attire.
Such surveillance could have a severe chilling effect on our freedom of expression and association. Is this the world we want to live in?
The capacity to track individuals across smart CCTV networks can be used to target marginalized communities. The detection of “loitering” or “shoplifting” by cameras concentrated in poor neighborhoods may deepen racial bias in policing practices.
This kind of racial discrimination is already happening in South Africa, where “unusual behavior detection” has been deployed by smart camera networks for several years.
In the United States, smart camera networks are just emerging, and there is little information or transparency about their use. Nevertheless, we know surveillance has been used throughout history to target oppressed groups. In recent years, the New York Police Department secretly spied on Muslims, the FBI used surveillance aircraft to monitor Black Lives Matter protesters, and the U.S. Customs and Border Protection began building a high-tech video surveillance “smart border” across the Tohono O’odham reservation in Arizona.
Law enforcement agencies claim smart camera networks will reduce crime, but at what cost? If a camera could be put in every room in every house, domestic violence might go down. We could add automated “filters” that only record when a loud noise is detected, or when someone grabs a knife. Should police put smart cameras inside every living room?
The commercial sector is likewise rationalizing the advance of surveillance capitalism into the physical domain. Retailers, employers, and investors want to put us all under smart video surveillance so they can manage us with visual “intelligence.”
When asked about privacy, several major police departments told me they have the right to see and record everything you do as soon as you leave your home. Retailers, in turn, won’t even approach public disclosure: They are keeping their video analytics practices secret.
In the United States, there is generally no “reasonable expectation” of privacy in public. The Fourth Amendment encompasses the home and a few public areas we “reasonably” expect to be private, such as a phone booth. Almost everything else — our streets, our stores, our schools — is fair game.
Even if rules are updated to restrict the use of video surveillance, we cannot guarantee those rules will remain in place. With thousands of high-res cameras networked together, a dystopian surveillance state is a mouse click away. By installing cameras everywhere, we are opening a Pandora’s box.
To address the privacy threats of smart camera networks, legislators should ban plug-in surveillance networks and restrict the scope of networked CCTVs beyond the premises of a single site. They should also limit the density of camera and sensor coverage in public. These measures would block the capacity to track people across wide areas and prevent the phenomenon of constantly being watched.
The government should also ban video surveillance analytics in publicly accessible spaces, perhaps with exceptions for rare cases such as the detection of bodies on train tracks. Such a ban would disincentivize mass camera deployments because video analytics is needed to analyze large volumes of footage. Courts should urgently reconsider the scope of the Fourth Amendment and expand our right to privacy in public.
Police departments, vendors, and researchers need to disclose and publicize their projects, and engage with academics, journalists, and civil society.
It is clear we have a crisis in the works. We need to move beyond the limited conversation of facial recognition and address the broader world of video surveillance, before it is too late.