In March, two veteran Facebook engineers found themselves grilled about the company’s sprawling data collection operations in a hearing for the ongoing lawsuit over the mishandling of private user information stemming from the Cambridge Analytica scandal.
The hearing, a transcript of which was recently unsealed, was aimed at resolving one crucial issue: What information, precisely, does Facebook store about us, and where is it? The engineers’ response will come as little relief to those concerned with the company’s stewardship of billions of digitized lives: They don’t know.
The admissions occurred during a hearing with special master Daniel Garrie, a court-appointed subject-matter expert tasked with resolving a disclosure impasse. Garrie was attempting to get the company to provide an exhaustive, definitive accounting of where personal data might be stored in some 55 Facebook subsystems. Both veteran Facebook engineers, with according to LinkedIn two decades of experience between them, struggled to even venture what may be stored in Facebook’s subsystems. “I’m just trying to understand at the most basic level from this list what we’re looking at,” Garrie asked.
“I don’t believe there’s a single person that exists who could answer that question,” replied Eugene Zarashaw, a Facebook engineering director. “It would take a significant team effort to even be able to answer that question.”
When asked about how Facebook might track down every bit of data associated with a given user account, Zarashaw was stumped again: “It would take multiple teams on the ad side to track down exactly the — where the data flows. I would be surprised if there’s even a single person that can answer that narrow question conclusively.”
In an emailed statement that did not directly address the remarks from the hearing, Meta spokesperson Dina El-Kassaby told The Intercept that a single engineer’s inability to know where all user data was stored came as no surprise. She said Meta worked to guard users’ data, adding, “We have made — and continue making — significant investments to meet our privacy commitments and obligations, including extensive data controls.”
The dispute over where Facebook stores data arose when, as part of the litigation, now in its fourth year, the court ordered Facebook to turn over information it had collected about the suit’s plaintiffs. The company complied but provided data consisting mostly of material that any user could obtain through the company’s publicly accessible “Download Your Information” tool.
Facebook contended that any data not included in this set was outside the scope of the lawsuit, ignoring the vast quantities of information the company generates through inferences, outside partnerships, and other nonpublic analysis of our habits — parts of the social media site’s inner workings that are obscure to consumers. Briefly, what we think of as “Facebook” is in fact a composite of specialized programs that work together when we upload videos, share photos, or get targeted with advertising. The social network wanted to keep data storage in those nonconsumer parts of Facebook out of court.
In 2020, the judge disagreed with the company’s contention, ruling that Facebook’s initial disclosure had indeed been too sparse and that the company must reveal data obtained through its oceanic ability to surveil people across the internet and make monetizable predictions about their next moves.
Facebook’s stonewalling has been revealing on its own, providing variations on the same theme: It has amassed so much data on so many billions of people and organized it so confusingly that full transparency is impossible on a technical level. In the March 2022 hearing, Zarashaw and Steven Elia, a software engineering manager, described Facebook as a data-processing apparatus so complex that it defies understanding from within. The hearing amounted to two high-ranking engineers at one of the most powerful and resource-flush engineering outfits in history describing their product as an unknowable machine.
The special master at times seemed in disbelief, as when he questioned the engineers over whether any documentation existed for a particular Facebook subsystem. “Someone must have a diagram that says this is where this data is stored,” he said, according to the transcript. Zarashaw responded: “We have a somewhat strange engineering culture compared to most where we don’t generate a lot of artifacts during the engineering process. Effectively the code is its own design document often.” He quickly added, “For what it’s worth, this is terrifying to me when I first joined as well.”
The remarks in the hearing echo those found in an internal document leaked to Motherboard earlier this year detailing how the internal engineering dysfunction at Meta, which owns Facebook and Instagram, makes compliance with data privacy laws an impossibility. “We do not have an adequate level of control and explainability over how our systems use data, and thus we can’t confidently make controlled policy changes or external commitments such as ‘we will not use X data for Y purpose,’” the 2021 document read.
The fundamental problem, according to the engineers in the hearing, is that Facebook’s sprawl has made it impossible to know what it consists of anymore; the company never bothered to cultivate institutional knowledge of how each of these component systems works, what they do, or who’s using them. There is no documentation of what happens to your data once it’s uploaded, because that’s just never been something the company does, the two explained. “It is rare for there to exist artifacts and diagrams on how those systems are then used and what data actually flows through them,” explained Zarashaw.
“It is rare for there to exist artifacts and diagrams on how those systems are then used and what data actually flows through them.”
Facebook’s inability to comprehend its own functioning took the hearing up to the edge of the metaphysical. At one point, the court-appointed special master noted that the “Download Your Information” file provided to the suit’s plaintiffs must not have included everything the company had stored on those individuals because it appears to have no idea what it truly stores on anyone. Can it be that Facebook’s designated tool for comprehensively downloading your information might not actually download all your information? This, again, is outside the boundaries of knowledge.
“The solution to this is unfortunately exactly the work that was done to create the DYI file itself,” noted Zarashaw. “And the thing I struggle with here is in order to find gaps in what may not be in DYI file, you would by definition need to do even more work than was done to generate the DYI files in the first place.”
The systemic fogginess of Facebook’s data storage made answering even the most basic question futile. At another point, the special master asked how one could find out which systems actually contain user data that was created through machine inference.
“I don’t know,” answered Zarashaw. “It’s a rather difficult conundrum.”
Update: September 7, 2022, 9:56 p.m. ET
This story has been updated to include a statement from Meta sent after publication.