Last week, Google rightfully received coverage and criticism for releasing a dermatology app that didn’t work for people with darker skin. While, unfortunately, there’s nothing new about major tech companies releasing obviously racist machine-learning products, their growing footprint in medical technologies is a scale change — not in their technology, but in their impact. The medical professionals and care pathways the Google app sets out to “augment” are heavily regulated, continuously overseen by professional institutions and certification bodies, and subject to fiduciary obligations to their patients. By contrast, this app was based on a Nature article. Medical technologies are one example of the ways that technologies are increasingly giving advice to people — often advice that would otherwise be regulated — with no corresponding oversight, expertise or accountability.
When digital systems exchange data, they typically do so to communicate a fact, or a set of facts. And so, the first wave of internet regulation focused on protecting data as speech, giving rise to a number of broad protections, like the United States’ Section 230. But speech-based protections largely neglect an important type of communication: the representation of facts.
In law, a representation is a statement of fact delivered in a context with legal impacts, such as a courtroom or a contract. The way we regulate the representation of facts is based on context, because the impact of representations is also contextual. The way that we treat what people say under oath in court, for example, is very different from the way we treat phrases uttered to a therapist or to a friend. What separates a representation from other types of expression, like speech, is that the law assumes the person making the representation understands its seriousness and potential for impact, and assigns liability accordingly.
Digital systems, however, don’t usually distinguish how they share information, or how liability attaches, on the basis of context, which is where the broad protections particularly fall short. Unlike the Google dermatology app, for example, by the time a surgeon picks up a scalpel to operate on a person, that scalpel will be sterile, sharp and designed according to the medical industry’s standards. The medical industry regulates hardware because we understand how much surgery impacts a person’s life — and how important consistency of tooling is to high-impact service provision. Yet, when we design digital systems — even digital systems that impact the fundamental rights of millions of people — we often don’t hold those systems to any meaningful standard. Take, for example, the global rollout of contact-tracing apps — used by millions of people at a cost of hundreds of millions of dollars — which have yet to make a perceptible difference to COVID-19 containment. Perhaps worse, both commercial and “open” data licences grant even broader authorities, enabling the reuse of data without any real consideration for its context. That’s the data equivalent of allowing a surgeon to operate on you — or, more accurately, millions of people — using pretty much any knife they find on the street.
This disconnect between our legal systems and the digital systems they preside over is a fundamental problem for the ways we exercise (and protect) our rights. Digital systems built on poorly understood data make representations on behalf of individuals every day, but how to allocate responsibility for those representations remains unclear, in no small part because data created at one source can be reused across a large and unpredictable range of contexts. To better address the potential harms of an increasingly digital society, policy makers addressing digital rights and governance should focus on ways to enforce laws that regulate contextual representations.
Every person impacted by digital systems, whether that be through a credit score or a bail recommendation algorithm, has fundamental rights that entitle them to hold those who are representing their interests digitally to some standard of integrity. Realizing those rights starts with being able to hold people responsible for the quality, character and quantity of representations they make on a person’s behalf. And with being able to differentiate the contexts in which data is exchanged.
Would a Fact, by Any Other Data, Be as True?
When you misrepresent a person, thing or fact, there are consequences, which depend on the context. For example, if you walk down an empty street and slander your former employer under your breath, there will be few consequences. If you make the same defamatory statement in a courtroom, the consequences will be vastly different. If you misrepresent the quality of a product so that someone buys it, that can be false advertising or fraud. If you misrepresent a person, especially as their employee or agent, you can face a range of personal and professional consequences. But when individuals are misrepresented by data, they have little recourse — and struggle to hold those responsible to account for that misrepresentation. Even the most aggressive digital rights regimes are primarily structured to create supply chains of consent — as if data were a commodity — and focus on how data is exchanged, instead of the likely impacts of its use.
A digital representation, like currency, is worth what people believe it to be worth, financially and factually. And data is increasingly used to justify why one approach to problem-solving is better than another: Whose credit score is the best at predicting risk? Which research should we invest in to cure which disease? Those questions, often, are answered with pools of data. There’s a lot to be gained (and lost) in the political contests over whose systems get to define value or truth at that scale. Scale, as scholar Helen Nissenbaum notes, can flatten and remove digital representations from their context, which, as scholar Jasmine McNealy notes, also removes the contextual protections designed to protect our fundamental rights. Put differently, sharing data is often a way of making a statement and, legally, context matters to how much weight we give statements.
One of the earliest, and most publicized, examples of this trend is the bail recommendation algorithm. These tools are designed to help judges set bail amounts by predicting a defendant’s “flight risk” — the likelihood that they’ll try to avoid court proceedings. While they only produce a suggestion for judges, they have been shown to reproduce historic bias in policing and the US justice system. Worse, they largely sidestep the typically high standards that law uses to ensure the integrity of facts and factors that influence a person’s freedom. These algorithms convert data into an affirmative representation: a prediction with consequences for people’s freedom. And they avoid accountability by only “informing” judicial opinions, as opposed to making them directly. The data-protection, privacy and rights-focused regimes that governments use to curb abuse in data markets fail to account for the harms in these systems.
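To make that conversion concrete, here is a minimal, hypothetical sketch of how such a tool turns records into a recommendation; the features, weights and threshold are invented for illustration and are not drawn from any deployed system.

```python
# Hypothetical sketch of a bail recommendation tool; the features, weights and
# threshold are invented for illustration, not taken from any real product.
from dataclasses import dataclass

@dataclass
class Defendant:
    prior_arrests: int     # reflects historical policing patterns, not guilt
    missed_hearings: int
    age: int

def flight_risk_score(d: Defendant) -> float:
    """Weighted sum of features, clipped to a 0-1 'risk' value.
    In practice the weights are fit to historical outcomes, so any bias
    in that history is reproduced in the score."""
    raw = 0.4 * d.prior_arrests + 0.5 * d.missed_hearings - 0.02 * (d.age - 18)
    return max(0.0, min(1.0, raw / 10))

def bail_recommendation(d: Defendant) -> str:
    # The output is presented to a judge as a statement of fact
    # ("this person is high risk"), yet no one attests to its accuracy.
    return "high risk" if flight_risk_score(d) > 0.6 else "standard"

print(bail_recommendation(Defendant(prior_arrests=12, missed_hearings=3, age=25)))  # "high risk"
```

The point is not the arithmetic but the conversion: a handful of records becomes an affirmative statement about a specific person, delivered in a setting where facts would normally have to meet evidentiary standards.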
Governments’ data-regulation regimes tend to focus on establishing the conditions for legal data exchange, as opposed to accountability for those who use that data to make representations on our behalf. Ultimately, data becomes what we allow it to become: asset, resource, fuel for algorithms or, perhaps most commonly and significantly for our digital rights, a representation. For those whose freedom hangs in the balance during bail hearings, digital representations are a prediction they can’t contest, one known to reproduce bias against them.
Typically, data is used to produce representations of us that are tailor-made for particular contexts: as quantified citizens, consumers, job applicants, employees or patients. The low cost of reusing data enables the modelling of a growing number of objects, concepts and systems, and a growing market aimed at sharing data as a commercial activity, regardless of how it will be used to represent its subjects. The same data can represent a person in myriad ways: a contact list might inform their credit score, whether police see them as a gang member, or how an advertiser classifies their political affiliation. Sometimes these representations have to be true, as in medicine, but in many cases, it only matters that people believe, or choose to believe, that they are.
Divorcing Data Creation from Context of Use
Of course, data quality is only one part of the design of a digital system — making it even harder to objectively understand its overall reliability. As a number of scholars have noted, the context of data’s creation is not only critical to how it should be used but also critical to defining the rights it represents. This is where it gets complicated: a person producing data and making it available is not necessarily vouching for its accuracy in a specific context. And so, that context is often unknown or intentionally disregarded, because shedding context also reduces liability.
One important source of that disconnect is the push to maximize the reusability of data, probably best represented by the open-data movement. Without getting into the open community’s philosophical debates, the goal of open licences is to reduce the barriers to reusing data. Being accountable for the context of data’s use is, probably, the single biggest source of (necessary) friction. And so, while there are, inarguably, some contexts in which open data has served valuable public purposes, it can be exceptionally difficult to document the positive and negative impacts of openly reused data. By divorcing the process of creating data from responsibility for the contexts in which that data is used, we sever accountability.
And yet, we accept open data and ambiguously sourced data for use in a wide range of contexts that have implications for fundamental rights. Digitizing the process of making rights-affecting representations shouldn’t absolve those involved of accountability for the integrity of their work. Without that accountability, there’s very little pressure for the data to be correct at all, let alone any recourse for those impacted to seek justice.
The Fault in Our (Current) Data Rights
It is painfully obvious that data is often used to make rights-affecting representations. The fundamental requirements of laws that govern representation are, essentially, intent in context. In other words, when a person makes a statement that affects another person’s rights, the law basically judges whether the speaker meant to do so, based on a reasonable awareness of the likely impacts. The significant public and private investments in creating and reusing data, for the purpose of maximizing data’s value, are, often, in direct tension with law’s approach to assigning liability.
The two most common types of law referenced in public interest digital rights debates are privacy and data protection. Privacy law focuses on professional standards of care and tries to give the people affected the ability to limit others’ reuse of data. Data-protection law is about the integrity of data rights’ supply chains.
When a person brings a privacy or a data-protection case, they must prove that data about them was shared inappropriately or that the data system in question malfunctioned in ways that create liability, as in instances of negligence or fraud. Privacy and data-protection laws can set out fines and punish bad practice, but they rarely provide for appropriate redress to victims or reform to the system that created the harms in the first place. Both privacy and data-protection laws offer important foundations for digital rights, but have become focused, in their own ways, on creating an efficient operating environment for digital transformation, rather than on meaningfully addressing its harms.
And that is a gap that digital rights work on representation could fill. One of the major problems in law is that we don’t have a very good system for making representational harm easy to observe and engage with, especially not at the scale of populations or databases. For example, if you slander a rich person in a way that costs them, the law can assess that harm, with clear procedures for identifying victim and perpetrator and for quantifying damage and restitution. But when, say, jail management software misrepresents the release dates of hundreds of inmates, extending their imprisonment for no legal reason, as has happened in Arizona, the liability is harder to assess. Falsely imprisoning even one person is a crime, let alone hundreds of people. But even if the victims brought a class-action lawsuit, who is responsible? Is it the government, for choosing the system? The software provider, for failing to uphold an appropriate standard of service? How can a court offer justice to the wrongly imprisoned? The first step toward answering these questions is to connect decisions about the creation, design and sharing of data to their impacts. And the best way to do that is through legal liability.
The faster we transform the analogue into the digital, the harder it is to ignore the enormous amount of deliberate, accidental and acontextual misrepresentation this transformation enables. To compound these difficulties, companies use commercial protections, such as trade secrets, to obfuscate important details about their data supply chains, making it nearly impossible to attribute responsibility for representational harms. Similarly, whenever a company decides to reuse data, that reuse assigns different, potentially acontextual significance to that data, which affects its fitness for that context, the rights of those implicated, and the liability of the company using it. A range of protections can be used by multiple actors in the same digital supply chain, creating mutual cover for data producers, algorithm designers and digital service providers.
These gaps in our system aren’t, however, irreparable. There are a number of communities and bodies of law with experience trying to ensure that the use of data to make representations is accompanied by an appropriate level of care, even when that data use is implemented by multiple actors across a supply chain. There are practical contracting, legislative and litigation tools we can use to broaden the applicability of representational liability to digital systems. We can move from open-ended data licensing defaults to licences that specify permitted uses and limitations. Lawmakers can create direct rights of action and invest in civil justice systems, enabling people to represent themselves. Strategic litigation is a time-tested tool to establish foundational rights. These legal infrastructure tools are important not just for digital rights, but also for ensuring the integrity of the increasingly common digital decision-making systems that impact our fundamental rights.
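As one illustration of what moving away from open-ended licensing defaults might look like in practice, here is a minimal sketch of a machine-readable licence that names its permitted contexts of use; the field names and structure are hypothetical, not an existing licensing standard.

```python
# Hypothetical sketch of a context-limited data licence; the fields and check
# are illustrative, not drawn from any existing licensing standard.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataLicence:
    source: str
    permitted_contexts: frozenset = frozenset()
    liable_party: str = "data provider"   # who answers for harms from permitted uses

    def permits(self, context: str) -> bool:
        """Reuse outside the named contexts requires a new agreement,
        which is where accountability for the new context can attach."""
        return context in self.permitted_contexts

licence = DataLicence(
    source="municipal-contact-tracing-2021",
    permitted_contexts=frozenset({"public-health-research"}),
)
print(licence.permits("public-health-research"))  # True: a named context
print(licence.permits("credit-scoring"))          # False: reuse blocked by default
```

A licence of this shape makes reuse outside the named contexts a deliberate act requiring a new agreement, which is exactly the point at which accountability for the new context can attach.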
Law’s Approach to Establishing Truth
Law has a system for establishing the truth of a representation that is hardly mentioned in the digital rights discussion: evidence. Evidence law is the body of law that determines what kinds of information should be allowed to influence how we make decisions about people’s fundamental rights. Evidence law directly addresses how we attribute significance to representations of all types. And, importantly, it also establishes accountability for misrepresentations with impact.
If you can submit something as evidence in court, the legal system is essentially saying, “This data or representation is credible enough to affect a person’s fundamental freedoms.” Evidence law, in other words, helps establish the procedural bar for determining truth in fundamental rights systems, and offers a useful model for determining the contextual truth of data. That’s not to say that law has cracked epistemology; rather, law has tools for assigning value and liability to the impact of a representation.
Evidence law offers at least three critical foundations for designing data rights:
- Evidence law requires a chain of custody. Lawyers introducing evidence need to be able to document how a piece of evidence travelled from the scene of an incident to a courtroom. The same is true of a piece of knowledge: most courts only let people testify to things they know or observed directly. (A sketch of what an analogous custody record for data might look like follows this list.)
- Courts provide accountability for misrepresentations. In most places, deliberately lying in court is a crime. The law broadly recognizes that when speech becomes a representation and has significant impacts on a person’s rights, the speaker bears accountability for its truth.
- Evidence law is an implemented system that has had to establish “truth” across a broad range of facts and information sources. The operational structures of courts aren’t a new analogy for the ways that digital platforms manage specific types of problems, such as content moderation on social media or small-claims adjudication on eBay. But these structures are applicable to a much broader range of digital rights than is currently recognized or implemented in digital systems.
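Applied to data, the chain-of-custody requirement in the first point above might look something like the following minimal sketch, in which every handling of a dataset is recorded along with who did it and in what context; the record structure is hypothetical, intended only to show what documenting data’s travel from creation to use could involve.

```python
# Hypothetical sketch of a chain-of-custody record for a dataset; the fields are
# illustrative, not drawn from evidence law or any existing provenance standard.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CustodyEvent:
    actor: str          # who handled the data
    action: str         # what they did with it
    context: str        # the context in which it was handled
    timestamp: datetime

custody_chain = [
    CustodyEvent("city health department", "collected exposure notifications",
                 "contact tracing", datetime(2021, 3, 1)),
    CustodyEvent("analytics vendor", "aggregated into a mobility model",
                 "public-health research", datetime(2021, 4, 12)),
]

def context_changed(chain: list) -> bool:
    """Flags whether the data has left its original context: the point at which,
    in a courtroom, someone would have to vouch for its continued accuracy."""
    return len({event.context for event in chain}) > 1

print(context_changed(custody_chain))  # True: the data has changed contexts
```

The moment the context changes is the moment a court would expect someone to stand behind the data’s accuracy in that new setting.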
These foundations are far from a complete blueprint for how to adjudicate rights in digital spaces, but they are important design requirements for those trying to build equity or trust in digital systems and in the impacts of data use. The fastest way for us to lose the potential of digital systems is to fail to hold them accountable to the basic integrity we expect of everything else.