Why We Need Inclusive Data Governance in the Age of AI

November 12, 2024

This essay is part of The Role of Governance in Unleashing the Value of Data, an essay series that considers aspects of data governance and the value of data.

The release of ChatGPT to the world in November 2022 resulted in multiple calls for urgent action on artificial intelligence (AI) governance. ChatGPT was a shock to the system in three ways.

First, it demonstrated — in consumer-friendly form — the expanding capabilities of AI, with the clear implication that AI would soon be able to carry out a much wider range of tasks than previously thought. Concerns about the risks of AI went mainstream and began to hit the creative and professional services sectors that had previously seemed immune to automation.

Second, it highlighted the importance of a different kind of data: data we have an interest in because we produced it — our writing, art and code. Where data governance previously had a strong focus on privacy and personal data — data about us — generative AI has challenged notions of “fair use” and raised questions about the implications of text and data mining exceptions to intellectual property rights.

Finally, it showed that big tech companies such as OpenAI, Microsoft and Google are able and willing to launch socially disruptive, and potentially dangerous, services into the world, placing governments in a position of catch-up. For politicians, it felt like history repeating itself. During a US congressional hearing on AI, Senator Richard Blumenthal said, “Congress failed to meet the moment on social media. Now we have the obligation to do it on AI before the threats and the risks become real” (Zorthian 2023).

In truth, the impacts of AI and data processing, more generally, have been with us for some time (for example, see Emily Bender’s [2023] blog post, “Talking about a ‘schism’ is ahistorical”). Much of this existing research concluded that the public should be part of data (and, by extension, AI) governance. For example, Jathan Sadowski, Salomé Viljoen and Meredith Whittaker (2021) wrote, “Everyone should decide how their digital data are used — not just tech companies.” The Ada Lovelace Institute (2021) has similarly called for public participation in the stewardship of data. The nuances of how to get participation and democratization right have been explored in critical papers from Abeba Birhane et al. (2022) and Johannes Himmelreich (2022).

However, despite this scholarship, much of the generative-AI-inspired work on AI governance has been conducted through cordial agreements between governments and the big AI companies, rather than through inclusive multi-stakeholder processes (Chatterjee 2023). For example, few civil society organizations were invited to attend the United Kingdom’s AI Safety Summit in November 2023; those that did attend released a statement saying:

Because only a small subset of civil society actors working on Artificial Intelligence issues were invited to the summit, these are the perspectives of a limited few and cannot adequately capture the viewpoints of the diverse communities impacted by the rapid rollout of AI systems into public use. Here, too, governments must do better than today’s discussions suggest. It is critical that AI policy conversations bring a wider range of voices and perspectives into the room, particularly from regions outside of the Global North. Framing a narrow section of the AI industry as the primary experts on AI risks further concentrating power in the tech industry, introducing regulatory mechanisms not fit for purpose, and excluding perspectives that will ensure AI systems work for all of us (AI Now Institute et al. 2023; author’s emphasis).

This lack of inclusion is mirrored at a more operational level. With some important exceptions, such as public and patient involvement and engagement around health data,1 organizations developing and deploying AI rarely involve the affected communities, the wider public or civil society in their data governance processes.

In this essay, the author explores the case for ensuring that governance of data, particularly in the context of AI, is inclusive. The essay first defines what is meant by inclusion in data governance and illustrates how it manifests at different governance levels, then explores a range of arguments for greater inclusion of both civil society and affected communities in data governance to support AI.

Defining Inclusive Data Governance

Data is sometimes described as the lifeblood of AI (Shubladze 2023). Machine learning draws patterns out of large quantities of training data to make predictions, provide recommendations and, with the latest generative AI systems, create new data and content. The content, quantity and quality of training data have a significant impact on what AI systems can do and the biases they replicate.

Data governance is all about making and monitoring decisions about how data is collected, used and shared, with knock-on impacts on what AI systems can be built and how they work. Data governance decisions may be made at global, regional, national and local levels, including within individual organizations. For example, we might see regional discussions about limiting the use of facial recognition data, national decisions about which administrative data to use to replicate a census, and individual workplaces determining which documents to use to fine-tune customer service chatbots.

Data governance frameworks — spelling out how these specific decisions about data should be made and challenged, and any requirements around them, such as consultation and transparency — are similarly set at multiple levels, through multinational agreements, national regulation and organizational policies.

These decisions often end up being taken by a narrow community of actors, from closed-door intergovernmental agreements on global policy, to headteachers deciding, without reference to anyone else, which education apps pupil data should be shared with. Inclusive data governance processes involve multiple stakeholders, giving equal space in this decision making to diverse groups from civil society, as well as space for direct representation of affected communities as active stakeholders.

This links to, but is broader than, the concept of multi-stakeholder governance for technology, which first came to prominence at the international level, in institutions such as the Internet Corporation for Assigned Names and Numbers and the Internet Governance Forum (Hofmann 2016). Some data sources and AI systems do operate at this international level: foundation models, and the data required to train them, are the obvious examples. There is a broad range of efforts at this scale: the AI Safety Summit series;2 the Hiroshima AI Process3 under the G7; the Global Partnership on Artificial Intelligence;4 and the UN AI Advisory Board,5 to name a few. An inclusive approach would mean both that the organization and membership of these groups, events and processes involve a diverse set of civil society organizations, and that they also hear directly from people affected by AI.

However, many more data governance decisions have to be made outside this international context, such as the design of guidance by sector regulators; the selection of training data for fine-tuning by chatbot implementers; and the deployment of data-based systems in schools and workplaces. The same principle of including multiple stakeholders in discussion and decision making can and should still apply in these contexts.

The Democratic Principle

There is a strong democratic case for inclusive governance of data and AI, and specifically to expand the current set of people and organizations making decisions about data and AI to include those affected by these technologies.

Many people and organizations working on social justice or specifically on data and digital rights have adopted the foundational principle of “nothing about us, without us,” popularized by the disability rights community.6 The global Indigenous data rights movement has similarly made the case for Indigenous peoples and nations to not only have the right to govern data, but also to be able to access and use data for their own governance, as a matter of justice.7 These same arguments hold true for other communities.

Arguments for democratizing AI governance (Seger 2023) point to the power, largely unchecked, wielded by those developing AI technologies in the lab and deploying them in the field. This power manifests itself in the way AI labs have trained foundation models on writings, drawings and photographs without giving creators an opportunity for negotiation. It manifests in how little say we have in what data gets collected by digital services that are almost unavoidable — such as search engines, social media, online shopping and digital public services — and how that data is used and exchanged. And it manifests in the way that technology is rolled out, usually without consultation, in our schools, hospitals, workplaces and local communities.

Within democratic countries, some argue that governments — who are, after all, elected by the people — are able to act as representatives and advance the public good in data governance decision making. However, governments do not act only as democratic representatives and regulators: they also wield power over their citizens and residents through data and AI, just as governments do in authoritarian regimes. Governments collect and steward substantial amounts of the most sensitive data there is, about people’s health, education, income, benefits, citizenship and much more. Thanks to the unique power of the state, they also use this data to make life-altering decisions about both individuals and communities. Cases such as Robodebt in Australia8 or the Dutch child benefit scandal (Heikkilä 2022) are evidence of the damage this can cause.

For this reason, scholars and practitioners have been exploring other mechanisms for the exercise of democratic power — whether realized through direct democracy or through institutions including civil society, independent academia and journalism — to counter that of the companies and governments who control data and AI.

More Specific Arguments

While the general democratic case for including civil society and the public in data governance may appear self-evident to some — particularly those who do not hold power — it is not commonplace in practice, in part because the case for it has not been won.

Involving the public and civil society in decisions about data is not cost-free. Surmounting the practical challenges, and the skepticism about the utility of public involvement in a technical and technocratic field, frequently requires arguments that go beyond it being the right thing to do.

Policy makers are particularly concerned that costs and delays from enacting such measures might slow innovation down, diminishing the economic benefits of data and AI, particularly in an internationally competitive context. In this section, the author will therefore explore three specific arguments for different ways of democratizing data and AI that speak to the kinds of outcomes governments and companies care about.

Co-design Reduces Risk

The first argument the author will dig into is for stakeholder involvement in the design of data and AI systems to reduce risks and strengthen the marketplace.

Fitting uses of data and AI to what is expected by and acceptable to the public at the design stage — operating within the social licence for data use (Verhulst, Sandor and Stamm 2023) — reduces risks in many of the same ways as good user needs analysis9 or human rights impact assessment (Mantelero and Esposito 2021). At an organizational level, getting a wide range of stakeholders involved early:

  • reduces the risk of products and services functioning in ways that cause a backlash that may damage user retention or access to public services;
  • decreases the risk of wasting time and money developing products and services that are not fit for purpose or need to be withdrawn; and
  • lowers reputational risks arising from such backlashes, which may have adverse knock-on effects in a commercial context on share price, future investment or advertising revenue, and in a public sector context on trust in democratic institutions.

At a societal level, having a data ecosystem operating within the social licence also:

  • reduces the actual risk of harm to people and communities, and to public goods such as equality; and
  • creates a more trustworthy marketplace, enabling organizations to act with confidence and to rely on each other.

To give a couple of public sector examples of operating outside the social licence: in the United Kingdom, public backlashes against data sharing schemes such as the General Practice Data for Planning and Research scheme have led both to costly policy reversals and to patients opting out of sharing health data that is important for medical research.10 In the Netherlands, the child benefit scandal, in which thousands of parents were falsely accused of fraud, brought down the government (Pascoe 2021).

There are also examples from the private sector of problems that could arguably have been caught sooner through broader consultation. In the development of generative AI, different approaches to fine-tuning — training generative AI about which outputs are most acceptable — have led to the Tay chatbot becoming a “racist asshole” (Vincent 2016) and to Google’s Gemini depicting multi-ethnic Nazis (Roth 2024). The Facebook–Cambridge Analytica scandal11 led to Facebook altering and withdrawing a number of its application programming interfaces, illustrating the problem of platform transience (Barrett and Kreiss 2019), where data and AI services can change rapidly in response to external pressure such as scandals, with knock-on effects both for those who use them directly and for those who depend on them.

The problem of how to fit the way a service behaves to what is expected and allowed by diverse publics is a challenge social media companies face when automating content moderation worldwide. For those seeking to develop general AI, the same problem is framed as “alignment” — ensuring that future autonomous AI systems protect, rather than destroy, humanity.

Some companies are experimenting with involving a more diverse set of people in making these value-laden ethical decisions. Meta instituted its Oversight Board12 in May 2020, bringing together experts from a range of countries and backgrounds to make judgments on specific content moderation cases and recommendations for future implementation. In May 2023, OpenAI (2023) announced a US$1 million “Democratic Inputs to AI” grant program, looking for scalable approaches to involving the public in aligning its models to humanity’s values, and resulting in 10 pilot projects (OpenAI 2024). Anthropic13 has partnered with The Collective Intelligence Project (2023)14 to create Collective Constitutional AI with a similar goal.

While these efforts can be seen as steps in the right direction, in all these cases the company involved retains the locus of power. As Mona Sloane (2024) puts it, “This is a thin form of participation, because participation is limited to existing designs with pre-existing purposes.” As a consequence, these initiatives can be viewed cynically, as mechanisms to reduce the risk of being held accountable — either in popular opinion or by regulators — for things deemed unacceptable: an arm’s-length decision-making body, whether a board or a constituted public, can be blamed instead. They can also be seen as ways to reduce the risk of future regulation, which may be more challenging or costly for the organization to comply with.

The risks for people, communities and society, but also for organizations operating within the data and AI marketplace and supply chain, can be reduced through greater inclusion earlier in the design process. But organizational self-interest will not motivate the scope or depth of involvement that is required. Reducing the reality and perception of “participation-washing” means that requirements for consultation in the design of data and AI systems need to be robust and enforceable. Giving genuine power to those voices helps to ensure that risks are taken seriously, and that organizational efforts in this space both are, and are seen to be, legitimate.

Civil Society Empowerment Speeds Up Innovation

The second argument the author will examine is that an enhanced and empowered role for civil society can speed up innovation by deferring and focusing regulation.

Regulation of emerging technologies often falls on the horns of the Collingridge dilemma: “Attempting to control a technology is difficult…because during its early stages, when it can be controlled, not enough can be known about its harmful social consequences to warrant controlling its development; but by the time these consequences are apparent, control has become costly and slow” (Collingridge, quoted in Genus and Stirling 2018, 63).

This dilemma is particularly apparent for data and AI as they are general-purpose technologies, applied in a vast range of different sectors and contexts. While we might be able to point to some cross-cutting consequences of their adoption, such as on equality or the environment, many impacts are specific to particular types of data such as biometric data; technologies such as generative AI; or the specifics of a given application such as predictive policing. These technologies are also evolving rapidly, and continuing innovation requires regulatory responses that keep a similar pace.

Many policy makers are reluctant to adopt the precautionary principle15 around the development of data technologies — not allowing them to be made available until they are proven safe — because they fear this will hold back innovation and leave much of the value of data unrealized. Equally, most are now wary of the consequences of entirely permissionless innovation (West 2020) — allowing anything to be built unless and until it is proven harmful — and are cognizant of the need to respond early to emerging harms.

Collingridge’s prescription in this case is a middle ground of continuous monitoring and adaptation: an iterative approach to regulation that responds to emerging understanding as well as changing technological capabilities and societal norms.

In the areas where policy makers are shying away from regulating too early, avoiding a de facto permissionless innovation environment requires an approach to data governance that makes good use of civil society. Civil society is uniquely positioned to detect problems with technology early. When workers find themselves unfairly treated, they turn to labour unions; when benefit claimants struggle with new digital public services, they turn to organizations such as Citizens Advice;16 when consumers are worried about how their data is being used, consumer rights organizations step in. Civil society organizations are the first to hear about the frontline impacts of technology on people, and to start to gather evidence about emerging patterns.

This is one reason why it is so important to include diverse civil society organizations, including those directly in touch with people affected by data and AI, in the high-level multi-stakeholder governance of AI. Civil society is much more directly exposed to the here-and-now effects of data than governments or companies and can bring this experience to the discussion.

But civil society organizations are not just useful for monitoring and understanding the impacts of technology. Civil society action can also happen at speed and in a way that prevents overregulation. Organizations may simply self-regulate their use of data and AI in response to being named and shamed, but private and collective legal action is also essential. The degree to which existing regulation covers emerging impacts can be tested in court through strategic litigation. An empowered civil society can thus provide clarity around existing law and identify gaps that require changes to regulation.

This is not to diminish the role of regulators, professional bodies, industry consortia and the legislature. These organizations should be empowered and equipped to respond more quickly and keep up with the pace of technological change. However, there are natural limits on the ability of these institutions both to be exposed to the impacts of technology on people and communities, and to respond in timely ways. Equipping civil society to act as the canary in the coal mine, alerting the wider system to the need for and shape of further regulation, would benefit everyone.

Public Participation Drives Sustainable Adoption

The final argument the author will look at is direct public participation as a mechanism for driving digital, data and AI literacy, and for smoothing the path to adoption of technologies with public benefit.

We all have a vested interest in realizing the value of data and reaping the potential societal benefits of AI. While some claims of these benefits may be overblown, it is certainly the case that data could be used for beneficial purposes, such as medical breakthroughs, improving energy efficiency, personalizing learning experiences and so on, as well as bringing economic benefits, such as increasing productivity, spurring innovation and driving economic growth.

Broad and equitable use of these technologies, including an active market, is essential for realizing these positive impacts, and that requires people to adopt them. People cannot get the benefit of technologies — either in their day-to-day lives or at work — if they do not know how to use them appropriately, or if they are put off by concerns about the dangers they might pose. Hence, many governmental data and AI strategies include a focus on improving public understanding, literacy and skills, and on building trust.

Getting those affected by data and AI involved in its governance — at all levels — could be an important mechanism for addressing adoption challenges. Being actively involved in shaping the purpose and implementation of technology from the start helps to ensure that it meets the needs of those who will eventually use it, which helps to smooth the way to its adoption. The process of co-design creates a shared understanding of what technology is for, reducing unwarranted distrust about potential ulterior motives.

Being involved in deliberations about data and AI can build literacy and enthusiasm. Like a training course, deliberations provide a supportive peer environment for learning and exploration of a topic. But when charged with making decisions about data and AI, participants take an active role: asking experts questions and digging into areas of uncertainty to help them feel confident about their decision making.

When Connected by Data ran the People’s Panel on AI,17 for example, members of the panel were rapidly exposed to and engaged with various aspects of the impact of AI. They gained confidence, and many became both enthusiastic about its potential and realistic about its downsides. In addition, they took their learning back into their communities, discussing their experience with their family, friends and colleagues.

There are multiple routes to building literacy, trust and adoption. Governments have tended to focus on those in which the public is a passive recipient: public information campaigns, training programs, audit schemes and kitemarks. A more active role in data and AI governance would be a complementary mechanism, developing public understanding and adoption through genuine decision-making power.

Conclusion

The first step in achieving inclusive governance of data and AI is to make the case that it is necessary. While many see the involvement of those affected by technology in its design as a matter of justice, it is also helpful to be equipped with arguments that highlight the advantages of particular modes of engagement with data governance, particularly for stakeholders who may be otherwise unconvinced or unconcerned: the reduction of risk, support for adaptive regulation, and building public understanding and sustainable adoption.

Once these motivating cases are made, and won, the next step is to move on to the harder questions of how. As the examples above have illustrated, there are multiple forms of data governance decision to be made, at multiple governance levels. We need to identify methods of involving multiple diverse stakeholders in these decisions that are practical and cost-effective, and that give real power to everyone involved.

Works Cited

Ada Lovelace Institute. 2021. Participatory data stewardship. September. London, UK: Nuffield Foundation. www.adalovelaceinstitute.org/report/participatory-data-stewardship/.

AI Now Institute, Ada Lovelace Institute, Algorithmic Justice League, Alondra Nelson, Camille Francois, Center for Democracy & Technology, Centre for Long-Term Resilience et al. 2023. “AI Now Joins Civil Society Groups in Statement Calling For Regulation To Protect the Public.” AI Now Institute, November 1. https://ainowinstitute.org/general/ai-now-joins-civil-society-groups-in-statement-calling-for-regulation-to-protect-the-public.

Barrett, Bridget and Daniel Kreiss. 2019. “Platform transience: changes in Facebook’s policies, procedures, and affordances in global electoral politics.” Internet Policy Review 8 (4): 1–22. https://doi.org/10.14763/2019.4.1446.

Bender, Emily M. 2023. “Talking about a ‘schism’ is ahistorical.” Medium (blog), July 5. https://medium.com/@emilymenonbender/talking-about-a-schism-is-ahistorical-3c454a77220f.

Birhane, Abeba, William Isaac, Vinodkumar Prabhakaran, Mark Díaz, Madeleine Clare Elish, Iason Gabriel and Shakir Mohamed. 2022. “Power to the People? Opportunities and Challenges for Participatory AI.” In EAAMO ’22: Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, Article No. 6, 1–8. New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3551624.3555290.

Chatterjee, Mohar. 2023. “White House notches AI agreement with top tech firms.” Politico, July 21. www.politico.com/news/2023/07/21/biden-notches-voluntary-deal-with-7-ai-developers-00107509.

Genus, Audley and Andy Stirling. 2018. “Collingridge and the dilemma of control: Towards responsible and accountable innovation.” Research Policy 47 (1): 61–9. https://doi.org/10.1016/j.respol.2017.09.012.

Heikkilä, Melissa. 2022. “Dutch scandal serves as a warning for Europe over risks of using algorithms.” Politico, March 29. www.politico.eu/article/dutch-scandal-serves-as-a-warning-for-europe-over-risks-of-using-algorithms/.

Himmelreich, Johannes. 2022. “Against ‘Democratizing AI.’” AI & Society 38 (1): 1333–46. https://doi.org/10.1007/s00146-021-01357-z.

Hofmann, Jeanette. 2016. “Multi-stakeholderism in Internet governance: putting a fiction into practice.” Journal of Cyber Policy 1 (1): 29–49. https://doi.org/10.1080/23738871.2016.1158303.

Mantelero, Alessandro and Maria Samantha Esposito. 2021. “An evidence-based methodology for human rights impact assessment (HRIA) in the development of AI data-intensive systems.” Computer Law & Security Review 41 (1): 105561. https://doi.org/10.1016/j.clsr.2021.105561.

OpenAI. 2023. “Democratic inputs to AI.” OpenAI, May 25. https://openai.com/index/democratic-inputs-to-ai/.
———. 2024. “Democratic inputs to AI grant program: lessons learned and implementation plans.” OpenAI, January 16. https://openai.com/index/democratic-inputs-to-ai-grant-program-update/.

Pascoe, Robin. 2021. “Dutch government collapses in fall out from child benefit scandal.” DutchNews, January 15. www.dutchnews.nl/2021/01/dutch-government-collapses-in-fall-out-from-child-benefit-scandal/.

Roth, Emma. 2024. “Google explains Gemini’s ‘embarrassing’ AI pictures of diverse Nazis.” The Verge, February 23. www.theverge.com/2024/2/23/24081309/google-gemini-embarrassing-ai-pictures-diverse-nazi.

Sadowski, Jathan, Salomé Viljoen and Meredith Whittaker. 2021. “Everyone should decide how their digital data are used — not just tech companies.” Nature 595 (7866): 169–71. https://doi.org/10.1038/d41586-021-01812-3.

Seger, Elizabeth. 2023. “What Do We Mean When We Talk About ‘AI Democratisation’?” Centre for the Governance of AI (blog), February 7. www.governance.ai/post/what-do-we-mean-when-we-talk-about-ai-democratisation.

Shubladze, Sandro. 2023. “Unlocking The Data Revolution: AI’s Future Shaped by Public Data.” Forbes, July 5. www.forbes.com/sites/forbestechcouncil/2023/07/05/unlocking-the-data-revolution-ais-future-shaped-by-public-data/.

Sloane, Mona. 2024. “Controversies, contradiction, and ‘participation’ in AI.” Big Data & Society 11 (1): 1–5. https://doi.org/10.1177/20539517241235862.

The Collective Intelligence Project. 2023. “CIP and Anthropic launch Collective Constitutional AI.” The Collective Intelligence Project (blog), October 17. https://cip.org/blog/ccai.

Verhulst, Stefaan G., Laura Sandor and Julia Stamm. 2023. “The Urgent Need to Reimagine Data Consent.” Stanford Social Innovation Review, July 26. https://ssir.org/articles/entry/the_urgent_need_to_reimagine_data_consent.

Vincent, James. 2016. “Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day.” The Verge, March 24. www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist.

West, Darrell M. 2020. “The end of permissionless innovation.” Brookings, October 7. www.brookings.edu/articles/the-end-of-permissionless-innovation/.

Zorthian, Julia. 2023. “OpenAI CEO Sam Altman Asks Congress to Regulate AI.” Time, May 16. https://time.com/6280372/sam-altman-chatgpt-regulate-ai/.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.