Best practices for data enrichment
Responsibility & Safety

Antonia Paterson, Will Hawkins
Building a responsible approach to data collection with the Partnership on AI (PAI)

At DeepMind, our goal is to make sure everything we do meets the highest standards of safety and ethics, in line with our Operating Principles. One of the most important places this starts is with how we collect our data. In the past 12 months, we’ve collaborated with Partnership on AI (PAI) to carefully consider these challenges, and have co-developed standardised best practices and processes for responsible human data collection.

Over three years ago, we created our Human Behavioural Research Ethics Committee (HuBREC), a governance group modelled on academic institutional review boards (IRBs), such as those found in hospitals and universities, with the aim of protecting the dignity, rights, and welfare of the human participants involved in our studies.

Alongside projects involving behavioural research, the AI community has increasingly engaged in efforts involving ‘data enrichment’ – tasks carried out by humans to train and validate machine learning models, like data labelling and model evaluation. While behavioural research often relies on voluntary participants who are the subjects of study, data enrichment involves people being paid to complete tasks which improve AI models.

These types of tasks are usually conducted on crowd-sourcing platforms, and often raise ethical considerations regarding workers’ welfare and equity, yet can lack the guidance or governance systems needed to ensure sufficient standards are met. As research labs accelerate the development of increasingly sophisticated models, reliance on data enrichment practices will likely grow alongside this, requiring better guidance.
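To make ‘data enrichment’ concrete, here is a minimal sketch, in Python, of the kind of task record a labelling project might send to a crowd-sourcing platform. The field names and the living-wage check are illustrative assumptions, not a description of DeepMind’s actual tooling.

# Hypothetical sketch of a data-labelling task record; field names and the
# living-wage check are illustrative assumptions, not DeepMind tooling.
from dataclasses import dataclass

@dataclass
class LabellingTask:
    item_id: str             # identifier of the example to be labelled
    text: str                # content shown to the worker
    instructions: str        # verified task instructions for the worker
    pay_per_task_usd: float  # payment offered for completing the task
    est_minutes: float       # estimated time to complete the task

def meets_wage_floor(task: LabellingTask, living_wage_usd_per_hour: float) -> bool:
    """Check that the implied hourly rate is at or above the local living wage."""
    implied_hourly = task.pay_per_task_usd * (60.0 / task.est_minutes)
    return implied_hourly >= living_wage_usd_per_hour

task = LabellingTask(
    item_id="ex-001",
    text="The model's answer was polite but factually wrong.",
    instructions="Label the response as 'helpful', 'harmless', or 'neither'.",
    pay_per_task_usd=0.50,
    est_minutes=1.5,
)
print(meets_wage_floor(task, living_wage_usd_per_hour=15.0))  # True: implies $20/hour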
As part of this collaboration, we’ve developed our Practices and Processes for Data Enrichment to improve working conditions for people involved in data enrichment tasks (for more details, please visit PAI’s Data Enrichment Sourcing Guidelines). We’re proud of the policies and resources we’ve created, including five steps that AI practitioners can follow to improve study design and execution. These documents provide clarity around how best to set up data enrichment tasks at DeepMind, and improve the experience of the people who take part in them. Further information on responsible data enrichment practices, and how we’ve embedded them into our existing processes, can be found in PAI’s recent case study, Implementing Responsible Data Enrichment Practices at an AI Developer.
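The five steps themselves are set out in PAI’s guidelines. As a loose illustration of how a team might operationalise that kind of guidance, here is a short Python sketch encoding a pre-launch checklist; the step descriptions are paraphrased assumptions, not PAI’s authoritative wording.

# Hypothetical pre-launch checklist for a data enrichment study. The step
# descriptions paraphrase the kinds of checks PAI's guidelines describe;
# see the PAI Data Enrichment Sourcing Guidelines for the authoritative list.
PRE_LAUNCH_CHECKLIST = [
    "payment model chosen; rate at or above the local living wage",
    "pilot completed and task design revised from pilot feedback",
    "appropriate worker pool identified for the task",
    "task instructions and training materials verified",
    "clear, regular communication channel with workers in place",
]

def outstanding_steps(completed: set[str]) -> list[str]:
    """Return the checklist items still outstanding; an empty list means ready."""
    return [step for step in PRE_LAUNCH_CHECKLIST if step not in completed]

for step in outstanding_steps({
    "payment model chosen; rate at or above the local living wage",
    "pilot completed and task design revised from pilot feedback",
}):
    print("TODO:", step)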
PAI also provides helpful resources and supporting materials for AI practitioners and organisations seeking to develop similar processes.

Each project at DeepMind is different, which is why we have a dedicated human data review process that allows us to continuously engage with research teams to identify and mitigate risks on a case-by-case basis; a simplified illustration of this kind of triage follows at the end of this post.

This work aims to serve as a resource for other organisations interested in improving their data enrichment sourcing practices, and we hope it sparks broader discussion about how the AI community can continue to develop norms of responsible data collection and collectively build better industry standards. Through this collaboration, we also hope to refine these guidelines and resources for the teams and partners who use them.
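To close with a concrete, if simplified, picture of what case-by-case review might involve, the sketch below routes a hypothetical project to a review tier based on a few risk signals. The risk factors, tiers, and routing rules are invented for illustration and do not describe DeepMind’s actual review criteria.

# Hypothetical risk triage for a human data project. Risk factors, tiers,
# and routing rules are invented for illustration only.
def triage(project: dict) -> str:
    """Route a project to a review tier based on simple risk signals."""
    if project.get("collects_sensitive_data") or project.get("involves_vulnerable_groups"):
        return "full ethics review"          # e.g. escalate to a HuBREC-style board
    if project.get("novel_task_design"):
        return "standard review with pilot"  # run a pilot before full launch
    return "lightweight review"              # routine task using vetted templates

print(triage({"collects_sensitive_data": True}))  # full ethics review
print(triage({"novel_task_design": True}))        # standard review with pilot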