The promise and peril of the rapid expansion of genomic datasets

Credit: Gio.tto on Shutterstock.

In the twenty years since the Human Genome Project first sequenced a complete genetic profile, the quantity and quality of genomic data and related technologies have improved astronomically. This data is a gold mine: it holds enormous promise both for improving personal health outcomes and for advancing population-wide public health innovation and research.

But collecting genomic data is also a double-edged sword. The roughly four million genetic mutations within each of us contain intimate, personal information with the power to save lives or to be used against us. So the question becomes: Who can be trusted with the world’s genomic data? And what rules should govern its collection, access, and use? 

Thanks for reading The Data Values Digest! Subscribe for free to receive new posts and support our work.

In this post, I lay out the landscape of genomic data as it stands, including the global inequities inherent to these systems, and propose a framework to spark conversations around ethical guidelines for genomic data access, management, and use. 

Unequal access to and ownership of genetic data

Currently, a small number of well-funded research institutes concentrated in rich countries house and control most of the genomic data in the world. For example, the highly successful COVID-19 Host Genetics Initiative (HGI) is made up of researchers from 35 countries who recently uploaded their genomes to either the European Genome-phenome Archive (EGA) or NHGRI AnVIL, which then aggregated the data, analyzed it, and made it available for research via data access committees. The resulting published paper listed many contributors, but the data was severed from the patients who provided the samples in order to de-identify it, and it was expatriated from its country of origin. HGI’s subsequent Long COVID initiative has had difficulty establishing “re-consent” to use this data, resulting in fewer participating countries. As of December 15th, of the one million genome controls in the HGI Long COVID study, none is from Africa, a statistic that is emblematic of widespread inequities in this field.

As the pandemic drew attention to the utility of genomic data, the World Health Organization stressed the importance of closing the gap between rich and poor countries in the availability and use of genomic technologies to achieve equitable global health outcomes: “Human genomics research and related biotechnologies has the potential… to reduce global health inequalities by providing developing countries with efficient, cost-effective, and robust means of preventing, diagnosing, and treating major diseases that burden their populations.”

New and cheaper technology to unlock the potential of genomic data “makes the rapid adoption of genetic testing inevitable,” but not necessarily in an equitable or human-centered way. What cannot and must not be lost in this push are the people who submit their genomic data. They are the true owners of their data and have the most to gain, and to lose, from its use and misuse.

In resource-strapped environments, countries face barriers to establishing genomic data systems and ethical guidelines to govern their use.

As genomic technology has become cheaper and more widely available, countries have started to develop their own legal frameworks—along with the technical capacity to enforce them—to safeguard data access and usage. “Federated” data—or decentralized data storage and management—is widely seen as a solution for sharing genomic research while respecting each country’s data sovereignty. 
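
To make the idea of federation concrete, here is a minimal sketch in Python. It assumes a hypothetical setup in which each country runs its own node and shares only aggregate allele counts, never individual genomes. The names `CountryNode` and `pooled_allele_frequency`, and the toy data, are illustrative assumptions and do not describe any existing genomic platform.

```python
from dataclasses import dataclass, field


@dataclass
class CountryNode:
    """Hypothetical in-country host; raw genotypes stay inside this object."""
    country: str
    # Maps a participant ID to the number of copies (0, 1, or 2) of a variant allele.
    _genotypes: dict[str, int] = field(default_factory=dict, repr=False)

    def add_participant(self, participant_id: str, allele_copies: int) -> None:
        if allele_copies not in (0, 1, 2):
            raise ValueError("allele_copies must be 0, 1, or 2")
        self._genotypes[participant_id] = allele_copies

    def allele_counts(self) -> tuple[int, int]:
        """Return (variant alleles, total alleles): an aggregate, not individual data."""
        return sum(self._genotypes.values()), 2 * len(self._genotypes)


def pooled_allele_frequency(nodes: list[CountryNode]) -> float:
    """Combine per-country aggregates into one frequency without pooling raw data."""
    variant = total = 0
    for node in nodes:
        v, t = node.allele_counts()
        variant += v
        total += t
    if total == 0:
        raise ValueError("no participants in any node")
    return variant / total


# Toy example with two made-up nodes.
node_a = CountryNode("Country A")
node_a.add_participant("a1", 1)
node_a.add_participant("a2", 0)
node_b = CountryNode("Country B")
node_b.add_participant("b1", 2)
print(pooled_allele_frequency([node_a, node_b]))  # 3 variant alleles / 6 total = 0.5
```

The point of the design is that the researcher calling `pooled_allele_frequency` only ever sees counts. Aggregates from very small groups can still reveal information about individuals, so a real deployment would need safeguards well beyond this sketch.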

However, a crucial step in the process, namely building infrastructure for genomic data collection, analysis, and sharing, is thwarted by a lack of resources. Funding to build regional genomics capacity has been slow to materialize, and cost is prohibitive for low- and middle-income countries seeking to harness their populations’ genetic data.

This places researchers and agencies in resource-strapped environments in a difficult position: Should they seek grant funding and partner with a select few organizations in rich countries that will ultimately own and manage data collected from people in their own country? Or should they forgo (or delay) the benefits of genomic data collection and innovation?

Unfortunately, the monetary value of personal genetic data and the difficulty and cost of implementing genomic technologies mean that governments and other agencies face pressure to trade genomic data in exchange for grant funding. As a result, people who hand over their genetic data have little say in what happens to it or how it is used.

Advocates, researchers, and policymakers need frameworks for re-imagining the current global system of genomic data management and ethical guidelines that place people at the center of these systems.

A framework for democratizing personal genomic data

The Eleanor Genome Protocol, named in a nod to Eleanor Roosevelt’s contribution to the Universal Declaration of Human Rights, is intended to further the conversation around how we should view and manage genomic data. Building on groundbreaking work by Indigenous data governance advocates, this framework is based on the idea that a genome represents a digital “person” and that, therefore, human rights principles should extend to genomic data management.

The Eleanor Genome Protocol: Human Data with Human Rights

  1. Prerequisites for human genomic data research to ensure authentic consent. People must have their basic needs met and access to healthcare before their genomes are used in research. Consent acquired from people in political, economic, or health distress cannot be construed as legitimate consent for genomic research purposes. 

  2. Democratically elected, trusted representatives. Before starting a genomics initiative, constituents should be asked to identify whom they trust to host their genomes. These trusted entities could be patient advocacy groups, churches, or any entity selected by the population to represent their interests. The entities then receive funding to host the raw genome datasets and are subject to audits to verify that the data is used only as authorized.

  3. One genome, one vote. To democratize information systems, people should be able to “vote” by revoking access to their data (an idea espoused by UNESCO nearly 20 years ago). As soon as a genome is assembled by a lab, the person providing the sample would receive an email allowing them to specify a guardian for hosting the data and representing their interests.

  4. Systems of checks and balances of power. A key component of a system with checks and balances of power is that access to a person’s genome be revocable. Also, within each country, multiple entities should have to compete for the public’s trust to host genomes (with people able to move their genomes to a different hosting entity if needed).

  5. The right to seek digital refuge and be forgotten. Human institutions are fragile. Individuals must have the right to seek digital refuge by migrating their genome storage to a different entity or even country during times of instability. The right to be forgotten applies to human genomes. The ability to destroy one’s genomic data should be readily accessible. Genomic initiatives must do no harm. 

This protocol is just a starting point. Ongoing dialogue is necessary to ensure that people are protected when they share their genomic data and that sharing benefits rather than harms them. Harnessing the power of new technologies and enormous genomic datasets will have multiplier effects to improve healthcare in low- and middle-income countries. These conversations are critical to ensuring that such systems develop with the interests of the people who provide the data in mind.

Countries need tools to build their own genomics capacity that respect their data sovereignty without incurring data debts.

Scientists and advocates are working to solve the technical and ethical issues that are barriers to wider use of genomic data. These are people like Alphonse Mugenzi, from Stakeholder’s Rights Initiative in Development of Governance SRIDG-Rwanda, who is developing certification standards for entities to work with Rwanda’s genomic data. 

Alphonse and others are also using existing earth mapping software to map and manage genomes in-country to protect stakeholders’ rights. (Full disclosure: my company DNA Compass is a partner on these projects.) Winfred Gatua from Ubuntu Genome in Kenya and Samson Adegbe from GeneMap in Nigeria are already deploying cross-border genome mapping with embedded principles of revocable access between their two countries. They’ve created a consortium, Genome Earth, to enable the ethical exchange of African genetic information within a human rights context.

This work represents concrete steps to shift current power structures such that governments can ensure benefits and protections for their populations and also make genomic data available to researchers and other relevant partners around the world for the greater good. 

Genetic data is unique both in the level of intimate detail it reveals about us and in its capacity to help or cause harm. Ethical frameworks are urgently needed in this rapidly developing field to rebalance power in genetic data and ensure that people reap its benefits and are protected from harm.

Who can be trusted to manage the world’s genomic data? Perhaps the better question is: What can we build so that trusted entities emerge that anchor human rights to human data?


A note from Digest editors: We’re wishing you and yours a joyful holiday season. The Digest will return on January 9th. In the meantime, stay in touch by emailing
