Ethical Data Use in the Age of AI
As we enter the age of AI, we can’t afford to repeat the mistakes of the past
The emergence of AI has been accompanied by major ethical debates, such as whether AGI will lead to humanity’s destruction. While we can debate hypothetical existential risks endlessly, one thing is certain: we must ensure that novel AI systems prioritize data privacy in order to minimize future ethical risks.
Right now, as we arrive in this new era of data-driven AI, we need to identify and mitigate the ethical risks in how AI models are designed and deployed. People’s data is valuable, and it must be protected in our AI-powered future.
How Privacy Extends Into Ethics
Any conversation on ethical data use needs to start with privacy – but it can’t end there.
Preserving privacy is critical to achieving ethical data use. It protects against both intentional and unintentional violations involving people’s data.
It’s easy to think of intentional violations: a company takes data you thought would be kept private and sells it to advertisers who want to target you, or a company like Uber takes private rider data and displays users’ current locations.
But unintentional violations can be just as damaging. Consider the many times personal data has been publicly exposed after a company was hacked and lost control of its information, as when the private information of all Yahoo users was compromised in a breach.
Whether intentional or unintentional, the damage remains: people lose agency over information they thought would be kept private. Now the expansion of AI presents a whole new set of privacy, and therefore ethical, concerns.
Examples of Ethical Risks
Our new age of data-driven AI is an extension of the era of big data, and it shares many of the same ethical risks around how users’ data can be misused. These include:
Intellectual Property Violations: Generative AI models are testing the limits of intellectual property law, as most of them have been trained on significant amounts of data with some level of copyright protection. Several lawsuits have already been filed against the groups behind generative AI image generators, such as Stable Diffusion and Midjourney, for training their models on copyrighted images without the artists’ consent or permission.
Disclosing Sensitive Information: Generative AI data leaks have already occurred in which private user information was disclosed. Samsung banned ChatGPT after source code related to the company’s semiconductor business was leaked. ChatGPT also suffered an outage after a bug allowed some users’ payment information to be viewed by others.
Biased Models: Just like the big data models that preceded them, AI models are prone to bias based on the data they have been trained on and exposed to. Generative AI models have already produced outputs reflecting stereotypes related to race and gender. As discussed in a panel on the privacy concerns of generative AI involving leadership from the Ethical Tech Project, model outputs need to be managed responsibly. Biased data creates biased models, and without the proper technical infrastructure, bias can persist in an AI model even after the offending data has been removed, as the sketch below illustrates.
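To make that last point concrete, here is a minimal, hypothetical sketch (synthetic data, illustrative column names, and scikit-learn as an assumed tooling choice) of why deleting biased records from storage does not, on its own, remove bias from a model that was already trained on them:

```python
# Hypothetical illustration: a model trained on biased data keeps that bias
# even after the offending records are deleted from the data store.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data: "group" is a protected attribute that leaked into
# the historical labels, so the dataset itself is biased.
n = 5000
group = rng.integers(0, 2, n)                # e.g. two demographic groups
skill = rng.normal(0.0, 1.0, n)              # the feature that should drive outcomes
labels = (skill + 1.5 * group + rng.normal(0.0, 0.5, n) > 0).astype(int)
X = np.column_stack([skill, group])

model = LogisticRegression().fit(X, labels)

# Later, the biased records are purged from storage, but nothing about the
# already-trained model changes: its learned weights still encode the group effect.
applicants = np.array([[0.0, 0.0],           # identical skill, group 0
                       [0.0, 1.0]])          # identical skill, group 1
print(model.predict_proba(applicants)[:, 1]) # predicted probabilities differ by group
```

Undoing the bias would require retraining on corrected data or a dedicated unlearning process – exactly the kind of technical infrastructure the point above refers to.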
Key Considerations for Putting AI Ethics into Practice
During the rise of big data, the shared goal throughout Silicon Valley was to collect as much detailed data from as many people as possible, even without an immediate, clear use for it. Tech companies deserve their own episode of the show “Hoarders” to expose how greedily they stockpile user data.
Right now, there’s a race among AI companies to gather data to train generative AI models. We need to avoid the past failures of Silicon Valley and ensure this next generation of tools respects people’s privacy and handles data safely. The tech industry’s ethical standards should evolve alongside our technological capabilities.
We just announced our “Commitment to the Ethical Use of Data” so that companies can reference a clear set of ethical guidelines for how they use data. Businesses need to lead the charge in using data ethically, including companies building novel data-fueled AI applications. This will allow companies to avoid unethical practices and respect the privacy, agency, and dignity of individuals and their data.
How to Achieve the Ethical Use of Data
There’s no final destination for the ethical use of data, but that doesn’t mean there aren’t clear steps companies can take to promote widespread data stewardship, including implementing ethical data principles organization-wide.
If you’re interested in learning more about how to use technology ethically, in the coming weeks we’ll continue laying out the core principles of our commitment – and how organizations can apply them to their work.
Here’s What We’re Reading On Ethical Tech This Week: