Over the last two years, a lot of attention has been paid to what happens when AI and machine learning systems consume biased data and produce models that perpetuate those biases. Recent scandals around data consumption and privacy hinge on bucketing people in ways that intentionally carry bias and stereotypes into model output. Whether deliberate or inadvertent, biases in data have real, compounding effects when processed by AI, and mitigating those effects is our responsibility as professionals. For AI to flourish as a technology that betters the world, we must develop best practices for addressing bias in source data.

This session shares strategies for working with biased source data. All data sets are fundamentally biased by societal, industrial, organizational, and even individual beliefs (even if the data is “objective”, like sensor data!). Assessing source data to uncover and expose those biases is a critical first step, but any data-driven approach (AI, data science, machine learning) still needs data to drive it. We can’t escape biased data, and waiting for perfect data to appear is not an option. What we can do is assess the data we have access to and implement strategies to mitigate the existing bias so that it does not carry into our models (and thus into our output).
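As a rough illustration of the assess-then-mitigate idea, and not something taken from the session materials, the sketch below first measures how each group is represented in a data set and then computes inverse-frequency sample weights so that every group contributes equally in aggregate. The function names `group_shares` and `balancing_weights` and the toy `groups` list are hypothetical, and reweighting is only one of many possible mitigation strategies.

```python
from collections import Counter


def group_shares(groups):
    """Assess: return the fraction of records belonging to each group."""
    counts = Counter(groups)
    total = len(groups)
    return {g: n / total for g, n in counts.items()}


def balancing_weights(groups):
    """Mitigate: return one weight per record so each group's weights sum equally.

    Each record gets total / (number_of_groups * size_of_its_group),
    so the weights still sum to the number of records.
    """
    counts = Counter(groups)
    total = len(groups)
    k = len(counts)
    return [total / (k * counts[g]) for g in groups]


# Hypothetical toy data: group "a" is over-represented three to one.
groups = ["a", "a", "a", "b"]
shares = group_shares(groups)
weights = balancing_weights(groups)
```

Weights of this kind can typically be passed to a model's training routine as per-sample weights; whether that is appropriate depends on the model and on why the imbalance exists in the first place.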

Speaker: Danielle Leighton

Suggested Experience: This session is aimed at attendees with an interest in AI/machine learning, but not necessarily any experience with it. A technical background is necessary (to follow machine logic and to understand how technical decisions are made when developing software in general), but the examples and code snippets do not assume a background in AI.

Technologies Used: This session deals primarily with statistical and technical data manipulation techniques for AI/machine learning/data science. It will cover some of the more common AI models and data prep techniques. Code examples, where necessary, will be in Python, Julia, and R.

Keywords: AI, machine learning, data, bias

Assess & Address: Dealing with Biased Data