Bias in Big Data


Building on the success of the Bias in Big Data workshop, ISGMH’s CONNECT Research Program is leading a new monthly newsletter to connect ethically minded scientists, researchers, students, and community members with insights, resources, and each other. This #BiasinBigData newsletter will promote the ethical and accurate use of data in understanding the health of populations by centering marginalized voices, amplifying leaders in ethical data science, breaking down silos, and highlighting action steps and resources that can be used today. Sign up here.


Bias in Big Data was a workshop organized by the CONNECT Research Program. The workshop sought to stimulate intersectional discussion about the role of bias in big data and to explore, in particular, how bias in data and data science impacts the health of sexual and gender minority populations. The workshop was hosted in Chicago and live-streamed at no charge to ensure broad and inclusive participation.

Video recordings, presentation slides, and digital programs from the 2019 Bias in Big Data workshop can be found here.

In the name of efficiency, our society increasingly relies on data to guide all forms of decision making. This cost-effective, data-led decision making, particularly when guided by unsupervised analytical methods, is often assumed to be free of human bias. However, there is growing concern about the potential misuse of these methods to further oppress already marginalized populations. From hiring decisions, to predictive policing, to auto insurance premiums, poor Black and brown populations have been shown to be disproportionately impacted across a wide variety of domains. Less is known, however, about the impact of these systems on sexual and gender minority (SGM) populations.

Human biases emerge in these systems through various mechanisms. First, data can only mirror the existing social world; analytical techniques that use existing data to predict the future will therefore inevitably replicate, and often amplify, existing biases. Furthermore, decisions about what data are collected and what questions are important enough to be asked are also shaped by societal biases. Finally, those learning, developing, and deploying data science techniques are rarely connected to the communities most harmed by these practices. The result is data systems that not only replicate but amplify existing biases and disproportionately harm the least powerful populations.

The Bias in Big Data workshop aimed to bring together a diverse group of scientists, students, and community leaders at the intersection of technology, data science, and health equity to discuss bias in big data, how bias impacts all marginalized populations, and how bias may specifically impact sexual and gender minority communities.

The workshop was organized by Dr. Michelle Birkett, Assistant Professor of Medical Social Sciences and Director of the CONNECT Research Program on Complex Systems and Health Disparities at Northwestern University. CONNECT’s research focuses on understanding how multi-level mechanisms drive health disparities in stigmatized populations, with an emphasis on approaches using big data and network science, and on applying complex, cutting-edge methods to advance health equity research.