Symposium 6th World Congress on Positive Psychology 2019

Debiasing Big Data Predictions And Measurements  (#97)

H. Andrew Schwartz 1 , Peggy Kern 2 , Sal Giorgi 3 , Dirk Hovy 4 , Lyle Ungar 3
  1. Stony Brook University (SUNY), Stony Brook, NY, United States
  2. University of Melbourne, Melbourne, Australia
  3. University of Pennsylvania, Philadelphia, PA, USA
  4. Bocconi University, Milan, Italy
Big data from social media and other online platforms are increasingly being used to measure (i.e., predict) psychologically relevant constructs. For example, Twitter has been used to measure community health, and Facebook has been used to derive personality traits and well-being scores. However, despite interest in applying such predictions, recent work has noted that big data techniques can pick up on biases in favor of or against certain classes or attributes of people. Here, I will discuss the ways in which predictive models become biased. These include biases familiar to social science, such as selection bias (the population sampled on social media has a different socio-demographic distribution than the communities targeted for measurement), as well as biases emerging from machine learning techniques, such as label bias (predictive models are fit to data with bias "built in" through annotations or historical norms) and over-amplification (the labels themselves are not biased, but statistical techniques over-discriminate on user attributes). I will systematically discuss each of these biases, provide illustrative examples and a mathematical formula for assessing predictive bias, and show that big data predictions can be made more equitable.
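
The abstract does not reproduce the formula presented in the talk. As a minimal illustrative sketch only, one common way to quantify predictive bias is to compare prediction error across socio-demographic groups; the Python function below, with its name and toy well-being data chosen purely for illustration, follows that idea and is not the authors' measure.

    import numpy as np

    def predictive_bias(y_true, y_pred, in_group):
        """Hypothetical illustration: difference in mean squared prediction
        error between one socio-demographic group and everyone else.
        A value near zero suggests the model errs similarly for both groups."""
        y_true, y_pred, in_group = map(np.asarray, (y_true, y_pred, in_group))
        sq_err = (y_pred - y_true) ** 2
        return sq_err[in_group].mean() - sq_err[~in_group].mean()

    # Toy example: well-being scores predicted with larger errors for one group.
    y_true   = np.array([3.0, 4.0, 5.0, 3.5, 4.5, 5.5])
    y_pred   = np.array([3.1, 4.2, 4.9, 2.5, 5.5, 4.4])
    in_group = np.array([False, False, False, True, True, True])
    print(predictive_bias(y_true, y_pred, in_group))  # positive: higher error for the group

Under this sketch, debiasing would aim to drive such group differences in error toward zero without degrading overall predictive accuracy.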