The World Well-Being Project has developed a range of methods for working with the linguistic information available through social media, yielding numerous intriguing insights into psychological phenomena studied at scale. Such studies have tested very large numbers of people cross-sectionally or across short time periods. Could similar techniques be used to study people over long time periods?
The National Child Development Study is a UK-based study that has followed a cohort of over 17,000 individuals prospectively across their lives, with 9 measurement occasions extending from birth through age 55. At age 11, children wrote essays describing their imagined life at age 25. Combining this large, nationally representative sample, the essays written in childhood, and machine learning techniques, we examined the value of linguistic information for understanding lifespan development at scale. We transcribed about 10,500 essays, extracted numerous linguistic features, and examined how well they predicted 4 outcomes (subjective and objective physical and mental health; physical activity; social mobility; and cognitive functioning), measured across 5 decades.
The results illustrate both the potential and the challenges of combining interdisciplinary research, multiple types of data, and big data approaches.