“Luke is the epitome of someone you want to know. He is a leader. A mentor. And just a good person. While Luke is an expert data scientist, he also manages to be well versed in marketing and research and is creative to boot. Luke was very generous with his time helping me learn some programming and not to mention, very patient. On the other hand, he can go to a networking event and you would think he ran the show-- he is dynamic and his optimism is contagious. I can only hope to cross paths with Luke again soon!”
Publications
-
PREDICTING INDIVIDUAL WELL-BEING THROUGH THE LANGUAGE OF SOCIAL MEDIA
Biocomputing 2016
We present the task of predicting individual well-being, as measured by a life satisfaction scale, through the language people use on social media. Well-being, which encompasses much more than emotion and mood, is linked with good mental and physical health. The ability to quickly and accurately assess it can supplement multi-million dollar national surveys as well as promote whole body health. Through crowd-sourced ratings of tweets and Facebook status updates, we create message-level predictive models for multiple components of well-being. However, well-being is ultimately attributed to people, so we perform an additional evaluation at the user-level, finding that a multi-level cascaded model, using both message-level predictions and user-level features, performs best and outperforms popular lexicon-based happiness models. Finally, we suggest that analyses of language go beyond prediction by identifying the language that characterizes well-being.
-
Psychological Language on Twitter Predicts County-Level Heart Disease Mortality
Psychological Science
Hostility and chronic stress are known risk factors for heart disease, but they are costly to assess on a large scale. We used language expressed on Twitter to characterize community-level psychological correlates of age-adjusted mortality from atherosclerotic heart disease (AHD). Language patterns reflecting negative social relationships, disengagement, and negative emotions—especially anger—emerged as risk factors; positive emotions and psychological engagement emerged as protective factors. Most correlations remained significant after controlling for income and education. A cross-sectional regression model based only on Twitter language predicted AHD mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking, diabetes, hypertension, and obesity. Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level.
-
The Online Social Self: An Open Vocabulary Approach to Personality
SAGE Journals
Objective: We present a new open language analysis approach that identifies and visually summarizes the dominant naturally occurring words and phrases that most distinguished each Big Five personality trait. Method: Using millions of posts from 69,792 Facebook users, we examined the correlation of personality traits with online word usage. Our analysis method consists of feature extraction, correlational analysis, and visualization. Results: The distinguishing words and phrases were face valid and provide insight into processes that underlie the Big Five traits. Conclusion: Open-ended data driven exploration of large datasets combined with established psychological theory and measures offers new tools to further understand the human psyche.
-
Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach
PLOS ONE
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase ‘sick of’ and the word ‘depressed’), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive ‘my’ when mentioning their ‘wife’ or ‘girlfriend’ more often than females use ‘my’ with ‘husband’ or ‘boyfriend’). To date, this represents the largest study, by an order of magnitude, of language and personality.
-
Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach
ACL Anthology
Social scientists are increasingly using the vast amount of text available on social media to measure variation in happiness and other psychological states. Such studies count words deemed to be indicators of happiness and track how the word frequencies change across locations or time. This word count approach is simple and scalable, yet often picks up false signals, as words can appear in different contexts and take on different meanings.
We characterize the types of errors that occur using the word count approach, and find lexical ambiguity to be the most prevalent. We then show that one can reduce error with a simple refinement to such lexica by automatically eliminating highly ambiguous words.
The resulting refined lexica improve precision as measured by human judgments of word occurrences in Facebook posts.
-
Toward Personality Insights from Language Exploration in Social Media
2013 AAAI Spring Symposium
Language in social media reveals a lot about people's personality and mood as they discuss the activities and relationships that constitute their everyday lives. Although social media are widely studied, researchers in computational linguistics have mostly focused on prediction tasks such as sentiment analysis and authorship attribution. In this paper, we show how social media can also be used to gain psychological insights. We demonstrate an exploration of language use as a function of age, gender, and personality from a data set of Facebook posts from 75,000 people who have also taken personality tests, and we suggest how more sophisticated tools could be brought to bear on such data.
-
Characterizing Geographic Variation in Well-Being using Tweets
AAAI Conference on Weblogs and Social Media (ICWSM)
The language used in tweets from 1,300 different US counties was found to be predictive of the subjective well-being of people living in those counties as measured by representative surveys. Topics, sets of co-occurring words derived from the tweets using LDA, improved accuracy in predicting life satisfaction over and above standard demographic and socio-economic controls (age, gender, ethnicity, income, and education). The LDA topics provide a greater behavioural and conceptual resolution into life satisfaction than the broad socio-economic and demographic variables. For example, tied in with the psychological literature, words relating to outdoor activities, spiritual meaning, exercise, and good jobs correlate with increased life satisfaction, while words signifying disengagement like ‘bored’ and ‘tired’ show a negative association.
Recommendations received
3 people have recommended 🌉 Luke