Social Media Information Can Predict a Wide Range of Personality Traits and Attributes | National Institute of Information and Communications Technology

Highlights

A wide range of personality traits and attributes, such as extraversion and IQ, were predicted from Twitter information
Network and linguistic information from Twitter usage predicts social personality and mental health
The findings could lead to new technologies for mental health diagnostics and personalized nudges

Abstract

Principal Investigator HARUNO Masahiko and Dr. MORI Kazuma at the Center for Information and Neural Networks (CiNet), the National Institute of Information and Communications Technology (NICT, President: TOKUDA Hideyuki, Ph.D.), report the use of machine learning to analyze behavior on Twitter and predict a wide range of personality traits and attributes such as intelligence and extraversion. Specifically, the study uses component-wise gradient boosting to demonstrate that network features, such as the number of Tweets and the number of likes, are predictive of social personality traits (e.g., extraversion), while word usage on Twitter is predictive of mental health traits (e.g., anxiety). This approach may provide a new route to mental health diagnostics and personalized nudges.

The new study was published online in the Journal of Personality on Thursday, August 20, 2020.

Background

Social networking services (SNS) have quickly become universal tools for communication. Previous research has shown that information about Facebook and Twitter use can reveal basic, coarse personality traits based on the Big 5. However, which types of SNS information can be used to pinpoint specific personality traits and attributes is unknown. There is growing interest in what personality traits and attributes can be predicted by analyzing SNS information and how accurately that information reflects the user.

Achievements

The study by Dr. MORI and Principal Investigator HARUNO discovered that a wide range of personality traits and attributes can be predicted by analyzing four different types of users’ behaviors on Twitter (i.e., network features, time, word statistics, and word usage).

A statistical analysis found significant correlations between measured personality and attribute scores and predicted ones, with correlation coefficients around 0.25. This value is not sufficient for determining an individual’s personality traits precisely, but with a large enough population sample, this technology can provide informative results.

The study collected social media information from 239 participants (156 men, 83 women; average age 22.4 years old) who also took personality tests that measured 24 personality traits and attributes (52 subscales). Of the 52 subscales, the Twitter information could be reliably used to predict 23 of them. Figure 2A showcases a positive correlation (correlation coefficient = 0.44) between the measured and predicted Big 5 extraversion scores based on a 10-fold cross-validation procedure done 10 times (Bonferroni corrected p value of 0.05/52).
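The evaluation described above (10-fold cross-validation repeated 10 times, scored by the correlation between measured and predicted values against a Bonferroni-corrected threshold) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's pipeline: the single-feature least-squares model stands in for the actual predictor, the function names are hypothetical, and the p value uses a Fisher z approximation that the release does not mention.

```python
import math
import random
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor (stand-in model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def repeated_kfold_r(x, y, k=10, repeats=10, seed=0):
    """Mean correlation between measured and out-of-fold predicted scores."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # new random fold assignment on every repeat
        predicted = [0.0] * n
        for fold in range(k):
            test_idx = order[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            slope, intercept = fit_line(
                [x[i] for i in train_idx], [y[i] for i in train_idx]
            )
            for i in test_idx:
                predicted[i] = slope * x[i] + intercept
        rs.append(pearson_r(y, predicted))
    return mean(rs)


def approx_p_value(r, n):
    """Two-sided p value from the Fisher z approximation (an assumption here)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Synthetic data: 239 "participants", one feature weakly related to the score.
data_rng = random.Random(1)
feature = [data_rng.gauss(0, 1) for _ in range(239)]
score = [0.5 * f + data_rng.gauss(0, 1) for f in feature]

r = repeated_kfold_r(feature, score)
bonferroni_alpha = 0.05 / 52  # corrected threshold quoted in the text
print(round(r, 2), approx_p_value(r, len(score)) < bonferroni_alpha)
```

Scoring only out-of-fold predictions keeps the correlation from being inflated by a model that has already seen the participant it is predicting.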

Figure 2 Prediction of a wide range of personality traits and attributes.
A: measured and predicted scores of Big 5 extraversion. B: prediction from network information.
C: prediction from word statistics. D: prediction from word usage (bag of words).
Performance was evaluated by correlation coefficients between measured personality and attribute scores and predicted ones. Solid, dashed and dotted lines show p = 0.05/52, p = 0.01/52, and p = 0.001/52, respectively.

The analysis revealed that several social personality traits such as extraversion, empathy and autism could be predicted from network features (Figure 2B). Other traits and attributes such as socioeconomic status, smoking/drinking, and even depression or schizophrenia were predictable from the language usage features (Figure 2C and D). Predictions from time information correlated less well with measured scores, but did show a significant correlation with intelligence and social value orientation.

Future Prospects

We are expanding the analysis to thousands of subjects. The method described in this study could be used for mental health diagnostics and for personalized nudges that act on people’s behaviors. It will also give insight into the neural mechanisms underlying individual differences in personality traits.

Paper details

Journal: Journal of Personality

DOI: 10.1111/jopy.12578

URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/jopy.12578

Title: Differential ability of network and natural language information on social media to predict interpersonal and mental health traits

Authors: Kazuma Mori, Masahiko Haruno

Funders

This research was partly funded by the Japan Science and Technology Agency (JST) CREST program and a KAKENHI Grant-in-Aid (17H06314) to MH.

Appendix

Details of the experimental findings

In this study, 239 participants (156 men, 83 women; average age 22.4 years old, standard deviation 3.7) took a test to measure 24 personality traits (52 subscales). The participants also had to have a minimum amount of Twitter activity that was confirmed not to be from a bot or advertiser.

Figure 3 shows the 4 types of information extracted from the participants’ Twitter activity: network features, which include the number of Tweets, Replies and Retweets; time, which includes the frequency of activity and its time of day and month; word statistics, which include the number of words, number of characters used, and frequency of positive/negative words; and word usage, which converts text into numerical vectors. Each information type was used to predict personality traits.

The component-wise gradient boosting algorithm (CGB) was used for the prediction. CGB has been applied to many real-world problems.
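For readers unfamiliar with the technique, the following is a minimal sketch of component-wise boosting with squared-error loss and single-feature linear base learners, one common form of CGB. The release does not state the loss, base learners, or tuning used in the study, so the function name, step count, and learning rate below are illustrative assumptions, and the data are synthetic.

```python
import random
from statistics import mean


def componentwise_boost(X, y, n_steps=200, learning_rate=0.1):
    """Component-wise L2 boosting with one-feature linear base learners.

    X is a list of rows whose columns are assumed to be centred.
    Returns (intercept, coefficients).
    """
    p = len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    sum_sq = [sum(v * v for v in col) for col in columns]
    intercept = mean(y)
    coef = [0.0] * p
    residual = [yi - intercept for yi in y]
    for _ in range(n_steps):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j, col in enumerate(columns):
            if sum_sq[j] == 0:
                continue  # a constant column cannot explain any residual
            dot = sum(v * r for v, r in zip(col, residual))
            beta = dot / sum_sq[j]  # least-squares fit of residual on feature j
            gain = dot * beta  # squared-error reduction of that fit
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None:
            break  # no feature improves the fit any further
        step = learning_rate * best_beta  # shrunken update of the winner only
        coef[best_j] += step
        residual = [r - step * v for r, v in zip(residual, columns[best_j])]
    return intercept, coef


# Synthetic example: only features 0 and 3 carry signal.
rng = random.Random(0)
raw = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
col_means = [mean(row[j] for row in raw) for j in range(5)]
X = [[row[j] - col_means[j] for j in range(5)] for row in raw]
y = [2.0 * row[0] - 1.0 * row[3] + rng.gauss(0, 0.5) for row in X]

intercept, coef = componentwise_boost(X, y)
print([round(c, 2) for c in coef])
```

Because only one feature is updated per step, features that are never selected keep a coefficient of exactly zero, which makes it easy to see which markers contribute to each prediction when the number of steps is limited.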

Personality traits and attributes

Data for the 24 personality traits and attributes (52 subscales) was collected from an online system and divided into 8 groups:

mental health: schizophrenia (3 subscales), delusion (4 subscales), obsessive-compulsive disorder (6 subscales), psychopathy (2 subscales), Machiavellianism (1 subscale), depression (2 subscales), anxiety (2 subscales), and stress (1 subscale)

behavioral economics: socioeconomic measures (1 subscale), preferred coworker measures (1 subscale), social value orientation (3 subscales), risk aversion (1 subscale) and time discounting (1 subscale)

social: empathy (4 subscales) and autism (5 subscales)

inhibition/activation: behavioral inhibition system (1 subscale) /behavioral activation system (3 subscales)

Big 5 personality: extraversion, neuroticism, agreeableness, conscientiousness and openness (1 subscale each)

intelligence: fluid and verbal intelligence (1 subscale each)

life satisfaction: happiness and self-esteem (1 subscale each)

drink&smoke: alcohol and cigarette consumption (1 subscale each)

Predictions from network features

Figure 3A shows which of the 52 subscales could be predicted from network features. The highest correlation was shown with extraversion (Big 5) and then with empathy and autism. Figure 3B exemplifies the positive correlation between the predicted and measured scores for extraversion (correlation coefficient = 0.44).

Figure 3C summarizes which markers of the network features predicted which personality traits. Favorited (frequency of being favorited) correlated positively with verbal intelligence and negatively with extraversion, while Reply network (the total number of users to whom a subject replied) correlated positively with extraversion and empathy but negatively with autism and schizophrenia.

Figure 3 Predictions from network information.

Predictions from time

Figure 4 shows which of the 52 subscales could be predicted from time. Time showed a significant correlation with social value orientation and verbal intelligence. The number of personality traits that could be predicted from time information was small, possibly because the participants were university students who showed similar life patterns.

Predictions from word statistics

Figure 5 shows which of the 52 subscales could be predicted from word statistics. Figure 5A shows that mental health factors such as schizophrenia and anxiety had high correlations, as did the subscales for intelligence. Figure 5B shows the features that contributed to the predictions. The standard deviation of the sentence length could be used to estimate schizophrenia and delusion. Further, the proportion of positive/negative words could be used to estimate a number of traits including intelligence, anxiety, depression and autism, where positive/negative words were classified based on the Affective Norms for English Words (ANEW). The words were translated into Japanese using WordNet. Ten native Japanese speakers then evaluated the words, with agreement from at least 9 of them being required to categorize a word as positive or negative.
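Two of the word statistics named above, the standard deviation of sentence length and the proportion of positive/negative words, can be computed along the following lines, together with the 9-of-10 rater agreement rule. This is a sketch under several assumptions: the tiny English word lists are hypothetical stand-ins for the ANEW-derived Japanese lists, the regular-expression tokenizer would need to be replaced by a morphological analyzer for Japanese text, and the function names are illustrative.

```python
import re
from statistics import pstdev

# Hypothetical mini-lexicons; the study used ANEW-derived, rater-validated lists.
POSITIVE = {"happy", "great", "love"}
NEGATIVE = {"sad", "tired", "hate"}


def word_statistics(tweets):
    """Return sentence-length SD and positive/negative word proportions."""
    sentence_lengths = []
    words = []
    for tweet in tweets:
        for sentence in re.split(r"[.!?]+", tweet):
            tokens = re.findall(r"[a-z']+", sentence.lower())
            if tokens:  # skip empty fragments left by trailing punctuation
                sentence_lengths.append(len(tokens))
                words.extend(tokens)
    total = len(words)
    return {
        "sentence_length_sd": pstdev(sentence_lengths) if sentence_lengths else 0.0,
        "positive_ratio": sum(w in POSITIVE for w in words) / total if total else 0.0,
        "negative_ratio": sum(w in NEGATIVE for w in words) / total if total else 0.0,
    }


def agreed_label(votes, min_agreement=9):
    """Return the majority label only if at least `min_agreement` raters chose it."""
    label = max(set(votes), key=votes.count)
    return label if votes.count(label) >= min_agreement else None


stats = word_statistics(["I love this. So happy!", "Tired and sad today"])
print(stats)
print(agreed_label(["positive"] * 9 + ["negative"]))
print(agreed_label(["positive"] * 8 + ["negative"] * 2))
```

In this toy input, two of the nine words fall in each list, and a word with only 8 of 10 matching votes is left uncategorized.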

Predictions from word usage

For the word usage analysis, words and two-word phrases (two adjacent words) used by at least 25% of the participants were included and binarized (given a value of 0 or 1). Figure 6 shows the results of the analysis. The drink&smoke subscales could be predicted, as could mental health and intelligence subscales. Figure 6B shows words contributing to the predictions for obsessive-compulsive disorder and alcohol consumption: tense and imminent words were related to obsessive-compulsive disorder, while alcohol consumption showed an association with words such as “drunk”, “eat”, “last train” and “one hour”.
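The binarized word-usage features described above can be sketched as follows. The function names and the toy, pre-tokenized English tweets are hypothetical; the release does not describe the tokenization or any further preprocessing, so this shows only the selection rule (terms used by a minimum share of users) and the 0/1 encoding.

```python
def user_terms(tweets):
    """Collect the words and adjacent two-word phrases a user has used.

    `tweets` is a list of token lists, one per tweet, so that phrases
    never span two different tweets.
    """
    terms = set()
    for tokens in tweets:
        terms.update(tokens)
        terms.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return terms


def binary_bag_of_words(tweets_by_user, min_user_fraction=0.25):
    """Build a 0/1 vector per user over terms used by enough users."""
    terms_by_user = {u: user_terms(t) for u, t in tweets_by_user.items()}
    counts = {}
    for terms in terms_by_user.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1  # number of users, not uses
    threshold = min_user_fraction * len(terms_by_user)
    vocabulary = sorted(t for t, c in counts.items() if c >= threshold)
    matrix = {
        user: [1 if term in terms else 0 for term in vocabulary]
        for user, terms in terms_by_user.items()
    }
    return vocabulary, matrix


# Toy data with four users; a 50% cut-off is used so that the filter is visible.
toy = {
    "a": [["last", "train", "home"]],
    "b": [["missed", "last", "train"]],
    "c": [["coffee", "time"]],
    "d": [["study", "time"]],
}
vocabulary, matrix = binary_bag_of_words(toy, min_user_fraction=0.5)
print(vocabulary)
print(matrix["a"], matrix["c"])
```

Counting users rather than occurrences means a term repeated many times by one person does not enter the vocabulary unless enough other participants also use it.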

Conclusion

The study shows that Twitter information can predict a wide range of personality traits and attributes. Each type of information showed more reliability for certain personality traits and attributes, with network features revealing social traits, and word statistics correlating more with mental health. Intelligence, on the other hand, was predicted from all four information types. At this time, the prediction performance is enough to provide informative statistics, but not sufficient to determine individual personality traits and attributes precisely.

Note:

An ethics committee at NICT approved all procedures of the study.

Glossary

Nudge

A way to change behavior that draws on human behavioral and cognitive biases, such as social norms and risk aversion, without imposing prohibitions or high costs on the individual. Nudges are typically used to encourage people to adopt prosocial, environmental and health-related behaviors. Traditionally, a nudge assumes that all individuals receive the same information, but nudges could be personalized by using information on personality traits and attributes.


Big 5

The most commonly used model in experimental psychology to measure basic personality traits. It consists of five factors: openness, conscientiousness, extraversion, agreeableness and neuroticism. The score of each factor is determined by questionnaires such as the Revised NEO Personality Inventory or the Big 5 Personality Test. Each factor can be broken down into more detailed factors.
