diff --git a/_posts/2023-11-08-suscep.md b/_posts/2023-11-08-suscep.md new file mode 100644 index 00000000..00f9fb53 --- /dev/null +++ b/_posts/2023-11-08-suscep.md @@ -0,0 +1,265 @@ +--- +layout: distill +title: From Scroll to Misbelief - Modeling the Unobservable Susceptibility to Misinformation on Social Media +description: +date: 2023-11-08 + +authors: + - name: Yanchen Liu + url: https://liuyanchen1015.github.io/ + affiliations: + name: Harvard, Cambridge, MA + +bibliography: 2023-11-08-suscep.bib + +# Optionally, you can add a table of contents to your post. +# NOTES: +# - make sure that TOC names match the actual section names +# for hyperlinks within the post to work correctly. +# - we may want to automate TOC generation in the future using +# jekyll-toc plugin (https://github.com/toshimaru/jekyll-toc). +toc: + - name: Abstract + - name: Introduction + - name: Computational Susceptibility Modeling + subsections: + - name: Modeling Unobservable Susceptibility + - name: Training with Supervision from Observable Behavior + - name: Dataset and Experiment Setup + - name: Evaluation + - name: Analysis + - name: Related Work + - name: Conclusion + +# Below is an example of injecting additional post-specific styles. +# If you use this post as a template, delete this _styles block. +_styles: > + .fake-img { + background: #bbb; + border: 1px solid rgba(0, 0, 0, 0.1); + box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1); + margin-bottom: 12px; + } + .fake-img p { + font-family: monospace; + color: white; + text-align: left; + margin: 12px 0; + text-align: center; + font-size: 16px; + } + +--- + +## Abstract + +Susceptibility to misinformation describes the extent to believe false claims, which is hidden in people's mental process and infeasible to observe. Existing susceptibility studies heavily rely on the crowdsourced self-reported belief level, making the downstream research homogeneous and unscalable. To relieve these limitations, we propose a computational model that infers users' susceptibility levels given their reposting behaviors. We utilize the supervision from the observable sharing behavior, incorporating a user's susceptibility level as a key input for the reposting prediction task. Utilizing the capability of large-scale susceptibility labeling, we could also perform a comprehensive analysis of psychological factors and susceptibility levels across professional and geographical communities. Hopefully, we could observe that susceptibility is influenced by complicated factors, demonstrating a degree of correlation with economic development around the world, and with political leanings in the U.S. + +*** + +## Introduction + +
+
+ {% include figure.html path="assets/img/2023-11-08-suscep/suscep_model.png" class="img-fluid rounded z-depth-1" zoomable=true %} +
Illustration of the Susceptibility Modeling. We formulate the model to predict whether a given user will retweet a specific misinformation tweet. We utilize a shallow neural network to predict the susceptibility score, and together with the dot product of the user and tweet embeddings to predict retweet behavior. Our model is trained using two loss functions: binary classification entropy and triplet loss.
+
+
+ +False claims spread on social media platforms, such as conspiracy theories, fake news, and unreliable health information, mislead people's judgment, promote societal polarization, and decrease protective behavior intentions. The harm is especially significant in various contentious events including elections, religious persecution, and the global response to the COVID-19 pandemic. +Many works have investigated the **observable** behavior of information propagation such as where the information propagates, how people share it, and what people discuss about it. However, it is still crucial but challenging to understand the **unobservable** mental and cognitive processes when individuals believe misinformation. Users' susceptibility (i.e., the likelihood of individuals believing misinformation) plays a pivotal role in this context. If a person is more susceptible to misinformation, they are not only more likely to share false claims but also more prone to being misled by them. + +Existing works have investigated the psychological, demographic, and other factors that may contribute to the high susceptibility of a population. +However, previous susceptibility studies heavily rely on self-reported belief towards false claims collected from questionnaire-based participant survey, which presents several limitations. For instance, different participants might interpret the belief levels in different ways. Moreover, the data collection process is labor-heavy and thus limits the scale of downstream research on size, scope, and diversity of the target population. + +The unobservance of people's beliefs makes it infeasible to model susceptibility directly. Luckily, existing psychological literature bridges unobservable beliefs and observable behaviors, showing that the sharing behavior is largely influenced by whether users believe the misinformation, the attributes of the sharing content, and users' internal mental motives. Motivated by these prior works, we formulate the relationship between believing and sharing described in social science literature into a machine learning task. + +Concretely, we propose to infer people's susceptibility level given their re/posting behaviors. To parameterize the model, we wrap the susceptibility level as input for the prediction model of the observable reposting behavior. We perform multi-task learning to simultaneously learn to classify whether a user would share a post, and rank susceptibility scores among similar and dissimilar users when the same content is seen. Note that our model does not aim to predict any ground-truth susceptibility for individuals. Instead, we use users' reposting behaviors towards misinformation as a proxy for their susceptibility level for better interpretability. Our model design enables unobservable modeling with supervision signals for observable behavior, unlocks the scales of misinformation-related studies, and provides a novel perspective to reveal the users' belief patterns. + +We conduct comprehensive evaluations to validate the proposed susceptibility measurement and find that the estimations from our model are highly aligned with human judgment. Building upon such large-scale susceptibility labeling, we further conduct a set analysis of how different social factors relate to susceptibility. We find that political leanings and psychological factors are associated with susceptibility in varying degrees. Moreover, our analysis based on these inferred susceptibility scores corroborates the findings of previous studies based on self-reported beliefs, e.g., stronger analytical thinking is an indicator of low susceptibility. The results of our analysis extend findings in existing literature in a significant way. For example, we demonstrate that susceptibility distribution in the U.S. exhibits a certain degree of correlation with political leanings. + +To sum up, our contributions are: +1. We propose a computational model to infer people's susceptibility towards misinformation in the context of COVID-19, by modeling unobservable latent susceptibility through observable sharing activities. +2. Evaluation shows that our model effectively models unobservable belief, and the predictions highly correlate with human judgment. +3. We conduct a large-scale analysis to uncover the underlying factors contributing to susceptibility across a diverse user population from various professional fields and geographical regions, presenting important implications for related social science studies. + +*** + + +## Computational Susceptibility Modeling +### Modeling Unobservable Susceptibility + +Inspired by the existing studies indicating that believing is an essential driver for dissemination, we propose to model susceptibility, which reflects users' beliefs, as a driver for the sharing behavior, while considering characteristics of the sharing content and user profile. + +We propose a computational model to infer a user's unobservable susceptibility score based on their historical activities as shown in the model figure, and further train the model with signals from the observable disseminating behavior. We construct approximate contrastive user-post pairs as the training data ([Dataset and Experiment Setup](#dataset-and-experiment-setup)). + +This design would allow us to explore the best parameters for the computational model of an unobservable and data-hungry susceptibility variable using the rich data resources available on social media platforms. + +#### Content-Sensitive Susceptibility +We compute the user's susceptibility when a particular piece of misinformation $p$ is perceived (i.e. $s_{u, p}$). This allows us to account for the fact that an individual's susceptibility can vary across different content, influenced by factors such as topics and linguistic styles. By focusing on the susceptibility to specific pieces of misinformation, we aim to create a more nuanced, fine-grained, and accurate representation of how users interact with and react to different COVID-19 misinformation. + +#### User and Misinfo Post Embeddings +As a component of the computational model, we use SBERT developed upon RoBERTa-large to produce a fixed-sized vector to represent the semantic information contained in the posts and user profiles. We consider the misinformation post as a sentence and produce its representation with SBERT. For the user profile, we calculate the average of sentence representations for the user's recent original posts. More specifically, for every user-post pair $(u, p)$, we gather the historical posts written by user $u$ within a 10-day window preceding the creation time of the misinformation post $p$, to learn a representation of user $u$ at that specific time. + +#### Computational Model for Susceptibility +Given the input of the user profile for the user $u$ and the content for misinformation post $p$, the susceptibility computational model is expected to produce the _susceptibility score_ $s_{u, p}$ as shown below, reflecting the susceptibility of $u$ when $p$ is perceived. + +$$ + s_{u, p} = suscep(E(u), E(p)) +$$ + +We first obtain the embeddings $E(p)$ and $E(u)$ for post $p$ and user profile $u$, where $u$ is represented by the user's historical tweets and $E$ is the frozen SBERT sentence embedding function. The susceptibility score is calculated by the function $suscep$, which is implemented as a multi-layer neural network, taking the concatenation of the user and post embeddings as inputs. In the training phase, we keep the sentence embedder frozen and learn the weights for the $suscep$ function that could be used to produce reasonable susceptibility scores. We expect to produce susceptibility scores for novel $u$ and $p$ pairs using the learned $suscep$ function during inference. Additionally, we normalize the resulting susceptibility scores into the -100 to 100 range for better interpretability. + +### Training with Supervision from Observable Behavior +Susceptibility is not easily observable, thus it is infeasible to apply supervision on $s_{u, p}$ directly as only the user $u$ themselves know their belief towards content $p$. Thus, we propose to utilize the supervision signal for sharing a piece of misinformation, which is an observable behavior. We consider susceptibility as an essential factor of sharing behavior and use the susceptibility computational model's output to predict the repost behavior. + + +To produce the probability for user $u$ to share post $p$, we calculate the dot product of the embeddings of the user profile and post content and consider the susceptibility score for the same pair of $u$ and $p$ as a weight factor, and passing the result through a sigmoid function, as illustrated in the model figure. + +$$ + p_{\text{rp}} = \sigma \left( E(u) \cdot E(p) \cdot s_{u, p} \right) +$$ + +Note that we do not directly employ the \textit{susceptibility score} to compute the probability of sharing because the sharing behavior depends not only on the susceptibility level but also on other potential confounding factors. It is possible that a user possesses a notably high susceptibility score for a piece of misinformation yet chooses not to repost it. Hence, we incorporate a dot product of the user and post embedding in our model \dkvtwo{involve the misinformation post content and user profiles into the consideration of predicting the sharing behavior}. + +$$ +\begin{align} +\mathcal{L}_{\text{bce}}(u_i, t) &= -\left( y_i \log(p_{\text{rt}}(u_i, t)) + (1 - y_i) \log(1 - p_{\text{rt}}(u_i, t)) \right) \nonumber \\ +\mathcal{L}_{\text{triplet}}(u_a, u_s, u_{ds}, t) &= \text{ReLU}\left(\Vert s_{u_{a},t} - s_{u_{s},t}\Vert_2^2 - \Vert s_{u_{a},t} - s_{u_{ds},t} \Vert_2^2 + \alpha \right) \nonumber \\ +\mathcal{L}(u_a, u_s, u_{ds}, t) &= \frac{\lambda}{3} \sum_{i \in \{a, s, ds\}} \mathcal{L}_{\text{bce}}(u_i, t) + (1 - \lambda) \mathcal{L}_{\text{triplet}}(u_a, u_s, u_{ds}, t) \nonumber \label{eq:loss} +\end{align} +$$ +#### Objectives +We perform multi-task learning to utilize different supervision signals. We first consider a binary classification task of predicting repost or not with a cross-entropy loss. Additionally, we perform the triplet ranking task to distinguish the subtle differences among the susceptibility scores of multiple users when the same false content is present. + +During each forward pass, our model is provided with three user-post pairs: the anchor pair $(u_a, p)$, the similar pair $(u_s, p)$, and the dissimilar pair $(u_{ds}, p)$. +We determine the similar user $u_s$ as the user who reposted $p$ if and only if user $u_a$ reposted $p$. Conversely, the dissimilar user $u_{ds}$ is determined by reversing this relationship. +When multiple potential candidate users exist for either $u_s$ or $u_{ds}$, we randomly select one. +However, if there are no suitable candidate users available, we randomly sample one user from the positive (for "reposted" cases) or negative examples (for "did not repost" cases) and pair this randomly chosen user with this misinformation post $p$. + +Here, we elaborate on the definition of our loss function. Here, $y_i$ takes the value of 1 if and only if user $u_i$ reposted misinformation post $p$. The parameter $\alpha$ corresponds to the margin employed in the triplet loss, serving as a hyperparameter that determines the minimum distance difference needed between the anchor and the similar or dissimilar sample +for the loss to equal zero. Additionally, we introduce the control hyperparameter $\lambda$, which governs the weighting of the binary cross-entropy and triplet loss components. + + +## Dataset and Experiment Setup +We use Twitter data because it hosts an extensive and diverse collection of users, the accessibility of its data, and its popularity for computational social science research. +Additionally, it provides users' free-text personal and emotional expression with crucial metadata, including timestamps and location data, which are useful for our subsequent analytical endeavors. + +#### Misinformation Tweets +We consider two misinformation tweet datasets: the ANTi-Vax dataset was collected and annotated specifically for COVID-19 vaccine misinformation tweets. On the other hand, CoAID encompasses a broader range of misinformation related to COVID-19 healthcare, including fake news on websites and social platforms. The former dataset contains 3,775 instances of misinformation tweets, while the latter contains 10,443. + +However, a substantial number of tweets within these two datasets do not have any retweets. Consequently, we choose to retain only those misinformation tweets that have been retweeted by valid users. +Finally, we have collected a total of 1,271 misinformation tweets for our study. + +#### Positive Examples +We define the positive examples for modeling as $(u_{pos}, t)$ pairs, where user $u_{pos}$ viewed and retweeted the misinformation tweet $t$. We obtained all retweeters for each misinformation tweet through the Twitter API. + +#### Negative Examples +Regarding negative examples, we define them as $(u_{neg}, t)$ pairs where user $u_{neg}$ viewed but did not retweet misinformation post $t$. However, obtaining these negative examples poses a substantial challenge, because the Twitter API does not provide information on the "being viewed" activities of a specific tweet. +To tackle this issue, we infer potential users $u_{neg}$ that highly likely viewed a given tweet $t$ following the heuristics: 1) $u_{neg}$ should be a follower of the author of the misinformation tweet $t$, 2) $u_{neg}$ should not retweet $t$, and 3) $u_{neg}$ was active on Twitter within 10 days before and 2 days after the timestamp of $t$. + +We have collected a total of 3,811 positive examples and 3,847 negative examples, resulting in a dataset comprising 7,658 user-post pairs in total. +We divide the dataset into three subsets with an 80% - 10% - 10% split for train, validation, and test purposes, respectively. The detailed statistics of the collected data are illustrated in the table below. + +| | Total | Positive | Negative | +|----------------|-------|----------|----------| +| # Example | 7658 | 3811 | 3847 | +| # User | 6908 | 3669 | 3255 | +| # Misinfo tweet| 1271 | 787 | 1028 | + +## Evaluation +In this section, we demonstrate the effectiveness of our susceptibility modeling by directly comparing our estimations with human judgment and indirectly evaluating its performance for predicting sharing behavior. + +### Validation with Human Judgement + +Due to the abstract nature of susceptibility and the lack of concrete ground truth, we face challenges in directly evaluating our susceptibility modeling. We use human evaluations to validate the effectiveness of our inferred susceptibility. Given the subjectivity inherent in the concept of susceptibility, and to mitigate potential issues arising from variations in individual evaluation scales, we opt not to request humans to annotate a user's susceptibility directly. Instead, we structure the human evaluation as presenting human evaluators with pairs of users along with their historical tweets and requesting them to determine which user appears more susceptible to overall COVID-19 misinformation. + +Subsequently, we compared the predictions made by our model with the human-annotated predictions. To obtain predictions from our model, we compute each user's susceptibility to overall COVID-19 misinformation by averaging their susceptibility scores to each COVID-19 misinformation tweet in our dataset. +As presented in the table below, our model achieves an average agreement of 73.06% with human predictions, indicating a solid alignment with the annotations provided by human evaluators. Additionally, we consider a baseline that directly calculates susceptibility scores as the cosine similarity between the user and misinformation tweet embeddings. Compared to this baseline, our susceptibility modeling brings a 10.06% improvement. +Moreover, we compare the performance with ChatGPT prompting with the task description of the susceptibility level comparison setting as instruction in a zero-shot manner. We observe that our model also outperforms predictions made by ChatGPT. +The results from the human judgment validate the effectiveness of our susceptibility modeling and its capability to reliably assess user susceptibility to COVID-19 misinformation. + +| | Our | Baseline | ChatGPT | +| -------------- | ----------------- | ---------------- | ---------------- | +| Agreement | 73.06±8.19 | 63.00±9.07 | 64.85±9.02 | + + +### Susceptibility Score Distribution + +We provide a visualization of the distribution of susceptibility scores within positive and negative examples produced by our model on the training data. +As depicted below, there is a notable disparity in the distribution between positive and negative examples, verifying our assumption that believing is an essential driver for sharing behavior. The difference in the means of the positive and negative groups is statistically significant, with a p-value of less than 0.001. + +
+
+ {% include figure.html path="assets/img/2023-11-08-suscep/pos_neg_distribution.png" class="img-fluid rounded z-depth-1" zoomable=true %} +
Susceptibility Score Distribution among positive and negative user-tweet samples. The distribution of positive (red) and negative (blue) examples exhibits a clear disparity.
+
+
+ +### Sharing Behavior Prediction +Furthermore, holding a belief is highly likely to result in subsequent sharing behavior. We demonstrated that our trained model possesses a strong ability for sharing behavior prediction. When tested on the held-out test dataset, our model achieves a test accuracy of 78.11% and an F1 score of 77.93. These results indirectly demonstrate the reliable performance of our model for susceptibility modeling. + +## Analysis +In this section, we show the potential of our inferred susceptibility scores in expanding the scope of susceptibility research. Our analysis not only aligns with the findings of previous survey-based studies but also goes a step further by extending and enriching their conclusions. + +### Correlation with Psychological Factors +Previous research on human susceptibility to health and COVID-19 misinformation has been primarily based on questionnaire surveys . These studies have identified several psychological factors that influence individuals' susceptibility to misinformation. For instance, analytical thinking (as opposed to intuitive thinking), trust in science, and positive emotions have been linked to a greater resistance to health misinformation. Conversely, susceptibility to health misinformation is associated with factors such as conspiracy thinking, religiosity, conservative ideology, and negative emotions. +In this part, we analyze the correlation coefficients between our modeled susceptibility scores and the aforementioned factors to determine if our results align with previous research findings. + +To achieve this, we compute factor scores for each user in our dataset based on their historical tweets using LIWC Analysis. We calculate the average value across all the user's historical tweets as the final factor score. However, for emotional factors such as anxiety and anger with less frequent appearance, we opt for the maximum value instead to more effectively capture these emotions. We primarily consider the following factors: *Analytic Thinking*, Emotions (*Positive* emotions, *Anxious*, *Angery* and *Sad*), *Swear*, *Political Leaning*, *Ethnicity*, *Technology*, *Religiosity*, *Illness* and *Wellness*. These factors have been extensively studied in previous works and can be inferred from a user's historical tweets. +We calculate and plot the Pearson correlation coefficients between each factor and the susceptibility predicted by our model in the following table. + +| Factors | Coeff. | Factors | Coeff. | +|---------------------|--------|---------------------|--------| +| Analytic Thinking | -0.31 | Emotion - Positive | -0.08 | +| Political Leaning | 0.13 | Emotion - Anxious | 0.08 | +| Ethnicity | 0.09 | Emotion - Angry | 0.16 | +| Religiosity | 0.10 | Emotion - Sad | 0.14 | +| Technology | -0.09 | Swear | 0.18 | +| Illness | 0.09 | Wellness | -0.02 | + +According to our analysis, correlations are consistent with previous social science studies based on surveys on health susceptibility. For instance, *Analytic Thinking* is a strong indicator of low susceptibility, with a correlation coefficient of -0.31. +Conversely, certain features such as *Swear*, *Political Leaning* and *Angry* exhibit a weak correlation with a high susceptibility score. +These results not only corroborate the conclusions drawn from previous survey-based studies but also provide further validation for the effectiveness of our computational modeling for susceptibility. + + +### Geographical Community Differences +We delve into the geographical distribution of susceptibility. Given the significant imbalance in the number of users from different U.S. states, we calculate the average susceptibility scores for each state using Bayesian smoothing. We use the overall mean susceptibility score and overall standard deviation as our priors and the more the users in the group, the less the overall mean will affect the group's score. + +We explore the susceptibility distribution among different U.S. states, considering the influence of political ideology associated with different states . Out of the 100,000 users sampled from around the world, 25,653 users are from U.S. states with more than 200 users for each state. As illustrated in the figure below, the susceptibility distribution across U.S. states is imbalanced and exhibits a certain degree of correlation with political leanings, where generally, states known to have a more conservative population tend to have relatively higher susceptibility scores, while states that are considered more liberal have lower scores. +Specifically, the average susceptibility score for users in blue or red states is -3.66 and -2.82 respectively. Red or blue states refer to US states whose voters vote predominantly for the Republican or Democratic Party. We determine blue/red states according to the 2020 presidential election result. We observe that 60% of the ten states with the highest susceptibility scores are red states, and 90% of the ten states with the lowest susceptibility scores are blue states. +This is a trend that has been observed in various research where political ideology influences the perception of scientific information . +However, it is crucial to acknowledge the limitations of our analysis, as it solely reflects the susceptibility distribution of the sampled users within each state. + +
+
+ {% include figure.html path="assets/img/2023-11-08-suscep/usa.png" class="img-fluid rounded z-depth-1" zoomable=true %} +
Susceptibility Distribution by U.S. State (with bayesian smoothing). We use the average susceptibility score in the United States (-2.87) as the threshold, with scores above it displayed in red, and those below it in blue. Due to space constraints and insufficient data points, we are only displaying data for 48 contiguous states within the U.S.
+
+
+ + +## Related Work + +### Measure of Susceptibility +The common practice in measuring susceptibility involves collecting self-reported data on agreement or disagreement with verified false claims, , , . Some studies assess susceptibility indirectly through its impact on behavior, but this approach fails to capture actual belief systems. Our work proposes a computational model as a scalable alternative to expensive and limited self-reported beliefs. + +### Contributing Factors and Application of Susceptibility +Research utilizing manually collected susceptibility annotations has explored various factors influencing susceptibility, such as emotion, , analytic thinking, partisan bias, source credibility, and repetition. Theories explaining susceptibility range from limited knowledge acquisition to overconfidence. This understanding aids in applications like analyzing bot-driven misinformation spread and developing prebunking interventions, . However, the field lacks a computational model for large-scale susceptibility inference, which we address in our work. + +### Inferring Unobservables from Observables +Latent constructs, or variables that are not directly observable, are often inferred through models from observable variables, . Methods like nonlinear mixed-effects models and hidden Markov models are used for this purpose. In our approach, we utilize a neural network-based architecture to represent these latent variables, aiding in predicting observable variables. + +## Conclusion +In this work, we propose a computational approach to model people's **unobservable** susceptibility to misinformation. While previous research on susceptibility is heavily based on self-reported beliefs collected from questionnaire-based surveys, our model trained in a multi-task manner can approximate user's susceptibility scores from their reposting behavior. When compared with human judgment, our model shows highly aligned predictions on a susceptibility comparison evaluation task. +To demonstrate the potential of our computational model in extending the scope of previous misinformation-related studies, we leverage susceptibility scores generated by our model to analyze factors contributing to misinformation susceptibility. This thorough analysis encompasses a diverse U.S. population from various professional and geographical backgrounds. The results of our analysis algin, corroborate, and expand upon the conclusions drawn from previous survey-based computational social science studies. + +## Limitations +Besides investigating the underlying mechanism of misinformation propagation at a large scale, the susceptibility scores produced by our model have the potential to be used to visualize and interpret individual and community vulnerability in information propagation paths, identify users with high risks of believing in false claims and take preventative measures, and use as predictors for other human behavior such as following and sharing. However, while our research represents a significant step in modeling susceptibility to misinformation, several limitations should be acknowledged. + +First, our model provides insights into susceptibility based on the available data and the features we have incorporated. However, it's important to recognize that various other factors, both individual and contextual, may influence susceptibility to misinformation. These factors, such as personal experiences and offline social interactions, have not been comprehensively incorporated into our model and should be considered in future research. + +Moreover, the susceptibility scores modeled by our model represent an estimation of an individual's likelihood to engage with misinformation. These scores may not always align perfectly with real-world susceptibility levels. Actual susceptibility is a complex interplay of cognitive, psychological, and social factors that cannot be entirely captured through computational modeling. Our model should be seen as a valuable tool for understanding trends and patterns rather than providing definitive individual susceptibility assessments. + +Finally, our study's findings are based on a specific dataset and may not be fully generalizable to all populations, platforms, or types of misinformation. For example, due to the high cost of data collection, not all countries or U.S. states have a sufficient amount of Twitter data available for analysis, especially when we examine the geographical distribution of susceptibility. Furthermore, platform-specific differences and variations in the types of misinformation can potentially impact the effectiveness of our model and the interpretation of susceptibility scores. + diff --git a/assets/bibliography/2023-11-08-suscep.bib b/assets/bibliography/2023-11-08-suscep.bib index 21e723a7..c27bcd07 100644 --- a/assets/bibliography/2023-11-08-suscep.bib +++ b/assets/bibliography/2023-11-08-suscep.bib @@ -1,3 +1,462 @@ +# Misinformation Susceptibility + +@article{imhoff2022conspiracy, + title={Conspiracy mentality and political orientation across 26 countries}, + author={Imhoff, Roland and Zimmer, Felix and Klein, Olivier and Ant{\'o}nio, Jo{\~a}o HC and Babinska, Maria and Bangerter, Adrian and Bilewicz, Michal and Blanu{\v{s}}a, Neboj{\v{s}}a and Bovan, Kosta and Bu{\v{z}}arovska, Rumena and others}, + journal={Nature human behaviour}, + volume={6}, + number={3}, + pages={392--403}, + year={2022}, + publisher={Nature Publishing Group UK London} +} + +@article{roozenbeek2020susceptibility, + title={Susceptibility to misinformation about COVID-19 around the world}, + author={Roozenbeek, Jon and Schneider, Claudia R and Dryhurst, Sarah and Kerr, John and Freeman, Alexandra LJ and Recchia, Gabriel and Van Der Bles, Anne Marthe and Van Der Linden, Sander}, + journal={Royal Society open science}, + volume={7}, + number={10}, + pages={201199}, + year={2020}, + publisher={The Royal Society} +} + +@article{van2022misinformation, + title={Misinformation: susceptibility, spread, and interventions to immunize the public}, + author={Van Der Linden, Sander}, + journal={Nature Medicine}, + volume={28}, + number={3}, + pages={460--467}, + year={2022}, + publisher={Nature Publishing Group US New York} +} + +@article{pennycook2021psychology, + title={The psychology of fake news}, + author={Pennycook, Gordon and Rand, David G}, + journal={Trends in cognitive sciences}, + volume={25}, + number={5}, + pages={388--402}, + year={2021}, + publisher={Elsevier} +} + +@article{bringula2022gullible, + title={“Who is gullible to political disinformation?”: predicting susceptibility of university students to fake news}, + author={Bringula, Rex P and Catacutan-Bangit, Annaliza E and Garcia, Manuel B and Gonzales, John Paul S and Valderama, Arlene Mae C}, + journal={Journal of Information Technology \& Politics}, + volume={19}, + number={2}, + pages={165--179}, + year={2022}, + publisher={Taylor \& Francis} +} + +@article{scherer2021susceptible, + title={Who is susceptible to online health misinformation? A test of four psychosocial hypotheses.}, + author={Scherer, Laura D and McPhetres, Jon and Pennycook, Gordon and Kempe, Allison and Allen, Larry A and Knoepke, Christopher E and Tate, Channing E and Matlock, Daniel D}, + journal={Health Psychology}, + volume={40}, + number={4}, + pages={274}, + year={2021}, + publisher={American Psychological Association} +} + +@article{nan2022people, + title={Why people believe health misinformation and who are at risk? A systematic review of individual differences in susceptibility to health misinformation}, + author={Nan, Xiaoli and Wang, Yuan and Thier, Kathryn}, + journal={Social Science \& Medicine}, + pages={115398}, + year={2022}, + publisher={Elsevier} +} + +@inproceedings{zhang2015ideology, +author = {Zhang, Amy X. and Counts, Scott}, +title = {Modeling Ideology and Predicting Policy Change with Social Media: Case of Same-Sex Marriage}, +year = {2015}, +isbn = {9781450331456}, +publisher = {Association for Computing Machinery}, +address = {New York, NY, USA}, +url = {https://doi.org/10.1145/2702123.2702193}, +doi = {10.1145/2702123.2702193}, +abstract = {Social media has emerged as a prominent platform where people can express their feelings about social and political issues of our time. We study the many voices discussing an issue within a constituency and how they reflect ideology and may signal the outcome of important policy decisions. Focusing on the issue of same-sex marriage legalization, we examine almost 2 million public Twitter posts related to same-sex marriage in the U.S. states over the course of 4 years starting from 2011. Among other findings, we find evidence of moral culture wars between ideologies and show that constituencies that express higher levels of emotion and have fewer actively engaged participants often precede legalization efforts that fail. From our measures, we build statistical models to predict the outcome of potential policy changes, with our best model achieving 87\% accuracy. We also achieve accuracies of 70\%, comparable to public opinion surveys, many months before a policy decision. We discuss how these analyses can augment traditional political science techniques as well as assist activists and policy analysts in understanding discussions on important issues at a population scale.}, +booktitle = {Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems}, +pages = {2603–2612}, +numpages = {10}, +keywords = {public policy, social media, political science, same-sex marriage}, +location = {Seoul, Republic of Korea}, +series = {CHI '15} +} + +@article{mccright2013influence, + title={The influence of political ideology on trust in science}, + author={McCright, Aaron M and Dentzman, Katherine and Charters, Meghan and Dietz, Thomas}, + journal={Environmental Research Letters}, + volume={8}, + number={4}, + pages={044029}, + year={2013}, + publisher={IOP Publishing} +} + +@article{baptista2021influence, + title={The influence of political ideology on fake news belief: The Portuguese case}, + author={Baptista, Jo{\~a}o Pedro and Correia, Elisete and Gradim, Anabela and Pi{\~n}eiro-Naval, Valeriano}, + journal={Publications}, + volume={9}, + number={2}, + pages={23}, + year={2021}, + publisher={MDPI} +} + +@misc{cui2020coaid, + title={CoAID: COVID-19 Healthcare Misinformation Dataset}, + author={Limeng Cui and Dongwon Lee}, + year={2020}, + eprint={2006.00885}, + archivePrefix={arXiv}, + primaryClass={cs.SI} +} + +@article{hayawi2022anti, + title={ANTi-Vax: a novel Twitter dataset for COVID-19 vaccine misinformation detection}, + author={Hayawi, Kadhim and Shahriar, Sakib and Serhani, Mohamed Adel and Taleb, Ikbal and Mathew, Sujith Samuel}, + journal={Public health}, + volume={203}, + pages={23--30}, + year={2022}, + publisher={Elsevier} +} + +@article{liu2019roberta, + title={Roberta: A robustly optimized bert pretraining approach}, + author={Liu, Yinhan and Ott, Myle and Goyal, Naman and Du, Jingfei and Joshi, Mandar and Chen, Danqi and Levy, Omer and Lewis, Mike and Zettlemoyer, Luke and Stoyanov, Veselin}, + journal={arXiv preprint arXiv:1907.11692}, + year={2019} +} + +@article{chen2009ranking, + title={Ranking measures and loss functions in learning to rank}, + author={Chen, Wei and Liu, Tie-Yan and Lan, Yanyan and Ma, Zhi-Ming and Li, Hang}, + journal={Advances in Neural Information Processing Systems}, + volume={22}, + year={2009} +} + +@inproceedings{hoffer2015deep, + title={Deep metric learning using triplet network}, + author={Hoffer, Elad and Ailon, Nir}, + booktitle={Similarity-Based Pattern Recognition: Third International Workshop, SIMBAD 2015, Copenhagen, Denmark, October 12-14, 2015. Proceedings 3}, + pages={84--92}, + year={2015}, + organization={Springer} +} + +@book{gelman2009red, + title={Red state, blue state, rich state, poor state: Why Americans vote the way they do-expanded edition}, + author={Gelman, Andrew}, + year={2009}, + publisher={Princeton University Press} +} + +@article{singh2022misinformation, + title={Misinformation, believability, and vaccine acceptance over 40 countries: Takeaways from the initial phase of the COVID-19 infodemic}, + author={Singh, Karandeep and Lima, Gabriel and Cha, Meeyoung and Cha, Chiyoung and Kulshrestha, Juhi and Ahn, Yong-Yeol and Varol, Onur}, + journal={Plos one}, + volume={17}, + number={2}, + pages={e0263381}, + year={2022}, + publisher={Public Library of Science San Francisco, CA USA} +} + +@article{Thier2021HealthMisinformation, + title={Health misinformation}, + author={Nan, Xiaoli and Wang, Yuan and Thier, Kathryn}, + year={2021}, + publisher={OSF Preprints} +} + +@article{Ecker2022PsychologicalDriversMisinformation, + title={The psychological drivers of misinformation belief and its resistance to correction}, + author={Ecker, Ullrich KH and Lewandowsky, Stephan and Cook, John and Schmid, Philipp and Fazio, Lisa K and Brashier, Nadia and Kendeou, Panayiota and Vraga, Emily K and Amazeen, Michelle A}, + journal={Nature Reviews Psychology}, + volume={1}, + number={1}, + pages={13--29}, + year={2022}, + publisher={Nature Publishing Group US New York} +} + +@inproceedings{Taylor2023WhereDoesYour, + title={Where Does Your News Come From? Predicting Information Pathways in Social Media}, + author={Taylor, Alexander K and Wen, Nuan and Kung, Po-Nien and Chen, Jiaao and Peng, Violet and Wang, Wei}, + booktitle={Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval}, + pages={2511--2515}, + year={2023} +} + +@article{Yang2021COVID19InfodemicTwitter, + title={The COVID-19 infodemic: twitter versus facebook}, + author={Yang, Kai-Cheng and Pierri, Francesco and Hui, Pik-Mai and Axelrod, David and Torres-Lugo, Christopher and Bryden, John and Menczer, Filippo}, + journal={Big Data \& Society}, + volume={8}, + number={1}, + pages={20539517211013861}, + year={2021}, + publisher={SAGE Publications Sage UK: London, England} +} + +@article{Gupta2023PolarisedSocialMedia, + title={Polarised social media discourse during COVID-19 pandemic: evidence from YouTube}, + author={Gupta, Samrat and Jain, Gaurav and Tiwari, Amit Anand}, + journal={Behaviour \& Information Technology}, + volume={42}, + number={2}, + pages={227--248}, + year={2023}, + publisher={Taylor \& Francis} +} + +@misc{Scherer2020WhoSusceptibleOnline, + title={Who is susceptible to online health misinformation?}, + author={Scherer, Laura D and Pennycook, Gordon}, + journal={American Journal of Public Health}, + volume={110}, + number={S3}, + pages={S276--S277}, + year={2020}, + publisher={American Public Health Association} +} + +@article{Brashier2020AgingEraFake, + title={Aging in an era of fake news}, + author={Brashier, Nadia M and Schacter, Daniel L}, + journal={Current directions in psychological science}, + volume={29}, + number={3}, + pages={316--323}, + year={2020}, + publisher={Sage Publications Sage CA: Los Angeles, CA} +} + +@article{Pennycook2020WhoFallsFake, + title={Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking}, + author={Pennycook, Gordon and Rand, David G}, + journal={Journal of personality}, + volume={88}, + number={2}, + pages={185--200}, + year={2020}, + publisher={Wiley Online Library} +} + +@article{Escola-Gascon2021CriticalThinkingPredicts, + title={Critical thinking predicts reductions in Spanish physicians' stress levels and promotes fake news detection}, + author={Escol{\`a}-Gasc{\'o}n, {\'A}lex and Dagnall, Neil and Gallifa, Josep}, + journal={Thinking Skills and Creativity}, + volume={42}, + pages={100934}, + year={2021}, + publisher={Elsevier} +} + +@article{Rosenzweig2021HappinessSurpriseAre, + title={Happiness and surprise are associated with worse truth discernment of COVID-19 headlines among social media users in Nigeria}, + author={Rosenzweig, Leah R and Bago, Bence and Berinsky, Adam J and Rand, David G}, + year={2021}, + publisher={Shorenstein Center for Media, Politics, and Public Policy} +} + +@article{Altay2022WhyFewPeople, + title={Why do so few people share fake news? It hurts their reputation}, + author={Altay, Sacha and Hacquin, Anne-Sophie and Mercier, Hugo}, + journal={new media \& society}, + volume={24}, + number={6}, + pages={1303--1324}, + year={2022}, + publisher={SAGE Publications Sage UK: London, England} +} + +@article{Atske2019ManyAmericansSay, + title={Many Americans say made-up news is a critical problem that needs to be fixed}, + author={Mitchell, Amy and Gottfried, Jeffrey and Stocking, Galen and Walker, Mason and Fedeli, Sophia}, + journal={Pew Research Center}, + volume={5}, + pages={2019}, + year={2019} +} + +@article{Brady2020MADModelMoral, + title={The MAD model of moral contagion: The role of motivation, attention, and design in the spread of moralized content online}, + author={Brady, William J and Van Bavel, Jay J}, + journal={Perspectives on Psychological Science}, + volume={15}, + number={4}, + pages={978--1010}, + year={2020}, + publisher={Sage Publications Sage CA: Los Angeles, CA} +} + +@article{Islam2020MisinformationSharingSocial, + title={Misinformation sharing and social media fatigue during COVID-19: An affordance and cognitive load perspective}, + author={Islam, AKM Najmul and Laato, Samuli and Talukder, Shamim and Sutinen, Erkki}, + journal={Technological forecasting and social change}, + volume={159}, + pages={120201}, + year={2020}, + publisher={Elsevier} +} + +@article{Roozenbeek2020SusceptibilityMisinformationCOVID19, + title={Susceptibility to misinformation about COVID-19 around the world}, + author={Roozenbeek, Jon and Schneider, Claudia R and Dryhurst, Sarah and Kerr, John and Freeman, Alexandra LJ and Recchia, Gabriel and Van Der Bles, Anne Marthe and Van Der Linden, Sander}, + journal={Royal Society open science}, + volume={7}, + number={10}, + pages={201199}, + year={2020}, + publisher={The Royal Society} +} + +@article{Loomba2021MeasuringImpactCOVID19, + title={Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA}, + author={Loomba, Sahil and de Figueiredo, Alexandre and Piatek, Simon J and de Graaf, Kristen and Larson, Heidi J}, + journal={Nature human behaviour}, + volume={5}, + number={3}, + pages={337--348}, + year={2021}, + publisher={Nature Publishing Group UK London} +} +@article{Sharma2023SystematicReviewRelationship, + title={A systematic review of the relationship between emotion and susceptibility to misinformation}, + author={Sharma, Prerika R and Wade, Kimberley A and Jobson, Laura}, + journal={Memory}, + volume={31}, + number={1}, + pages={1--21}, + year={2023}, + publisher={Taylor \& Francis} +} + +@article{Weeks2015EmotionsPartisanshipMisperceptions, + title={Emotions, partisanship, and misperceptions: How anger and anxiety moderate the effect of partisan bias on susceptibility to political misinformation}, + author={Weeks, Brian E}, + journal={Journal of communication}, + volume={65}, + number={4}, + pages={699--719}, + year={2015}, + publisher={Oxford University Press} +} + +@article{Li2022EmotionAnalyticThinking, + title={Emotion, analytic thinking and susceptibility to misinformation during the COVID-19 outbreak}, + author={Li, Ming-Hui and Chen, Zhiqin and Rao, Li-Lin}, + journal={Computers in Human Behavior}, + volume={133}, + pages={107295}, + year={2022}, + publisher={Elsevier} +} + +@article{Roozenbeek2022SusceptibilityMisinformationConsistent, + title={Susceptibility to misinformation is consistent across question framings and response modes and better explained by myside bias and partisanship than analytical thinking}, + author={Roozenbeek, Jon and Maertens, Rakoen and Herzog, Stefan M and Geers, Michael and Kurvers, Ralf and Sultan, Mubashir and van der Linden, Sander}, + journal={Judgment and Decision Making}, + volume={17}, + number={3}, + pages={547--573}, + year={2022}, + publisher={Cambridge University Press} +} + +@article{Traberg2022BirdsFeatherAre, + title={Birds of a feather are persuaded together: Perceived source credibility mediates the effect of political bias on misinformation susceptibility}, + author={Traberg, Cecilie Steenbuch and van der Linden, Sander}, + journal={Personality and Individual Differences}, + volume={185}, + pages={111269}, + year={2022}, + publisher={Elsevier} +} + +@article{Foster2012RepetitionNotNumber, + title={Repetition, not number of sources, increases both susceptibility to misinformation and confidence in the accuracy of eyewitnesses}, + author={Foster, Jeffrey L and Huthwaite, Thomas and Yesberg, Julia A and Garry, Maryanne and Loftus, Elizabeth F}, + journal={Acta psychologica}, + volume={139}, + number={2}, + pages={320--326}, + year={2012}, + publisher={Elsevier} +} + +@article{Salovich2021CanConfidenceHelp, + title={Can confidence help account for and redress the effects of reading inaccurate information?}, + author={Salovich, Nikita A and Donovan, Amalia M and Hinze, Scott R and Rapp, David N}, + journal={Memory \& Cognition}, + volume={49}, + pages={293--310}, + year={2021}, + publisher={Springer} +} + +@article{Himelein-Wachowiak2021BotsMisinformationSpread, + title={Bots and misinformation spread on social media: Implications for COVID-19}, + author={Himelein-Wachowiak, McKenzie and Giorgi, Salvatore and Devoto, Amanda and Rahman, Muhammad and Ungar, Lyle and Schwartz, H Andrew and Epstein, David H and Leggio, Lorenzo and Curtis, Brenda}, + journal={Journal of medical Internet research}, + volume={23}, + number={5}, + pages={e26933}, + year={2021}, + publisher={JMIR Publications Toronto, Canada} +} + +@article{Roozenbeek2020PrebunkingInterventionsBased, + title={Prebunking interventions based on “inoculation” theory can cut sensor to misinformation cross farming}, + author={Roozenbeek, Jon and van der Linden, Sander and Nygren, Thomas} +} + +@article{Roozenbeek2022PsychologicalInoculationImproves, + title={Psychological inoculation improves resilience against misinformation on social media}, + author={Roozenbeek, Jon and Van Der Linden, Sander and Goldberg, Beth and Rathje, Steve and Lewandowsky, Stephan}, + journal={Science Advances}, + volume={8}, + number={34}, + pages={eabo6254}, + year={2022}, + publisher={American Association for the Advancement of Science} +} + +@article{Bollen2002LatentVariablesPsychology, + title={Latent variables in psychology and the social sciences}, + author={Bollen, Kenneth A}, + journal={Annual review of psychology}, + volume={53}, + number={1}, + pages={605--634}, + year={2002}, + publisher={Annual Reviews 4139 El Camino Way, PO Box 10139, Palo Alto, CA 94303-0139, USA} +} + +@article{Borsboom2003TheoreticalStatusLatent, + title={The theoretical status of latent variables.}, + author={Borsboom, Denny and Mellenbergh, Gideon J and Van Heerden, Jaap}, + journal={Psychological review}, + volume={110}, + number={2}, + pages={203}, + year={2003}, + publisher={American Psychological Association} @misc{sap_socialiqa_2019, title = {{SocialIQA}: {Commonsense} {Reasoning} about {Social} {Interactions}}, diff --git a/assets/img/2023-11-08-suscep/pos_neg_distribution.png b/assets/img/2023-11-08-suscep/pos_neg_distribution.png new file mode 100644 index 00000000..e73e11e9 Binary files /dev/null and b/assets/img/2023-11-08-suscep/pos_neg_distribution.png differ diff --git a/assets/img/2023-11-08-suscep/suscep_model.png b/assets/img/2023-11-08-suscep/suscep_model.png index 77e15506..e5adb4ff 100644 Binary files a/assets/img/2023-11-08-suscep/suscep_model.png and b/assets/img/2023-11-08-suscep/suscep_model.png differ diff --git a/assets/img/2023-11-08-suscep/usa.png b/assets/img/2023-11-08-suscep/usa.png new file mode 100644 index 00000000..0514f9fa Binary files /dev/null and b/assets/img/2023-11-08-suscep/usa.png differ