This repository enhances the original collection of research papers on Fake News Detection (FND). Contributions are welcome to expand this list; email suggestions to [email protected] to help build a comprehensive resource for the research community.
Title | Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection |
---|---|
Year | 2024 |
Approach/Methodology | The study empirically investigates the effectiveness of LLMs (e.g., GPT-3.5) in detecting fake news compared with small language models (SLMs) such as fine-tuned BERT. The authors designed an adaptive rationale guidance network (ARG) for fake news detection, which allows SLMs to selectively leverage LLM-generated rationales for enhanced analysis. A rationale-free version, ARGD, was also developed for cost-sensitive use cases through knowledge distillation. |
Datasets Used | N/A |
Results/Performance Metrics | ARG and ARGD outperformed three types of baseline methods, including those based on SLMs, LLMs, and their combinations. |
Challenges | LLMs like GPT-3.5, despite offering desirable multi-perspective rationales, underperform fine-tuned SLMs like BERT because they struggle to select and integrate rationales effectively. |
Key Contribution | The research highlights that while LLMs alone may not surpass fine-tuned SLMs in fake news detection, they can serve as valuable advisors by providing multi-perspective rationales. The adaptive rationale guidance network (ARG) is a significant contribution that allows SLMs to incorporate LLM insights selectively, along with the cost-effective, rationale-free ARGD variant. |
Title | Learn over Past, Evolve for Future: Forecasting Temporal Trends for Fake News Detection |
Year | 2023 |
Approach/Methodology | The paper proposes FTT (Forecasting Temporal Trends), a framework that forecasts the temporal distribution patterns of news data to guide the detector in adapting to future data distributions. |
Datasets Used | Real-world temporally split dataset |
Results/Performance Metrics | Experiments on the temporally split dataset demonstrated the superiority of the proposed framework. |
Challenges | Performance degradation due to training on past data and testing on future data caused by the temporal shift in news data. |
Key Contribution | Introduction of a framework FTT that forecasts temporal distribution patterns to improve the adaptability of fake news detectors to future data. |
Title | Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation |
Year | 2023 |
Approach/Methodology | The paper proposes a novel framework for generating training examples informed by human-authored propaganda styles and strategies. It employs self-critical sequence training guided by natural language inference to validate generated articles and integrates propaganda techniques such as appeal to authority and loaded language. |
Datasets Used | A new training dataset, PROPANEWS, with 2,256 examples. |
Results/Performance Metrics | Improved fake news detection performance, achieving a 3.62–7.69% increase in F1 score on two public datasets. |
Challenges | The gap between machine-generated fake news and human-authored disinformation, including differences in style and intent. |
Key Contribution | Development of the PROPANEWS dataset and a framework that enhances the detection of human-written disinformation by incorporating known propaganda techniques. |
Title | Zoom Out and Observe: News Environment Perception for Fake News Detection |
Year | 2022 |
Approach/Methodology | Introduced the News Environment Perception Framework (NEP), which constructs a macro and micro news environment from recent mainstream media for each post. It uses a popularity-oriented and a novelty-oriented module to capture environmental signals and improve fake news detection. |
Datasets Used | Newly built datasets for evaluating NEP. |
Results/Performance Metrics | Demonstrated that NEP efficiently improves the performance of basic fake news detectors. |
Challenges | Existing methods neglect the external news environment where fake news is created and disseminated. |
Key Contribution | The development of the NEP framework that observes the broader news environment to enhance fake news detection performance. |
Title | A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Explainable Fake News Detection |
Year | 2022 |
Approach/Methodology | The paper proposed a Coarse-to-fine Cascaded Evidence-Distillation (CofCED) neural network for explainable fake news detection. It utilized a hierarchical encoder for web text representation and developed two cascaded selectors to pick the most explainable sentences from the top-K reports in a coarse-to-fine manner. |
Datasets Used | Two explainable fake news datasets were constructed and made publicly available. |
Results/Performance Metrics | The experimental results showed that the model significantly outperformed state-of-the-art baselines and generated high-quality explanations from various evaluation perspectives. |
Challenges | Existing methods tailored automated solutions to manually fact-checked reports and therefore suffered from limited news coverage and debunking delays. |
Key Contribution | The main contribution was the CofCED neural network that reduced dependency on fact-checked reports by leveraging raw reports and providing explainable fake news detection. |
Title | Demystifying Neural Fake News via Linguistic Feature-Based Interpretation |
Year | 2022 |
Approach/Methodology | The paper conducted a feature-based study to understand the linguistic attributes most exploited by neural fake news generators. Models were trained on subsets of features and tested against increasingly advanced neural fake news to identify robust attributes. |
Datasets Used | N/A |
Results/Performance Metrics | It was found that stylistic features were the most robust in confronting neural fake news. |
Challenges | The challenge discussed was understanding how to best confront misinformation generated by advanced neural fake news models. |
Key Contribution | The main contribution was the interpretative analysis identifying stylistic features as the most robust against neural fake news. |
Title | Generalizing to the Future: Mitigating Entity Bias in Fake News Detection |
Year | 2022 |
Approach/Methodology | The paper proposed an entity debiasing framework (ENDEF) that aimed to generalize fake news detection models to future data by mitigating entity bias from a cause-effect perspective. The causal graph among entities, news contents, and news veracity was modeled, and the direct effect of entities was removed during inference to reduce bias. |
Datasets Used | English and Chinese datasets were used for offline experiments. |
Results/Performance Metrics | The framework significantly improved the performance of base fake news detectors in offline experiments, and its superiority was verified through online tests. |
Challenges | Existing methods overlooked unintended entity bias in real-world data, affecting the models' generalization ability to future data. |
Key Contribution | The main contribution was introducing the first framework to explicitly enhance the generalization ability of fake news detection models to future data by addressing entity bias. |
Title | Early Detection of Fake News with Multi-source Weak Social Supervision |
Year | 2021 |
Approach/Methodology | The paper exploited multiple weak signals from different sources of user engagements with content (referred to as weak social supervision) and their complementary utilities for detecting fake news. It used a meta-learning framework to train a fake news detector by jointly leveraging limited clean data and weak signals, estimating the quality of different weak instances. |
Datasets Used | Real-world datasets were used. |
Results/Performance Metrics | The proposed framework outperformed state-of-the-art baselines for early detection of fake news without using any user engagements during prediction. |
Challenges | State-of-the-art systems faced challenges for early detection due to the rapidly evolving nature of news events and limited annotated data. |
Key Contribution | The main contribution was the development of a framework that effectively used weak social supervision and meta-learning for early fake news detection. |
Title | Mining Dual Emotion for Fake News Detection |
Year | 2021 |
Approach/Methodology | The paper verified that dual emotion (publisher emotion and social emotion) was distinctive between fake and real news and proposed Dual Emotion Features to represent both emotions and the relationship between them for fake news detection. The proposed features were designed to be compatible with existing fake news detectors for enhancement. |
Datasets Used | Three real-world datasets, one in English and two in Chinese. |
Results/Performance Metrics | The proposed feature set outperformed state-of-the-art emotional features related to the task and improved the performance of existing fake news detectors. |
Challenges | Existing methods focused only on publisher emotions and did not consider the high-arousal emotions evoked in the crowd (social emotions). |
Key Contribution | The main contribution was the introduction of Dual Emotion Features, representing both publisher and social emotions, which enhanced fake news detection models. |
Title | Fake News Detection via NLP is Vulnerable to Adversarial Attacks |
Year | 2019 |
Approach/Methodology | The paper argued that existing models focusing solely on linguistic aspects without fact-checking were prone to misclassify fact-tampering fake news and under-written real news. It highlighted the importance of combining fact-checking with linguistic analysis and proposed a crowdsourced knowledge graph as a preliminary solution for collecting timely facts. |
Datasets Used | N/A |
Results/Performance Metrics | Experiments on Fakebox, a state-of-the-art fake news detector, demonstrated the effectiveness of fact-tampering attacks and highlighted the need for improved methodologies. |
Challenges | The main challenge discussed was the risk of misclassification by models that did not incorporate fact-checking. |
Key Contribution | The key contribution was the proposal of integrating fact-checking with linguistic analysis and the introduction of a crowdsourced knowledge graph as a potential solution for enhancing fake news detection. |
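
For a concrete picture of the rationale-gating idea in the "Bad Actor, Good Advisor" entry above, the following is a minimal sketch, not the paper's implementation: the embedding dimensions, the scalar sigmoid gate, and all variable names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_rationale_fusion(h_news, h_rationale, w_gate, b_gate=0.0):
    """Mix an LLM-generated rationale embedding into the SLM's news
    embedding, scaled by a learned gate: the SLM takes the LLM's
    advice only as far as the gate allows."""
    gate = sigmoid(np.concatenate([h_news, h_rationale]) @ w_gate + b_gate)
    return h_news + gate * h_rationale

rng = np.random.default_rng(0)
d = 8
h_news = rng.normal(size=d)        # e.g. [CLS] embedding from a fine-tuned BERT
h_rationale = rng.normal(size=d)   # embedding of the LLM's textual rationale
w_gate = rng.normal(size=2 * d)    # gate parameters (learned in practice)
fused = gated_rationale_fusion(h_news, h_rationale, w_gate)
```

In the actual ARG network the interaction is richer (per-perspective rationales with attention-based selection), but the core principle, an SLM deciding how much LLM advice to accept, is the same.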
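
The Dual Emotion Features from "Mining Dual Emotion for Fake News Detection" above also lend themselves to a short sketch. The three-way concatenation (publisher emotion, social emotion, gap) follows the paper's description; the emotion categories, the mean aggregation over comments, and all names here are illustrative assumptions.

```python
import numpy as np

def dual_emotion_features(publisher_emo, comment_emos):
    """Build a dual-emotion feature vector: publisher emotion, aggregated
    social (crowd) emotion from comments, and the gap between the two."""
    social_emo = np.mean(comment_emos, axis=0)   # average emotion over comments
    gap = publisher_emo - social_emo             # emotion-discrepancy signal
    return np.concatenate([publisher_emo, social_emo, gap])

publisher = np.array([0.1, 0.7, 0.2])            # e.g. scores for anger/joy/fear
comments = np.array([[0.6, 0.1, 0.3],
                     [0.5, 0.2, 0.3]])           # one emotion vector per comment
feat = dual_emotion_features(publisher, comments)  # shape (9,)
```

Because the result is a flat feature vector, it can simply be concatenated onto the input of an existing detector, which is how the paper achieves compatibility with prior models.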
Title | Improving Fake News Detection of Influential Domain via Domain- and Instance-Level Transfer |
---|---|
Year | 2022 |
Approach/Methodology | The research proposed a Domain- and Instance-level Transfer Framework for Fake News Detection (DITFEND). The methodology involved training a general model with data from all domains using a meta-learning perspective for coarse-grained domain-level knowledge transfer. Additionally, fine-grained instance-level knowledge was transferred by training a language model on the target domain to evaluate the transferability of each data instance in source domains and re-weight each instance's contribution. |
Datasets Used | Two datasets were used for offline experiments, though specific dataset names were not mentioned in the abstract. |
Results/Performance Metrics | Offline experiments demonstrated the effectiveness of DITFEND, and online experiments indicated additional improvements over the base models in a real-world scenario. |
Challenges | The abstract mentioned the 'seesaw problem' in multi-domain fake news detection, where improving performance in some domains can hurt the performance of others, resulting in unsatisfactory outcomes for specific domains. |
Key Contribution | The main contribution was the proposal of DITFEND, a framework designed to improve the performance of specific target domains by transferring both coarse-grained domain-level and fine-grained instance-level knowledge. |
Title | Domain Adaptive Fake News Detection via Reinforcement Learning |
Year | 2022 |
Approach/Methodology | The research introduced a novel reinforcement learning-based model called REinforced Adaptive Learning Fake News Detection (REAL-FND). This model incorporated auxiliary information, such as user comments and user-news interactions, and leveraged both cross-domain and within-domain knowledge to maintain robustness in a target domain, even when trained on a different source domain. |
Datasets Used | Real-world datasets were used, though specific dataset names were not mentioned in the abstract. |
Results/Performance Metrics | Extensive experiments demonstrated the effectiveness of REAL-FND, particularly when limited labeled data was available in the target domain. |
Challenges | The abstract highlighted the challenge of diverse news domains and the high cost of annotation, which made effective fake news detection non-trivial. |
Key Contribution | The main contribution was the development of the REAL-FND model, which incorporated auxiliary user-related information and utilized reinforcement learning for enhanced cross-domain and within-domain fake news detection. |
Title | Memory-Guided Multi-View Multi-Domain Fake News Detection |
Year | 2022 |
Approach/Methodology | The research proposed a Memory-guided Multi-view Multi-domain Fake News Detection Framework (M3FEND). This approach modeled news pieces from a multi-view perspective, including semantics, emotion, and style. A Domain Memory Bank was developed to enrich domain information and discover potential domain labels, while a Domain Adapter adaptively aggregated discriminative information from multiple views for multi-domain news detection. |
Datasets Used | English and Chinese datasets were used, but specific dataset names were not mentioned in the abstract. |
Results/Performance Metrics | Extensive offline experiments demonstrated the effectiveness of M3FEND, and online tests verified its superiority in practice. |
Challenges | The research identified two main challenges: domain shift (discrepancy among domains in terms of words, emotions, styles, etc.) and domain labeling incompleteness (real-world categorization that outputs a single domain label regardless of topic diversity). |
Key Contribution | The main contribution was the development of the M3FEND framework, which addressed domain shift and labeling incompleteness by leveraging a multi-view approach and a Domain Memory Bank to enhance cross-domain fake news detection. |
Title | Characterizing Multi-Domain False News and Underlying User Effects on Chinese Weibo |
Year | 2022 |
Approach/Methodology | The research investigated false news across nine domains on Weibo from 2009 to 2019, analyzing 44,728 posts published by 40,215 users and reposted over 3.4 million times. The study focused on the distribution, spread, and user characteristics associated with false news. |
Datasets Used | Newly collected Weibo data, comprising 44,728 posts in nine domains. |
Results/Performance Metrics | The study observed that false news in daily-life-related domains generated more posts but diffused less effectively than political news, which had the highest diffusion capacity. Widely diffused false news was strongly associated with certain user types and evoked strong emotional responses. |
Challenges | The study highlighted the importance of considering domain characteristics in false news detection systems. |
Key Contribution | The research provided insights into the patterns of false news diffusion across multiple domains, emphasizing the need for domain-specific approaches in false news detection and the potential for designing improved detection systems. |
Title | FuDFEND: Fuzzy-domain for Multi-domain Fake News Detection |
Year | 2022 |
Approach/Methodology | The research proposed a novel model, FuDFEND, which addressed limitations in existing models by introducing a fuzzy inference mechanism. The model utilized a neural network to simulate the fuzzy inference process, constructing a fuzzy domain label for each news item. This label was used by the feature extraction module to capture multi-domain features, which were then fed into a discriminator module to determine if the news was fake. |
Datasets Used | Weibo21, Thu dataset |
Results/Performance Metrics | The model outperformed models that used only a single domain label and demonstrated superior domain knowledge transfer to the Thu dataset, which lacked domain labels. |
Challenges | The paper addressed the problem of news items possessing features from multiple domains and the inability of previous models to transfer domain knowledge to datasets without domain labels. |
Key Contribution | The main contribution was the development of FuDFEND, which used a fuzzy inference mechanism to create fuzzy domain labels, enabling the extraction of multi-domain features and enhancing the model's performance and transferability. |
Title | MDFEND: Multi-domain Fake News Detection |
Year | 2021 |
Approach/Methodology | The research proposed a Multi-domain Fake News Detection Model (MDFEND) designed to handle multi-domain scenarios. MDFEND utilized a domain gate to aggregate multiple representations extracted by a mixture of experts. The approach was tested against a benchmark dataset specifically designed for multi-domain fake news detection. |
Datasets Used | Weibo21 (4,488 fake news and 4,640 real news across 9 domains) |
Results/Performance Metrics | The experiments demonstrated that MDFEND significantly improved the performance of multi-domain fake news detection. |
Challenges | The study highlighted the challenge of domain shift due to variations in data distributions such as word frequency and propagation patterns across different domains. |
Key Contribution | The main contribution was the development of MDFEND and the creation of the Weibo21 dataset, facilitating advancements in multi-domain fake news detection. |
Title | DAFD: Domain Adaptation Framework for Fake News Detection |
Year | 2021 |
Approach/Methodology | The study proposed a Domain Adaptation framework for Fake News Detection (DAFD), which used a dual strategy involving domain adaptation and adversarial training. The method aligned the data distribution of source and target domains during pre-training and generated adversarial examples in the embedding space during fine-tuning to enhance the model's generalization and robustness. |
Datasets Used | Real datasets (specific datasets were not mentioned) |
Results/Performance Metrics | The DAFD framework achieved the best performance compared to state-of-the-art methods for detecting fake news in new domains with limited labeled data. |
Challenges | The study addressed the challenge of limited labeled data in new domains and the need for models to adapt effectively to these scenarios. |
Key Contribution | The main contribution was the development of the DAFD framework, which improved the detection of fake news in new domains by leveraging domain adaptation and adversarial training. |
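
As a concrete illustration of the MDFEND entry above, the domain gate over a mixture of experts can be sketched as follows. The dimensions, the linear-plus-softmax form of the gate, and all names are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def domain_gated_aggregate(expert_outputs, domain_embedding, w_gate):
    """Aggregate expert representations with a domain gate: the domain
    embedding decides how much each expert's view contributes to the
    final news representation."""
    gate = softmax(domain_embedding @ w_gate)  # one weight per expert
    return gate @ expert_outputs               # weighted sum of expert vectors

rng = np.random.default_rng(1)
n_experts, d_domain, d_repr = 5, 4, 8
experts = rng.normal(size=(n_experts, d_repr))   # outputs of 5 expert networks
domain_emb = rng.normal(size=d_domain)           # learned embedding of the news domain
w_gate = rng.normal(size=(d_domain, n_experts))  # gate parameters (learned in practice)
rep = domain_gated_aggregate(experts, domain_emb, w_gate)  # shape (8,)
```

Because the gate weights sum to one, each domain effectively selects its own soft combination of shared experts, which is what lets one model serve many domains.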
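
The adversarial-examples-in-embedding-space idea in the DAFD entry is commonly realized with an FGSM-style perturbation; whether DAFD uses exactly this form is an assumption here, but the sketch shows the mechanism.

```python
import numpy as np

def fgsm_embedding_perturb(embedding, grad, epsilon=0.1):
    """Perturb an input embedding in the direction that increases the
    loss (sign of the gradient), producing an adversarial example for
    robust fine-tuning."""
    return embedding + epsilon * np.sign(grad)

emb = np.array([0.5, -0.2, 0.0, 1.0])    # a news embedding (illustrative values)
grad = np.array([1.0, -2.0, 0.0, 3.0])   # gradient of the loss w.r.t. the embedding
adv = fgsm_embedding_perturb(emb, grad, epsilon=0.1)
# adv == [0.6, -0.3, 0.0, 1.1]
```

Training on such perturbed embeddings alongside clean ones encourages the detector to be stable under small input shifts, which is the robustness property the framework targets for new domains.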