More Likes, More Truths? (Research/Data Analysis)
Investigating Political News Popularity and Reliability through Facebook
Coupling our group’s collective interest in political fake news with our generation’s obsession with viral Facebook posts, we decided to investigate the correlation between “viralness” and truthfulness of Facebook political news posts. We investigate this with an open dataset from Kaggle on Facebook political news posts, running statistical tests on it as well as drawing other insights from it qualitatively. But first, let’s dive into what makes certain fake political news likable.
Qualitative Analysis of Data
September 2018 - December 2018 (3 months)
What is the correlation between popularity and factual correctness of Facebook political news posts?
Investigation into Russian Political News
One common fake news tactic, which the campaign StopFake targets and points out, is the misrepresentation of images and quotes (Haigh, 2018). Let’s take a look at an example. In light of recent tensions between Russia and Ukraine, we found that the Russian government has been employing this tactic and others to justify its annexation of Crimea, notably through the state-owned Channel One network. With regards to the Russian government’s methods, “fake news often takes the form of propaganda entertainment (kompromat), which is a combination of scandalous material, blame and denunciations, dramatic music and misleading images taken out of context” (Oates, 2014). For instance, the Russian government labelled the Ukrainian army “fascists” and spread the inaccurate story of a young woman fleeing from the Ukrainian army (Cottiero, 2015). With regards to blame, the news also presented “new proof of the responsibility of the Ukrainian forces for shooting down the Malaysian Airlines flight MH17” (Irina, 2016).
With regards to popularity, “the aggressive media campaign has been effective in that approximately 70 percent of Russian viewers believe that the events in Ukraine are covered by the government-owned channels truthfully and without bias” (Khaldarova, 2016). Given that, we came up with our hypothesis.
We hypothesize that there is a correlation between the popularity and the factual reliability of Facebook political news posts: the more viral a post is, the less factual it is.
The dataset, titled “Fact-Checking Facebook Politics Pages,” was retrieved from Kaggle. It contains a wide range of articles posted on Facebook by pages that are mainstream, right-leaning, or left-leaning, with truthfulness ratings of ‘no factual content’, ‘mostly false’, ‘mixture of true and false’, and ‘mostly true’. It also includes the number of comments, shares, and reactions that each post received.
We decided to run a two-sample t-test, so that we could compare the mean user interaction between posts that are factually correct and posts that are considered “fake,” and then determine whether the difference between those means is statistically significant, and in which direction.
In order to run the t-test as accurately as possible, we first performed some data clean-up by eliminating null values. If a row contained a null value in one column, it was likely to contain nulls in many columns, so simply dropping these observations was likely to yield more accurate tests. Also, to run a two-sample t-test, we simplified by combining the ratings ‘no factual content’, ‘mostly false’, and ‘mixture of true and false’ into a single ‘Fake News’ category so we could compare more easily. Lastly, we normalized our data to account for the varying number of followers each page had: we computed an additional feature, “interaction score,” defined as the sum of each article’s shares, comments, and reactions divided by the number of followers of the page that posted it.
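The clean-up steps above can be sketched in pandas roughly as follows. Note that the column names here (“Rating”, “share_count”, “comment_count”, “reaction_count”, “page_followers”) are illustrative assumptions, not necessarily the exact schema of the Kaggle file:

```python
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the raw posts table and add the interaction score.
    Column names are assumed for illustration."""
    # Drop rows with any nulls; in this dataset, a row with one null
    # tends to have several, so dropping whole rows is safer.
    df = df.dropna().copy()

    # Collapse the four truthfulness ratings into a binary label.
    fake = {"no factual content", "mostly false", "mixture of true and false"}
    df["label"] = df["Rating"].str.lower().map(
        lambda r: "Fake News" if r in fake else "Mostly True"
    )

    # Normalize engagement by page size: interaction score is
    # (shares + comments + reactions) / followers of the posting page.
    df["interaction_score"] = (
        df["share_count"] + df["comment_count"] + df["reaction_count"]
    ) / df["page_followers"]
    return df
```

Normalizing by follower count matters here because a raw share count mostly reflects how large a page’s audience is, not how engaging an individual post was.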
Running our two-sample t-test, we received a p-value of .0000012, meaning that if the two groups truly had the same mean interaction, a difference in means as large as ours would occur with probability of roughly one ten-thousandth of a percent. This allows us to reject the null hypothesis that the means are equal, so our hypothesis is confirmed. Or is it?
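A minimal sketch of this test, assuming a frame with the “label” and “interaction_score” columns described above (Welch’s variant is used here since the two groups need not share a variance; whether our original analysis assumed equal variances is not recorded):

```python
import pandas as pd
from scipy import stats

def compare_groups(df: pd.DataFrame):
    """Two-sample t-test of mean interaction score, fake vs. true posts."""
    fake = df.loc[df["label"] == "Fake News", "interaction_score"]
    true_ = df.loc[df["label"] == "Mostly True", "interaction_score"]
    # equal_var=False gives Welch's t-test, robust to unequal variances.
    t_stat, p_value = stats.ttest_ind(fake, true_, equal_var=False)
    return t_stat, p_value
```

The sign of the t-statistic tells us the direction of the difference, which is exactly what the p-value alone cannot.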
Upon first glance at the dataset, we found that most posts contained content that was “mostly true” (Figure 1). This seems to go against our hypothesis as well. However, we later found that the most popular posts contained “a mixture of true and false” or “mostly false” content (Figure 2). Diving deeper, posts with “no factual content” got little exposure, and popular news pages like CNN mostly posted “mostly true” content with occasional posts of “no factual content” (Figure 3). There seems to be high demand for this mixed approach leaning toward factual content, since posts with “no factual content” got significantly less exposure than posts with “mostly true” content.
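The per-rating breakdown behind Figures 1 and 2 can be reproduced with a simple group-by, again assuming the “Rating” and “interaction_score” columns from the sketches above:

```python
import pandas as pd

def rating_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Post counts and mean interaction score per truthfulness rating,
    sorted from most to least popular."""
    return (
        df.groupby("Rating")["interaction_score"]
          .agg(posts="count", mean_score="mean")
          .sort_values("mean_score", ascending=False)
    )
```

A table like this makes the mismatch visible at a glance: the rating with the most posts need not be the rating with the highest mean interaction.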
Our initial literature research on the tactics and effectiveness of Russian fake political news led us to believe that the less factually correct a news post is, the more popular it would be. However, upon further investigation supported by the data, we found that not to be the case. A mixed approach involving both factual and false content seems to garner more attention, at least for Facebook political news posts, leading us to reject our hypothesis. If anything, our research highlights the importance of interpreting information found on the web carefully, as it is rarely as simple as “factual” or “false” content, but more often a mix. We believe that further research into specific fake political news techniques would help people better understand the information they find on the web, and would follow naturally from our work.