We note that the initial foundation for this research is established in ref. 21, which analysed the news media diffusion dynamics on Twitter during the 2016 US presidential election. We harness part of the data used in that article and follow its relevant methodology to identify and classify influencers in the 2020 US election data. Additionally, following an editorial request added to the reviews of this article, we anonymized all Twitter usernames of personal accounts in both the main manuscript and the Supplementary Materials. Specifically, if the username being presented does not represent an established major news organization that is verified on Twitter, that username is replaced with an alias. This alias consists of two parts: affiliation and year of relevance. A user’s affiliation can be with the media, US politics or personal (see the News media influencers section for more information on how we define affiliations). The personal affiliation is also split into ‘individual’ and ‘other’ labels, with the former representing no official affiliation with media or politics, and the latter representing a lack of information required to make a distinction. All affiliation labels are shortened to their first five letters in the alias. Year of relevance is determined as being in the top 100 list of influencers for 2016, 2020 or both. See the Twitter retweet networks section for more details on influencers and our influencer identification algorithm. So, a politically affiliated user that was influential only in 2016 will have an alias of ‘Polit_2016’.
News media on Twitter in 2016 and 2020
We tracked the spread of political news on Twitter in 2016 and 2020 by analysing two datasets containing tweets posted between 1 June and election day (8 November in 2016 and 2 November in 2020). The data were collected continuously using the Twitter search API, with the names of the two presidential candidates in each election as keywords. Relying instead on more specific keywords that target particular media outlets, or on hashtags tied to particular news events, could miss election-related tweets that do not mention those outlets or events.
The 2016 dataset contains 171 million tweets sent by 11 million users and was used in refs. 13,21 to assess the influence of disinformation on Twitter in 2016. The 2020 dataset contains 702 million tweets sent by 20 million users. Hence, we observe a near doubling of the number of Twitter users involved in spreading political news in 2020 compared with 2016.
At the time we collected our data, the statistical analyses of the raw collected data were limited because the data collection process designed by Twitter itself has been shown to have sampling issues. For instance, the probability of non-responses from API queries is not provided by Twitter, and Twitter has acknowledged that the 100% firehose is not actually a 100% sample, the 10% is not a randomly distributed 10% and the 1% is not a randomly distributed 1%. Thus, standard sampling methods are difficult to apply to the collected Twitter data. However, for the goals of our article, this is our best option as there are no other large-scale, comprehensive datasets available for both the 2016 and 2020 US elections that are readily accessible to us.
The classifications of news media websites presented below and used here, including ‘fake’, ‘extremely biased’, ‘left’ and ‘right’, and especially the boundaries between categories, are a matter of opinion rather than a statement of fact. We use terms ‘left’ and ‘right’ for political leanings that are often referred to as ‘liberal’ and ‘conservative’ on the US political ideology spectrum. The categorizations and labels assigned to the corresponding classes and used here originated in publicly available datasets from fact-checking and bias rating organizations, which are credited below. The classifications of political views and the related conclusions contained in this article should not be interpreted as representing opinions of the authors or their funders.
For each tweet containing a URL link, we extracted the domain name of the URL (for example, www.cnn.com) and classified each link directing to a news media outlet according to this outlet’s political bias. The 2016 and 2020 classifications rely on the website allsides.com (AS), followed by the bias classification from the website mediabiasfactcheck.com (MBFC) for outlets not listed in AS (both accessed on 7 January 2021 for the 2020 classification). We classified URL links for outlets that mostly conform to professional standards of fact-based journalism in five news media categories: right, right leaning, centre, left leaning and left. We also add three news media categories for outlets that tend to disseminate disinformation: extreme bias right, extreme bias left and fake news. Websites in the fake news category have been flagged by fact-checking organizations as spreading fabricated news or conspiracy theories, while websites in the extremely biased category have been flagged for reporting controversial information that distorts facts and may rely on propaganda, decontextualized information or opinions misrepresented as facts. A detailed explanation of the methodologies used by AS and MBFC for rating news outlets and of the differences in classification between 2016 and 2020 is given in the Methods. The full lists of outlets in each category in 2016 and 2020 are given in Supplementary Tables 1 and 2. In the 2016 dataset, 30.7 million tweets, sent by 2.3 million users, contain a URL directed to a media outlet website. The 2020 dataset contains 72.7 million tweets with news links, sent by 3.7 million users. These counts show that the share of tweets containing a news media link dropped from 18% of all tweets in 2016 to 10% in 2020.
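The link classification described above can be sketched as a two-stage lookup: extract the domain, consult the AS ratings first and fall back to MBFC only for outlets that AS does not list. This is a minimal illustration, not the authors' pipeline. The function names and the `.example` domains are hypothetical, and the two dictionaries are tiny stand-ins for the full lists in Supplementary Tables 1 and 2.

```python
from urllib.parse import urlparse

# Hypothetical excerpts of the rating tables. The cnn.com and thehill.com
# labels follow the 2020 categories mentioned in the text; the ".example"
# domains are invented purely to show the fallback order.
ALLSIDES = {"cnn.com": "left leaning", "thehill.com": "centre", "both.example": "centre"}
MBFC = {"both.example": "right", "only-mbfc.example": "right"}


def extract_domain(url):
    """Return the lower-cased host of a URL, without port or leading 'www.'.

    Assumes the URL includes a scheme (http:// or https://); without one,
    urlparse leaves the host empty.
    """
    host = urlparse(url).netloc.lower().split(":")[0]
    return host[4:] if host.startswith("www.") else host


def classify_link(url):
    """Return the news media category of a link, or None if the outlet is unrated."""
    domain = extract_domain(url)
    if domain in ALLSIDES:      # AS takes precedence
        return ALLSIDES[domain]
    return MBFC.get(domain)     # MBFC only for outlets not listed in AS


print(classify_link("https://www.cnn.com/2020/politics/story"))
```

Links whose domain appears in neither table return `None` and would simply not count towards any news media category.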
The proportions of tweets and users who sent a tweet in each of the news media categories are shown in Fig. 1a,b along with other statistics about the activity of users in each category. The raw numbers used to generate this figure are shown in Supplementary Table 3. Importantly, they demonstrate that the fraction of tweets in the fake and extremely biased category (representing outlets that were most susceptible to sharing disinformation) decreased from 10% to 6% for fake news and from 13% to 6% for extreme bias right news. The fraction of users who shared those tweets also decreased for extreme bias right news (from 6 to 3%) but not for fake news (which remained at 3%). However, the total number of tweets and users increased over the same period by 411 and 80%, respectively. In short, between 2016 and 2020, the numbers of tweets and users grew at a rate in the range of 80 to 246% for all categories, except the number of users who shared extreme bias right news, which declined by 10%.
a,b, The fraction of tweets (a) and users (b) that sent tweets with a URL pointing to a website belonging to one of the categories. Solid coloured bars show fractions for the 2016 election, while striped bars represent the corresponding fractions from the 2020 election. Users are classified as being in the category in which they posted the most links. c,d, The fractions of links across categories as a function of the users’ main categories, for those users that have at least two links classified, in 2016 (c) and 2020 (d).
The fraction of tweets in the extreme bias left category was only 2% in 2016 and it dropped to a mere 0.05% in 2020. The number of tweets in this category also dropped. The fraction of tweets in the centre category also decreased, from 21 to 10%, but the number of tweets increased dramatically. By contrast, the fraction of left-leaning tweets increased from 24 to 45%, while the fraction of right-leaning tweets increased from 3 to 6%.
The shift away from the centre may indicate increasing ideological polarization among both users and media outlets. However, most of the decrease in the centre category’s share reflects the reclassification of cnn.com, which AS categorized as centre in 2016 and as left leaning in 2020. In 2020, CNN accounted for more than twice as many tweets as the top outlet of the centre category in that year (thehill.com) (Supplementary Table 2).
Figure 1c,d shows the fraction of URLs for all categories as a function of a user’s modal category for users that posted at least two links in our datasets. The analysis reveals two clusters in 2016 and 2020, one with categories from the right (right leaning, right, fake news and extreme bias) and a second cluster with categories from the centre and left (centre, left leaning and left). These two clusters can be interpreted as two echo chambers in terms of a separation in news consumption. Asymmetrical patterns in Fig. 1c,d above and under the diagonal reveal that users in the right-wing echo chamber also link to an extremely limited number of left-wing media outlets. The users in the left-wing echo chamber link to right-wing media in an even more limited way. This is consistent with asymmetry between left-leaning and right-leaning users in social media observed in previous studies21,25,35,45.
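As a rough illustration of the per-user quantities behind Fig. 1c,d, the sketch below assigns a user to the category in which they posted the most links and computes that user's fraction of links per category. The function names are hypothetical, and the tie-breaking rule (first category encountered wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def modal_category(link_categories, min_links=2):
    """Category in which a user posted the most classified links.

    Returns None for users with fewer than `min_links` classified links,
    mirroring the 'at least two links' filter used for Fig. 1c,d.
    Ties are broken by first occurrence (an assumption).
    """
    if len(link_categories) < min_links:
        return None
    return Counter(link_categories).most_common(1)[0][0]


def category_fractions(link_categories):
    """Fraction of a user's classified links that fall in each category."""
    counts = Counter(link_categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


links = ["left", "left", "centre", "right"]
print(modal_category(links), category_fractions(links))
```

Aggregating these fractions over all users that share a modal category gives one row of the matrices plotted in Fig. 1c,d; whether links are pooled or averaged per user is not stated here, so that step is left out.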
To estimate the volume of tweets sent from automated accounts such as bots, we counted the number of tweets sent from unofficial Twitter clients, that is, clients other than the Twitter Web client, Android client, iPhone client or other official clients. Unofficial clients comprise a variety of applications used to automate all or part of an account’s activity, such as third-party applications typically used by brands and professionals (for example, SocialFlow or Hootsuite) and bots created with malicious intentions13. There is no fast and precise method for bot detection, and the sheer size of our datasets prevented us from using complicated methods, which often rely on natural language processing, machine learning classifiers and similar techniques46. Filtering by unofficial clients provides a simple alternative that meets the baseline requirements for our analyses. Accounts using unofficial clients are removed only during the polarization analysis presented later in the article. For all other analyses, these accounts are kept, as they affect the patterns of information diffusion that we are analysing.
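A minimal sketch of this client-based filter follows. It assumes each tweet record carries a `source` field naming the posting client. The set of official client labels is illustrative only and is not the list used in the article.

```python
# Assumed labels for official clients; the article's exact list is not given.
OFFICIAL_CLIENTS = {"Twitter Web Client", "Twitter for Android", "Twitter for iPhone"}


def is_unofficial(tweet):
    """True if the tweet was posted from a client outside the official set."""
    return tweet["source"] not in OFFICIAL_CLIENTS


def fraction_unofficial(tweets):
    """Share of tweets sent from unofficial clients (0.0 for an empty list)."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if is_unofficial(t)) / len(tweets)


sample = [
    {"id": 1, "source": "Twitter for iPhone"},
    {"id": 2, "source": "Hootsuite"},
    {"id": 3, "source": "Twitter Web Client"},
    {"id": 4, "source": "Twitter for Android"},
]
print(fraction_unofficial(sample))
```

The same predicate can be reused to drop retweets from unofficial clients before the polarization analysis while leaving them in place for the diffusion analyses.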
The overall fraction of tweets sent from unofficial clients was 8% in 2016, but this had dropped to 1% in 2020. A similar drop over the same period was observed in the average activity of their users (Supplementary Table 3). This decrease, and the proportional decrease of extremely biased and fake news, could be attributed in part to measures taken by Twitter to limit the virality of disinformation. As mentioned above, the relative volume of tweets linking to disinformation websites dropped by a half in 2020 compared to 2016, and the fraction of users sharing fake news decreased even more substantially (Fig. 1a,b and Supplementary Table 3).
To understand how users shifted between categories from 2016 to 2020, we tracked users that were active during both election years (14% of the users present in 2020) and classified each of them into the category in which they posted the most tweets in each year. Figure 2 shows the resulting shifts. The two largest shifts involve users in the centre and left news categories in 2016 who moved to the left-leaning category in 2020 because AS changed its ratings. These rating changes moved the three most widely shared news outlets (The New York Times, Washington Post and CNN) into the left-leaning category and made it the largest category in 2020 (Supplementary Table 2). We also observe a large fraction of users in the fake and extremely biased news categories in 2016 that moved to the right news category in 2020. However, these user shifts also reflect the change in the classification of media outlets from 2016 to 2020. For this reason, we also infer the ideological position of Twitter users without relying on the news outlet classification (see the subsection Polarization among Twitter users) and show that the resulting positions are highly correlated with the users’ positions computed using the news categories in which they posted.
The relative size of each category in 2016 corresponds to the ratio of the numbers of unique users in this category to the left category (Fig. 1). The shifts among categories over time are proportional to the fraction of users that were classified in 2016 and in 2020 in the two involved categories. Overall, 14% of the users present in 2020 persisted from the 2016 dataset.
To capture the dynamics of information diffusion, we reconstruct retweet networks corresponding to each news media category. We add a link (a directed edge) going from node v to node u in the news network when user u retweets the tweets of user v ≠ u containing a URL linking to a website belonging to one of the news media categories. Only one such link is created regardless of the number of tweets retweeted by u. With our convention, the direction of the link represents the direction of the influence propagation between Twitter users.
A 2015 study by Metaxas et al.47 found that ‘retweeting indicates not only interest in a message, but also trust in the message and the originator, and agreement with the message contents’. Retweeting does not explicitly signal support for the retweeted content (a user who almost always retweets CNN might occasionally retweet Fox News). However, a retweet gives the user no way to remark on their reasons for propagating the content, whereas quoting and replying do allow such remarks and so are much better suited to forwarding news without endorsing it. Accordingly, we assume that most users agree with and are influenced by the information they are propagating through retweets.
Within a network, the in-degree of a node is the number of links that point inward to the node and the out-degree is the number of links that originate at a node and point outward to other nodes. A retweet originates at the node that posted the original tweet, not at the node that posts the retweet (indicating the flow of influence in the direction of the retweeter). Thus, for our retweet networks, the in-degree of a user is equal to the number of users they retweeted at least once and their out-degree is the number of users who have retweeted them at least once. The higher a node’s out-degree, the greater its local influence. The characteristics of the retweet networks are shown in Supplementary Table 4.
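The edge convention and the resulting degrees can be made concrete with a short sketch. The input format (pairs of retweeter and original author) and the function names are assumptions made for illustration. The direction of the edges and the single-edge rule follow the description above.

```python
from collections import defaultdict


def build_retweet_network(retweets):
    """Build the set of directed edges v -> u from (retweeter u, author v) pairs.

    An edge goes from the original author v to the retweeter u, so it points
    in the direction of influence. Repeated retweets collapse into a single
    edge, and self-retweets (u == v) are ignored.
    """
    return {(v, u) for u, v in retweets if u != v}


def degrees(edges):
    """Return (in_degree, out_degree) dictionaries for a set of edges v -> u."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for v, u in edges:
        out_deg[v] += 1   # v was retweeted by one more distinct user
        in_deg[u] += 1    # u retweeted one more distinct user
    return dict(in_deg), dict(out_deg)


retweets = [("bob", "cnn"), ("bob", "cnn"), ("alice", "cnn"),
            ("alice", "bob"), ("cnn", "cnn")]
in_deg, out_deg = degrees(build_retweet_network(retweets))
print(in_deg, out_deg)
```

In this toy data the account `cnn` has out-degree 2 because two distinct users retweeted it, even though one of them did so twice.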
We then use an algorithm to find the best spreaders of news media information within each network, that is, the influencers of the corresponding news media category. An alternative of finding the ‘most influential users overall’ through extracting influencers from the retweet networks of all users would result in a list of influencers dominated by left-leaning and centre-biased influencers while other news media bias categories would be underrepresented (see Supplementary Fig. 1, which shows the top overall influencers and their political alignments for both 2016 and 2020). This imbalance would understate the impact of these influencers on polarization between the two election years. Hence, we extract the top influencers from the retweet networks of each news media category to present an accurate representation of critical influencers from the different news media categories. As mentioned earlier, our work builds on and uses some of the results of the 2016 US election from ref. 21, which identifies influencers using the Collective Influence (CI) approach48. To ensure consistency of results, we too use CI to find influencers in the 2020 data.
News media influencers
The CI heuristic identifies and ranks influencers in 2016 and 2020 datasets and assigns to each influencer a value CIout that represents the strength of influence it exerts. The influencers identified from these networks only pertain to Twitter accounts who disseminate content by providing links to external sources. We compare the rankings of the influencers extracted from the 2020 network with the rankings of the influencers previously extracted by CI from the 2016 network21. A selection of 87 influencers (limited to officially recognized major news organizations that are verified on Twitter as per our disclaimer at the beginning of this section) and their rankings with their corresponding news media categories are shown in Supplementary Table 5.
For the remainder of the article, we use the top influencers extracted from the fake, extreme bias right, right, right-leaning, centre, left-leaning and left news media categories. However, we do not include any influencers from the extreme bias left news media category. According to Supplementary Table 4, this category is sparse and disconnected, with very few users compared to the users’ populations in the other networks. Our goal is to extract influencers that are highly relevant to the dissemination of information on Twitter across the different news media categories. However, we find that the influencers extracted from the extreme bias left category have an extremely low standing in the Twitter community compared to influencers extracted from other categories. For example, the 25th most influential user of the extreme bias left category has about 100 followers, while the 25th most influential user of the left category has over one million. Hence, keeping the extreme bias left category exaggerates the importance of these influencers and diminishes the importance of influencers in other categories. Consequently, we exclude the extreme bias left category from the analyses that follow.
Analysis of our lists of the top influencers in 2016 reveals that traditional news influencers were mostly journalists with verified Twitter accounts linked to traditional news media outlets. By contrast, fake and extremely biased news also contains influencers whose accounts are unverified or deleted, with deceptive profiles and much shorter lifespans on Twitter than traditional media influencers (see Supplementary Figs. 2 and 3 and supporting data in Supplementary Table 6 for details on the proportional shift of users to and from inactivity between election years). However, some of these influencers, despite their unknown, non-public nature, still played an important role in the diffusion of disinformation and information on Twitter21. There has been a substantial increase in deleted influencer accounts spreading fake news, from two in the top 25 in 2016 to eight in 2020. Also, the extreme bias right news, which in 2020 consisted primarily of verified influencers, grew from 15 in the top 25 in 2016 to 23 in 2020. We also found that among the top 100 influencers from each news media category in 2020, there was a 29% retention rate of influencers persisting from 2016. Furthermore, for the top 25 influencers from each of these categories, we find the retention rate to be 36%. Meanwhile, as noted earlier, the rate of retention between 2016 and 2020 for the average 2016 user was 14%. The increase in retention rate between the average user and the top 25 influencers is 157%, indicating that the more influential a user, the higher their retention rate.
Using a manual labelling process (see Methods for details), we label the top 25 influencers of each news media category in 2016 and 2020 as affiliated with media or political organizations, or unaffiliated, to observe the makeup of influencer types for these labels. Here, we define an influencer’s ‘affiliation’ with media or politics as their primary job, or other direct connection from which they received periodic financial support. Or, if the influencer is an organizational entity, this classification indicates that this is a legally recognized company. Subsequently, an affiliation indicates if the influencer is either a professional or a legal company outside Twitter.
An influencer affiliated with a media organization could be a media company or official media outlet, or an established writer, reporter or paid consultant. An influencer affiliated with a political party could be a politician, a political campaign platform or an affiliate of the platform, or someone who officially represents an aspect of US politics. We also split the unaffiliated label into two subclassifications: independent and ‘other’. An independent influencer is not officially affiliated with any media or political platforms. The ‘other’ label represents influencers whose accounts have no description or context that could be used to identify them. We kept these affiliation labels general so that they capture a range of affiliations with media and politics. This generality also prevents overcategorization of influencers and the creation of categorization exceptions.
The fractions of influencers within these affiliation labels are shown in Fig. 3. The results reveal that unaffiliated influencers are more common in the fake and extreme bias news media categories, while affiliated influencers are more common in the other news categories. A similar trend is evident in the fractions of verified and unverified influencers found in these categories, as fake and extreme bias news categories contain fewer verified influencers. In addition, media-affiliated influencers have a greater presence in the left, left-leaning and centre news categories compared with their counterparts.
Influencers are classified as affiliated with a media organization, political organization, independent or other (for example, unidentified).
Interestingly, the number of media-affiliated influencers within most of the news media categories decreased from 2016 to 2020. The exceptions are the extreme bias right and fake news categories, in which the number of media-affiliated influencers increased. Also, the extreme bias right category had increased numbers of politically affiliated influencers. This indicates a shift in polarization of influencers affiliated with right-biased political and media organizations towards the extreme bias right and fake news, as well as the emergence of news media-affiliated influencers in these categories. We discuss these changes in polarization in more detail below.
In addition to changes in affiliations from 2016 to 2020, we observe a substantial reshuffle of the ranking of influencers. Figure 4 shows the change in rankings of the top 10 influencers in left and left-leaning, right and right-leaning, and extreme bias right and fake news categories. The ranking reshuffle in the centre news category is shown in Supplementary Fig. 4.
Influencers ranked in the top 10 in at least one news media category in 2016 or 2020 are shown. The 2016 rankings are displayed to the left of the username or alias, with 2020 rankings listed on the right. For each user only one shift is shown. Its colour changes from the user’s highest ranked news media category in 2016 to that in 2020. Each panel shows the change over time between two news media categories.
The comparison reveals several interesting changes between 2016 and 2020. First, we see that highly influential users rise from obscurity. Across all categories, a set of previously unranked or very low-ranked users break into the top 10 rankings. Considering all unique users in the top 25 influential users (from all categories of news media), 58% came from outside the top 100 influential users in 2016. However, most of these newly influential users are related in some way to media or political organizations, while 28% of these new influencers are independent.
Observing the change in rankings by news media category, we see that right and right-leaning, and extreme bias right and fake news categories have a substantially higher fraction of the top 10 influencers who were previously outside the top 50, compared with the change in rankings among the groups in left and left-leaning news categories. All categories show a large number of influencers falling out of the top 50 from 2016 to 2020, and in the case of the left news influencers, we see their former positions filled by users who were much less influential in 2016. The influencers with extreme bias right and fake news affiliations show the most volatility in retaining the top 10 influencer positions, with many top 10 influencers in 2016 ranked below 50 in 2020 (or even banned from Twitter).
The change of classification of some news media outlets is also reflected in the category shifts of their Twitter accounts. In particular, the first- and third-highest ranked influencers in the centre category (@CNN and @politico) in 2016 shifted to left leaning. Such shifts of large and influential media influencers from news categories indicate the increased content polarization on Twitter. A shift of media-affiliated influencers from the right to the extreme bias right is also visible (for example, @DailyMail and @JudicialWatch), as is the emergence of new media-affiliated influencers in these categories (for example, @newsmax and @OANN). In contrast with the shift to the extremes among large media influencers, the centre rankings remained mostly consistent between 2016 and 2020 (Supplementary Fig. 4). Some new users rose from low ranks to fill in the gaps, including the winner of the 2020 US presidential election, but only one user dropped out of the top 50 entirely, and the remaining shifts are internal to these top-ranked users.
Polarization among Twitter users
The evolution of influencers from different news media categories (Figs. 1 and 2) suggests an increased polarization in the relations among influencers between 2016 and 2020. Here we broaden the scope of polarization analysis to the Twitter users who are consuming and retweeting the influencers’ content. For the 2016 and 2020 data, we consider a set of the top 100 influencers from each news media category. To avoid polarization changes caused by the varying composition of the set of influencers, we filter these sets to contain only influencers that were present in both the 2016 and 2020 CI rankings. For 2016 and 2020, the final set sizes are 505 and 548 influencers, respectively. For this analysis we use all the retweets in our datasets, not only those containing a link to a news outlet, but remove the retweets sent from unofficial Twitter clients.
With influencers as nodes, we create two fully connected similarity networks derived from the 2016 and 2020 Twitter networks, respectively. The weight of an edge between any two influencers in these networks represents the similarity between the retweeters that propagate the content of these influencers (see Methods for more details). Any edges with a weight of 0 are removed. The distributions of the similarity values for both networks, as well as their degree distributions, are shown in Supplementary Fig. 5a,c. In both similarity networks, a community detection algorithm found two communities. One contained influencers affiliated with news media in the centre, left-leaning and left news categories, while the other contained those affiliated with news media in the right-leaning, right and fake news categories. This indicates that influencers separate their user bases according to the content they generate.
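To make the construction concrete, the sketch below builds such a network from each influencer's set of retweeters. The similarity measure used in the article is defined in the Methods and is not reproduced here. Jaccard similarity is swapped in purely as a simple, commonly used stand-in, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(retweeters):
    """Weighted undirected edges between influencers.

    `retweeters` maps each influencer to the set of users who retweeted them.
    Every pair is compared, and edges with zero weight are dropped, as in the
    text. Jaccard similarity is an assumption, not the article's measure.
    """
    edges = {}
    for i, j in combinations(sorted(retweeters), 2):
        weight = jaccard(retweeters[i], retweeters[j])
        if weight > 0:
            edges[(i, j)] = weight
    return edges


toy = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {9}}
print(similarity_network(toy))
```

Influencer `C` shares no retweeters with the others, so all of its edges have weight 0 and are removed.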
Figure 5 illustrates this separation, showing subsampled similarity networks of the 25 most influential nodes for each news media category for 2016 (left panel) and 2020 (right panel). Using force-directed network layouts driven by the weighted similarity edges, we visualize for both years the two formed communities, one consisting of the right-biased and fake news influencers and the other the left-biased influencers. These communities form an echo chamber motif like the one seen in Fig. 1c,d for the analysis of the fractions of URLs in all news media categories.
Node size is proportional to the node’s degree in its respective network. Node colour indicates which news media category the node spreads. Nodes that spread information from more than one category are represented as pie charts, where the size of each slice is proportional to the node’s CI score within that respective news media category. An edge between each pair of influencers is weighted by the similarity between the retweeters of those influencers. Both networks are visualized using a force-directed node layout, with the strength of the force defined by the weights of the edges. Since these are complete networks, they are sparsified for visualization purposes, with only each node’s five strongest edges (at most) visible. Each of the two networks has 405 visible edges in total. Visible intercommunity edges are coloured purple, while intracommunity edges are orange. The distribution of the similarity values, and the degree distributions, for the visible edges of these two networks are shown in Supplementary Fig. 5b,d. Text to the side of each of the two networks shows the top five users of each news media category. A green number to the left of each user corresponds to a labelled node in the network, showing the location of that top influencer. The purple numbers in the 2020 tables indicate the user’s 2016 rank in that category. Users ranked in the top 25 for multiple news media categories have coloured superscripts, indicating the rank and media classification of their other top five positions.
Visually, Fig. 5 suggests a loss of intercommunity connections and increased density of intracommunity links from 2016 to 2020. We probe if these changes are reflected in the communities arising in the two main similarity networks containing the nodes from the influencer sets with the top 100 influencers of each news media category. We found that the trends persist, with community separation in the 2020 network increasing compared to the separation of communities in the 2016 network, as measured by modularity and the normalized cut between communities (see Methods for details).
The modularity for the 2020 network was 0.465 with a 95% confidence interval (CI95) of (0.454, 0.475), versus 0.401 with a CI95 of (0.392, 0.409) in 2016, indicating more closely knit communities in 2020, with stronger in-community ties and weaker between-community ties. This trend agrees with the changes of the average normalized cut, which decreased from 0.285 with a CI95 of (0.232, 0.339) in 2016 to 0.052 with a CI95 of (0.046, 0.058) in 2020. Both results show a much stronger separation of the two clusters in the later election and suggest a fundamental shift in retweeting behaviour. Between 2016 and 2020, users became even more likely to disseminate content from influencers with similar biases and less likely to spread content from influencers with opposing biases, effectively reducing cross-bias encounters and discourse. In addition, we also computed the above metrics on networks generated from user quote similarity to confirm that retweets are the strongest form of endorsement of influencer content (Supplementary Table 7). We also report the modularity and normalized cut for the subsampled networks of Fig. 5 in Supplementary Table 8, which reinforces the trend observed in the results above.
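For readers unfamiliar with the two separation measures, the following sketch computes weighted modularity and a two-way normalized cut on a toy network. It uses the standard textbook definitions: modularity sums, over communities, the within-community weight fraction minus the squared volume fraction, and the normalized cut of communities A and B is cut/vol(A) + cut/vol(B). The exact variant used in the article (an averaged normalized cut, see Methods) may be normalized differently, and the function names are hypothetical.

```python
def weighted_degrees(edges):
    """Sum of incident edge weights for each node of an undirected graph."""
    deg = {}
    for (i, j), w in edges.items():
        deg[i] = deg.get(i, 0.0) + w
        deg[j] = deg.get(j, 0.0) + w
    return deg


def modularity(edges, community):
    """Weighted modularity of a partition given as a node -> community mapping."""
    total = sum(edges.values())
    inside, volume = {}, {}
    for (i, j), w in edges.items():
        if community[i] == community[j]:
            inside[community[i]] = inside.get(community[i], 0.0) + w
    for node, d in weighted_degrees(edges).items():
        volume[community[node]] = volume.get(community[node], 0.0) + d
    return sum(inside.get(c, 0.0) / total - (vol / (2 * total)) ** 2
               for c, vol in volume.items())


def normalized_cut(edges, community, a, b):
    """Standard normalized cut between communities a and b (lower = more separated)."""
    deg = weighted_degrees(edges)
    cut = sum(w for (i, j), w in edges.items()
              if {community[i], community[j]} == {a, b})
    vol_a = sum(d for n, d in deg.items() if community[n] == a)
    vol_b = sum(d for n, d in deg.items() if community[n] == b)
    return cut / vol_a + cut / vol_b


toy_edges = {("a", "b"): 1.0, ("c", "d"): 1.0, ("b", "c"): 0.5}
toy_comm = {"a": "L", "b": "L", "c": "R", "d": "R"}
print(modularity(toy_edges, toy_comm), normalized_cut(toy_edges, toy_comm, "L", "R"))
```

Weakening the single bridging edge `("b", "c")` raises the modularity and lowers the normalized cut, which is the direction of change reported between 2016 and 2020.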
To further quantify and compare the changes in user behaviour and, subsequently, in user polarization, we infer the ideology of Twitter users based on the ideological alignment of political actors whom these users follow29,49. The bipartite network of followers is then projected on a one-dimensional scale using correspondence analysis50,51, which applies singular value decomposition of the adjacency matrix, standardized to account for the differences in popularity and activity of the influencers and their followers (see Methods for details). Two users are close on the resulting latent ideology scale if they follow similar influencers. This method has been shown to produce ideological estimates of the members of the US Congress that are highly correlated with ideological estimates based on roll call voting similarity such as DW-NOMINATE49.
For 2016 and 2020, the data for the analysis consist of the top 100 influencers of each news media category, as used in the previous polarization analysis, and the sets of users that retweeted at least three different influencers (considering all tweets in our datasets, not only the ones with URLs). As discussed earlier, we interpret retweeting as an endorsement of the content being retweeted. Twitter offers other types of interactions, allowing users to comment on the content, such as quote tweets and replies. The ratio of quotes to retweets directed by users at influencers was very stable and small (<5%) in 2016 and 2020, for users on both the left and right sides of the latent ideology (Supplementary Table 9A), which motivated our focus on retweets to infer the ideology of users. The ratio of quotes to retweets from users of one side of the ideology spectrum to influencers of the other side increased from 2016 to 2020, indicating an increased usage of quotes to comment on tweets from influencers of the opposite side. However, the overall usage of quotes relative to retweets remained small (Supplementary Table 9B). We extract the coordinates of each user on the first dimension of the results of the correspondence analysis applied to the weighted network of retweets between the users and the influencers (see Methods for the details and robustness checks that we performed). Finally, for 2016 and 2020, the coordinates of all users are standardized to a mean of zero and a standard deviation of one. Two users are close together on the latent ideology scale if they tend to retweet similar influencers. The influencers’ latent ideological positions are then computed as the median of their retweeters’ positions.
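The steps described above (first dimension of a correspondence analysis, standardization of user coordinates, influencer position as the median of retweeters) can be sketched on a toy retweet matrix as follows. This is a minimal illustration, not the implementation described in Methods. The helper names, the toy counts, the use of power iteration in place of a full singular value decomposition, the population standard deviation and the unweighted median over retweeters are all assumptions made for the example.

```python
import math
from statistics import mean, median, pstdev


def first_ca_dimension(counts, iterations=200):
    """User coordinates on the first correspondence-analysis dimension.

    counts[i][j]: times user i retweeted influencer j (no empty rows or
    columns). The leading left singular vector of the standardized
    residual matrix is found by power iteration (an assumption made to
    keep the sketch dependency-free). The sign of the axis is arbitrary.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    grand = float(sum(sum(row) for row in counts))
    r = [sum(row) / grand for row in counts]                        # user masses
    c = [sum(row[j] for row in counts) / grand for j in range(n_cols)]  # influencer masses
    # Standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
    s = [[(counts[i][j] / grand - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]
    u = [float(i + 1) for i in range(n_rows)]                       # arbitrary start
    for _ in range(iterations):
        v = [sum(s[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        u = [sum(s[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    # Standard row coordinates; the overall scale is fixed afterwards.
    return [u[i] / math.sqrt(r[i]) for i in range(n_rows)]


def standardize(values):
    """Rescale to mean zero and (population) standard deviation one."""
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]


def influencer_positions(counts, user_scores):
    """Median ideology of the users who retweeted each influencer."""
    return [
        median(score for row, score in zip(counts, user_scores) if row[j] > 0)
        for j in range(len(counts[0]))
    ]


# Toy data: users 0-2 mostly retweet influencers 0-1, users 3-5 mostly 2-3.
counts = [
    [5, 3, 0, 0],
    [4, 4, 1, 0],
    [3, 5, 0, 1],
    [0, 1, 4, 4],
    [1, 0, 5, 3],
    [0, 0, 3, 5],
]
user_ideology = standardize(first_ca_dimension(counts))
influencer_ideology = influencer_positions(counts, user_ideology)
print([round(x, 2) for x in user_ideology])
print([round(x, 2) for x in influencer_ideology])
```

On this toy matrix the two groups of users fall on opposite sides of zero, and each influencer is placed on the side of the users who mostly retweet it. Which side is labelled left or right is not determined by the method itself and has to be fixed by reference to known accounts.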
Figure 6 shows the result of this analysis. The distribution of ideology positions of the users and of the influencers, displayed in green and purple, respectively, shows that polarization increased between 2016 and 2020. This is confirmed by a Hartigans’ dip test (HDT) for unimodality, which measures multimodality in a sample by the maximum difference, over all sample points, between the empirical distribution function and the unimodal distribution function that minimizes that maximum difference52. For the user distribution, the test statistic is D = 0.11074 with a CI95 of (0.11038, 0.1112) in 2016. In 2020, we have D = 0.14751 with a CI95 of (0.1471, 0.1477). For the influencer distribution, the test statistics are D = 0.18328 with a CI95 of (0.1672, 0.195) in 2016 and D = 0.23251 with a CI95 of (0.206, 0.238) in 2020. All tests reject the null hypothesis of a unimodal distribution with P < 2.2 × 10^−16, and the 95% confidence intervals are computed from 1,000 bootstrap samples using the bias-corrected and accelerated method. Increasing values of the test statistic indicate distributions that increasingly deviate from a unimodal distribution, corroborating the growing division found in the similarity networks.
Top, the latent ideology of the top five influencers of each category is shown as a box plot representing the distribution of the ideology of the users who retweeted them. Bottom, the distributions for the users are shown in green and the distributions for the top 100 influencers of each news media category (computed as the median of the ideology of their retweeters) are displayed in purple. Box plots indicate the median and the 25th and 75th percentiles of the distributions, with whiskers indicating the 5th and 95th percentiles. The sample size used for the computation of each box plot is reported beside it. Pie charts next to the influencers’ names represent the news categories to which they belong (weighted by their respective CI ranks in each category).
To resolve whether the increase in polarization is caused by the arrival of new users and influencers in 2020, we repeat the analysis including (1) only users (shown in Supplementary Fig. 6), (2) only influencers (Supplementary Fig. 7) and (3) both users and influencers (Supplementary Fig. 8) that were active during both elections. In all three cases we observe an increase in the HDT statistics (see Supplementary Fig. 9 and Supplementary Table 10), which means that the change in behaviour of the users in selecting whom to retweet contributes to the polarization increase. The largest increase in HDT for the user distribution arises when all users from 2016 and 2020 and only influencers that were present during both years are considered (+0.08). This setting also corresponds to the smallest increase in the dip test statistic of the influencer distribution (+0.01, within CI95), suggesting that the new influencers of 2020 have more polarized ideologies than the influencers who continued from 2016 to 2020. Thus, we conclude that the increased polarization of the users is caused, to a substantial extent, by the arrival and departure of users between elections (Supplementary Fig. 9 and Supplementary Table 10).
Figure 6 reveals a clear increase in polarization of the users and influencers in 2020 compared to 2016 and an alignment of their latent ideologies in two distinct groups, mirroring the news media classification groupings seen in Fig. 5 and Supplementary Fig. 6. This echo chamber behaviour for users became more concentrated in 2020, with two clearly opposite poles and fewer influencers whose user base bridged opposite ideologies.
These results also independently confirm the shift of news outlets and influencers from the centre to the right and left observed using the news media classifications by external sources. Indeed, we find an extremely high correlation (above 0.90 for 2016 and 2020) between the users’ latent ideology position and their left- or right-leaning distribution computed using the news media categories in which they posted (see Methods for details). This high correlation indicates that the shift in bias observed at the level of the media outlets is also present at the level of the users’ retweeting pattern and serves as an independent validation of the media outlet classification.