Tracing Anti-Asian Rhetoric Trickle-Down Effects using Text Analytics

Emphasizing the importance of mindful public discourse

Photo by Jason Leung on Unsplash

From March 19, 2020 through February 28, 2021, the Stop AAPI Hate organization received 3,795 reports of hate incidents targeting the Asian community, ranging from verbal harassment to violent attacks. Recently, the New York Times scanned media reports to identify incidents where attackers explicitly expressed racial hostility towards people of Asian descent. They documented a rise in anti-Asian violence and found over 110 cases where attackers expressed anti-Asian sentiment in their language, nearly half of which made specific reference to the coronavirus.

In March 2020, Donald Trump controversially began referring to COVID-19 as a “Chinese virus.” Although the media, world leaders, and the World Health Organization widely criticized him for using anti-Asian rhetoric, he continued to use the phrase. Many believe his messaging led to the rise in violence against Asian Americans. To understand how we got here, it is critical that we examine how information spread early on in the pandemic, tracing the trickle-down effects of anti-Asian rhetoric used in public political discourse.

To that end, I am sharing results from an exploratory text analysis completed in April 2020. This work investigated the spread of information from the Trump administration to the media during the first few months of the COVID-19 pandemic, visualizing the amplification of certain topics and terms. Public discourse is often used with intent to inform and shape public opinion. In turn, public opinion is integrated with cultural beliefs to influence collective and individual action. Consequently, responsible public discourse is critical and we can study the historical flow of information to understand the effects of past communication choices. With a lens on communication from the Trump administration and the media, this analysis aims to:

1. Characterize how the Trump administration influenced the conservative and liberal media during the COVID-19 pandemic.

2. Build on this to determine how anti-Asian rhetoric spread from the Trump administration to the media and, ultimately, to the public during the COVID-19 pandemic.

Text data was collected from White House briefings and news articles published in the public domain from December 2019 through April 24, 2020. To capture polarity in media response, news articles were pulled from two opposing news outlets, left-leaning CNN and right-leaning Fox News. For each news outlet, articles within the date range were found using identical search terms about COVID-19. In total, the text corpus consisted of 179 White House briefing transcripts, 5,626 CNN news articles, and 6,980 Fox News articles. The figure below highlights the massive volume of COVID-19 information pushed out by the media compared to the White House, illustrating the media's potential to amplify information flowing out of the Trump administration.

Data volume per source per week during the study period, figure by author

To understand the types of information presented during this period from each source, I used scikit-learn to implement Latent Dirichlet Allocation (LDA). LDA is a generative mixture model developed by David M. Blei, Andrew Y. Ng, and Michael I. Jordan. It takes an unstructured set of texts as input and finds a set of topics that characterizes the entire set of input texts. More technically, a topic is a probability distribution over the entire vocabulary in a text corpus. From the topics extracted by LDA, someone familiar with the text data can examine the highest-probability words in each topic to create a descriptive label. For this study, I applied LDA to the text corpus, manually labeled each topic based on its top 10 words, and then grouped the topics into 8 broader themes.

Next, I used TFIDF sum scores to visualize how topic discussion differed between sources over time. TFIDF, short for Term Frequency-Inverse Document Frequency, is used to quantify the relative significance of a term in a text. It is calculated by weighting the frequency of a word in a text by the inverse of the number of texts in the corpus that contain it. This is useful in practice because common words like “the” and “of” occur frequently across all texts but hold little explanatory value in distinguishing what each text is about. These words receive low TFIDF scores, while words that do have distinguishing power receive high scores. To determine term significance across many texts, we can aggregate the TFIDF values by taking the sum across all texts. The figure below depicts the aggregate TFIDF values of the top ten words in each topic, per source, for each publication day.

TFIDF sum values of all extracted topics grouped by theme per day, figure by author
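The aggregation behind a plot like this can be sketched as follows: score every text with TFIDF, keep only the terms of interest, and sum the scores per source and day. The records, the term list, the choice to fit one vectorizer over all sources, and the helper name `tfidf_sums` are assumptions for illustration, not details confirmed by the study.

```python
from collections import defaultdict

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical records of (source, date, text); not the study's data.
records = [
    ("CNN", "2020-03-11", "travel ban announced for europe"),
    ("CNN", "2020-03-12", "europe travel ban draws criticism"),
    ("Fox News", "2020-03-12", "president defends europe travel ban"),
    ("White House", "2020-03-11", "we are suspending travel from europe"),
]


def tfidf_sums(records, terms):
    """Sum the TFIDF scores of `terms` over all texts per (source, date)."""
    texts = [text for _, _, text in records]
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(texts)  # rows: texts, columns: terms

    vocab = vectorizer.vocabulary_
    cols = np.array([vocab[t] for t in terms if t in vocab], dtype=int)
    if cols.size == 0:
        # None of the requested terms occur in the corpus.
        per_doc = np.zeros(len(texts))
    else:
        per_doc = np.asarray(matrix[:, cols].sum(axis=1)).ravel()

    totals = defaultdict(float)
    for (source, date, _), score in zip(records, per_doc):
        totals[(source, date)] += float(score)
    return dict(totals)


sums = tfidf_sums(records, ["europe", "travel"])
for (source, date), score in sorted(sums.items()):
    print(f"{date}  {source:<12} {score:.3f}")
```

Plotting these sums against the date, one line per source, gives the kind of time series shown in the figure.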

Given that CNN and Fox News are known for presenting very different viewpoints, it was striking to see in the plot above that they reported on the same topics at the same time. This suggests that the media outlets adapt their coverage to more or less cover the same topics as their competition. The evenly spaced dips show less coverage in general on weekends. The plot also indicates that the White House briefings covered topics less consistently, suggesting more ad-hoc information dissemination practices. Lastly, temporal trends from both media outlets show that health and social topics became increasingly important to report on as shutdowns were imposed and the gravity of the pandemic began sinking in.

Here, aggregate TFIDF scores were used again to examine the trickle-down effect of the Trump administration’s usage of the terms “foreign”, “Europe”, “China” and “Chinese”. The figure below shows the aggregate TFIDF scores per day from each source. While TFIDF can help us visualize what is being discussed, it does not reveal how terms are used. To get a better sense of the context in which a term appeared, I used spikes in TFIDF scores to guide further examination of pertinent texts.

Visualizing the fluctuation in term significance over time, with quotes from specific texts. Links to the full news articles referenced can be found in-text below. Figure by author
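One simple way to turn spikes into a reading list is to flag days whose aggregate score sits well above the series average and then pull the texts published on those days. The sketch below uses a z-score threshold; the scores, the threshold, and the helper name `find_spikes` are hypothetical, and the spikes in this analysis may just as well have been picked out by eye from the plot.

```python
from statistics import mean, pstdev

# Hypothetical daily aggregate TFIDF scores for one term from one source.
daily_scores = {
    "2020-03-09": 1.1,
    "2020-03-10": 0.9,
    "2020-03-11": 1.2,
    "2020-03-12": 6.4,
    "2020-03-13": 2.0,
    "2020-03-14": 1.0,
}


def find_spikes(scores_by_day, z_threshold=1.5):
    """Return the days whose score exceeds the mean by z_threshold std devs."""
    values = list(scores_by_day.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a flat series has no spikes
    return [
        day
        for day, value in sorted(scores_by_day.items())
        if (value - mu) / sigma > z_threshold
    ]


spike_days = find_spikes(daily_scores)
print(spike_days)
```

The flagged dates only say when a term became unusually prominent; the texts from those dates still have to be read to see how the term was used.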

From the figure above, we can observe that the media started using the words “China” and “Chinese” more often in late January 2020, when the coronavirus began spreading in Wuhan, China. Then, we see a spike in news coverage mentioning China when Donald Trump implemented the China Travel Ban on January 31. He controversially referred to COVID-19 as a “foreign virus” on March 11 when he announced the Europe Travel Ban. The next day, we see a spike in usage of the terms “Europe” and “foreign” by both media outlets as they responded to the March 11 press conference. Then, after Donald Trump started referring to COVID-19 as the “Chinese virus” or the “China virus”, we see another spike in term usage by media outlets as they responded.

While these spikes show how term importance fluctuated over time, we must use the dates associated with the spikes to refer back to the full texts to understand how these terms were used by each media source. Upon further investigation, we see that Fox and CNN used these terms very differently. For instance, Ben Shapiro and Tucker Carlson, conservative political commentators associated with Fox News, both defended Trump’s use of the terms “foreign virus” and “Chinese virus”, dismissing concerns about the harm that such xenophobic language could cause. In contrast, CNN focused on criticizing Trump’s usage of the terms, reporting that xenophobic language during crises has historically led to increased racism and violence. These observations suggest that some in the right-wing media amplified support for anti-Asian rhetoric.

Research has shown that public political discourse and media narratives using xenophobic language amplify existing divisions and negative stereotypes. A study conducted by Georgia Tech analyzed anti-Asian rhetoric in social media. Their findings suggest that exposure to anti-Asian hate speech was practically contagious, leading to a ripple effect of more anti-Asian hate speech. In summary, it is critical that politicians and media organizations (and anyone else for that matter) speak and write responsibly with a mindful approach.

As a Chinese-Vietnamese American woman, recent events hit close to home. I stand with fellow Asian Americans and advocates demanding a stop to the hate and violence. As the nation continues to deal with the ramifications partly stemming from anti-Asian rhetoric during the pandemic, I hope that people with the power to influence public opinion craft their speech responsibly as each utterance will ripple outward through filters beyond their control.

Passionate about applying machine learning creatively and artistically. Writing about exploratory work. (she/her/hers) Twitter: @StocasiaAI
