All About “COVID-17”
1. What is this?
COVID-17 explores how COVID-19 might have shifted Canadian perceptions of healthcare.
COVID-17 lets visitors compare news articles written during the first three months of the pandemic (January 21 to April 21, 2020) with GPT-2 generated articles. One can trace patterns over time by comparing the tonal (dis)similarities between each pair of articles. For example, by looking at how GPT-2 predicted the contents of an article whose headline discusses social distancing, one might note that the AI cannot approach the subject in a realistic manner, a consequence of the practice not being part of the social consciousness in 2017.
The primary goal of COVID-17 is to assess the viability of using text analysis tools on artificially generated text. The output of models like GPT-2 is often overlooked because it is logically incoherent and lacks communicative intent. While the generated text may look coherent to a casual observer, closer inspection usually reveals structural inconsistencies. However, this does not necessarily mean the output is without value. GPT-2's word choices are not arbitrary: they reflect patterns in the text it was trained on. Given enough examples, text analysis tools (which often ignore structure) should be able to extract meaningful information. What sorts of information? Because a model like GPT-2 is static once trained, one could conceivably treat it as a time machine of public opinion. COVID-17 is an early exploration of this idea.
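To illustrate what it means for a text analysis tool to ignore structure, here is a minimal Python sketch of a bag-of-words comparison. It is not the method COVID-17 uses; the function names and the two sample sentences are hypothetical and serve only to show that word order plays no part in this kind of measure.

```python
from collections import Counter
import math
import re


def bag_of_words(text):
    """Lowercase the text and count each word, discarding word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors, from 0.0 to 1.0."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Hypothetical sample sentences, not taken from the project's data.
real = "Hospitals prepare for a surge of patients as the virus spreads."
generated = "The virus spreads as patients surge and hospitals prepare."
score = cosine_similarity(bag_of_words(real), bag_of_words(generated))
```

Because only word counts are compared, a sentence and a scrambled copy of it score as identical, which is why this style of analysis can tolerate text whose structure does not hold together.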
2. How do I use the viewer?
The viewer is made up of several components. See below for an outline of each major section.
Red: The date selector. Scroll up and down to expose the full 90-day range.
Brown: Contains the header and starter text. The “starter text” is the sub-header that CBC articles typically contain; it was used as the prompt for the GPT-2 generated text. Scrollable.
Blue: The two articles. Scrollable. Healthcare-related words are automatically highlighted to provide a cursory look at differences.
Purple: Various statistics. Similarity is judged by the positioning of certain words. The sentiment value (“tone”) is judged by the use of adjectives.
Green: The chart shows the relative tone values for all articles, real and fake, over the full period. The higher the column, the more positive the article. The current date is highlighted in red. The columns are selectable!
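The tone statistic and the keyword highlighting described above could be approximated along the following lines. This is a rough sketch under stated assumptions, not the viewer's actual code: the three word lists, the scoring formula, and the function names are all hypothetical.

```python
import re

# Hypothetical word lists; the viewer's real lists are not documented here.
POSITIVE_ADJECTIVES = {"good", "safe", "effective", "hopeful", "strong"}
NEGATIVE_ADJECTIVES = {"bad", "dangerous", "deadly", "severe", "fearful"}
HEALTHCARE_WORDS = {"hospital", "doctor", "nurse", "vaccine", "patient"}


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tone(text):
    """Assumed formula: (positive - negative) / (positive + negative).

    Ranges from -1.0 (all negative) to 1.0 (all positive); returns 0.0
    when the text contains no listed adjectives.
    """
    words = tokenize(text)
    pos = sum(w in POSITIVE_ADJECTIVES for w in words)
    neg = sum(w in NEGATIVE_ADJECTIVES for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def highlight(text, marker="**"):
    """Wrap each listed healthcare word in `marker`, keeping its original case.

    Only exact whole-word matches are wrapped, so plurals are skipped.
    """
    pattern = re.compile(
        r"\b(" + "|".join(sorted(HEALTHCARE_WORDS)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)
```

Computing `tone` for the real and generated article on each date would yield two series of values, which is the kind of data a chart of columns over the full period needs.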
Sources
Dai, Tianru. “News Articles.” Version 1.0, Harvard Dataverse, March 2017, https://doi.org/10.7910/DVN/GMFCTR.
Gwern. “GPT-2 Neural Network Poetry.” October 2019, https://www.gwern.net/GPT-2.
Han, Ryan. “COVID-19 News Articles Open Research Dataset.” Version 3.0, Kaggle, May 2020, https://www.kaggle.com/ryanxjhan/cbc-news-coronavirus-articles-march-26.