Academic researchers who previously had access to Twitter data face a deadline to delete the data they obtained unless they agree to a new contract costing $42,000 per month. The demand has been likened to the “book burning” of big data. Twitter had previously offered researchers a valuable tool called the decahose, which let them monitor conversations on the platform through a real-time random sample of 10% of all tweets. Researchers used this data to analyze the spread of disinformation, misinformation, and extremism.
However, Twitter has recently contacted researchers, asking them to pay a much higher fee of $42,000 per month for access to only 0.3% of all tweets. The new pricing has been deemed unaffordable for many researchers, who previously had access for as little as a couple of hundred dollars per month. Researchers who decline the new contract must delete all Twitter data stored in their systems and provide evidence of its removal within 30 days of the contract’s expiration.
While the requirement to delete data was part of the original contract signed by researchers, enforcing it now marks a shift away from Twitter’s previous openness to academic scrutiny and transparency. The contracts were initially signed under the previous Twitter regime, which valued academic research and welcomed the opportunity for scholars to study the platform. Regardless of the contractual language, researchers had no reason to expect that the contracts would be terminated or that they would be asked to delete previously obtained data.
The ramifications of this demand are significant. Ongoing research that aims to shed light on Twitter’s activities over the past few years will be severely affected. The transparency of the platform and the historical record of public discussions on Twitter will also suffer. Researchers have expressed concern about the damage to their studies of the spread of disinformation, social media manipulation, and online abuse. The availability of free tools developed by researchers, such as Hoaxy and Botometer, which rely on Twitter data, will also be compromised.
While some researchers may look for alternative ways to access Twitter data, such as unofficial scraping, these approaches are harder than using official access provided by Twitter. The change may affect only a small portion of academic research, as the decahose data was used by a minority of researchers. Some speculate that Twitter’s API change may be aimed at companies that use large volumes of Twitter data to train large language models like GPT-3.
Twitter has not officially responded to the concerns raised by researchers and the academic community about the data access changes. When questioned on the matter, the company’s press office has offered only an automated reply containing a poop emoji.