Can I download Twitter data via web scraping for research?

I want to do a sentiment analysis using Twitter data. I was thinking about hardcoding a cURL script to download the data from a Google Cloud service (I'll run the data through a neural network on the server to label each tweet), but I have this question:

  • Am I allowed to do this? I know Twitter sells the data, so I am not sure whether I can get in trouble for downloading it directly (I have to disclose the data-gathering methodology in the paper).

Topic web-scraping twitter

Category Data Science


Last time I checked, storing the contents of tweets was not allowed; instead, one is supposed to store the tweet ID and retrieve the content of the tweet dynamically.

As far as I know, this is because users are allowed to delete their tweets at any time, and keeping a tweet that they chose to delete would be against Twitter's terms of use (and possibly illegal in some jurisdictions). Using the tweet ID solves the problem, since the content will simply no longer be available if the tweet was deleted.
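To make the "store the ID, fetch the text later" idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `hydrate`, `fetch_text` and `fake_fetch` are hypothetical names, and `fake_fetch` is a local stand-in for whatever officially permitted client you end up using. No real Twitter endpoint is called.

```python
from typing import Callable, Dict, Iterable, Optional


def hydrate(
    tweet_ids: Iterable[str],
    fetch_text: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Look up the current text for each stored tweet ID.

    fetch_text is assumed to return None when a tweet is no longer
    available (for example, because its author deleted it).
    """
    available = {}
    for tweet_id in tweet_ids:
        text = fetch_text(tweet_id)
        if text is not None:
            available[tweet_id] = text
    return available


# Hypothetical stand-in for the remote service: tweet "1002" was deleted.
_FAKE_REMOTE = {"1001": "Great day!", "1003": "Not impressed."}


def fake_fetch(tweet_id: str) -> Optional[str]:
    return _FAKE_REMOTE.get(tweet_id)


# Only the IDs are kept on disk; the text is retrieved when needed.
stored_ids = ["1001", "1002", "1003"]
texts = hydrate(stored_ids, fake_fetch)
```

With this layout, a deleted tweet drops out of the dataset the next time the IDs are resolved, because its text was never stored.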

Since you plan to write a paper, I assume that you're in academia? If so, when in doubt it's always safer to ask the data protection and/or legal office at your institution. In this case you're using a secondary source (i.e. you're collecting data that already exists), so it should be straightforward (I think).
