This repository is out of date and no longer maintained due to technical changes and the restrictive policies imposed on access to the Twitter API, introduced by its current owner, Elon Musk, in 2022.
It is unacceptable that access to the Twitter API for academic research has been blocked, which appears to be a clear violation of the European Union's Digital Services Act (DSA). This move not only undermines the rights of researchers and developers but also demonstrates Twitter's (now X.com) disregard for the rules that ensure transparency and accountability on digital platforms.
Under Musk’s leadership, Twitter has transformed into an opaque platform operating without any public oversight, enabling the unchecked proliferation of hate speech and disinformation. By ignoring European regulations, Twitter not only shows a blatant disrespect for international legal frameworks but also places itself in a legally precarious position, potentially facing sanctions.
The author of this repository has chosen to remove content that, besides being outdated, can no longer contribute meaningfully to public discourse due to the barriers imposed by the platform.
It is time to abandon Twitter. Continuing to create content for this social network is, in fact, supporting its business model—a model that perpetuates toxic practices and mechanisms that fuel disinformation, hate speech, and manipulation. Twitter has made it clear that its priority is not the public interest, but the consolidation of an information monopoly based on opacity and a lack of accountability.
Therefore, users are strongly encouraged to explore alternatives in free and open-source social networks, where control and transparency are not mere illusions but guaranteed rights. Let’s not be complicit in supporting a platform that turns its back on the common good in favor of private profit.
This is a functional example of how Twitter's "Search Tweets (all)" API endpoint (API v2) works. To make it work, you'll need:
- Request access to the full Twitter Archive for Academic Research
- A Bearer Token (once your access request is approved, you'll need to create a new "project" and generate the Bearer Token).
Before using this code, please carefully read the documentation for the Twitter API v2. This code is not intended to be a perfect example, but it can help you better understand how the v2 Twitter API works and how to perform queries that take advantage of the researcher access level. The documentation answers many of the questions you may have.
Search Tweets: read the docs, especially "How to build a query?".
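As a quick orientation, a minimal request against the full-archive endpoint might look like the sketch below. The endpoint URL and parameter names come from the official v2 docs; the query, dates, and token value are just placeholders:

```python
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder: generate it in your developer portal project

# Full-archive search endpoint (Academic Research access level required)
url = "https://api.twitter.com/2/tweets/search/all"
headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
params = {
    "query": "(#Ukraine OR #Ucrania) -is:retweet",  # example query
    "start_time": "2022-01-01T00:00:00Z",           # RFC 3339 timestamps
    "end_time": "2023-02-10T23:59:59Z",
    "max_results": 100,
}

response = requests.get(url, headers=headers, params=params)
print(response.status_code)
print(response.json())
```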
Required Python packages:
- os (standard library)
- json (standard library)
- requests
- pandas
- openpyxl (for .xlsx datasets)
- tqdm (for the progress bar)
Note: To install these packages, use `pip install package_name` in your terminal (e.g. `pip install requests pandas openpyxl tqdm`; os and json ship with Python).
You'll need:
- Python 3 installed
- A command-line interface (Windows/Linux/macOS terminal)
- Recommended: PyCharm IDE or similar, ideally with a virtual environment
Define credentials (credentials.py): copy/paste your Twitter API Bearer Token into credentials.py.
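A minimal sketch of what credentials.py could contain (the exact variable name is an assumption; match whatever name the scripts import):

```python
# credentials.py
# Hypothetical layout: store the Bearer Token as a module-level constant.
BEARER_TOKEN = "PASTE-YOUR-BEARER-TOKEN-HERE"
```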
Define your capture jobs in capture_jobs.csv with the following columns:

start | start_time | end | end_time | query | capture_name |
---|---|---|---|---|---|
dd/mm/yyyy | HH:MM:SS | dd/mm/yyyy | HH:MM:SS | complete query string | output folder |
You can define one or multiple searches (one row per search). The script will iterate over the CSV file and search for tweets matching each row's parameters.
Example (Job queue)
start | start_time | end | end_time | query | capture_name |
---|---|---|---|---|---|
01/01/2010 | 00:00:00 | 16/03/2023 | 23:59:59 | from:POTUS | DM_POTUS |
01/01/2022 | 00:00:00 | 10/02/2023 | 23:59:59 | (#Ukraine OR #Ucrania) -is:retweet | ukraine |
- The first job (first row) will search all tweets made by the @POTUS Twitter account between January 1, 2010 at 00:00:00 and March 16, 2023 at 23:59:59, and save all data, with .csv output, in the /DM_POTUS/ folder.
- The second job (second row) will search all tweets containing the #Ukraine or #Ucrania hashtags (one of them or both), excluding retweets, and will save all data in the /ukraine/ folder.
You can create multiple rows. Each row is a new capture, which makes it easy to build a work queue; a sketch of how such a queue can be iterated follows below.
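A minimal sketch of iterating the job queue with pandas, assuming capture_jobs.csv is comma-separated and uses the column names above (the actual loop in main.py may differ):

```python
import pandas as pd

# Read the job queue; each row is one capture.
jobs = pd.read_csv("capture_jobs.csv")

for _, job in jobs.iterrows():
    # Dates are dd/mm/yyyy and times are HH:MM:SS, as in the table above.
    start = f"{job['start']} {job['start_time']}"
    end = f"{job['end']} {job['end_time']}"
    print(f"Capture '{job['capture_name']}': {job['query']} ({start} -> {end})")
```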
capture_name: define the output folder in the "capture_name" field. Each capture has its own output folder.
In terminal:
python3 main.py
graph TD;
capture_jobs.csv -->|Step 0: define all your capture parameters and run| main.py;
main.py -->|Step 1| query.py;
query.py -->|create API request| Twitter_API;
Twitter_API -->|API response control| api_response;
api_response -->|create json file| json_dumper.py;
main.py -->|Step 2| json_parser.py;
json_dumper.py -->|create output folder| outputfolder;
outputfolder -->|save API response| api_response_loop_n.json;
json_parser.py -->|get all .json files| outputfolder;
json_parser.py -->|generate csv| output.csv;
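As an illustration of Step 2, here is a minimal sketch of what the parse stage does. The function name and file layout are assumptions, not taken from json_parser.py; v2 API responses keep tweets under the "data" key:

```python
import glob
import json
import os

import pandas as pd

def parse_capture(folder):
    """Flatten every api_response_loop_n.json in a capture folder into one CSV."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.json"))):
        with open(path, encoding="utf-8") as f:
            payload = json.load(f)
        rows.extend(payload.get("data", []))  # tweets live under "data" in v2 responses
    pd.DataFrame(rows).to_csv(os.path.join(folder, "output.csv"), index=False)

parse_capture("ukraine")  # example: parse the /ukraine/ capture from the table above
```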
Twitter API v2, and more precisely the full-archive search for Academic Research, has a rate limit of 300 requests per 15-minute window. Please don't change the sleep times between queries.
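For reference, a hedged sketch of what such pacing can look like (the sleep value and retry logic are assumptions, not the repo's actual code; 300 requests per 15 minutes works out to about one request every 3 seconds):

```python
import time

import requests

SLEEP_BETWEEN_REQUESTS = 3.1  # seconds; stays just under 300 requests / 15 min

def paced_get(url, headers, params):
    """Issue one API request, backing off on HTTP 429 and pacing every call."""
    response = requests.get(url, headers=headers, params=params)
    if response.status_code == 429:  # Too Many Requests: wait out a full window
        time.sleep(15 * 60)
        response = requests.get(url, headers=headers, params=params)
    time.sleep(SLEEP_BETWEEN_REQUESTS)
    return response
```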
During execution, various errors can occur in the API query or when parsing the data. The script handles the following errors:
- Too many requests sent to the API (rate limiting)
- API Timeout
- Empty API responses
- On big extractions: Python 3's native recursion limit of 1000 loops is hit, raising a recursion limit error. This may occur around 500k tweets retrieved (a possible workaround is sketched after this list).
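If you hit that last error, one possible workaround (an assumption on my part, not something the repo ships) is to raise the interpreter's recursion limit before starting a very large capture; rewriting the pagination loop iteratively would be the more robust fix:

```python
import sys

# Python's default recursion limit is 1000; raise it before a very large extraction.
sys.setrecursionlimit(10000)
```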