This project is a two-tier application. The first tier is an ETL (Extract, Transform, Load) pipeline that processes tweet data and loads it into a PostgreSQL database; it includes modules for extracting data from a JSON file, transforming it, and loading it into the database. The second tier is a Flask-based web application for interacting with the data.
.
├── etl-pipeline
│ ├── config.py
│ ├── db_utils.py
│ ├── etl_extract.py
│ ├── etl_transform.py
│ ├── etl_load.py
│ └── app.py
├── web-tier
│ ├── app
│ │ ├── __init__.py
│ │ ├── config.py
│ │ ├── routes.py
│ │ └── .gitignore
│ └── run.py
├── .gitignore
├── README.md
└── requirements.txt
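The three ETL modules listed above suggest a pipeline with one function per stage. The following is only a minimal sketch of that shape, not the repository's actual code: the function names, the one-JSON-object-per-line input format, and the tweet fields (`id`, `user.id`, `text`) are all assumptions made for illustration.

```python
import json
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, int, str]


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per line (assumed format), skipping bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed record: skip it rather than abort the run
        if isinstance(record, dict):
            yield record


def transform(tweets: Iterable[dict]) -> Iterator[Row]:
    """Flatten tweets into (tweet_id, user_id, text) rows and drop duplicates."""
    seen = set()
    for tweet in tweets:
        tweet_id = tweet.get("id")
        user_id = (tweet.get("user") or {}).get("id")
        if tweet_id is None or user_id is None or tweet_id in seen:
            continue
        seen.add(tweet_id)
        yield (tweet_id, user_id, tweet.get("text", ""))


def load(rows: Iterable[Row],
         write_batch: Callable[[List[Row]], None],
         batch_size: int = 500) -> int:
    """Send rows to write_batch in chunks and return the number of rows written."""
    batch: List[Row] = []
    total = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            write_batch(batch)
            total += len(batch)
            batch = []
    if batch:
        write_batch(batch)
        total += len(batch)
    return total
```

Because `load` only needs a callable, the sketch can be tried with `some_list.extend` as the sink; against PostgreSQL, `write_batch` would typically wrap a cursor's `executemany` call with an `INSERT` statement for whatever table the project defines.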
- Python 3.8+
- PostgreSQL
- Virtual Environment (optional but recommended)
git clone https://github.com/mugemanebertin2001/LSEP-coding-challenge.git
cd LSEP-coding-challenge
# Windows
python -m venv .venv
.venv\Scripts\activate
# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Ensure your PostgreSQL server is running and accessible. Modify etl-pipeline/config.py and web-tier/app/config.py with your database credentials. For example:
# config.py
DB_CONFIG = {
    'host': 'localhost',
    'port': '5432',
    'dbname': 'yourdbname',
    'user': 'yourdbuser',
    'password': 'yourdbpassword'
}
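If you would rather not commit real credentials, one option is to let environment variables override the defaults in `DB_CONFIG`. The sketch below assumes the standard libpq variable names (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`); `load_db_config` is a hypothetical helper and is not part of this repository.

```python
import os

# Defaults mirror the example config above.
DB_CONFIG = {
    'host': 'localhost',
    'port': '5432',
    'dbname': 'yourdbname',
    'user': 'yourdbuser',
    'password': 'yourdbpassword',
}

# Maps each DB_CONFIG key to the libpq environment variable that overrides it.
ENV_OVERRIDES = {
    'host': 'PGHOST',
    'port': 'PGPORT',
    'dbname': 'PGDATABASE',
    'user': 'PGUSER',
    'password': 'PGPASSWORD',
}


def load_db_config(env=os.environ):
    """Return a copy of DB_CONFIG with any set environment variables applied."""
    config = dict(DB_CONFIG)
    for key, variable in ENV_OVERRIDES.items():
        value = env.get(variable)
        if value:  # ignore unset or empty variables
            config[key] = value
    return config
```

The resulting dictionary uses the same keys as `DB_CONFIG`, so with a driver such as psycopg2 it could be passed along as `psycopg2.connect(**load_db_config())`.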
python etl-pipeline/app.py
cd web-tier
python run.py
The web application will be accessible at http://127.0.0.1:5000/.
GET /
Returns a welcome message.
GET /run_etl
Returns a success confirmation message after the data has been loaded into the database.
GET /q2?user_id=<user_id>&type=<type>&phrase=<phrase>&hashtag=<hashtag>
Queries tweets based on user_id, type, phrase, and hashtag. Any of these parameters can be omitted.
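Since every `/q2` parameter is optional, the filter has to be assembled from whichever parameters are present. The following is a rough sketch of one way to do that with a parameterized query; the `tweets` table, its column names, the array-typed `hashtags` column, and the `build_q2_query` helper are all hypothetical, and the real route in `routes.py` may work differently.

```python
def build_q2_query(user_id=None, tweet_type=None, phrase=None, hashtag=None):
    """Build a SELECT with one WHERE clause per supplied filter.

    Returns (sql, params) using %s placeholders, the style psycopg2 expects,
    so that values are never interpolated into the SQL string directly.
    """
    clauses = []
    params = []
    if user_id is not None:
        clauses.append("user_id = %s")
        params.append(user_id)
    if tweet_type is not None:
        clauses.append("type = %s")
        params.append(tweet_type)
    if phrase is not None:
        # Case-insensitive substring match (PostgreSQL ILIKE).
        # LIKE wildcards inside the phrase itself are not escaped here.
        clauses.append("text ILIKE %s")
        params.append(f"%{phrase}%")
    if hashtag is not None:
        # Assumes hashtags are stored as a PostgreSQL array column.
        clauses.append("%s = ANY(hashtags)")
        params.append(hashtag)

    sql = "SELECT * FROM tweets"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, tuple(params)
```

In a Flask route, the arguments would come from `request.args.get(...)`, which returns `None` for an omitted parameter, and the result would be passed to `cursor.execute(sql, params)`.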
- Ensure PostgreSQL is properly installed and running on your machine.
- Modify the JSON file path in etl-pipeline/app.py if necessary:
  file_path = os.path.join('D:', 'query2_ref.json') # Modify as needed
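When editing that path on Windows, be aware that joining a bare drive letter such as `'D:'` produces a drive-relative path, which resolves against the current directory on that drive rather than the drive root. The short demonstration below uses the standard-library `ntpath` module so the Windows rules can be checked on any operating system; the file name is the one from the snippet above.

```python
import ntpath  # Windows path semantics, importable on any platform

# A bare drive letter gives a path relative to the current directory on D:
relative = ntpath.join('D:', 'query2_ref.json')

# Including the root separator gives an absolute path at the top of D:
absolute = ntpath.join('D:\\', 'query2_ref.json')

print(relative, ntpath.isabs(relative))
print(absolute, ntpath.isabs(absolute))
```

If the file is expected at the root of the drive, writing the first component as `'D:\\'` (or using a raw string for the full path) avoids the ambiguity.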
This project is licensed under the MIT License. See the LICENSE file for more details.
For any inquiries or support, please contact [email protected].