The backend behind python-station

A full data pipeline that scrapes http://planetpython.org.

Output: every GitHub (Python) project ever featured on Planet Python.

It also enriches the data using the GitHub, Reddit, and Hacker News APIs.
- Download the pages from the planetpython.org clone
- Use BeautifulSoup to transform the raw pages into posts
- Use the GitHub API to get basic project data (and filter out non-Python projects)
- Use PRAW (Reddit), the Hacker News API, and GitHub Trending to enrich the data
- Display the data using GitHub Pages and Vue.js
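The "transform into posts" and "extract projects" steps boil down to pulling GitHub repository links out of each post's HTML. The helper below is a hypothetical sketch, not the project's code: it swaps in the standard library's html.parser for BeautifulSoup so it has no dependencies, and it assumes a project counts as "featured" when a post links to https://github.com/<owner>/<repo>.

```python
import re
from html.parser import HTMLParser
from typing import List

# Matches https://github.com/<owner>/<repo>... and captures owner and repo.
GITHUB_REPO_RE = re.compile(
    r"^https?://(?:www\.)?github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)"
)

# First path segments that are GitHub site pages rather than user/org names
# (an illustrative, non-exhaustive list).
NON_OWNER_PATHS = {
    "about", "explore", "features", "login", "marketplace",
    "orgs", "pricing", "settings", "sponsors", "topics", "trending",
}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_github_projects(post_html: str) -> List[str]:
    """Return unique 'owner/repo' names linked from a post, in first-seen order."""
    collector = LinkCollector()
    collector.feed(post_html)
    projects: List[str] = []
    seen = set()
    for href in collector.hrefs:
        match = GITHUB_REPO_RE.match(href)
        if not match:
            continue
        owner, repo = match.groups()
        if owner.lower() in NON_OWNER_PATHS:
            continue
        if repo.endswith(".git"):
            repo = repo[:-4]
        full_name = f"{owner}/{repo}"
        key = full_name.lower()  # GitHub names are case-insensitive
        if key not in seen:
            seen.add(key)
            projects.append(full_name)
    return projects
```

Links deeper into a repository (issues, blobs, "#readme" anchors) collapse to the same owner/repo pair, so a post that links a project several times still yields it once.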
- Clone the project, then create a virtual environment, install the requirements, and run the pipeline:
python3 -m venv ./venv && source venv/bin/activate && pip install -r requirements.txt
venv/bin/python pipeline.py --pages-to-download 5
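As a rough illustration of what the --pages-to-download flag controls, here is a hypothetical sketch of a download step. The real argument handling in pipeline.py and the clone's URL scheme are not documented in this README, so the default value, url_template, and the fetch callable are all assumptions. Passing fetch in (for example a thin wrapper around requests.get(url).text) keeps the loop testable without network access.

```python
import argparse
import time
from typing import Callable, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the flag shown in the README (the default of 1 is an assumption)."""
    parser = argparse.ArgumentParser(description="Sketch of the pipeline entry point")
    parser.add_argument(
        "--pages-to-download",
        type=int,
        default=1,
        help="how many archive pages to fetch",
    )
    return parser.parse_args(argv)


def page_urls(url_template: str, count: int) -> List[str]:
    """Build page URLs from a hypothetical template containing '{page}'."""
    return [url_template.format(page=n) for n in range(1, count + 1)]


def download_pages(
    urls: List[str], fetch: Callable[[str], str], delay: float = 1.0
) -> Dict[str, str]:
    """Fetch each URL in order, sleeping between requests to stay polite."""
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)
        pages[url] = fetch(url)
    return pages
```

argparse exposes --pages-to-download as args.pages_to_download, so a caller would do something like download_pages(page_urls(template, args.pages_to_download), fetch), with template pointing at whatever archive URL pattern the clone actually uses.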
- To download Reddit data, fill in your Reddit credentials in requests_utils.py
- If you hit the GitHub API rate limit, fill in your GitHub credentials in requests_utils.py
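For context on the GitHub step, here is a hedged sketch of fetching repository metadata and dropping non-Python projects. It uses the GitHub REST endpoint GET /repos/{owner}/{repo}, whose JSON includes a language field holding the repository's primary language. How requests_utils.py actually stores and sends credentials is not shown in this README, so passing the token as a plain argument is an assumption; authenticated requests get a much higher rate limit than anonymous ones, which is why filling in credentials helps.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

GITHUB_API = "https://api.github.com"


def build_repo_request(full_name: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET /repos/{owner}/{repo} request, authenticated if a token is given."""
    headers = {
        "Accept": "application/vnd.github+json",
        # GitHub's API requires a User-Agent; send a descriptive one.
        "User-Agent": "python-station-sketch",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{GITHUB_API}/repos/{full_name}", headers=headers)


def fetch_repo(full_name: str, token: Optional[str] = None) -> Dict[str, Any]:
    """Fetch repository metadata as a dict (performs a network call)."""
    with urllib.request.urlopen(build_repo_request(full_name, token), timeout=10) as resp:
        return json.load(resp)


def is_python_project(repo: Dict[str, Any]) -> bool:
    """Keep a repo only if GitHub reports Python as its primary language."""
    return repo.get("language") == "Python"
```

Usage would look like is_python_project(fetch_repo("psf/requests", token=os.environ.get("GITHUB_TOKEN"))), where the GITHUB_TOKEN variable is a hypothetical stand-in for the credentials in requests_utils.py. Filtering on the primary language alone will also drop repositories GitHub classifies as, for example, Jupyter Notebook.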
+-------------------+
| Download Pages |
+---------+---------+
|
+---------v---------+
|Transform to Posts |
+---------+---------+
|
+---------v---------+
|Extract projects |
+---------+---------+
|
+---------v---------+
|Enrich Using Apis |
+---------+---------+
|
+---------v----------+
|Deploy Using Github |
| Pages |
+--------------------+
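The diagram is a linear chain in which each box consumes the previous box's output. A minimal, hypothetical way to express that shape in Python follows; the stage names would mirror the diagram, but the functions plugged in are placeholders, not the project's actual modules.

```python
from typing import Any, Callable, List, Tuple

# A stage is a (name, function) pair; each function takes the previous stage's output.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(
    stages: List[Stage], data: Any = None, log: Callable[[str], None] = print
) -> Any:
    """Run stages in order, feeding each one the previous stage's result."""
    for index, (name, stage) in enumerate(stages, start=1):
        log(f"[{index}/{len(stages)}] {name}")
        data = stage(data)
    return data
```

A caller would pass something like [("Download Pages", download), ("Transform to Posts", to_posts), ("Extract projects", extract), ("Enrich Using Apis", enrich), ("Deploy Using Github Pages", deploy)], where each name on the right is a hypothetical function. Keeping every stage a plain function makes it straightforward to rerun one step on saved intermediate output while debugging.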
Want to contribute? Great! Feel free to open a PR or an issue :)
MIT - Free Software, Hell Yeah!