Add automatic scraping #17

Open
wants to merge 18 commits into
base: main
49 changes: 49 additions & 0 deletions .github/workflows/python-app.yml
@@ -0,0 +1,49 @@
# This workflow will install Python dependencies, run all scraper files and then
# push the files to the data directory.

name: auto scraper

on:
  pull_request:
    branches: [ main ]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
      - name: checkout repo content
        uses: actions/checkout@v2
      - name: setup python
        uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - name: install python packages
        run: |
          python -m pip install --upgrade pip
          pip install flake8 pytest requests bs4 html5lib
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      - name: execute py script
        run: for f in src/*.py; do python "$f"; done

      - name: push to data directory
        id: push_directory
        uses: cpina/github-action-push-to-another-repository@main
        env:
          API_TOKEN_GITHUB: ${{ secrets.API_TOKEN_GITHUB }}
        with:
          source-directory: src
          destination-github-username: 'rpi-crisis'
          destination-repository-name: 'data'
          user-email: [email protected]
          commit-message: Data scraped from the various scrapers
          target-branch: main
      - name: Test get variable exported by push-to-data-directory
        run: echo $DESTINATION_CLONED_DIRECTORY
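
The `execute py script` step runs every file matching `src/*.py`. For running the scrapers locally, a rough Python equivalent might look like the sketch below. `run_scrapers` is a hypothetical helper and not part of this PR; it assumes the scrapers are standalone scripts and records each script's exit code so that all failures can be reported together.

```python
import subprocess
import sys
from pathlib import Path


def run_scrapers(src_dir: str = "src") -> dict[str, int]:
    """Run every *.py file in src_dir and return {file name: exit code}.

    Hypothetical helper for illustration; the workflow itself uses a shell loop.
    """
    results = {}
    for script in sorted(Path(src_dir).glob("*.py")):
        # Use the current interpreter so the scripts see the same packages.
        completed = subprocess.run([sys.executable, str(script)])
        results[script.name] = completed.returncode
    return results
```

A caller could then list the names whose exit code is nonzero and exit with a failure status if that list is not empty.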
1 change: 1 addition & 0 deletions catalog.rpi.edu/fall_calendar.json
@@ -0,0 +1 @@
{"August 6": ["Fall tuition and fees due."], "August 27": ["Official date of August graduation; diplomas mailed to students after final clearance is completed in September. Degree recipients may take part in the May 2022Commencement ceremony.", "Residence dining halls open with dinner. Residence halls and apartments open for upperclass and new graduate students."], "August 30": ["Fall 2021 semester classes begin."], "September 6": ["Labor Day - no classes."], "September 7": ["Classes resume. Follow a Monday schedule."], "September 13": ["Last day for graduate and undergraduate students to add courses, change sections or to put courses on audit. Deadline for completion of NE/Igrade expectations related toSpring 2021courses."], "September 17": ["Nomination of Masters Thesis Committee forms due to the Office of Graduate Education for December graduates."], "October 8": ["Last day to file an online degree application via SIS for December 31, 2021graduation."], "October 8 - October 9": ["Reunion andHomecoming 2021."], "October 11": ["Columbus Day - no classes."], "October 12": ["Classes resume."], "October 22": ["Last day for undergraduate and graduate students to drop a course."], "October 25- November 5": ["Consultation weeks.Advisement for Spring 2022registration. Students should consult with their faculty advisers."], "October 29": ["Doctoral dissertations due to advisers."], "November 5": ["Masters thesis and Engineering projects due to advisors."], "November 8- November 22": ["Pre-registration for the Spring 2022semester opens for currently enrolled students."], "November 12": ["Last day for undergraduates to add or remove Pass/No Credit designation."], "November 15": ["Masters theses due in the Office of Graduate Education. Last day to defend doctoral dissertations."], "November 24- November 26": ["Thanksgiving break - no classes. Dining halls closed."], "November 28": ["Dining halls reopen for dinner."], "November 29": ["Classes resume. 
Doctoral dissertations due in the Office of Graduate Education."], "December 10": ["Last day of classes. Deadline for completion of NE/I grade expectations related to Summer 2021 courses."], "December 11- December 14": ["Reading/Study days. Instructors can schedule no exams nor require any student work expectations on these days."], "December 15 - December 21": ["Final Examinations."], "December 15": ["Registration add/drop reopens for the Spring 2022 term."], "December 25 - January 1": ["Holiday winter break, Institute is closed."], "December 31": ["Official date of December graduation; diplomas mailed to students after final clearance is completed in January. Degree recipients may take part in the May 2022Commencement ceremony."]}
1 change: 1 addition & 0 deletions catalog.rpi.edu/out.json
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"School of Architecture": [{"degree": "Architecture", "offered": ["B.Arch.", "M.Arch.", "M.S."], "hegis": "0202"}, {"degree": "Architectural Sciences", "offered": ["M.S.", "Ph.D."], "hegis": "0202"}, {"degree": "Building Sciences", "offered": ["B.S."], "hegis": "0202"}, {"degree": "Lighting", "offered": ["M.S."], "hegis": "0299"}], "School of Engineering": [{"degree": "Aeronautical Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0902"}, {"degree": "Biomedical Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0905"}, {"degree": "Chemical Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0906"}, {"degree": "Civil Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0908"}, {"degree": "Computer and Systems Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0999"}, {"degree": "Decision Sciences and Engineering Systems", "offered": ["Ph.D."], "hegis": "0913"}, {"degree": "Electrical Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0909"}, {"degree": "Engineering Physics", "offered": ["M.S.", "D.Eng.", "Ph.D."], "hegis": "0919"}, {"degree": "Engineering Science", "offered": ["B.S.", "M.S.", "Ph.D."], "hegis": "0901"}, {"degree": "Environmental Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0922"}, {"degree": "Industrial and Management Engineering", "offered": ["B.S.", "M.Eng.", "M.S."], "hegis": "0913"}, {"degree": "Materials Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0915"}, {"degree": "Mechanical Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0910"}, {"degree": "Nuclear Engineering", "offered": ["B.S.", "M.Eng.", "M.S.", "D.Eng."], "hegis": "0920"}, {"degree": "Nuclear Engineering and Science", "offered": ["Ph.D."], "hegis": "0920"}, {"degree": "Systems Engineering and 
Technology Management", "offered": ["M.E."], "hegis": "0913"}, {"degree": "Transportation Engineering", "offered": ["M.Eng.", "M.S.", "D.Eng.", "Ph.D."], "hegis": "0908"}], "School of Humanities, Arts, and Social Sciences": [{"degree": "Cognitive Science", "offered": ["B.S.", "M.S.", "Ph.D."], "hegis": "0499"}, {"degree": "Communication, Media, and Design", "offered": ["B.S."], "hegis": "0601"}, {"degree": "Communication and Rhetoric", "offered": ["M.S."], "hegis": "0601"}, {"degree": "Communication and Rhetoric", "offered": ["Ph.D."], "hegis": "0602"}, {"degree": "Design, Innovation, and Society", "offered": ["B.S."], "hegis": "4903"}, {"degree": "Ecological Economics", "offered": ["Ph.D."], "hegis": "0517"}, {"degree": "Ecological Economics, Values, and Policy", "offered": ["M.S."], "hegis": "2299"}, {"degree": "Economics", "offered": ["B.S.", "M.S."], "hegis": "2204"}, {"degree": "Electronic Arts", "offered": ["B.S.", "M.F.A.", "Ph.D."], "hegis": "1099"}, {"degree": "Electronic Media, Arts, and Communication", "offered": ["B.S."], "hegis": "0605"}, {"degree": "Games and Simulation Arts and Sciences", "offered": ["B.S."], "hegis": "2299"}, {"degree": "Human-Computer Interaction", "offered": ["M.S."], "hegis": "0799"}, {"degree": "Philosophy", "offered": ["B.S."], "hegis": "1509"}, {"degree": "Psychological Science", "offered": ["B.S."], "hegis": "2001"}, {"degree": "Science, Technology, and Society", "offered": ["B.S."], "hegis": "4903"}, {"degree": "Science and Technology Studies", "offered": ["M.S.", "Ph.D."], "hegis": "4903"}, {"degree": "Sustainability Studies", "offered": ["B.S."], "hegis": "4903"}, {"degree": "Technical Communication", "offered": ["M.S."], "hegis": "0601"}], "School of Management": [{"degree": "Business Analytics", "offered": ["B.S.", "M.S."], "hegis": "0506"}, {"degree": "Business and Management", "offered": ["B.S."], "hegis": "0506"}, {"degree": "Quantitative Finance and Risk Analytics", "offered": ["M.S."], "hegis": "0504"}, {"degree": 
"Management", "offered": ["M.S.", "MBA", "Ph.D."], "hegis": "0506"}, {"degree": "Supply Chain Management", "offered": ["M.S."], "hegis": "0506"}, {"degree": "Technology, Commercialization and Entrepreneurship", "offered": ["M.S."], "hegis": "5004"}], "School of Science": [{"degree": "Applied Physics", "offered": ["B.S."], "hegis": "1902"}, {"degree": "Applied Science", "offered": ["M.S."], "hegis": "4902"}, {"degree": "Astronomy", "offered": ["M.S."], "hegis": "1911"}, {"degree": "Biology", "offered": ["B.S.", "M.S.", "Ph.D."], "hegis": "0401"}, {"degree": "Biochemistry and Biophysics", "offered": ["B.S.", "M.S. Ph.D."], "hegis": "0499"}, {"degree": "Biological Neuroscience", "offered": ["B.S."], "hegis": "0425"}, {"degree": "Chemistry", "offered": ["B.S.", "M.S.", "Ph.D."], "hegis": "1905"}, {"degree": "Computer Science", "offered": ["B.S.", "M.S.", "Ph.D."], "hegis": "0701"}, {"degree": "Environmental Science", "offered": ["B.S."], "hegis": "1999"}, {"degree": "Geology", "offered": ["B.S.", "M.S.", "Ph.D."], "hegis": "1914"}, {"degree": "Hydrogeology", "offered": ["B.S.", "M.S."], "hegis": "1914"}, {"degree": "Interdisciplinary Science", "offered": ["B.S."], "hegis": "4902"}, {"degree": "Applied Mathematics", "offered": ["M.S."], "hegis": "1703"}, {"degree": "Mathematics", "offered": ["B.S.", "M.S.", "Ph.D."], "hegis": "1701"}, {"degree": "Multidisciplinary Science", "offered": ["M.S.", "Ph.D."], "hegis": "4902"}, {"degree": "Physics", "offered": ["B.S.", "M.S.", "Ph.D."], "hegis": "1902"}], "Information Technologyand Web Science": [{"degree": "Information Technology", "offered": ["M.S."], "hegis": "0702"}, {"degree": "Information Technology and Web Science", "offered": ["B.S"], "hegis": "0702"}]}
1 change: 1 addition & 0 deletions catalog.rpi.edu/spring_calendar.json
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"": ["Spring Term 2022\nJanuary 7\nSpring 2022 tuition and fees due\nJanuary 9\r\n\t\t\tResidence dining halls with dinner.\nJanuary 10\r\n\t\t\tSpring 2022 semester classes begin.\nJanuary 17\r\n\t\t\tMartin Luther King Jr. Day - no classes.\nJanuary 21\r\n\t\t\tLast day for graduate and undergraduate students to add courses, change sections, or to put courses onaudit.\nFebruary 4\nNomination of Masters Thesis Committee forms due to the Office of Graduate Education for May 2022 graduates.\nFebruary 21\r\n\t\t\tPresidents Day - no classes.\nFebruary 22\r\n\t\t\tClasses resume. Follow a Monday schedule.\nFebruary 28\nDoctoral dissertations due to advisers. Masters theses and Engineering projects due to advisers.\n\nFebruary 28 - March 18\nConsultation weeks. Advisement for Fall 2022registration. Students should consult with their faculty advisers.\n\nMarch 4\r\n\t\t\tLast day to file a Degree Application online via SIS for May 2022graduation.\n\n\nMarch 4\nLast day for undergraduate and graduate students to drop a course. Resident dining halls close after dinner."], "March 7": ["Registration begins for The Arch summer classes"], "March 7 - March 11": ["Spring Break"], "March 13": ["Resident dining halls reopen for dinner."], "March 14": ["Classes resume."], "March 21- March 25": ["Grand Marshal Week (Student Government Elections)."], "March 21- April 4": ["Pre-registration for the Fall 2022 semester opens for all currently enrolled students."], "March 23": ["GM week events - no classes."], "March 25": ["Last day to defend doctoral dissertations. 
Masters theses due in the Office of Graduate Education."], "April 8": ["Doctoral dissertations due in the Office of Graduate Education."], "April 15": ["Last day for undergraduates to add or remove Pass/No Credit designation."], "April 22": ["Summer Arch tuition and fees due."], "April 27": ["Last day of classes.Deadline for completion of NE/I grade expectations related to Fall2021 courses."], "April 28\u00a0- May 1": ["Reading/Study days. Instructors can schedule no exams nor require any student work expectations on these days.."], "May 2- May 6": ["Final Examinations"], "May 6": ["All resident dining halls close after dinner. Residence halls and apartments close at noon for all students not participating in Commencement."], "May 20": ["ROTC Commissioning Ceremony."], "May 21": ["Commencement."], "May 23\nSummer I and II classes begin (including The Arch classes).\n\u00a0\nMay 30": ["Memorial Day - no classes."]}
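
In `spring_calendar.json`, the first entry has an empty key whose value is one long string with several dates and their events run together. One possible way to split such a string into per-date entries is sketched below. This is not the scraper from this PR: `split_calendar_blob` and `DATE_LINE` are hypothetical names, and the sketch assumes each date sits on its own line as `Month D` or `Month D - Month D`.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# A date line is "Month D" or a range such as "Month D - Month D".
DATE_LINE = re.compile(
    rf"^(?:{MONTHS}) \d{{1,2}}(?:\s*-\s*(?:{MONTHS}) \d{{1,2}})?$"
)


def split_calendar_blob(blob: str) -> dict[str, list[str]]:
    """Group the text lines of a blob under the date line that precedes them.

    Text that appears before the first date line (for example a term
    heading) is dropped.
    """
    calendar: dict[str, list[str]] = {}
    current = None
    for raw in blob.splitlines():
        # Normalize non-breaking spaces and strip tabs/carriage returns.
        line = raw.replace("\u00a0", " ").strip()
        if not line:
            continue
        if DATE_LINE.match(line):
            current = line
            calendar.setdefault(current, [])
        elif current is not None:
            calendar[current].append(line)
    return calendar
```

Events that share a date, such as the two `March 4` entries in the blob, end up in the same list under this approach.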
2 changes: 1 addition & 1 deletion docs/format.md
@@ -10,7 +10,7 @@ an object with the following form:
| `"title"` | Course name. | `"CALCULUS I"` |
| `"department"` | Department code. | `"MATH"` |
| `"id"` | Course code. | `1010` |
| `"credits"` | Number of credits for the course. May be a range of values. | `"4"`, or `"1-4"` |
| `"credits"` | Number of credits for the course. May be a range of values. | `"4"`, or `"1-4"` |
| `"ci"` | Whether this course is **c**ommunication **i**ntensive. | `false` |
| `"description"` | Course description. | `"Functions, limits, continuity, derivatives, implicit..."` |
| `"offered"` | When the course can be taken. | `"Fall and spring terms annually."` |
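
Since `"credits"` is a string that may hold either a single value or a range, a consumer of this format has to handle both shapes. A minimal sketch follows, assuming ranges always use a single hyphen as in `"1-4"`. `parse_credits` is a hypothetical helper, and the example object uses only the sample values from the rows shown in this hunk.

```python
def parse_credits(credits: str) -> tuple[int, int]:
    """Return (minimum, maximum) credits for a value such as "4" or "1-4"."""
    low, _, high = credits.partition("-")
    minimum = int(low)
    # A single value means the minimum and maximum are the same.
    maximum = int(high) if high else minimum
    return minimum, maximum


# Example course object built from the sample values in the table above.
course = {
    "title": "CALCULUS I",
    "department": "MATH",
    "id": 1010,
    "credits": "4",
    "ci": False,
}
credit_range = parse_credits(course["credits"])
```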