This repository has been archived by the owner on Oct 18, 2024. It is now read-only.

feat: ✨ add calendar scraper and dump endpoint #132

Merged 9 commits on Feb 28, 2024

Conversation

ecxyzzy
Member

@ecxyzzy ecxyzzy commented Feb 17, 2024

Summary

  • Add an automated scraper for the calendar endpoint, which will run every three months (probably on the generous side, but you never know).
  • Calling /v1/rest/calendar with no params will now dump all terms in the database. The GraphQL query allTermDates accomplishes the same.
  • Add the field socAvailable to the response type, which indicates when the Schedule of Classes will become available for that term.
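As a rough illustration of the summary above, here is a minimal TypeScript sketch of how a client might build the request path and use the new field. Only the `/v1/rest/calendar` path and the `socAvailable` field come from this PR; the `year` and `quarter` parameter names, the `TermDates` shape, the ISO date string format, and both helper functions are assumptions made for the example.

```typescript
// Hypothetical shape of one calendar entry. Only `socAvailable` is named in
// this PR; the other fields and the ISO date format are assumptions.
interface TermDates {
  year: string;
  quarter: string;
  socAvailable: string; // assumed to be an ISO date string, e.g. "2024-02-01"
}

// Build the request path. With no term given, the endpoint dumps all terms.
// The `year` and `quarter` query parameter names are assumed, not confirmed.
function calendarPath(term?: { year: string; quarter: string }): string {
  const base = "/v1/rest/calendar";
  if (!term) return base;
  const query =
    "year=" + encodeURIComponent(term.year) +
    "&quarter=" + encodeURIComponent(term.quarter);
  return base + "?" + query;
}

// Report whether the Schedule of Classes is available for a term at `now`.
function socIsOpen(term: TermDates, now: Date): boolean {
  return now.getTime() >= new Date(term.socAvailable).getTime();
}
```

Calling `calendarPath()` with no argument yields the bare path, which corresponds to the dump behavior described in the summary.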

Issues

Closes #111; closes #124.

@ecxyzzy ecxyzzy temporarily deployed to staging-132-docs February 17, 2024 22:46 — with GitHub Actions Inactive
@ecxyzzy ecxyzzy temporarily deployed to staging-132 February 17, 2024 22:46 — with GitHub Actions Inactive
@github-actions github-actions bot changed the title feat: add calendar scraper and dump endpoint feat: ✨ add calendar scraper and dump endpoint Feb 17, 2024
@ecxyzzy ecxyzzy temporarily deployed to staging-132-docs February 17, 2024 22:53 — with GitHub Actions Inactive
@ecxyzzy ecxyzzy temporarily deployed to staging-132 February 17, 2024 22:53 — with GitHub Actions Inactive

@adcockdalton adcockdalton left a comment

I have little to no experience with web scraping, so I can't pinpoint the cause in the code, but the output for the finals start and end fields is identical, which is wrong.

Otherwise looks great, thank you so much!

@ecxyzzy
Member Author

ecxyzzy commented Feb 26, 2024

Yeah, I narrowed down the issue. It's just a data integrity problem that I need to resolve, as I may have imported faulty data into the dev database. I'll do that at some point and re-request your review then.

Update for posterity: It was, in fact, not "just a data integrity problem". The algorithm was inherently ill-suited to the myriad edge cases in the formatting of the UCI Registrar's Quarterly Academic Calendar, so I rewrote the calendar library with a saner algorithm that locates entries by keyword rather than assuming that their positions relative to the table's first entry will remain constant.

This should also make it more extensible should we desire additional fields down the line.
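To make the difference between the two approaches concrete, here is a small TypeScript sketch. It is not the code from this PR: the `Row` type, the function names, the row labels, and the sample dates are all invented for illustration, and the real calendar table may be structured differently.

```typescript
// Each row is the list of cell texts from one table row: a label, then dates.
// The labels and dates below are made-up sample data, not scraped values.
type Row = string[];

// Fragile approach: assume the entry sits a fixed number of rows below the first.
function byOffset(rows: Row[], offset: number): Row | undefined {
  return rows[offset];
}

// Keyword approach: find the first row whose label mentions the keyword.
function byKeyword(rows: Row[], keyword: string): Row | undefined {
  const needle = keyword.toLowerCase();
  return rows.find((row) => (row[0] ?? "").toLowerCase().includes(needle));
}

const usual: Row[] = [
  ["Instruction begins", "Sep 26"],
  ["Instruction ends", "Dec 6"],
  ["Final examinations", "Dec 7-13"],
];

// The same table with one extra row, which shifts every later row down by one.
const shifted: Row[] = [
  ["Instruction begins", "Sep 26"],
  ["Holiday", "Nov 11"],
  ["Instruction ends", "Dec 6"],
  ["Final examinations", "Dec 7-13"],
];
```

With these samples, `byOffset(shifted, 2)` returns the "Instruction ends" row instead of the finals row, while `byKeyword(shifted, "final")` still finds the right one. Adding another field under the keyword approach only requires another keyword, which is consistent with the extensibility point above.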

@ecxyzzy ecxyzzy temporarily deployed to staging-132-docs February 28, 2024 19:16 — with GitHub Actions Inactive
@ecxyzzy ecxyzzy temporarily deployed to staging-132 February 28, 2024 19:16 — with GitHub Actions Inactive
@ecxyzzy ecxyzzy requested a review from adcockdalton February 28, 2024 19:20

@adcockdalton adcockdalton left a comment

Behavior is as expected. Thank you so much for this endpoint!

@adcockdalton adcockdalton merged commit 011e6d3 into main Feb 28, 2024
5 checks passed
@adcockdalton adcockdalton deleted the calendar-scraper branch February 28, 2024 21:09
Development

Successfully merging this pull request may close these issues.

  • Calendar Dump for Term Start and Finals Start
  • Create seeding tool/scraper for calendar endpoint
2 participants