A program which crawls the RPI catalog through the website, parses it, and writes a dot file to visualize course progressions in a tree graph. This can be used by incoming students to understand the core classes we need to take in order to reach the higher level electives.

jmccand/crawl-classes

crawl-classes

RPI offers an amazing variety of courses, but that makes its course catalog long and tedious to navigate. Visualizing each course's prerequisites as a graph makes the catalog much easier to read. This project includes all of the tools needed to produce such a graph, so that RPI students can see a clear course progression for their major:

  • Web scraping the catalog using Python's 'requests' and 'beautifulsoup4' libraries
  • Storing cached versions of the online HTML pages so that re-running the program doesn't put extra load on RPI servers
  • Instantiating Course objects for each course in the catalog
  • Writing the catalog (a Python dictionary) to a pickled file for later use
  • Converting the Python dictionary of courses to a '.dot' file for graph display
  • Using dot's command line interface (dot -Ktwopi -T <type> -o <output_file.type> <file.dot>) to generate an image from the .dot file

Here's a sample output of all courses in the RPI catalog with prerequisites: (image: prereqs graph)

Here's an interesting graph of Computer Science department courses at RPI: (image: cs graph)
