We need a HASS Pathway scraper which will collect the HASS pathways from http://catalog.rpi.edu/index.php
THIS IS A TWO PERSON JOB and will require guidance. Please find someone to work closely with on this, and check in with Trevor or Lily periodically and whenever you are deciding on any changes to the specifications.
Please comment your code - each method should have a description of what it does and what it returns, and any code that is difficult to understand should be commented to explain what it does.
The program must load this page and ensure that it is on the most recent catalog (that of the current year). Then it must load the "Programs" page and gather all of the links in the "Integrative Pathway" section. It will then visit each of those links and record the following information in the output JSON:
Using this page as an example (http://catalog.rpi.edu/preview_program.php?catoid=22&poid=5545&returnto=542)
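The "pathway name" is whatever is in the <h1 id="acalog-page-title"> element. On the example page this is "Economics".
The "description text" is all of the text within each <p> below the <p class="acalog-breadcrumb acalog-highlight-ignore"> element. On the example page this is "Study different types of theories and statistical methods used by economists. Students are prepared to gain a broad understanding of how consumers, firms and governments make decisions, and their implications." followed by "To complete this integrative pathway, students must choose a minimum of 12 credits as described:"
The "raw requirements html" is the content within <div class="custom_leftpad_20">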
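The three fields above could be pulled out of one pathway page roughly as follows. This is only a sketch using Python's standard-library `html.parser`; the class name `PathwayPageParser`, the function `extract_pathway`, and the choice to return the description as a list of paragraph strings are assumptions, not part of the specification. It also assumes the description paragraphs come before the requirements `<div>`, and it decodes character references, so the "raw" HTML is not byte-for-byte identical to the page source.

```python
from html.parser import HTMLParser


class PathwayPageParser(HTMLParser):
    """Collects the pathway name, description paragraphs and requirements HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.name = ""           # text of <h1 id="acalog-page-title">
        self.paragraphs = []     # text of each <p> after the breadcrumb <p>
        self.raw_html = []       # markup pieces inside the requirements <div>
        self._in_title = False
        self._seen_breadcrumb = False
        self._p_text = None      # text pieces while inside a description <p>
        self._div_depth = 0      # > 0 while inside <div class="custom_leftpad_20">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._div_depth:
            # Inside the requirements div: keep the tag text as written.
            self.raw_html.append(self.get_starttag_text())
            if tag == "div":
                self._div_depth += 1
            return
        if tag == "div" and "custom_leftpad_20" in classes:
            self._div_depth = 1
            self._seen_breadcrumb = False  # assumption: description ends here
        elif tag == "h1" and attrs.get("id") == "acalog-page-title":
            self._in_title = True
        elif tag == "p":
            if "acalog-breadcrumb" in classes:
                self._seen_breadcrumb = True
            elif self._seen_breadcrumb:
                self._p_text = []

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> only matter inside the requirements div.
        if self._div_depth:
            self.raw_html.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._div_depth:
            if tag == "div":
                self._div_depth -= 1
                if self._div_depth == 0:
                    return  # closing tag of the requirements div itself
            self.raw_html.append(f"</{tag}>")
        elif tag == "h1":
            self._in_title = False
        elif tag == "p" and self._p_text is not None:
            text = " ".join("".join(self._p_text).split())
            if text:
                self.paragraphs.append(text)
            self._p_text = None

    def handle_data(self, data):
        if self._div_depth:
            self.raw_html.append(data)
        elif self._in_title:
            self.name += data
        elif self._p_text is not None:
            self._p_text.append(data)


def extract_pathway(html):
    """Return a dict with the three fields for one pathway page's HTML."""
    parser = PathwayPageParser()
    parser.feed(html)
    parser.close()
    return {
        "pathway name": parser.name.strip(),
        "description text": parser.paragraphs,
        "raw requirements html": "".join(parser.raw_html).strip(),
    }
```

Fetching the page itself is left out of the sketch; `extract_pathway` only needs the page's HTML as a string.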
After this task is completed, create a function which parses the "raw requirements html" and turns it into useful information of the form described below.
Please keep in mind that there can be any number of required courses in a req# section, and that there can be any number of req# sections in the HASS Pathway listing. "num-required" is the number of courses required from that section. Since it says "Choose one of the following", we must obtain from that the number 1 for req1, and then do the same process for req2. If you find the use of " or " in between possible options to be insufficient, you can choose another reasonable delimiter such as "|", " | ", or " OR ". If there is a course that is simply required for a pathway, the "num-required" should be set to 0 and the "options" should only list the course id of that course.
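As a rough illustration of the "num-required" and "options" rules above, here is a minimal sketch. The function names `num_required` and `build_requirements`, the number-word table, and the course ids in the usage note are hypothetical. It assumes the section headings and course ids have already been pulled out of the raw HTML, and that a heading with no "Choose N" wording marks a simply required course.

```python
import re

# Number words that may appear in "Choose one of the following" headings.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def num_required(heading):
    """Return the count named in a 'Choose N ...' heading, or 0 if none is named."""
    match = re.search(r"choose\s+(\d+|[a-z]+)\b", heading, re.IGNORECASE)
    if not match:
        return 0
    token = match.group(1).lower()
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token, 0)


def build_requirements(sections, delimiter=" or "):
    """Turn a list of (heading, [course ids]) pairs into req1, req2, ... entries."""
    result = {}
    for index, (heading, course_ids) in enumerate(sections, start=1):
        result[f"req{index}"] = {
            "num-required": num_required(heading),
            "options": delimiter.join(course_ids),
        }
    return result
```

For example, `build_requirements([("Choose one of the following:", ["ECON 1200", "ECON 2010"])])` yields a single `req1` entry with `"num-required": 1` and `"options": "ECON 1200 or ECON 2010"`. Passing `delimiter=" | "` switches to one of the alternative delimiters mentioned above.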
Also consider the possibility of there not being a "Choose remaining credits from the following:" section, and the number of credits listed in the text "with at least credits at the -level" which occasionally appears. It is not displayed above, but you will want to account for this with a "-credits" field with an integer value in the remaining category (e.g. "4000-credits": 8, to show that 8 credits at the 4000 level are required).
The other section is for any remaining category titles and the text within them; usually this is just "Compatible minor:": "".
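The credit-level note described above could be detected with a small regular expression. This is a sketch only: the function name `credit_level_fields` is hypothetical, and the pattern assumes the catalog text spells out both numbers as digits (for instance "at least 8 credits at the 4000-level"), which should be checked against real pathway pages.

```python
import re

# Assumed wording: "at least <N> credits at the <LEVEL>-level".
CREDIT_LEVEL = re.compile(
    r"at least\s+(\d+)\s+credits?\s+at the\s+(\d{4})[- ]level",
    re.IGNORECASE,
)


def credit_level_fields(text):
    """Return e.g. {"4000-credits": 8} for each credit-level note found in text."""
    return {
        f"{level}-credits": int(credits)
        for credits, level in CREDIT_LEVEL.findall(text)
    }
```

The returned dict is empty when no such note appears, so it can be merged into the remaining category unconditionally.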
After the code for this is complete, instead of storing the "raw requirements html" in the initial output JSON, you should instead store the output of giving that "raw requirements html" to the parser function.