Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monkey #31

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Monkey #31

wants to merge 17 commits into from

Conversation

SimonBerend
Copy link
Contributor

Some issues I didnt know how to fix:

  • there are some errors in the get_survey_responses function, related to 'responses_dict' (lines 39, 48, 50)
  • All the constituent parts work but I havent ran the script as a whole
  • Im not sure whether I use the scrape function at the bottom correctly (line 195), it differs a bit from scrape functions used in others scripts
  • Im not sure whether I have placed the variables in MasterpeaceScraper(search_string = "MEAL", survey_id = '297005313') (line 214) correctly

surveys = self.client.get_survey_lists()
if 'data' not in surveys:
raise ValueError(
f"Surveys not found for search string {search_string}, data returned: {surveys}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd change this to f"Surveys not found from client. data returned: {surveys}"
as this part of the code doesn't do anything with the search_string yet


def get_survey_questions(self, survey_id: str) -> Dict[str, str]:
"""Get the questions associated with the row number in a survey."""
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this passing, do you have the questions somewhere?

Comment on lines +73 to +74
"""Get id of respondent """
"""Note: multiple projects per respondent"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this into one note?

eg:

"""
Get id of respondent. 
       Note: multiple projects per respondent
"""

This'll make the Note a part of the docstring and visible for everyone who's mousing over the function

Comment on lines +81 to +82
def get_questions_dict(self, survey_details):
questions = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a docstring:
"""Get questions and their answers.
Returns a Dict: {question_id : answer}
"""

def get_respondent_data(self, responses, survey_id, respondent_id, club_data_dict, entity) -> str:
"""get data entities per respondent"""
for page in responses[survey_id][respondent_id]['pages']:
if page['id'] == '146876559':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this specific number? could use a documentation at least, or turn it into a variable if it could change later down the line

Comment on lines +130 to +131
if len(question["headings"]) > 0:
project_data_ids[question['id']] = question['headings'][0].get('heading', "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it always the first heading of question["headings"] that contains important data, and never the second?

return club_data

def get_project_data_ids(self, survey_details: dict):
"""Get ids of the answers to questions on projects."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure what this returns - The dict has the format

{
  "question_id" : "information_from_heading"
}

the docstring says Get ids of the answers to questions on projects.
Is the question_id then also the id of the answer? What's the heading?

Comment on lines +138 to +143
for x in unique_values:
ids_per_value = [key for key, value in project_data_ids.items() if value == x]
ids_per_value.sort()
if len(ids_per_value) > 1:
split_project_dict[x] = ids_per_value
return split_project_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this does is it flips the keys and values in project_data_ids and if there are duplicate values, it makes a new dict where the initial value is the new key, and the new value is a list of the keys that belong to that value.
However, if there's 1 or less values, this just returns an empty dict

project_list.append({
'id' : 'respondent_id'+ str(i),
'title' : self.get_project_data(responses, survey_id, respondent_id, split_project_data_dict, project_number = i, entity = 'Project Title'),
'description' : self.get_project_data(responses, survey_id, respondent_id, split_project_data_dict, project_number = i, entity = 'Describe your project (at least 300 words)<br><br><em>- Context (What is the dilemma that the project is trying to tackle? Why is it important for this neighbourhood/group of people/the country?</em><br><em>- Activities (What did you do?)</em><br><em>- Results (What did you achieve? What did you create, produce, accomplish? Try to include numbers, if possible).</em><br><em>- Impact (What changed in the community? What did you learn yourself or as a team? Did you meet your own expectations)?</em>'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that this is a very long string for an entity

except Exception as e:
self.logger.exception('Failed to get projects from current page')

self.write_to_file(projects, str(search_string + respondent_id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll write a separate file per respondent ; so sometimes 1 and sometimes 4 projects. I think we were using a one file per scrape methodology, so I'd say do something like create a projects list at the start of the scrape() method: projects = [] and then in the try: change it to

resp_projects = self.process_response(responses, survey_details, survey_id, respondent_id)
projects.extend(resp_projects)

responses = self.client.get_all_pages_response(survey_id)
for response in responses:
if not response.get("data", []):
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will throw an error for the first item in the list that does not have data param. Is this the intended behaviour?

If not, you could do something like this:

for response in responses:
    data = response.get('data', [])

    for response_data in data:
        respondent_id = response_data.get("id", "")
        responses_dict[survey_id][respondent_id] = response_data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants