Monkey #31

SimonBerend · 2021-10-26T13:22:56Z

Some issues I didn't know how to fix:

There are some errors in the get_survey_responses function related to 'responses_dict' (lines 39, 48, 50).
All the constituent parts work, but I haven't run the script as a whole.
I'm not sure whether I use the scrape function at the bottom correctly (line 195); it differs a bit from the scrape functions used in other scripts.
I'm not sure whether I have placed the variables in MasterpeaceScraper(search_string = "MEAL", survey_id = '297005313') (line 214) correctly.

Updating scrapers before introducing keywords funcionality

andrewsutjahjo · 2021-11-03T15:38:15Z

forward43/scraper_masterpeace.py

+        surveys = self.client.get_survey_lists()
+        if 'data' not in surveys:
+            raise ValueError(
+                f"Surveys not found for search string {search_string}, data returned: {surveys}"


I'd change this to f"Surveys not found from client. data returned: {surveys}", as this part of the code doesn't do anything with the search_string yet.
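A minimal sketch of the check with the suggested message, pulled out into a standalone helper for illustration. The name check_surveys is hypothetical, and surveys is assumed to be the dict returned by self.client.get_survey_lists().

```python
from typing import Any, Dict


def check_surveys(surveys: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if the client response has no 'data' key; otherwise return it unchanged."""
    if "data" not in surveys:
        # search_string plays no role at this step, so it is left out of the message.
        raise ValueError(f"Surveys not found from client. data returned: {surveys}")
    return surveys
```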

andrewsutjahjo · 2021-11-03T16:08:41Z

forward43/scraper_masterpeace.py

+
+    def get_survey_questions(self, survey_id: str) -> Dict[str, str]:
+        """Get the questions associated with the row number in a survey."""
+        pass


Why does this just pass? Do you have the questions somewhere?
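If the questions should come from the survey details the scraper already fetches, a sketch could look like the one below. It assumes survey_details has the nested pages -> questions -> headings shape that the other snippets in this PR index into, and it keys by question id because the meaning of "row number" in the docstring isn't stated; both are assumptions.

```python
from typing import Any, Dict


def get_survey_questions(survey_details: Dict[str, Any]) -> Dict[str, str]:
    """Map each question id to its heading text.

    Assumes survey_details looks like
    {"pages": [{"questions": [{"id": ..., "headings": [{"heading": ...}]}]}]}.
    """
    questions: Dict[str, str] = {}
    for page in survey_details.get("pages", []):
        for question in page.get("questions", []):
            headings = question.get("headings", [])
            # Use the first heading if there is one, otherwise an empty string.
            questions[question["id"]] = headings[0].get("heading", "") if headings else ""
    return questions
```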

andrewsutjahjo · 2021-11-03T16:13:43Z

forward43/scraper_masterpeace.py

+        """Get id of respondent """
+        """Note: multiple projects per respondent"""


Could you make this into one note?

e.g.:

""" Get id of respondent. Note: multiple projects per respondent """

This'll make the Note part of the docstring and visible to everyone who mouses over the function.
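The reason is that Python only treats the first string literal in a function body as its docstring; a second string literal is an ordinary expression statement that is evaluated and discarded. The function names below are made up for the demonstration.

```python
def two_strings():
    """Get id of respondent """
    """Note: multiple projects per respondent"""


def one_string():
    """Get id of respondent. Note: multiple projects per respondent"""


# __doc__ (and therefore help() and editor tooltips) only sees the first literal.
print(repr(two_strings.__doc__))
print(repr(one_string.__doc__))
```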

andrewsutjahjo · 2021-11-03T16:20:14Z

forward43/scraper_masterpeace.py

+    def get_questions_dict(self, survey_details):
+        questions = {}


This could use a docstring:
"""Get questions and their answers.
Returns a Dict: {question_id : answer}
"""

andrewsutjahjo · 2021-11-03T16:31:14Z

forward43/scraper_masterpeace.py

+    def get_respondent_data(self, responses, survey_id, respondent_id, club_data_dict, entity) -> str:
+        """get data entities per respondent"""
+        for page in responses[survey_id][respondent_id]['pages']:
+            if page['id'] == '146876559':


What's this specific number? It could use some documentation at least, or be turned into a variable if it could change later down the line.
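One way to do that is a named module-level constant plus a small filter. The constant name PROJECT_PAGE_ID guesses that '146876559' is the page holding the project questions, and iter_project_pages is a hypothetical helper; rename both once the number is confirmed.

```python
from typing import Any, Dict, Iterator

# Hypothetical name: id of the survey page that (presumably) holds the project questions.
# A single named constant documents the number and is the only place to update it.
PROJECT_PAGE_ID = "146876559"


def iter_project_pages(respondent: Dict[str, Any], page_id: str = PROJECT_PAGE_ID) -> Iterator[Dict[str, Any]]:
    """Yield only the pages of one respondent's response whose id matches page_id."""
    for page in respondent.get("pages", []):
        if page.get("id") == page_id:
            yield page
```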

andrewsutjahjo · 2021-11-03T16:46:27Z

forward43/scraper_masterpeace.py

+                    if len(question["headings"]) > 0:
+                        project_data_ids[question['id']] = question['headings'][0].get('heading', "")


Is it always the first heading of question["headings"] that contains important data, and never the second?
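If later headings can carry data too, one option is to collect every non-empty heading and decide afterwards which one to keep. The question dict shape is taken from the snippet above; heading_texts is a hypothetical helper.

```python
from typing import Any, Dict, List


def heading_texts(question: Dict[str, Any]) -> List[str]:
    """Return every non-empty heading text of a question, not just the first."""
    return [h.get("heading", "") for h in question.get("headings", []) if h.get("heading")]


question = {"id": "q1", "headings": [{"heading": "Project Title"}, {"heading": "Subtitle"}]}
print(heading_texts(question))  # ['Project Title', 'Subtitle']
```

An empty list here plays the role of the existing len(question["headings"]) > 0 guard.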

andrewsutjahjo · 2021-11-03T16:49:41Z

forward43/scraper_masterpeace.py

+        return club_data
+
+    def get_project_data_ids(self, survey_details: dict):
+        """Get ids of the answers to questions on projects."""


I'm not sure what this returns. The dict has the format

{ "question_id" : "information_from_heading" }

The docstring says "Get ids of the answers to questions on projects."
Is the question_id then also the id of the answer? What's the heading?

andrewsutjahjo · 2021-11-03T16:56:06Z

forward43/scraper_masterpeace.py

+        for x in unique_values:
+            ids_per_value = [key for key, value in project_data_ids.items() if value == x]
+            ids_per_value.sort()
+            if len(ids_per_value) > 1:
+                split_project_dict[x] = ids_per_value
+        return split_project_dict


What this does is flip the keys and values in project_data_ids: for every value shared by more than one key, it adds an entry to a new dict where that value is the key and the sorted list of the original keys is the value.
However, if no value is shared by two or more keys, this just returns an empty dict.
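The same grouping can be done in a single pass with collections.defaultdict instead of scanning project_data_ids once per unique value. This is a sketch with a hypothetical function name; like the original, it sorts ids as strings.

```python
from collections import defaultdict
from typing import Dict, List


def split_project_ids(project_data_ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert {question_id: heading} into {heading: sorted question_ids},
    keeping only headings shared by more than one question id."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for question_id, heading in project_data_ids.items():
        grouped[heading].append(question_id)
    return {heading: sorted(ids) for heading, ids in grouped.items() if len(ids) > 1}


ids = {"q3": "Project Title", "q1": "Project Title", "q2": "Budget"}
print(split_project_ids(ids))          # {'Project Title': ['q1', 'q3']}
print(split_project_ids({"q1": "A"}))  # {}
```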

andrewsutjahjo · 2021-11-03T17:02:14Z

forward43/scraper_masterpeace.py

+            project_list.append({
+                'id'              : 'respondent_id'+ str(i),
+                'title'           : self.get_project_data(responses, survey_id, respondent_id, split_project_data_dict, project_number = i, entity = 'Project Title'),
+                'description'     : self.get_project_data(responses, survey_id, respondent_id, split_project_data_dict, project_number = i, entity = 'Describe your project (at least 300 words)<br><br><em>- Context (What is the dilemma that the project is trying to tackle? Why is it important for this neighbourhood/group of people/the country?</em><br><em>- Activities (What did you do?)</em><br><em>- Results (What did you achieve? What did you create, produce, accomplish? Try to include numbers, if possible).</em><br><em>- Impact (What changed in the community? What did you learn yourself or as a team? Did you meet your own expectations)?</em>'),


I'm worried that this is a very long string to use as an entity.
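One possible alternative is to match on a short, stable prefix of the question heading kept in a named constant, rather than passing the full HTML heading around. This assumes the headings are available as a {question_id: heading} dict; DESCRIPTION_PREFIX and find_heading are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical constant: a short prefix that identifies the description question.
DESCRIPTION_PREFIX = "Describe your project"


def find_heading(headings: Dict[str, str], prefix: str) -> Optional[str]:
    """Return the first question id whose heading starts with prefix, or None."""
    for question_id, heading in headings.items():
        if heading.startswith(prefix):
            return question_id
    return None
```

The trade-off is that a prefix can match more than one question, so it should be chosen to be unique within the survey.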

andrewsutjahjo · 2021-11-03T17:05:53Z

forward43/scraper_masterpeace.py

+            except Exception as e:
+                self.logger.exception('Failed to get projects from current page')
+
+            self.write_to_file(projects, str(search_string + respondent_id))


This'll write a separate file per respondent, so each file holds sometimes 1 and sometimes 4 projects. I think we were using a one-file-per-scrape approach, so I'd create a projects list at the start of the scrape() method, projects = [], and then change the try: block to

resp_projects = self.process_response(responses, survey_details, survey_id, respondent_id)
projects.extend(resp_projects)
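A self-contained sketch of that one-file-per-scrape flow. process_response and write_to_file are passed in as plain callables here so the example runs on its own; their real signatures in the scraper differ, and the function layout is an assumption.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def scrape(
    respondent_ids: List[str],
    process_response: Callable[[str], List[Dict[str, Any]]],
    write_to_file: Callable[[List[Dict[str, Any]], str], None],
    search_string: str,
) -> List[Dict[str, Any]]:
    projects: List[Dict[str, Any]] = []  # one list for the whole scrape
    for respondent_id in respondent_ids:
        try:
            projects.extend(process_response(respondent_id))
        except Exception:
            # Log and skip this respondent; projects collected so far are kept.
            logger.exception("Failed to get projects for respondent %s", respondent_id)
    write_to_file(projects, search_string)  # a single write after the loop
    return projects
```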

akashrajkn · 2021-11-05T05:08:57Z

forward43/scraper_masterpeace.py

+            responses = self.client.get_all_pages_response(survey_id)
+            for response in responses:
+                if not response.get("data", []):
+                    raise ValueError(


This will throw an error for the first item in the list that has no data key. Is this the intended behaviour?

If not, you could do something like this:

for response in responses:
    data = response.get('data', [])
    for response_data in data:
        respondent_id = response_data.get("id", "")
        responses_dict[survey_id][respondent_id] = response_data
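A runnable version of that suggestion, wrapped in a hypothetical helper. One detail the snippet leaves implicit is that responses_dict[survey_id] must exist before assigning into it, otherwise Python raises a KeyError; that may be connected to the responses_dict errors mentioned in the PR description, though that is a guess. Each page returned by get_all_pages_response is assumed to be a dict with an optional 'data' list of responses carrying an 'id'.

```python
from typing import Any, Dict, List


def build_responses_dict(survey_id: str, responses: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    responses_dict: Dict[str, Dict[str, Any]] = {}
    # Create the inner dict for this survey once, so the assignment below cannot KeyError.
    survey_responses = responses_dict.setdefault(survey_id, {})
    for response in responses:
        # Pages without a 'data' key are skipped instead of raising.
        for response_data in response.get("data", []):
            respondent_id = response_data.get("id", "")
            survey_responses[respondent_id] = response_data
    return responses_dict
```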

SimonBerend and others added 16 commits May 6, 2021 12:33

Merge pull request #1 from CorrelAidxNL/master

f128329

Updating scrapers before introducing keywords funcionality

keywords in kickstartscraper v1 + list in hparams

ada2c47

first version monkey API/scraper

82da657

Delete scraper_masterpeace.py~Stashed changes

7cee2de

pickup on MEAL work

338262f

working on process_response

c445694

first version monkey API/scraper

826c0f1

Delete scraper_masterpeace.py~Stashed changes

2994422

pickup on MEAL work

c301c13

working on process_response

442d642

merging with master

941972a

remove .idea/ from repo

aa4dd68

Merge branch 'CorrelAidxNL:master' into monkey

1efb602

first version survey monkey scraper

708154e

Delete forward43/notebooks directory

8a8466e

deleted dated survey monkey setup

a5d4d85

SimonBerend requested a review from andrewsutjahjo October 26, 2021 13:22

error fix in monkey scraper

04e5714

SimonBerend requested a review from akashrajkn October 27, 2021 16:29

andrewsutjahjo requested changes Nov 3, 2021


akashrajkn reviewed Nov 5, 2021

