added feature to alumni and follower count #107

Lina-Pawar · 2022-10-16T16:44:50Z

No description provided.

austinoboyle

Thanks for the Pull Request! (And sorry for the slow review, I don't have a ton of time these days to dedicate to this). I've added comments on some things I'd like to see change before submission, with some less important comments marked as optional.

austinoboyle · 2022-10-26T03:00:15Z

scrape_linkedin/CompanyScraper.py

+        if page == "people":
+            interval = 2.0
+        else:
+            interval = 0.1
+
        try:
            self.driver.get(f"{self.url}/{page}")
+            # people/alumni javascript takes more time to load
+            time.sleep(interval)
+


A hard-coded sleep causes unnecessary delays for people with fast internet and can still be too short for those with slow internet. I'd instead suggest using WebDriverWait(self.driver, self.timeout).until(...), which you can see an example of in load_initial.

austinoboyle · 2022-10-26T03:06:54Z

scrape_linkedin/Company.py

+                                 '.org-grid__content-height-enforcer')
+        people = text_or_default(content, 'div > div > div > h2')
+        people = people.replace("employees", "").replace("alumni", "").strip()
+        return people


All of the other properties return dictionaries of key/value pairs, but this returns a single string.

Even if only a single key is currently used, this should return a dictionary for consistency with every other property. It will also more easily allow adding new fields in the future, if appropriate.
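Roughly what I have in mind, as a sketch only. The helper name parse_people_count and the key "people_count" are hypothetical; the string cleanup is the same as in the diff above.

```python
def parse_people_count(heading_text):
    """Turn a heading such as "1,234 employees" into a dictionary.

    Hypothetical helper: returns a dict (like the other properties)
    rather than a bare string, so more keys can be added later.
    """
    count = (
        heading_text.replace("employees", "")
        .replace("alumni", "")
        .strip()
    )
    return {"people_count": count}
```

The property would then return this dictionary, and callers would read people["people_count"] instead of using the string directly.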

austinoboyle · 2022-10-26T03:09:04Z

scrape_linkedin/Company.py

@@ -92,14 +95,20 @@ def overview(self):
        overview["name"] = text_or_default(self.overview_soup, "#main h1")
        overview['description'] = text_or_default(container, 'section > p')

-        logo_image_tag = one_or_default(
-            banner, '.org-top-card-primary-content__logo')
+        banner_desp = text_or_default(banner,


I might be missing something, but what is "desp"? Should this be "desc"?

austinoboyle · 2022-10-26T03:10:58Z

examples/people-to-csv.py

+    for name in my_company_list:
+        sc = scraper.scrape(company=name, people=True)
+        overview = sc.overview
+        overview['company_name'] = name


optional: The overview already has a name field that seems redundant with this. The ID that is being saved here is more typically referred to as an "id" or "slug", which may be more appropriate field names if you need to save this.

austinoboyle · 2022-10-26T03:11:59Z

examples/people-to-csv.py

+with CompanyScraper() as scraper:
+    # Get each company's overview, add to company_data list
+    for name in my_company_list:
+        sc = scraper.scrape(company=name, people=True)


optional: As a naive reader, it would be unclear to me what sc is supposed to be. I would suggest company_info, or similar.

austinoboyle · 2022-10-26T03:22:03Z

scrape_linkedin/CompanyScraper.py

+
+    def scrape(self,
+               company,
+               org_type="company",


org_type is used only to generate a URL, but LinkedIn seems to automatically redirect the URL if you use /company/school-id. For example, https://www.linkedin.com/company/harvard-university works (otherwise, your example in people-to-csv.py would break).

I think this should be removed as an option for now.

added feature to alumni and follower count

f2b113d

austinoboyle requested changes Oct 26, 2022
