Pull requests to update population-city dataset #7
Conversation
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
@judeleonard where are you using it?
@judeleonard you imported BeautifulSoup, but I couldn't find where you are using it.
BeautifulSoup is used here https://github.com/judeleonard/population-city/blob/6f193c14dc59db3af4e6b78971b1202bb1d0fc9b/scripts/scraper.py#L39 as the parser flavour when reading the HTML table with pandas.
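Roughly, the pattern is the following (a minimal sketch, not the exact scraper.py code; the URL is a placeholder, and pandas is assumed to delegate the parsing to BeautifulSoup via flavor="bs4"):

import pandas as pd
from io import StringIO
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")  # run Chrome without opening a window
driver = webdriver.Chrome(options=options)
driver.get("https://data.un.org/")  # placeholder URL, not necessarily the scraper's actual target

# pandas hands the HTML off to BeautifulSoup when flavor="bs4", so bs4 is
# exercised even though BeautifulSoup is never called by name here.
tables = pd.read_html(StringIO(driver.page_source), flavor="bs4")
first_table = tables[0]  # the first table on the page
driver.quit()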
@judeleonard am I correct that you are parsing the HTML table on the page? Do you think that is the right approach versus downloading the data as CSV?
Based on the initial task to update the data, I thought about fetching the first table where the update happened and then using it to update each of the files in the data directory.
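A rough sketch of that update step (the file path, column names, and example row are placeholders, not the repository's real schema):

import pandas as pd

# first_table stands in for the freshly scraped UN table; the columns and the
# single example row are hypothetical placeholders.
first_table = pd.DataFrame(
    {"Country or Area": ["Exampleland"], "City": ["Example City"], "Year": [2023], "Value": [123456]}
)

existing = pd.read_csv("data/population.csv")  # placeholder path
updated = pd.concat([existing, first_table], ignore_index=True)

# Keep only the newest row for each (country, city, year) combination.
updated = updated.drop_duplicates(subset=["Country or Area", "City", "Year"], keep="last")
updated.to_csv("data/population.csv", index=False)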
Hi @anuveyatsu, what is the update on this task? I also forgot to mention that the API endpoint on the UN website for fetching the data was blocked by an SSL certificate issue, so I settled for Selenium for this task. Could you possibly assign me another task? I would be more than happy to pick that up. Thanks!
@judeleonard I'm not sure about "the API endpoint on the UN website for fetching the data was blocked by an SSL certificate issue", could you elaborate on that? We normally wouldn't fetch HTML pages if we can access the API. My initial comment was also about that: why would one parse HTML when there is a cleaner way to fetch the data? Also, this PR's title is not very descriptive to me.
@anuveyatsu After checking through the website, I initially thought I could easily extract the link behind the download button on the page, but that was not the case. So I went through their API documentation to understand how it could be used with the requests library to retrieve the data. Below is a sample of my code that throws an SSL error each time I request the data.

import requests
from io import StringIO
import pandas as pd

url = "https://data.un.org/ws/rest/data/DF_UNData_UNFCC"
headers = {"Accept": "text/csv"}

# The SSLError is raised here, before any response comes back.
response = requests.get(url, headers=headers)

# The request asks for CSV, so parse the body as CSV rather than JSON.
data = pd.read_csv(StringIO(response.text))
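For completeness, a sketch of how the certificate failure might be diagnosed or worked around, assuming it is a verification error on the client side rather than the endpoint being blocked (the CA-bundle file name is a placeholder):

import certifi
import requests

url = "https://data.un.org/ws/rest/data/DF_UNData_UNFCC"
headers = {"Accept": "text/csv"}

# Point requests at the certifi CA bundle explicitly; requests normally uses it by
# default, but a misconfigured environment (e.g. REQUESTS_CA_BUNDLE) can override it.
response = requests.get(url, headers=headers, verify=certifi.where())

# If the server sends an incomplete certificate chain, a bundle that includes the
# missing intermediate certificate can be passed instead ("un-ca-bundle.pem" is a
# placeholder file name, not a real file in this repository).
# response = requests.get(url, headers=headers, verify="un-ca-bundle.pem")

response.raise_for_status()
print(response.text[:500])  # preview the start of the CSV payload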
@anuveyatsu could you please review my PR for issue #6?