Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull requests to update population-city dataset #7

Closed
wants to merge 7 commits into from

Conversation

judeleonard
Copy link

@anuveyatsu could please review my PR for the issue #6

@anuveyatsu anuveyatsu self-requested a review September 15, 2023 15:29
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judeleonard where are you using it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judeleonard you imported BeautifulSoup but I couldn't find where are you using it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@anuveyatsu anuveyatsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judeleonard am I correct that you are parsing the HTML table on the page? Do you think it is the right approach vs downloading the data in CSV?

@judeleonard
Copy link
Author

Based on the initial task to update the data, I thought about fetching first table where the update happened and then use that to update each of the file in the data.

@judeleonard
Copy link
Author

Hi @anuveyatsu please what is the update on this task?

However, I forgot also to mention that the API endpoint to the UN website for fetching the data was blocked due to an SSL certification issue, so I settled for selenium for this task. Could it be possible to assign me another task if that is possible? I would be more than happy to pick that up. Thanks!

@anuveyatsu
Copy link
Member

@judeleonard I'm not sure about "the API endpoint to the UN website for fetching the data was blocked due to an SSL certification issue", could you elaborate on that? We normally wouldn't fetch HTML pages if we can access the API. My initial comment was also about that - why would one parse HTML when there is a clearer way to fetch data?

Also, this PR's title is not very descriptive for me.

@judeleonard judeleonard changed the title first commit Pull requests to update population-city dataset Sep 25, 2023
@judeleonard
Copy link
Author

judeleonard commented Sep 25, 2023

@anuveyatsu So after checking through the website I initially thought I could easily extract the link to the download button on the webpage but that was not the case. So I used their API documentation to understand how it can be used with the request library to retrieve the data. Below is a sample of my code that throws SSL error each time I make a request to get the data.

import requests
from io import StringIO
import pandas as pd

url = "https://data.un.org/ws/rest/data/DF_UNData_UNFCC"
headers = {"Accept": "text/csv"}
response = requests.get(url, headers=headers)
data = response.json()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants