Ex. 0.4.1 #8

Open
Casper17-max opened this issue Jul 16, 2021 · 14 comments

Comments

@Casper17-max

I'm not familiar with the FTP link format: I can't open it, and I can't find a good guide on how to do so.

@joachimkrasmussen
Contributor

Hi Casper,

Maybe this page (link here) will help you with reading CSV files.

To find the right URL, follow the link given in the assignment (as far as I remember, some browsers struggle with it). Right-click the link for the relevant year and copy the URL. That is the URL you need.

Does this make sense?

Best,
Joachim

@Casper17-max
Author

Hi,
I found a guide to opening the FTP link, but I don't know which app to use to download/open '1863.csv.gz' so that I can see the data in a format I'm familiar with, like Excel.
I also found a guide that led me to the following code, but it doesn't seem quite right.
[screenshot of the code]

@joachimkrasmussen
Contributor

Hi Casper,

You only need the relevant URL. In practice, you don't even have to open the file in your browser: just copy the URL and put it where you currently have '1863.csv.gz'. Next, think about the other arguments you pass to .read_csv(). Would the default values sometimes be more appropriate? For instance, how does sep=' ' help you here? Again, the link I sent before can be helpful!
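To see why the separator matters, here is a minimal sketch that parses two made-up rows (hypothetical values, assumed to follow the file's comma-separated layout) from an in-memory string instead of the real URL, so it runs offline:

```python
import io

import pandas as pd

# Two made-up rows in a comma-separated layout (hypothetical values, not real data).
sample = (
    "XX000000001,18630101,TMAX,-86,,,E,\n"
    "XX000000001,18630101,TMIN,-122,,,E,\n"
)

# With sep=' ' there are no spaces to split on, so each whole line lands in one column.
wrong = pd.read_csv(io.StringIO(sample), sep=" ", header=None)

# The default sep=',' splits each line into its comma-separated fields.
right = pd.read_csv(io.StringIO(sample), header=None)

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 8)
```

In other words, dropping sep entirely and relying on the default comma is usually what you want for a .csv file.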

Best,
Joachim

@lassearpe

I found that Internet Explorer 11 can open the FTP link.

Best,
Lasse

@jonasfredslundkofoed

Hi Joachim,

I have the same issue as Lasse, and I'm afraid I cannot use Internet Explorer 11 since I'm on a Mac. I have tried opening the link in Safari, Google Chrome, and Firefox, and every time I'm asked for a username and password that I don't have. What am I missing? I have attached a screenshot of the dialog box asking for the password. When I enter the link in my browser (any of the three), I'm asked whether I want to open the file in Finder.

Best,
Jonas

[Screenshot 2021-07-19 at 19.15.47]

@joachimkrasmussen
Contributor

Thanks for the comment, Lasse. Internet Explorer can indeed be used here.

If you are struggling with access, here is the URL you need to open with pd.read_csv():

url = 'ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/1863.csv.gz'
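The URL follows a simple pattern, so a small helper can build it for other years. This is a sketch that assumes every year in the by_year directory uses the same `<year>.csv.gz` naming as the 1863 file; `ghcn_by_year_url` is a hypothetical name, not part of any library.

```python
def ghcn_by_year_url(year: int) -> str:
    """Hypothetical helper: assumes every year uses the same <year>.csv.gz naming."""
    base = "ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/"
    return f"{base}{year}.csv.gz"


url = ghcn_by_year_url(1863)
print(url)  # ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/1863.csv.gz

# Reading the file needs network access to the FTP server, so it is left commented out:
# import pandas as pd
# df = pd.read_csv(url, header=None)
```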

Are you able to proceed now?

Best,
Joachim

@jonasfredslundkofoed

Yes, I'm able to continue. Thank you :)

@Casper17-max
Author

I still don't know what to do with the last columns; neither the guide nor Google helped much.
[screenshot]

@joachimkrasmussen
Contributor

You should pay attention to two parameters of .read_csv(): compression and header. What is the appropriate argument for each of these? Look at the last two characters of the URL, and look at the possible values for compression here.
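As a rough illustration of both parameters together, this sketch gzip-compresses a couple of made-up rows (hypothetical values, assumed comma-separated layout with no header line) in memory to stand in for the downloaded file, then reads them back:

```python
import gzip
import io

import pandas as pd

# Made-up rows (hypothetical values), gzip-compressed in memory as a stand-in for 1863.csv.gz.
raw = b"XX000000001,18630101,TMAX,-86,,,E,\nXX000000001,18630102,TMAX,-50,,,E,\n"
buffer = io.BytesIO(gzip.compress(raw))

# compression='gzip' matches the .gz ending; header=None because there is no header row.
df = pd.read_csv(buffer, compression="gzip", header=None)
print(df.shape)  # (2, 8)
```

With a file path or URL ending in .gz, the default compression='infer' would normally pick gzip by itself; the explicit value is needed here only because an in-memory buffer has no file name to infer from.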

Best,
Joachim

@lucasaabech

lucasaabech commented Jul 21, 2021

Now I've tried all the different compression types, and I'm not getting anything useful from my dataset. I've tried numerous header settings as well. Nothing really helps.
[screenshot]

@joachimkrasmussen
Contributor

Hi Lucas, you are very close. Try header=None so that the first row of your data is not used as the column names. Then you should be able to proceed to the next exercise, right?

And Casper, sorry that I did not notice earlier, but your last columns are just fine (in the next exercises it will become clear that you should only work with the first four columns). Like Lucas, you just seem to be struggling with the column labels.
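A minimal sketch of what header=None changes, using three made-up rows (hypothetical values, assumed to have no header line). The column names given to the first four columns are illustrative assumptions, not names from the assignment.

```python
import io

import pandas as pd

# Made-up rows (hypothetical values); the data has no header line.
sample = (
    "XX000000001,18630101,TMAX,-86,,,E,\n"
    "XX000000001,18630101,TMIN,-122,,,E,\n"
    "XX000000001,18630102,TMAX,-50,,,E,\n"
)

# Default header=0: the first data row is consumed as the column names.
with_header = pd.read_csv(io.StringIO(sample))

# header=None: every line is data, and the columns get integer labels 0, 1, 2, ...
no_header = pd.read_csv(io.StringIO(sample), header=None)
print(len(with_header), len(no_header))  # 2 3

# Keep only the first four columns and give them descriptive (hypothetical) names.
df = no_header.iloc[:, :4].rename(
    columns={0: "station", 1: "date", 2: "element", 3: "value"}
)
print(df)
```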

Best,
Joachim

@johankll

Hi Joachim,

  1. How do we infer the right compression from the link you posted earlier (this link)? Is it simply the ".gz" at the end of the URL that tells us the file is compressed with gzip?
  2. Is compression='gzip' the correct specification for CSV files in general?
  3. Is the compression statement necessary? I started without one and did not notice any problems.

Thanks in advance.

@joachimkrasmussen
Contributor

Hi Johan,

Let me try and answer all three questions:
1: In general, you can infer the right compression by looking at the file extension.
2 + 3: As the link also mentions, .read_csv() will generally infer the correct compression without you specifying gzip (the default value of the compression argument is 'infer'). If this fails for some reason, take a closer look at your file and find out how it is compressed. If you do not run into any problems without the compression statement, the default setting most likely inferred the compression correctly.
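A small offline check of both points, assuming a local file whose name ends in .csv.gz (the file names and rows are hypothetical). It shows that the default compression='infer' gives the same result as compression='gzip' for a .gz file, and that compression='gzip' describes how the file is compressed rather than the CSV format itself, so it fails on an uncompressed CSV:

```python
import gzip
import os
import tempfile

import pandas as pd

# Made-up rows (hypothetical values) in a comma-separated layout.
raw = b"XX000000001,18630101,TMAX,-86,,,E,\nXX000000001,18630102,TMAX,-50,,,E,\n"

with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.csv.gz")  # hypothetical file name
    with gzip.open(gz_path, "wb") as f:
        f.write(raw)

    # Default compression='infer' sees the .gz extension and decompresses with gzip.
    inferred = pd.read_csv(gz_path, header=None)
    # Naming the compression explicitly gives the same result.
    explicit = pd.read_csv(gz_path, header=None, compression="gzip")

    # On an uncompressed CSV, compression='gzip' raises instead of parsing.
    plain_path = os.path.join(tmp, "sample.csv")
    with open(plain_path, "wb") as f:
        f.write(raw)
    gzip_on_plain_failed = False
    try:
        pd.read_csv(plain_path, header=None, compression="gzip")
    except Exception:  # typically gzip.BadGzipFile, an OSError subclass
        gzip_on_plain_failed = True

print(inferred.equals(explicit))  # True
print(gzip_on_plain_failed)  # True
```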

Was this helpful?

Best,
Joachim

@johankll

Hi Joachim,

Indeed. Thank you!

6 participants