-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
need timezone specification on import or read timestamps as string not number #347
Comments
I'll take this in a few pieces. Yes readxl brings everything in as being UTC, because Excel has no notion of timezone. It is harder than meets the eye to bring date times in as text (which is #118). The text you see is ephemeral -- created just for your eyeballs. On disk, the cell just is a number and a reference to a date format. I'd like to offer this but haven't gotten to it yet. I think you should import these as proper dates and then change the timezone. This StackOverflow thread seems to offer both base and lubridate solutions for exactly this problem: http://stackoverflow.com/questions/14141537/change-timezone-in-a-posixct-object I'm not eager to add timezone to the Let me know how it works out. I'd love some example code re: post hoc timezone switching. I'd add it to the docs somewhere. |
@jennybc Thanks for quick reply. Your rationale makes sense. Safest solution seems to be to coerce dttm column to string using Here's an example using the attached file Book3.xlsx Book3.xlsx contains a single row in a single column with foo <- readxl::read_excel("C:/Users/cholbrook/Desktop/Book3.xlsx", sheet = 1) note that tzone = UTC (default read_excel behavior)
which is not correct, should look like this should_be <- as.POSIXct("2015-05-05 12:00:02", tz="US/Eastern")
safest method seems to be coerce to text (from POSIXct in UTC) and then back to POSIXct (with correct time zone) new <- as.POSIXct(format(foo$timestamp_us_eastern), tz="US/Eastern")
|
@jennybc Note also that changing the tzone attribute (as shown in http://stackoverflow.com/questions/14141537/change-timezone-in-a-posixct-object) is not a solution to this problem because the tzone attribute does not change the internal value of the POSIXct object; only how it is displayed. For example:
|
Thanks for closing the loop! Yeah, I wondered about the attribute adjustment, as what I saw in that SO thread didn't seem to be exactly what you needed. I'll find a logical place to record your workaround for other people who have this same problem. |
Hey @chrisholbrook, Here's an even better solution to your problem: ## make three identical POSIXct objects
dttm_raw <- dttm_attr <- dttm_force <-
as.POSIXct("2015-05-05 12:00:02", tz="UTC")
dttm_raw
#> [1] "2015-05-05 12:00:02 UTC"
## editing the attribute preserves moment of time, but not clock time
attr(dttm_attr, "tzone") <- "US/Eastern"
dttm_attr
#> [1] "2015-05-05 08:00:02 EDT"
## this doesn't actually change the moment of time
## use lubridate::force_tz() to preserve clock time and modify
## moment of time
dttm_force <- lubridate::force_tz(dttm_force, "US/Eastern")
dttm_force
#> [1] "2015-05-05 12:00:02 EDT"
## take a look under the hood at moment of time
as.numeric(dttm_raw)
#> [1] 1430827202
as.numeric(dttm_attr) # same as dttm_raw
#> [1] 1430827202
as.numeric(dttm_force) # offset
#> [1] 1430841602 |
@jennybc, excellent. yes, somehow I'd missed |
+1 for adding the timezone functionality. Would it be easier to have it not be an arg and just set it to system timezone? |
I think for now this is best left as a well-documented, after-import transformation. If anyone on the thread is game to make a PR re: how to document, that would be great. When/if readxl gets column-typing more consistent with readr, then it would make sense to have a way to set the timezone. |
Unless I've missed something, it seems all timestamps are assumed to be in UTC in read_excel.
Absent of option to specify time zone for each dttm column, I figured I could import as text and then coerce using proper time zone specification in as.POSIXct. Unfortunately my timestamps come in as a number formatted as text when I specify
col_types = "text"
in read_excel. An example xlsx file is attached and some code below showing the issue. (using readxl v1.0.0)Book2.xlsx
it seems read_excel assumes all dttm input in UTC... is this correct? Any chance column-specific timezone argument could be added?
try reading as text and then coerce with as.POSIXct, but data come in as number formatted as text, rather than string... can cobble together conversion, but it's not intuitive.
I thought perhaps I could convert from number by supplying an origin, but that hasn't panned out...
The text was updated successfully, but these errors were encountered: