Allow user to specify col type is just date or time (vs full datetime) #504
Comments
As I've been thinking about this more, it may be simpler to use a heuristic like the following:
This is tied up with other issue clusters around column typing, col spec, formats.
I know what you're getting at, but this isn't technically true. Excel stores all of this as a floating-point serial date-time. Full stop. The formats only control what is presented to the user visually. You can switch between all the formats listed above and it does not change the numeric value stored for a cell.

But yes, it would be nice for a user to be able to specify that they expect a date with no time, for example. We can't guess this automatically because doing so would actually throw away data. In my experience, lots of people are deeply confused about their cell formats and are not managing them intentionally, so guessing col type based on format will cause new problems.
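To make the "one stored number, many presentations" point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not readxl code). It assumes the workbook uses the 1900 date system and a serial of 61 or higher, so Excel's fictitious 1900-02-29 does not matter. The name `serial_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system. With this epoch, serials >= 61 map to the
# dates Excel displays; smaller serials are affected by Excel treating 1900
# as a leap year and are not handled here.
EPOCH_1900 = datetime(1899, 12, 30)


def serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial date-time (days since the epoch) to a datetime."""
    return EPOCH_1900 + timedelta(days=serial)


serial = 43831.75                  # the value stored in the cell
full = serial_to_datetime(serial)  # what a date-time format would show
date_only = full.date()            # what a date-only format would show
time_only = full.time()            # what a time-only format would show

print(full, date_only, time_only)
```

The stored value never changes across the three views. Keeping only `date_only` discards the 0.75 day fraction, which is the data loss that automatic guessing would risk.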
@jennybc, I know that Excel stores everything as a floating-point serial number, but the file specification provides an indication of intent via the format (as defined on page 1777, PDF page 1787, of the standard).
My experience is the same as yours, but what I see is that most people expect that what is read into R (or, more exactly, what they send me to read into R) is what they see in Excel. If they see a floating-point value represented by number format 18, 19, 20, 21, 45, 46, or maybe 47, they think that I will receive something that is a time without a date. A few things that occur to me for brainstorming ways to handle formats:
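As a rough illustration of reading intent from the built-in number format IDs mentioned above, the following Python sketch classifies an ID as date-only, time-only, or date-time. The time-only IDs come from the comment above. The date-only (14 to 17) and date-time (22) IDs are my reading of the built-in numFmt table in the standard and should be checked against it. The function name and the returned labels are hypothetical, and custom or locale-specific formats are not covered.

```python
# Built-in numFmt IDs, grouped by what the format displays.
DATE_ONLY_IDS = {14, 15, 16, 17}             # e.g. 14 is mm-dd-yy
TIME_ONLY_IDS = {18, 19, 20, 21, 45, 46, 47}  # e.g. 20 is h:mm; 47 is the "maybe" case
DATETIME_IDS = {22}                           # m/d/yy h:mm


def classify_builtin_numfmt(num_fmt_id: int) -> str:
    """Return 'date', 'time', 'datetime', or 'other' for a built-in numFmt ID."""
    if num_fmt_id in DATE_ONLY_IDS:
        return "date"
    if num_fmt_id in TIME_ONLY_IDS:
        return "time"
    if num_fmt_id in DATETIME_IDS:
        return "datetime"
    return "other"


for fmt_id in (14, 20, 22, 2):
    print(fmt_id, classify_builtin_numfmt(fmt_id))
```

A classification like this only captures what the format displays. Whether it should drive the column type automatically, or merely inform a user-supplied column specification, is the open question in this thread.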
I'm sorry for necroposting, but I wanted to add that currently,
readxl/src/ColSpec.h
Lines 129 to 134 in 5fbe997
In the above lines of code, the numFmt values, in most cases, specify dates without times. For example:
All of the above, and their equivalents in other languages, should be returned to the R user with the correct date precision.
Formats that do not include times should be returned as a class Date object, while formats that do include times should be returned as a class POSIXct object. Returning everything as a POSIXct object gives an inaccurate picture of the precision. Returning more than what is visually presented (especially the time-only formats showing as being on the day 1899-12-31) misrepresents the available data.
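The requested behavior can be sketched as follows, in Python for illustration only (the R analogue would return Date or POSIXct). The function `convert_cell` and its `kind` argument are hypothetical, and `kind` stands in for a user-supplied column type. The sketch assumes the 1900 date system and ignores Excel's 1900 leap-year quirk.

```python
from datetime import date, datetime, time, timedelta
from typing import Union

# Assumption: 1900 date system; serials below 61 would be off by one day
# for the date part because Excel treats 1900 as a leap year.
EPOCH_1900 = datetime(1899, 12, 30)


def convert_cell(serial: float, kind: str) -> Union[date, time, datetime]:
    """Return a value whose type matches the precision the user asked for."""
    moment = EPOCH_1900 + timedelta(days=serial)
    if kind == "date":
        return moment.date()   # drops the time of day
    if kind == "time":
        return moment.time()   # drops the (meaningless) date part
    if kind == "datetime":
        return moment
    raise ValueError(f"unknown kind: {kind!r}")


print(convert_cell(43831.75, "date"))
print(convert_cell(0.5, "time"))
print(convert_cell(43831.75, "datetime"))
```

Because `kind` is supplied explicitly, any loss of precision is a choice made by the user rather than a guess made from the cell format.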
The representation of more data than are provided is also related to tidyverse/lubridate#690.