
feature request: use column names in col_types #415

Closed
RobertMyles opened this issue Dec 22, 2017 · 3 comments
Labels
feature a feature request or enhancement

Comments

@RobertMyles

Would it be possible to use column names with col_types in read_excel()? I have some Excel spreadsheets that have many columns, and that need to be formatted with col_types because of some badly formatted cells. It gets to a point where it's hard to keep track of what column is what type! For example, I have this snippet of code:

```r
library(readxl)

obras <- read_excel("CONTROLE DE DOCUMENTOS 2017.xlsx", sheet = "DATAS - ETAPAS",
                    col_types = c("text", "text", "text", "text", "text",
                                  "text", "text", "text", "text", "text", "text",
                                  "text", "text", "text", "text", "text",
                                  "text", "text", "text", "text", "text",
                                  "text", "date", "date", "date", "date",
                                  "date", "date", "date", "date", "date",
                                  "date", "date", "date", "date", "date",
                                  "date", "date", "date", "date", "numeric",
                                  "date", "numeric", "text"),
                    na = "----------")
```
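As a stopgap with the current positional interface, the same vector can be written as runs with base R's rep(), so each block of identically typed columns is counted once instead of entry by entry. This is a sketch: the name obras_types is hypothetical, and the column ranges were counted from the snippet above (44 columns in total).

```r
# Hypothetical rewrite of the positional col_types vector above as runs.
# Column ranges were counted from the original 44-entry vector.
obras_types <- c(
  rep("text", 22),  # columns 1-22
  rep("date", 18),  # columns 23-40
  "numeric",        # column 41
  "date",           # column 42
  "numeric",        # column 43
  "text"            # column 44
)
```

It would then be passed as col_types = obras_types. It is still positional, though, so inserting a column in the sheet shifts every later type.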

It would be great to be able to use something like col_types = c("Column1" = "text", "Column2" = "date"), where Column1 is the actual name of the first column.
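Until something like that exists, a named vector can be translated into the positional form that read_excel() currently accepts. Below is a minimal sketch: named_col_types() is a hypothetical helper, not part of readxl, and the commented usage assumes that a header-only read with n_max = 0 returns the sheet's column names.

```r
# Hypothetical helper (not part of readxl): expand a named vector of types
# into the positional col_types vector read_excel() expects.
# Columns that are not mentioned get `default`; "guess" lets readxl decide.
named_col_types <- function(col_names, types, default = "guess") {
  unknown <- setdiff(names(types), col_names)
  if (length(unknown) > 0) {
    stop("Unknown column name(s): ", paste(unknown, collapse = ", "))
  }
  out <- rep(default, length(col_names))
  names(out) <- col_names
  out[names(types)] <- types  # matched by name, so the order of `types` is irrelevant
  unname(out)
}

# Usage sketch (assumes the header row holds the column names):
# hdr <- names(readxl::read_excel(path, sheet = sheet, n_max = 0))
# readxl::read_excel(path, sheet = sheet,
#   col_types = named_col_types(hdr, c(Column1 = "text", Column2 = "date")))
```

The unknown-name check guards against typos, which would otherwise be silently ignored as new, unmatched entries.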

@RobertMyles changed the title "feature request: use column names in col_types" to "feature request: use column names in col_types" on Dec 22, 2017
@jennybc
Member

jennybc commented Dec 22, 2017

I mostly consider this a subset of "Add column specification as in readr" #198. It would certainly be implied if readxl got a readr-like col spec. I'll leave this open for now, in case it's easy to implement this specific feature on a shorter timeframe.
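For comparison, readr's column specification already attaches types to column names through cols(), with .default covering every column that is not listed. Below is a minimal sketch, assuming readr 2.0 or later (where literal data wrapped in I() is accepted). The column names and data are made up for illustration.

```r
library(readr)

# Types are keyed by column name; .default applies to all other columns.
spec <- cols(
  id       = col_character(),
  started  = col_date(format = "%Y-%m-%d"),
  .default = col_guess()
)

# Inline CSV, used here only for illustration.
df <- read_csv(I("id,started,amount\n001,2017-12-22,3.5\n"), col_types = spec)
```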

@jennybc added the "feature" label (a feature request or enhancement) on Apr 15, 2018
@jennybc
Member

jennybc commented Apr 15, 2018

Improved col spec handling, here and in readr, is on the horizon now. So I seriously doubt I will extend readxl's current col_names / col_types system. Therefore, folding this into #198.

@jennybc closed this as completed on Apr 15, 2018
@apsalverda

For the time being, while readxl lacks improved column specification handling, it would be extremely useful if its parsing functions could output the column formats it applies when a file is parsed, much like read_csv does, for instance:

```
read_excel("my_excel_file.xls")
(...)
Parsed with column specification:
col_types = c(
  "text",
  "text"
)
```

This would allow the user to:

  1. Parse a file
  2. Check the resulting tibble and note which columns are parsed incorrectly
  3. Copy the "col_types" output by the parsing function
  4. Update the format for only those columns parsed incorrectly
  5. Parse the file again, using the col_types specification from step 4 as an argument
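Steps 1 to 3 can be approximated today by mapping the parsed tibble's column classes back to readxl's type names. Below is a minimal sketch: print_col_types() is a hypothetical helper, not a readxl function. It reports the types of the parsed result, not readxl's internal guessing. It also annotates each entry with its column name, which addresses the bookkeeping problem raised in the original request.

```r
# Hypothetical helper (not part of readxl): map each column's R class to a
# readxl col_types value and print a copy-pasteable col_types = c(...) block.
print_col_types <- function(df) {
  types <- vapply(df, function(x) {
    if (inherits(x, c("POSIXct", "Date"))) "date"
    else if (is.logical(x)) "logical"
    else if (is.numeric(x)) "numeric"
    else if (is.list(x)) "list"
    else "text"
  }, character(1))
  # Trailing comma on every entry except the last, column name as a comment.
  sep <- ifelse(seq_along(types) < length(types), ",", "")
  cat("col_types = c(\n")
  cat(paste0('  "', types, '"', sep, "  # ", names(types)), sep = "\n")
  cat("\n)\n")
  invisible(unname(types))
}

# Usage sketch: df <- readxl::read_excel("my_excel_file.xls"); print_col_types(df)
```

The printed block can be pasted into the next read_excel() call, with only the misparsed entries edited (steps 4 and 5).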

I'm working with an Excel file with 58 columns and second @RobertMyles's comment that it's quite challenging to put together and manage the appropriate col_types specification, given how this is currently handled in readxl, where I have to build from scratch a long vector listing each column's type.
