feature request: use column names in col_types #415

RobertMyles · 2017-12-22T12:01:01Z

Would it be possible to use column names with col_types in read_excel()? I have some Excel spreadsheets with many columns that need explicit col_types because of some badly formatted cells. It gets to the point where it's hard to keep track of which column has which type! For example, I have this snippet of code:

obras <- read_excel("CONTROLE DE DOCUMENTOS 2017.xlsx", sheet = "DATAS - ETAPAS",
                 col_types = c("text", "text", "text", "text", "text", 
                               "text", "text", "text", "text", "text", "text", 
                               "text", "text", "text", "text", "text", 
                               "text", "text", "text", "text", "text",
                               "text", "date", "date", "date", "date",
                               "date", "date", "date", "date", "date",
                               "date", "date", "date", "date", "date", 
                               "date", "date", "date", "date", "numeric", 
                               "date", "numeric", "text"), na = "----------")

It would be great to be able to use something like col_types = c("Column1" = "text", "Column2" = "date"), where Column1 is the actual name of the first column.
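Until something like this exists, a named spec can be approximated by expanding it into the positional vector that col_types expects today. The following is only a sketch: named_col_types() is a hypothetical helper, not part of readxl, and it assumes the header can be read separately, for instance with names(read_excel(path, n_max = 0)).

```r
# Hypothetical helper (not part of readxl): expand a named spec into the
# positional col_types vector that read_excel() currently expects.
named_col_types <- function(col_names, spec, default = "guess") {
  # Fail early if the spec mentions a column that is not in the sheet
  unknown <- setdiff(names(spec), col_names)
  if (length(unknown) > 0) {
    stop("Columns not found in sheet: ", paste(unknown, collapse = ", "))
  }
  # Every column starts at the default; named columns are overridden
  types <- rep(default, length(col_names))
  types[match(names(spec), col_names)] <- unname(spec)
  types
}

# Stand-in for names(readxl::read_excel(path, n_max = 0))
header <- c("Column1", "Column2", "Column3")

types <- named_col_types(header, c(Column1 = "text", Column2 = "date"))
print(types)
```

The resulting vector could then be passed as col_types in a second read_excel() call on the same sheet, so only the problem columns have to be spelled out by name.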


jennybc · 2017-12-22T19:22:17Z

I mostly consider this a subset of "Add column specification as in readr" #198. It would certainly be implied if readxl got a readr-like col spec. I'll leave this open for now, in case it's easy to implement this specific feature on a shorter timeframe.

jennybc · 2018-04-15T02:35:58Z

Improved col spec handling, here and in readr, is on the horizon now. So I seriously doubt I will extend readxl's current col_names / col_types system. Therefore, folding this into #198.

apsalverda · 2019-10-09T14:48:43Z

For the time being, while readxl lacks improved column specification handling, it would be extremely useful if its parsing functions could output the column formats it applies when a file is parsed, much like read_csv does, for instance:

read_excel("my_excel_file.xls")
(...)
Parsed with column specification:
col_types = c(
"text",
"text"
)

This would allow the user to:

1. Parse a file
2. Check the resulting tibble and note which columns are parsed incorrectly
3. Copy the "col_types" output by the parsing function
4. Update the format for only those columns parsed incorrectly
5. Parse the file again, using the col_types specification from step 4 as an argument
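In the meantime, this workflow can be roughly approximated by deriving the types from the tibble returned by a first parse. This is a sketch under assumptions: infer_col_types() is a hypothetical helper, not something readxl provides, and the data frame below stands in for the result of a first read_excel() call.

```r
# Hypothetical helper (not part of readxl): map the classes of an already
# parsed data frame back to readxl col_types strings.
infer_col_types <- function(df) {
  vapply(df, function(col) {
    if (inherits(col, c("POSIXct", "Date"))) {
      "date"
    } else if (is.logical(col)) {
      "logical"
    } else if (is.numeric(col)) {
      "numeric"
    } else {
      "text"
    }
  }, character(1))
}

# Stand-in for a first pass such as readxl::read_excel("my_excel_file.xls")
first_pass <- data.frame(
  id = c("a", "b"),
  amount = c(1.5, 2),
  when = as.POSIXct(c("2019-10-09", "2019-10-10"), tz = "UTC"),
  stringsAsFactors = FALSE
)

types <- infer_col_types(first_pass)
types[["amount"]] <- "text"  # change only the column that was parsed incorrectly
dput(unname(types))          # printed vector can be pasted into col_types
```

Because the inferred vector is named, a single column can be corrected by name before the names are dropped for the second parse.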

I'm working with an Excel file with 58 columns, and I second @RobertMyles's comment that it's quite challenging to put together and manage the appropriate col_types specification, given that readxl currently requires me to write out from scratch a long vector with each column's format.

RobertMyles changed the title ~~feature request: use column names in col_types~~ feature request: use column names in col_types Dec 22, 2017

jennybc added the feature a feature request or enhancement label Apr 15, 2018

jennybc closed this as completed Apr 15, 2018
