Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have to manually open certain xls files in Excel with RStudio running in order for read_excel to work #618

Closed
SteveDeitz opened this issue Jun 3, 2020 · 4 comments
Labels
reprex needs a minimal reproducible example xls 👵

Comments

@SteveDeitz
Copy link

Hello, I've encountered an issue with read_excel function with certain xls files I received. I need to open RStudio, then separately open and close each xls file in Excel before read_excel will function properly. If I do not, it creates an object but with 0 obs 8 x 0 columns.

I don't know how these files were originally created and if they had any special properties but the cause may be because they were shared via the accellion secure file transfer platform (https://www.accellion.com/platform/simple/secure-file-sharing/) Any ideas as to what could be causing this, and why the opening of the file in Excel--even if I don't make any edits or change filetype--resolves it?

Read excel is working fine for me otherwise, including on other xls (MS Excel 1997-2003) files even after I perform multiple steps of downloading, compressing, emailing, extracting, etc.

Unfortunately I can't share the files, so I don't have much of a reproducible example. Below are some comments on what happens.

> library(readxl)
> #First attempt to read .xls file ("Microsoft Excel 97-2003 Worksheet")
> a<-read_excel("state49.xls")
> str(a)
tibble [0 x 0] (S3: tbl_df/tbl/data.frame)
Named list()
> #Open file state49.xls manually in Excel then close it without editing or resaving. Now try second attempt:
> b<-read_excel("state49.xls")
> str(b)
tibble [703 x 18] (S3: tbl_df/tbl/data.frame)
[Followed by info on the 18 columns703 x 18 is correct]
@jennybc
Copy link
Member

jennybc commented Jun 3, 2020

if they had any special properties but the cause may be because they were shared via the accellion secure file transfer platform (https://www.accellion.com/platform/simple/secure-file-sharing/) Any ideas as to what could be causing this, and why the opening of the file in Excel--even if I don't make any edits or change filetype--resolves it?

I suspect there is something peculiar about the files and that, despite your lack of explicit effort, something is getting changed by the opening. Can you set something up where you do a scrupulous check of file info re: created time, last modified time, etc?

There are many examples in the issues over the years of especially weird third-party-created files being openable by Excel itself and, after Excel has done some mysterious normalization, they become readable by readxl (which really means libxls, in this case) and other spreadsheet-reading tools.

@jennybc jennybc added reprex needs a minimal reproducible example xls 👵 labels Jun 3, 2020
@SteveDeitz
Copy link
Author

I've found two unusual properties for these files that I'm not seeing on my machine for other xls or xlsx files:

  1. Just opening the file will cause some hidden change that updates the Date modified time
  2. Perhaps more important: after opening, when clicking “Save as…”, the default filetype in the dropdown is “Microsoft Excel 5.0/95 Workbook (*.xls)”. Is it possible that a file says it's Microsoft Excel 97-2003 Worksheet (.xls) in Windows Explorer and right click>properties but is actually an even older format?

@jennybc
Copy link
Member

jennybc commented Jun 3, 2020

Just for context (from http://www.gaia-gis.it/gaia-sins/freexl-1.0.5-doxy-doc/html/Format.html)

Screen Shot 2020-06-03 at 9 25 08 AM

Just opening the file will cause some hidden change that updates the Date modified time

This is what I expected.

Is it possible that a file says it's Microsoft Excel 97-2003 Worksheet (.xls) in Windows Explorer and right click>properties but is actually an even older format?

Hard to say if it's truly an even older format (explicitly, with intention) or if whoever wrote the .xls was targeting a more modern format but implemented it in such a weird way that it messes with the version detection.

No matter what, I think we can conclude that this problem is sadly beyond the reach of readxl and libxls, i.e. both would declare this "wontfix". I think you'll have to create some sort of workaround on your end.

@jennybc jennybc closed this as completed Jun 3, 2020
@hrdawson
Copy link

Leaving a comment here for other frustrated gas analysis scientists who're trying to read their LICOR IRGA data into R (and other people using .xls files generated by devices built in the late 1990s/early 2000s):

This appears to be an issue with LICOR files generated with a LI-6400 XT running OPEN 6.1.4. When I try to read my .xls data into R as downloaded from the LICOR using readxl::read_excel, R returns a libxls error: Unable to open file error. If I open the file in Excel and save it as .xls (Excel 97-2004 Workbook), readxl::read_excel reads the new file no problem.

No workaround as of yet, although I note that base::readLines accesses the original file just fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reprex needs a minimal reproducible example xls 👵
Projects
None yet
Development

No branches or pull requests

3 participants