Remove dependency on pandas #44

Open
timcu opened this issue Jul 23, 2019 · 4 comments

Comments

@timcu
Owner

timcu commented Jul 23, 2019

Currently pandas is used purely to read the Excel file, and pandas uses xlrd to do all the work anyway. Installing pandas on some computers is difficult (e.g. a Raspberry Pi running Raspbian), so if we could remove the requirement for pandas, the software could be deployed in more places and would need fewer resources. The xlrd documentation says it is best for .xls files, while openpyxl is best for .xlsx files. Unfortunately, the source data file can currently be in either format, so we may need to use both libraries until the government stops using .xls files.
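
If both libraries do end up being needed, one possible approach is to pick the reader from the first bytes of the file rather than from its name. The sketch below is only an illustration, not code from this project: `detect_excel_format` and `first_sheet_rows` are hypothetical names, and it assumes xlrd and openpyxl are installed when the matching branch runs.

```python
import io

# Leading bytes of the two container formats.
XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (.xls)
XLSX_MAGIC = b"PK\x03\x04"                     # ZIP archive (.xlsx is a ZIP)


def detect_excel_format(data):
    """Return "xls" or "xlsx" based on the leading bytes of the file."""
    if data.startswith(XLS_MAGIC):
        return "xls"
    if data.startswith(XLSX_MAGIC):
        # Any ZIP archive matches here; openpyxl will reject non-workbooks.
        return "xlsx"
    raise ValueError("data does not look like an Excel workbook")


def first_sheet_rows(data):
    """Return the rows of the first sheet as a list of lists."""
    if detect_excel_format(data) == "xls":
        import xlrd  # imported lazily so only the needed library is required
        sheet = xlrd.open_workbook(file_contents=data).sheet_by_index(0)
        return [sheet.row_values(i) for i in range(sheet.nrows)]
    import openpyxl
    book = openpyxl.load_workbook(io.BytesIO(data), read_only=True, data_only=True)
    return [list(row) for row in book.worksheets[0].iter_rows(values_only=True)]
```

Checking the content instead of the extension means a mislabelled download still goes to the right library, at the cost of reading the whole file into memory first.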

@anniequasar
Contributor

So remove pandas and replace it with xlrd and openpyxl for now?

@timcu
Owner Author

timcu commented Aug 4, 2019

We should be able to use just xlrd. I don't think openpyxl can read .xls files, but xlrd can read both .xls and .xlsx files.

@timcu
Owner Author

timcu commented Aug 6, 2019

I have replaced pandas with xlrd in the tests.

@timcu
Owner Author

timcu commented Aug 6, 2019

To replace pandas in datasheet.py, you will need to load the workbook from an HTTP response rather than from a file. xlrd doesn't do that out of the box, but you can combine it with urllib to do the following:

book = xlrd.open_workbook(file_contents=urlopen(find_xl_url()).read())
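
A slightly fuller version of the one-liner above might look like the sketch below. It is a guess at how this could be wired up, not the project's actual code: `fetch_bytes` and `workbook_from_url` are hypothetical names, the 30-second timeout is arbitrary, and the `opener` and `parser` parameters exist only so the function can be exercised without network access. It also assumes an xlrd version that can read the downloaded format (xlrd 2.0 and later read only .xls).

```python
from urllib.request import urlopen


def fetch_bytes(url, opener=urlopen, timeout=30):
    """Download url and return the response body as bytes."""
    # The response is a context manager, so the connection is closed for us.
    with opener(url, timeout=timeout) as response:
        return response.read()


def workbook_from_url(url, opener=urlopen, parser=None):
    """Open a workbook straight from an HTTP response, without a temp file."""
    if parser is None:
        import xlrd  # imported lazily; only needed when no parser is supplied
        parser = xlrd.open_workbook
    # xlrd accepts the raw bytes through the file_contents keyword.
    return parser(file_contents=fetch_bytes(url, opener=opener))
```

With the `find_xl_url()` helper from the line above, usage would be `book = workbook_from_url(find_xl_url())`.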
