Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to retain special symbols/ errors in excel as a string #229

Open
durgeksh opened this issue Apr 8, 2024 · 3 comments
Open

Option to retain special symbols/ errors in excel as a string #229

durgeksh opened this issue Apr 8, 2024 · 3 comments
Labels
🦀 rust 🦀 Pull requests that edit Rust code feature request

Comments

@durgeksh
Copy link

durgeksh commented Apr 8, 2024

Hi team,
in excel, we have special symbols like #VALUE!, #DIV/0! etc. As of now, these symbols are turned into null. Can we have an option to retain them as it is as a string?

Thank you for the wonderful and performant wrapper.

@lukapeschke lukapeschke added 🦀 rust 🦀 Pull requests that edit Rust code feature request labels Apr 8, 2024
@PrettyWood
Copy link
Member

#220 (comment) for reference

@PrettyWood PrettyWood added this to the v0.12.0 milestone Jul 1, 2024
@PrettyWood
Copy link
Member

Hello @durgeksh
I pushed #293 to showcase the current behavior, which seems ok to me.
If it's not can you please share a file and explain the current behavior and the desired one?

@skytwosea
Copy link

Hello @PrettyWood , I've gone over the tests pushed in #293 and I'm unable to replicate the desired behaviour. I can't find a way to read excel error codes as strings - they are always transcribed as null.
My environment:

Linux kernel 6.8.0 on x86-64
Ubuntu 24.04.1 LGS

In the following tables, the column 'from_m' has 5 broken formulas, so the error code for each is #REF!. The column litho_unit is empty, so its null values are expected and desired.
Herein lies the problem: if I'm unable to display and query for excel error codes, then I can't differentiate true null values from those nulls that are hiding broken excel functionality. Information is being lost.
Note that all columns in the source excel sheet are nominally 'text' type; that is for workflow reasons beyond my control. Regardless, the source column type should not impact the display of Excel error codes: in this case, the columns should be brought in as strings, and the five cells with broken formulas should display #REF! in the dataframe.
Apologies for the long format, but I want to show my 'tests' in full:

$ python --version                                                                                                                                                                                                                                            [127:NOTFND]|ARIS-Scripts|off|.aris|ref_error_handling
Python 3.12.6
$ uv pip show polars fastexcel                                                                                                                                                                                                                                         [0]|ARIS-Scripts|off|.aris|ref_error_handling
Using Python 3.12.6 environment at .aris
Name: fastexcel
Version: 0.12.0
Location: /path/to/venv
Requires: pyarrow
Required-by:
---
Name: polars
Version: 1.14.0
Location: /path/to/venv
Requires:
Required-by:
$ python                                                                                                                                                                                                                                                               [0]|ARIS-Scripts|off|.aris|ref_error_handling
Python 3.12.6 (main, Sep  9 2024, 22:11:19) [Clang 18.1.8 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars as pl
>>> partial_path = "/path/to/30061_test_broken_formula_partial_column_litho.xlsx"
>>> df = pl.read_excel(partial_path, read_options={'schema_sample_rows':30,})
>>> with pl.Config(set_tbl_rows=30): print(df)
shape: (29, 5)
┌───────────┬────────┬───────┬────────────┬─────────────────────────────────┐
│ hole_name ┆ from_m ┆ to_m  ┆ litho_unit ┆ comments                        │
│ ---       ┆ ---    ┆ ---   ┆ ---        ┆ ---                             │
│ str       ┆ str    ┆ str   ┆ null       ┆ str                             │
╞═══════════╪════════╪═══════╪════════════╪═════════════════════════════════╡
│ S07-01    ┆ 0.0    ┆ 2.2   ┆ null       ┆ overburden                      │
│ S07-01    ┆ 2.2    ┆ 150.0 ┆ null       ┆ Dacitic Ash Flow Tuff (TS)      │
│ S07-01    ┆ 2.2    ┆ 3.0   ┆ null       ┆ Csg. Fresh Dacitic Ash Flow Tu… │
│ S07-01    ┆ 3.0    ┆ 4.0   ┆ null       ┆ Fresh daft.  Bockv fracturino   │
│ S07-01    ┆ 4.0    ┆ 4.15  ┆ null       ┆ 15cm of coarse 3.0-5.0mm feld … │
│ S07-01    ┆ 5.3    ┆ 6.0   ┆ null       ┆ around core - no record         │
│ S07-01    ┆ 6.55   ┆ 6.95  ┆ null       ┆ Silty sand- type material reco… │
│ S07-01    ┆ 7.0    ┆ 7.5   ┆ null       ┆ Bleached, altered dacite, no s… │
│ S07-01    ┆ 8.0    ┆ 9.0   ┆ null       ┆ fresh blocky fracture TS        │
│ S07-01    ┆ 9.9    ┆ 10.0  ┆ null       ┆ Mod. Clav altered TS one thin … │
│ S07-01    ┆ null   ┆ 11.0  ┆ null       ┆ Mod clay altered TS  Blocky br… │
│ S07-01    ┆ null   ┆ 12.0  ┆ null       ┆ very weekly altered TS broken … │
│ S07-01    ┆ null   ┆ 13.8  ┆ null       ┆ 12.0-13.8mostly gr. And lost g… │
│ S07-01    ┆ null   ┆ 14.7  ┆ null       ┆ Mod clay altered zone with qua… │
│ S07-01    ┆ null   ┆ 16.0  ┆ null       ┆ wk-mod clay altered TS          │
│ S07-01    ┆ 16.0   ┆ 17.0  ┆ null       ┆ wk clay altered TS              │
│ S07-01    ┆ 18.2   ┆ 18.3  ┆ null       ┆ Cal healino bx TX at  20-30° C… │
│ S07-01    ┆ 19.1   ┆ 19.95 ┆ null       ┆ Very intensely clay altered TS… │
│ S07-01    ┆ 20.0   ┆ 21.0  ┆ null       ┆ fresh to very weakly clay alte… │
│ S07-01    ┆ 21.1   ┆ 21.95 ┆ null       ┆ Highly broken crumbly core, mo… │
│ S07-01    ┆ 22.0   ┆ 23.0  ┆ null       ┆ Fresh to  v.  weak altered TS   │
│ S07-01    ┆ 23.0   ┆ 24.0  ┆ null       ┆ verv uniform unalterd TS        │
│ S07-01    ┆ 24.0   ┆ 28.0  ┆ null       ┆ uniform unaltered TS            │
│ S07-01    ┆ 28.0   ┆ 29.0  ┆ null       ┆ rock is weaklv oxidized. oranq… │
│ S07-01    ┆ 29.1   ┆ 29.4  ┆ null       ┆ finer grained band of tuff in … │
│ S07-01    ┆ 31.05  ┆ 32.20 ┆ null       ┆ Several lim coated fractures a… │
│ S07-01    ┆ 34.5   ┆ 35.0  ┆ null       ┆ major fault - gg and soft dk g… │
│ S07-01    ┆ 35.0   ┆ 36.0  ┆ null       ┆ soft partly bx, fr and altd fo… │
│ S07-01    ┆ 36.0   ┆ 37.15 ┆ null       ┆ Perv K-spr altd zone with qtz-… │
└───────────┴────────┴───────┴────────────┴─────────────────────────────────┘
>>> df = pl.read_excel(partial_path, read_options={'schema_sample_rows':1,'dtypes':'string'})
>>> with pl.Config(set_tbl_rows=30): print(df)
shape: (29, 5)
┌───────────┬────────┬───────┬────────────┬─────────────────────────────────┐
│ hole_name ┆ from_m ┆ to_m  ┆ litho_unit ┆ comments                        │
│ ---       ┆ ---    ┆ ---   ┆ ---        ┆ ---                             │
│ str       ┆ str    ┆ str   ┆ str        ┆ str                             │
╞═══════════╪════════╪═══════╪════════════╪═════════════════════════════════╡
│ S07-01    ┆ 0.0    ┆ 2.2   ┆ null       ┆ overburden                      │
│ S07-01    ┆ 2.2    ┆ 150.0 ┆ null       ┆ Dacitic Ash Flow Tuff (TS)      │
│ S07-01    ┆ 2.2    ┆ 3.0   ┆ null       ┆ Csg. Fresh Dacitic Ash Flow Tu… │
│ S07-01    ┆ 3.0    ┆ 4.0   ┆ null       ┆ Fresh daft.  Bockv fracturino   │
│ S07-01    ┆ 4.0    ┆ 4.15  ┆ null       ┆ 15cm of coarse 3.0-5.0mm feld … │
│ S07-01    ┆ 5.3    ┆ 6.0   ┆ null       ┆ around core - no record         │
│ S07-01    ┆ 6.55   ┆ 6.95  ┆ null       ┆ Silty sand- type material reco… │
│ S07-01    ┆ 7.0    ┆ 7.5   ┆ null       ┆ Bleached, altered dacite, no s… │
│ S07-01    ┆ 8.0    ┆ 9.0   ┆ null       ┆ fresh blocky fracture TS        │
│ S07-01    ┆ 9.9    ┆ 10.0  ┆ null       ┆ Mod. Clav altered TS one thin … │
│ S07-01    ┆ null   ┆ 11.0  ┆ null       ┆ Mod clay altered TS  Blocky br… │
│ S07-01    ┆ null   ┆ 12.0  ┆ null       ┆ very weekly altered TS broken … │
│ S07-01    ┆ null   ┆ 13.8  ┆ null       ┆ 12.0-13.8mostly gr. And lost g… │
│ S07-01    ┆ null   ┆ 14.7  ┆ null       ┆ Mod clay altered zone with qua… │
│ S07-01    ┆ null   ┆ 16.0  ┆ null       ┆ wk-mod clay altered TS          │
│ S07-01    ┆ 16.0   ┆ 17.0  ┆ null       ┆ wk clay altered TS              │
│ S07-01    ┆ 18.2   ┆ 18.3  ┆ null       ┆ Cal healino bx TX at  20-30° C… │
│ S07-01    ┆ 19.1   ┆ 19.95 ┆ null       ┆ Very intensely clay altered TS… │
│ S07-01    ┆ 20.0   ┆ 21.0  ┆ null       ┆ fresh to very weakly clay alte… │
│ S07-01    ┆ 21.1   ┆ 21.95 ┆ null       ┆ Highly broken crumbly core, mo… │
│ S07-01    ┆ 22.0   ┆ 23.0  ┆ null       ┆ Fresh to  v.  weak altered TS   │
│ S07-01    ┆ 23.0   ┆ 24.0  ┆ null       ┆ verv uniform unalterd TS        │
│ S07-01    ┆ 24.0   ┆ 28.0  ┆ null       ┆ uniform unaltered TS            │
│ S07-01    ┆ 28.0   ┆ 29.0  ┆ null       ┆ rock is weaklv oxidized. oranq… │
│ S07-01    ┆ 29.1   ┆ 29.4  ┆ null       ┆ finer grained band of tuff in … │
│ S07-01    ┆ 31.05  ┆ 32.20 ┆ null       ┆ Several lim coated fractures a… │
│ S07-01    ┆ 34.5   ┆ 35.0  ┆ null       ┆ major fault - gg and soft dk g… │
│ S07-01    ┆ 35.0   ┆ 36.0  ┆ null       ┆ soft partly bx, fr and altd fo… │
│ S07-01    ┆ 36.0   ┆ 37.15 ┆ null       ┆ Perv K-spr altd zone with qtz-… │
└───────────┴────────┴───────┴────────────┴─────────────────────────────────┘
>>> df = pl.read_excel(partial_path, read_options={'schema_sample_rows':1,'dtypes':{'from_m':'string'}})
>>> with pl.Config(set_tbl_rows=30): print(df)
shape: (29, 5)
┌───────────┬────────┬───────┬────────────┬─────────────────────────────────┐
│ hole_name ┆ from_m ┆ to_m  ┆ litho_unit ┆ comments                        │
│ ---       ┆ ---    ┆ ---   ┆ ---        ┆ ---                             │
│ str       ┆ str    ┆ str   ┆ null       ┆ str                             │
╞═══════════╪════════╪═══════╪════════════╪═════════════════════════════════╡
│ S07-01    ┆ 0.0    ┆ 2.2   ┆ null       ┆ overburden                      │
│ S07-01    ┆ 2.2    ┆ 150.0 ┆ null       ┆ Dacitic Ash Flow Tuff (TS)      │
│ S07-01    ┆ 2.2    ┆ 3.0   ┆ null       ┆ Csg. Fresh Dacitic Ash Flow Tu… │
│ S07-01    ┆ 3.0    ┆ 4.0   ┆ null       ┆ Fresh daft.  Bockv fracturino   │
│ S07-01    ┆ 4.0    ┆ 4.15  ┆ null       ┆ 15cm of coarse 3.0-5.0mm feld … │
│ S07-01    ┆ 5.3    ┆ 6.0   ┆ null       ┆ around core - no record         │
│ S07-01    ┆ 6.55   ┆ 6.95  ┆ null       ┆ Silty sand- type material reco… │
│ S07-01    ┆ 7.0    ┆ 7.5   ┆ null       ┆ Bleached, altered dacite, no s… │
│ S07-01    ┆ 8.0    ┆ 9.0   ┆ null       ┆ fresh blocky fracture TS        │
│ S07-01    ┆ 9.9    ┆ 10.0  ┆ null       ┆ Mod. Clav altered TS one thin … │
│ S07-01    ┆ null   ┆ 11.0  ┆ null       ┆ Mod clay altered TS  Blocky br… │
│ S07-01    ┆ null   ┆ 12.0  ┆ null       ┆ very weekly altered TS broken … │
│ S07-01    ┆ null   ┆ 13.8  ┆ null       ┆ 12.0-13.8mostly gr. And lost g… │
│ S07-01    ┆ null   ┆ 14.7  ┆ null       ┆ Mod clay altered zone with qua… │
│ S07-01    ┆ null   ┆ 16.0  ┆ null       ┆ wk-mod clay altered TS          │
│ S07-01    ┆ 16.0   ┆ 17.0  ┆ null       ┆ wk clay altered TS              │
│ S07-01    ┆ 18.2   ┆ 18.3  ┆ null       ┆ Cal healino bx TX at  20-30° C… │
│ S07-01    ┆ 19.1   ┆ 19.95 ┆ null       ┆ Very intensely clay altered TS… │
│ S07-01    ┆ 20.0   ┆ 21.0  ┆ null       ┆ fresh to very weakly clay alte… │
│ S07-01    ┆ 21.1   ┆ 21.95 ┆ null       ┆ Highly broken crumbly core, mo… │
│ S07-01    ┆ 22.0   ┆ 23.0  ┆ null       ┆ Fresh to  v.  weak altered TS   │
│ S07-01    ┆ 23.0   ┆ 24.0  ┆ null       ┆ verv uniform unalterd TS        │
│ S07-01    ┆ 24.0   ┆ 28.0  ┆ null       ┆ uniform unaltered TS            │
│ S07-01    ┆ 28.0   ┆ 29.0  ┆ null       ┆ rock is weaklv oxidized. oranq… │
│ S07-01    ┆ 29.1   ┆ 29.4  ┆ null       ┆ finer grained band of tuff in … │
│ S07-01    ┆ 31.05  ┆ 32.20 ┆ null       ┆ Several lim coated fractures a… │
│ S07-01    ┆ 34.5   ┆ 35.0  ┆ null       ┆ major fault - gg and soft dk g… │
│ S07-01    ┆ 35.0   ┆ 36.0  ┆ null       ┆ soft partly bx, fr and altd fo… │
│ S07-01    ┆ 36.0   ┆ 37.15 ┆ null       ┆ Perv K-spr altd zone with qtz-… │
└───────────┴────────┴───────┴────────────┴─────────────────────────────────┘
>>> df = pl.read_excel(partial_path, read_options={'schema_sample_rows':30,'dtypes':{'from_m':'string'}})
>>> with pl.Config(set_tbl_rows=30): print(df)
shape: (29, 5)
┌───────────┬────────┬───────┬────────────┬─────────────────────────────────┐
│ hole_name ┆ from_m ┆ to_m  ┆ litho_unit ┆ comments                        │
│ ---       ┆ ---    ┆ ---   ┆ ---        ┆ ---                             │
│ str       ┆ str    ┆ str   ┆ null       ┆ str                             │
╞═══════════╪════════╪═══════╪════════════╪═════════════════════════════════╡
│ S07-01    ┆ 0.0    ┆ 2.2   ┆ null       ┆ overburden                      │
│ S07-01    ┆ 2.2    ┆ 150.0 ┆ null       ┆ Dacitic Ash Flow Tuff (TS)      │
│ S07-01    ┆ 2.2    ┆ 3.0   ┆ null       ┆ Csg. Fresh Dacitic Ash Flow Tu… │
│ S07-01    ┆ 3.0    ┆ 4.0   ┆ null       ┆ Fresh daft.  Bockv fracturino   │
│ S07-01    ┆ 4.0    ┆ 4.15  ┆ null       ┆ 15cm of coarse 3.0-5.0mm feld … │
│ S07-01    ┆ 5.3    ┆ 6.0   ┆ null       ┆ around core - no record         │
│ S07-01    ┆ 6.55   ┆ 6.95  ┆ null       ┆ Silty sand- type material reco… │
│ S07-01    ┆ 7.0    ┆ 7.5   ┆ null       ┆ Bleached, altered dacite, no s… │
│ S07-01    ┆ 8.0    ┆ 9.0   ┆ null       ┆ fresh blocky fracture TS        │
│ S07-01    ┆ 9.9    ┆ 10.0  ┆ null       ┆ Mod. Clav altered TS one thin … │
│ S07-01    ┆ null   ┆ 11.0  ┆ null       ┆ Mod clay altered TS  Blocky br… │
│ S07-01    ┆ null   ┆ 12.0  ┆ null       ┆ very weekly altered TS broken … │
│ S07-01    ┆ null   ┆ 13.8  ┆ null       ┆ 12.0-13.8mostly gr. And lost g… │
│ S07-01    ┆ null   ┆ 14.7  ┆ null       ┆ Mod clay altered zone with qua… │
│ S07-01    ┆ null   ┆ 16.0  ┆ null       ┆ wk-mod clay altered TS          │
│ S07-01    ┆ 16.0   ┆ 17.0  ┆ null       ┆ wk clay altered TS              │
│ S07-01    ┆ 18.2   ┆ 18.3  ┆ null       ┆ Cal healino bx TX at  20-30° C… │
│ S07-01    ┆ 19.1   ┆ 19.95 ┆ null       ┆ Very intensely clay altered TS… │
│ S07-01    ┆ 20.0   ┆ 21.0  ┆ null       ┆ fresh to very weakly clay alte… │
│ S07-01    ┆ 21.1   ┆ 21.95 ┆ null       ┆ Highly broken crumbly core, mo… │
│ S07-01    ┆ 22.0   ┆ 23.0  ┆ null       ┆ Fresh to  v.  weak altered TS   │
│ S07-01    ┆ 23.0   ┆ 24.0  ┆ null       ┆ verv uniform unalterd TS        │
│ S07-01    ┆ 24.0   ┆ 28.0  ┆ null       ┆ uniform unaltered TS            │
│ S07-01    ┆ 28.0   ┆ 29.0  ┆ null       ┆ rock is weaklv oxidized. oranq… │
│ S07-01    ┆ 29.1   ┆ 29.4  ┆ null       ┆ finer grained band of tuff in … │
│ S07-01    ┆ 31.05  ┆ 32.20 ┆ null       ┆ Several lim coated fractures a… │
│ S07-01    ┆ 34.5   ┆ 35.0  ┆ null       ┆ major fault - gg and soft dk g… │
│ S07-01    ┆ 35.0   ┆ 36.0  ┆ null       ┆ soft partly bx, fr and altd fo… │
│ S07-01    ┆ 36.0   ┆ 37.15 ┆ null       ┆ Perv K-spr altd zone with qtz-… │
└───────────┴────────┴───────┴────────────┴─────────────────────────────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🦀 rust 🦀 Pull requests that edit Rust code feature request
Projects
None yet
Development

No branches or pull requests

4 participants