Fixes pd.concat error halting processing #972
Yep! I tried the same and it was driving me mad. I still have no idea of the difference 🤷♂️
And also yes, if the same column somehow returned None / NaN for every image, then that column would be missing. This would cause a problem when searching for that column to plot, but then again so would trying to plot values for a column of all NaNs.
I'd say that 1 - this is unlikely, and 2 - it would cause an error anyway, so from a user's perspective it isn't really solving anything. Until someone complains and it becomes a legit issue, let's not try to preempt something that might not be an issue?
Seems good to me, let's just keep an eye out for missing column errors if users experience that.
Also I'm very in favour of the WWND
Thanks for your nice detective work on this @MaxGamill-Sheffield! 🕵️ This seems a sensible fix to me and enables the data set I mentioned in issue #969 to run through to completion and successfully produces the all_statistics.csv. Happy for this to be merged!
I've added the tracing outputs, which were missing, to the usage docs, along with the warning about the columns, so it is documented somewhere other than in our brains :)
Wouldn't have remembered to add documentation 👍
closes #969
TLDR:
PR adds the following:
Think I got it. According to the docs, the cause is that pandas doesn't like concatenating objects in which whole columns (or rows) are all NaN or None (we had a mix, which is why it was hard to find).
Not only that, but for some reason the "warning" also halts processing.
The initial fix I tried was to stop the warning using the below code which seemed to work, so maybe a 🔥hotfix🔥 idea for something in the future:
Then I thought... WWND (what would @ns-rse do) 👼, and so I solved the root issue and stopped the message from popping up in the first place by dropping every column whose values are all None or NaN, and applied this across:
so this shouldn't happen again and we will still be told of deprecation warnings.
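For anyone reading later, here is a minimal sketch of the idea, not the exact code in this PR. The helper name `drop_all_na_columns`, the data frames and the column names are all made up for illustration; only `DataFrame.dropna` and `pd.concat` are real pandas calls.

```python
import numpy as np
import pandas as pd


def drop_all_na_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns whose values are all None/NaN.

    Hypothetical helper for illustration; the PR may do this differently.
    """
    return df.dropna(axis="columns", how="all")


# Made-up per-image statistics: one frame holds None, the other NaN,
# mirroring the "mix" described above.
image_1 = pd.DataFrame({"image": ["a"], "area": [1.5], "contour_length": [None]})
image_2 = pd.DataFrame({"image": ["b"], "area": [2.0], "contour_length": [np.nan]})

# Drop the all-NA columns from each frame *before* concatenating.
cleaned = [drop_all_na_columns(df) for df in (image_1, image_2)]
all_stats = pd.concat(cleaned, ignore_index=True)
```

Because the all-NA columns are removed up front, nothing is silenced, so any other warnings pandas raises would still be shown.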
Now I've tested this slightly to ensure that if a column is present in `df1` but not `df2`, it will be added, but filled with NaNs for the values in the `resultant_df` for `df2`.
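A small sketch of that check, with made-up column names and values (only the `df1`, `df2` and `resultant_df` names come from the description above), relying on the default outer join of `pd.concat`:

```python
import pandas as pd

# Illustrative data: "contour_length" exists in df1 but not in df2.
df1 = pd.DataFrame({"image": ["a"], "area": [1.5], "contour_length": [10.0]})
df2 = pd.DataFrame({"image": ["b"], "area": [2.0]})

# The default join="outer" keeps the union of columns and fills gaps with NaN.
resultant_df = pd.concat([df1, df2], ignore_index=True)

# The row that came from df2 has NaN in the column it lacked.
df2_value_is_nan = pd.isna(resultant_df.loc[1, "contour_length"])
```

So a column dropped from one image's frame is still present in the combined output, as long as at least one other frame has it.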