Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #374 #375

Merged
merged 5 commits into from
Dec 12, 2023
Merged

Fixes #374 #375

merged 5 commits into from
Dec 12, 2023

Conversation

saks
Copy link
Contributor

@saks saks commented Nov 23, 2023

This PR fixed #374. The logic is inspired by the implementation of https://github.com/SheetJS/sheetjs/. Example file is copied from https://github.com/SheetJS/test_files.

I do realize that this change fixes an issue that can only occur when file was created using really old software but customers still upload such files, hence I truly believe it is a useful addition to this project.

Please let me know if there is anything that I can improve/fix.

Copy link
Owner

@tafia tafia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!
I have made a few comments, do you mind using an enum and specifying in comments where the logic for biff_version == 0 is coming from? I can't find it. Thanks!

src/xls.rs Outdated

let mut biff = match biff_version {
// BIFF8
0x0600 => 8,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind I'd rather have a Biff enum here? So we're sure not to forget any variant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks

src/xls.rs Outdated
// BIFF2
0x0200 | 0x0002 | 0x0007 => 2,
_ => {
if r.data[4..].len() > 6 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be checked for all Bofs? Spec says that is r.data.len() == 16?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, although I'm not sure if we need this check at all. Initially I've copied it from sheetjs, but now I don't see a purpose to be honest. Especially if I cannot write a test (I don't have an example xls file) to hit this line. What do you think?

src/xls.rs Outdated
}
};

if biff_version == 0 && dt == 0x1000 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How come a biff_version = 0 is valid? Even looking at spces from 2012 (the oldest one available on Microsoft website) I don't see it mentioned.
  2. I think I'd prefer having these checks in the _ branch only
  3. I am not sure where this logic is coming from? I didn't find it in sheet.js?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It doesn't appear that the BIFF0 is a valid version to me either. However it looks like this logic is related to how the beginning of BOF record is interpreted.
  2. Agree!
  3. This is where I found it. Not sure all of that logic is in agreement with the rest of the code I've seen in the same file though. I guess some of it is not making sense to me now. Let me clean it up a little.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks,
As we do not have a proper test file for this and I don't understand why it is here, could we perhaps comment this part of the code, so if someone has an issue we can uncomment and put a proper test, in the meantime staying "true" to the spec seems like a better choice.

@saks
Copy link
Contributor Author

saks commented Nov 26, 2023

Thanks a lot! I have made a few comments, do you mind using an enum and specifying in comments where the logic for biff_version == 0 is coming from? I can't find it. Thanks!

Thanks for the review! PR updated. Please let me know if changes are making sense.

src/xls.rs Outdated Show resolved Hide resolved
src/xls.rs Outdated
Ok(Bof { biff })
}

/// BoundSheet8 [MS-XLS 2.4.28]
fn parse_sheet_metadata(
r: &mut Record<'_>,
encoding: &XlsEncoding,
biff: u8,
biff: &Biff,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the Biff: Copy and remove the ref?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

0x0500 => Biff::Biff5,
0x0600 => Biff::Biff8,
0 => {
if dt == 0x1000 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find an example of biff5 file that would trigger this logic. Please let me know if you'd like me to remove this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: I'm talking about removing this code since, if we comment it out, as you've suggested earlier, we'd have no use for dt variable. Then I guess we'll need to comment out more code. I agree that following the spec is the best option, especially with no evidence that we'll ever encounter files that can trigger this logic.

@tafia tafia merged commit 3c74204 into tafia:master Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Text is truncated for xls document with BIFF5
2 participants