-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UTF-8 Encoding #133
Comments
I think you are correct. I don’t see any test cases that test this behavior. Back when I started this library pretty much no producers were generating Ofx 2.2 and the code shows this. I wonder if nowadays it would be better to try parsing as xml first and if that errors out, switch to the sgml parsing. This would give us proper encoding detection for free for 2.2 and newer files. At any rate, thanks for bringing this up. :) -Jerry |
I confirm that the bug is still present today in ofxparse 0.20 and in master. My OFX files starts with
When I read the OFX 2.2.0 spec here https://www.ofx.net/downloads/OFX%202.2.pdf "The standard XML declaration must come first. This Processing Instruction (PI) includes an option to specify the version of XML being used, and may include options to show such things as the encoding declaration, and the standalone status of the document." "The OFX declaration must come next in the file. This PI identifies the contents as an Open Financial Exchange file and provides the version number of the Open Financial Exchange declaration itself (not the version number of the contents). The Open Financial Exchange PI contains the following attributes: So it's clear that, with OFX 2.2, the encoding of the file must be read from the first line that should look like
and not from the second line "<?OFX OFXHEADER=" that doesn't have an ENCODING attribute any more. With the current version of ofxparse, my OFX file fails because it contains
but, as the file is encoded in utf-8 but read as ascii by ofxparse, I get:
I'm ready to contribute to fix this bug. Any advice for the best way to fix it would be welcomed. |
When looking at the test suite under the directory "tests/fixtures", I can find 4 files in OFX 2.x that start with
(anzcc.ofx, error_message.ofx, multiple_accounts2.ofx and multiple_accounts.ofx) The first problem is to fix the method read_headers() https://github.com/jseutter/ofxparse/blob/master/ofxparse/ofxparse.py#L86 My ideas: the current methods read_headers() and handle_encoding() work well for OFX 1.x but not for OFX 2.x. Should we try to patch them to work on both, or should we have separate methods for OFX 2.x and use some code to detect if the file is OFX 1.x or 2.x ? |
I made a first prototype here: https://people.via.ecp.fr/~alexis/ofxparse-proto-bug-133.diff This patch only changes the method read_headers(): if the file starts with '<?xml ', I use lxml to parse the ofx file and get the encoding. @jseutter What do you think about this approach ? |
I'm okay with this approach. Although I try to avoid adding dependencies, I think it is warranted in this case. I haven't run your code but the way I read it, it should still work for 1.x files. Is this the case? If you have time to add a test case that highlights the broken (and now fixed) behavior I would be grateful. If not, I'll try to add one myself. |
@jseutter Thanks for your feedback. It should work with ofx v1 and v2. I'll prepare a PR with a test with a v2 file. |
the PR is merged to master? I have the same problem |
You can try my patch: https://people.via.ecp.fr/~alexis/ofxparse-proto-bug-133.diff |
The header of my ofx file is:
and ofxparser fails on Thank you @alexis-via, After patching your patch, that give me other errors. I'm not sure if it is from my ofx file or no. After modify the header to:
The ofxparser runs without the patch. Why don't submit your PR if the patch solve the issue for some case? The project seems unmaintained since 2021. |
Hello,
I am a bit confused by these lines: https://github.com/jseutter/ofxparse/blob/master/ofxparse/ofxparse.py#L85-L95
These are used to determine the headers of the file, and more particularly the encoding.
However, a typical header is:
This method won't parse the
encoding="UTF-8"
header. Therefore, the encoding will (always?) be considered as ASCII in https://github.com/jseutter/ofxparse/blob/master/ofxparse/ofxparse.py#L111-L117You can imagine that if the OFX file contains non-ASCII characters, the reading will fail. As a workaround, I added
encoding:UTF-8
on top of the file. But that seems odd...Thanks
cc @qdp-odoo
The text was updated successfully, but these errors were encountered: