-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an importer for Vanguard 529 CSV data #83
base: main
Are you sure you want to change the base?
Conversation
0986ff8
to
007d71c
Compare
Great, thank you for this! I'll get back on this in the next few days. Very helpful description, thank you! And I like the thoughtful defaults and configurability for (2). Quick question: there seem to be some moves for prior Vanguard files from the |
No rush on this review- take your time! There was an existing unit test for the Vanguard importer, but it took up the whole top level tests folder, so I created two subfolders ( I just double checked the files on disk and all of the test files are under |
|
||
def add_decimal(x): | ||
if '.' not in x: | ||
return x+".00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For international compatibility, it would be best to use locale
to print this, since decimal separator characters vary based on one's region. You may already be familiar with it, but if not, a bit of digging into this might provide a good solution. In particular, the setting of locale
needs to be figured out. I'd imagine Beancount does this already, so looking at what it does might provide the right solution here. Thoughts?
@@ -108,10 +108,17 @@ def convert_columns(self, rdr): | |||
# fixup currencies | |||
def remove_non_numeric(x): | |||
return re.sub("[^0-9\.-]", "", str(x).strip()) # noqa: W605 | |||
|
|||
def add_decimal(x): | |||
if '.' not in x: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the previous comment: the .
character varies internationally.
|
||
def add_decimal(x): | ||
if '.' not in x: | ||
return x+".00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also consider having add_currency_precision
be False
or an integer, which would specify the number of zeros to add.
Looks good overall! Left a few comments. Would be happy to merge this once we work through those. Please let me know your thoughts. Thanks! |
…nflicts. I hope to test this soon and get the PR back up to date with redstreet's suggestions
This PR has three major parts, broken out into three separate commits:
csvreader.convert_columns
for adding.00
decimal precision to any numbers without decimals1. The Vanguard 529 Importer
Vanguard only provides CSV data downloads for their 529 (child education savings plan) accounts. These files (which are frustratingly named
ofxdownload
but contain CSV data) have two CSV tables inside of them:This is a pattern seen before in importers, but one difference here is that the Vanguard data doesn't provide section headers before the tables. To address this, I added a
section_titles_are_headers
flag that the multitable reader looks for which modifies how many rows it skips when gathering up the next table into a section.I wanted to share the code from
csvreader
that processes columns to clean up currency and date formatting so I broke out a new function calledprocess_table
incsvreader
to clean up the columns, which now gets called from the multitable reader as well.2. Adding decimal precision to currency columns
Once I got my importer up and running I ran into this frustrating error with the imported data:
After reading way more than I ever wanted to know about beancount and precision tolerances I found out that this was due to Vanguard reporting the net amount as
$250
, which beancount takes to imply infinite decimal precision.I know it's possible to change this in my beancount file with
inferred_tolerance_default
but I think this is a situation that will confuse anyone else who uses this importer. I believe beancount made the wrong design decision here on handling numbers without a decimal (the implied precision should be 0.5 in this case) but we're past the point of being able to fix that.My fix for handling this is to add a new option to the csv reader
add_currency_precision
. If this exists, then any currency field without a decimal in the number will have.00
appended to it. The default behavior of the Vanguard 529 importer is to add precision, since Vanguard doesn't provide precision on round numbers (and 529 contributions are usually a nice round number that someone contributes every month with an automatic withdrawal), but I left the rest of the importers unchanged. This can also be overridden by the user-provided config by setting'add_currency_precision' : False
.3. Unit tests for the Schwab balance data importer
Because I had tweaked the multitable base class code I wanted to make sure I hadn't broken any other importers. It looks like the only other importer that used this code is
schwab_csv_balances.py
.I don't have a Schwab account, so I manually created a csv file for this unit test based on what the importer is expecting. I don't know if this is what the actual data looks like, so if anyone has access to a Schwab account and can provide an example of one of these files I would be happy to scrub the personal data and use it for the unit test instead of my hand crafted file.