Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an importer for Vanguard 529 CSV data #83

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

farktronix
Copy link
Contributor

This PR has three major parts, broken out into three separate commits:

  1. An importer for Vanguard 529 data in CSV multitable format
  2. Add an option in csvreader.convert_columns for adding .00 decimal precision to any numbers without decimals
  3. Add a unit test for the Schwab balance data importer

1. The Vanguard 529 Importer

Vanguard only provides CSV data downloads for their 529 (child education savings plan) accounts. These files (which are frustratingly named ofxdownload but contain CSV data) have two CSV tables inside of them:

Fund Account Number,Fund Name,Price,Shares,Total Value
535672845-01,Vanguard Target Enrollment 2040/2041 Portfolio,$9.45,348.5265,$3293.58

Account Number,Trade Date,Process Date,Transaction Type,Transaction Description,Investment Name,Share Price,Shares,Gross Amount,Net Amount
535672845-01,01/31/2023,01/31/2023,Contribution AIP,Contribution AIP,Vanguard Target Enrollment 2040/2041 Portfolio,$9.03,27.6855,$250,$250
535672845-01,01/03/2023,01/03/2023,Contribution AIP,Contribution AIP,Vanguard Target Enrollment 2040/2041 Portfolio,$8.42,29.6912,$250,$250

This is a pattern seen before in importers, but one difference here is that the Vanguard data doesn't provide section headers before the tables. To address this, I added a section_titles_are_headers flag that the multitable reader looks for which modifies how many rows it skips when gathering up the next table into a section.

I wanted to share the code from csvreader that processes columns to clean up currency and date formatting so I broke out a new function called process_table in csvreader to clean up the columns, which now gets called from the multitable reader as well.

2. Adding decimal precision to currency columns

Once I got my importer up and running I ran into this frustrating error with the imported data:

529.bean:5:       Transaction does not balance: (-0.000096 USD)

   2023-01-03 * "Contribution AIP" "[VTE2040] Vanguard Target Enrollment 2040/2041 Portfolio"
     Assets:Investments:Vanguard:529  29.6912 VTE2040 {8.42 USD, 2023-01-03}
     Assets:Transfers:Vanguard:529      -250 USD

After reading way more than I ever wanted to know about beancount and precision tolerances I found out that this was due to Vanguard reporting the net amount as $250, which beancount takes to imply infinite decimal precision.

I know it's possible to change this in my beancount file with inferred_tolerance_default but I think this is a situation that will confuse anyone else who uses this importer. I believe beancount made the wrong design decision here on handling numbers without a decimal (the implied precision should be 0.5 in this case) but we're past the point of being able to fix that.

My fix for handling this is to add a new option to the csv reader add_currency_precision. If this exists, then any currency field without a decimal in the number will have .00 appended to it. The default behavior of the Vanguard 529 importer is to add precision, since Vanguard doesn't provide precision on round numbers (and 529 contributions are usually a nice round number that someone contributes every month with an automatic withdrawal), but I left the rest of the importers unchanged. This can also be overridden by the user-provided config by setting 'add_currency_precision' : False.

3. Unit tests for the Schwab balance data importer

Because I had tweaked the multitable base class code I wanted to make sure I hadn't broken any other importers. It looks like the only other importer that used this code is schwab_csv_balances.py.

I don't have a Schwab account, so I manually created a csv file for this unit test based on what the importer is expecting. I don't know if this is what the actual data looks like, so if anyone has access to a Schwab account and can provide an example of one of these files I would be happy to scrub the personal data and use it for the unit test instead of my hand crafted file.

@redstreet
Copy link
Owner

Great, thank you for this! I'll get back on this in the next few days. Very helpful description, thank you! And I like the thoughtful defaults and configurability for (2).

Quick question: there seem to be some moves for prior Vanguard files from the /test/.* dir to its parent, was this intentional. I could be viewing this wrongly since I'm viewing this on a mobile and it's hard to tell.

@farktronix
Copy link
Contributor Author

No rush on this review- take your time!

There was an existing unit test for the Vanguard importer, but it took up the whole top level tests folder, so I created two subfolders (vanguard and vanguard_529) and moved the files into those subfolders. You might be seeing files move from tests into vanguard but the full path is beancount_reds_importers/vanguard/tests/vanguard. Maybe Github is just showing the last path component on mobile.

I just double checked the files on disk and all of the test files are under tests and their corresponding vanguard and vanguard_529 subfolders.


def add_decimal(x):
if '.' not in x:
return x+".00"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For international compatibility, it would be best to use locale to print this, since decimal separator characters vary based on one's region. You may already be familiar with it, but if not, a bit of digging into this might provide a good solution. In particular, the setting of locale needs to be figured out. I'd imagine Beancount does this already, so looking at what it does might provide the right solution here. Thoughts?

@@ -108,10 +108,17 @@ def convert_columns(self, rdr):
# fixup currencies
def remove_non_numeric(x):
return re.sub("[^0-9\.-]", "", str(x).strip()) # noqa: W605

def add_decimal(x):
if '.' not in x:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the previous comment: the . character varies internationally.


def add_decimal(x):
if '.' not in x:
return x+".00"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also consider having add_currency_precision be False or an integer, which would specify the number of zeros to add.

@redstreet
Copy link
Owner

Looks good overall! Left a few comments. Would be happy to merge this once we work through those. Please let me know your thoughts. Thanks!

…nflicts. I hope to test this soon and get the PR back up to date with redstreet's suggestions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants