Duplication of data? #5

Open
rjbutterworth opened this issue Oct 16, 2019 · 0 comments

I have a StitchData integration replicating the opens and clicks tables from my Campaign Monitor account, which pushes CSV files to an S3 bucket at hourly intervals.

I assumed that only new data would be pushed in these CSVs (I understand the replication method is key-based incremental), but clearly there's loads of duplication going on. As an example: in a 24-hour period, looking at the data in Campaign Monitor, I see ~800 clicks in total. However, if I merge the (69!) CSV files generated in that same 24-hour period, I end up with over 133,000 rows of data.

Even during times when there is clearly zero Campaign Monitor activity (3am - 4am on a Sunday night/Monday morning, with no mailings having been sent), I get thousands and thousands of rows of data pushed to S3.
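For anyone wanting to reproduce the comparison, here is a minimal sketch of how the merged CSVs could be checked for repeated rows. It only counts how often each key appears across the files; it does not explain why the rows are repeated. The local path `exports/clicks/*.csv` and the key columns `EmailAddress`, `Date` and `URL` are assumptions for illustration, not the actual file layout or column names produced by the integration.

```python
import csv
import glob
from collections import Counter


def count_duplicates(paths, key_columns):
    """Count rows and distinct keys across several CSV files.

    Returns (total_rows, unique_keys, counts), where counts maps each
    key tuple to the number of times it appeared in all files combined.
    """
    counts = Counter()
    total = 0
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Missing columns become empty strings rather than raising.
                key = tuple(row.get(col, "") for col in key_columns)
                counts[key] += 1
                total += 1
    return total, len(counts), counts


if __name__ == "__main__":
    # Hypothetical local copy of the S3 files and hypothetical key columns.
    paths = sorted(glob.glob("exports/clicks/*.csv"))
    total, unique, counts = count_duplicates(
        paths, ["EmailAddress", "Date", "URL"]
    )
    print(f"{len(paths)} files, {total} rows, {unique} unique keys")
    # Show the most frequently repeated keys, if any.
    for key, n in counts.most_common(5):
        if n > 1:
            print(n, key)
```

If the unique-key count is close to the ~800 clicks shown in Campaign Monitor while the total row count is far higher, that would suggest the same records are being re-sent in each hourly file rather than extra events being recorded.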

Clearly I'm doing something wrong, so would appreciate some help.
