Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transactional mode #2

Open
gilesbradshaw opened this issue Dec 18, 2018 · 8 comments
Open

transactional mode #2

gilesbradshaw opened this issue Dec 18, 2018 · 8 comments

Comments

@gilesbradshaw
Copy link

could you possibly explain this - many thanks :)

@Vanlightly
Copy link
Owner

Check my blog post: https://jack-vanlightly.com/blog/2018/4/28/sql-server-cdc-to-redshift-pipeline
It explains the different strategies that either respect global or table level transactional boundaries and those that simply stream changes as they occur. Reading from a transaction log directly would make the grouping of changes of a given transaction easy (if you can decode the log), but SQL Server CDC makes changes available at a table level. I wanted to experiment with solutions that could export changes of the same db transaction together, regardless of whether the transaction crossed multiple tables or if the transaction was very large.

Let me know if that blog post adequately explains it.

@gilesbradshaw
Copy link
Author

Thanks for that! Yes I see what you mean now.

Might another approach be to put insert triggers on all the CDC tables - converting their contents to json and inserting them into one common table. Then you'd only need to scan that rather than each cdc table.

I have a hundred or so tables I want to CDC...

@Vanlightly
Copy link
Owner

It is a tempting idea as it makes the solution simpler, but it does have two drawbacks:

  1. You'll lose the natural order of the events. From my experiments I find that changes get written to the CDC tables in batches, losing the true total ordering of the transaction log. I did an experiment with triggers and find that also, each batch gets written in reverse order. You could try to reorder the changes in a database query but that could prove difficult.

  2. With CDC enabled we are already adding extra load on the server. Each write ends up written again to a CDC table. This trigger technique involves another write again and in JSON format potentially increasing overall write load dramatically.

If you experiment with the trigger idea let me know how it goes.

@gilesbradshaw
Copy link
Author

It's a shame that microsoft don't make available a better programmatic interface to the transaction log - then one could roll one's own cdc system.

@Vanlightly
Copy link
Owner

I've never used them, but products such as Attunity and Debezium read the transaction log directly, they might offer some functionality that you need.

@gilesbradshaw
Copy link
Author

I've made a bit of progress - and have implemented an app to stream cdc using node js.

I'll put it on github and link you to it but essentially..

1) maxLsn = getMaxLsn
2) if maxLsn has changed from last time
  3) for all tables - get changes between last lsn obtained for that table and maxLsn
  4) now we can publish changes + we can order them across all tables
  5) store the latest lsn and seqval obtained for each table
  6) go back to 1)

There are two improvements..

  1. when there are no changes all we do is step 1) so hardly any impact on database.
  2. in step 4) we have all the changes for all the tables up to maxLsn - s we can order them and send them to a single topic - or send them related to maxLsn - and also send maxLsn event - so we could order them on kafka.

@gilesbradshaw
Copy link
Author

This is it.. https://github.com/gilesbradshaw/node-mssql-cdc-kafka

It's not currently ordering across tables - but the publish step could..

@dongfo
Copy link

dongfo commented Apr 30, 2021

You are no longer developing the project, which is unfortunate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants