Educational purpose: Implementation question #1641
Comments
Exactly-once upload doesn't have much to do with which upload manager is used; it depends
on the following things:
1. File name generation is deterministic: kafka-partition-number +
begin-kafka-offset.
2. Secor uploads to S3 first, then commits the Kafka consumer offset.
3. If the S3 upload succeeds but the Kafka consumer offset commit
fails, the next Secor worker will continue working on this partition and
re-upload the whole batch (starting from the same begin_offset, since
that offset was never committed to Kafka). The file will still get the
same name and will overwrite the existing file on S3.
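The three numbered points above can be sketched as a toy model (this is not Secor's actual code: the HashMap stands in for an S3 bucket, the long for the consumer's committed offset, and all names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class ExactlyOnceSketch {
    static final Map<String, String> bucket = new HashMap<>(); // stand-in for S3
    static long committedOffset = 1500L;                       // stand-in for Kafka state

    // Point 1: the S3 key is a pure function of partition + begin offset,
    // so a replayed batch always maps to the same key.
    static String s3Key(String topic, int partition, long beginOffset) {
        return String.format("%s/%d_%020d", topic, partition, beginOffset);
    }

    // Point 2: PUT to S3 first, only then commit the Kafka consumer offset.
    static void uploadBatch(int partition, long beginOffset, long endOffset,
                            String data, boolean commitFails) {
        bucket.put(s3Key("events", partition, beginOffset), data); // S3 upload
        if (!commitFails) {
            committedOffset = endOffset; // Kafka offset commit
        }
    }

    public static void main(String[] args) {
        // Point 3: the upload succeeds but the commit fails, so the next
        // worker restarts at the still-committed offset 1500, re-uploads,
        // and overwrites the same key instead of creating a second file.
        uploadBatch(3, committedOffset, 2000L, "batch", true);  // commit fails
        uploadBatch(3, committedOffset, 2000L, "batch", false); // replay succeeds
        System.out.println(bucket.size() + " file(s), committed offset " + committedOffset);
    }
}
```

Running the sketch shows one file and a committed offset of 2000: the retry after the failed commit overwrites the same key rather than duplicating the batch.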
…On Fri, Oct 16, 2020 at 10:18 AM Jay Patel ***@***.***> wrote:
Hello there,
This question is just about the implementation.
The README says:
as long as Kafka is not dropping messages (e.g., due to aggressive cleanup
policy) before Secor is able to read them, it is guaranteed that each
message will be saved in exactly one S3 file. This property is not
compromised by the notorious temporal inconsistency of S3 caused by the
eventual consistency model
Doesn't this also depend on the underlying implementation of the
upload manager? Does HadoopS3UploadManager provide strong consistency?
Thanks
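For context, the upload manager is pluggable and selected via Secor's configuration. The snippet below shows how that selection typically looks; the property name and fully qualified class are my recollection of Secor's secor.common.properties, so verify them against your version:

```properties
# Choose the UploadManager implementation (assumed property name; verify
# against your Secor version's secor.common.properties):
secor.upload.manager.class=com.pinterest.secor.uploader.HadoopS3UploadManager
```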
Thanks Henry, do any of these operations do a get before a put? Looks like if you are just replacing the file in the 3rd step then there might not be a need for a get.
There is no get; it's a file-replace operation on S3.
…On Mon, Oct 19, 2020 at 11:31 AM Jay Patel ***@***.***> wrote:
Thanks Henry, do any of these operations do a get before a put? Looks
like if you are just replacing the file in the 3rd step then there might
not be a need for a get.
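In other words, the upload is a blind, last-write-wins PUT. A toy sketch of that property (a HashMap stands in for the bucket; the method name is hypothetical, not Secor's API):

```java
import java.util.HashMap;
import java.util.Map;

public class BlindPut {
    // Two blind PUTs to the same key, with no GET in between: the retry
    // simply replaces the earlier object, leaving exactly one copy.
    static Map<String, String> putTwice(String key) {
        Map<String, String> bucket = new HashMap<>();
        bucket.put(key, "attempt-1"); // first upload (offset commit then fails)
        bucket.put(key, "attempt-2"); // replayed upload overwrites it
        return bucket;
    }

    public static void main(String[] args) {
        Map<String, String> bucket = putTwice("events/3_00000000000000001500");
        System.out.println(bucket.size() + " object(s) under the key");
    }
}
```

Because no read ever happens, the staleness of S3's (historically) eventually consistent GETs never enters the picture for this step.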