-
Notifications
You must be signed in to change notification settings - Fork 110
api,client,http,main,prod,storage: Add query param for publisher and solve flooding in recovery #2202
api,client,http,main,prod,storage: Add query param for publisher and solve flooding in recovery #2202
Conversation
…on download cli and query param for download; injected publisher into ctx
… from manifest as its passed along ctx in the download path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no major issues, i think.
the PR branch is behind the base branch by 1 commit, don't forget to update it and re-do the tests.
api/client/client.go
Outdated
func (c *Client) List(hash, prefix, credentials, publisher string) (*api.ManifestList, error) { | ||
uri := c.Gateway + "/bzz-list:/" + hash + "/" + prefix | ||
if publisher != "" { | ||
uri = uri + "?publisher=" + publisher | ||
} | ||
req, err := http.NewRequest(http.MethodGet, uri, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please briefly comment on why you needed to make these changes to the List
func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will add to the description of the PR.
But to resume as the download command is invoked
first, the List is issued and this will fail if the contents are GC'd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!
i would like to ask if @zelig has any comments regarding this change.
storage/netstore.go
Outdated
@@ -201,7 +205,8 @@ func (n *NetStore) Get(ctx context.Context, mode chunk.ModeGet, req *Request) (c | |||
if ok { | |||
ch, err = n.RemoteFetch(ctx, req, fi) | |||
if err != nil { | |||
if n.recoveryCallback != nil { | |||
if n.recoveryCallback != nil && publisher != "" { | |||
log.Debug("gp netstore recovery triggered") | |||
n.recoveryCallback(ctx, ref) | |||
time.Sleep(500 * time.Millisecond) // TODO: view what the ideal timeout is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you try measuring how long this actually takes?
i think we should either remove this retry, or use a Sleep
amount which could realistically work (500
is too little for even 3 nodes).
we can also look into outputting a "recovery triggered, retry later" message into the CLI, although this could be done in a different issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relevant: #2172
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It takes around 2-3 seconds which is more than this Sleep.
The problem is that a 3-second delay will halt the execution and also it can differ on other setups.
I would prefer to remove this and emit a message with a status, so the client can retry at their own retry interval.
What do you think @zelig?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the book i resurrected error code 420 ('enhance your calm' of twitter fame) to be returned if recovery was initiated. Indeed the client could then just retry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the book i resurrected error code 420 ('enhance your calm' of twitter fame) to be returned if recovery was initiated. Indeed the client could then just retry
love this idea
…nd fixed naming issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will pull and update as well
api/client/client.go
Outdated
func (c *Client) List(hash, prefix, credentials, publisher string) (*api.ManifestList, error) { | ||
uri := c.Gateway + "/bzz-list:/" + hash + "/" + prefix | ||
if publisher != "" { | ||
uri = uri + "?publisher=" + publisher | ||
} | ||
req, err := http.NewRequest(http.MethodGet, uri, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will add to the description of the PR.
But to resume as the download command is invoked
first, the List is issued and this will fail if the contents are GC'd
storage/netstore.go
Outdated
@@ -201,7 +205,8 @@ func (n *NetStore) Get(ctx context.Context, mode chunk.ModeGet, req *Request) (c | |||
if ok { | |||
ch, err = n.RemoteFetch(ctx, req, fi) | |||
if err != nil { | |||
if n.recoveryCallback != nil { | |||
if n.recoveryCallback != nil && publisher != "" { | |||
log.Debug("gp netstore recovery triggered") | |||
n.recoveryCallback(ctx, ref) | |||
time.Sleep(500 * time.Millisecond) // TODO: view what the ideal timeout is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It takes around 2-3 seconds which is more than this Sleep.
The problem is that a 3-second delay will halt the execution and also it can differ on other setups.
I would prefer to remove this and emit a message with a status, so the client can retry at their own retry interval.
What do you think @zelig?
Updated with Base branch |
storage/netstore.go
Outdated
@@ -201,7 +205,8 @@ func (n *NetStore) Get(ctx context.Context, mode chunk.ModeGet, req *Request) (c | |||
if ok { | |||
ch, err = n.RemoteFetch(ctx, req, fi) | |||
if err != nil { | |||
if n.recoveryCallback != nil { | |||
if n.recoveryCallback != nil && publisher != "" { | |||
log.Debug("gp netstore recovery triggered") | |||
n.recoveryCallback(ctx, ref) | |||
time.Sleep(500 * time.Millisecond) // TODO: view what the ideal timeout is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the book i resurrected error code 420 ('enhance your calm' of twitter fame) to be returned if recovery was initiated. Indeed the client could then just retry
…to Query().Add, return 402 when recovery attempt is being processed
…l return recovery attempt instead and should not be absorbed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, almost there.
make sure to switch 402
status with 420
and i think we will be good to go.
have you manually tested this current implementation for both down
and curl
?
@@ -243,7 +242,6 @@ func readManifest(mr storage.LazySectionReader, addr storage.Address, fileStore | |||
if err != nil { // size == 0 | |||
// can't determine size means we don't have the root chunk | |||
log.Trace("manifest not found", "addr", addr) | |||
err = fmt.Errorf("Manifest not Found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving a comment here mainly for @zelig:
the Manifest not Found
error created here was swallowing up the 420
that was trying to be returned.
in any case, i believe without line 246
the returned error will be 500
or something of the sort (@santicomp2014 can confirm/refute this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly, the error code becomes Manifest not found and the error code is 500.
The error manifest not found
is already being thrown in the error.
return n.RemoteFetch(ctx, req, fi) | ||
if n.recoveryCallback != nil && publisher != "" { | ||
log.Debug("content recovery callback triggered", "ref", ref.String()) | ||
go n.recoveryCallback(ctx, ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on some tests, we think it might be necessary to call the recovery callback with go
here.
do you see any issues with this @zelig?
api/http/server.go
Outdated
postPinFail = metrics.NewRegisteredCounter("api/http/post/pin/fail", nil) | ||
deletePinCount = metrics.NewRegisteredCounter("api/http/delete/pin/count", nil) | ||
deletePinFail = metrics.NewRegisteredCounter("api/http/delete/pin/fail", nil) | ||
errRecoveryAttempt = errors.New("recovery was initiated") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define this in netstore once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was worried about circular dependency, moved it.
Thanks
api/http/server.go
Outdated
@@ -252,6 +254,11 @@ func (s *Server) HandleBzzGet(w http.ResponseWriter, r *http.Request) { | |||
respondError(w, r, err.Error(), http.StatusUnauthorized) | |||
return | |||
} | |||
if isRecoveryAttemptError(err) { | |||
w.Header().Set("WWW-Authenticate", fmt.Sprintf("Basic realm=%q", uri.Address().String())) | |||
respondError(w, r, err.Error(), http.StatusPaymentRequired) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i meant 420 not 402, you can just construct the response as:
respondError(w, r, err.Error(), 420)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I was confused it did not make sense.
Changed
req, err := http.NewRequest("GET", uri, nil) | ||
values := req.URL.Query() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just
req.URL.Query().Add("publisher", publisher)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, GJ
This PR resolves request flooding by only triggering content repair for the chunks related to the requested hash, as described in #2205.
The publisher is passed from query params or by
SwarmPublisher
on download by CLI.It also removes Publisher from the manifest as it's not needed now.
The
bzz-list
is modified to receive the publisher in the query so it can also get the manifest before the actual download takes place. If this is not done the Download in CLI will fail as its first step is to issue aList
command.closes #2193
closes #2205