This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

api,client,http,main,prod,storage: Add query param for publisher and solve flooding in recovery #2202

Merged (19 commits) on Jun 16, 2020

Conversation

santicomp2014
Contributor

@santicomp2014 santicomp2014 commented Jun 11, 2020

This PR resolves request flooding by only triggering content repair for the chunks related to the requested hash, as described in #2205.

The publisher is passed either as a query parameter or, when downloading through the CLI, via SwarmPublisher.

It also removes Publisher from the manifest as it's not needed now.

bzz-list is modified to accept the publisher in the query so that it can also get the manifest before the actual download takes place. Without this, the CLI download would fail, because its first step is to issue a List command.

closes #2193
closes #2205
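
The query-parameter approach described above can be sketched as follows. This is a minimal illustration rather than the PR's code: `buildListURL`, the gateway address, and the sample hash and publisher values are hypothetical, and it uses `net/url` so that the publisher value is escaped instead of concatenated by hand.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildListURL is a hypothetical helper (not part of the PR). It builds a
// bzz-list URL and appends the publisher as a query parameter only when one
// is supplied.
func buildListURL(gateway, hash, prefix, publisher string) (string, error) {
	u, err := url.Parse(gateway + "/bzz-list:/" + hash + "/" + prefix)
	if err != nil {
		return "", err
	}
	if publisher != "" {
		q := u.Query()
		q.Set("publisher", publisher)
		// Query() returns a copy, so the encoded values must be written back.
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	// Gateway, hash, and publisher are placeholder values for illustration.
	uri, err := buildListURL("http://localhost:8500", "abc123", "docs/", "02aa")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(uri)
}
```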

…on download cli and query param for download; injected publisher into ctx
@santicomp2014 santicomp2014 changed the base branch from master to global-pinning June 11, 2020 11:13
@mortelli mortelli added the "global pinning" (experimental implementation of global pinning) and "in progress" labels Jun 11, 2020
@mortelli mortelli mentioned this pull request Jun 12, 2020
4 tasks
@santicomp2014 santicomp2014 marked this pull request as ready for review June 12, 2020 14:58
@santicomp2014 santicomp2014 requested review from mortelli and zelig June 12, 2020 14:58
Contributor

@mortelli mortelli left a comment

no major issues, i think.

the PR branch is behind the base branch by 1 commit, don't forget to update it and re-do the tests.

api/client/client.go, comment on lines 421 to 426
func (c *Client) List(hash, prefix, credentials, publisher string) (*api.ManifestList, error) {
	uri := c.Gateway + "/bzz-list:/" + hash + "/" + prefix
	if publisher != "" {
		uri = uri + "?publisher=" + publisher
	}
	req, err := http.NewRequest(http.MethodGet, uri, nil)
Contributor

could you please briefly comment on why you needed to make these changes to the List func?

Contributor Author

Sure, I will add it to the description of the PR.
To summarize: when the download command is invoked, List is issued first, and it will fail if the contents have been GC'd.

Contributor

thank you!

i would like to ask if @zelig has any comments regarding this change.

@@ -201,7 +205,8 @@ func (n *NetStore) Get(ctx context.Context, mode chunk.ModeGet, req *Request) (c
 	if ok {
 		ch, err = n.RemoteFetch(ctx, req, fi)
 		if err != nil {
-			if n.recoveryCallback != nil {
+			if n.recoveryCallback != nil && publisher != "" {
 				log.Debug("gp netstore recovery triggered")
 				n.recoveryCallback(ctx, ref)
 				time.Sleep(500 * time.Millisecond) // TODO: view what the ideal timeout is
Contributor

did you try measuring how long this actually takes?

i think we should either remove this retry, or use a Sleep amount which could realistically work (500 is too little for even 3 nodes).

we can also look into outputting a "recovery triggered, retry later" message into the CLI, although this could be done in a different issue

Contributor

@mortelli mortelli Jun 12, 2020

relevant: #2172

Contributor Author

It takes around 2-3 seconds which is more than this Sleep.
The problem is that a 3-second delay will halt the execution and also it can differ on other setups.

I would prefer to remove this and emit a message with a status, so the client can retry at their own retry interval.
What do you think @zelig?

Member

in the book i resurrected error code 420 ('enhance your calm' of twitter fame) to be returned if recovery was initiated. Indeed the client could then just retry

Contributor

> in the book i resurrected error code 420 ('enhance your calm' of twitter fame) to be returned if recovery was initiated. Indeed the client could then just retry

love this idea
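
A client-side retry along the lines suggested here might look like the sketch below. It is only an illustration under stated assumptions: `getWithRetry`, `newFlakyServer`, and the constant name are hypothetical, the test server merely simulates a node that answers 420 while recovery is in progress, and the retry count and wait interval are arbitrary.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// statusRecoveryInitiated is the non-standard 420 code proposed in the thread;
// the constant name is an assumption made for this sketch.
const statusRecoveryInitiated = 420

// getWithRetry repeats a GET until the server stops answering 420 or
// maxAttempts is reached, and returns the last status code seen.
func getWithRetry(client *http.Client, target string, maxAttempts int, wait time.Duration) (int, error) {
	status := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := client.Get(target)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
		status = resp.StatusCode
		if status != statusRecoveryInitiated {
			return status, nil
		}
		if attempt < maxAttempts {
			time.Sleep(wait) // the caller chooses its own retry interval
		}
	}
	return status, nil
}

// newFlakyServer simulates a node that answers 420 for the first n requests
// and 200 afterwards.
func newFlakyServer(n int32) *httptest.Server {
	var calls int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) <= n {
			http.Error(w, "recovery was initiated", statusRecoveryInitiated)
			return
		}
		fmt.Fprint(w, "chunk data")
	}))
}

func main() {
	srv := newFlakyServer(2)
	defer srv.Close()
	status, err := getWithRetry(srv.Client(), srv.URL, 5, 10*time.Millisecond)
	fmt.Println(status, err)
}
```

Leaving the retry to the client keeps the node from blocking on a fixed sleep whose ideal length differs between setups.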

Contributor Author

@santicomp2014 santicomp2014 left a comment

I will pull and update as well

@santicomp2014
Contributor Author

Updated with Base branch

@santicomp2014 santicomp2014 changed the title from "Add query param for publisher and solve flooding in recovery" to "api,client,http,main,prod,storage: Add query param for publisher and solve flooding in recovery" Jun 12, 2020
Contributor

@mortelli mortelli left a comment

nice, almost there.

make sure to switch 402 status with 420 and i think we will be good to go.

have you manually tested this current implementation for both down and curl?

@@ -243,7 +242,6 @@ func readManifest(mr storage.LazySectionReader, addr storage.Address, fileStore
 	if err != nil { // size == 0
 		// can't determine size means we don't have the root chunk
 		log.Trace("manifest not found", "addr", addr)
-		err = fmt.Errorf("Manifest not Found")
Contributor

@mortelli mortelli Jun 16, 2020

leaving a comment here mainly for @zelig:

the Manifest not Found error created here was swallowing up the 420 that was trying to be returned.

in any case, i believe without line 246 the returned error will be 500 or something of the sort (@santicomp2014 can confirm/refute this)

Contributor Author

Yes, exactly: with that line, the error message becomes "Manifest not Found" and the status code is 500.
The manifest-not-found condition is already reported by the underlying error.
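
The swallowing effect discussed above can be reproduced in isolation. The sketch below is an assumption-laden illustration, not the repository's code: `overwriteError`, `wrapError`, and `isRecovery` are hypothetical names. The PR fixed the problem by deleting the overwriting line; wrapping with `%w` is shown only as an alternative that also keeps the sentinel detectable.

```go
package main

import (
	"errors"
	"fmt"
)

// errRecoveryAttempt mirrors the sentinel error that appears in the diff.
var errRecoveryAttempt = errors.New("recovery was initiated")

// overwriteError mimics the removed line: the original error is replaced by a
// brand-new one, so callers can no longer detect the sentinel.
func overwriteError(err error) error {
	if err != nil {
		return fmt.Errorf("Manifest not Found")
	}
	return nil
}

// wrapError adds context but keeps the original error in the chain.
func wrapError(err error) error {
	if err != nil {
		return fmt.Errorf("manifest not found: %w", err)
	}
	return nil
}

// isRecovery reports whether err is, or wraps, the recovery sentinel.
func isRecovery(err error) bool {
	return errors.Is(err, errRecoveryAttempt)
}

func main() {
	fmt.Println(isRecovery(overwriteError(errRecoveryAttempt))) // false
	fmt.Println(isRecovery(wrapError(errRecoveryAttempt)))      // true
}
```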

return n.RemoteFetch(ctx, req, fi)
if n.recoveryCallback != nil && publisher != "" {
	log.Debug("content recovery callback triggered", "ref", ref.String())
	go n.recoveryCallback(ctx, ref)
Contributor

based on some tests, we think it might be necessary to call the recovery callback with go here.

do you see any issues with this @zelig?
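
For context, the effect of starting the callback with `go` can be sketched as below. This is a simplified stand-in, not the NetStore implementation: `getChunk`, `recoveryFunc`, `runDemo`, and the sample values are hypothetical, and returning the sentinel error immediately is an assumption about how the caller is informed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRecoveryAttempt mirrors the sentinel error that appears in the diff.
var errRecoveryAttempt = errors.New("recovery was initiated")

// recoveryFunc stands in for the recovery callback type (assumed signature).
type recoveryFunc func(ctx context.Context, ref string)

// getChunk is a simplified stand-in for a Get that falls back to recovery.
// Recovery is only triggered when a publisher is present, and it runs in its
// own goroutine so the caller is not blocked while it proceeds.
func getChunk(ctx context.Context, ref, publisher string, fetch func() error, callback recoveryFunc) error {
	err := fetch()
	if err == nil {
		return nil
	}
	if callback != nil && publisher != "" {
		go callback(ctx, ref)
		return errRecoveryAttempt
	}
	return err
}

// runDemo performs one failing fetch and reports whether the recovery callback
// ran within a short window, together with the error seen by the caller.
func runDemo(publisher string) (bool, error) {
	done := make(chan struct{}, 1)
	callback := func(ctx context.Context, ref string) { done <- struct{}{} }
	failing := func() error { return errors.New("chunk not found") }
	err := getChunk(context.Background(), "abc123", publisher, failing, callback)
	select {
	case <-done:
		return true, err
	case <-time.After(200 * time.Millisecond):
		return false, err
	}
}

func main() {
	fmt.Println(runDemo("02aa")) // publisher given: recovery is triggered
	fmt.Println(runDemo(""))     // no publisher: the plain fetch error is returned
}
```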

postPinFail = metrics.NewRegisteredCounter("api/http/post/pin/fail", nil)
deletePinCount = metrics.NewRegisteredCounter("api/http/delete/pin/count", nil)
deletePinFail = metrics.NewRegisteredCounter("api/http/delete/pin/fail", nil)
errRecoveryAttempt = errors.New("recovery was initiated")
Member

define this in netstore once

Contributor Author

I was worried about circular dependency, moved it.
Thanks

@@ -252,6 +254,11 @@ func (s *Server) HandleBzzGet(w http.ResponseWriter, r *http.Request) {
 		respondError(w, r, err.Error(), http.StatusUnauthorized)
 		return
 	}
+	if isRecoveryAttemptError(err) {
+		w.Header().Set("WWW-Authenticate", fmt.Sprintf("Basic realm=%q", uri.Address().String()))
+		respondError(w, r, err.Error(), http.StatusPaymentRequired)
Member

no i meant 420 not 402, you can just construct the response as:

respondError(w, r, err.Error(), 420)

Contributor Author

My bad, I was confused; it did not make sense.
Changed
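
A server-side mapping from the recovery error to status 420 could be sketched as follows. The names `ErrRecoveryAttempt`, `statusForError`, and `handleGet` are hypothetical, `http.Error` stands in for the repository's `respondError`, and the request path is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ErrRecoveryAttempt mirrors the sentinel from the diff; its exported name and
// location are assumptions made for this sketch.
var ErrRecoveryAttempt = errors.New("recovery was initiated")

// statusForError maps an error to an HTTP status code, using the non-standard
// 420 when recovery was initiated.
func statusForError(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrRecoveryAttempt):
		return 420
	default:
		return http.StatusInternalServerError
	}
}

// handleGet wraps a fetch function in a handler that reports failures with the
// status chosen by statusForError.
func handleGet(fetch func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := fetch(); err != nil {
			http.Error(w, err.Error(), statusForError(err))
			return
		}
		fmt.Fprint(w, "ok")
	}
}

func main() {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/bzz:/abc123/", nil)
	handleGet(func() error { return ErrRecoveryAttempt })(rec, req)
	fmt.Println(rec.Code)
}
```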

req, err := http.NewRequest("GET", uri, nil)
values := req.URL.Query()
Member

why not just

 req.URL.Query().Add("publisher", publisher)

Contributor Author

@santicomp2014 santicomp2014 Jun 16, 2020

I used req.URL.Query().Add("publisher", publisher) and the query param was not reaching the other side.
I looked at other places in the code that use Add/Set, and they encode the values after adding them.

After changing it to .Encode() it works correctly.
@mortelli saw the same thing
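
The behaviour described here follows from `URL.Query()` parsing `RawQuery` into a fresh `url.Values` on every call, so changes to the returned value are lost unless they are encoded back. A small sketch, with hypothetical helper names and a placeholder URL:

```go
package main

import (
	"fmt"
	"net/http"
)

// rawQueryAfterAdd adds the parameter to the value returned by Query() without
// writing it back, and returns the request's resulting raw query.
func rawQueryAfterAdd(uri, publisher string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, uri, nil)
	if err != nil {
		return "", err
	}
	req.URL.Query().Add("publisher", publisher) // mutates a temporary copy only
	return req.URL.RawQuery, nil
}

// rawQueryAfterEncode keeps the values, sets the parameter, and writes the
// encoded result back to the URL.
func rawQueryAfterEncode(uri, publisher string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, uri, nil)
	if err != nil {
		return "", err
	}
	values := req.URL.Query()
	values.Set("publisher", publisher)
	req.URL.RawQuery = values.Encode()
	return req.URL.RawQuery, nil
}

func main() {
	const uri = "http://localhost:8500/bzz-list:/abc123/" // placeholder URL
	lost, _ := rawQueryAfterAdd(uri, "02aa")
	kept, _ := rawQueryAfterEncode(uri, "02aa")
	fmt.Printf("%q %q\n", lost, kept)
}
```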

Contributor

@mortelli mortelli left a comment

LGTM, GJ

@santicomp2014 santicomp2014 merged commit 53c0fcc into global-pinning Jun 16, 2020
@mortelli mortelli deleted the global-pinning-fix-request-flooding branch June 16, 2020 19:02
@mortelli mortelli restored the global-pinning-fix-request-flooding branch June 17, 2020 02:36
Labels
global pinning experimental implementation of global pinning ready for another review ready for review
3 participants