Shepherd updates back and forth between sha version and latest #110

Open
GuyKh opened this issue Sep 3, 2023 · 20 comments

@GuyKh

GuyKh commented Sep 3, 2023

See pic:
image

Very often I get two updates: one from latest (or, for example, a versioned image) to a version with a sha, and then another back to the non-sha one.

e.g.

[Shepherd] Service general_notify updated on 865954dc3903
Sat Sep  2 06:23:14 IDT 2023 Service general_notify was updated from mazzolino/apprise-microservice:0.1@sha256:3abc60085e429a51455f2e4dee656cbb96b20f4a76f2510c1a75c6e24cd0193c to mazzolino/apprise-microservice:0.1

[Shepherd] Service general_notify updated on 865954dc3903
Sat Sep  2 06:49:34 IDT 2023 Service general_notify was updated from mazzolino/apprise-microservice:0.1 to mazzolino/apprise-microservice:0.1@sha256:3abc60085e429a51455f2e4dee656cbb96b20f4a76f2510c1a75c6e24cd0193c
djmaze added the bug label Sep 27, 2023
@awptechnologies

May I ask what you are using for notifications?

@GuyKh
Author

GuyKh commented Nov 13, 2023

> May I ask what you are using for notifications?

Telegram

@moschlar
Collaborator

@GuyKh Could you please verify whether this issue still persists with the latest shepherd version? If yes, please run shepherd with VERBOSE=true and share the corresponding log file with us.

Make sure to update your image specifier to containrrr/shepherd.

@GuyKh
Author

GuyKh commented Jan 31, 2024

@moschlar The latest update occurred 2 months ago, so if your question relates to the last 2 months, the answer is definitely YES.

I can tell that since Jan 17th, things have been quiet on this front

@GuyKh
Author

GuyKh commented Jan 31, 2024

It's here again :)

general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 15:16:41 IST 2024 Sleeping 60m before next update
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:16:54 IST 2024 Trying to update service general_adguard-exporter with image ebrianne/adguard-exporter:latest
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:07 IST 2024 Service general_adguard-exporter was updated!
general_shepherd.1.45n0qi5aws3p@nuc    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.45n0qi5aws3p@nuc    |                                  Dload  Upload   Total   Spent    Left  Speed
100   313  100     2  100   311      3    512 --:--:-- --:--:-- --:--:--   514
general_shepherd.1.45n0qi5aws3p@nuc    | okWed Jan 31 16:17:08 IST 2024 Cleaning up old docker images, leaving last 5
general_shepherd.1.45n0qi5aws3p@nuc    | no such manifest: docker.io/tiredofit/traefik-cloudflare-companion:latest
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:12 IST 2024 Error updating service general_cf-companion! Image tiredofit/traefik-cloudflare-companion:latest does not exist or it is not available
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:18 IST 2024 Trying to update service general_cf-ddns with image oznu/cloudflare-ddns:latest
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:33 IST 2024 Service general_cf-ddns was updated!
general_shepherd.1.45n0qi5aws3p@nuc    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.45n0qi5aws3p@nuc    |                                  Dload  Upload   Total   Spent    Left  Speed
100   285  100     2  100   283      5    778 --:--:-- --:--:-- --:--:--   785
general_shepherd.1.45n0qi5aws3p@nuc    | okWed Jan 31 16:17:34 IST 2024 Cleaning up old docker images, leaving last 5
general_shepherd.1.45n0qi5aws3p@nuc    | no such manifest: docker.io/library/docker:latest
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:38 IST 2024 Error updating service general_image-prune! Image docker:latest does not exist or it is not available
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:44 IST 2024 Trying to update service general_notify with image mazzolino/apprise-microservice:0.1
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:18:06 IST 2024 Service general_notify was updated!
general_shepherd.1.45n0qi5aws3p@nuc    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.45n0qi5aws3p@nuc    |                                  Dload  Upload   Total   Spent    Left  Speed
100   297  100     2  100   295      4    682 --:--:-- --:--:-- --:--:--   687
general_shepherd.1.45n0qi5aws3p@nuc    | okWed Jan 31 16:18:07 IST 2024 Cleaning up old docker images, leaving last 5

image

@moschlar
Collaborator

From that log and screenshot I can't see that any service is "flapping"...

Can you share your Docker Swarm stack files?

@GuyKh
Author

GuyKh commented Feb 1, 2024

Looking at the swarm files, I was using mazzolino/shepherd and not containrrr/shepherd, so I was seeing:
Image mazzolino/shepherd:latest does not exist or it is not available

Retrying this with containrrr/shepherd

@GuyKh
Author

GuyKh commented Feb 1, 2024

Well... this still occurs:

general_shepherd.1.5w81t3foj5bk@nuc    | okThu Feb  1 12:25:00 IST 2024 Cleaning up old docker images, leaving last 3
general_shepherd.1.5w81t3foj5bk@nuc    | Thu Feb  1 12:25:06 IST 2024 Trying to update service general_cf-ddns with image oznu/cloudflare-ddns:latest
general_shepherd.1.5w81t3foj5bk@nuc    | Thu Feb  1 12:25:22 IST 2024 Service general_cf-ddns was updated!
general_shepherd.1.5w81t3foj5bk@nuc    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.5w81t3foj5bk@nuc    |                                  Dload  Upload   Total   Spent    Left  Speed
100   304  100     2  100   302      6    936 --:--:-- --:--:-- --:--:--   944
general_shepherd.1.5w81t3foj5bk@nuc    | okThu Feb  1 12:25:22 IST 2024 Cleaning up old docker images, leaving last 3

image

@martadinata666

martadinata666 commented Feb 1, 2024

Updating to an image reference of the form someimagename:sometag@shanum should be correct, since images with the same tag are identified by their sha sum. That is what distinguishes last year's "latest" from today's "latest".
A sample from my update log:

service nextcloud_imaginary was updated from nextcloud/aio-imaginary:latest@sha256:f7fb3f35cdbacbaa06dbcf6bbc567e39037af1251fb3600b44c8626e3bbf0b01 to nextcloud/aio-imaginary:latest@sha256:3d1cb04f90eca6dbbaaed0f773ed092a024b0eca742b73f88f8b010025d3ab9b

What I can't really tell is why it switches to a reference without the sha. Like in your first post:

[Shepherd] Service general_notify updated on 865954dc3903
Sat Sep  2 06:23:14 IDT 2023 Service general_notify was updated from mazzolino/apprise-microservice:0.1@sha256:3abc60085e429a51455f2e4dee656cbb96b20f4a76f2510c1a75c6e24cd0193c to mazzolino/apprise-microservice:0.1

From sha to non-sha. Is it because the sha is the same? I'm not sure.
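
One way to check whether a reported update is only a change in notation, rather than a new image, is to compare the two references with the digest stripped. This is a minimal sketch in POSIX shell, assuming references of the form name:tag or name:tag@sha256:...; the function names are hypothetical and not part of shepherd.

```shell
# Hypothetical helpers for illustration; not part of shepherd.

# Print the reference without its @sha256:... suffix, if it has one.
strip_digest() {
  printf '%s\n' "${1%%@*}"
}

# Print only the digest part (an empty line if the reference has none).
get_digest() {
  case "$1" in
    *@*) printf '%s\n' "${1#*@}" ;;
    *)   printf '\n' ;;
  esac
}

# Succeed when both references name the same image:tag and exactly one
# of them carries a digest, i.e. the "update" only added or dropped the sha.
is_notation_flip() {
  [ "$(strip_digest "$1")" = "$(strip_digest "$2")" ] || return 1
  old=$(get_digest "$1")
  new=$(get_digest "$2")
  { [ -n "$old" ] && [ -z "$new" ]; } || { [ -z "$old" ] && [ -n "$new" ]; }
}
```

With the two references from the first post, is_notation_flip succeeds; with the nextcloud/aio-imaginary pair, where the sha actually changed, it fails.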

@GuyKh
Author

GuyKh commented Feb 1, 2024

I really haven't looked into it, but I think the issue is in how :latest is resolved: what logic decides between resolving it to a specific sha and keeping it as 'latest'...

@moschlar
Collaborator

moschlar commented Feb 2, 2024

Like I said yesterday, from your recent reports, I can not see the issue that you have been describing in your first post and the title of this issue.

Please try to reproduce this with the latest official image and show us the logging output.

@mjrj97

mjrj97 commented Apr 3, 2024

I'm experiencing this issue as well. Here are the messages from Apprise:

[Shepherd] Service backend_file-proxy updated on 5d4000a267bd
Wed Apr  3 02:49:03 CEST 2024 Service backend_file-proxy was updated from ghcr.io/REDACTED/file-proxy:1.0@sha256:d9a3268c22892f7272773cd9b6caabe3630c6877f8b28373f7dbb57822f6d342 to ghcr.io/REDACTED/file-proxy:1.0
AppriseApprise | Today at 2:49 AM

[Shepherd] Service backend_file-proxy updated on 5d4000a267bd
Wed Apr  3 03:00:26 CEST 2024 Service backend_file-proxy was updated from ghcr.io/REDACTED/file-proxy:1.0 to ghcr.io/REDACTED/file-proxy:1.0@sha256:d9a3268c22892f7272773cd9b6caabe3630c6877f8b28373f7dbb57822f6d342
AppriseApprise | Today at 3:00 AM

We're experiencing this with services using either Docker Hub or the GitHub Container Registry, so the problem is probably not the registries. Here are the logs from shepherd (verbose):

Wed Apr  3 02:48:57 CEST 2024 Trying to update service backend_file-proxy with image ghcr.io/REDACTED/file-proxy:1.0
image ghcr.io/REDACTED/file-proxy:1.0 could not be accessed on a registry to record
its digest. Each node will access ghcr.io/REDACTED/file-proxy:1.0 independently,
possibly leading to different nodes running different
versions of the image.
Wed Apr  3 02:49:03 CEST 2024 Service backend_file-proxy was updated!
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   323  100     2  100   321      9   1446 --:--:-- --:--:-- --:--:--  1461

Wed Apr  3 03:00:25 CEST 2024 Trying to update service backend_file-proxy with image ghcr.io/REDACTED/file-proxy:1.0
Wed Apr  3 03:00:26 CEST 2024 Service backend_file-proxy was updated!
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   323  100     2  100   321     10   1664 --:--:-- --:--:-- --:--:--  1682
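
In the verbose log above, the "could not be accessed on a registry to record its digest" warning appears on the first of the two attempts, the one that moved the service to a reference without a sha. A rough way to find such updates in a longer log is to pair each warning with the preceding "Trying to update service" line. This is a minimal sketch, assuming the log format shown in this thread; the function name is hypothetical and not part of shepherd.

```shell
# Hypothetical helper for illustration; not part of shepherd.
# Reads a shepherd log on stdin and prints the name of every service whose
# update attempt was followed by the "could not be accessed on a registry"
# warning, i.e. an update that was recorded without a digest.
services_updated_without_digest() {
  awk '
    /Trying to update service/ {
      # Remember the service named in the most recent update attempt.
      for (i = 1; i < NF; i++) if ($i == "service") svc = $(i + 1)
    }
    /could not be accessed on a registry to record/ {
      if (svc != "") print svc
    }
  '
}
```

For example, the log could be piped in with something like `docker service logs general_shepherd 2>&1 | services_updated_without_digest`, where the service name depends on your stack.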

The example has a defined version tag, but this is also happening to services using images with a latest tag.
Here is our YAML file for the service:

  shepherd:
    image: containrrr/shepherd
    environment:
      SLEEP_TIME: '5m'
      FILTER_SERVICES: 'label=shepherd.autodeploy'
      ROLLBACK_ON_FAILURE: 'true'
      REGISTRIES_FILE: /var/run/secrets/shepherd-registries-auth
      WITH_REGISTRY_AUTH: 'true'
      APPRISE_SIDECAR_URL: 'notify:5000'
      TZ: Europe/Berlin
    secrets:
      - shepherd-registries-auth
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - notification
    deploy:
      placement:
        constraints:
          - node.role == manager

btw. thanks for a great service. Shepherd has really improved our deployment strategy.

@GuyKh
Author

GuyKh commented Apr 4, 2024

> Like I said yesterday, from your recent reports, I can not see the issue that you have been describing in your first post and the title of this issue.
>
> Please try to reproduce this with the latest official image and show us the logging output.

I just reproduced this with the latest version and am getting this:

[Shepherd] Service general_ouroboros updated on 08186ba56442
Thu Apr  4 12:29:43 IDT 2024 Service general_ouroboros was updated from pyouroboros/ouroboros:latest to pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372

Logs:

general_shepherd.1.xcjzlmsdmhry@ubuntu    | Thu Apr  4 12:29:20 IDT 2024 Trying to update service general_ouroboros with image pyouroboros/ouroboros:latest
general_shepherd.1.xcjzlmsdmhry@ubuntu    | Thu Apr  4 12:29:43 IDT 2024 Service general_ouroboros was updated!
general_shepherd.1.xcjzlmsdmhry@ubuntu    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.xcjzlmsdmhry@ubuntu    |                                  Dload  Upload   Total   Spent    Left  Speed
100   310  100     2  100   308      5    783 --:--:-- --:--:-- --:--:--   788
general_shepherd.1.xcjzlmsdmhry@ubuntu    | okThu Apr  4 12:29:43 IDT 2024 Cleaning up old docker images, leaving last 2

@djmaze
Collaborator

djmaze commented Apr 9, 2024

I guess this might be connected to the docker version. Can you tell us the version @GuyKh ?

It would be good to know if these commands both return sha-hashed image ids, in your cluster:

docker service inspect general_ouroboros  -f '{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}'
docker service inspect general_ouroboros  -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
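
The same check can be wrapped so that it reports directly whether each reference carries a digest. This is only a sketch: the function names are hypothetical, it needs to run against a swarm manager, and PreviousSpec may be missing for a service that has never been updated, in which case the inspect call is expected to fail and the field is reported as unavailable.

```shell
# Hypothetical helpers for illustration; not part of shepherd.

# Print "digest" if the image reference is pinned with @sha256:, else "no-digest".
digest_state() {
  case "$1" in
    *@sha256:*) echo "digest" ;;
    *)          echo "no-digest" ;;
  esac
}

# Print the current and previous image of a service with their digest state.
report_service_images() {
  service=$1
  for field in Spec PreviousSpec; do
    image=$(docker service inspect "$service" \
      -f "{{.$field.TaskTemplate.ContainerSpec.Image}}" 2>/dev/null) || image=""
    printf '%s: %s (%s)\n' "$field" "${image:-<unavailable>}" "$(digest_state "$image")"
  done
}
```

Running `report_service_images general_ouroboros` right after a notification arrives would show which of the two specs lost its sha.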

@GuyKh
Author

GuyKh commented Apr 10, 2024

Last message was:

[Shepherd] Service general_ouroboros updated on 08186ba56442
Thu Apr  4 12:29:43 IDT 2024 Service general_ouroboros was updated from pyouroboros/ouroboros:latest to pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372

Here are my stats:

docker version
Client: Docker Engine - Community
 Version:           26.0.0
 API version:       1.45
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:17:56 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.0.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.8
  Git commit:       8b79278
  Built:            Wed Mar 20 15:17:56 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ docker service inspect general_ouroboros  -f '{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}'
pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372

$ docker service inspect general_ouroboros  -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372

It looks like both values currently include the sha rather than a bare latest tag.
I can run these commands again the next time I see such a message, if that would help.

@djmaze
Collaborator

djmaze commented Apr 14, 2024

Mhh, not sure what to make out of this. I have to say I did not test shepherd with docker 26 yet myself.

@kb1ibt

kb1ibt commented May 26, 2024

I was running into this issue on Docker 20, and it still exists in Docker 26. I also notice that the sha doesn't match between what shepherd pulls and what docker stack deploy --prune -c docker-compose.yml --resolve-image always <stack_name> pulls: when I first run shepherd it replaces every running container on my swarm, and when I then run docker stack deploy it replaces them all again.

@shizunge

I am not sure this is the root cause, but I am able to create a service with an image without the digest by doing the following:

  1. Build a new image locally, but do not push it to the registry.
  2. Start the service based on the local image.
  3. After the service has started, push the image to the registry.

From this post: https://stackoverflow.com/questions/39811230/why-doesnt-my-newly-created-docker-have-a-digest

Normally, two scenarios could make an image doesn't have associated manifest:

    This image has not been pushed to or pulled from a V2 registry.
    This image has been pulled from a V1 registry.

@shizunge

Based on this comment, running docker service update for an image that requires login, but without --with-registry-auth, results in the service's image having no digest. The service will then update back and forth between the two versions.
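
If the missing digest does come from an unauthenticated update, passing the registry credentials along should let the digest be recorded. This is a minimal sketch, assuming docker login has already been run on the manager node; the wrapper name and the DOCKER override are hypothetical and only there so the command can be previewed without touching the swarm.

```shell
# Hypothetical wrapper for illustration; not part of shepherd.
# Set DOCKER=echo to print the command instead of running it.
update_with_registry_auth() {
  service=$1
  image=$2
  # --with-registry-auth forwards the local registry credentials to the
  # swarm agents, so the image can be resolved on the registry.
  "${DOCKER:-docker}" service update --with-registry-auth --image "$image" "$service"
}
```

For example, `DOCKER=echo update_with_registry_auth backend_file-proxy ghcr.io/REDACTED/file-proxy:1.0` prints the command that would be run. Afterwards, the docker service inspect commands from earlier in the thread can confirm whether the service's image now carries a sha.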

@djmaze
Collaborator

djmaze commented Sep 26, 2024

So based on @shizunge's comments, this sounds like a docker / usability problem rather than a shepherd bug.
