Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus metric for repository's last update #197

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oxzi
Copy link

@oxzi oxzi commented Oct 9, 2022

What is the purpose of this change? What does it change?

A new gauge metric rest_server_repo_last_update_timestamp was added to monitor each repository's last write access. This allows a basic monitoring for each repository's freshness.

Was the change discussed in an issue or in the forum before?

This might be related to the feature request in #176.

Checklist

  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

Copy link
Contributor

@wojas wojas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having this feature, but I have concerns about the way it does its own repo scanning on startup. It is also unclear what the metric should show after the initial load.

Issue #176 so far only discusses metrics for disk size, not the last modified timestamp.


var repoPaths []string

walkFunc := func(path string, d fs.DirEntry, err error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should really use existing validation functions like folderPathValid and splitURLPath (note the maxDepth param) to ensure that the list of repositories returned really matches the repositories that are accessible.

Recursing to arbitrary depth could be very slow when data is stored on remote spinning disks and the directories for some reason contain other large directory trees that are not restic repositories.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The folderPathValid function will be used below in line 122, as both the repository's absolute path as well as the relative folder path are needed. I don't see any advantage in using the splitURLPath function directly, especially as both repo.ObjectTypes and repo.FileTypes are being checked in the walkFunc to determine if some directory path should be followed further.

As any occurrence of an repo.ObjectTypes like directory aborts this path by returning filepath.SkipDir, I don't expect too much directory walking. However, there might be special cases where this might perform poorly. I'm open to suggestions how to address this.

return nil
}

if err := filepath.WalkDir(s.Path, walkFunc); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that WalkDir will not follow symlinks, while symlinks to repos are accepted by rest-server.

An implementation that does follow symlinks risks a loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are totally right about this. I wasn't aware that following symlinks was a requirement of the rest-server, as it isn't part of the README. I can exchange filepath.WalkDir with filepath.Walk and check the type. However, this would have an impact on performance, especially on a huge amount of repositories.

@@ -126,6 +126,12 @@ func runRoot(cmd *cobra.Command, args []string) error {
log.Println("Private repositories disabled")
}

if server.Prometheus {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this could be slow for certain configurations, I think that we need a flag to control this feature. I'm not sure if it should be enabled or disabled by default, it is a useful feature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have added the --prometheus-no-preload flag to disable preloading.

repo/repo.go Outdated
log.Printf("%v.PreloadMetrics()", h)
}

stat, err := os.Lstat(h.getSubPath("index"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can have new snapshots without the index changing, if the contents did not change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed!

repo/repo.go Outdated
@@ -113,6 +113,8 @@ const (
BlobRead = 'R' // A blob has been read
BlobWrite = 'W' // A blob has been written
BlobDelete = 'D' // A blob has been deleted

RepoLastUpdate = 'U' // Set an repository's last update timestamp for preloading
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only when preloading? What about changes after the preload?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RepoPreloadLastUpdate flag - previously named RepoLastUpdate - is only used for preloading. That's also why I changed the name.
Later on every BlobWrite, the timestamp will also being bumped to the current time.

A new gauge metric `rest_server_repo_last_update_timestamp` was added to
monitor each repository's last write access. This allows a basic
monitoring for each repository's freshness.

In order to have this metric available at startup, a basic preloading for
Prometheus metrics has been implemented. This operates by scanning the file
system for restic repositories and using their last modified time.
Subsequently, each write access updates the last update time.

If scanning each repository takes too long, it can be disabled through the
`--prometheus-no-preload` flag.

This might be related to the feature request in restic#176.
@oxzi
Copy link
Author

oxzi commented Oct 14, 2022

@wojas: Thanks a lot for your review! I have made some changes and answered in the conversations above.

Furthermore, I tried to motivate why I introduced the preloading, which is not directly part of those metrics, but are imho necessary for a complete view after (automatically) restarting the rest-server.

Please feel free to take another look; and thanks again!

@riotbib
Copy link

riotbib commented Jun 4, 2023

This is relevant to me, please consider merging. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants