-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus metric for repository's last update #197
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having this feature, but I have concerns about the way it does its own repo scanning on startup. It is also unclear what the metric should show after the initial load.
Issue #176 so far only discusses metrics for disk size, not the last modified timestamp.
|
||
var repoPaths []string | ||
|
||
walkFunc := func(path string, d fs.DirEntry, err error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should really use existing validation functions like folderPathValid
and splitURLPath
(note the maxDepth
param) to ensure that the list of repositories returned really matches the repositories that are accessible.
Recursing to arbitrary depth could be very slow when data is stored on remote spinning disks and the directories for some reason contain other large directory trees that are not restic repositories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The folderPathValid
function will be used below in line 122, as both the repository's absolute path as well as the relative folder path are needed. I don't see any advantage in using the splitURLPath
function directly, especially as both repo.ObjectTypes
and repo.FileTypes
are being checked in the walkFunc
to determine if some directory path should be followed further.
As any occurrence of an repo.ObjectTypes
like directory aborts this path by returning filepath.SkipDir
, I don't expect too much directory walking. However, there might be special cases where this might perform poorly. I'm open to suggestions how to address this.
return nil | ||
} | ||
|
||
if err := filepath.WalkDir(s.Path, walkFunc); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that WalkDir will not follow symlinks, while symlinks to repos are accepted by rest-server.
An implementation that does follow symlinks risks a loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are totally right about this. I wasn't aware that following symlinks was a requirement of the rest-server, as it isn't part of the README. I can exchange filepath.WalkDir
with filepath.Walk
and check the type. However, this would have an impact on performance, especially on a huge amount of repositories.
cmd/rest-server/main.go
Outdated
@@ -126,6 +126,12 @@ func runRoot(cmd *cobra.Command, args []string) error { | |||
log.Println("Private repositories disabled") | |||
} | |||
|
|||
if server.Prometheus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this could be slow for certain configurations, I think that we need a flag to control this feature. I'm not sure if it should be enabled or disabled by default, it is a useful feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I have added the --prometheus-no-preload
flag to disable preloading.
repo/repo.go
Outdated
log.Printf("%v.PreloadMetrics()", h) | ||
} | ||
|
||
stat, err := os.Lstat(h.getSubPath("index")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have new snapshots without the index changing, if the contents did not change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed!
repo/repo.go
Outdated
@@ -113,6 +113,8 @@ const ( | |||
BlobRead = 'R' // A blob has been read | |||
BlobWrite = 'W' // A blob has been written | |||
BlobDelete = 'D' // A blob has been deleted | |||
|
|||
RepoLastUpdate = 'U' // Set an repository's last update timestamp for preloading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only when preloading? What about changes after the preload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RepoPreloadLastUpdate
flag - previously named RepoLastUpdate
- is only used for preloading. That's also why I changed the name.
Later on every BlobWrite
, the timestamp will also being bumped to the current time.
A new gauge metric `rest_server_repo_last_update_timestamp` was added to monitor each repository's last write access. This allows a basic monitoring for each repository's freshness. In order to have this metric available at startup, a basic preloading for Prometheus metrics has been implemented. This operates by scanning the file system for restic repositories and using their last modified time. Subsequently, each write access updates the last update time. If scanning each repository takes too long, it can be disabled through the `--prometheus-no-preload` flag. This might be related to the feature request in restic#176.
3655900
to
9bbaf27
Compare
@wojas: Thanks a lot for your review! I have made some changes and answered in the conversations above. Furthermore, I tried to motivate why I introduced the preloading, which is not directly part of those metrics, but are imho necessary for a complete view after (automatically) restarting the rest-server. Please feel free to take another look; and thanks again! |
This is relevant to me, please consider merging. Thanks! |
What is the purpose of this change? What does it change?
A new gauge metric
rest_server_repo_last_update_timestamp
was added to monitor each repository's last write access. This allows a basic monitoring for each repository's freshness.Was the change discussed in an issue or in the forum before?
This might be related to the feature request in #176.
Checklist
changelog/unreleased/
that describes the changes for our users (template here)gofmt
on the code in all commits