[feature] Make managing the artifact storage and cache available in the UI #8104

TobiasGoerke · 2022-08-04T17:05:33Z

I'd like users to be able to manage the artifacts their pipelines have created. This involves disabling or invalidating the cache for certain artifacts so that a pipeline can be re-executed without having to touch code.

I first mentioned this feature in #7939 and posted a GIF showing what such a feature could look like.

Also, I've created a design document that addresses all details. If you like the feature (or don't), please leave comments here or directly in the document.

Feature Area

/area frontend
/area backend
/area components

Love this idea? Give it a 👍.

juliusvonkohout · 2022-08-10T09:24:12Z

Is it also possible to do this from the runs page? For example, deleting a run would try to delete its artifacts, and deleting an experiment would delete all of its runs and therefore all corresponding artifacts. This is what most data scientists expect when they delete a run. You are using ml-metadata, which is not namespace-isolated yet and is therefore completely insecure (#4790). I am also open to having it in the insecure ml-metadata part as an addition, but it should at least be available in the currently secure parts, which are structured by runs and experiments.

The API belongs in the apiserver and, as suggested above, we could implement this without UI or API schema changes. We just have to modify the delete run API call so that, at the end, it calls a new function delete_cache(run) that deletes the corresponding artifacts from the database and/or MinIO, as shown in my PR.

If desired, we can additionally expose a proper AUTHENTICATED (#7819) API with delete_cache(run) and invalidate_cache(run) for more specialized approaches, plus the invalidate/delete button in the UI. Then everybody can also automate it as desired. Last but not least, we could add environment variables DELETE_CACHE_ON_RUN_DELETION=TRUE and/or INVALIDATE_CACHE_ON_RUN_DELETION=TRUE in pipeline-install-config that are mapped to and respected by the apiserver.

This approach covers the basic needs of most users and provides flexibility for advanced use cases.
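The run-deletion hook and the two switches proposed above can be sketched as follows. This is a minimal illustration and not the apiserver's code: `InMemoryStore`, `env_flag`, and the order of operations are assumptions made for the example, and only the names `delete_cache`, `invalidate_cache`, and the two environment variables come from the proposal.

```python
import os
from typing import Dict, List, Mapping


def env_flag(environ: Mapping[str, str], name: str) -> bool:
    """Return True when a switch such as NAME=TRUE is set (case-insensitive)."""
    return environ.get(name, "").strip().lower() == "true"


class InMemoryStore:
    """Hypothetical stand-in for the run database plus artifact storage."""

    def __init__(self) -> None:
        self.runs: Dict[str, List[str]] = {}      # run id -> artifact keys
        self.artifacts: Dict[str, bytes] = {}     # artifact key -> payload
        self.cache_enabled: Dict[str, bool] = {}  # artifact key -> reusable as cache?

    def add_run(self, run_id: str, artifacts: Mapping[str, bytes]) -> None:
        self.runs[run_id] = list(artifacts)
        for key, payload in artifacts.items():
            self.artifacts[key] = payload
            self.cache_enabled[key] = True


def invalidate_cache(store: InMemoryStore, run_id: str) -> None:
    """Keep the run's artifacts but stop later runs from reusing them."""
    for key in store.runs.get(run_id, []):
        if key in store.cache_enabled:
            store.cache_enabled[key] = False


def delete_cache(store: InMemoryStore, run_id: str) -> None:
    """Remove the run's artifacts together with their cache entries."""
    for key in store.runs.get(run_id, []):
        store.artifacts.pop(key, None)
        store.cache_enabled.pop(key, None)


def delete_run(store: InMemoryStore, run_id: str,
               environ: Mapping[str, str] = os.environ) -> None:
    """Clean up the run's cache according to the switches, then drop the run.

    The cleanup runs first only because this toy store needs the run record
    to find the artifacts that belong to it.
    """
    if env_flag(environ, "DELETE_CACHE_ON_RUN_DELETION"):
        delete_cache(store, run_id)
    elif env_flag(environ, "INVALIDATE_CACHE_ON_RUN_DELETION"):
        invalidate_cache(store, run_id)
    store.runs.pop(run_id, None)
```

With neither variable set, `delete_run` only removes the run record, which keeps the current behaviour as the default.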

@chensun @difince @TobiasGoerke what do you think?

TobiasGoerke · 2022-08-10T11:56:06Z

Any way users can delete artifacts and invalidate caches in the UI is fine for me. Your suggestion to not offer these features separately might even be more user-friendly. As you mentioned, your approach wouldn't require modifying the frontend and is more secure (for now, at least).

However, what about multiple runs sharing the same cache? Deleting any older run would force newer runs to lose their shared cache / artifacts, too.

juliusvonkohout · 2022-08-11T08:28:54Z

> Any way users can delete artifacts and invalidate caches in the UI is fine for me. Your suggestion to not offer these features separately might even be more user-friendly. As you mentioned, your approach wouldn't require modifying the frontend and is more secure (for now, at least).
>
> However, what about multiple runs sharing the same cache? Deleting any older run would force newer runs to lose their shared cache / artifacts, too

Is this a real problem? The same happens if you invalidate a single artifact from the ml-metadata UI. It can always affect later runs that used the same artifact as cache.

You should definitely present it at the Kubeflow Pipelines Community Meeting (PST AM), https://meet.google.com/jkr-dupp-wwm, and register it first at https://docs.google.com/document/d/1cHAdK1FoGEbuQ-Rl6adBDL5W2YpDiUbnMLIwmoXBoAU/edit

Maybe you can reach @chensun or @zijianjoy on Slack to discuss it with them first.
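The shared-cache effect under discussion can be shown with a toy model. Everything here is an assumption made for illustration: the cache is keyed by a fingerprint of the component name and its inputs, and the `produced_by` mapping that links an entry to the run that created it is hypothetical and not a description of the real cache database.

```python
import hashlib
import json
from typing import Dict, Tuple


def fingerprint(component: str, inputs: dict) -> str:
    """Stable key for a step: same component and inputs give the same key."""
    payload = json.dumps({"component": component, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class ToyCache:
    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}      # fingerprint -> artifact URI
        self.produced_by: Dict[str, str] = {}  # fingerprint -> run id (hypothetical link)

    def execute(self, run_id: str, component: str, inputs: dict) -> Tuple[str, bool]:
        """Return (artifact URI, cache hit?) for one step of a run."""
        key = fingerprint(component, inputs)
        if key in self.entries:
            return self.entries[key], True
        uri = f"artifacts/{run_id}/{component}"
        self.entries[key] = uri
        self.produced_by[key] = run_id
        return uri, False

    def delete_run_cache(self, run_id: str) -> None:
        """Drop every cache entry that was first produced by the given run."""
        for key in [k for k, r in self.produced_by.items() if r == run_id]:
            del self.entries[key]
            del self.produced_by[key]


cache = ToyCache()
first_uri, first_hit = cache.execute("run-old", "train", {"lr": 0.1})
second_uri, second_hit = cache.execute("run-new", "train", {"lr": 0.1})  # reuses run-old's artifact
cache.delete_run_cache("run-old")
_, third_hit = cache.execute("run-newer", "train", {"lr": 0.1})          # has to recompute
```

In this model the newer run points at the older run's artifact, so deleting the older run's cache removes the only copy. Invalidating that single artifact directly would have the same effect on later runs.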

difince · 2022-08-17T17:24:50Z

> E.g. if you delete a run it tries to delete the artifacts and if you delete an experiment it deletes all runs and therefore all corresponding artifacts.

This does not really happen, does it?
I have opened a few issues that you may find relevant to the current one:

DeletePipeline does not clean PipelineVersion's data from Minio

DeleteExperiment does not clean up all relevant children objects from the DB

Your two approaches make sense to me, but I'm leaning more toward what Julius suggests because of the existing infrastructure (security, namespace isolation, ...).
I should add that this is the first time I have looked at ml-metadata, so... :)

TobiasGoerke · 2022-08-22T13:27:27Z

> > E.g. if you delete a run it tries to delete the artifacts and if you delete an experiment it deletes all runs and therefore all corresponding artifacts.
>
> This does not really happen, does it? I have opened a few issues that you may find relevant to the current one:
>
> DeletePipeline does not clean PipelineVersion's data from Minio
>
> DeleteExperiment does not clean up all relevant children objects from the DB
>
> Your two approaches make sense to me, but I'm leaning more toward what Julius suggests because of the existing infrastructure (security, namespace isolation, ...). I should add that this is the first time I have looked at ml-metadata, so... :)

Thanks for your reply, @difince. I was also inclined to implement @juliusvonkohout's idea. However, it turns out to be impossible to deduce cache entries in the database from runs: there is simply no information that could be used to match them. If you have an idea how this could work, I'd be very glad to hear it!

Given this limitation, I've decided to take a different approach and simply make disabling the cache available in the UI (see here). While this doesn't delete database entries once they have been created, there are other open PRs that should take care of databases growing without bound.

juliusvonkohout · 2022-08-23T09:15:31Z

I discussed this with @TobiasGoerke, and we agreed to go for a disable-cache switch in the pipeline run UI and a slightly modified version of #7939 (comment). If the maximum cache staleness leads to an empty list from

pipelines/backend/src/cache/storage/execution_cache_store.go

Line 62 in 555447f

var executionCaches []*model.ExecutionCache

then we should scan the database and delete all entries older than max_cache_staleness. This ensures that it remains performant in very large installations and solves another long-standing issue: an indefinitely growing cachedb.

So there would be TWO environment variables for the cache-server that provide a default (used if the pipeline does not set a value) AND a maximum (larger values in the pipeline are ignored) cache staleness. An administrator can then set the expiration time on the MinIO/S3/GCS storage backend to the same value as the maximum cache staleness and provide a sensible default staleness for users' pipelines. We limit the user-set value in the pipeline definition to the administrator's maximum. Users also no longer need to recompile existing pipelines, because they can disable the cache from the UI. I think setting the exact cache duration from the UI is overkill; a disable/enable switch is enough.

I think this covers most use cases, is independent of the storage backend, and is rather easy to implement. Tobias Goerke already has a POC for the UI change (master...TobiasGoerke:pipelines:master, #8177).

@chensun this is also very much in line with what Google wants, or do you see it differently?
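A minimal sketch of that lookup-then-prune idea, using SQLite from the Python standard library so that it runs on its own. The table name `execution_caches`, its columns, and timestamps in plain seconds are assumptions for the example and do not reflect the actual cache-server schema.

```python
import sqlite3
from typing import List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical cache table with a creation timestamp in seconds."""
    conn.execute(
        "CREATE TABLE execution_caches ("
        "id INTEGER PRIMARY KEY, cache_key TEXT NOT NULL, started_at INTEGER NOT NULL)"
    )


def prune_stale_entries(conn: sqlite3.Connection, max_staleness: int, now: int) -> int:
    """Delete every entry older than max_staleness and return how many were removed."""
    cutoff = now - max_staleness
    cursor = conn.execute(
        "DELETE FROM execution_caches WHERE started_at < ?", (cutoff,)
    )
    conn.commit()
    return cursor.rowcount


def find_fresh_entries(conn: sqlite3.Connection, cache_key: str,
                       max_staleness: int, now: int) -> List[Tuple[int, str, int]]:
    """Return entries for cache_key that are still fresh, newest first.

    If staleness filtering leaves nothing, use the miss as the trigger to
    clean up all expired rows, so the scan only runs when a miss occurs.
    """
    cutoff = now - max_staleness
    rows = conn.execute(
        "SELECT id, cache_key, started_at FROM execution_caches "
        "WHERE cache_key = ? AND started_at >= ? ORDER BY started_at DESC",
        (cache_key, cutoff),
    ).fetchall()
    if not rows:
        prune_stale_entries(conn, max_staleness, now)
    return rows
```

Pruning only on a miss keeps the common cache-hit path free of extra work, at the cost of leaving expired rows in place until the next miss.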
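The default-and-maximum rule described above can be sketched as a small clamp function. The variable names `DEFAULT_CACHE_STALENESS` and `MAXIMUM_CACHE_STALENESS` and the use of plain seconds are assumptions for illustration, since the proposal does not name the variables or fix a format.

```python
import os
from typing import Mapping, Optional


def read_seconds(environ: Mapping[str, str], name: str) -> Optional[int]:
    """Parse an optional integer number of seconds from the environment."""
    raw = environ.get(name, "").strip()
    return int(raw) if raw else None


def effective_staleness(pipeline_value: Optional[int],
                        environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Combine the pipeline's staleness with the administrator's settings.

    - No value in the pipeline: fall back to the administrator's default.
    - Value above the maximum (or no value at all): use the maximum.
    - Returns None only when nothing is configured anywhere.
    """
    default = read_seconds(environ, "DEFAULT_CACHE_STALENESS")  # hypothetical name
    maximum = read_seconds(environ, "MAXIMUM_CACHE_STALENESS")  # hypothetical name

    value = pipeline_value if pipeline_value is not None else default
    if maximum is not None and (value is None or value > maximum):
        value = maximum
    return value
```

For example, with a default of one hour and a maximum of one day, a pipeline that asks for a week is capped at one day, while a pipeline that asks for one minute keeps its own value.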

github-actions · 2024-03-06T07:41:53Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions · 2024-04-25T07:42:37Z

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

TobiasGoerke added the kind/feature label Aug 4, 2022

google-oss-prow bot added area/frontend area/backend area/components labels Aug 4, 2022

TobiasGoerke mentioned this issue Aug 22, 2022

feat(frontend): caching may now be disabled when starting pipeline runs #8177

Closed

github-actions bot added the lifecycle/stale label Mar 6, 2024

github-actions bot closed this as completed Apr 25, 2024

tmvfb mentioned this issue May 10, 2024

[feature] delete MinIO artifacts on pipeline run deletion #10816

Open

HumairAK mentioned this issue May 22, 2024

[feature] Option to disable Caching in V2 at the KFP, Pipeline, and Run level #10839

Open

github-project-automation bot added this to KFP Runtime Triage Aug 29, 2024

github-project-automation bot moved this to Closed in KFP Runtime Triage Aug 29, 2024
