after upgrading to 1.16.0, images dont load in web UI. #248

Open
mathematicalmichael opened this issue Jul 1, 2024 · 18 comments

Comments

@mathematicalmichael

I'm using the docker-compose stack. Basically everything needed to recreate my setup is here: https://github.com/ml-starter-packs/clearml-lightning except I bumped my version to 1.16.0

after upgrading my image tags to the latest release, I noticed that the clearml-fileserver emits Error getting token whenever I try to load images in the Plots tab.

image

data still works fine, and oddly enough so do Debug Samples despite them also coming from the fileserver.
Those load fine...
image

(top: manual download from web UI in artifacts tab... works fine, seems to authenticate happily)
(middle: errors when I load the Plots tab, first screenshot)
(bottom: opening the Debug Samples tab)

downgrading to 1.15.1 (as is in the repo linked above) restores all images in Web UI.

@jkhenning
Member

Hi @mathematicalmichael,

The new server version added built-in authentication for the fileserver. I assume that for some reason (perhaps due to the fileserver URL you're using?) the WebApp does not identify the fileserver and so does not attach the cookie when trying to download (the SDK obviously does).

You can try disabling this feature in the fileserver (using the fileserver.conf file) by setting auth.enabled: false (you can also do that in the docker-compose file or in the docker compose override file with an environment variable) and see if it helps.

@mathematicalmichael
Author

thanks. yes, I know the new version has auth, which is exactly what I want / need (in fact). So I do not want to disable it (though it's no better than downgrading, I know).

Could it be because the fileserver is not at the subdomain "files.."?

(unfortunately I don't have control over subdomain names)

I do however, have some cycles to try and fix it, if it's possible. I just need guidance on the cause of the issue / feasibility of solutions.

@jkhenning
Member

In general, the server is configured to place the cookie with a specific domain. I assume the cookie is simply not propagated to the fileserver since it's hosted under a different domain name. If the two services are hosted under some parent domain name (like app.my-domain.com and files.my-domain.com), it's possible to set the cookie domain to the common domain name (e.g. .my-domain.com).
Can you share the pattern of the domains you're using?
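
To make the cookie-domain point concrete, here is a minimal sketch of the domain-matching rule browsers apply (following RFC 6265), using the example hostnames above. The function names are made up for illustration; this is not ClearML code.

```python
import ipaddress


def is_ip(host: str) -> bool:
    """Return True if host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Sketch of RFC 6265 domain matching for a cookie's Domain attribute."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    if host == domain:
        return True
    # Suffix match only counts on a label boundary, and never for IP addresses.
    return host.endswith("." + domain) and not is_ip(host)


# Cookie scoped to the parent domain reaches both services.
print(domain_matches("app.my-domain.com", ".my-domain.com"))      # True
print(domain_matches("files.my-domain.com", ".my-domain.com"))    # True
# Cookie scoped to the app host alone does not reach the fileserver.
print(domain_matches("files.my-domain.com", "app.my-domain.com"))  # False
```

The flip side is that a cookie scoped to the parent domain is sent to every host under it, not only to the two ClearML services.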

@mathematicalmichael
Author

mathematicalmichael commented Jul 3, 2024

@jkhenning thank you! so it sounds like my suspicion might have been directionally correct and that the cookie's scope is missing our URLs.

The networking setup I am constrained to with this particular ClearML deployment has the following structure:

https://<port>-<hash tied to EC2 instance>.<domain>.<tld>

so my setup is https://8080-....site.com, https://8081-....site.com, and https://8008-....site.com

setting it to .site.com would be a security concern: way too broad a scope. each EC2 instance gets its own URL.

I wrote this part of the ClearML docs:
image
so I very much remember dealing with this on an earlier deployment (but one where I had control over subdomain names)

I was surprised when the deployment "just worked" with this new domain mapping (for this deployment), but I realize now that was because the fileserver was totally insecure until 1.16.0, so the domain didn't matter. We've been using these urls for six months now, so I'm not sure the aforementioned docs are "exactly correct" anymore.

@mathematicalmichael
Author

that all said... take a look at my logs again. Notice that the Debug Images load just fine from the web app, and they're served behind the same backend fileserver URL.

So... what does that tell us about that cookie's scope... When one tab in the ClearML Web UI is able to load assets from the fileserver, but the neighboring tab does not???

@jkhenning
Member

Ah, this might be a WebApp issue. Some plots (those too complicated to be stored as a plotly object) are stored as an image, but the link is embedded in the plot object, which means the WebApp has to parse it and decide whether to attach the cookie there. I think the WebApp only knows how to do that automatically for the standard port variants and the standard subdomains.
You should be able to explicitly specify the fileserver URL to the webapp by adding the following env var to the webapp service:
WEBSERVER__fileBaseUrl=https://8081-....site.com
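
As a rough illustration of why a URL like the one above would not be recognized automatically, here is a hypothetical sketch of a "standard variants" heuristic: swap an app. subdomain for files., or port 8080 for 8081. The function name and exact rules are assumptions for illustration, not the WebApp's actual logic, and the hostname with the `abc123` hash is a stand-in for the elided one.

```python
from typing import Optional
from urllib.parse import urlsplit


def guess_fileserver_url(webapp_url: str) -> Optional[str]:
    """Guess the fileserver URL from the webapp URL, or return None."""
    parts = urlsplit(webapp_url)
    host = parts.hostname or ""

    # Standard subdomain variant: app.<domain> -> files.<domain>
    if host.startswith("app."):
        netloc = "files." + host[len("app."):]
        if parts.port:
            netloc += f":{parts.port}"
        return f"{parts.scheme}://{netloc}"

    # Standard port variant: <host>:8080 -> <host>:8081
    if parts.port == 8080:
        return f"{parts.scheme}://{host}:8081"

    # Anything else (e.g. the port encoded in the hostname) cannot be guessed.
    return None


print(guess_fileserver_url("https://app.my-domain.com"))     # https://files.my-domain.com
print(guess_fileserver_url("http://localhost:8080"))         # http://localhost:8081
print(guess_fileserver_url("https://8080-abc123.site.com"))  # None
```

Under a heuristic of this shape, a hostname that encodes the port as a prefix falls through to the last branch, which is the case an explicit fileBaseUrl setting is meant to cover.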

@mathematicalmichael
Author

mathematicalmichael commented Jul 3, 2024

ooh I'll try that env var! thank you!

but I'm not sure that explains why Debug Images work while Plotly image embeds do not. Is it because the two structure the urls differently?

(and I explicitly save some as images for better control over formatting - e.g. histograms. I send some to Debug and some to the Plots tab. Debug tab works, Plot does not. same underlying fileserver url structure, but console logs show 401 only on the latter)

is the scope of the cookie a problem given how the urls are structured? other customers (not us) using the same reverse proxy would have urls with the same domain name, and I don't want those to be valid against my instance...

@mathematicalmichael
Author

mathematicalmichael commented Jul 3, 2024

    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
      WEBSERVER__fileBaseUrl: ${CLEARML_FILES_HOST:-}

yields

Error parsing WEBSERVER__fileBaseUrl JSON value `https://8081-.....com/`: Expecting value: line 1 column 1 (char 0)

if I prepend CLEARML_ to the front of it... it does not complain.
Does work with that env var on 1.15.1

and upgrading to 1.16.0 with that prepended env var does not bring back the images (had to force refresh to avoid browser cache tricking me)

@jkhenning
Copy link
Member

I guess you should put it in quotes?
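
For context on the quoting suggestion: the wording of the error above matches what Python's `json` module raises for a bare, unquoted string. Assuming (not confirmed in this thread) that the value is handed to a JSON parser as-is, a minimal sketch of the difference looks like this. The URL is a hypothetical stand-in for the elided one.

```python
import json

# Hypothetical URL standing in for the elided one in the thread.
raw_value = "https://8081-abc123.site.com/"


def try_parse(value: str) -> str:
    """Return the decoded value's repr, or the parser's error message."""
    try:
        return repr(json.loads(value))
    except json.JSONDecodeError as exc:
        return f"error: {exc}"


bare = try_parse(raw_value)                # value with no quotes around it
quoted = try_parse(json.dumps(raw_value))  # value wrapped in JSON double quotes

print(bare)    # error: Expecting value: line 1 column 1 (char 0)
print(quoted)  # 'https://8081-abc123.site.com/'
```

The double quotes have to be part of the value the process actually receives; quotes that YAML or a shell consumes as syntax never reach the parser.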

@mathematicalmichael
Author

mathematicalmichael commented Jul 3, 2024

one thing I noticed poking around the console:
the requests that are getting the 401 from Plot tab do not have a cookie set in the request header.
the requests that succeed from the Debug Samples tab do have a cookie set in the request header

@mathematicalmichael
Author

mathematicalmichael commented Jul 3, 2024

> I guess you should put it in quotes?

tried that, both single and double quotes still throw the same message.

I'm pretty sure the problem is that the cookie isn't set by the template that renders out the plotly images.

@jkhenning
Member

It's possible docker compose removes the quotes, can you perhaps try:
WEBSERVER__fileBaseUrl: \"${CLEARML_FILES_HOST:-}\"

@mathematicalmichael
Author

@jkhenning unfortunately that also throws the same Error parsing error.

to my comment about the browser Inspect tool showing a missing cookie (but valid artifact url) in the requests that are 401'ing... could this possibly explain the situation? (cookie not set in the first place)

@oren-allegro

@mathematicalmichael - what version of docker-compose (or docker) are you using?
Also - does the environment value already contain quotes? Can you try setting the explicit value without the variable - just to see if it works:

  • WEBSERVER__fileBaseUrl=http://***:8081

@jkhenning
Member

@mathematicalmichael any update?

@shyallegro
Contributor

moving some info from this discussion https://clearml.slack.com/archives/CTK20V944/p1730193419574559 here

when a Plotly background image is set, Plotly uses JS to fetch it, but it omits the cookie when the image and webapp are not on the same site.
we have a workaround: we set up a reverse proxy to redirect calls from /files to the fileserver, and if the UI has WEBSERVER__fileBaseUrl set in the compose webserver environment, it allows the UI to rewrite the call for the image to go through the nginx reverse proxy.
this way the request to fetch the Plotly background image stays on the same site and the auth cookie gets sent.
in 1.16 you will also need to set WEBSERVER__useFilesProxy to true for the magic to
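
The rewrite described above can be pictured with a small sketch. This is an illustration of the idea only, not the UI's implementation: the function name, its parameters, and the `abc123` hostnames are hypothetical, and only the /files prefix comes from the comment.

```python
def rewrite_to_files_proxy(
    image_url: str,
    file_base_url: str,
    webapp_origin: str,
    proxy_prefix: str = "/files",
) -> str:
    """Route a fileserver URL through the webapp's reverse proxy path.

    URLs that do not point at the fileserver are returned unchanged.
    """
    base = file_base_url.rstrip("/")
    # Require an exact match or a "/" boundary so look-alike hosts are ignored.
    if image_url != base and not image_url.startswith(base + "/"):
        return image_url
    remainder = image_url[len(base):]  # keeps the leading "/" and any query string
    return webapp_origin.rstrip("/") + proxy_prefix + remainder


print(rewrite_to_files_proxy(
    "https://8081-abc123.site.com/project/task/plot.png",
    file_base_url="https://8081-abc123.site.com/",
    webapp_origin="https://8080-abc123.site.com",
))
# https://8080-abc123.site.com/files/project/task/plot.png
```

Because the rewritten URL shares the webapp's origin, the browser treats the image fetch like any other request to the webapp, and the reverse proxy forwards it to the fileserver.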

@mathematicalmichael
Author

mathematicalmichael commented Nov 11, 2024

@oren-allegro sorry for the late response - but that env var didn't work. Docker version 24.0.7, build afdd53b

my solution has been "stop letting clearml deal with security"

I've switched everything to localhost:port + ssh tunneling to avoid all of clearml's logic / assumption of domain structure. that seems to have resolved most issues I've had to be honest.

@shyallegro thanks for the help - would you like me to try this again on the latest versions and with https enabled instead?
one of my main issues is that I cannot use the subdomain structure you assume, which is why I moved to ssh-tunneling for my deployment. that one simple change allowed me to finally start using clearml instead of constantly configuring it, so that's kept me busy lately (we ran over half a million tasks through it!). This week I have a bit more time for maintenance / upgrading tasks.

@shyallegro
Contributor

@mathematicalmichael WEBSERVER__fileBaseUrl is there for cases where we can't guess the fileserver location. This parameter tells the UI where the fileserver is located from the browser's point of view.
