Titiler on Google Cloud Run - 502 Http errors #1053
Replies: 2 comments 1 reply
-
Thanks for the report @AndreaGiardini 🙏 this is really interesting. In titiler we use `titiler/src/titiler/core/titiler/core/errors.py` (lines 33 to 70 in 4937dfa); at lines 54 to 56 of the same file we check for 204 and make sure to return an empty response. Maybe, instead of adding this middleware, you could:

```python
from typing import Callable, Dict, Type

from fastapi import FastAPI, Request, status
from fastapi.responses import JSONResponse, Response


def exception_handler_factory(status_code: int) -> Callable:
    """Create a FastAPI exception handler from a status code."""

    def handler(request: Request, exc: Exception):
        if status_code == status.HTTP_204_NO_CONTENT:
            # Rewrite the empty-tile 204 into a 200 with an empty body, so
            # Cloud Run's injected Content-Type header can't break clients.
            return Response(
                content=b"",
                status_code=200,
                headers={"Content-Type": "image/png", "X-Rewritten-204": "true"},
            )
        return JSONResponse(content={"detail": str(exc)}, status_code=status_code)

    return handler


def add_exception_handlers(
    app: FastAPI, status_codes: Dict[Type[Exception], int]
) -> None:
    """Add exception handlers to the FastAPI app."""
    for exc, code in status_codes.items():
        app.add_exception_handler(exc, exception_handler_factory(code))
```
-
One thing that I found challenging was extending the main application without reinventing the wheel. I want to:
Originally, I tried to do that with a simple:
And adding a FastAPI middleware to rewrite those requests. No matter how I did it, I would always hit this error: fastapi/fastapi#7187 . Even just adding an empty middleware, without any rewrite, would trigger it. In a FastAPI application, middlewares are added in order. My guess is that this middleware ends up at the end of the chain, and that causes all sorts of weird behavior.
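The ordering guess above can be sketched without FastAPI at all: in Starlette (on which FastAPI is built), each `add_middleware` call wraps the current app, so the middleware added last ends up outermost and runs first. A minimal pure-Python illustration of that wrapping behavior (all names here are hypothetical, not Starlette internals):

```python
import asyncio


def make_middleware(name, log):
    """Return a wrapper that logs entry/exit around an ASGI-style app."""

    def wrap(app):
        async def middleware(scope, receive, send):
            log.append(f"enter {name}")
            await app(scope, receive, send)
            log.append(f"exit {name}")

        return middleware

    return wrap


async def inner_app(scope, receive, send):
    log.append("app")


log = []
app = inner_app
# Mimic Starlette's add_middleware: each call wraps the current app,
# so the middleware added last becomes the outermost layer.
for name in ["first_added", "last_added"]:
    app = make_middleware(name, log)(app)

asyncio.run(app({}, None, None))
print(log)
# ['enter last_added', 'enter first_added', 'app',
#  'exit first_added', 'exit last_added']
```

This is consistent with the guess: a middleware registered late wraps everything else, including whatever response machinery runs inside it.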
-
Hello folks 👋
This is not a bug report but a write-up of our experience running Titiler on Google Cloud Run. We have been running Titiler in our infrastructure for quite some time, and since the early days we have noticed a consistent number of HTTP 502 errors being triggered by Cloud Run.
It was clear that something weird was going on... but the service was performing well. We started digging into it as this behavior created a lot of noise in our monitoring, and we didn't fully understand the cause.
What we noticed
We narrowed the 502 errors down to requests for tiles where the map was empty. By default, Titiler returns an HTTP 204 status for those requests, but in our deployment they came back as 502 (Bad Gateway). At first we tried to work out why the upstream was not responding to those requests. Bad Gateway errors can have several different causes, so we increased the minimum number of instances, tuned our health checks, reduced the number of concurrent requests, and so on. Unfortunately, none of those measures improved the situation.
Even more interesting was what we discovered when performing the same request from the browser and via `curl`:

- browser -> 502
- `curl` -> 204

Moreover, there was no way to reproduce the problem locally. In a local Docker container, both requests would return a 204.
Debugging the issue
We compared the response headers of the `curl` request against our local Titiler with those of the same `curl` request against our deployment running on Cloud Run. Both requests returned a 204 in this case, but the headers differed: in particular, our Cloud Run deployment returned an extra `content-type` header. This is due to this Google Cloud Run bug.

While this behavior sounds harmless (and we didn't think much of it initially), it was the cause of our 502 errors. Several web clients (QGIS, Chrome, etc.) consider those responses malformed and will trigger a 502. Besides polluting our monitoring, clients like QGIS retry those requests (since a 502 is a retriable error) up to three times, amplifying the problem.
How we fixed the issue
We had to build our own custom version of Titiler with an extra middleware that rewrites the 204 responses to 200.
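A minimal sketch of such a middleware, written as plain ASGI so it needs no FastAPI imports (the class name is hypothetical and this is an illustration of the approach, not the exact code that was deployed):

```python
import asyncio


class Rewrite204Middleware:
    """Hypothetical pure-ASGI middleware that turns 204 responses into 200.

    A sketch of the workaround described above; being plain ASGI, it
    works with any ASGI framework, FastAPI included.
    """

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            if message["type"] == "http.response.start" and message["status"] == 204:
                # Rewrite the status so proxies that inject a Content-Type
                # header (e.g. Cloud Run) can't produce a malformed response.
                message = {**message, "status": 200}
            await send(message)

        await self.app(scope, receive, send_wrapper)


# Tiny demo against a fake app that always returns an empty tile (204).
async def empty_tile_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 204, "headers": []})
    await send({"type": "http.response.body", "body": b""})


sent = []


async def capture(message):
    sent.append(message)


wrapped = Rewrite204Middleware(empty_tile_app)
asyncio.run(wrapped({"type": "http"}, None, capture))
print(sent[0]["status"])  # 200
```

In a FastAPI application this would be registered with `app.add_middleware(Rewrite204Middleware)`.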
The result: