-
I'm not sure what is possible. I'll take a look and get back to you. It does sound like there is a missing timeout somewhere. I've experienced that myself with HttpClient, where a system seems stable enough for a few days until all the connections are busy waiting for servers that will never respond. One thing you can double-check on your side is that you are closing all result iterators and closing all connections after you're done with them.
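For reference, the pattern I'd recommend is try-with-resources, which guarantees that both the result iterator and the connection are closed even when an exception is thrown. A minimal sketch (the endpoint URL is just a placeholder):

```java
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

public class CloseEverything {
    public static void main(String[] args) {
        // Placeholder endpoint; substitute your own repository URL.
        SPARQLRepository repo = new SPARQLRepository("http://example.org/sparql");
        repo.init();
        try (RepositoryConnection conn = repo.getConnection();
             TupleQueryResult result =
                 conn.prepareTupleQuery("SELECT * WHERE { ?s ?p ?o } LIMIT 10").evaluate()) {
            while (result.hasNext()) {
                System.out.println(result.next());
            }
        } // both the result iterator and the connection are closed here
        repo.shutDown();
    }
}
```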
-
Thank you for your reply and help! After investigating my particular problem a bit more, I realized that running several federated queries with overlapping repo URLs (on the same rdf4j-workbench instance; not sure that matters, though) can get you into what seems to be a deadlock if they are fired in close succession. This deadlock then persists, doesn't time out, and doesn't produce any error messages. Being able to set a timeout likely solves this issue, if my understanding of the situation is correct. If more investigation is needed, I can try to make a simple reproducible example.
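To make the access pattern concrete, here is a hypothetical sketch of the kind of setup I mean; the endpoint URLs and the query are placeholders, not my actual configuration:

```java
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

public class ConcurrentFederatedQueries {

    // A federated query whose SERVICE clause points at a second repository,
    // so concurrent executions share overlapping HTTP connections.
    private static final String QUERY =
        "SELECT * WHERE { SERVICE <http://localhost:8080/rdf4j-server/repositories/shared> "
            + "{ ?s ?p ?o } } LIMIT 10";

    public static void main(String[] args) throws InterruptedException {
        SPARQLRepository repo =
            new SPARQLRepository("http://localhost:8080/rdf4j-server/repositories/main");
        repo.init();

        Runnable runQuery = () -> {
            try (RepositoryConnection conn = repo.getConnection();
                 TupleQueryResult result = conn.prepareTupleQuery(QUERY).evaluate()) {
                while (result.hasNext()) {
                    result.next(); // drain and discard
                }
            }
        };

        // Fire the queries in close succession; without an HTTP timeout,
        // the joins below can block indefinitely once the deadlock occurs.
        Thread first = new Thread(runQuery);
        Thread second = new Thread(runQuery);
        first.start();
        second.start();
        first.join();
        second.join();

        repo.shutDown();
    }
}
```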
-
Regarding closing connections: yes, I have already looked into that, and I am closing them all properly as far as I can see.
-
I found a workaround for this via the config of the Nginx HTTP server running in front of rdf4j-workbench […]. I have been running the server like this for 6 days now, and all the problems have gone, except that I am now getting "503 Service Temporarily Unavailable" errors under high load. This is not a problem per se (it's good that it fails fast), but the overall performance of rdf4j-workbench is now quite seriously reduced, I believe, by the fact that it can no longer run concurrent queries.

So, it's only a workaround and not a fix, and it would be great if this issue could somehow be addressed on the RDF4J side (I am now even more confident than before that there is an issue there). Also, I haven't really stress-tested it, so it may be that the above setting can still trigger deadlocks, just less often.

Of course, if there is anything I can do to help fix this issue, I am more than happy to do so, even looking at your source code, if you can point me to the place where I should start looking. And lastly, let me take the opportunity to thank you for providing these incredibly powerful and immensely useful pieces of software! RDF4J is absolutely amazing 🙂
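A hypothetical sketch of the kind of Nginx setup that produces the behavior described above, i.e. a read timeout so requests fail instead of hanging, plus a connection limit that serializes requests and answers the excess ones with 503; all directives, paths, and values here are placeholders, not the actual config:

```nginx
# Hypothetical sketch; paths and values are placeholders.
limit_conn_zone $server_name zone=rdf4j_conn:1m;

server {
    listen 80;

    location /rdf4j-workbench/ {
        proxy_pass http://127.0.0.1:8080/rdf4j-workbench/;
        proxy_read_timeout 300s;  # fail after 5 minutes instead of hanging forever
        limit_conn rdf4j_conn 1;  # one request at a time; excess requests get 503
    }
}
```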
-
I believe that this is the main culprit: […]. We should add support for users to configure the timeout. Maybe something like this:
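(A rough sketch only: the RequestConfig calls are standard Apache HttpClient 4.x, but the setHttpClient wiring on SPARQLRepository is an assumption that would need to be checked against the actual API.)

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

public class TimeoutConfigSketch {

    public static SPARQLRepository createWithTimeouts(String endpoint, int timeoutMillis) {
        RequestConfig requestConfig = RequestConfig.custom()
            .setConnectTimeout(timeoutMillis)           // establishing the TCP connection
            .setConnectionRequestTimeout(timeoutMillis) // leasing a connection from the pool
            .setSocketTimeout(timeoutMillis)            // max idle time between response packets
            .build();
        CloseableHttpClient httpClient = HttpClients.custom()
            .setDefaultRequestConfig(requestConfig)
            .useSystemProperties()
            .build();
        SPARQLRepository repo = new SPARQLRepository(endpoint);
        repo.setHttpClient(httpClient); // assumed setter; adjust to the actual API
        repo.init();
        return repo;
    }
}
```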
-
Great to see that you found a potential solution. I'd be happy to test it on my side, if you want to provide me with a patch or branch with this code change.
-
Super, thanks, will try to find time to try it out in the next few days and will let you know!
-
I did some local tests and I can confirm that with the branch containing the timeout changes, it works as expected. I was only wondering: why do the default timeouts seem to be set to 24h, instead of a smaller, more practical amount of time? Thanks a lot for this fast fix!
-
OK, I managed to reproduce the problem in a few simple steps on a fresh RDF4J instance, see here: https://github.com/tkuhn/rdf4j-timeout-test

It doesn't even involve issuing many queries in close succession: a single slightly complex query (Query 2) is sufficient to trigger it. With the latest release, it hangs forever. With the improved branch with the timeouts, it properly times out, but in either case the instance remains blocked, and even simpler queries (like Query 1) no longer work.

I hope that repo has everything needed to reproduce the problem on your side with minimal effort. If there is anything I can do to improve this test, let me know.
-
I just noticed that the latest Docker image […]. I was also wondering whether there are any updates on the root cause of this? Happy to invest more time and effort on my side, if it helps.
-
Hi all,
I am using the latest version of the RDF4J Workbench via Docker, and I am wondering whether there is a way to set HTTP client options (version, pooling, timeouts) for the connections made when federation is applied, i.e. for the requests to the URL in
"service <URL> { ... }"?

We have a use case where we use this federation extensively, and it normally works well, but it starts to hang after a number of hours or days. My suspicion is that it has something to do with the HTTP requests triggered by the SERVICE keyword, but I couldn't find any way to play around with the options of that HTTP client. Any ideas?
Tobias