-
Notifications
You must be signed in to change notification settings - Fork 436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Discussion) High CPU usage in DSpace frontend related to Angular Server Side Rendering (SSR) #3110
Comments
The built-in memory cache isn't even shared across processes on the same server. The Redis approach is interesting to me because, while Redis is another in-memory cache, it can be shared across processes. |
We tried Apache HTTPD mod_cache here and ran into indiscriminate sharing of Shibboleth tickets across distinct users' sessions. I haven't had time to investigate, so it might be due to unrelated changes that were hastily backed out at the same time that we disabled the external cache. If the cache was the culprit, I do not know whether a smarter configuration would fix that. |
We (Universität Konstanz) also have made some customizations. We've opted to use varnish (instead of redis/mod_cache, or built-in cache). This has a few advantages for us:
We've also disabled SSR for logged-in users, becasue they don't need it and it saves resources for us
We did something similar, but only for the "related publications list" on person entity pages, as that took quite a while to load. We may extend this to all searches in the future though We extended the robots.txt to disallow additional robots we don't want |
We provide documentation that you can find helpful. There is the solution we used to replace Server-Side rendered pages with Redis in the case study on the Jagiellonian University Repository: |
A person with some Angular experience (not a complete expert) asked me why we are making so many small requests to the backend instead of making less but larger requests to the backend. One example could be the submission form where we send one request for each section of the form and one request for each vocabulary and even more requests, instead of having one request for all information we need to build up the form. |
We've been wondering about it as well. Same with |
There was some good discussion in today's Dev Mtg about areas where we should consider "bundling" many smaller REST Requests into a single (or fewer) requests. They include:
Those in attendance agreed that we should work on efficiency of these requests. Ideally we should bundle requests better and send fewer requests per page. That said, this sort of work requires help/volunteers/investigation. I'll work on creating some tickets for each of these, as I think they are each separate issues which will require often individual solutions. (These many requests are semi-related to performance of SSR as they likely can contribute to the slowness. However, these are less likely to be "quick fixes", as some may require a new backend endpoint to be created and refactor the UI code.) |
There's another one: Browing the landing page of the demo and clicking to go to one of the communities also loads the data twice: Not only the number of requests, also the size. E.g. clicking on a publication to go to its page requests the whole "owningCollection" object in a separate request. It's unclear what for and if needed it should be embedded in the item instead of a separate request |
Any multiple request done during the submission form are not relevant for this problem since the submission form is not rendered during the SSR Anyway I agree with the fact that we should optimize the number of request to the server when is possible. We did something in this regarding on DSpace-CRIS with the item endpoint (https://github.com/4Science/Rest7Contract/blob/main-cris/items.md#findallbyids) to avoid to have multiple request to the item endpoint just to request for embedded objects as done in the search for thumbnails or other objects : This approach is helpful not only for the SSR but also for the search functionality in general because it allows to speed up the loading of search results. More in general as well as the high number of REST requests, the SSR can introduce overhead, especially for resource-intensive apps that require heavy server-side processing or complex data computations. In such cases, SSR may strain server resources and slow down rendering. |
May I suggest that removing redundant requests, and bundling related requests, are important and should be addressed, but that we may need a second ticket for issues that are unrelated to SSR. |
Today, I've moved a number of the performance issues described above into separate tickets. They are not directly related to this SSR issue (or only very loosely related), but obviously are performance issues themselves which require fixing:
If there are other obvious areas where DSpace is (inefficiently) making many requests, then I'd recommend others create additional tickets as well. While these issues are all semi-related, they all may need to be solved separately, as the solutions may not be identical in all cases. |
«4Science has updated DSpace-CRIS to exclude the Search component from SSR. This search component is not necessary for SEO, so it may make sense to disable SSR for this component.» I tend to disagree here. Search results are relevant for SEO since you can have faceted results that give relevance in some thematics to your repository/site. Imagine that your repository approaches the "horses" thematic. You could have faceted results per specie and that will give more relevance to your repository in the "horse" thematic and each specie itself. If you pay attention to your repository traffic, you will see several bots trying combinations of search results. If we remove the search from SSR, it would not be possible for crawlers to index them, nor promoting you site in some specific thematic. I think there is room for improvement regarding the search results....
If I change it to retrieve 10 records per page, then, the system performs more than 500 database queries to the database. This is not scalable. What if we just use Solr and retrieve data stored on Solr for search results? There is a trade-off here. More time spent on indexing documents, versus more time and performance degradation to retrieve them. |
@paulo-graca : I think your comment may (partially) belong over on #3231, which is the ticket created specifically for this task. As I've clarified on that ticket, the reason I noted that Search results are "not necessary for SEO" is because we already tell crawlers in our robots.txt file to ignore the search page & facets: https://github.com/DSpace/dspace-angular/blob/main/src/robots.txt.ejs#L11 So, it's odd to have this component available in SSR when (well behaving) crawlers are being told not to use it. That said, I do agree that we may want to make this configurable, so that sites can choose to allow search results via SSR. Overall the goal of #3231 is to see if there is "quick fix" performance improvement at least until we can determine ways to improve performance of search in general. I agree with you that performance of search is NOT ideal, but that also seems like a larger redesign / discussion. Any larger redesign would not help the performance of sites on 7.x or 8.x (as it wouldn't be possible to backport). However, a small change to optionally exclude the search.component from SSR could be backported to 7.x/8.x, as it might be considered a performance fix for those platforms. UPDATE: Also, it's worth noting we have a separate ticket that is about smaller fixes to search performance: #3163 However, this ticket is more about decreasing the number of REST API requests (from the UI) to perform a single search...which hopefully should decrease the database queries as well. |
Thank you @tdonohue. I agree, I'm proposing a big change that, perhaps, we can't add it easily to DS8 and DS7. I've also added a comment to #3231, I agree that it should be configurable. My only point is that it matters in terms of SEO to have Search Results in SSR available for crawlers and also to Internet Archives out there. |
This week @autavares-dev and me tried to disable SSR in all pages that not in sitemaps, in short terms all pages that not include "/items/" in url. Our infrainstructure have 5 pods for frontend that had a CPU usage of 1.9 cores, and decreased to 500mcores after this change. |
@vitorsilverio or @autavares-dev : Could you share how you disabled SSR (by linking to code, or providing an example)? I'm also not entirely clear what you mean by it started at a usage of "1.9 cores" and "decreased to 500mcores"? Could you clarify further? Overall, this sounds promising, but we'd want to understand more about what you've found to be successful. It might be that others could try a similar approach, or even help make it configurable (so that sites can easily turn SSR off for some pages as needed). |
@tdonohue /*
* The callback function to serve server side angular
*/
function ngApp(req, res) {
// Only use server-side rendering for specific routes that include items or entities
if (environment.universal.preboot && (req.url.includes('/items/') || req.url.includes('/entities/')) && req.method === 'GET') {
// Render the page to user via SSR (server side rendering)
serverSideRender(req, res);
} else {
// If preboot is disabled, just serve the client
console.log('Universal off, serving for direct client-side rendering (CSR)');
clientSideRender(req, res);
}
} CPU usage: |
@tdonohue after contact from Ollie (@ google scholar) we realize that not all endpoints in sitemaps are using SSR so we change to: diff --git a/server.ts b/server.ts
index 93f3e8687..a5ad17c59 100644
--- a/server.ts
+++ b/server.ts
@@ -56,6 +56,7 @@ import { extendEnvironmentWithAppConfig } from './src/config/config.util';
import { logStartupMessage } from './startup-message';
import { TOKENITEM } from './src/app/core/auth/models/auth-token-info.model';
+import { SSR_PATHS } from 'ssrpaths';
/*
* Set path for the browser application's dist folder
@@ -238,7 +239,8 @@ export function app() {
* The callback function to serve server side angular
*/
function ngApp(req, res) {
- if (environment.universal.preboot) {
+ // Only use server-side rendering for specific routes that include items or entities
+ if (environment.universal.preboot && req.method === 'GET' && SSR_PATHS.some(p => req.url.includes(p))) {
// Render the page to user via SSR (server side rendering)
serverSideRender(req, res);
} else {
diff --git a/ssrpaths.ts b/ssrpaths.ts
new file mode 100644
index 000000000..9e63d7a12
--- /dev/null
+++ b/ssrpaths.ts
@@ -0,0 +1,11 @@
+/**
+ * Path list for SSR
+ */
+export const SSR_PATHS = [
+ '/items/',
+ '/entities/',
+ '/collections/',
+ '/communities/',
+ '/bitstream/',
+ '/bitstreams/',
+]; |
2 new endpoints are required by Google Scholar |
I think that we are focusing on the wrong part of the optimization. IMO, there is no problem with SSR as it is in Angular. The issue is how CPU-consuming the whole application is. We can specify a few areas that cause this situation:
This complicated structure sometimes causes multiple unnecessary data refresh with additional mapping, and the script, instead of mapping data from response, is too busy making selections. That causing the CPU to reach maximum values. And on the SSR we just render a lot of pages at the same time so the CPU is a crucial resource. |
@pcg-kk : I think we'd need more information about your analysis. You may be entirely correct, but it's difficult to understand what you are looking at without linking to examples or providing suggestions for possible improvement. Maybe you could document your findings in a Google Doc or on the general DSpace Wiki (e.g. under our "Development > Discussion" page. I think more details would help others to better understand what you are seeing. Regardless, I do agree that this ticket is not entirely about SSR. As you can see from the discussion points above, we've spun off other tickets from this one about endpoints/pages which are sending abnormally large numbers of requests. So, we are aware that the core bug is not necessarily in SSR, but we are also working to find "quick fix" solutions which can help institutions running 7.x and 8.x to "stabilize" their system if they hit CPU issues specifically in the SSR processing. In any case, I'd encourage you to share more of your findings...that way I can bring them to the attention of other DSpace developers. |
@tdonohue please look into almost any request executed by the frontend. When you look into the initiator tab in dev tools there is a list of (in my example) 1410 methods called between on the subscription before the request is sent and... my browser doesn't present more. I still don't know the first method that was executed at the beginning. We execute a lot of requests and each of them passes that crazy path before is executed. I can understand that in DSpace we try to create a lot of generic methods that can do a lot of things immediately, but I think that we can agree that this approach that each request is executed in this way is a lost of resources. I don't know the decisions at the beginning why we just don't use effects and actions to call methods, and then success/error actions to save results in the store? Instead we process each request through the store on level before it was send, when it's pending and success. That Request is piped multiple times with a lot of checks and subscriptions, combineLatests. --EDIT Here is one more nice example where we see at least what happen but we can also notice that we are going in the same code multiple times: |
Describe the bug
This is a discussion ticket related to ongoing issues with high CPU usage related to Angular SSR. It's a place to gather public notes & recommendations while developers work on solutions that can be brought back into DSpace code and documentation.
Hints / Notes
A few common tips/hints have been helpful:
inlineCriticalCss
, which can improve SSR performance.inlineCriticalCss
config to the frontend: Add configuration option to disable inlined CSS in SSR HTML #2067 (This new config is available out-of-the-box as of both DSpace 8.0 and 7.6.2)inlineCriticalCss
by default for better SSR performance #2944 for examples. (This "false" setting is the default value as of both DSpace 8.0 and 7.6.2)Known Community Customizations
What follows are a few (known) customizations from DSpace community members to work around or partially solve these CPU issues:
Brainstorms
Random ideas/brainstorms that are unproven at this time:
Mailing List Discussions
These are (known) mailing lists discussions related to this topic:
How can you help?
DSpace Committers & Developers are striving to improve the performance of DSpace 7+ to minimize its resource requirements. We welcome brainstorms, solutions, shared code, improved docs that aligns with these goals. DSpace is community built/maintained software (with no centralized development team). The more details community members can share, the more quickly we can locate/design solutions to benefit everyone.
The text was updated successfully, but these errors were encountered: