No data on various dashboards #699
Comments
Hello, I am sorry to hear you are having some issues. The first troubleshooting steps I recommend are:
The Perf/Node Utilization dashboard primarily reports metrics collected by the node-exporter component. The node-exporter is deployed via a DaemonSet and should have a pod running on every node in the cluster. If any metrics are not showing up, or if metrics appear for some nodes and not others, I would confirm that there is, in fact, a node-exporter pod running on every node and that the pod logs show no problems.

The two SAS Launched Jobs dashboards require that the SAS Workload Orchestrator (SWO) be part of your SAS Viya deployment. Metrics only appear on those dashboards if SAS jobs (launched via SWO) are running or have been running during the selected time period. The Prometheus Pushgateway component also needs to be deployed (and running) in the same namespace as the SAS Viya deployment. We deploy that component when the monitoring/bin/deploy_monitoring_viya.sh script is run. So, if you have not run that script, you will need to run it. If you have run it, please confirm that the Pushgateway pod is running and that there are no error messages in the pod logs.

I hope these steps help you identify the problem. Please let us know how things go.

Regards,
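The node-exporter check described above can be sketched as a small script. This is only an illustration, not part of the project: it assumes you have saved the output of `kubectl get nodes -o json` and `kubectl get pods -n <namespace> -o json` as strings, and that node-exporter pod names contain the substring `node-exporter` (an assumption; adjust `name_prefix` to match your deployment). The function name is hypothetical.

```python
import json


def nodes_missing_exporter(nodes_json, pods_json, name_prefix="node-exporter"):
    """Return the sorted names of nodes with no Running node-exporter pod.

    nodes_json: text output of `kubectl get nodes -o json`
    pods_json:  text output of `kubectl get pods -n <namespace> -o json`
    name_prefix: substring assumed to identify node-exporter pods
    """
    nodes = {item["metadata"]["name"] for item in json.loads(nodes_json)["items"]}

    covered = set()
    for pod in json.loads(pods_json)["items"]:
        # Skip pods that do not look like node-exporter pods.
        if name_prefix not in pod["metadata"]["name"]:
            continue
        # Only pods in the Running phase count as healthy coverage.
        if pod.get("status", {}).get("phase") != "Running":
            continue
        node = pod.get("spec", {}).get("nodeName")
        if node:
            covered.add(node)

    return sorted(nodes - covered)
```

An empty list means every node has a running node-exporter pod; any names returned are the nodes whose pods (and pod logs) are worth inspecting first.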
Hi Greg,
First of all, thank you for your prompt reply.
Thanks again,
Adding some troubleshooting:
Perf/Node Utilization is fixed. The exported_nodename label was not matching the $Node value because a custom DNS name had been applied to exported_nodename. I am still struggling with the SAS Launched Jobs dashboard, which shows 0 metrics for :sas_launcher_pod_info:.
Thank you,
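For anyone hitting the same label mismatch, a comparison along these lines may help spot it. This is a rough sketch under stated assumptions: `query_response` is the parsed JSON of a Prometheus HTTP API instant query (`/api/v1/query`) for any series that carries the `exported_nodename` label, and `node_names` is the list of Kubernetes node names the dashboard's $Node variable is expected to offer. The function name and the sample host names in the usage note are hypothetical.

```python
def unmatched_label_values(query_response, node_names, label="exported_nodename"):
    """Compare a label's values in a Prometheus query result with node names.

    query_response: parsed JSON from a Prometheus /api/v1/query call
    node_names:     iterable of expected node names
    label:          label to compare (exported_nodename in this thread)
    """
    # Collect every distinct value of the label across the returned series.
    values = {
        sample["metric"][label]
        for sample in query_response["data"]["result"]
        if label in sample["metric"]
    }
    nodes = set(node_names)
    return {
        # Label values that match no known node (e.g. a custom DNS name).
        "labels_without_node": sorted(values - nodes),
        # Nodes for which no series carries a matching label value.
        "nodes_without_label": sorted(nodes - values),
    }
```

If a node such as `ip-10-0-0-1.ec2.internal` shows up under `nodes_without_label` while a similar custom name appears under `labels_without_node`, the dashboard variable and the metric label are probably using different naming schemes.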
Thank you for the step-by-step troubleshooting. The root cause seems to be the Pushgateway, since it is not exposing any metrics. I investigated a bit more and found some rather odd behavior. When I launch a new compute node and start an interactive compute session, I can see some metrics coming into the dashboard. However, as soon as I have batch servers running as a reusable compute context, it seems to break the workload orchestrator, which no longer emits metrics to the Pushgateway. If I then reset my Studio session (same node), I don't see any metrics.
Regards,
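One way to confirm whether the Pushgateway is holding any pushed metrics is to fetch its `/metrics` page (for example through a port-forward) and count samples per metric name. The following is a minimal sketch, not a full parser for the Prometheus text exposition format. The function name is hypothetical, and the default `ignore_prefixes` are an assumption about which metrics belong to the Pushgateway process itself rather than to pushed jobs.

```python
from collections import Counter


def count_pushed_samples(exposition_text,
                         ignore_prefixes=("go_", "process_", "pushgateway_")):
    """Count samples per metric name in Prometheus text exposition output.

    exposition_text: body of the Pushgateway /metrics page
    ignore_prefixes: name prefixes assumed to be the Pushgateway's own metrics
    """
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = line.split("{", 1)[0].split(None, 1)[0]
        if name.startswith(ignore_prefixes):
            continue
        counts[name] += 1
    return counts
```

An empty result while jobs are running would be consistent with nothing being pushed; comparing the counts before and after starting the batch servers could help narrow down when the metrics stop arriving.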
Interesting. In your original post, you mentioned you have another cluster where everything deployed without problems. Have you tried this same sequence of steps in that environment? And, if so, do you see the same problem?

At this point, I think it would be best for you to open an issue with SAS Technical Support. I suspect this may be more than a simple misconfiguration or deployment scripting problem. I think we may need to bring some of the Compute Server experts into the discussion as well. When opening the Tech Support ticket, please tell them about this GitHub issue and ask them to add me to the ticket.

Once we've figured everything out and resolved things, I'll post that information here (for future reference) before closing this issue.

Regards,
Hello,
I have an issue deploying the generic sample on EKS.
I created a monitoring namespace and successfully deployed the solution.
However, the dashboards below are returning No Data for most, if not all, metrics.
How can I troubleshoot the source of the problem in this case?
I've deployed the solution on another cluster and it works fine.
Thank you,
Joseph de Clerck