Improves stellar-core-debug-info script and adds docs #4553

Merged: 2 commits into stellar:master from improve-debug-script, Nov 25, 2024

Conversation

SirTyson

Description

Resolves #4545

This PR updates the documentation for the stellar-core-debug-info script.

Additionally, while helping people debug nodes, I found the script difficult to use, and many of its default values were specific to SDF infrastructure. I've updated the script to be easier to use: it now requires an output directory argument and creates that directory automatically if it does not exist. The script also automatically detects the stellar-core executable path and config via the stellar-core.service file. Finally, I've added error checking around offline-info and fixed path resolution, which was previously buggy.
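
As a rough illustration of the service-file detection described above, here is a minimal Python sketch. It is not the script's actual code: the function name `parse_exec_start`, the sample unit contents, and the assumption that the config is passed through a `--conf` option on the `ExecStart` line are all illustrative. It also ignores systemd's optional `ExecStart` prefix characters such as `-` or `@`.

```python
import re
import shlex


def parse_exec_start(unit_text):
    """Return (executable, config_path) from a unit file's ExecStart line.

    Hypothetical helper: config_path is None when no --conf option is found,
    and both values are None when there is no ExecStart line at all.
    """
    match = re.search(r"^ExecStart=(.+)$", unit_text, flags=re.MULTILINE)
    if match is None:
        return None, None

    # Split the command line the way a shell would, honoring quotes.
    argv = shlex.split(match.group(1))
    executable = argv[0]
    config_path = None
    for i, arg in enumerate(argv):
        if arg == "--conf" and i + 1 < len(argv):
            config_path = argv[i + 1]
        elif arg.startswith("--conf="):
            config_path = arg.split("=", 1)[1]
    return executable, config_path


# Made-up unit file contents, for illustration only.
SAMPLE_UNIT = """[Unit]
Description=stellar-core (hypothetical example unit)

[Service]
ExecStart=/usr/bin/stellar-core --conf /etc/stellar/stellar-core.cfg run
"""

executable, config_path = parse_exec_start(SAMPLE_UNIT)
print(executable, config_path)
```

Returning `None` instead of raising lets a caller fall back to explicitly supplied paths, such as the -p and -c flags used later in this thread.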

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@SirTyson SirTyson force-pushed the improve-debug-script branch from 9323c23 to e1afb24 Compare November 21, 2024 01:56

@jacekn jacekn left a comment


Very nice improvements. I added one question about docker and one non-blocking idea.

@SirTyson SirTyson force-pushed the improve-debug-script branch from ac28092 to 0f7fe88 Compare November 21, 2024 21:57
@anupsdf

anupsdf commented Nov 22, 2024

Here is the console log from running this on my Mac.

  • Gathering core info hits this vm space error.
  • Core was not running when I ran this, so it's strange that offline-info complained, but the log it gave me shows it was having trouble getting the db schema version.
  • gather_os_info is not Mac-friendly, I guess.
  • I was able to get the buckets directory.
  • Gathering SQLite db info also complained.
 ./scripts/stellar-core-debug-info -c ../pubnet_watcher.cfg -p /Users/anuppani/sdf_git/stellar-core/src/stellar-core -s . -b buckets  ../core_logs
Getting get_full_path_for_command /Users/anuppani/sdf_git/stellar-core/src/stellar-core
Gathering OS information...
Error calling function gather_os_info
Gathering stellar-core version and config...
stellar-core(40716,0x1feb14f40) malloc: nano zone abandoned due to inability to reserve vm space.
Warning: running non-release version v22.0.0rc2-85-g4b4cd3657 of stellar-core
Error calling function gather_core_info
Gathering stellar-core offline-info...
Warning: offline-info command failed. Maybe stellar-core is still running? For more information check /Users/anuppani/sdf_git/core_logs/stellar-core-debug-info-2024-11-22-08-18-09/offline-info/output
Gathering logs...
Error calling function gather_logs
Gathering buckets directory
Gathering sqlite DB
Error calling function gather_sqlite_db
Results stored in /Users/anuppani/sdf_git/core_logs/stellar-core-debug-info-2024-11-22-08-18-09.tar.gz
Encountered some errors when gathering data

@SirTyson

Hmm, this looks like an issue with your core build. Can you run the version command normally, without the script? It looks like the malloc error is coming from within stellar-core, not the script.

Wrt os_info, that's only supported on Linux. This is fine, as the script is intended for production environments. Finally, I don't think gathering the SQLite info worked because you fed the script a bad path via the -s . flag; this should be something like -s ./sql.db. You should be able to just run ./scripts/stellar-core-debug-info -c ../pubnet_watcher.cfg -p /Users/anuppani/sdf_git/stellar-core/src/stellar-core ../core_logs, and the db and buckets paths will be pulled automatically from the provided config.
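
To make the "pulled from the provided config" step concrete, here is a hedged Python sketch of one way such a lookup could work. It is an illustration, not the script's implementation: the helper name `read_paths_from_config` is made up, and the sketch assumes the config uses `DATABASE` with a `sqlite3://` prefix and `BUCKET_DIR_PATH` as double-quoted top-level values.

```python
import re


def read_paths_from_config(config_text):
    """Return (sqlite_path, bucket_dir) found in config text.

    Hypothetical helper: either value is None when the key is missing,
    or when DATABASE does not point at a SQLite file.
    """
    def lookup(key):
        # Match lines of the form KEY="value" at the start of a line.
        pattern = rf'^\s*{key}\s*=\s*"([^"]*)"'
        found = re.search(pattern, config_text, flags=re.MULTILINE)
        return found.group(1) if found else None

    sqlite_path = None
    database = lookup("DATABASE")
    prefix = "sqlite3://"
    if database is not None and database.startswith(prefix):
        sqlite_path = database[len(prefix):]
    return sqlite_path, lookup("BUCKET_DIR_PATH")


# Made-up config contents, for illustration only.
SAMPLE_CONFIG = 'DATABASE="sqlite3://stellar.db"\nBUCKET_DIR_PATH="buckets"\n'
print(read_paths_from_config(SAMPLE_CONFIG))
```

A config that omits both keys yields `(None, None)`, in which case the paths have to be passed explicitly with -s and -b.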

@anupsdf

anupsdf commented Nov 22, 2024

My config file didn't have the db and buckets paths; I will add them. For now, the -s ./stellar.db option worked for SQLite.
My version command output also shows this malloc error, but it does print the details afterwards.

@SirTyson

I think the script is working as intended, then. I'm not sure why your core image is throwing and still reporting version info, but the invocation is definitely failing, and the correct behavior script-wise is probably to give up on processing any output if the stellar-core invocation returns a non-zero exit code. In this particular instance it looks like we could still parse the output on failure, but I don't think that can be generalized.
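
The give-up-on-failure policy described above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the script's code: `run_and_capture` is a hypothetical name, and the demo runs the Python interpreter itself as a stand-in for a stellar-core invocation.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return its stdout, or None if it exits non-zero.

    Hypothetical helper: output from a failed command is discarded even
    when it looks parseable, since that cannot be relied on in general.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout


# Stand-in commands: one succeeds, one prints output and then fails.
ok = run_and_capture([sys.executable, "-c", "print('v1')"])
failed = run_and_capture(
    [sys.executable, "-c", "import sys; print('v1'); sys.exit(3)"]
)
print(repr(ok), repr(failed))
```

The second command prints the same text as the first, but its output is dropped because of the exit code, which matches the case in this thread where version info was printed by a failing invocation.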

Is this build the current master? This might be a Mac-specific issue; the offline-info and version commands run fine for me on Linux.

@anupsdf

anupsdf commented Nov 22, 2024

The malloc error was because I had ASan enabled. The error went away with a core build without --enable-asan.
Yeah, this doesn't seem like normal processing when encountering an error.


@anupsdf anupsdf left a comment


lgtm! Thanks for fixing.

@anupsdf anupsdf added this pull request to the merge queue Nov 25, 2024
Merged via the queue into stellar:master with commit 7f54c88 Nov 25, 2024
13 checks passed
Successfully merging this pull request may close these issues.

Document stellar-core-debug-info