Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Fix multiple issues, make it work with the latest kernel and performance improvements. #22

Open
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

galli-leo
Copy link

Opening this PR as a work in progress, so that we can discuss some aspects.
Specifically, I would like to see if my performance improvements also help in the other test cases.
How were those run?

I have tried to squeeze more performance out of it, but at the moment it seems like there is not any obvious slow parts.

I also for now removed the snapshotting of file descriptors, since I at least don't need that.

Fixed Issues

Bug Fixes

get_cpu_var / put_cpu_var

At least for my kernel, get_cpu_var / put_cpu_var need to match up exactly. Meaning, if you have two get_cpu_var, you need to have two put_cpu_var.

blocklist / allowlist memory leak

At least AFAICT those lists where never freed.

clean_memory_snapshot: use after free of snapshot_page

Freeing the snapshot_page while iterating over the hash table means that we use the free'd element when iterating.
This caused crashes for me. It's now fixed :)

There's probably more I am forgetting.

Performance Improvements

keep list of dirty pages

By keeping a list of dirty pages, we don't need to iterate over all pages when restoring a snapshot. For me this had >50% performance improvement, might be more depending on the target.

There's probably also more I am forgetting.

Maintainability improvements

remove need for patching syscall table

Using ftrace_helper.h, we can hook syscalls easily.

switch to ftrace hooking

TODO here, we should be able to fully use kallsyms_lookup_name for the unexported symbols and hence no need for lookup_symbols.py.

@vanhauser-thc
Copy link
Member

@galli-leo tag andreafioraldi once you are confident the PR is done and ready :)

@vanhauser-thc
Copy link
Member

@galli-leo hey great to see your effort here! that gives me hope that this module is still useful.

@galli-leo
Copy link
Author

@vanhauser-thc Yes definitely. For qemu I am getting a speedup of up to 500x, which makes even intermittent kernel panics worth it 😅. Although I haven't found any more of those, I have encountered some stalls with one of my patches. Currently trying to diagnose these. It also seems that some pages might not get properly restored, which I am investigating as well (test3.c reproduces this).

@EliaGeretto
Copy link

@galli-leo Are you still planning to work on this? I am working on a PR that fully implements the snapshotting of file descriptors and I have based it off this one, since it fixes a lot of build issues. I wanted to know if it will get merged soon or if I should just submit mine as is.

@andreafioraldi
Copy link
Member

@galli-leo Are you still planning to work on this? I am working on a PR that fully implements the snapshotting of file descriptors and I have based it off this one, since it fixes a lot of build issues. I wanted to know if it will get merged soon or if I should just submit mine as is.

If you need it and @galli-leo stopped working on it, I can merge as is. The module is broken anyway so even is a bit less broken with this PR is better.

@EliaGeretto
Copy link

No no, there are a few things that are unfinished, from what I can see. If he really stopped working on this, I will just pick up from where he left and submit a new draft PR.

@EliaGeretto
Copy link

EliaGeretto commented Jul 6, 2021

@galli-leo I am not completely sure what were your issues since you did not discuss them extensively. However, I think your problems were due to the snapshot_page::in_dirty_list member not being initialized in add_snapshot_page. Doing that resolves all the warnings.

@galli-leo
Copy link
Author

No no, there are a few things that are unfinished, from what I can see. If he really stopped working on this, I will just pick up from where he left and submit a new draft PR.

Hey sorry for not responding earlier. I don't really have time to finish it at the moment (exams and other stuff), so feel free to pick up from here. The major issue I currently still have, is that the first page is not snapshot correctly. make test should show that immediately (the binary does some stuff and then one page is not snapshot correctly). See test3.c for what it does exactly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants