-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Fix multiple issues, make it work with the latest kernel and performance improvements. #22
base: master
Are you sure you want to change the base?
Conversation
@galli-leo tag andreafioraldi once you are confident the PR is done and ready :) |
@galli-leo hey great to see your effort here! that gives me hope that this module is still useful. |
@vanhauser-thc Yes definitely. For qemu I am getting a speedup of up to 500x, which makes even intermittent kernel panics worth it 😅. Although I haven't found any more of those, I have encountered some stalls with one of my patches. Currently trying to diagnose these. It also seems that some pages might not get properly restored, which I am investigating as well (test3.c reproduces this). |
@galli-leo Are you still planning to work on this? I am working on a PR that fully implements the snapshotting of file descriptors and I have based it off this one, since it fixes a lot of build issues. I wanted to know if it will get merged soon or if I should just submit mine as is. |
If you need it and @galli-leo stopped working on it, I can merge as is. The module is broken anyway so even is a bit less broken with this PR is better. |
No no, there are a few things that are unfinished, from what I can see. If he really stopped working on this, I will just pick up from where he left and submit a new draft PR. |
@galli-leo I am not completely sure what were your issues since you did not discuss them extensively. However, I think your problems were due to the |
Hey sorry for not responding earlier. I don't really have time to finish it at the moment (exams and other stuff), so feel free to pick up from here. The major issue I currently still have, is that the first page is not snapshot correctly. |
Opening this PR as a work in progress, so that we can discuss some aspects.
Specifically, I would like to see if my performance improvements also help in the other test cases.
How were those run?
I have tried to squeeze more performance out of it, but at the moment it seems like there is not any obvious slow parts.
I also for now removed the snapshotting of file descriptors, since I at least don't need that.
Fixed Issues
flush_tlb_mm_range
#20Bug Fixes
get_cpu_var / put_cpu_var
At least for my kernel, get_cpu_var / put_cpu_var need to match up exactly. Meaning, if you have two get_cpu_var, you need to have two put_cpu_var.
blocklist / allowlist memory leak
At least AFAICT those lists where never freed.
clean_memory_snapshot: use after free of snapshot_page
Freeing the snapshot_page while iterating over the hash table means that we use the free'd element when iterating.
This caused crashes for me. It's now fixed :)
There's probably more I am forgetting.
Performance Improvements
keep list of dirty pages
By keeping a list of dirty pages, we don't need to iterate over all pages when restoring a snapshot. For me this had >50% performance improvement, might be more depending on the target.
There's probably also more I am forgetting.
Maintainability improvements
remove need for patching syscall table
Using
ftrace_helper.h
, we can hook syscalls easily.switch to ftrace hooking
TODO here, we should be able to fully use
kallsyms_lookup_name
for the unexported symbols and hence no need forlookup_symbols.py
.