Parallelize relevant portions of IR2Vec with OMP #101

Open
svkeerthy opened this issue Apr 14, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@svkeerthy
Member

No description provided.

@svkeerthy svkeerthy added the enhancement New feature or request label Apr 14, 2024
@m-atalla
Contributor

m-atalla commented Jun 15, 2024

Hi, this seems like an interesting enhancement that I'd like to help out on.

I think it's important to have a baseline to compare any potential improvements against. Is the TimeTaken experiment suitable for that? Also, is there a script I could use to generate the timings, as in experiments/TimeTaken/TimeTaken_Algos.csv?

I'd be happy to add an additional benchmark as well; the SQLite Amalgamation might be an interesting option.
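For reference, a baseline measurement could look roughly like the sketch below. It is only an illustration: `medianSeconds` is a hypothetical helper, not something that exists in the IR2Vec repository, and it does not reproduce the format of TimeTaken_Algos.csv. It times an arbitrary callable several times and reports the median, which is less sensitive to a single noisy run than the mean.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run `work` `repeats` times and return the median
// wall-clock time in seconds (the upper middle sample for even counts).
// Returns 0.0 when `repeats` is zero.
template <typename F>
double medianSeconds(F &&work, std::size_t repeats) {
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t i = 0; i < repeats; ++i) {
    // steady_clock is monotonic, so it is not affected by system clock changes.
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double>(stop - start).count());
  }
  if (samples.empty())
    return 0.0;
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

The callable could wrap a `std::system` call that launches the ir2vec binary on a benchmark file, so the same helper would cover runs before and after a change.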

@svkeerthy
Member Author

Hi @m-atalla,

Apologies for the delayed response. We do not have a script for this yet, so it would be great if you could help with this. The SQLite Amalgamation is also very interesting and would be a valuable addition.

We have started integrating OMP with IR2Vec (See #105, which is a work in progress).

Please feel free to reach out if you need any inputs or have further questions. Will be happy to help :)

Best,
Venkat

@m-atalla
Contributor

Hi, I wanted to follow up with profiling info on the SQLite benchmark now that it's added!

I used Linux perf to get the profile data using the following commands:

$ perf record -g --call-graph dwarf build/bin/ir2vec --sym -level p ./src/test-suite/PE-benchmarks-llfiles-llvm17/sqlite3.ll -o sqlite.txt
$ perf script > /tmp/sym-perf.out

I used the Firefox Profiler to analyze and upload the profile data, which can be found here. From the call tree, it seems that about 53% of the time is spent on parsing (not much can be done about that) and 44% is spent in IR2Vec_Symbolic::bb2Vec, which should be a good target for parallelism. Fortunately, it looks like #105 is already making progress on it!
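To make the parallelism idea concrete, here is a minimal sketch of computing per-basic-block vectors with an OpenMP `parallel for`. It is not the code from #105 and does not use IR2Vec's real types: `Block`, `blockToVec`, and `functionToVec` are hypothetical stand-ins, and a block is assumed to be just a list of per-instruction vectors that get summed. Each iteration writes only to its own slot, so no locking is needed, and the final reduction is kept serial so the result does not depend on thread scheduling.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for a basic block: one vector per instruction.
struct Block {
  std::vector<Vec> insts;
};

// Stand-in for a bb2Vec-like step: sum the instruction vectors of one block.
Vec blockToVec(const Block &bb, std::size_t dim) {
  Vec out(dim, 0.0);
  for (const Vec &v : bb.insts)
    for (std::size_t i = 0; i < dim; ++i)
      out[i] += v[i];
  return out;
}

// Compute one vector per block in parallel, then combine them serially.
// Without OpenMP enabled (e.g. no -fopenmp), the pragma is ignored and the
// loop simply runs sequentially with the same result.
Vec functionToVec(const std::vector<Block> &blocks, std::size_t dim) {
  std::vector<Vec> perBlock(blocks.size());
  const long n = static_cast<long>(blocks.size());
#pragma omp parallel for schedule(dynamic)
  for (long i = 0; i < n; ++i)
    perBlock[i] = blockToVec(blocks[i], dim); // each thread writes its own slot

  Vec out(dim, 0.0);
  for (const Vec &v : perBlock)
    for (std::size_t j = 0; j < dim; ++j)
      out[j] += v[j];
  return out;
}
```

`schedule(dynamic)` is chosen here on the assumption that block sizes vary a lot; with uniformly sized blocks the default static schedule would likely have less overhead.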

Similarly, I generated a profile for the flow-aware (FA) mode, which can be found here. The call tree shows IR2Vec_FA::solveInsts and IR2Vec_FA::func2Vec taking 33% and 24% of the time, respectively.

I'd be happy to assist further as needed.

Thank you.
Mohamed.

@svkeerthy
Member Author

Hi @m-atalla,

Thanks for the perf report :) It exposes more opportunities for optimizations in addition to parallelization.

Off the top of my head, I have two things:

  1. As you also pointed out, one of the major overheads in the FA flow is the solveInsts method, which internally invokes the Eigen solver. We recently made Eigen an optional dependency: if Eigen is not available, we approximate the solution with a handwritten solver. It would be interesting to see whether that reduces the current overhead.
  2. 14% of the total time is spent on SmallVector copies in the IR2Vec_FA::func2Vec method. It would be good to eliminate or reduce this overhead by using references or moves.
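As a rough illustration of the second point, the sketch below contrasts a lookup that copies with one that returns a reference, and shows a store that moves instead of copying. It uses `std::vector` as a stand-in for `llvm::SmallVector` so that it compiles without LLVM. The cache layout and the names `lookupByValue`, `lookupByRef`, and `store` are hypothetical and are not taken from IR2Vec_FA::func2Vec.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Cache = std::map<std::string, Vec>;

// Copying lookup: returns by value, so every call duplicates the stored
// vector. Throws std::out_of_range if the key is missing.
Vec lookupByValue(const Cache &cache, const std::string &key) {
  return cache.at(key);
}

// Non-copying lookup: returns a reference to the vector stored in the cache.
// The reference stays valid as long as that entry is not erased.
const Vec &lookupByRef(const Cache &cache, const std::string &key) {
  return cache.at(key);
}

// Store a vector by moving it into the cache. Callers that pass a temporary
// or use std::move hand over their buffer instead of copying the elements.
void store(Cache &cache, const std::string &key, Vec value) {
  cache[key] = std::move(value);
}
```

A caller that only reads the cached vector would use `lookupByRef`, while one that has finished with a freshly computed vector would call `store(cache, name, std::move(vec))`. Note that `SmallVector` keeps small contents in inline storage, so moving one can still copy those inline elements; the saving applies mainly once the data has spilled to the heap.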

Perhaps I will create separate issues to track these, as their objective is a bit different from that of the current issue. Please give me some time; I will take a more detailed look at the perf report and get back with more possible improvements.

@m-atalla
Contributor

Hi @svkeerthy, sorry, I kinda lost track of this issue as I'm currently in the midst of working on my master's thesis. I think I can send a PR for the SmallVector copy part in IR2Vec_FA::func2Vec by next week.
