
running moltemplate on large systems is slow and needs a lot of memory #82

Open

hnadeem2 opened this issue Aug 24, 2022 · 5 comments

hnadeem2 commented Aug 24, 2022

Hi Andrew,

I have a MARTINI-based coarse-grained simulation with a large simulation box. There are 3 species of atoms totaling around 12 million atoms. Moltemplate always crashes partway through the run, and it takes forever. Is it possible to run it on multiple cores with MPI/mpiexec?

Hassan


jewettaij commented Aug 25, 2022

Hi Hassan
There is currently no way to run moltemplate in parallel. I think it is likely that memory limitations are the main problem you are having.

Moltemplate requires between 3 GB and 12 GB of RAM per 10^6 atoms in your simulation, so a 12-million-atom system needs roughly 36-144 GB. This is very wasteful, but it's difficult to fix. (See the historical details below.) The large memory footprint also contributes to the long time it takes moltemplate to run, since computers slow down when running low on RAM and using swap. To work around both of these issues, you have three options:

  1. Here's a weird hack which can reduce both memory usage and running time:
    If possible, divide your system into smaller pieces. For example, divide your simulation into 8 identical pieces, each of which is half as large in the X, Y, and Z directions. Use moltemplate.sh to create the LAMMPS files describing one of these small pieces ("system.data", "system.in.init", and "system.in.settings", plus "system.in.charges" if present). Then use ltemplify.py to convert these files into a single LT file (e.g. "subsystem.lt"). Then move your original files into another directory and create a new "system.lt" file which makes 8 copies of this "subsystem.lt". (A sketch of this workflow appears after this list.) If you run out of memory again during this step, try using TopoTools to load the DATA file for the subsystem and duplicate it. (TopoTools might be a lot faster than moltemplate as well.)
  2. Alternatively, rent time on a machine with at least 128 GB of RAM (preferably 256 GB). Such computers can be rented from Amazon. (If I remember correctly, the price was about a dollar per hour, and cheaper if you use "spot pricing".)
  3. You can reduce the time it takes to run moltemplate by running moltemplate.sh with the "-nocheck" argument. This will not reduce memory usage, but it should cut moltemplate's running time roughly in half. Unfortunately, it also disables the syntax-error messages which are very useful when you are designing your simulation, so only use this argument when you are certain there are no errors in your LT files. I suggest you build a much smaller version of your system first and make sure there are no errors. Then, when you are ready to build the full-size system, run moltemplate with the "-nocheck" argument.
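To make option 1 concrete, here is a minimal sketch of that workflow. The molecule name "Subsystem" and the box lengths (50.0 and 100.0) are placeholders for illustration, not values from this thread:

```
# 1) Build one small piece (1/8 of the final system) as usual.
#    Here "system.lt" describes just the small piece.
moltemplate.sh system.lt     # writes system.data, system.in.init, system.in.settings

# 2) Convert the resulting LAMMPS files back into a single LT file.
#    (Also list system.in.charges here if moltemplate generated one.)
ltemplify.py -name Subsystem system.in.init system.in.settings system.data > subsystem.lt

# 3) Move the original files out of the way.
mkdir original_files
mv system.lt system.data system.in.* original_files/

# 4) Write a new "system.lt" which instantiates 8 shifted copies of the
#    subsystem, arranged in a 2x2x2 grid.
cat > system.lt <<'EOF'
import "subsystem.lt"

copies = new Subsystem [2].move(50.0, 0, 0)
                       [2].move(0, 50.0, 0)
                       [2].move(0, 0, 50.0)

write_once("Data Boundary") {
  0 100.0 xlo xhi
  0 100.0 ylo yhi
  0 100.0 zlo zhi
}
EOF

# 5) Build the full-size system, adding "-nocheck" (option 3) to save time.
moltemplate.sh -nocheck system.lt
```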

History

Unfortunately, when I wrote moltemplate, I was not aware of the large memory overhead imposed by Python. In Python, every object you instantiate (even a tiny molecule containing a single atom) requires about a kilobyte of memory. (After some effort, I think I was able to bring this down to about 300 bytes using "__slots__".) I started out running simple, small coarse-grained simulations with moltemplate, and I was not thinking of running huge simulations at the time. But moltemplate has grown more than I expected. I am honestly flattered that moltemplate has been successful enough that these kinds of questions even come up, but I'm sorry it creates headaches for you.

jewettaij changed the title from "running moltemplate on multiple cores" to "running moltemplate on large systems in parallel" on Aug 25, 2022

jewettaij commented Aug 26, 2022

I just re-read your message and I noticed you are using the MARTINI force field. I'm curious to know how you prepared your simulation. Did you use the MARTINI 2 files that come with moltemplate? Or did you download files from the MARTINI 3 website (in GROMACS format) and convert them into moltemplate format? (I feel somewhat guilty that I have not provided more support for MARTINI users. Eventually, I'd like to write a script to convert the most recent GROMACS files into moltemplate format. If you have such a script, please share it.)

hnadeem2 commented

Apologies for the late response. The easiest solution for me was to run it on a machine with 128 GB of RAM; it took some time, but it ran well.
Actually, I'm using MARTINI 3 downloaded from the website. Although my system was large, the largest molecule was only coarse-grained to 4 CG beads, so I built the .lt files by hand (got lazy). If my next system is more complex, I will definitely write something for that.
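(For example, a hand-written LT file for a small 4-bead coarse-grained molecule looks roughly like the sketch below. The molecule name, bead types, bond type, charges, and coordinates are all made-up placeholders, and it assumes a "martini.lt" file defining a MARTINI force-field object that contains those types.)

```
import "martini.lt"   # assumed to define the MARTINI object and the types used below

FourBeadMol inherits MARTINI {
  # atom-ID   mol-ID  type     charge    x      y      z    (atom_style full)
  write("Data Atoms") {
    $atom:b1  $mol:.  @atom:P4   0.0    0.00   0.00   0.00
    $atom:b2  $mol:.  @atom:C1   0.0    4.70   0.00   0.00
    $atom:b3  $mol:.  @atom:C1   0.0    9.40   0.00   0.00
    $atom:b4  $mol:.  @atom:P4   0.0   14.10   0.00   0.00
  }
  # bond-ID    bond-type       atoms joined by the bond
  write("Data Bonds") {
    $bond:b12  @bond:backbone  $atom:b1  $atom:b2
    $bond:b23  @bond:backbone  $atom:b2  $atom:b3
    $bond:b34  @bond:backbone  $atom:b3  $atom:b4
  }
}
```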

hnadeem2 closed this as completed Sep 5, 2022
jewettaij commented

Thank you very much, Hassan, for getting back to me. If I have a chance to work on an ITP file converter that might benefit MARTINI users, I'll let you know.
Take care
-Andrew

jewettaij commented

I think I'll reopen this in case anyone else has the same question.

jewettaij reopened this Sep 8, 2022
jewettaij changed the title from "running moltemplate on large systems in parallel" to "running moltemplate on large systems is slow and needs a lot of memory" on Sep 8, 2022