Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MT-REXEE development #58

Merged
merged 76 commits into from
Oct 17, 2024
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
76 commits
Select commit Hold shift + click to select a range
2d33a51
Add deafult swapping function
ajfriedman22 Sep 4, 2024
ee7cc45
Update replica_exchange_EE.py
ajfriedman22 Sep 5, 2024
a9f46a2
Fix Errors
ajfriedman22 Sep 9, 2024
90976c5
Fix minor error
ajfriedman22 Sep 10, 2024
5532d43
Fix comment typo
ajfriedman22 Sep 10, 2024
12d6959
Remove unnessary prints
ajfriedman22 Sep 10, 2024
71e3d8f
Tweaked README.md and fixed linting errors for run_REXEE.py and repli…
wehs7661 Sep 10, 2024
f85d914
Some work on tweaking the docstrings in coordinate_swap.py
wehs7661 Sep 11, 2024
42c6a0d
Finished tweaking the docstrings in coordinate_swap.py
wehs7661 Sep 11, 2024
9179c62
Fixed some linting errors
wehs7661 Sep 11, 2024
7f2e09b
Fixed all the linting errors! Modified lint.yaml
wehs7661 Sep 11, 2024
e70770c
Added simple unit tests for get_dimensions and compute_angle in coord…
wehs7661 Sep 11, 2024
c5cbd33
Tweaked the docstrings of the new functions in replica_exchange_EE.py
wehs7661 Sep 11, 2024
2ea895b
Fill in missing docstring for add_or_swap function
ajfriedman22 Sep 11, 2024
dbd487a
Merge branch 'pr-57' of https://github.com/wehs7661/ensemble_md into …
wehs7661 Sep 11, 2024
f3fdbfa
Fix error in compute_angle function
ajfriedman22 Sep 11, 2024
5f8080f
Merge branch 'pr-57' of https://github.com/wehs7661/ensemble_md into …
wehs7661 Sep 11, 2024
878121b
Fill in docstring for get_miss_coord
ajfriedman22 Sep 11, 2024
710db92
Fixed linting errors
wehs7661 Sep 11, 2024
b77ab14
Merge branch 'pr-57' of https://github.com/wehs7661/ensemble_md into …
wehs7661 Sep 11, 2024
b6d5fe1
Fixed the unit test for compute_angle
wehs7661 Sep 11, 2024
d366422
Fix minor error for large systems(>10,000 atoms)
ajfriedman22 Sep 18, 2024
59f477d
add test
ajfriedman22 Sep 18, 2024
0aa3ef9
fix error in coordinate_swap find_common function
ajfriedman22 Sep 18, 2024
90edc32
Resolved conflicts of merging into pr-57
wehs7661 Sep 19, 2024
c789ec2
Merge pull request #59 from wehs7661/test_MTREXEE
wehs7661 Sep 19, 2024
046d236
Fix error in pbc shift function in coordinate_swap
ajfriedman22 Sep 19, 2024
b221831
add tests
ajfriedman22 Sep 19, 2024
0451c84
Update tests
ajfriedman22 Sep 19, 2024
5a9587c
Add more tests
ajfriedman22 Sep 19, 2024
3594fbe
Add more tests
ajfriedman22 Sep 19, 2024
23144c8
Fix linting error
ajfriedman22 Sep 19, 2024
25ee50f
Add more tests for coordinate_swap
ajfriedman22 Sep 20, 2024
1ae3b38
Fix linting errors
ajfriedman22 Sep 20, 2024
44d8801
Fix linting errors
ajfriedman22 Sep 20, 2024
12d784f
Add more tests
ajfriedman22 Sep 23, 2024
4c7d8ec
Fix linting errors
ajfriedman22 Sep 23, 2024
d210b89
Add more tests
ajfriedman22 Sep 23, 2024
978ea06
fix linting errors
ajfriedman22 Sep 23, 2024
b2392d7
Add tests for built-in MT-REXEE functions
ajfriedman22 Sep 23, 2024
db64e8b
Fix linting errors
ajfriedman22 Sep 23, 2024
7d67f25
Fix linting errors
ajfriedman22 Sep 23, 2024
5c77dfa
Fix file paths
ajfriedman22 Sep 23, 2024
991cef5
Fix test error
ajfriedman22 Sep 23, 2024
87dc95b
Fix read_top when top files are in a different dir
ajfriedman22 Sep 23, 2024
3b58267
Fix test errors
ajfriedman22 Sep 24, 2024
262ef80
Fix test errors
ajfriedman22 Sep 26, 2024
65b487f
Fix test file paths
ajfriedman22 Sep 27, 2024
47710ce
Fix file path issue in test
ajfriedman22 Sep 30, 2024
1a73e7e
Fix lint error
ajfriedman22 Sep 30, 2024
8ccb59d
Fix test file path
ajfriedman22 Oct 4, 2024
c39916c
Fix tests
ajfriedman22 Oct 7, 2024
3addd41
Move read_top function
ajfriedman22 Oct 7, 2024
c21bd0d
Make compute angle an internal function
ajfriedman22 Oct 7, 2024
15c1fd2
Combine perform shift functions
ajfriedman22 Oct 7, 2024
b7abcdb
Fix linting errors
ajfriedman22 Oct 7, 2024
f17660a
Fix linting errors
ajfriedman22 Oct 7, 2024
01c4de3
Fix test errors
ajfriedman22 Oct 7, 2024
0ef16ce
Merge branch 'master' into pr-57
ajfriedman22 Oct 7, 2024
e7f6919
Improve test coverage for gmx_parser
ajfriedman22 Oct 9, 2024
db3d682
Increase test coverage for set_params
ajfriedman22 Oct 9, 2024
b726199
Improve test coverage and remove redundant code
ajfriedman22 Oct 9, 2024
8b25cd6
Merge branch 'pr-57' of github.com:wehs7661/ensemble_md into pr-57
ajfriedman22 Oct 9, 2024
6546093
Fix linting
ajfriedman22 Oct 9, 2024
713a7b0
Fix linting
ajfriedman22 Oct 9, 2024
18042b5
Fix linting
ajfriedman22 Oct 9, 2024
2fcee10
Revise the GRO file writting step for coordinate modification
ajfriedman22 Oct 10, 2024
cf8b95f
Fix minor error with topology reading
ajfriedman22 Oct 10, 2024
296c181
Fix linting error
ajfriedman22 Oct 10, 2024
7024610
Fix tests
ajfriedman22 Oct 11, 2024
424fec2
Fix tests
ajfriedman22 Oct 11, 2024
867ae17
Generalize file writting and update test
ajfriedman22 Oct 11, 2024
69f307d
Clean-up files and move to internal functions
ajfriedman22 Oct 11, 2024
d635f9d
Improve test coverage
ajfriedman22 Oct 11, 2024
41edb07
Fix linting
ajfriedman22 Oct 11, 2024
98af948
Fix small error
ajfriedman22 Oct 11, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 10 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,16 +12,21 @@ Ensemble Molecular Dynamics
[![Downloads](https://static.pepy.tech/badge/ensemble-md)](https://pepy.tech/project/ensemble-md)


`ensemble_md` is a Python package that provides methods for setting up, running, and analyzing GROMACS simulation ensembles. Currently, the package implements all the necessary algorithms for running synchronous replica exchange (REX) of expanded ensembles (EE), abbreviated as REXEE, as well as its multi-topology (MT) variation, MT-REXEE. Our future work includes implementing asynchronous REXEE and other possible variations of the REXEE method. For installation instructions, theory overview, tutorials, and API references, please visit the [documentation](https://ensemble-md.readthedocs.io/en/latest/?badge=latest) and our [JCTC paper](https://pubs.acs.org/doi/epdf/10.1021/acs.jctc.4c00484).
`ensemble_md` is a Python package that provides methods for setting up, running, and analyzing GROMACS simulation ensembles. Currently, the package implements all the necessary algorithms for running synchronous replica exchange (REX) of expanded ensembles (EE), abbreviated as REXEE, as well as its multi-topology (MT) variation, MT-REXEE. Our future work includes implementing asynchronous REXEE and other possible variations of the REXEE method. For installation instructions, theory overview, tutorials, and API references, please visit the [documentation](https://ensemble-md.readthedocs.io/en/latest/?badge=latest) and our papers (listed in the next section).

### Reference
### References
If you use any components of the Python package `ensemble_md` or the REXEE method in your research, please cite the following paper:

Hsu, W. T., & Shirts, M. R. (2024). Replica Exchange of Expanded Ensembles: A Generalized Ensemble Approach with Enhanced Flexibility and Parallelizability. *Journal of Chemical Theory and Computation*. 20.14 (2024): 6062-6081. doi: [10.1021/acs.jctc.4c00484](https://doi.org/10.1021/acs.jctc.4c00484)
> Hsu, W. T., & Shirts, M. R. (2024). Replica Exchange of Expanded Ensembles: A Generalized Ensemble Approach with Enhanced Flexibility and Parallelizability. *Journal of Chemical Theory and Computation*. 20.14 (2024): 6062-6081. doi: [10.1021/acs.jctc.4c00484](https://doi.org/10.1021/acs.jctc.4c00484)

### Copyright
If you use the MT-REXEE method in your research, please cite the following paper:

Copyright (c) 2022, Wei-Tse Hsu
> Friedman, A. J., Hsu, W. T., & Shirts, M. R. (2024). Multiple Topology Replica Exchange of Expanded Ensembles (MT-REXEE) for Multidimensional Alchemical Calculations. arXiv preprint arXiv:2408.11038. doi: [10.48550/arXiv.2408.11038](
https://doi.org/10.48550/arXiv.2408.11038)

### Authors
- Wei-Tse Hsu, University of Colorado, Boulder ([email protected])
- Anika Friedman, University of Colorado, Boulder ([email protected])


### Acknowledgements
Expand Down
2 changes: 2 additions & 0 deletions ensemble_md/cli/run_REXEE.py
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,8 @@ def main():
os.mkdir(f'{REXEE.working_dir}/sim_{i}/iteration_0')
MDP = REXEE.initialize_MDP(i)
MDP.write(f"{REXEE.working_dir}/sim_{i}/iteration_0/expanded.mdp", skipempty=True)
if REXEE.modify_coords == 'default' and (not os.path.exists('residue_connect.csv') or not os.path.exists('residue_swap_map.csv')): # noqa: E501
REXEE.process_top()

# 2-2. Run the first set of simulations
REXEE.run_REXEE(0)
Expand Down
174 changes: 164 additions & 10 deletions ensemble_md/replica_exchange_EE.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
import ensemble_md
from ensemble_md.utils import utils
from ensemble_md.utils import gmx_parser
from ensemble_md.utils import coordinate_swap
from ensemble_md.utils.exceptions import ParameterError

comm = MPI.COMM_WORLD
Expand Down Expand Up @@ -157,6 +158,8 @@ def set_params(self, analysis):
optional_args = {
"add_swappables": None,
"modify_coords": None,
"resname_list": None,
"swap_rep_pattern": None,
"nst_sim": None,
"proposal": 'exhaustive',
"w_combine": False,
Expand Down Expand Up @@ -254,6 +257,10 @@ def set_params(self, analysis):
raise ParameterError(f"The parameter '{i}' should be a boolean variable.")

params_list = ['add_swappables', 'df_ref']
if self.resname_list is not None:
params_list.append('resname_list')
if self.swap_rep_pattern is not None:
params_list.append('swap_rep_pattern')
for i in params_list:
if getattr(self, i) is not None and not isinstance(getattr(self, i), list):
raise ParameterError(f"The parameter '{i}' should be a list.")
Expand Down Expand Up @@ -441,17 +448,24 @@ def set_params(self, analysis):

# 7-12. External module for coordinate modification
if self.modify_coords is not None:
module_file = os.path.basename(self.modify_coords)
module_dir = os.path.dirname(self.modify_coords)
if module_dir not in sys.path:
sys.path.append(module_dir) # so that the module can be imported
module_name = os.path.splitext(module_file)[0]
module = importlib.import_module(module_name)
if not hasattr(module, module_name):
err_msg = f'The module for coordinate manipulation (specified through the parameter) must have a function with the same name as the module, i.e., {module_name}.' # noqa: E501
raise ParameterError(err_msg)
if self.modify_coords == 'default':
if self.swap_rep_pattern is None and (not os.path.exists('residue_connect.csv') or not os.path.exists('residue_swap_map.csv')): # noqa: E501
raise Exception('swap_rep_pattern option must be filled in if using default swapping function and not swap guide') # noqa: E501
if self.resname_list is None and (not os.path.exists('residue_connect.csv') or not os.path.exists('residue_swap_map.csv')): # noqa: E501
raise Exception('resname_list option must be filled in if using default swapping function and not swap guide') # noqa: E501
self.modify_coords_fn = self.default_coords_fn
else:
self.modify_coords_fn = getattr(module, module_name)
module_file = os.path.basename(self.modify_coords)
module_dir = os.path.dirname(self.modify_coords)
if module_dir not in sys.path:
sys.path.append(module_dir) # so that the module can be imported
module_name = os.path.splitext(module_file)[0]
module = importlib.import_module(module_name)
if not hasattr(module, module_name):
err_msg = f'The module for coordinate manipulation (specified through the parameter) must have a function with the same name as the module, i.e., {module_name}.' # noqa: E501
raise ParameterError(err_msg)
else:
self.modify_coords_fn = getattr(module, module_name)
else:
self.modify_coords_fn = None

Expand Down Expand Up @@ -1496,3 +1510,143 @@ def run_REXEE(self, n, swap_pattern=None):
# want it to start parsing the dhdl file (in the if condition of if rank == 0) of simulation 3 being run by
# rank 3 that has not been generated, which will lead to an I/O error.
comm.barrier()

def default_coords_fn(self, molA_file_name: str, molB_file_name: str):
"""
Swap coordinates between two GRO files

Parameters
----------
molA_file_name : str
GRO file name for the moleucle to be swapped
molB_file_name : str
GRO file name for the other moleucle to be swapped
Return
------
None
"""
# Step 1: Load necessary files
import mdtraj as md
import pandas as pd

# Determine name for transformed residue
molA_dir = molA_file_name.rsplit('/', 1)[0] + '/'
molB_dir = molB_file_name.rsplit('/', 1)[0] + '/'

# Load trajectory trr for higher precison coordinates
molA = md.load_trr(f'{molA_dir}/traj.trr', top=molA_file_name).slice(-1) # Load last frame of trr trajectory
molB = md.load_trr(f'{molB_dir}/traj.trr', top=molB_file_name).slice(-1)

# Load the coordinate swapping map
connection_map = pd.read_csv('residue_connect.csv')
swap_map = pd.read_csv('residue_swap_map.csv')

# Step 2: Read the GRO input coordinate files and open temporary Output files
molA_file = open(molA_file_name, 'r').readlines() # open input file
molB_new_file_name = 'B_hybrid_swap.gro'
molB_new = open(molB_new_file_name, 'w')
molB_file = open(molB_file_name, 'r').readlines() # open input file
molA_new_file_name = 'A_hybrid_swap.gro'
molA_new = open(molA_new_file_name, 'w')

# Step 3: Determine atoms for alignment and swapping
nameA = coordinate_swap.deter_res(molA.topology, swap_map['Swap A'].to_list() + swap_map['Swap B'].to_list())
nameB = coordinate_swap.deter_res(molB.topology, swap_map['Swap A'].to_list() + swap_map['Swap B'].to_list())
df_atom_swap = coordinate_swap.deter_common(molA_file, molB_file, nameA, nameB)

# Step 4: Fix break if present for solvated systems only
if len(molA.topology.select('water')) != 0:
A_dimensions = coordinate_swap.get_dimensions(molA_file)
B_dimensions = coordinate_swap.get_dimensions(molB_file)
molA = coordinate_swap.fix_break(molA, nameA, A_dimensions, connection_map[connection_map['Resname'] == nameA]) # noqa: E501
molB = coordinate_swap.fix_break(molB, nameB, B_dimensions, connection_map[connection_map['Resname'] == nameB]) # noqa: E501

# Step 5: Determine coordinates of atoms which need to be reconstructed as we swap coordinates between molecules # noqa: E501
miss_B = df_atom_swap[(df_atom_swap['Swap'] == 'B2A') & (df_atom_swap['Direction'] == 'miss')]['Name'].to_list() # noqa: E501
miss_A = df_atom_swap[(df_atom_swap['Swap'] == 'A2B') & (df_atom_swap['Direction'] == 'miss')]['Name'].to_list() # noqa: E501
if len(miss_B) != 0:
df_atom_swap = coordinate_swap.get_miss_coord(molB, molA, nameB, nameA, df_atom_swap, 'B2A', swap_map[(swap_map['Swap A'] == nameB) & (swap_map['Swap B'] == nameA)]) # noqa: E501
if len(miss_A) != 0:
df_atom_swap = coordinate_swap.get_miss_coord(molA, molB, nameA, nameB, df_atom_swap, 'A2B', swap_map[(swap_map['Swap A'] == nameA) & (swap_map['Swap B'] == nameB)]) # noqa: E501

# Reprint preamble text
line_start = coordinate_swap.print_preamble(molA_file, molB_new, len(miss_B), len(miss_A))

# Print new coordinates to file for molB
coordinate_swap.write_new_file(df_atom_swap, 'A2B', 'B2A', line_start, molA_file, molB_new, nameA, nameB, copy.deepcopy(molA.xyz[0]), miss_A) # noqa: E501

# Print new coordinates to file
# Reprint preamble text
line_start = coordinate_swap.print_preamble(molB_file, molA_new, len(miss_A), len(miss_B))

# Print new coordinates for molA
coordinate_swap.write_new_file(df_atom_swap, 'B2A', 'A2B', line_start, molB_file, molA_new, nameB, nameA, copy.deepcopy(molB.xyz[0]), miss_B) # noqa: E501

# Rename temp files
os.rename('A_hybrid_swap.gro', molB_dir + '/confout.gro')
os.rename('B_hybrid_swap.gro', molA_dir + '/confout.gro')

def process_top(self):
"""
Process the input topologies in order to determine the atoms for alignment in the default GRO swapping
function. Output as csv files to prevent needing to re-run this step.

Parameters
----------
None
Return
------
None
"""
import pandas as pd

if not os.path.exists('residue_connect.csv'):
df_top = pd.DataFrame()
for f, file_name in enumerate(self.top):
# Read file
input_file = coordinate_swap.read_top(file_name, self.resname_list[f])

# Determine the atom names corresponding to the atom numbers
start_line, atom_name, state = coordinate_swap.get_names(input_file)

# Determine the connectivity of all atoms
connect_1, connect_2, state_1, state_2 = [], [], [], [] # Atom 1 and atom 2 which are connected and which state they are dummy atoms # noqa: E501
for l, line in enumerate(input_file[start_line:]): # noqa: E741
line_sep = line.split(' ')
if line_sep[0] == ';':
continue
if line_sep[0] == '\n':
break
while '' in line_sep:
line_sep.remove('')
connect_1.append(atom_name[int(line_sep[0])-1])
connect_2.append(atom_name[int(line_sep[1])-1])
state_1.append(state[int(line_sep[0])-1])
state_2.append(state[int(line_sep[1])-1])
df = pd.DataFrame({'Resname': self.resname_list[f], 'Connect 1': connect_1, 'Connect 2': connect_2, 'State 1': state_1, 'State 2': state_2}) # noqa: E501
df_top = pd.concat([df_top, df])
df_top.to_csv('residue_connect.csv')
else:
df_top = pd.read_csv('residue_connect.csv')

if not os.path.exists('residue_swap_map.csv'):
df_map = pd.DataFrame()
for swap in self.swap_rep_pattern:
# Determine atoms not present in both molecules
X, Y = [int(swap[0][0]), int(swap[1][0])]
lam = {X: int(swap[0][1]), Y: int(swap[1][1])}
for A, B in zip([X, Y], [Y, X]):
input_A = coordinate_swap.read_top(self.top[A], self.resname_list[A])
start_line, A_name, state = coordinate_swap.get_names(input_A)
input_B = coordinate_swap.read_top(self.top[B], self.resname_list[B])
start_line, B_name, state = coordinate_swap.get_names(input_B)

A_only = [x for x in A_name if x not in B_name]
B_only = [x for x in B_name if x not in A_name]

# Seperate real to dummy switches
df = coordinate_swap.deter_connection(A_only, B_only, self.resname_list[A], self.resname_list[B], df_top, lam[A]) # noqa: E501

df_map = pd.concat([df_map, df])

df_map.to_csv('residue_swap_map.csv')
Loading
Loading