Note this is the Flatiron Institute fork of the original ManifoldEM. This particular fork is maintained by Robert Blackwell and Sonya Hanson. This work has made significant contributions in refactoring, code cleanup, optimization, portability, standardization, and distribution since the fork.
This repository contains the Python software implementation of ManifoldEM for determination of conformational continua of macromolecules from single-particle cryo-EM data, as was first introduced by Dashti, et al. (2014). A detailed user manual is provided here. Carefully going through this manual will prepare you for running ManifoldEM on your own data sets. If you have any questions about ManifoldEM after reading this entire document, carefully check this GitHub forum for similar inquiries or, if no similar posts exist, create a new thread detailing your inquiry.
This software was initially developed in the Frank research group at Columbia University (https://joachimfranklab.org) in collaboration with members from UWM (see below). The following resources may prove useful for a review of ManifoldEM history, theory and implementations:
- Dashti, A. et al. Trajectories of the ribosome as a Brownian nanomachine. PNAS, 2014.
- Dashti, A. et al. Retrieving functional pathways of biomolecules from single-particle snapshots. Nature Communications, 2020.
- Mashayekhi, G. ManifoldEM Matlab repository. https://github.com/GMashayekhi/ManifoldEM_Matlab
- Seitz, E. et al. Geometric machine learning informed by ground truth: Recovery of conformational continuum from single-particle cryo-EM data of biomolecules. bioRxiv, 2021.
ManifoldEM should be installable in any modern python/conda environment (python 3.9+, though the
mayavi and pyqt packages don't always immediately work with the most recent version of python).
python:
# create virtual environment. feel free to change the path!
python3 -m venv ~/envs/manifoldem
source ~/envs/manifoldem/bin/activate
pip install --upgrade pip
pip install "git+ssh://git@github.com/flatironinstitute/ManifoldEM"
manifold-gui
conda:
conda create -n manifoldem
conda activate manifoldem
conda install mayavi pyqt=5 python=3.10 -c conda-forge
pip install "git+ssh://git@github.com/flatironinstitute/ManifoldEM"
manifold-gui
Note that installing with pip inside a conda environment bypasses conda's package management system
and can lead to problems if you later install packages into this environment with conda install.
It's recommended to keep an environment purely for ManifoldEM.
Some environments, such as X forwarding and most VNC/virtual desktop setups, might not allow
hardware 3D acceleration. To work around this, you can disable all 3D visualization widgets in the
GUI by providing the -V flag to manifold-gui:
manifold-gui -V
For most steps in the ManifoldEM pipeline, the GUI is unnecessary and sometimes even burdensome. For
this reason we supply a basic command line interface. The CLI allows the user to make more granular
steps through the analysis pipeline and is generally more useful for cluster/remote environments and
debugging. All CLI invocations start with the program manifold-cli. If you run manifold-cli with
no arguments, it will print a help message and exit.
% manifold-cli
ManifoldEM version: 0.2.0b1.dev190+g447ab76.d20231109
usage: manifold-cli [-h] [-n NCPU] {init,threshold,calc-distance,manifold-analysis,psi-analysis,nlsa-movie,find-ccs,energy-landscape,trajectory} ...
Command-line interface for ManifoldEM package
positional arguments:
{init,threshold,calc-distance,manifold-analysis,psi-analysis,nlsa-movie,find-ccs,energy-landscape,trajectory}
init Initialize new project
threshold Set upper/lower thresholds for principal direction detection
calc-distance Calculate S2 distances
manifold-analysis Initial embedding
psi-analysis Analyze images to get psis
nlsa-movie Create 2D psi movies
find-ccs Find conformational coordinates
energy-landscape Calculate energy landscape
trajectory Calculate trajectory
options:
-h, --help show this help message and exit
-n NCPU, --ncpu NCPU
The output shows that there are nine sub-commands, listed in the order they belong in the
pipeline. Some commands support additional arguments, especially the init command, which creates a
new project in your current working directory. To see how to use a given command, simply run the
command followed by the -h flag, e.g.
% manifold-cli init -h
ManifoldEM version: 0.2.0b1.dev190+g447ab76.d20231109
usage: manifold-cli init [-h] -p PROJECT_NAME [-v AVG_VOLUME] [-a ALIGNMENT] [-i IMAGE_STACK] [-m MASK_VOLUME] -s PIXEL_SIZE -d DIAMETER -r RESOLUTION [-x APERTURE_INDEX]
[-o]
options:
-h, --help show this help message and exit
-p PROJECT_NAME, --project-name PROJECT_NAME
Name of project to create
-v AVG_VOLUME, --avg-volume AVG_VOLUME
-a ALIGNMENT, --alignment ALIGNMENT
-i IMAGE_STACK, --image-stack IMAGE_STACK
-m MASK_VOLUME, --mask-volume MASK_VOLUME
-s PIXEL_SIZE, --pixel-size PIXEL_SIZE
-d DIAMETER, --diameter DIAMETER
-r RESOLUTION, --resolution RESOLUTION
-x APERTURE_INDEX, --aperture-index APERTURE_INDEX
-o, --overwrite Replace existing project with same name automatically
An example invocation then might look like
manifold-cli init -v J310/J310_003_volume_map.mrc -a J310/from_csparc.star -i J310/signal_subtracted.mrcs -s 1.22 -d 160.0 -r 3.02 -x 1 -p my_J310_analysis
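For scripted setups, the same invocation can be assembled in Python. This is a minimal sketch, not part of ManifoldEM: the helper name build_init_command and its keyword names are hypothetical, while the flags and the required/optional split are read off the usage text above (options shown without square brackets are treated as required).

```python
import shlex

# Flags copied from the `manifold-cli init -h` usage text.
# Options printed without [brackets] in the usage line are treated as required.
REQUIRED = {"project_name": "-p", "pixel_size": "-s", "diameter": "-d", "resolution": "-r"}
OPTIONAL = {
    "avg_volume": "-v",
    "alignment": "-a",
    "image_stack": "-i",
    "mask_volume": "-m",
    "aperture_index": "-x",
}


def build_init_command(overwrite=False, **options):
    """Return the argv list for `manifold-cli init` (hypothetical helper)."""
    missing = sorted(name for name in REQUIRED if name not in options)
    if missing:
        raise ValueError(f"missing required options: {', '.join(missing)}")
    unknown = sorted(set(options) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError(f"unknown options: {', '.join(unknown)}")

    argv = ["manifold-cli", "init"]
    for name, flag in {**OPTIONAL, **REQUIRED}.items():
        if name in options:
            argv += [flag, str(options[name])]
    if overwrite:
        argv.append("-o")  # replace an existing project with the same name
    return argv


argv = build_init_command(
    avg_volume="J310/J310_003_volume_map.mrc",
    alignment="J310/from_csparc.star",
    image_stack="J310/signal_subtracted.mrcs",
    pixel_size=1.22,
    diameter=160.0,
    resolution=3.02,
    aperture_index=1,
    project_name="my_J310_analysis",
)
print(shlex.join(argv))  # pass argv to subprocess.run(argv, check=True) to execute
```

Failing early on a missing required option avoids launching the CLI only to have it exit with a usage error.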
The rest of the commands take as their final argument the "toml" file generated by the
initialization step. As an example, let's next set the image count thresholds and calculate the
distance matrices for the leading five eigenvalue decompositions ("psis"). Note the -n 16 before
the sub-command. Most processing steps support this option, which specifies how many workers to use
in processing. In most cases you want roughly the output of the nproc command. My workstation has
16 physical cores, so I specify 16 below for the matrix calculation step. Supplying -n for
commands that don't support parallel processing is harmless.
% manifold-cli threshold --prd_thres_low 100 --prd_thres_high 4000 params_my_J310_analysis.toml
ManifoldEM version: 0.2.0b1.dev190+g447ab76.d20231109
% manifold-cli -n 16 calc-distance --num_psi 5 params_my_J310_analysis.toml
ManifoldEM version: 0.2.0b1.dev190+g447ab76.d20231109
Computing the distances...
Calculating projection direction information
RELION Optics Group found.
Number of PDs: 132
Neighborhood epsilon: 0.053387630212191464
Number of Graph Edges: (926, 2)
Performing connected component analysis.
Number of connected components: 2
Number of Graph Edges: (485, 2)
Number of Graph Edges: (441, 2)
100%|████████████████| 132/132 [01:09<00:00, 1.89it/s]
This creates a significant amount of data in output/my_J310_analysis/distances, with one file for
each principal direction. Currently there isn't much tooling to visualize these outputs, though
that is a work in progress. Each file is a python pickle file and can be inspected using the usual
python tooling by the curious user.
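As a rough illustration of "the usual python tooling", the sketch below loads one pickle file and summarizes its top-level contents. The layout of the real distance files is not documented here, so the example runs on a stand-in file whose name and keys (D, ind, n_images) are invented for the demo. Loading the real files will likely require the same environment ManifoldEM is installed in, and pickle.load can execute arbitrary code, so only open files you generated yourself.

```python
import pickle
import tempfile
from pathlib import Path


def describe(value):
    """One-line description of a value: its type plus shape, length, or repr."""
    shape = getattr(value, "shape", None)  # array-like objects such as ndarrays
    if shape is not None:
        return f"{type(value).__name__} shape={tuple(shape)}"
    if isinstance(value, (list, tuple, dict, set)):
        return f"{type(value).__name__} len={len(value)}"
    return f"{type(value).__name__} value={value!r}"


def summarize_pickle(path):
    """Load a pickle file and describe its top-level contents."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)  # only load pickles from a trusted source
    if isinstance(obj, dict):
        return {str(key): describe(value) for key, value in obj.items()}
    return {"<root>": describe(obj)}


# Stand-in file: the keys below are hypothetical, not ManifoldEM's real layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_distances.pkl"
    with open(demo_path, "wb") as fh:
        pickle.dump({"D": [[0.0, 1.5], [1.5, 0.0]], "ind": [3, 7], "n_images": 2}, fh)
    summary = summarize_pickle(demo_path)

for key, text in summary.items():
    print(f"{key}: {text}")
```

To look at real output, point summarize_pickle at one of the files under output/my_J310_analysis/distances.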
Let's finish up the first major stage of the pipeline. I'm hiding the output for clarity's sake.
% manifold-cli -n 16 manifold-analysis params_my_J310_analysis.toml &> /dev/null
% manifold-cli -n 16 psi-analysis params_my_J310_analysis.toml &> /dev/null
% manifold-cli -n 16 nlsa-movie params_my_J310_analysis.toml &> /dev/null
At this point, if you want to visualize and manually manipulate the principal directions and
associated data, you can simply run manifold-gui -R params_my_J310_analysis.toml. There you can set
the anchor directions, manually control the sense of each direction, remove directions, and so
on. Once you hit the "Compile Results" command, you can continue using the command
line. Here I set a few anchors and will continue on...
% manifold-cli -n 16 find-ccs params_my_J310_analysis.toml &> /dev/null
% manifold-cli -n 16 energy-landscape params_my_J310_analysis.toml &> /dev/null
% manifold-cli -n 16 trajectory params_my_J310_analysis.toml &> /dev/null
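On a cluster it can be convenient to drive these stages from a script. The following is a sketch under stated assumptions rather than a supported tool: the helper names are hypothetical, the extra options for threshold and calc-distance are simply the values used in the walkthrough above, and os.cpu_count() is used as a rough stand-in for nproc. The stages can be run in two batches so that the optional GUI step (setting anchors) can happen in between.

```python
import os
import shlex
import subprocess

# Sub-commands after `init`, in the order printed by `manifold-cli -h`.
# The extra options are the example values from the walkthrough, not defaults.
STAGES = [
    ("threshold", ["--prd_thres_low", "100", "--prd_thres_high", "4000"]),
    ("calc-distance", ["--num_psi", "5"]),
    ("manifold-analysis", []),
    ("psi-analysis", []),
    ("nlsa-movie", []),
    ("find-ccs", []),
    ("energy-landscape", []),
    ("trajectory", []),
]


def pipeline_commands(toml_path, ncpu=None, first="threshold", last="trajectory"):
    """Return argv lists for the stages from `first` through `last`, inclusive."""
    ncpu = ncpu or os.cpu_count() or 1  # logical CPU count, roughly what nproc reports
    names = [name for name, _ in STAGES]
    lo, hi = names.index(first), names.index(last)  # ValueError on an unknown stage
    return [
        ["manifold-cli", "-n", str(ncpu), name, *extra, str(toml_path)]
        for name, extra in STAGES[lo : hi + 1]
    ]


def run_pipeline(toml_path, dry_run=True, **kwargs):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in pipeline_commands(toml_path, **kwargs):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing stage


commands = pipeline_commands("params_my_J310_analysis.toml", ncpu=16)

# Dry run of the first batch; the GUI step would follow before `find-ccs`.
run_pipeline("params_my_J310_analysis.toml", ncpu=16, last="nlsa-movie")
```

Passing -n to every stage, including threshold, relies on the statement above that the flag is harmless for commands without parallel support.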
Researchers and the curious may be interested in what happens "under the hood" in ManifoldEM.
We've provided a basic interface for accessing some of the internal data calculated or written
during various stages of the ManifoldEM pipeline. Most internal state isn't fully documented and
is likely to change, so it might take some sleuthing and educated guesswork to figure out what
you're actually looking at, and that data might disappear in future versions. Regardless, all
exposed data can be accessed via the data_store API, which is documented and accessible via the
python help interface. An example ipython session:
In [1]: from ManifoldEM.data_store import data_store
...: from ManifoldEM.params import params
...: params.load('params_J310.toml')
...: prd = data_store.get_prd_data(117)
...: npix = params.ms_num_pixels
...: print(prd)
prd_index: 117
S2_bin_index: 17310
bin_center: [ 0.01355376 -0.00546575 0.9998932 ]
n_images: 216
occupancy: 216
trash: False
anchor: True
cluster_id: 0
In [2]: help(prd)
...
class PrdData(builtins.object)
| PrdData(prd_index: int)
|
| Represents a single projection direction, providing access to its raw and transformed images, CTF images, and metadata.
|
| Attributes
| ----------
| info : PrdInfo
| Metadata about the projection direction.
| raw_images : ndarray
| The raw images from the image stack associated with the projection direction.
| transformed_images : ndarray
| The filtered and "in-plane" rotated images associated with the projection direction.
| ctf_images : ndarray
| The Contrast Transfer Function (CTF) images associated with the projection direction.
| psi_data : dict
| The embedding data associated with the projection direction.
| EL_data : dict
| The NLSA/eigenvalue data associated with the projection direction.
| dist_data : dict
| The distance information between images in the projection direction, including transformed images
| in the `transformed_images` attribute.
|
| ...
|
| ----------------------------------------------------------------------
| Readonly properties defined here:
|
| EL_data
|
| ctf_images
|
| info
|
| psi_data
|
| raw_images
|
| transformed_images
|
| ----------------------------------------------------------------------
...
In [3]: import matplotlib.pyplot as plt
In [4]: plt.imshow(prd.EL_data['IMG1'][:,-1].reshape(npix, npix))
Out[4]: <matplotlib.image.AxesImage at 0x7fde924838b0>
In [5]: plt.show()
In [6]: prd.EL_data.keys()
Out[6]: dict_keys(['IMG1', 'IMGT', 'posPath', 'PosPsi1', 'psirec', 'tau', 'psiC1', 'mu', 'VX', 'sdiag', 'Topo_mean', 'tauinds'])
In [7]: prd.psi_data.keys()
Out[7]: dict_keys(['lamb', 'psi', 'sigma', 'mu', 'posPath', 'ind', 'logEps', 'logSumWij', 'popt', 'R_squared'])
In [8]: prd.raw_images.shape
Out[8]: (216, 192, 192)
In [9]: plt.imshow(prd.raw_images[10]); plt.show()
In [10]: plt.imshow(prd.transformed_images[10]); plt.show()
In [11]: prd.transformed_images.shape
Out[11]: (216, 192, 192)
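The expression in In [4] of the session above takes the last column of EL_data['IMG1'] and reshapes it to npix by npix. That suggests each column holds one image flattened in row-major order, though this is an inference from the session and not documented behavior. A NumPy-free sketch of the same indexing on toy data (the helper name is hypothetical):

```python
def last_column_as_image(matrix, npix):
    """Pure-Python analogue of matrix[:, -1].reshape(npix, npix)."""
    column = [row[-1] for row in matrix]  # matrix[:, -1]
    if len(column) != npix * npix:
        raise ValueError(f"expected {npix * npix} rows, got {len(column)}")
    # Row-major reshape: flat index k lands at (k // npix, k % npix).
    return [column[r * npix : (r + 1) * npix] for r in range(npix)]


# Toy matrix: 4 rows (a flattened 2 x 2 image) and 3 columns.
toy = [
    [0, 10, 20],
    [1, 11, 21],
    [2, 12, 22],
    [3, 13, 23],
]
image = last_column_as_image(toy, npix=2)
print(image)  # [[20, 21], [22, 23]]
```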
Original ManifoldEM Python team (alphabetically ordered):
- Ali Dashti, University of Wisconsin-Milwaukee
- Joachim Frank, Columbia University
- Hstau Liao, Columbia University
- Suvrajit Maji, Columbia University
- Ghoncheh Mashayekhi, University of Wisconsin-Milwaukee
- Abbas Ourmazd, University of Wisconsin-Milwaukee
- Peter Schwander, University of Wisconsin-Milwaukee
- Evan Seitz, Columbia University
The original individual author contributions are usually provided in the headers of each source file, or in the functions. While reasonable effort has been made to retain copyright notices for individual contributions from the source material, significant refactorings have made some individual contributions hard to track or ultimately meaningless.
If you find this code useful in your work, please cite
{E. Seitz et al., "ManifoldEM Python repository," Zenodo, 2021, doi: 10.5281/zenodo.5578874}
ManifoldEM Copyright (C) 2020-2023 Robert Blackwell, Ali Dashti, Joachim Frank, Sonya Hanson, Hstau Liao, Suvrajit Maji, Ghoncheh Mashayekhi, Abbas Ourmazd, Peter Schwander, Evan Seitz
The software, code sample and their documentation made available on this repository could include technical or other mistakes, inaccuracies or typographical errors. We may make changes to this software or documentation at any time without prior notice, and we assume no responsibility for errors or omissions therein.
For further details, please see the LICENSE file.