
All test code can be found in $SPIPY_HOME/test_spipy/*


1. io example

from spipy.image import io

(1) convert a PDB file to an electron density map

density = io.pdb2density(pdb_file='1N0U1.pdb', resolution=1.5)

(2) write/read ccp4

io.writeccp4(volume=volume, save_file='1N0U1.ccp4')

(3) read a PDB file (atom coordinates, indices and masses only)

atoms = io._readpdb(pdb_file='$YOUR_PDB_FILE')

(4) write xyz coordinates into a pdb file

import numpy as np

# write 3 atoms to a pdb file
xyz_array = np.array([[0,0,0],[1,1,1],[-1,-1,1]])
atom_type = ['C','O','O']
io.xyz2pdb(xyz_array, atom_type, None, './carbon-dioxide.pdb')

(5) parse a CXI/HDF5 file and print its structure to help you find the part you are interested in; it is essentially the same as running h5dump -N xxx.h5

io.cxi_parser('cxilr2816-r0024-c00.cxi', out='std')

(figure: cxi_parser output)


2. pre-process example

from spipy.image import preprocess

(1) hit finding

data = ...
background = ...
mask = ...
center = [center_x, center_y]
hits = preprocess.hit_find(dataset=data, background=background, radii_range=[center_x, center_y, 10, 100], mask=mask, cut_off=10)

(2) fix artifacts and convert ADU to photon counts

import numpy as np

data = np.load("test_adu.npy")
artifact = np.load("artifacts.npy")
artifact_loc = np.where(artifact == 1)
artifact = np.vstack((artifact_loc[0], artifact_loc[1])).T
mask = np.load("mask.npy")
fix_data = preprocess.fix_artifact(dataset=data, estimated_center=np.array(data[0].shape)/2, artifacts=artifact, mask=mask)
adu, newdata = preprocess.adu2photon(dataset=fix_data, mask=mask, photon_percent=0.1, nproc=1, transfer=True, force_poisson=False)

(3) fix artifacts when no artifact locations are provided

# make sure the input dataset is large enough so that the number of patterns in each bin is no less than 100
newdata = preprocess.fix_artifact_auto(dataset=data, estimated_center=np.array(data[0].shape)/2, njobs=2, mask=mask, vol_of_bins=50)

NOTICE: to use this function, you should do classification first! (see the sketch below)

(figure: fix_artifact_auto result)
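
Since classification is needed before this step, here is a minimal sketch of one possible combined workflow (an assumption, reusing cluster_fSpec from section 3; which cluster label marks the usable patterns has to be checked manually):

import numpy as np
from spipy.image import classify, preprocess

# a sketch (assumption): classify patterns first, then run automatic artifact fixing on the selected class only
d_comp, predict = classify.cluster_fSpec(data, mask=mask, low_filter=0.3, decomposition='SVD', ncomponent=2)
selected = data[np.asarray(predict) == 1]   # which label is the "good" class needs manual inspection
newdata = preprocess.fix_artifact_auto(dataset=selected, estimated_center=np.array(data[0].shape)/2, njobs=2, mask=mask, vol_of_bins=50)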

(4) particle size estimation (only supports sphere-like particles)

# you can use the "lsq" or "q0" method
from spipy.analyse import saxs

D = saxs.particle_size_sp(dataset=dataset, exparam=[581,7.5,0.55], fitarea=[10,40], badsearchr=60, method="lsq", mask=mask, center=cen, verbose=True)

[Explain]: the returned diameters may be <= 0 (see figure below), which means the corresponding patterns cannot be estimated correctly

(figure: particle_size_sp result)
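
If you need to drop those failed estimations afterwards, a tiny numpy-only sketch:

import numpy as np

# keep only the patterns whose estimated diameter is positive
D = np.asarray(D)
valid_idx = np.where(D > 0)[0]
valid_D = D[valid_idx]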


3. classification example

from spipy.image import classify

One thing to mention: the input dataset should contain both single-hit and non-single-hit patterns

(1) high dimensional embedding and spectral clustering

# return d_comp (data after embedding) and predict (label prediction)
# (a) local linear embedding (LLE)
d_comp, predict = classify.cluster_fSpec(data, mask=mask, low_filter=0.3, decomposition='LLE', ncomponent=2, nneighbors=5, LLEmethod='standard')
# (b) singular-value decomposition (SVD)
d_comp, predict = classify.cluster_fSpec(data, mask=mask, low_filter=0.3, decomposition='SVD', ncomponent=2)
# (c) spectral embedding (SpecEM)
d_comp, predict = classify.cluster_fSpec(data, mask=mask, low_filter=0.3, decomposition='SpecEM', ncomponent=2)

You can get test outputs from a small dataset (50 patterns) provided in the package (figure: embedding results). Usually the SVD method reaches a precision of 70~90%, and the other two reach 70~100%.

(2) t-SNE embedding with k-means

d_comp, predict = classify.cluster_fTSNE(data, mask=mask, low_filter=0.3, no_dims=2, perplexity=5, use_pca=True, initial_dims=20, max_iter=500, theta=0.5, randseed=-1, verbose=True)

(figure: t-SNE clustering result)

In most cases, the stability of the t-SNE embedding is strongly affected by its parameters, and the accuracy varies from 60% to 100%. Also note that setting 'use_pca' does not guarantee better results.
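
If the embedding is unstable, one simple option (just a sketch, not an official recipe of the package) is to scan a few perplexity values and keep the run that separates the classes best:

# try several perplexity values; judge the best one from d_comp / predict yourself
for perp in (5, 10, 30):
    d_comp, predict = classify.cluster_fTSNE(data, mask=mask, low_filter=0.3, no_dims=2, perplexity=perp, use_pca=False, initial_dims=20, max_iter=500, theta=0.5, randseed=-1, verbose=False)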


4. merge

from spipy.merge import emc

(1) merge patterns using emc algorithm

# set up project work dir
fpath = '../test_pattern.h5'    # input dataset of patterns (hdf5 file)
emc.new_project(data_path = fpath, inh5 = "patterns", path = "./", name = None)
# configure parameters
config_essential = {'parameters|detd' : 581, 'parameters|lambda' : 7.9, \
					'parameters|detsize' : '260 257', 'parameters|pixsize' : 0.3, \
					'parameters|stoprad' : 40, 'parameters|polarization' : 'x', \
					'emc|num_div' : 10, 'emc|need_scaling' : 1, \
					'emc|beta' : 0.006, 'emc|beta_schedule' : '1.414 10' }
config_optional = {'parameters|ewald_rad' : '650.', 'make_detector|in_mask_file' : 'sample_mask.byt', \
					'emc|sym_icosahedral' : 0, 'emc|selection' : 'None', \
					'emc|start_model_file' : 'None'}
params = dict(config_essential, **config_optional)
emc.config(params)
# run, generate a job-submission script (cluster=True)
emc.run(num_proc=8, num_thread=12, iters=5, nohup=False, resume=False, cluster=True)

NOTICE: For explanation, all parameters are listed here. You don't need to do this yourself; just change the parameters whose default values do not suit your problem, or copy this code into your program directly.

Then a directory named "emc_01" is generated in the current path. Inside it there is a file named submit_job.sh, which you can modify and submit to your job management system (such as PBS using 'qsub'):

(figure: EMC merging results)

Switch to the project directory and run 'python autoplot.py' to see the merging results:

(figure: autoplot merging results)

(2) Utility functions for merging

from spipy.merge import utils as merge
import numpy as np

# cd to ./test_spipy/merge
quats = merge.get_quaternion(Num_level=20)[np.random.choice(16000,2000,replace=False)]
model_0 = np.fromfile('../phase/3dvolume.bin').reshape((125,125,125))
mask = np.zeros((81,81))
mask[39:42,:] = 1
# generate 2000 slices from model_0
slices = merge.get_slice(model=model_0, quaternions=quats[:2000,:4], det_size=[81,81], det_center=None, mask=mask)
# merge those 2000 slices into a new model
model_1 = np.zeros(model_0.shape)
merge.merge_slice(model=model_1, quaternions=quats[:2000,:4], slices=slices, weights=None, det_center=None, mask=mask)
# calculate the poisson likelihood between a slice and an experimental pattern
# 'pat' is an experimental pattern with size=(81,81)
R_jk = merge.poisson_likelihood(W_j=slices[0], K_k=pat, beta=5, weight=None)

In this case, get_slice costs about 3~5 ms per slice and merge_slice about 5~9 ms per slice.
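
As a small follow-up (an illustration only, not an spipy API, and assuming poisson_likelihood returns a scalar score), you can score a pattern against every generated slice and pick the most likely orientation:

# score 'pat' against all slices and take the most probable quaternion
R_all = np.array([merge.poisson_likelihood(W_j=s, K_k=pat, beta=5, weight=None) for s in slices])
best_quat = quats[np.argmax(R_all), :4]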

(figures: merge utils and likelihood results)

5. phase retrieval

from spipy.phase import phase2d
from spipy.phase import phase3d

(1) phasing patterns

# configure parameters
# Again, you don't need to list all parameters in your program;
# just change parameters whose default values are not proper for your problem,
# or copy this code into your program directly

# we repeat 10*2=20 times
params_essential = {'input|shape' : '123,123', 'input|padd_to_pow2' : True, \
		'input|inner_mask' : 6, 'input|outer_mask' : 64, \
		'input|outer_outer_mask' : None, 'input|mask_edges' : True, \
		'phasing|repeats' : 10, 'phasing|iters' : '100RAAR 200DM 200ERA', \
		'phasing_parameters|support_size' : 100, 'phasing_parameters|beta' : 0.8}
params_optional = {'input|subtract_percentile' : None, 'input|spherical_support' : None, \
		'phasing_parameters|background' : 'True', 'input|init_model' : None}

# create project, configure and run
phase2d.new_project(data_mask_path=['pattern.bin','pat_mask.npy'], path='./', name=None)
parameters = dict(params_essential, **params_optional)
phase2d.config(params = parameters)
phase2d.run(num_proc=2,nohup=True,cluster=False)   # run in the background, change nohup=False if you want it to print log in terminal

Then a new directory named "phase2d_1" is created in the current path, and inside it the 'phase.log' file records the real-time log of the phasing procedure.

Here in "params", 'phasing|repeats' is the number of repeats for each process (not the total number of repeats); with num_proc=2 and repeats=10 the total is 20.

After finishing, an 'output.h5' file will be generated. Run:

python show_result.py output.h5

to see the phasing results (in the fifth figure, PRTF_rav, the q values are actually in pixels because we do not supply experiment parameters).
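
If you want to inspect 'output.h5' yourself, a minimal h5py sketch that only prints the file structure (its internal dataset names are not documented here):

import h5py

# print every group / dataset inside the phasing output file
with h5py.File('output.h5', 'r') as fp:
    fp.visititems(lambda name, obj: print(name, obj))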

(figure: phase2d results)

(2) phasing merged scattering volume

# It is very similar to phase2d
params_essential = {'input|shape' : '125,125,125', 'input|padd_to_pow2' : True, \
		'input|inner_mask' : 4, 'input|outer_mask' : 55, \
		'input|outer_outer_mask' : 64, 'input|mask_edges' : True, \
		'phasing|repeats' : 10, 'phasing|iters' : '100RAAR 200DM 200ERA', \
		'phasing_parameters|voxel_number' : 2000, 'phasing_parameters|beta' : 0.8}
params_optional = {'input|subtract_percentile' : None, 'input|spherical_support' : None, \
		'phasing_parameters|background' : 'True', 'input|init_model' : None}

phase3d.new_project(data_path='3dvolume.bin', mask_path=None, path='./', name='mytest')
parameters = dict(params_essential, **params_optional)
phase3d.config(params = parameters)
phase3d.run(num_proc=2, nohup=False, cluster=True)

Similar to phase2d, a new directory named "mytest" (as specified in the program) is created. Here we set 'cluster=True', which means you have to submit the job to your cluster yourself (a 'submit_job.sh' script is created as a template).

Here in "params", 'phasing|repeats' is the number of repeats for each process (not the total number of repeats).

Finally, run 'python show_result.py output.h5' to show phasing results:

(figure: phase3d results)


6. simulation

from spipy.simulation import sim
from spipy.simulation import sim_adu

(1) simulate (photon count) patterns by Fourier transform

# all parameters, change what you need
config_default = {'parameters|detd' : 150.0, 'parameters|lambda' : 2.5, \
                  'parameters|detsize' : 128, 'parameters|pixsize' : 0.3, \
                  'parameters|stoprad' : 5, 'parameters|polarization' : 'x', \
                  'make_data|num_data' : 100, 'make_data|fluence' : 1.0e14}

# create project and start simulation !
sim.generate_config_files(pdb_file='./1N0U1.pdb', workpath=None, name='simu_test', params=config_default)
sim.run_simulation()

After finishing, a new directory named "simu_test" (as specified) is created in the current path, and the output data file '1N0U1.h5' can be found in './simu_test/data'. Some simulation results are shown here:

(figure: simulation results)

The program uses the FFT algorithm to calculate the Fourier transform, and 100 simulated patterns can be generated within 30 seconds.

(2) simulate ADU patterns by calculating atomic scattering

import numpy as np

# parameters
pdb = "./1N0U1.pdb"
config_param = {'parameters|detd' : 200.0, 'parameters|lambda' : 2.5, \
	'parameters|detsize' : 128, 'parameters|pixsize' : 0.3, \
	'parameters|stoprad' : 6, 'parameters|polarization' : 'x', \
	'make_data|num_data' : 10, 'make_data|fluence' : 1.0e14, \
	'make_data|scatter_factor' : True, 'make_data|ram_first' : True, \
	'make_data|poisson' : False}
euler_range = np.array([[0,np.pi/2.0],[0,np.pi/2.0],[0,np.pi/2.0]])

# use single process
sim_adu.single_process(pdb_file=pdb, param=config_param, \
		euler_mode='helix', euler_order='zxz', euler_range=euler_range, predefined=None, save_dir='./')

# use 2 processes (Note: n processes usually cost 1.5n~2n CPU cores, as there is thread-level parallelism inside)
# run "mpirun -n 2 python my_file_name.py" in terminal
sim_adu.multi_process(save_dir='./', pdb_file=pdb, param=config_param, \
		euler_mode='random', euler_order='zxz', euler_range=euler_range, predefined=None)

After it finishes, an HDF5 file named 'spipy_adu_simulation_xxx.h5' is generated ('xxx' is the local time when the file was created) and all information about the simulation is stored inside it. Some simulation results are shown here:

(figure: sim_adu results)

NOTICE: As the simulation needs exponentials and dot products of very large complex matrices, it is pretty slow. In this case, a 6700-atom protein with a 128*128 detector takes ~4 s per pattern (on a 6th-generation Core i5 U) with <100 MB of RAM. The time is proportional to the number of detector pixels. If you turn off 'make_data|ram_first', it takes about 5 s per pattern and costs 2 GB of RAM.

You can disable 'make_data|scatter_factor' to save 25% of the time.
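
For example, a one-line change to the configuration shown above:

# trade some accuracy for speed by disabling atomic scattering factors
config_param['make_data|scatter_factor'] = False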


7. Useful tools for analysis

(1) reciprocal unit q (Shannon pixels) vs. real pixels

from spipy.analyse import q
qinfo = q.cal_q(detd=581, lamda=7.9, det_r=128, pixsize=0.3)
oversampling_rate = q.oversamp_rate(sample_size=75, detd=581, lamda=7.9, pixsize=0.3)

These functions make it convenient to calculate the q value of each pixel on a pattern and to get the oversampling rate of the experiment.

(2) radial profile calculation

from spipy.image import radp

# 2d/3d radial profile, here 3d for example
# The center can be calculated with the saxs.frediel_search() function (discussed in (4) below)
radpro = radp.radial_profile_3d(data=volume, center=vol_center, mask=vmask)

# generate shells on a circle/sphere surface, here 3d for example
shell3d = radp.shells_3d(rads=[20,30], data_shape=(64,64,64), center=(32,32,32))

# normalize pattern 'pat' against a reference radial profile 'this_ref_Iq'
# 'pat_center' can be calculated with the saxs.frediel_search() function (discussed in (4) below)
newpatt = radp.radp_norm_3d(ref_Iq=this_ref_Iq, data=pat, center=pat_center, mask=pmask)

# generate a solid circle/sphere, 3d for example
cir = radp.circle(data_shape=3, rad=20)

Results are shown below (figure: radp results).

(3) model comparison and alignment

import numpy as np
import copy
from spipy.analyse import rotate
from spipy.analyse import criterion

grid_unit = [0.3, 0.03, 0.01]
nproc = 11  # 2*pi/0.3~21, 0.3/0.03=10, 0.03/0.01=3

outer_cut = 50
inner_cut = 10
# 'fix' and 'mov' are the paths of the reference (fixed) and moving model files
d1 = np.fromfile(fix).reshape((125,125,125))
d2 = np.fromfile(mov).reshape((125,125,125))
center = np.array(d1.shape, dtype=int) // 2
# we'd better cut it to a smaller one to save time
d1_small = copy.deepcopy(d1[center[0]-outer_cut:center[0]+outer_cut, center[1]-outer_cut:center[1]+outer_cut, center[2]-outer_cut:center[2]+outer_cut])
d2_small = copy.deepcopy(d2[center[0]-outer_cut:center[0]+outer_cut, center[1]-outer_cut:center[1]+outer_cut, center[2]-outer_cut:center[2]+outer_cut])
d1_small[center[0]-inner_cut:center[0]+inner_cut, center[1]-inner_cut:center[1]+inner_cut, center[2]-inner_cut:center[2]+inner_cut] = 0
d2_small[center[0]-inner_cut:center[0]+inner_cut, center[1]-inner_cut:center[1]+inner_cut, center[2]-inner_cut:center[2]+inner_cut] = 0

# now align model d2 to model d1
_,ea,_ = rotate.align(d1_small, d2_small, grid_unit, nproc)
newd2 = rotate.rot_ext(ea, 'zxz', d2)
# calculate (overall) r-factor
rf = criterion.r_factor_shell(newd2, d1, np.arange(2, d2.shape[0]/2-1))
rf_all = criterion.r_factor(newd2, d1)

After the calculation, you can generate an HDF5 file and save the aligned model and other information together. Furthermore, you can also estimate the rotation Euler angles with SASTBX.superpose, which is much faster than this program.
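
For example, a minimal h5py sketch for such a file (the dataset names below are only illustrative, not required by spipy):

import h5py

with h5py.File('aligned_result.h5', 'w') as fp:
    fp.create_dataset('aligned_model', data=newd2)
    fp.create_dataset('euler_angles', data=ea)
    fp.create_dataset('r_factor_shell', data=rf)
    fp.create_dataset('r_factor_overall', data=rf_all)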

(figure: model alignment result)

(4) saxs analysis

from spipy.analyse import saxs

# calculate saxs from a set of patterns
saxs_pat = saxs.cal_saxs(data=dataset)

# search the center of a scattering pattern using Friedel symmetry [1]
cen = saxs.frediel_search(pattern=saxs_pat, estimated_center=(64,64), mask=mask)

# calculate averaged intensity radial profile of this dataset (saxs profile)
inten_prof = saxs.inten_profile_vfast(dataset, mask, 581, 7.9, 64, 0.6)

# Evaluate average particle size using saxs profile
[estimated, radial_intensity] = saxs.particle_size(saxs=saxs_pat, estimated_center=cen, exparam='578,7.9,0.6',\
	 high_filter_cut=0.5, power=0.7, mask=mask)

[NOTICE]

  1. The saxs.frediel_search function can be applied to any scattering pattern, not only SAXS patterns
  2. saxs.inten_profile_vfast runs faster but is less accurate; you can also use saxs.inten_profile_vaccurate to calculate the SAXS profile (but it is really slow)
  3. The output value 'estimated' of saxs.particle_size is only valid when analyzing SAXS patterns. For normal scattering patterns, use 'radial_intensity' instead; the last "hill" of 'radial_intensity' corresponds to the Dmax value of the particle (see the sketch after this list)
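
A minimal sketch of locating that last hill with scipy (assuming 'radial_intensity' is, or can be reduced to, a 1-D intensity array; converting the peak position to a real-space Dmax is left to your experiment parameters):

import numpy as np
from scipy.signal import find_peaks

prof = np.asarray(radial_intensity)
if prof.ndim == 2:            # if it is stored as (radius, intensity) pairs, keep the intensity column
    prof = prof[:, 1]
peaks, _ = find_peaks(prof)
if len(peaks) > 0:
    print("last hill at radial index", peaks[-1])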

(figure: saxs analysis results)

(5) orientation analysis

from spipy.analyse import orientation

# draw the probability distribution of EMC output orientations ('orientations_xxx.bin')
# in-plane rotation is not considered!
ori_data = 'orientations_100.bin'
orientation.draw_ori_Df(ori_bin=ori_data, q_level=10)

# generate uniformly/randomly distributed orientations, similar to 'radp.shells_xd'
xyz_1, theta_phi_1 = orientation.Sphere_randp(algo='uniform-2', radius=50, num=1000)
xyz_2, theta_phi_2 = orientation.Sphere_randp(algo='random', radius=50, num=1000)
  • Besides drawing the EMC output, you can also process quaternions with weights and display them with other functions, such as orientation.proc_Hammer and orientation.draw_Hammer
  • The returned xyz / theta-phi coordinates from Sphere_randp are FLOAT, and are more accurate than the voxel-based function radp.shells_xd (see the sketch after this list)
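
To visualize the generated points, a matplotlib sketch (assuming xyz_1 is an (N, 3) array, which the API description above does not state explicitly):

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (needed on older matplotlib)

xyz = np.asarray(xyz_1)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xyz[:, 0], xyz[:, 1], xyz[:, 2], s=2)
plt.show()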

Here we show the generated orientation distribution figure:

(figure: orientation distribution)