Updates to example-config.yaml, model_generator.py, run_experiment.py, simulator.py, requirements.txt, .gitignore, beeline-inputs-boolean.yaml. Reorganized genVis.py #15

Open
wants to merge 67 commits into base: master
Changes from 44 commits
67 commits
8d17f26
updated example-config.yaml with proper parameter
mallenjon Jun 25, 2021
5ef0715
updated genVis.py, example-config.yaml, and model_generator.py
mallenjon Jun 28, 2021
3448680
genVis.py updated labels and author credits
mallenjon Jun 28, 2021
dcb6f01
removed test comments from edited line 115 of genVis.py
mallenjon Jun 28, 2021
d69da1d
removed irrelevant ICs file for EMT model
mallenjon Jun 28, 2021
4ef09fa
attempt to remove local files from remote branch
mallenjon Jun 28, 2021
4d4e2c4
Delete outputs directory
mallenjon Jun 28, 2021
099fde4
attempt to remove local files from remote branch
mallenjon Jun 28, 2021
50dd090
Merge branch 'jon-mallen' of https://github.com/Murali-group/BoolODE …
mallenjon Jun 28, 2021
3fbde16
resolve unnecessary discrepancies with master
mallenjon Jun 28, 2021
d6d59c4
address issue with underscore in gene names
mallenjon Jun 28, 2021
aa23070
address issue with underscore in gene names
mallenjon Jun 28, 2021
a7becea
address issue with underscore in gene names
mallenjon Jun 28, 2021
dbed96d
address issue with underscore in gene names
mallenjon Jun 28, 2021
63d37fe
address issue with underscore in gene names
mallenjon Jun 28, 2021
6920311
test .gitignore
mallenjon Jun 29, 2021
d4653aa
test .gitignore
mallenjon Jun 29, 2021
886e7ac
test .gitignore
mallenjon Jun 29, 2021
77eec0f
test .gitignore
mallenjon Jun 29, 2021
5892776
test .gitignore
mallenjon Jun 29, 2021
e85c759
test .gitignore
mallenjon Jun 29, 2021
54e26ef
test .gitignore
mallenjon Jun 29, 2021
309dcd9
test .gitignore
mallenjon Jun 29, 2021
6821940
test .gitignore
mallenjon Jun 29, 2021
f38a2f9
test .gitignore
mallenjon Jun 29, 2021
cfdc05d
test .gitignore
mallenjon Jun 29, 2021
3324f53
test .gitignore
mallenjon Jun 29, 2021
da0c8a2
test .gitignore
mallenjon Jun 29, 2021
7423fd9
test .gitignore
mallenjon Jun 29, 2021
56bb932
test .gitignore
mallenjon Jun 29, 2021
cef2151
test .gitignore
mallenjon Jun 29, 2021
d7c871b
test .gitignore
mallenjon Jun 29, 2021
2419525
test .gitignore
mallenjon Jun 29, 2021
97ba9f9
test .gitignore
mallenjon Jun 29, 2021
ef104a5
change name of example output directory
mallenjon Jun 29, 2021
ed64843
add customization instructions for output in .gitignore
mallenjon Jun 29, 2021
d6e8f44
give all requirements ability to have versions beyond minimum
mallenjon Jun 29, 2021
79634db
update genVis.py
mallenjon Jun 29, 2021
9555eb4
update genVis.py
mallenjon Jun 29, 2021
aeb7cbd
Update genVis.py and dependencies, add comments
mallenjon Jul 1, 2021
039e159
Update genVis.py and dependencies, add comments
mallenjon Jul 1, 2021
4e7abdb
Rename beeling-inputs-boolean.yaml to beeline-inputs-boolean.yaml
mallenjon Jul 1, 2021
943e284
update plot name argument
mallenjon Jul 1, 2021
973dd39
Merge branch 'jon-mallen' of https://github.com/Murali-group/BoolODE …
mallenjon Jul 1, 2021
d51c0e2
add changes to example-config.yaml
mallenjon Jul 1, 2021
1407cc6
add changes to example-config.yaml
mallenjon Jul 1, 2021
9dd798f
fix error in --plotName argument parse
mallenjon Jul 1, 2021
e41c00d
edit help messages in genVis.py for clarity
mallenjon Jul 1, 2021
e4077da
edit help messages in genVis.py for clarity
mallenjon Jul 1, 2021
6b4a7cc
add additional developer features and instructions for .gitignore
mallenjon Jul 2, 2021
8465599
add additional developer features and instructions for .gitignore
mallenjon Jul 2, 2021
2c0e213
blocking .yaml file pushes to git no longer necessary
mallenjon Jul 2, 2021
a4e96bf
remove leftover comment out line in genVis.py
mallenjon Jul 6, 2021
5b45bd1
change 'Clusters' to 'k-Means Clusters'
mallenjon Jul 6, 2021
e060dd7
genVis.py: plot titles fixed and dimred file now saved as .csv
mallenjon Jul 9, 2021
e7078ef
update network name argument in genVis.py
mallenjon Jul 9, 2021
9df0977
add PyYAML to requirements.txt
mallenjon Jul 15, 2021
5b9d1fd
move outer labels command to better location
mallenjon Jul 23, 2021
af87f35
Remove deprecated parameter of sklearn k-Means
mallenjon Jul 26, 2021
f7bcb82
update required version of scikit-learn in requirements,txt
mallenjon Jul 26, 2021
c6dedf8
fix some comments in genVis.py
mallenjon Jul 27, 2021
1422c6c
fix labels in genVis.py
mallenjon Jul 27, 2021
7052f59
adjust font size in genVis.py
mallenjon Jul 27, 2021
5c28a76
reorganize genVis.py
mallenjon Feb 4, 2022
a3a7cea
update function name in genVis
mallenjon Feb 4, 2022
576459e
update function name in genVis
mallenjon Feb 4, 2022
6115392
fix annotations in plot section of genVis
mallenjon Feb 7, 2022
15 changes: 10 additions & 5 deletions .gitignore
@@ -103,16 +103,21 @@ venv.bak/
# mypy
.mypy_cache/

# Local files
Member:
A core principle is that a user should never mix the directory for the outputs with the source code. So we should not need this comment. Nevertheless, people make this mistake. So move this comment down to occur just before the line containing "outputs/"

*.png
*.csv
model.py*
/*.txt
/data/*
model.py
*.txt
*.yaml
!example-config.yaml
data/
/src/final-plots.py
/src/run-in-parallel.py
/src/tsne-gene-gradients.py
/start-scripts.py
/simulatedmodels.sh
/.Rhistory
*/ExpressionsData.csv
*.csv
ExpressionData.csv
# Be sure to change 'outputs/' to '<your output directory name>/'. If you have more than one output directory, add
# accordingly by the same format.
outputs/
5 changes: 2 additions & 3 deletions BoolODE/model_generator.py
@@ -110,10 +110,9 @@ def readBooleanRules(self):
self.withoutRules = list(self.allnodes.difference(set(self.withRules)))

## Every node without a rule is treated as follows:
## If the user has specified a Parameter Input file treat as parameter, else
## If the user has specified a Parameter Input file treat as parameter, else
for n in self.withoutRules:
if not self.parameterInputsDF.empty\
and n in self.parameterInputsDF['Input'] :
if not self.parameterInputsDF.empty and n in self.parameterInputsDF.values:
print("Treating %s as parameter" % n)
else:
print(n, "has no rule, adding self-activation.")
2 changes: 1 addition & 1 deletion BoolODE/simulator.py
@@ -140,5 +140,5 @@ def getInitialCondition(ss, ModelSpec, rnaIndex,
pss = ((ModelSpec['pars']['r_' + genename])/\
(ModelSpec['pars']['l_p_' + genename]))\
*new_ics[revvarmapper['x_' + genename]]
new_ics[revvarmapper['p_' + genename.replace('_','')]] = pss
new_ics[revvarmapper['p_' + genename]] = pss
Member:
I wonder why this replace was there in the first place. Can you try to trace back to the commit that introduced this line and see if you can find a reason?

Collaborator, Author:
I have viewed the version history for simulator.py. The line of code new_ics[revvarmapper['p_' + genename.replace('_','')]] = pss has not changed since the commit by Dr. Jalihal on March 29, 2020. In fact, the entire file has not changed since that commit. It is not clear why Dr. Jalihal made this choice. However, I have tracked the use of the method getInitialCondition within the 'BoolODE' package using PyCharm, and also looked at how these parameters are written to the model.py file. I see no evidence that maintaining the underscore in the gene name would hinder functionality. The only downside is that, when viewing the actual parameter names, the user may not find it clear which underscores separate genes and which are part of a gene name. However, this makes no difference to the computer, which merely reads the value assigned to that parameter name.
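
As a rough illustration of the lookup discussed in this thread, the sketch below compares the two key constructions. The dictionary contents and the gene name GENE_A are hypothetical, and it assumes the mapper keys keep the underscores of the gene name; only the key pattern 'p_' + genename is taken from the diff.

```python
# Hypothetical stand-in for revvarmapper: maps variable names to indices.
# Assumption: keys are stored with the gene name's underscores intact.
revvarmapper = {'x_GENE_A': 0, 'p_GENE_A': 1}
genename = 'GENE_A'


def protein_index(mapper, name, strip_underscores):
    """Return the index of the protein variable for `name`, or None if the key is missing."""
    key = 'p_' + (name.replace('_', '') if strip_underscores else name)
    return mapper.get(key)


# Old form: strips underscores, so the key becomes 'p_GENEA'.
old_lookup = protein_index(revvarmapper, genename, strip_underscores=True)
# New form: keeps the gene name as-is, so the key is 'p_GENE_A'.
new_lookup = protein_index(revvarmapper, genename, strip_underscores=False)
print(old_lookup, new_lookup)
```

Under that assumption, the stripped key is simply absent from the mapping, while the unmodified gene name resolves to the protein variable.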

return(new_ics)
17 changes: 10 additions & 7 deletions requirements.txt
@@ -1,7 +1,10 @@
scipy==1.2.1
matplotlib==3.0.3
numpy==1.16.2
tqdm==4.31.1
seaborn==0.9.0
pandas==0.24.2
scikit-learn==0.21.3
scipy>=1.2.1
matplotlib>=3.0.3
numpy>=1.16.2
tqdm>=4.31.1
seaborn>=0.9.0
pandas>=0.24.2
scikit-learn>=0.21.3
freetype-py
pypng
umap-learn
203 changes: 158 additions & 45 deletions scripts/genVis.py
@@ -1,58 +1,171 @@
__author__ = 'Jon Mallen'

import os
import sys
from sklearn.manifold import TSNE
#from MulticoreTSNE import MulticoreTSNE as TSNE
from sklearn.decomposition import PCA
import numpy as np
from umap import UMAP
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import pandas as pd
from optparse import OptionParser
parser = OptionParser()
import argparse
Member:
What is the reason for using argparse rather than OptionParser?

Collaborator, Author (mallenjon, Jun 30, 2021):
Great question! I asked myself the same thing before switching from OptionParser to argparse. The reason is flexibility. A careful user specifies the number of dimensions for each dimensional reduction as a value after the corresponding argument. However, the user might forget to specify a value, and in that case I believe the script should default to 2 instead of failing altogether. Therefore, each argument should accept 0 or 1 values, and I set it up to give an error message if more are given. Take t-SNE, for example. I can pass -t and the script returns a 2D t-SNE plot, or I can pass -t 2 or -t 3 to get 2D or 3D t-SNE, respectively. argparse allows this flexibility, while OptionParser from optparse does not have the same functionality.
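
A minimal sketch of the argparse behaviour described in this reply. The -t option uses nargs='*' as in the diff. The -p option shows an alternative that the PR does not use, nargs='?' with const, which lets argparse supply the default of 2 and reject other values itself. The parser here is a standalone example, not the script's parser.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="nargs sketch")
    # nargs='*' (as in the diff): absent -> None, bare flag -> [], values -> list of strings.
    parser.add_argument('-t', '--tsne', nargs='*')
    # nargs='?' with const (alternative, not in the PR):
    # absent -> None, bare flag -> const, one value -> converted by `type`.
    parser.add_argument('-p', '--pca', nargs='?', const=2, type=int, choices=[2, 3])
    return parser


parser = build_parser()
absent = parser.parse_args([])
bare = parser.parse_args(['-t', '-p'])
valued = parser.parse_args(['-t', '3', '-p', '3'])
print(absent.tsne, bare.tsne, valued.tsne)
print(absent.pca, bare.pca, valued.pca)
```

With nargs='*' the script has to check the list length and convert the value itself, which is what the dimension check in the diff does. With nargs='?' a second value is rejected by argparse as an unrecognized argument.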

import itertools

parser.add_option('-i','--inFile',default='',type=str,
help='Specify input expression matrix file name')
# Define arguments
parser = argparse.ArgumentParser("Visualize the simulated single-cell gene expression data output by BoolODE.")
parser.add_argument('-f', '--pathToFiles', default='', type=str, help='Specify path to ExpressionData.csv and '
'PseudoTime.csv files generated by the BoolODE '
'simulation, as well as the ClusterIds.csv '
'present if the user specified at least two '
'clusters in the simulation.')
parser.add_argument('-p', '--pca', nargs='*', help='Use PCA for visualizing the data. '
'Specify the number of dimensions (2 or 3) as argument. '
'Default is 2.')
parser.add_argument('-t', '--tsne', nargs='*', help='Use t-SNE for visualizing the data. '
'Specify the number of dimensions (2 or 3) as argument. '
'Default is 2.')
parser.add_argument('-u', '--umap', nargs='*', help='Use UMAP for visualizing the data. '
'Specify the number of dimensions (2 or 3) as argument. '
'Default is 2.')
parser.add_argument('-c', '--clusterFile', action='store_true', default=False,
help='Use the cluster file ClusterIds.csv to assign clusters if the user specified at least two '
'clusters in the simulation.')
parser.add_argument('-n', '--plotName', default='', nargs='*', help='Name of the plot.')
mallenjon marked this conversation as resolved.

parser.add_option('-p','--pseudoTimeFile',default='',type=str,
help='Specify pseudoTimeFile file name')
# Parse arguments and exit if proper files are not present
args = parser.parse_args()
path = args.pathToFiles
inFile = path + "/ExpressionData.csv"
if not os.path.exists(inFile):
sys.exit('Error: No ExpressionData.csv file is present in the specified path to files.')
timeFile = path + "/PseudoTime.csv"
if not os.path.exists(timeFile):
sys.exit('Error: No PseudoTime.csv file is present in the specified path to files.')
cluster_flag = args.clusterFile
clusterFile = args.pathToFiles + "/ClusterIds.csv"
if cluster_flag and not os.path.exists(clusterFile):
sys.exit('Error: No ClusterIds.csv file is present in the specified path to files.')
pca_flag = args.pca is not None
tsne_flag = args.tsne is not None
umap_flag = args.umap is not None

parser.add_option('-t','--tsne',action='store_true',default=False,
help='Visualized tsne instead of default PCA')
# Do PCA, tSNE, UMAP
DF = pd.read_csv(inFile, sep=',', index_col=0)
Member:
You should first check if inFile and timeFile exist. Print an error and exit if they do not.

Cells = DF.T.values
DRDF = pd.DataFrame(index=pd.Index(list(DF.columns)))

(opts, args) = parser.parse_args()
inFile = opts.inFile
tsne_flag = opts.tsne

def dimensional_reduction(method, method_arg):

# Check dimensions
if len(method_arg) == 0:
dim = 2
elif len(method_arg) > 1:
sys.exit('Error: Gave too many values for the number of dimensions. Please specify only a single number of '
'dimensions (2 or 3) for each dimensional reduction method.')
else:
dim = int(method_arg[0])
if dim != 2 and dim != 3:
sys.exit('Error: Specified an invalid number of dimensions. Only 2 and 3 are valid dimensions.')

# Perform dimensional reduction
embed = eval("%s(n_components=%s).fit_transform(Cells)" % (method, dim))
for n in range(dim):
DRDF['%s%d' % (method, n+1)] = embed[:, n]
return dim
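
The function above builds the reducer call as a string and passes it to eval. One possible alternative, sketched here under stated assumptions, is to look the class up in a dictionary. FakeReducer, REDUCERS, parse_dim and reduce_cells are hypothetical names; FakeReducer only mimics the n_components / fit_transform shape of the PCA, TSNE and UMAP classes so that the sketch runs without those libraries.

```python
# Hypothetical stand-in with the same constructor/fit_transform shape as
# sklearn's PCA/TSNE or umap's UMAP; the real classes would be used in practice.
class FakeReducer:
    def __init__(self, n_components):
        self.n_components = n_components

    def fit_transform(self, rows):
        # Keep only the first n_components values of each row.
        return [row[:self.n_components] for row in rows]


# Map method names to classes so that no string has to be passed to eval().
REDUCERS = {'PCA': FakeReducer, 'TSNE': FakeReducer, 'UMAP': FakeReducer}


def parse_dim(method_arg):
    """Return 2 for an empty list, else the single requested dimension (2 or 3)."""
    if len(method_arg) == 0:
        return 2
    if len(method_arg) > 1:
        raise ValueError('Specify only one number of dimensions (2 or 3).')
    dim = int(method_arg[0])
    if dim not in (2, 3):
        raise ValueError('Only 2 and 3 are valid dimensions.')
    return dim


def reduce_cells(method, method_arg, cells):
    """Validate the dimension, then run the reducer registered under `method`."""
    dim = parse_dim(method_arg)
    embed = REDUCERS[method](n_components=dim).fit_transform(cells)
    return dim, embed


dim, embed = reduce_cells('PCA', [], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(dim, embed)
```

Raising ValueError instead of calling sys.exit is a design choice of this sketch, so the check can be exercised without ending the interpreter.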

DF = pd.read_csv(inFile,sep=',',index_col=0)
Cells = DF.T.values

####################
# Do PCA and tSNE
PC = PCA(n_components=2).fit_transform(Cells)
embed = TSNE(n_components=2).fit_transform(Cells)
####################
ptDF = pd.read_csv(opts.pseudoTimeFile, sep=',', index_col=0)

colors = ptDF.min(axis='columns').values
print(colors)
experiments = set([h.split('_')[0] for h in DF.columns])
PCDF = pd.DataFrame(PC,columns=['PC1','PC2'],index=pd.Index(list(DF.columns)))

PCDF['tsne1'] = embed[:,0]
PCDF['tsne2'] = embed[:,1]

PCDF['time'] = colors
PCDF.to_csv(inFile + '_dimred.txt')

f,ax = plt.subplots(2,1,figsize=(5,10))

colors = [h.split('_')[1] for h in DF.columns]
ax[0].scatter(PCDF['PC1'], PCDF['PC2'], c= PCDF['time'])
ax[0].set_title('PCA')
ax[1].scatter(PCDF['tsne1'], PCDF['tsne2'], c= PCDF['time'])
ax[1].set_title('tSNE')


plt.legend('')
plt.savefig(inFile.split('.csv')[0] + '_dimensionality-reduction.png')
# Do PCA, t-SNE, UMAP
if pca_flag:
pca_dim = dimensional_reduction('PCA', args.pca)
if tsne_flag:
tsne_dim = dimensional_reduction('TSNE', args.tsne)
if umap_flag:
umap_dim = dimensional_reduction('UMAP', args.umap)

# Color preparation

# Prepare time-dependent color scheme
Member:
I don't understand how we figure out the time corresponding to each cell. What does the ics file contain? Adding a detailed comment here will be helpful for the future.

# To prepare the time-dependent color scheme, the pseudo-time file is read for its maximum value, i.e the simulation
# time. Then, the list of times corresponding to each cell is acquired as a list by splitting the title of each cell
# into its sample number and time slice, and choosing only the time slice value. The time slice values can then be
# scaled by the maximum value such that they are a value in [0,1], and can then be used to map each data point in the
# dimensional reduction by time using a color map.

ptDF = pd.read_csv(timeFile, sep=',', index_col=0)
time_color_scale = max(ptDF["Time"])
time_colors = [int(h.split('_')[1]) / time_color_scale for h in DF.columns]
DRDF['Simulation Time'] = time_colors
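
A small sketch of the time scaling performed above, assuming each cell (column) name has the form <sample>_<time slice>, as the code comment describes. The column names and the maximum time below are made up for illustration, and time_colors is a hypothetical helper name.

```python
def time_colors(column_names, max_time):
    """Scale the time slice in each '<sample>_<time>' column name to [0, 1]."""
    return [int(name.split('_')[1]) / max_time for name in column_names]


# Hypothetical cell names and maximum pseudo-time.
columns = ['E0_0', 'E0_50', 'E1_100']
colors = time_colors(columns, max_time=100)
print(colors)
```

Each value can then be passed as the c argument of a scatter plot so that a color map shades cells by simulation time.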

# Prepare cluster-dependent color scheme
# Just like the list of times for the time-dependent color scheme, the cluster values for each cell prepared from
# k-means clustering are read for their maximum value. The maximum value is used to scale the list of cluster
# assignments to values in [0,1], which can then be used to map each data point in the dimensional reduction by
# cluster using a color map.

if cluster_flag:
CF = pd.read_csv(clusterFile, sep=',', index_col=0)
cluster_colors_raw = CF['cl'].tolist()
cluster_color_scale = max(CF['cl'])
cluster_colors = [y / cluster_color_scale for y in cluster_colors_raw]
else:
cluster_colors = list(itertools.repeat(.5, len(DF.columns)))
DRDF['Clusters'] = cluster_colors

# Write dimensionality reduction data to text file
DRDF.to_csv(inFile + '_dimred.txt')
Member:
Don't you need different files if the user passes more than one of the pca, tsne, and umap options?

Collaborator, Author:
I kept this feature from the original version of the genVis.py script. All the dimensional reduction data for each cell is stored in a row of the same Dimensionality Reduction Data Frame (DRDF). All this line does is write the DRDF to a single file so that the numerical data from the dimensionality reduction(s) can be accessed at a later time if need be. Would you perhaps prefer that the data be saved as a .tsv or a .csv instead?



def subplot_format(f, ax, plot_index, method, dim, map_title, color_map):
plt.rcParams['image.cmap'] = color_map
labels_df = pd.DataFrame()
for m in range(1, dim + 1):
label_list = ['%s%d' % (method, m)]
if method == 'TSNE':
label_list.append('t-SNE %d' % m)
else:
label_list.append('%s %d' % (method, m))
labels_df[str(m)] = label_list
if dim == 3:
ax[plot_index].set_axis_off()
ax[plot_index] = f.add_subplot(1, 2, plot_index+1, projection="3d")
ax[plot_index].scatter3D(DRDF[labels_df.at[0, '1']], DRDF[labels_df.at[0, '2']], DRDF[labels_df.at[0, '3']],
c=DRDF[map_title])
ax[plot_index].set_zlabel(labels_df.at[1, '3'])
else:
ax[plot_index].scatter(DRDF[labels_df.at[0, '1']], DRDF[labels_df.at[0, '2']], c=DRDF[map_title])
ax[plot_index].set_xlabel(labels_df.at[1, '1'])
ax[plot_index].set_ylabel(labels_df.at[1, '2'])
ax[plot_index].set_aspect('auto')
ax[plot_index].set_title(map_title)


def make_subplot(method, dim):
plot_title = ' '.join(args.plotName)
f, ax = plt.subplots(1, 2, figsize=(10, 5))

# Plot each cell in the dimensional reduction and map by simulation time using a color map.
subplot_format(f, ax, 0, method, dim, 'Simulation Time', 'viridis')

# Plot each cell in the dimensional reduction and map by cluster using a color map.
subplot_format(f, ax, 1, method, dim, 'Clusters', 'Spectral')

plt.suptitle(plot_title, fontsize=20)


# t-SNE plotting
if tsne_flag:
make_subplot('TSNE', tsne_dim)
plt.savefig(inFile.split('.csv')[0] + '_tSNE_%sd.png' % tsne_dim)

# PCA plotting
if pca_flag:
make_subplot('PCA', pca_dim)
plt.savefig(inFile.split('.csv')[0] + '_PCA_%sd.png' % pca_dim)

# UMAP plotting
if umap_flag:
# make_subplot('UMAP1', 'UMAP 1', 'UMAP2', 'UMAP 2', 'UMAP3', 'UMAP 3', umap_dim)
make_subplot('UMAP', umap_dim)
plt.savefig(inFile.split('.csv')[0] + '_UMAP_%sd.png' % umap_dim)

plt.show()