Manual
Papillon is a Python alternative to cummeRbund for reading and plotting cuffdiff/Galaxy RNA-seq data.
You can install Papillon from PyPI:
pip install papillon
or from Anaconda through conda-forge, either by installing only papillon:
conda install -c conda-forge papillon
or by adding the conda-forge channel and then installing it:
conda config --add channels conda-forge
conda install papillon
It should work with any IDE for data science (I tested Jupyter Notebook and Spyder).
After RNA-seq analysis with Galaxy, download the 4 files generated by cuffdiff whose names contain, respectively:
... transcript_FPKM_tracking...
... gene_FPKM_tracking ...
... gene_differential_expression...
... transcript_differential_expression...
Put them in the same folder.
You can either use the files directly or rename them following the cummeRbund convention:
... transcript_FPKM_tracking... = isoforms.fpkm_tracking
... gene_FPKM_tracking ... = genes.fpkm_tracking
... gene_differential_expression... = gene_exp.diff
... transcript_differential_expression... = isoform_exp.diff
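The renaming can be scripted. The following is a minimal sketch, not part of Papillon: the helper names `cummerbund_name` and `rename_cuffdiff_files` are hypothetical, and the gene tracking file is written as genes.fpkm_tracking, the name cuffdiff itself uses and the one that appears in the read_files example.

```python
from pathlib import Path

# Substring found in the downloaded file name -> cummeRbund-style name.
# Hypothetical helper data, not part of the Papillon API.
CUMMERBUND_NAMES = {
    "transcript_FPKM_tracking": "isoforms.fpkm_tracking",
    "gene_FPKM_tracking": "genes.fpkm_tracking",
    "gene_differential_expression": "gene_exp.diff",
    "transcript_differential_expression": "isoform_exp.diff",
}


def cummerbund_name(filename):
    """Return the cummeRbund-style name for a cuffdiff file, or None."""
    for marker, new_name in CUMMERBUND_NAMES.items():
        if marker in filename:
            return new_name
    return None


def rename_cuffdiff_files(folder):
    """Rename the recognised files in `folder`; return {old name: new name}."""
    renamed = {}
    for file in sorted(Path(folder).iterdir()):
        new_name = cummerbund_name(file.name)
        if file.is_file() and new_name and file.name != new_name:
            file.rename(file.with_name(new_name))
            renamed[file.name] = new_name
    return renamed
```

Calling `rename_cuffdiff_files("MyFolder/Test_files")` would rename the four files in place and leave every other file in the folder untouched.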
read_files(files, path=None, drop_comparison=None)
Accept cuffdiff/cummeRbund files as an iterable ("transcript_FPKM_tracking", "gene_FPKM_tracking", "gene_differential_expression", "transcript_differential_expression") and pass them to _papillon_builder() to create a Papillon_db object.
Parameters
files - iterable with the cuffdiff file names
path - where to export the files generated by Papillon
drop_comparison - comparison (str) or list of comparisons to drop from the cuffdiff table
Example
pp.read_files(["Files/gene_exp.diff","Files/genes.fpkm_tracking","Files/isoform_exp.diff","Files/isoforms.fpkm_tracking"])
read_folder(path, drop_comparison=None)
Read the folder containing the cuffdiff/cummeRbund files and pass them to _papillon_builder() to create a Papillon_db object.
Parameters
path - str with the path of the folder containing the cuffdiff files
drop_comparison - comparison (str) or list of comparisons to drop from the cuffdiff table
Example
MyProject=pp.read_folder("MyFolder/Test_files")
read_db(path, drop_comparison=None)
Deprecated. Use read_folder() instead.
Create a Papillon_db object using read_folder() or read_files(), which pass the files to _papillon_builder().
__init__(self, path, samples, comparisons, genes_detected, isoforms_detected)
self.path - files path
self.samples - samples found
self.comparisons - comparisons found
self.genes_detected - dataframe of genes detected
self.isoforms_detected - dataframe of isoforms detected
__str__(self)
Return str(self).
change_order(self, new_order)
Change the order of the samples.
Parameters
new_order: list of the sample names in the new order
Example
MyProject.change_order(["Sample 4","Sample 3","Sample 2","Sample 1"])
drop_comparison(self, comparison)
Drop a comparison (str) or a list of comparisons.
Parameters
comparison: comparison (str) or list of comparisons
Example
MyProject.drop_comparison(specific_comparison)
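To see which comparison names a dataset contains before dropping one, the differential expression file can be inspected directly. This is a minimal sketch: the function `comparison_names` is hypothetical, the rows are made up, and the `<sample_1>_vs_<sample_2>` naming is inferred from the examples in this manual rather than confirmed from Papillon's code.

```python
import csv
import io

# Made-up excerpt in the tab-separated layout of a cuffdiff gene_exp.diff
# file (only the columns needed here are shown).
EXAMPLE_DIFF = (
    "gene\tsample_1\tsample_2\n"
    "CD44\tSample 1\tSample 2\n"
    "IL6\tSample 1\tSample 2\n"
    "CD44\tSample 2\tSample 4\n"
)


def comparison_names(handle):
    """Return the unique '<sample_1>_vs_<sample_2>' names, in file order."""
    names = []
    for row in csv.DictReader(handle, delimiter="\t"):
        name = f"{row['sample_1']}_vs_{row['sample_2']}"
        if name not in names:
            names.append(name)
    return names


print(comparison_names(io.StringIO(EXAMPLE_DIFF)))
# ['Sample 1_vs_Sample 2', 'Sample 2_vs_Sample 4']
```

Once a project is loaded, the comparisons attribute of the Papillon_db object holds the comparisons that were found.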
get_gene(self, genelist=None, comparison='all', comparison_sign=None, fold_ind=None, fold_sign='>')
This method selects genes by name or by condition. It returns a Papillon_list object.
Parameters
genelist - accepts a string (1 gene name), a list of gene names or a file with a list of gene names
comparison - selects genes higher/lower in one condition compared to another. Accepts either "all" to pass all the comparisons, or a single comparison as str (already present in the data)
comparison_sign - usable in combination with comparison, accepts either ">" or "<"
fold_ind - fold induction (log2) higher/lower than this number
fold_sign - usable in combination with fold_ind, accepts either ">" or "<"
Example
Selection=MyProject.get_gene(["CD44","CCL15"], comparison="Sample 1_vs_Sample 2", comparison_sign="<", fold_ind=1, fold_sign="<")
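The fold_ind and fold_sign pair describes a threshold on the log2 fold change. The sketch below illustrates that kind of filter with made-up values. It is not Papillon's implementation: `filter_by_fold` is a hypothetical helper, and it compares the signed value, since this manual does not say whether Papillon uses the signed or the absolute fold change.

```python
import operator

# Map the sign strings used by fold_sign to comparison functions.
SIGNS = {">": operator.gt, "<": operator.lt}


def filter_by_fold(log2_fold_changes, fold_ind, fold_sign=">"):
    """Keep genes whose log2 fold change is above/below fold_ind."""
    compare = SIGNS[fold_sign]
    return {
        gene: value
        for gene, value in log2_fold_changes.items()
        if compare(value, fold_ind)
    }


# Made-up log2 fold changes, for illustration only.
example = {"CD44": 2.3, "CCL15": 0.4, "IL6": -1.8}
print(filter_by_fold(example, fold_ind=1, fold_sign=">"))  # {'CD44': 2.3}
print(filter_by_fold(example, fold_ind=1, fold_sign="<"))  # {'CCL15': 0.4, 'IL6': -1.8}
```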
get_isoform(self, genelist=None, comparison='all', comparison_sign=None, fold_ind=None, fold_sign='>')
This method selects isoforms. It returns a Papillon_list object.
Parameters
genelist - accepts a string (1 gene name), a list of gene names or a file with a list of gene names
comparison - selects genes higher/lower in one condition compared to another. Accepts either "all" to pass all the comparisons, or a single comparison as str (already present in the data)
comparison_sign - usable in combination with comparison, accepts either ">" or "<"
fold_ind - fold induction (log2) higher/lower than this number
fold_sign - usable in combination with fold_ind, accepts either ">" or "<"
Example
Selection=MyProject.get_isoform(["IL6","CCL15","IL17RC"], comparison="Sample 2_vs_Sample 4", comparison_sign="<")
search(self, word, where, how='table', export=False)
Search gene/isoform names among the detected and significant genes/isoforms.
Parameters
word - accepts a str to search among the gene names
where - accepts:
"genes_detected"
"genes_significant"
"isoforms_detected"
"isoforms_significant"
how - accepts:
"table" returns the dataframe with the genes found
"list" returns a list of names, no duplicates
export - True/False
Example
search_result=MyProject.search(word="IL6",where="genes_significant", how="table")
Class containing a selected list of genes, with the associated data, generated by the Papillon_db.get_gene() or Papillon_db.get_isoform() methods.
The contents of two or more Papillon_list objects can be added together.
Example:
P_list3 = P_list1+P_list2
P_list4=sum([PL1, PL2, PL3])
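The second form relies on the reflected addition method: sum() starts from 0, so its first step is 0 + PL1, which Python resolves through `__radd__`. The toy class below sketches the mechanism. `GeneList` is a hypothetical stand-in, not Papillon_list, and dropping duplicate genes on addition is an assumption, since this manual does not say how duplicates are handled.

```python
class GeneList:
    """Toy stand-in for a list-like object that supports + and sum()."""

    def __init__(self, genes):
        self.genes = list(genes)

    def __add__(self, other):
        # Merge the two lists, keeping the first occurrence of each gene.
        merged = list(dict.fromkeys(self.genes + other.genes))
        return GeneList(merged)

    def __radd__(self, other):
        # sum() starts from 0, so the first step is 0 + GeneList(...),
        # which Python resolves by calling this method with other == 0.
        if other == 0:
            return self
        return NotImplemented


a = GeneList(["IL6", "CD44"])
b = GeneList(["CD44", "CCL15"])
c = GeneList(["IL17RC"])

print((a + b).genes)         # ['IL6', 'CD44', 'CCL15']
print(sum([a, b, c]).genes)  # ['IL6', 'CD44', 'CCL15', 'IL17RC']
```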
__add__(self, other)
__init__(self, df, what, comparisons, path, samples, comparison='all', comparison_sign=None, fold_ind=None, fold_sign='>', p=0.05)
Initialize self. See help(type(self)) for accurate signature.
__radd__(self, other)
__str__(self)
Return str(self).
compare(self, other)
Compare two Papillon_list objects
Parameters
other: another Papillon_list object
Example
Selection1.compare(Selection2)
export(self)
Export the selected genes/isoforms as an Excel file.
Example
Selection.export()
heatmap(self, z_score=False, col_cluster=False, method='complete', cmap='seismic', export=False, **options)
Generate a heatmap using the selected genes/isoforms.
Parameters
z_score - True/False, whether or not to apply z-score normalization
col_cluster - True/False, whether or not to cluster the samples
method - clustering algorithm; the default is complete linkage
cmap - color map
export - True/False, whether or not to export the dataframe of selected genes
**options - all the options accepted by seaborn.clustermap; the default metric is euclidean.
Example
Selection.heatmap(z_score=True,export=True)
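Z-score normalization rescales each gene so that genes with very different expression levels become comparable on one color scale. The sketch below shows the calculation for one gene across samples with made-up FPKM values. It assumes normalization per gene and the population standard deviation; Papillon's exact choices are not stated in this manual, and `z_score` here is a hypothetical helper.

```python
from statistics import mean, pstdev


def z_score(values):
    """Return (x - mean) / standard deviation for one gene across samples."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A gene with identical expression in every sample has no spread.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Made-up FPKM values for two genes across three samples.
high = z_score([100.0, 200.0, 300.0])
low = z_score([1.0, 2.0, 3.0])
print([round(z, 3) for z in high])  # [-1.225, 0.0, 1.225]
print([round(z, 3) for z in low])   # [-1.225, 0.0, 1.225]
```

Both genes end up with the same profile even though their raw values differ by a factor of 100, which is why the option is useful when highly and weakly expressed genes share a heatmap.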
lineplot(self, title='', legend=True, z_score=False, export=False, size=10, ci=None, **option)
Line plot of the expression levels of the selected genes. The maximum number of genes is 200.
Parameters
title - accepts a str as the title of the plot
legend - True/False, show the legend
z_score - True/False, apply z-score normalization
export - True/False, whether or not to export the image
size - change the size of the plot
**option - all the options accepted by seaborn.factorplot
Example
Selection.lineplot(export=True,z_score=True)
onlyFPKM(self, return_as, remove_FPKM_name=False)
Take a Papillon_list object and return only the FPKM columns.
Parameters
return_as:
"df" - pandas DataFrame
"array" - numpy array
"gene name" - pandas DataFrame containing gene names
remove_FPKM_name: True/False
Example
df=Selection.onlyFPKM("df")
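As an illustration of what keeping only the FPKM columns means, the sketch below filters a header in the style of a cuffdiff fpkm_tracking file, where each sample contributes a `_FPKM`, `_conf_lo`, `_conf_hi` and `_status` column. The helper `fpkm_columns` is hypothetical and the sample names are made up; the column names Papillon keeps internally may differ.

```python
def fpkm_columns(columns, remove_fpkm_name=False):
    """Return the columns ending in '_FPKM', optionally without the suffix."""
    suffix = "_FPKM"
    selected = [name for name in columns if name.endswith(suffix)]
    if remove_fpkm_name:
        selected = [name[: -len(suffix)] for name in selected]
    return selected


# Header fields in the style of a cuffdiff genes.fpkm_tracking file
# (sample names are made up).
header = [
    "tracking_id", "gene_short_name", "locus",
    "Sample 1_FPKM", "Sample 1_conf_lo", "Sample 1_conf_hi", "Sample 1_status",
    "Sample 2_FPKM", "Sample 2_conf_lo", "Sample 2_conf_hi", "Sample 2_status",
]

print(fpkm_columns(header))                         # ['Sample 1_FPKM', 'Sample 2_FPKM']
print(fpkm_columns(header, remove_fpkm_name=True))  # ['Sample 1', 'Sample 2']
```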
plot(**parameter)
Deprecated. Use self.lineplot() instead
search(self, string)
Search a string in the Papillon_list
Parameters
string - accept a str to search among the gene names
Example
search_results=Selection.search("IL")
select(self, genelist)
Create another Papillon_list object containing only the genes in genelist.
Parameters
genelist: accepts a string (1 gene name), a list of gene names or a file with a list of gene names
Example
Selection2=Selection1.select(["IL6","CCL15"])
show(self)
Show the genes/isoforms as a DataFrame.
Example
Selection.show()
Last update for v 0.2.0