Domenico edited this page Sep 24, 2018 · 18 revisions

Papillon is a Python alternative to cummeRbund for reading and plotting cuffdiff/Galaxy RNA-seq data.

Installation

Before you start

Functions

Classes

Installation

You can install Papillon from PyPI:

pip install papillon

or with Anaconda through conda-forge, either by installing only papillon:

conda install -c conda-forge papillon

or by adding the conda-forge channel and then installing it:

conda config --add channels conda-forge

conda install papillon

It should work with any data science IDE (I tested Jupyter Notebook and Spyder).

Before you start

After the RNA-seq analysis with Galaxy, download the 4 files generated by cuffdiff whose names contain, respectively:

... transcript_FPKM_tracking...

... gene_FPKM_tracking ...

... gene_differential_expression...

... transcript_differential_expression...

Put them in the same folder.

You can either use the files directly or rename them following the cummeRbund convention:

... transcript_FPKM_tracking... = isoforms.fpkm_tracking

... gene_FPKM_tracking ... = genes.fpkm_tracking

... gene_differential_expression... = gene_exp.diff

... transcript_differential_expression... = isoform_exp.diff
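
The renaming above can also be scripted. The following is a minimal sketch, not part of Papillon: the helper name `rename_to_cummerbund` is hypothetical, and it assumes each downloaded file name contains one of the four keywords listed above. The target name `genes.fpkm_tracking` follows the spelling used in the `read_files()` example.

```python
from pathlib import Path

# Keyword found in the Galaxy/cuffdiff file name -> cummeRbund-style name.
# "genes.fpkm_tracking" follows the spelling used in the read_files() example.
CUMMERBUND_NAMES = {
    "transcript_FPKM_tracking": "isoforms.fpkm_tracking",
    "gene_FPKM_tracking": "genes.fpkm_tracking",
    "gene_differential_expression": "gene_exp.diff",
    "transcript_differential_expression": "isoform_exp.diff",
}


def rename_to_cummerbund(folder):
    """Rename files in `folder` whose names contain a known keyword.

    Returns a dict {old name: new name} for the files that were renamed.
    Hypothetical helper for illustration; it is not a Papillon function.
    """
    renamed = {}
    # sorted() materializes the listing before any file is renamed.
    for file in sorted(Path(folder).iterdir()):
        if not file.is_file():
            continue
        for keyword, new_name in CUMMERBUND_NAMES.items():
            if keyword in file.name:
                file.rename(file.with_name(new_name))
                renamed[file.name] = new_name
                break
    return renamed
```

Calling `rename_to_cummerbund("MyFolder/Test_files")` would rename the four files in place and leave any other file untouched.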

Functions

read_files(files, path=None, drop_comparison=None)

Accepts the cuffdiff/cummeRbund files as an iterable ("transcript_FPKM_tracking", "gene_FPKM_tracking", "gene_differential_expression", "transcript_differential_expression") and passes them to _papillon_builder() to create a Papillon_db object.

Parameters

files - accept an iterable with the cuffdiff file names

path - where to export Papillon generated files

drop_comparison - drop comparison (str) or list of comparisons to drop from the cuffdiff table

Example

pp.read_files(["Files/gene_exp.diff","Files/genes.fpkm_tracking","Files/isoform_exp.diff","Files/isoforms.fpkm_tracking"])


read_folder(path, drop_comparison=None)

Reads the folder containing the cuffdiff/cummeRbund files and passes them to _papillon_builder() to create a Papillon_db object.

Parameters

path - accept a str with the folder path, containing the cuffdiff files

drop_comparison - drop comparison (str) or list of comparisons to drop from the cuffdiff table

Example

MyProject=pp.read_folder("MyFolder/Test_files")


read_db(path, drop_comparison=None)

Deprecated. Use read_folder() instead.


Classes

class Papillon_db(builtins.object)

A Papillon_db object is created with read_folder() or read_files(), which rely on _papillon_builder().

Methods defined here:

__init__(self, path, samples, comparisons, genes_detected, isoforms_detected)

self.path - files path

self.samples - samples found

self.comparisons - comparisons found

self.genes_detected - dataframe of genes detected

self.isoforms_detected - dataframe of isoforms detected

__str__(self)

Return str(self).


change_order(self, new_order)

Change the order of the samples

Parameters

new_order: list of samples order

Example

MyProject.change_order(["Sample 4","Sample 3","Sample 2","Sample 1"])


drop_comparison(self, comparison)

Drop a comparison (str) or a list of comparisons

Parameters

comparison: comparison (str) or list of comparisons

Example

MyProject.drop_comparison(specific_comparison)


get_gene(self, genelist=None, comparison='all', comparison_sign=None, fold_ind=None, fold_sign='>')

This method selects genes by name or by conditions. It returns a Papillon_list object.

Parameters

genelist - accept string (1 gene name), list of gene names or file with a list of gene names

comparison - To select genes higher/lower in one condition compared to another. Accept either "all" to pass all the comparisons, or accept only 1 comparison as str (already present in the data)

comparison_sign - usable in combination with comparison, accept either ">" or "<"

fold_ind - fold induction (log2) higher/lower than the given number

fold_sign - usable in combination with fold_ind, accept either ">" or "<"

Example

Selection=MyProject.get_gene(["CD44","CCL15"], comparison="Sample 1_vs_Sample 2", comparison_sign="<", fold_ind=1, fold_sign="<")
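
To illustrate what a log2 fold threshold combined with a sign expresses, here is a small standalone sketch. It is not Papillon's implementation: the function names are hypothetical, the fold change is assumed to be log2(value_2 / value_1), and Papillon may handle direction or zero values differently.

```python
import math
import operator

# Map the accepted sign strings to comparison functions.
SIGNS = {">": operator.gt, "<": operator.lt}


def log2_fold(value_1, value_2):
    """log2 of value_2 / value_1 (orientation assumed for this sketch)."""
    return math.log2(value_2 / value_1)


def filter_by_fold(rows, fold_ind, fold_sign=">"):
    """Keep the genes whose log2 fold change is above/below `fold_ind`.

    `rows` is a list of (gene, value_1, value_2) tuples with non-zero values.
    Hypothetical helper for illustration; it is not a Papillon function.
    """
    compare = SIGNS[fold_sign]
    return [gene for gene, v1, v2 in rows if compare(log2_fold(v1, v2), fold_ind)]


rows = [("CD44", 10.0, 40.0), ("CCL15", 8.0, 8.0), ("IL6", 16.0, 4.0)]
print(filter_by_fold(rows, fold_ind=1, fold_sign=">"))  # ['CD44']
print(filter_by_fold(rows, fold_ind=1, fold_sign="<"))  # ['CCL15', 'IL6']
```

With a threshold of 1 on the log2 scale, `">"` keeps genes that at least doubled, while `"<"` keeps everything below that threshold, including unchanged and decreased genes.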


get_isoform(self, genelist=None, comparison='all', comparison_sign=None, fold_ind=None, fold_sign='>')

This method selects isoforms. It returns a Papillon_list object.

Parameters

genelist - accept string (1 gene name), list of gene names or file with a list of gene names

comparison - To select genes higher/lower in one condition compared to another. Accept either "all" to pass all the comparisons, or accept only 1 comparison as str (already present in the data)

comparison_sign - usable in combination with comparison, accept either ">" or "<"

fold_ind - fold induction (log2) higher/lower than the given number

fold_sign - usable in combination with fold_ind, accept either ">" or "<"

Example

Selection=MyProject.get_isoform(["IL6","CCL15","IL17RC"], comparison="Sample 2_vs_Sample 4", comparison_sign="<")


search(self, word, where, how='table', export=False)

Search among the gene/isoform names in the detected or significant tables

Parameters

word - accept a str to search among the gene names

where - accept:

"genes_detected"

"genes_significant"

"isoforms_detected"

"isoforms_significant"

how - accept:

"table" return the dataframe with the genes found

"list" return a list of names, no duplicates

export - True/False

Example

search_result=MyProject.search(word="IL6",where="genes_significant", how="table")


class Papillon_list(builtins.object)

Class containing a selected list of genes, with data associated, generated by Papillon_db.get_gene() or Papillon_db.get_isoform() methods

It is possible to add together the content of two or more Papillon_list objects.

Example:

P_list3 = P_list1+P_list2

P_list4=sum([PL1, PL2, PL3])
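
The `sum([...])` form works in Python only when a class defines `__radd__`, because `sum()` starts from the integer 0. The sketch below shows the general pattern with a hypothetical stand-in class; it is not Papillon's code, and the choice to drop duplicate genes when merging is an assumption made for the example.

```python
class GeneSelection:
    """Hypothetical stand-in for a gene selection; not Papillon's class."""

    def __init__(self, genes):
        self.genes = list(genes)

    def __add__(self, other):
        # Merge two selections, keeping first-seen order and dropping
        # duplicates (an assumption of this sketch).
        merged = list(dict.fromkeys(self.genes + other.genes))
        return GeneSelection(merged)

    def __radd__(self, other):
        # sum() starts from the integer 0, so "0 + selection" lands here.
        if other == 0:
            return self
        return NotImplemented


a = GeneSelection(["IL6", "CD44"])
b = GeneSelection(["CD44", "CCL15"])
c = GeneSelection(["IL17RC"])
print((a + b).genes)          # ['IL6', 'CD44', 'CCL15']
print(sum([a, b, c]).genes)   # ['IL6', 'CD44', 'CCL15', 'IL17RC']
```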

Methods defined here:

__add__(self, other)

__init__(self, df, what, comparisons, path, samples, comparison='all', comparison_sign=None, fold_ind=None, fold_sign='>', p=0.05)

Initialize self. See help(type(self)) for accurate signature.

__radd__(self, other)

__str__(self)

Return str(self).


compare(self, other)

Compare two Papillon_list objects

Parameters

other: another Papillon_list object

Example

Selection1.compare(Selection2)


export(self)

Export the selected genes/isoforms as excel file.

Example

Selection.export()


heatmap(self, z_score=False, col_cluster=False, method='complete', cmap='seismic', export=False, **options)

Generate heatmap using selected genes/isoforms

Parameters

z_score - True/False, whether to apply z-score normalization

col_cluster - True/False, whether to cluster the samples

method - clustering algorithm; the default is complete linkage

cmap - color map

export - True/False, whether to export the dataframe of selected genes

**options - any option accepted by seaborn.clustermap; the default metric is euclidean

Example

Selection.heatmap(z_score=True,export=True)
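
As a reminder of what z-score normalization does to expression values, here is a standalone sketch using only the standard library. It is not Papillon's implementation: scaling each gene (row) across samples and using the population standard deviation are both assumptions of this example, and `z_score_rows` is a hypothetical name.

```python
from statistics import mean, pstdev


def z_score_rows(matrix):
    """Z-score each row (gene) across its columns (samples).

    Hypothetical helper for illustration; it is not a Papillon function.
    """
    scaled = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)
        # A gene with identical values in every sample has no spread;
        # return zeros instead of dividing by zero.
        if sigma == 0:
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - mu) / sigma for x in row])
    return scaled


# Two genes with the same pattern but a 100-fold difference in magnitude.
fpkm = [[1.0, 2.0, 3.0], [100.0, 200.0, 300.0]]
for row in z_score_rows(fpkm):
    print([round(x, 3) for x in row])  # [-1.225, 0.0, 1.225] for both rows
```

After scaling, both genes show the same profile, which is why z-scores make genes with very different expression levels comparable on one heatmap.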


lineplot(self, title='', legend=True, z_score=False, export=False, size=10, ci=None, **option)

Line plot of the expression levels of the selected genes. The maximum number of genes is 200.

Parameters

title - accept a str as title of the plot

legend - True/False show the legend

z_score - True/False calculate the z-score normalization

export - True/False, whether to export the image

size - change the size of the plot

**options - all the options accepted by seaborn.factorplot

Example

Selection.lineplot(export=True,z_score=True)


onlyFPKM(self, return_as, remove_FPKM_name=False)

Take a Papillon_list object and return only FPKM columns.

Parameters

return_as:

"df" - pandas DataFrame

"array" - numpy array

"gene name" - pandas DataFrame containing gene names

remove_FPKM_name: True/False

Example

df=Selection.onlyFPKM("df")


plot(**parameter)

Deprecated. Use self.lineplot() instead


search(self, string)

Search a string in the Papillon_list

Parameters

string - accept a str to search among the gene names

Example

search_results=Selection.search("IL")


select(self, genelist)

Create another Papillon_list object

Parameters

genelist: accept string (1 gene name), list of gene names or file with a list of gene names

Example

Selection2=Selection1.select(["IL6","CCL15"])


show(self)

Show the genes/isoforms as a DataFrame

Example

Selection.show()
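
The individual examples on this page can be chained into one short session. The sketch below only combines calls documented above; the `import papillon as pp` alias is an assumption based on the `pp.` prefix used in the examples, the wrapper name `run_example` is hypothetical, and the folder path is the placeholder from the read_folder() example.

```python
def run_example(folder="MyFolder/Test_files"):
    """Read a cuffdiff folder, select two genes, then inspect and export them.

    Hypothetical wrapper that only chains the calls documented on this page.
    """
    # Imported here so that defining the function does not require the
    # package or the data files to be present.
    import papillon as pp

    project = pp.read_folder(folder)                  # Papillon_db
    selection = project.get_gene(["CD44", "CCL15"])   # Papillon_list
    selection.show()                                  # table of the selection
    selection.heatmap(z_score=True)                   # clustered heatmap
    selection.export()                                # Excel file
    return selection
```

Calling `run_example()` requires papillon to be installed and the four cuffdiff files to be present in the given folder.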


Last update for v 0.2.0