Releases: reneshbedre/bioinfokit
Bioinformatics data analysis and visualization toolkit
- `analys.gff.gff_to_gtf` function updated to handle a dot (`.`) value for the phase of CDS features
- Breast Cancer Wisconsin (Diagnostic) Data Set added
- `visuz.stat.roc` function added for visualizing the ROC curve
- `bartlett` and `levene` functions added to the `analys.stat` class for checking the ANOVA assumption of equal variances for datasets in stacked format
- `tukey_hsd` function updated for grouping order
- Pandas Series added as input for the `fasta.extract_seq` function
- `extract_seq` function moved to the `fasta` class
- `extract_seq` function deprecated from `analys`
- visualization for single and multiple statistical bar charts updated for future releases
- Tukey HSD test updated for interaction effects. Pairwise comparisons for interaction effects can now be calculated.
- `gff_to_gtf` function updated for GFF3 files with non-coding RNA transcripts. GFF3 files with non-coding transcripts (e.g., from miRBase GFF3) can be converted to GTF
- genFam enrichment analysis function added (`bioinfokit.analys.genfam.fam_enrich`)
- genFam test added
- Tukey HSD test added to perform multiple pairwise comparisons (`bioinfokit.analys.stat.tukey_hsd`)
- new option `mrna_feature_name` added in `analys.gff.gff_to_gtf` for when the name of the protein-coding mRNA feature (column 3 of the GFF3 file) is other than 'mRNA' or 'transcript' (e.g., some GFF3 files name this feature `protein_coding_gene`)
- `dim` option added to `visuz.cluster.screeplot`, `visuz.cluster.pcaplot`, and `visuz.cluster.biplot` to control the figure size
- `seqcov` moved to the `fastq` class
- `sra_db` function added under the `fastq` class for batch download of FASTQ files from the NCBI SRA database
- In t-test, the one-sample and paired t-tests added
- Two-sample t-test switched to a class-based method
- t-test function renamed from `ttsam` to `ttest`
- programmatic access to chi-squared independence test dataset added
- boxplot removed from t-test
- 'adjustText' module added in `setup.py` (issue #12)
- In chi-squared test, the sum of probabilities is rounded to 10 decimal places to get an exact sum when the probabilities are floats
- chi-squared goodness-of-fit test added under `stat.chisq`
- chi-squared independence test updated to return output as class attributes; mosaic plot removed
- `mergevcf` renamed to `concatvcf` to follow conventional naming (issue #9)
- `marker.vcf_anot` function updated for tab-delimited text output
- The error message for volcano, inverted volcano, and MA plots updated for when there are no significant or non-significant genes (issue #7)
- The `vcf_anot` function output updated with strand information
- The Manhattan plot updated to add the labels in sorted order for numerical strings
- The Manhattan plot updated to add the `figname` option
- TPM normalization function added
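
For reference, TPM (transcripts per million) normalization, mentioned in the last item above, first length-normalizes each gene's raw count (reads per kilobase) and then rescales so that the values within a sample sum to one million. The following is a minimal standard-library sketch of that textbook formula for a single sample; the function name `tpm_normalize` and its arguments are hypothetical and are not the bioinfokit `analys.norm` API.

```python
from typing import List, Sequence


def tpm_normalize(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Hypothetical helper (not bioinfokit's API): TPM values for one sample."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("gene lengths must be positive")
    # Step 1: reads per kilobase (RPK) corrects each count for gene length.
    rpk = [count / (length / 1000) for count, length in zip(counts, lengths_bp)]
    # Step 2: divide by the per-million RPK total so the sample sums to 1e6.
    per_million = sum(rpk) / 1e6
    if per_million == 0:
        raise ValueError("all counts are zero")
    return [value / per_million for value in rpk]


if __name__ == "__main__":
    # Three genes whose counts are proportional to their lengths.
    print(tpm_normalize([100, 200, 300], [1000, 2000, 3000]))
```

In the usage example every gene has the same reads per kilobase, so each receives one third of the million.
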
v0.9 has the following updates and changes (July 28, 2020)
- gene expression raw count normalization class added as 'analys.norm'
- CPM and RPKM normalization functions added under the 'analys.norm' class
- Sugarcane gene expression dataset added (Bedre et al., 2019)
- In `volcano`, `ma`, and `involcano` plots, checks for `lfc_thr`, `counts`, and `pv_thr` added
- legend labels, position, and figname parameters added in `volcano` plot
- utility to check for non-numeric values added for `ma`, `volcano`, and `involcano`
- `plotlegend` parameter added to `ma`
- the parameter for log fold change threshold lines added in `ma` plot
- legend labels, position, and figname parameters added in `ma` plot
- `tsneplot` added for t-SNE visualization
- in `bardot`, an option to drop NA values added so that missing values are ignored when plotting dots
- scRNA-seq dataset added (PBMC and Arabidopsis root cells)
- `fasta_reader` and `rev_com` moved to the newly created `fasta` class
- `tsneplot` and `vcf_anot` initialized for future release
- more parameters added in `biplot` (cluster coloring, datapoints)
- `figname` added in `hmap`
- `ma` function updated for absolute expression counts
- `svg` figures added
- `pca` function will be deprecated in a future release
- 2D and 3D loadings plot, biplot, and scree plot functions added under the `cluster` class for PCA
- programmatic access to iris and cotton datasets added
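
The `analys.norm` items above add CPM and RPKM normalization. A minimal standard-library sketch of the two standard per-sample formulas follows; the function names `cpm` and `rpkm` here are hypothetical illustrations and not the bioinfokit `analys.norm` interface.

```python
from typing import List, Sequence


def cpm(counts: Sequence[float]) -> List[float]:
    """Counts per million: scale each count by the sample's library size."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library size is zero")
    return [count / total * 1e6 for count in counts]


def rpkm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Reads per kilobase per million mapped reads for one sample."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Library-size factor first (per million mapped reads) ...
    per_million = sum(counts) / 1e6
    if per_million == 0:
        raise ValueError("library size is zero")
    # ... then divide by gene length in kilobases.
    return [count / per_million / (length / 1000) for count, length in zip(counts, lengths_bp)]
```

RPKM applies the library-size factor before the length factor, whereas TPM reverses that order; this is why TPM values sum to the same total in every sample while RPKM values generally do not.
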
v0.8 has the following updates and changes
- GFF3 to GTF file conversion utility added and updated under class `gff`
- In Manhattan plot (`visuz.marker.mhat`), the labeling issue with the `markernames` parameter corrected (see issue #4 on GitHub for details)
- `gstyle` parameter added in Manhattan plot for box-style annotation
- `splitvcf` function added for splitting a VCF file into individual VCF files for each chromosome
- `mergevcf` moved to the `analys.marker` class
- `reg_lin` function updated for multiple regression
- degrees of freedom fixed in the t-test for regression coefficients
- VIF calculation for MLR updated
- functions `fastq_reader` and `fqreadcounter` moved to the `fastq` class
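
The `splitvcf` item above splits a VCF file by chromosome. A minimal standard-library sketch of the idea, assuming an uncompressed VCF whose `##` meta-information lines and `#CHROM` header line precede the records: every per-chromosome group gets a copy of the header, and records are grouped by their first (CHROM) column. The functions `split_vcf_lines` and `write_per_chrom_vcfs` are hypothetical and are not the bioinfokit implementation.

```python
from typing import Dict, Iterable, List


def split_vcf_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: group VCF records by chromosome, each group with the full header."""
    header: List[str] = []
    records: Dict[str, List[str]] = {}  # dicts keep first-seen chromosome order
    for line in lines:
        if line.startswith("#"):
            header.append(line)  # '##' meta lines and the '#CHROM' column header
        elif line.strip():
            chrom = line.split("\t", 1)[0]
            records.setdefault(chrom, []).append(line)
    return {chrom: header + recs for chrom, recs in records.items()}


def write_per_chrom_vcfs(path: str) -> None:
    """Hypothetical helper: write one <chrom>.vcf file per chromosome found in `path`."""
    with open(path) as vcf:
        groups = split_vcf_lines(vcf)
    for chrom, chunk in groups.items():
        with open(f"{chrom}.vcf", "w") as out:
            out.writelines(chunk)
```

Repeating the header in each output keeps every per-chromosome file a valid standalone VCF.
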
v0.7 has the following updates and changes
- `split_fastq` function added for splitting a single interleaved paired-end FASTQ file into individual (left and right) paired-end FASTQ files
- GFF3 to GTF file conversion utility added under class `gff`
- two-sample and Welch's t-tests updated to report confidence intervals (CI); `alpha` parameter added
- module `termcolor` removed
- Programmatic access to the dataset for `ttsam` added
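
The `split_fastq` item above de-interleaves a paired-end FASTQ file. The sketch below shows the underlying idea under two stated assumptions: each record occupies exactly four lines (no sequence wrapping), and left and right mates strictly alternate. The function name `deinterleave_fastq` is hypothetical and does not reflect the signature of bioinfokit's `split_fastq`.

```python
from typing import List, Tuple


def deinterleave_fastq(text: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split interleaved FASTQ text into left (R1) and right (R2) lines."""
    lines = text.splitlines()
    # One read pair = two 4-line records = 8 lines.
    if len(lines) % 8 != 0:
        raise ValueError("expected complete pairs of 4-line FASTQ records")
    left: List[str] = []
    right: List[str] = []
    for start in range(0, len(lines), 8):
        r1, r2 = lines[start:start + 4], lines[start + 4:start + 8]
        # Basic sanity check: each record must begin with an '@' header line.
        if not (r1[0].startswith("@") and r2[0].startswith("@")):
            raise ValueError(f"malformed record near line {start + 1}")
        left.extend(r1)
        right.extend(r2)
    return left, right
```

A production tool would also confirm that the two read IDs in each pair match before writing the output files.
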
v0.6 has the following updates and changes
- Programmatic access to datasets added (class `get_data`)
- More features for figures added (`figtype`, `axtickfontsize`, `axtickfontname`, `axxlabel`, `axylabel`, `xlm`, `ylm`, `yerrlw`, `yerrcw`)
- In volcano plot, the typo in the x-axis label corrected (-log2(FoldChange) to log2(FoldChange))
- `help` will be deprecated in a future release
- VIF calculation for MLR updated
- adjustText removed
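
The VIF calculation item above concerns the variance inflation factor for multiple linear regression. In general, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors. With exactly two predictors, R_j^2 reduces to the squared Pearson correlation between them, which allows the short standard-library sketch below; the helper names are hypothetical, and models with more predictors need the full auxiliary regressions (not shown).

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("a predictor has zero variance")
    return s_xy / math.sqrt(s_xx * s_yy)


def vif_two_predictors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Hypothetical helper: VIF shared by both predictors in a two-predictor model, 1 / (1 - r^2)."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)
```

Uncorrelated predictors give a VIF of 1, and the value grows without bound as the correlation approaches 1.
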
v0.5 has the following updates and changes
- Linear regression analysis added in `analys.stat` class
- `volcano`, `involcano`, `ma`, and `heatmap` functions moved to the new `visuz.gen_exp` class
- In `volcano`, parameters for new box-type labeling and threshold grid lines added
- `corr_mat` updated for new colormaps and moved to `stat` class
- To visualize the graph in the console itself (e.g., Jupyter notebook), `show` parameter added
- Pandas DataFrame input added for `volcano`, `involcano`, `corr_mat`, `ma`, `ttsam`, and `chisq`
- `ttsam` and `chisq` moved to `analys.stat` class
- graph control parameters added for `volcano`, `involcano`, `ma`, and `heatmap`
- documentation can also be accessed at https://reneshbedre.github.io/blog/howtoinstall.html
- `help` will be deprecated in a future release
- fixed the NumPy bug in `visuz.stat.bardot`. An `int` cast was added when generating the number of samples, because NumPy does not accept a float there (see details of the NumPy bug: numpy/numpy#15345)
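
The v0.5 notes add linear regression analysis to `analys.stat`. As a reminder of what ordinary least squares estimates in the single-predictor case, here is a small standard-library sketch computing slope = S_xy / S_xx, intercept = mean(y) - slope * mean(x), and R^2; the function name `simple_ols` is hypothetical and is not the bioinfokit `reg_lin` interface.

```python
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: (slope, intercept, r_squared) for y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    if s_xx == 0:
        raise ValueError("x has zero variance")
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    # R^2 is undefined when y is constant, so report NaN in that case.
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return slope, intercept, r_squared
```

Multiple regression generalizes the same least-squares idea to several predictors via the normal equations.
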
v0.4 has the following updates and changes
- function `analys.format.fq_qual_var()` added for detecting the FASTQ quality encoding format
- `help` module added for command-line help messages
- class `fastq` added for FASTQ-related functions
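
The `fq_qual_var()` item above detects the FASTQ quality encoding. One common heuristic, sketched below and not necessarily the logic bioinfokit uses, looks at the lowest ASCII code among the quality characters: Phred+33 (Sanger, Illumina 1.8+) starts at 33 (`!`), Solexa+64 at 59 (`;`), and Phred+64 (Illumina 1.3-1.7) at 64 (`@`). Because a Phred+33 file containing only high-quality bases can look like Phred+64, the guess is only as reliable as the number of reads scanned. The function names are hypothetical.

```python
from typing import Iterable, List


def quality_lines(fastq_text: str) -> List[str]:
    """Return the quality line (4th line) of each 4-line FASTQ record."""
    return fastq_text.splitlines()[3::4]


def guess_quality_encoding(qual_lines: Iterable[str]) -> str:
    """Heuristic guess of the quality offset from the lowest ASCII code seen."""
    codes = [ord(ch) for line in qual_lines for ch in line.strip()]
    if not codes:
        raise ValueError("no quality characters found")
    lowest = min(codes)
    if lowest < 59:   # below ';' only fits the +33 offset
        return "Phred+33"
    if lowest < 64:   # ';' to '?' fits Solexa+64, which allows negative scores
        return "Solexa+64"
    return "Phred+64"  # could still be high-quality Phred+33 data
```
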
v0.3 has the following updates and changes
- bar-dot plot function added
- command-line help message class added