- Added `--read_arguments_from_file` to `split_libraries_fastq.py`, which prevents `multiple_split_libraries_fastq.py` from failing with an `Argument list too long` error when the number of input files is large (see #2069).
- Fixed bug in `start_parallel_jobs_slurm.py` that caused jobs not to run if `slurm_memory` was specified in `qiime_config`.
- Critical: Updated minimum required version of the qiime-default-reference package to 0.1.2. This release includes an important bug fix described in more detail in this QIIME blog post and in biocore/qiime-default-reference#14.
- Critical: Fixed bug in the `differential_abundance.py` fitZIG algorithm (#1960). This was a serious bug that was encountered when users would call `differential_abundance.py -a metagenomeSeq_fitZIG`. Any results previously generated with that command should be re-run.
- Critical: Fixed bug in `observation_metadata_correlation.py`, described in #2009. All previous output generated with `observation_metadata_correlation.py` was incorrect, and analyses using those results should be re-run. Most commonly this would have resulted in massive Type 2 error (false negatives), where observations whose abundance is correlated with metadata are not reported, though Type 1 errors (false positives) are also possible.
- `count_seqs.py` no longer fails on empty files (#1991).
- Updated minimum required version of biom-format package to 2.1.4. This is a bug fix release. Details are available in the biom-format ChangeLog.
- Updated minimum required version of Emperor package to 0.9.51.
- Forced BIOM table type to "OTU table" for all tables written with QIIME. This fixes #1928.
- The `--similarity` option in `pick_otus.py` now only accepts sequence similarity thresholds between 0.0 and 1.0 (inclusive). Previously, values outside this range were accepted, which caused the external tools that `pick_otus.py` wraps to raise uninformative error messages (#1979).
- `split_libraries_fastq.py` now explicitly disallows `-p 0`, which could lead to empty sequences being written to the resulting output file (#1984).
- Fixed issue where `filter_samples_from_otu_table.py` could only filter the mapping file when `--valid_states` was passed as the filtering method (#2003).
- Fixed bug where distance matrix files generated by QIIME (e.g., using `beta_diversity.py`) could, in rare cases (depending on input data, machine architecture, installed dependencies, etc.), have diagonal values that were close to, but not exactly, zero. These files could not be loaded by QIIME scripts that accept distance matrix files as input (e.g., `principal_coordinates.py`), which raised an error stating that the distance matrix was not hollow. Values on the diagonal that are close to zero are now set to 0.0 (#1933).
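The #1933 fix can be illustrated with a short sketch: diagonal values within a small tolerance of zero are snapped to exactly 0.0, after which a symmetry-and-hollowness check (similar in spirit to the `is_symmetric_and_hollow` method mentioned elsewhere in these notes) passes. This is a minimal pure-Python sketch, not QIIME's code; the list-of-lists matrix representation, the function names, and the `1e-10` tolerance are all assumptions.

```python
import math


def zero_near_zero_diagonal(matrix, abs_tol=1e-10):
    """Return a copy of matrix with diagonal values close to 0.0 set to exactly 0.0.

    abs_tol is a hypothetical tolerance chosen for illustration.
    """
    fixed = [list(row) for row in matrix]
    for i in range(len(fixed)):
        # rel_tol is irrelevant when comparing against 0.0, so only abs_tol matters.
        if math.isclose(fixed[i][i], 0.0, abs_tol=abs_tol):
            fixed[i][i] = 0.0
    return fixed


def is_symmetric_and_hollow(matrix):
    """True if matrix[i][j] == matrix[j][i] everywhere and every diagonal value is 0.0."""
    n = len(matrix)
    for i in range(n):
        if matrix[i][i] != 0.0:
            return False
        for j in range(i + 1, n):
            if matrix[i][j] != matrix[j][i]:
                return False
    return True
```

A diagonal such as `[1e-17, -2e-16]` fails the hollowness check as-is but passes after `zero_near_zero_diagonal`, while a genuinely non-zero diagonal value (e.g. `0.3`) is left alone and still fails.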
- Removed parallel PyNAST `formatdb` step (#1989). The formatted database wasn't actually being used; this step was left over from when BLAST was required by PyNAST.
- `count_seqs.py` can now count records in fastq files that have the `.fq` extension. Previously this was only possible for fastq files with the `.fastq` extension.
- If `temp_dir` is not defined in the QIIME config file, QIIME will use the system's default temporary directory instead of assuming that `/tmp` is present and writeable. Note that the location of this default temporary directory can be changed with environment variables (#1995).
- Improved error reporting from `filter_taxa_from_otu_table.py`, `filter_otus_from_otu_table.py`, and `filter_samples_from_otu_table.py` when all OTUs/samples are filtered out, resulting in an empty table (#1963), and generally when attempting to write an empty BIOM table from QIIME.
- Added ability to pass a user-defined runtime limit for jobs to `start_parallel_jobs_slurm.py`. This can be achieved by setting the `slurm_time` variable in `qiime_config`, or by passing `--time` to `start_parallel_jobs_slurm.py`.
- Distance matrices and UPGMA trees generated from the full (unrarefied) OTU table are now stored under `unrarefied_bdiv` in the output directory of `jackknifed_beta_diversity.py`. That UPGMA tree is only used if the user passes `--master_tree full`. This change makes the content of these files more explicit so they're less likely to be used by accident (#2024).
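The `temp_dir` fallback described above can be sketched with Python's standard library, whose `tempfile.gettempdir()` consults the `TMPDIR`, `TEMP`, and `TMP` environment variables before falling back to platform defaults. This is an illustrative sketch only: `get_temp_dir` is a hypothetical helper, and the QIIME config is modeled as a plain dict rather than QIIME's actual config object.

```python
import tempfile


def get_temp_dir(qiime_config):
    """Return the configured temp_dir, or the system default if it is unset or empty.

    qiime_config is modeled here as a plain dict (an assumption for illustration).
    """
    temp_dir = qiime_config.get("temp_dir")
    if temp_dir:
        return temp_dir
    # gettempdir() picks the first usable directory from TMPDIR, TEMP, TMP,
    # then platform-specific locations, instead of hard-coding /tmp.
    return tempfile.gettempdir()
```

Setting, for example, `TMPDIR=/scratch/$USER` before the Python process starts changes what `tempfile.gettempdir()` returns, which is the kind of environment-variable override the entry refers to.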
- `observation_metadata_correlation.py`: Allows the calculation of correlations between feature abundances and continuous-valued metadata. This script replaces the continuous-valued correlation functionality that was in `otu_category_significance.py` in QIIME 1.7.0 and earlier.
- `compare_trajectories.py`: Allows analysis of volatility using different algorithms.
- `compute_taxonomy_ratios.py`: Implements the microbial dysbiosis index (MD-index) from Gevers et al. 2014.
- `collapse_samples.py`: Allows collapsing groups of samples in BIOM tables and mapping files based on their metadata (see #1678). This can be used, for example, to collapse samples belonging to a replicate group. This script also replaces `summarize_otu_by_cat.py` (see discussion on #1798).
- `multiple_split_libraries_fastq.py`, `multiple_join_paired_ends.py`, and `multiple_extract_barcodes.py`: Facilitate initial QIIME processing of already-demultiplexed fastq files, as these are commonly being provided by sequencing centers.
- `differential_abundance.py`: Supplements `group_significance.py` to support metagenomeSeq's fitZIG algorithm and DESeq2's negative binomial algorithm. The input for this script is an unnormalized, raw BIOM table.
- `normalize_table.py`: Adds support for BIOM table normalization algorithms in addition to rarefaction. Supported methods are metagenomeSeq's CSS and DESeq's variance stabilizing transformation.
- `start_parallel_jobs_slurm.py`: Allows for parallel job submission using slurm.
- `split_libraries_lea_seq.py`: Allows for demultiplexing of sequences using the LEA-Seq protocol, described in Faith et al. (2013). This script should be considered to be in beta testing status.
- `extract_reads_from_interleaved_file.py`: Splits an interleaved FASTQ file (like the ones produced by JGI) into forward and reverse reads. See the relevant section of the Illumina data preparation tutorial for more details.
- `parallel_pick_otus_sortmerna.py`: Performs parallel OTU picking with SortMeRNA (Kopylova et al. (2012)).
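To make the idea behind `collapse_samples.py` concrete, here is a minimal sketch that collapses samples sharing a metadata value by summing their per-observation counts. Summing is only one plausible collapse mode and the script itself may offer others; the nested-dict table and mapping representations and the `collapse_samples` function are hypothetical stand-ins, not the BIOM API or QIIME's implementation.

```python
from collections import defaultdict


def collapse_samples(counts, metadata, category):
    """Sum per-observation counts across samples that share a metadata value.

    counts:   {sample_id: {observation_id: count}}   (hypothetical table format)
    metadata: {sample_id: {category: value}}         (hypothetical mapping format)
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for sample_id, observations in counts.items():
        group = metadata[sample_id][category]
        for obs_id, count in observations.items():
            collapsed[group][obs_id] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {group: dict(obs) for group, obs in collapsed.items()}
```

For example, collapsing on a replicate-group column merges two replicates of group `A` into a single column whose counts are the per-OTU sums.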
- `split_otu_table.py` now allows multiple fields to be passed to split a BIOM table, and optionally a mapping file. Check out the new documentation for the naming conventions (which have changed slightly) and an example.
- Added new options to `make_otu_heatmap.py`:
  - `--color_scheme`, which allows users to choose from different color schemes
  - `--observation_metadata_category`, which allows users to select a column other than taxonomy to use when labeling the rows
  - `--observation_metadata_level`, which allows the user to specify which level in the hierarchical metadata category to use in creating the row labels
  - `-g`/`--imagetype`, `--dpi`, `--width`, and `--height`, which offer more control over the generation of heatmap figures
- `-m`/`--mapping_fps` is no longer required for `split_libraries_fastq.py`. The mapping file is not needed when running with `--barcode_type 'not-barcoded'`, but previously it would fail to validate when multiple sequence files and sample IDs were passed together with a mapping file without barcodes (see #1400).
- Added alphabetical sorting option (based on boxplot labels) to `make_distance_boxplots.py`. Sorting by boxplot median can now be performed by passing `--sort median` (this was previously invoked by passing `--sort`). Sorting alphabetically can be performed by passing `--sort alphabetical`.
- Scripts that write an OTU table will now write BIOM files in HDF5 format if HDF5 is installed. This improves performance for very large OTU tables.
- `merge_mapping_files.py` can now take an argument to convert the header names to upper case, so that, for example, a category named `treatment` and another named `TREATMENT` from two different mapping files will be merged.
- The script `make_distance_histograms.py` has been removed. This functionality should be accessed through `make_distance_boxplots.py`.
- Beta support has been added for performing OTU picking with open source software:
  - Subsampled open-reference OTU picking using SortMeRNA (Kopylova et al. (2012)) for the closed-reference steps and SumaClust for the open-reference steps. This can be accessed with `pick_open_reference_otus.py -m sortmerna_sumaclust`.
  - Closed-reference OTU picking using SortMeRNA (Kopylova et al. (2012)). This can be accessed with `pick_closed_reference_otus.py -p params.txt`, where `params.txt` includes the line `pick_otus:otu_picking_method sortmerna`.
  - De novo OTU picking using SumaClust or swarm (Mahe et al. (2014)). This can be accessed with `pick_de_novo_otus.py -p params.txt`, where `params.txt` includes the line `pick_otus:otu_picking_method sumaclust` or `pick_otus:otu_picking_method swarm`.
  - sumaclust v1.0.00, swarm 1.2.19, and sortmerna 2.0 are now optional dependencies (see the QIIME install docs for details).
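The parameters-file lines above follow a `script:parameter value` pattern. The sketch below parses such lines into a nested dict, assuming whitespace separates the key from the value (as in the example lines) and that `#` starts a comment line. It is a hypothetical illustration of the format, not QIIME's own parameters-file parser, and `parse_params_lines` is an invented name.

```python
def parse_params_lines(lines):
    """Parse 'script:parameter value' lines into {script: {parameter: value}}.

    Blank lines and lines starting with '#' are skipped; a parameter given
    without a value is stored as None. (Assumed format, for illustration.)
    """
    params = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then everything after the first whitespace run
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else None
        script, sep, parameter = key.partition(":")
        if not sep or not script or not parameter:
            raise ValueError(f"expected 'script:parameter value', got {raw!r}")
        params.setdefault(script, {})[parameter] = value
    return params
```

With the line `pick_otus:otu_picking_method swarm`, this yields `{"pick_otus": {"otu_picking_method": "swarm"}}`, i.e. the value a workflow would forward to `pick_otus.py` as its OTU picking method.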
- Renamed `split_fasta_on_sample_ids_to_files.py` to `split_sequence_file_on_sample_ids.py`, which now supports splitting FASTQ files as well. Added a parameter, `--file_type`, which is used to specify the type of the input file.
- Added `--assign_taxonomy` option to `pick_closed_reference_otus.py` to allow taxonomy assignment using a classifier, rather than the default of using the taxonomic assignment of the cluster centroid.
- Added `--suppress_taxonomy_assignment` option to `pick_closed_reference_otus.py`.
- Updated output of `identify_paired_differences.py` to include more information in the pseudo-mapping file that it generates. This includes the "pre" and "post" values for all of the analysis categories on a per-subject basis. This is useful for plotting with other tools, or for generating legends for the plots that are currently generated by the script (see issue #1707).
- Added `pick_otus_reference_seqs_fp` to the QIIME config file. This is a filepath to reference sequences to use with QIIME's OTU picking scripts/workflows. See the QIIME config docs and #1696 for more details.
- The QIIME config settings `assign_taxonomy_id_to_taxonomy_fp`, `assign_taxonomy_reference_seqs_fp`, `pick_otus_reference_seqs_fp`, and `pynast_template_alignment_fp` now default to reference data files in the qiime-default-reference project.
- Installing QIIME via `pip install qiime` now works out-of-the-box by providing a functioning QIIME minimal (base) install (see #1696).
- `cluster_jobs_fp` in the QIIME config file now defaults to `start_parallel_jobs.py`.
- `seconds_to_sleep` now defaults to 1.
- Added `--negate_sample_id_fp` option to `filter_samples_from_otu_table.py` (see #1117).
- Added `--percent_variation_below_one` flag to `make_2d_plots.py` for when the percent variation is actually below 1 and not a relative measure.
- The default confidence threshold for the Naive Bayes taxonomy assigners (RDP Classifier and mothur) is now `0.50`, as recommended by the RDP Classifier developers for partial sequences.
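A per-rank confidence threshold such as the new `0.50` default is commonly applied by walking the lineage from the root and stopping at the first rank whose confidence falls below the cutoff. The sketch below shows that generic truncation rule; it is not the RDP Classifier or mothur code, and `truncate_lineage` and its inputs are hypothetical.

```python
def truncate_lineage(ranks, confidences, threshold=0.50):
    """Keep ranks from the root down until a rank's confidence drops below threshold.

    ranks and confidences are parallel lists ordered root-first (assumed input shape).
    A confidence exactly equal to the threshold is kept.
    """
    kept = []
    for rank, confidence in zip(ranks, confidences):
        if confidence < threshold:
            break
        kept.append(rank)
    return kept
```

With confidences `[1.0, 0.8, 0.4]` for kingdom, phylum, and class, a `0.50` cutoff reports the assignment only to phylum; lowering the cutoff keeps more ranks at the cost of less certain deep assignments.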
- Simplified and improved QIIME install documentation.
- Errors raised by scripts are easier to read and include a supplementary message on how to get help (see #1794).
- QIIME is now easier to install! Removed `qiime_scripts_dir`, `python_exe_fp`, `working_dir`, `cloud_environment`, and `template_alignment_lanemask_fp` from the QIIME config file. If these values are present in your QIIME config file, they will be flagged as unrecognized by `print_qiime_config.py -t` and will be ignored by QIIME. QIIME will now use the `python` executable and QIIME scripts that are found in your `PATH` environment variable, and `temp_dir` will be used in place of `working_dir` (this value was previously used by some parts of parallel QIIME).
- `filter_alignment.py` will now use the 16S alignment Lane mask (Lane, D.J. 1991) by default if one is not provided via `--lane_mask_fp`.
- The `--tail_type` option in `compare_distance_matrices.py` now accepts "two-sided" instead of "two sided" for specifying a two-sided alternative hypothesis. The new name is easier to specify via the command line (quotes aren't needed because it is a single word).
- `print_qiime_config.py -t` now tests a QIIME minimal (base) install instead of a QIIME full install. `print_qiime_config.py -tf` tests a QIIME full install.
- Standardized use of underscores in option longnames. Affected scripts and options:
  - `scripts/demultiplex_fasta.py`
    - `start-numbering-at` is now `start_numbering_at`
  - `scripts/denoiser.py`
    - `low_cut-off` is now `low_cut_off`
    - `high_cut-off` is now `high_cut_off`
  - `scripts/multiple_rarefactions.py`
    - `num-reps` is now `num_reps`
  - `scripts/multiple_rarefactions_even_depth.py`
    - `num-reps` is now `num_reps`
  - `scripts/parallel_multiple_rarefactions.py`
    - `num-reps` is now `num_reps`
  - `scripts/plot_rank_abundance_graph.py`
    - `no-legend` is now `no_legend`
  - `scripts/split_libraries.py`
    - `min-seq-length` is now `min_seq_length`
    - `max-seq-length` is now `max_seq_length`
    - `trim-seq-length` is now `trim_seq_length`
    - `min-qual-score` is now `min_qual_score`
    - `keep-primer` is now `keep_primer`
    - `keep-barcode` is now `keep_barcode`
    - `max-ambig` is now `max_ambig`
    - `max-homopolymer` is now `max_homopolymer`
    - `max-primer-mismatch` is now `max_primer_mismatch`
    - `barcode-type` is now `barcode_type`
    - `dir-prefix` is now `dir_prefix`
    - `max-barcode-errors` is now `max_barcode_errors`
    - `start-numbering-at` is now `start_numbering_at`
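Every rename in the list above follows one rule: hyphens inside a long option name become underscores. A hypothetical helper like the one below could migrate old command lines for the listed scripts. It is an illustrative sketch, not part of QIIME, and should only be applied to those scripts, since it would rewrite any hyphenated long option it sees. Option values (after `=`) and non-option arguments are left untouched.

```python
def normalize_long_option(arg):
    """Rewrite '--some-long-name[=value]' to '--some_long_name[=value]'.

    Short options, positional arguments, and the bare '--' are returned unchanged;
    hyphens inside an '=value' part are preserved.
    """
    if arg.startswith("--") and len(arg) > 2:
        name, sep, value = arg[2:].partition("=")
        return "--" + name.replace("-", "_") + sep + value
    return arg
```

Usage: `[normalize_long_option(a) for a in old_args]` turns `["--max-ambig=6", "-o", "out"]` into `["--max_ambig=6", "-o", "out"]`.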
- Removed the optional `--output_dir` option from `make_otu_heatmap.py` and replaced it with the required option `--output_fp`.
- The parameters `--uclust_min_consensus_fraction` and `--uclust_similarity` in the `*_assign_taxonomy_*` scripts have been changed to `--min_consensus_fraction` and `--similarity`, since both of these parameters apply to the SortMeRNA taxon assigner as well.
- Several changes were made to `alpha_diversity.py` metric names:
  - `ACE` is now `ace`
  - `chao1_confidence` is now `chao1_ci`
  - Added `observed_otus`, which is equivalent to `observed_species` but is generally a more accurate name. `observed_species` is retained for backward-compatibility.
- SortMeRNA 2.0, SUMACLUST 1.0.00, and swarm 1.2.19 are now installed automatically when QIIME is installed (e.g., via `pip install qiime`).
- Relaxed sanity tests for `compare_categories.py --method adonis` so that unique values are only checked for categories that are non-numeric (see issue #1316).
- `core_diversity_analyses.py` now requires `--tree_fp` unless `--nonphylogenetic_diversity` is passed (see #1671).
- Fixed bug in `assign_taxonomy.py -m blast` and `parallel_assign_taxonomy_blast.py` that prevented multiple instances of either from running at the same time (see #1768).
- Fixed bug where `--phred_offset` in `split_libraries_fastq.py` was ignored (see #1656).
- Spaces in taxa no longer cause an error when using `--assignment_method=mothur` in `assign_taxonomy.py`.
- Fixed bug where long axis labels were cut off in heatmaps generated by `make_otu_heatmap.py` (see #1571).
- Fixed bug where `-S`/`--suppress_submit_jobs` was being ignored by several of the parallel scripts (e.g., `parallel_pick_otus_uclust_ref.py`) (see #1665).
- Fixed bug where `make_distance_comparison_plots.py` would create empty groups (see #1627).
- `qiime/workflow/pick_open_reference_otus.py` no longer copies the permission bits of the reference file, which caused a file permission failure in some cases.
- Fixed bug in `make_rarefaction_plots.py` where `--generate_per_sample_plots` wasn't working (see #1475).
- Fixed bug that resulted in samples being mislabeled in `make_otu_heatmap.py` when one of the following options was passed: `--category`, `--map_fname`, `--sample_tree`, or `--suppress_column_clustering`. This is discussed in #1790.
- Removed `-Y`/`--python_exe_fp` and `-N` options from the `parallel_merge_otu_tables.py` script, as these are not available in any of the other parallel QIIME scripts and we do not have good reason to support them (see QIIME 1.6.0 release notes below for more details).
- Removed `insert_seqs_into_tree.py`. This code needs additional testing and documentation, and was not widely used. We plan to add this support back in the future; progress on that can be followed on #1499.
- `summarize_otu_by_cat.py` has been replaced with `collapse_samples.py`.
- Removed options `-c`/`--ci_type`, `-a`/`--alpha`, and `-f`/`--f_ratio` from `conditional_uncovered_probability.py`, as these weren't being used by the script (i.e., supplying different values didn't change the computed CIs because the defaults were always used).
- Removed `tax2tree` as a method in `assign_taxonomy.py`.
- FastTree v1.x is no longer supported by `make_phylogeny.py` (see issue #1516).
- Removed `submit_to_mgrast.py` script (see #1780).
- Removed `make_otu_heatmap_html.py` in favor of `make_otu_heatmap.py` (see discussion on #1724).
- Removed `-m`/`--include_html_counts` option from the `plot_taxa_summary.py` script, as the behavior was no longer useful or accurate.
- Changed default parameters for uclust-based OTU picking: `max_accepts` is now 1 (was 20), `max_rejects` is now 8 (was 500), `stepwords` is now 8 (was 20), and `word_length` is now 8 (was 12). These changes greatly reduce runtime, with minimal effect on the results. See Rideout et al., 2014 (PeerJ pre-print) for more details.
- Disabled the prefilter by default in `pick_open_reference_otus.py`. This change greatly reduces runtime, with minimal effect on the results. See Rideout et al., 2014 (PeerJ pre-print) for more details.
- The alpha diversity measures available in QIIME (e.g., via `alpha_diversity.py`) are now powered by scikit-bio, and several of these methods are now considerably faster. See the scikit-bio docs on alpha diversity for more details on the methods.
- ANOSIM and PERMANOVA (available in `compare_categories.py`) are now powered by scikit-bio and are approximately 1000 times faster than previous implementations. They also now provide more useful information in the output file. See the scikit-bio docs for ANOSIM and PERMANOVA for more detail.
- Renamed the BEST method in `compare_categories.py` to BIO-ENV to match the name used in R's vegan package (`vegan::bioenv`) and the name of the program in the original paper. Use `compare_categories.py --method bioenv` instead of `compare_categories.py --method best`. The underlying implementation has also been rewritten and is considerably faster than before, and the output more closely matches the vegan package, as environmental variables are now scaled before computing Euclidean distances. See the scikit-bio docs for BIO-ENV for more detail.
- The Mantel test (`--method mantel`) and Mantel correlogram (`--method mantel_corr`) in `compare_distance_matrices.py` are considerably faster than previous implementations. See the scikit-bio docs for Mantel for more detail.
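The BIO-ENV change scales environmental variables before computing Euclidean distances, so that a variable measured in large units (e.g., elevation in meters) does not dominate one measured in small units (e.g., pH). A minimal sketch of that preprocessing step, assuming each variable is centered and divided by its sample standard deviation (the exact scaling used by scikit-bio is not stated here), could look like this. The function names are hypothetical, and constant columns must be dropped first because their standard deviation is zero.

```python
import math
import statistics


def scale_columns(rows):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]  # raises if a column has < 2 values
    return [[(value - m) / s for value, m, s in zip(row, means, stdevs)] for row in rows]


def euclidean_distances(rows):
    """Full pairwise Euclidean distance matrix (list of lists) between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]
```

For rows `[1, 100]`, `[2, 200]`, `[3, 300]`, both columns scale to `[-1, 0, 1]`, so each variable contributes equally and the distance between the first and last rows becomes `sqrt(8)`, rather than roughly 200 when computed on the raw values.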
- New script, extract_barcodes.py, and associated tutorial added to support alternative Illumina barcoding schemes.
- Added script join_paired_ends.py, which supports joining of overlapping paired-end reads in fastq files. This wraps fastq-join and SeqPrep.
- extract_barcodes.py script added; this script is intended to help process fastq data that is not in a format compatible with split_libraries_fastq.py.
- otu_category_significance.py has been removed in favor of a new script called `group_significance.py`, which has significantly more functionality.
- map_reads_to_reference.py has a new parameter, `--genetic_code`, which can be used to specify which genetic code should be used when doing translated searches (from nucleotide sequences against a protein database). Genetic codes are specified numerically, corresponding to the genetic codes detailed on the NCBI genetic codes page.
- core_diversity_analyses.py has a new parameter, `--recover_from_failure`, that allows the user to re-run on an existing output directory; only analyses that haven't already been run will be re-run. This also allows the user to add categories to a previous run, which is very common and previously required a full re-run.
- Added new script, `estimate_observation_richness.py`, which implements some of the interpolation and extrapolation richness estimators in Colwell et al. (2012), Journal of Plant Ecology. IMPORTANT: This script should be considered beta software; it is currently an experimental feature in QIIME.
- QIIME now depends on qcli 0.1.0, a stand-alone package which performs command line interface parsing and testing.
- make_qiime_rst_file.py has been removed in favor of qcli_make_rst.
- transform_coordinate_matrices.py can now take more than two input coordinate matrices. When used this way, the first coordinate matrix will be treated as the reference, and the 2nd through nth will be compared against that reference. The output file names, which were all previously hard-coded, are now generated on the fly for clarity of the results.
- split_libraries_fastq.py can now handle per-sample, non-barcoded fastq files. Some sequencing centers are now providing data in this way - if this becomes more common, we'll want to make this more convenient, but for now it's possible.
- Added a parallel merge OTUs method that will combine OTU tables in parallel where possible.
- Added identify_paired_differences.py to support paired difference (i.e., Pre/Post) testing as discussed in issue #1040.
- Added new taxonomic assignment method, `qiime.assign_taxonomy.UclustConsensusTaxonAssigner`. This is accessible through `assign_taxonomy.py -m uclust`, `parallel_assign_taxonomy_uclust.py`, `pick_de_novo_otus.py`, and `pick_open_reference_otus.py`. This is being tested as an alternative to QIIME's existing taxonomic assignment methods.
- Refactored beta_diversity_through_plots.py, jackknifed_beta_diversity.py, and core_diversity_analyses.py workflows to generate Emperor PCoA plots instead of KiNG PCoA plots. QIIME now depends on Emperor 0.9.3. One interface change that will be noticeable to users is that the output PCoA plots from these workflows are no longer separated into "continuous" and "discrete" directories. Users can make these color choices from within Emperor, so only one PCoA plot is necessary. This refactoring also involved script interface changes to beta_diversity_through_plots.py, which no longer generates 2D plots (interested users can call make_2d_plots.py directly; these won't be needed as often, since we no longer have a Java dependency) or distance histograms (these data are better accessed through make_distance_boxplots.py, which is better written and tested, though users can still call make_distance_histograms.py directly). As a result, beta_diversity_through_plots.py no longer takes the --suppress_2d_plots, --suppress_3d_plots, or --histogram_categories parameters, and now takes a new --suppress_emperor_plots parameter which can be used to disable PCoA plotting.
- Modified compare_alpha_diversity.py to generate box plots in addition to statistics, and added the ability to pass multiple categories (instead of just a single category) on the command line. Also fixed issue where options contained a `dest` parameter and therefore could have a different name than their long-form parameter name. This involves several script interface changes: the --category option is now called --categories; the script now takes --output_dir instead of --output_fp (because multiple files can be created, instead of just a single file); --alpha_diversity_filepath is now --alpha_diversity_fp; and --mapping_filepath is now --mapping_fp.
- Refactored make_rarefaction_plots.py to add options --generate_per_sample_plots and --generate_average_tables. These are now suppressed by default to reduce run time and size of output.
- Refactored alpha_rarefaction.py to add option --retain_intermediate_files. Rarefied BIOM tables and alpha diversity results for each rarefied BIOM table are now removed by default to reduce size of output.
- Updated to rtax 0.984.
- Required PyNAST version is now 1.2.2.
- Updated default taxonomy assigner to be the new uclust-based consensus taxonomy assigner. This was shown to be more accurate and faster than the existing methods in Bokulich, Rideout et al. (submitted).
- Renamed check_id_map.py to validate_mapping_file.py for clarity.
- Changed short option names in summarize_otu_by_cat.py to be consistent with other scripts.
- Increased default rdp_max_memory from 1500M to 4000M, as it almost always needed to be increased when re-training on modern reference databases.
- Required biom-format version is now 1.3.1.
- convert_unifrac_sample_mapping_to_otu_table.py and convert_otu_table_to_unifrac_sample_mapping.py have been moved to the FastUnifrac repo (https://github.com/qiime/FastUnifrac)
- Required matplotlib version is now >= 1.1.0, <= 1.3.1.
- Required numpy version is now >= 1.5.1, <= 1.7.1.
- QIIME has been added to PyPI and can be installed using `pip`.
- Required biom-format version is now 1.1.2.
- core_qiime_analyses.py has been replaced with core_diversity_analyses.py. This follows a re-factoring to support only "downstream" analyses (i.e., starting with a BIOM table). This makes the script more widely applicable as it's now general to any BIOM data and/or different OTU picking strategies.
- Added support for usearch v6.1 OTU picking and chimera checking. This is in addition to existing support for usearch v5.2.236.
- Added section on using usearch 6.1 chimera checking with `identify_chimeric_seqs.py` to the "Chimera checking sequences with QIIME" tutorial.
- `compare_alpha_diversity.py` output now includes average alpha diversity values as well as the comparison p and t values.
- `compare_distance_matrices.py` has a new option, `--variable_size_distance_classes`, for running a Mantel correlogram over distance classes that vary in size (i.e., width) but contain the same number of pairwise distances in each class.
- `qiime.filter.sample_ids_from_category_state_coverage` now supports splitting on a category.
- Modified add_qiime_labels.py script to use a standard metadata mapping file with a column specified for fasta file names, to make it more consistent with other scripts.
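The `--variable_size_distance_classes` option describes classes that hold equal numbers of pairwise distances while their widths vary. One straightforward way to form such classes is to sort the distances and cut them into consecutive equal-count chunks, as sketched below. How QIIME handles ties and remainders is not described in these notes; in this hypothetical sketch the first classes simply receive one extra distance each when the count does not divide evenly.

```python
def equal_count_distance_classes(distances, num_classes):
    """Split pairwise distances into classes holding (nearly) equal numbers of values.

    Distances are sorted and cut into num_classes consecutive chunks, so class
    widths adapt to the data. Leftover values go to the first classes.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    ordered = sorted(distances)
    base, extra = divmod(len(ordered), num_classes)
    classes, start = [], 0
    for i in range(num_classes):
        size = base + (1 if i < extra else 0)
        classes.append(ordered[start:start + size])
        start += size
    return classes
```

With six distances and three classes, each class holds two values; the class boundaries fall wherever the sorted data place them, which is what makes the widths variable.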
- otu_category_significance.py now makes better use of the BIOM Table API, addressing a performance issue when using CSMat as the sparse backend.
- Added qiime.group.get_adjacent_distances, which is useful for plotting distances between "adjacent" sample ids in a list provided by the user. This is useful, for example, in plotting distances between adjacent temporal samples in a time series.
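The idea behind "adjacent" distances can be sketched in a few lines: given an ordered list of sample IDs (for example, time points), look up the distance between each consecutive pair. This is a hypothetical standalone function with a nested-dict distance matrix; it does not reproduce the signature or return format of `qiime.group.get_adjacent_distances`.

```python
def adjacent_distances(dm, ordered_ids):
    """Return (pairs, distances) for each consecutive pair in ordered_ids.

    dm is a nested dict, dm[a][b] -> distance (an assumed representation).
    """
    pairs = list(zip(ordered_ids, ordered_ids[1:]))
    distances = [dm[a][b] for a, b in pairs]
    return pairs, distances
```

For a time series `t1, t2, t3`, this yields the distances `t1`-`t2` and `t2`-`t3`, which can then be plotted against time to show how much the community changes between consecutive samples.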
- Fixed a bug in make_3d_plots.py related to biplot calculations. This bug would change the placement of taxonomic groups based on how many taxa were included in the biplot analysis. Examples and additional details can be found here: #677.
- Major refactoring of workflow tests and organization of workflow code. The workflow library code and tests have now been split apart into separate files. This makes the code a lot more manageable, which will support a more general refactoring of the workflow code in the future to make it easier to develop new workflows. The workflow tests have also been updated to use the new test data described in #582, which is now accessible through `qiime.test.get_test_data()` and `qiime.test.get_test_data_fps()`. This provides improved testing of boundary cases in each workflow, as well as more consistent tests across the workflows.
- otu_category_significance.py now supports an input directory of BIOM tables, and can write out either a single collated results file or an individual file for every input table in the directory. The -o output_fp is now a required parameter rather than an optional parameter.
- simsam.py now has a -m/--mapping_fp option and writes output to a directory instead of a single file. -n/--num and -d/--dissim now accept a single number or comma-separated list of values.
- supervised_learning.py can now handle input directories of OTU tables, can write a single collated results file if the input directory is of rarefied OTU tables, and the -o output fp option is now a required parameter.
- The qiime_test_data repository has been merged into the main qiime repository, which will facilitate development by no longer requiring contributors to coordinate pull requests across two repositories. Users will no longer have to specify qiime_test_data_dir in their qiime_config files to include the script usage tests in runs of all_tests.py; all_tests.py now knows how to find qiime_test_data and will run all of the script usage tests by default.
- pick_reference_otus_through_otu_table.py now outputs otu_table.biom in top-level output directory rather than nested in the otu picking output directory.
- pick_reference_otus_through_otu_table.py has been renamed pick_closed_reference_otus.py (issue #708).
- pick_subsampled_reference_otus_through_otu_table.py has been renamed pick_open_reference_otus.py (issue #708).
- pick_otus_through_otu_table.py has been renamed pick_de_novo_otus.py (issue #708).
- make_distance_comparison_plots.py now supports auto-sizing of distribution plots via --distribution_width (which is the new default) and better handles numeric label types with very large or small ranges (e.g. elevation) by scaling x-axis units to [1, (number of data points)]. --group_spacing has been removed in favor of the new auto-sizing feature.
- per_library_stats.py removed in favor of biom-format's print_biom_table_summary.py.
- Added SourceTracker tutorial, and changed QIIME to depend on SourceTracker 0.9.5 (which is modified to facilitate use with QIIME).
- Moran's I (in compare_categories.py) now supports identical samples (i.e. zeros in the distance matrix that aren't on the diagonal).
- summarize_taxa.py now outputs taxa summary tables in both classic (TSV) and BIOM formats by default. This will allow taxa summary tables to be used with other QIIME scripts that expect BIOM files as input. This change is the first step towards adding full support for BIOM taxon tables in QIIME. summarize_taxa.py also has two new options: --suppress_classic_table_output and --supress_biom_table_output.
- make_distance_boxplots.py and make_distance_comparison_plots.py now explicitly state the alternative hypothesis used in the t-tests.
- parallel_blast.py now has a different option for providing a blast db (--blast_db). This implies that the current --refseqs_path should be used only for providing a fasta file of reference sequences. The --suppress_format_blastdb option has been removed since it is no longer needed.
- Added `filter_taxa_from_otu_table.py` to support filtering OTUs with (or without) specific taxonomy assignments from an OTU table.
- Added parameters to `pick_subsampled_reference_otus_through_otu_table.py` to suppress taxonomy assignment (`--suppress_taxonomy_assignment`) and the alignment and tree building steps (`--suppress_align_and_tree`). These are useful for cases where a taxonomy may not exist for the reference collection (not too common) or when the region doesn't work well for phylogenetic reconstruction (e.g., fungal ITS). Additionally fixed a bug where alternate `assign_taxonomy` parameters provided in the parameters file would be ignored when running in parallel.
- Detrending of quadratic curvature in ordination coordinates is now a feature of QIIME. This approach was used in Harris JK, et al. "Phylogenetic stratigraphy in the Guerrero Negro hypersaline microbial mat."
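One common way to remove quadratic curvature (an "arch" or "horseshoe") from ordination coordinates is to regress the second axis on a quadratic in the first axis and keep the residuals. The sketch below does exactly that with ordinary least squares, solving the 3x3 normal equations directly. It is a generic illustration under that assumption and is not necessarily the procedure QIIME's detrending implements; the function names are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients [a, b, c] for y ~ a + b*x + c*x**2.

    Requires at least three distinct x values.
    """
    # Normal equations (X^T X) beta = X^T y with design columns 1, x, x**2:
    # entry (i, j) of X^T X is sum(x**(i + j)); entry i of X^T y is sum(y * x**i).
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    aug = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]

    # Gaussian elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, 3):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, 4):
                aug[row][k] -= factor * aug[col][k]

    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (aug[i][3] - tail) / aug[i][i]
    return coeffs


def detrend_quadratic(axis1, axis2):
    """Return axis2 with its least-squares quadratic trend in axis1 removed."""
    a, b, c = fit_quadratic(axis1, axis2)
    return [y - (a + b * x + c * x * x) for x, y in zip(axis1, axis2)]
```

Points lying exactly on a parabola detrend to residuals of (numerically) zero, and because the fit includes an intercept, the residuals of any input sum to approximately zero.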
- Supervised learning mislabeling output now includes binary "mislabeled" columns at 5%, 10%, ..., 95%, 99%.
- Added tutorial on Fungal ITS analysis.
- Added tutorial on predicting mislabeled samples.
- Changed the USEARCH parameters for de novo chimera detection, reference chimera detection, and size filtering in `pick_otus.py` to `suppress_X` options that default to `False`, rather than options that default to `True` and are turned off when passed. This makes them more intuitive to use and makes them work better with the workflow scripts.
- Added a `simpson_reciprocal` measure of alpha diversity, which is `1/D`, following the standard definition. Note that the measure `reciprocal_simpson` is `1/simpson`, not `1/D`; it was removed for clarity.
- Added new script, `compute_core_microbiome.py`, which identifies the core OTUs (i.e., those observed in some user-defined percentage of the samples).
- Major refactoring of parallel QIIME. Repetitive code was consolidated into the ParallelWrapper class, which may ultimately move to PyCogent. The only script interface changes are that the `-Y/--python_exe_fp`, `-N (serial script filepath)`, and `-P/--poller_fp` parameters are no longer available to the user. These were very infrequently (if ever) modified from defaults, so it doesn't make sense to continue to support them. These changes will allow for easier development of new parallel wrappers and facilitate changes to the underlying parallel functionality.
- Added new script, `compare_taxa_summaries.py`, and supporting library and test code (`qiime/compare_taxa_summaries.py` and `tests/test_compare_taxa_summaries.py`) to allow for the comparison of taxa summary files, including sorting and filling, expected, and paired comparisons using Pearson or Spearman correlation. Added accompanying tutorial (`doc/tutorials/taxa_summary_comparison.rst`).
- New script for parallel trie OTU picker.
- Made loaddata.r more robust when making mapping files, distance matrices, etc. compatible with each other. There were rare cases that caused some R functions (e.g., betadisper) to fail if empty levels were left in the parsed mapping file.
- Fixed issue in the ParallelWrapper class that could have caused a deadlock if run from within a subprocess with pipes.
- make_distance_boxplots.py and make_distance_comparison_plots.py can now perform Student's two-sample t-tests to determine whether a pair of boxplots/distributions are significantly different (using both parametric and nonparametric Monte Carlo-based tests of significance). These changes include three new options to the two scripts (--tail_type, --num_permutations, and --suppress_significance_tests), as well as a new function all_pairs_t_test in qiime.stats. The accompanying tutorial has also been updated to cover the new statistical tests.
- Checks are now in place to prevent asymmetric and non-hollow distance matrices from being used in make_distance_boxplots.py, make_distance_comparison_plots.py, make_distance_histograms.py, compare_categories.py, and compare_distance_matrices.py. The relevant script help and underlying library code have been documented to warn against their use, and the symmetry checks can be easily disabled if performance becomes an issue in the future.
- qiime.util.DistanceMatrix has a new method, is_symmetric_and_hollow.
- Added the new Illumina Overview Tutorial, which was developed for the ISME 14 Bioinformatics Workshop, and added the IPython notebook files that were used in the ISME 14 workshop under the new examples/ipynb directory. These can be used by changing to the ipynb directory and running ipython notebook on a system with IPython and the IPython Notebook dependencies installed. Also moved the qiime_tutorial directory to the new examples folder.
- Added support for translated database mapping through map_reads_to_reference.py and parallel_map_reads_to_reference.py and related library code, parallel code, etc. This is analogous to closed-reference OTU picking, but can translate queries, so it is useful for mapping metagenomic or metatranscriptomic data against databases of functional genes (e.g., KEGG). Currently BLAT and usearch are supported for translated searching.
- qiime.util.qiime_system_call now has an optional shell parameter that is passed through to subprocess.Popen.
- Changed the compare_categories.py script interface such that --method rda is no longer supported and must now be --method dbrda, as the method we provide is db-RDA (capscale), not traditional RDA; added the ability to pass the number of permutations (-n) for PERMDISP and db-RDA (these were previously not supported); updated script documentation, statistical method descriptions, and accompanying tutorial to be of overall better quality and clarity; the output filename when the method is PERMDISP is now permdisp_results.txt instead of betadisper_results.txt, which is consistent with the rest of the methods; significantly refactored the underlying code so it is better tested and easier to maintain; added better error checking and handling for the types of categories that are accepted by the statistical methods (e.g., checking that categories are numeric if they need to be, and making sure categories do not contain all unique values or a single value); fixed the output format for the BEST method to be easier to read and consistent with the other methods; the qiime.util.MetadataMap class has a few new utility methods to support some of these changes.
- compare_alpha_diversity.py now supports both parametric and nonparametric two-sample t-tests (nonparametric is the default) with the new options -t/--test_type and -n/--num_permutations. Also fixed a bug that used the wrong degrees of freedom in the t-tests, yielding incorrect t statistics and p-values, and added correction for multiple comparisons.
- Removed tree method raxml from make_phylogeny.py's choices for -t/--tree_method. Tree method raxml_v730 should now be used instead. RAxML v703 is no longer supported.
- Minimum PyNAST version requirement upgraded to PyNAST 1.2.
- make_distance_boxplots.py, make_distance_comparison_plots.py, and make_distance_histograms.py now correctly output TSV data files with a .txt extension instead of .xls (this allows them to be opened more easily in programs such as Excel).
- make_distance_boxplots.py has a new option, --color_individual_within_by_field, that allows the "individual within" boxplots to be optionally colored to indicate their membership in another mapping file field. A legend is also included.
- Added sample_ids_from_category_state_coverage function to qiime/filter.py to support filtering of samples based on a subject's category coverage. For example, this function is useful for filtering out individuals in a time series study who do not meet some timepoint coverage criterion.
- assign_taxonomy.py now supports assignment with tax2tree version 1.0 and mothur version 1.25.0.
- Added new script load_remote_mapping_file.py and accompanying tutorial to allow exporting and downloading of mapping files stored as Google Spreadsheets.
- Fixed bug in parallel_assign_taxonomy_blast.py which would cause the script to hang if a relative path was passed for -o.
- Added the qiime_test_data repository, which contains example input and output for most QIIME scripts. The individual script documentation was completely refactored so that usage examples correspond to the example input and output files. The basic script testing functionality was removed from all_tests.py and replaced with more detailed testing of the scripts based on their usage examples.
- add_taxa.py was removed in favor of add_metadata.py (a biom-format project script). See the new tutorial on adding metadata to BIOM files.
- Updated qiime.util.get_qiime_library_version to return the git commit hash rather than the svn revision number (as we're using git for revision control now).
- Added the Java version to the output of print_qiime_config.py to assist with debugging.
- Changed plot_rank_abundance_graph.py so -o specifies the filename of the figure, not the output directory.
- Added new script add_alpha_to_mapping_file.py, which adds alpha diversity data to a mapping file for incorporation in plots, etc.
- Moved the QIIME website files from Qiime/web to their own GitHub repository: qiime.github.com.
- Fixed bug in installation of QIIME Denoiser with setup.py.
- supervised_learning.py now produces mislabeling.txt and cv_probabilities.txt files that look like QIIME mapping files, allowing them to be used for coloring points in PCoA plots, etc.
- Updated RDP Classifier training code to allow any number of ranks in training files, as long as the number of ranks is uniform. This removes the need for special RDP training files in reference OTU collections.
- Added table density and metadata listings to per_library_stats.py.
- Updates to several dependencies. New dependencies (for those that changed in this release) are: Python 2.7.3; PyCogent 1.5.3; biom-format 1.1.1; PyNAST 1.2; usearch 5.2.236; rtax 0.983; AmpliconNoise 1.27; Greengenes OTUs 12_10; and RDP Classifier 2.2.
- OTU tables are now stored on disk in the BIOM file format (see http://biom-format.org). The BIOM format webpage describes the motivation for the switch, but briefly, it will support interoperability of related tools (e.g., QIIME/MG-RAST/mothur/VAMPS), and it is a more efficient representation of data/metadata. The biom-format project's DenseTable and SparseTable objects are now used to represent OTU tables in memory. See the convert_biom.py script in the biom-format project for converting between 'classic' and BIOM formatted OTU tables.
- Added a script, add_qiime_labels.py, that allows users to specify a directory of fasta files, along with a mapping file that associates each SampleID with a fasta file name, and combines the fasta files into a single fasta file with QIIME-compatible labels. This is to handle situations where sequencing centers perform their own proprietary demultiplexing into separate per-sample fasta files instead of supplying raw data, but users would like to use QIIME to analyze their data.
- Added new compare_categories.py script to perform significance testing of categories/sample grouping. Added accompanying tutorial and new RExecutor class to util.py. Methods supported by compare_categories.py are Adonis, Anosim, BEST, Moran's I, MRPP, PERMANOVA, PERMDISP, and RDA. See doc/tutorials/category_comparison.rst for details.
- compare_distance_matrices.py can now perform partial Mantel and Mantel correlogram tests in addition to the traditional Mantel test. Additionally, the script has several new options. Added new supporting tutorial and generic statistical method library code (doc/tutorials/distance_matrix_comparison.rst, qiime/stats.py, qiime/compare_distance_matrices.py), and two new classes (DistanceMatrix and MetadataMap) to util.py.
- Added a new option (-s) to make_3d_plots.py for choosing whether to show scaled points, unscaled points, or both; by default, only the unscaled points are output.
- split_libraries_fastq.py default parameters updated based on evaluation of parameter settings on real and mock community data sets. A manuscript describing these results is currently in preparation. Briefly, the -p/--min_per_read_length parameter was modified to take a fraction of the full read length that is acceptable as the minimum, rather than an absolute (integer) length. Additionally the --max_bad_run_length default was changed from 1 to 3.
- check_id_map.py code was completely refactored to increase readability and ease of modification. Now also creates html output to display locations of errors and warnings in the mapping file.
- Altered the default value of min_length in align_seqs.py and parallel_align_seqs_pynast.py. This was previously set to 150 based on 454 FLX data, but it is now computed as 75% of the median input sequence length. This will scale better across platforms and read lengths, and allow for more consistent handling of data from different sources. The user can still pass --min_length with a specific value to override the default.
- Altered the way split_libraries.py handles errors/warnings from the mapping file, and fixed a bug where suppression of warnings about variable length barcodes was not being properly passed. Now warnings will not cause split_libraries.py to halt execution, although more serious problems (errors) will. These include problems with headers, SampleIDs, and invalid characters in DNA sequence fields.
- Increased the default number of allowed ambiguous bases in split_libraries.py from 0 to 6. This accommodates the FLX+ long read technology, which often makes ambiguous base calls but still produces good-quality sequence after the ambiguous bases. Also added an option (-x) to truncate at the first "N" character, allowing users to retain these sequences but remove ambiguous bases if desired.
- Updated merge_mapping_files.py to support merging of mapping files with overlapping sample ids.
- Added support for CASAVA 1.8.0 quality scores in split_libraries_fastq.py. This involved deprecating the --last_bad_quality_char parameter in favor of --phred_quality_threshold. The latter is now computed from the former on the basis of detecting which version of CASAVA is being used from the fastq headers (unfortunately they don't include this information in the file, but it is possible to detect).
- Added the option to print the function of the curve that was fit to the points in plot_semivariogram.py.
- Replaced filter_otu_table.py with filter_otus_from_otu_table.py. The interface was redesigned, and the script was renamed for clarity.
- Replaced filter_by_metadata.py with filter_samples_from_otu_table.py. The interface was redesigned, and the script was renamed for clarity.
- Added new script, conditional_uncovered_probability.py, to compute the coverage of a sample (or its complement, the conditional uncovered probability). Current estimators include lladser_pe, lladser_ci, esty_ci, and robbins.
- Updated usearch application wrapper, unit test, and documentation to handle usearch v5.2.32, as the previously supported version has bugs in consensus sequence generation (--consout parameter).
- Added support for the RTAX taxonomy assignment. RTAX is designed for assigning taxonomy to paired-end reads, but additionally works for single end reads. QIIME currently supports RTAX 0.981.
- Added the pick_subsampled_reference_otus_through_otu_tables.py, a more efficient open reference OTU picking workflow script for processing very large Illumina (or other) data sets. This is being used to process the Earth Microbiome Project data, so is designed to scale to tens of HiSeq runs. A new tutorial has been added that describes this process (doc/tutorials/open_reference_illumina_processing.rst).
- Added new script convert_fastqual_to_fastq.py to convert fasta/qual files to fastq.
- Added ability to output demultiplexed fastq from split_libraries_fastq.py.
- Added a new sort option to summarize_taxa_through_plots.py, which is useful for the web interface. By default, sorting is turned off.
- Added ability to output OTUs per sample instead of sequences per sample to per_library_stats.py.
- Updates and expansions to existing tutorials, including the Using AWS and Procrustes analysis tutorials.
- Added insert_seqs_into_tree.py to insert reads into an existing tree. This script wraps RAxML, ParsInsert, and PPlacer.
- Updated split_libraries_fastq.py to look only at the first n bases of the barcode reads, where n is automatically determined as the length of the barcodes in the mapping file. This feature is only used if all of the barcodes are the same length. It allows QIIME to easily ignore a 13th base call in the barcode files, a technical artifact that sometimes arises.
- Added new stats.py module that provides an API for running biogeographical statistical methods, as well as a framework for creating new method implementations in the future (this code was moved over from qiimeutils/microbiogeo). Also added two new classes to the util module (DistanceMatrix and MetadataMap) that are used by the stats module.
- Updated Mothur OTU picker support from 1.6.0 to the latest (1.25.0) version.
- Added start_parallel_jobs_sg.py to support parallel jobs on SGE queueing systems.
- Modified split_libraries_fastq.py and format.py to show SampleIDs with zero sequence count and to show the total sum of sequences written in the log file.
- Implemented usearch (i.e., OTUPIPE) chimera detection, quality filtering, and OTU picking in pick_otus.py.
- All workflows now log the md5 sums of all input files (trac #92).
- Testing of QIIME with new dependency versions, updating of warnings and test failures (in print_qiime_config.py). No code changes were required to support new versions.
- split_libraries_fastq.py can now handle gzipped input files.
- Addition of code and tutorial to support plotting of raw distance data in QIIME (scripts/make_distance_comparison_plots.py, scripts/make_distance_boxplots.py, qiime/group.py, doc/tutorials/creating_distance_comparison_plots.rst).
- Updates to many scripts to support PyCogent custom option types (new_filepath, new_dirpath, etc.).
- Fixes to workflows to fail immediately on certain types of bad inputs (e.g., missing tree when building UniFrac plots) rather than failing only when the script reaches the relevant step in the workflow.
- Added ability to merge otu tables with overlapping sample ids (in merge_otu_tables.py). Values are summed when an OTU shows up in the same sample in different OTU tables.
- Added a new script (filter_distance_matrix.py) to filter samples directly from distance matrices.
- Added script nmds.py for Non-Metric Multidimensional Scaling (NMDS).
- Added calculation of standard error to rarefaction plots, which previously showed only standard deviation. Also added an option to choose between the two.
- Added support in pick_otus_through_otu_table.py for running uclust_ref in parallel with creation of new clusters.
- Added script distance_matrix_from_mapping.py, which creates a distance matrix from a metadata column.
- assign_taxonomy_reference_seqs_fp and assign_taxonomy_id_to_taxonomy_fp were added to qiime_config, which allows users to set defaults for the dataset they'd like to perform taxonomy assignment against. This works for the serial and parallel versions of assign_taxonomy for both BLAST and RDP.
- Added the ability to calculate RMS vectors in make_3d_plots.py, using two methods (avg and trajectory), to assess the power (movement) of the trajectories. Additionally, this feature reports the significance of the differences between trajectories using ANOVA.
- Added the ability to draw vectors or traces of individuals in space in make_3d_plots.py; this can be helpful in time series analysis.
- Added additional allowed characters to data fields in mapping files. These include space and /:,; characters. All characters allowed now are: alphanumeric, underscore, space, and +-%./:,; characters.
- split_otu_table.py can now keep duplicated rows in the resulting mapping files and can rename samples (SampleID), in both the resulting mapping files and the OTU tables, using another column of the mapping file; this can be helpful for Procrustes analysis.
- plot_semivariogram.py now lets you control the colors and axes of the resulting plots and ignore missing samples; the latter can be useful when samples are missing after rarefying.
- Default num_dimensions for transform_coordinate_matrices.py changed from all dimensions to 3 (trac ticket #119). This more closely corresponds with how we use this test (e.g., to determine if we would draw the same biological conclusions from two different methods of generating a PCoA plot). This was in response to our noticing that Monte Carlo p-values were lower than we would expect in controls.
- Removed the --suppress_distance_histograms option from beta_diversity_through_plots.py in favor of using the -c/--histogram_categories option to determine whether these will be generated. If the user passes -c, distance histograms are generated. If they do not, these are not generated.
- Added support for fastq files in count_seqs.py.
- Several new tutorials including retraining of the RDP classifier, working with Amazon Web Services, basic unix/linux commands, and others.
- Fixed bug in process_qseq.py that would result in only a single input file per lane having its data stored in the fastq file.
- Fixed bug in filter_otu_table.py where SampleIDs would remain even when all of their OTU counts were zero.
- Fixed bug in serial pick_reference_otus_through_otu_table.py that was causing uclust to be used rather than uclust_ref as the default method for OTU picking.
- Added option to support reverse complements of Golay barcodes in the mapping file.
- Modified beta_diversity_through_plots.py so distance histograms are only generated if the user specifies --histogram_categories on the command line. These are very slow to generate for all mapping categories, so it makes more sense for the user to turn on histogram plotting for the specific categories they're interested in.
- Added option --reverse_primer_mismatches to split_libraries.py so that the number of allowed reverse primer mismatches can be set independently of the forward primer mismatches.
- Added option (-e/--max_rare_depth) to the command line of alpha_rarefaction.py. This allows for a convenient way for users to specify the maximum rarefaction depth on the command line, and is useful for when it needs to be set to something other than the median rarefaction depth. Also added option to control minimum rarefaction depth from the alpha_rarefaction.py command line.
- Added support for 5- and 10-fold and leave-one-out cross-validation to supervised learning.
- Added filter_by_metadata.py state string handling to filter_fasta.py for metadata-based fasta filtering.
- Added subsample_fasta module for randomly subsampling fasta files.
- Added script to split a post-split-lib fasta file into per-sample fasta files. This is useful for sharing Illumina data with collaborators or creating per-sample files for DB submission.
- Fixed bug where multiple_rarefactions_even_depth didn't work with --lineages_included.
- Modified pick_otus_through_otu_table.py so filter_alignment.py can be applied when the method is other than PyNAST. This previously wasn't possible because we only filtered with the lanemask, but we now allow entropy filtering, so this is relevant.
- Fixed two serious bugs in make_distance_histograms.py related to p-value calculations (both Monte Carlo and parametric p-values were affected).
- Removed several obsolete scripts (make_pie_charts.py and several denoiser-related scripts).
- Added muscle_max_memory option to align_seqs script.
- Changed default num_dimensions to 3 in transform_coordinate_matrices.py. This more closely corresponds with how we use this test (e.g., to determine if we would draw the same biological conclusions from two different methods of generating a PCoA plot). This was in response to our noticing that Monte Carlo p-values were lower than we would expect in controls.
- uclust and uclust_ref OTU pickers now incorporate a pre-filtering step where identical sequences are collapsed before calling uclust and then expanded after calling uclust. This gives a big speed improvement (5-20x) on reasonably sized input sets (>200k sequences) with no effect on the resulting OTUs. This is now the default behavior for pick_otus.py, and can be disabled by passing --suppress_uclust_prefilter_exact_match to pick_otus.py.
- Added ability to pass a file to sort_otu_table.py that contains a sorted list of sample ids, and use that information rather than the mapping file for sorting the OTU table. This allows users to, e.g., pass sorted mapping files as input.
- Added core_analyses.py script and workflow function. This plugs together many components of QIIME (split libraries, pick_otus_through_otu_table.py, beta_diversity_through_3d_plots.py, alpha_rarefaction.py) into a single command and parameters file.
- Added script (split_otu_table_by_taxonomy.py) which will create taxon-specific OTU tables from a master OTU table for taxon-specific analyses of alpha/beta diversity, etc.
- Changed default behavior of single_rarefaction.py. Now lineage information is included by default, but can be turned off with --suppress_include_lineages.
- Added script (compare_distance_matrices.py) for computing Mantel correlations among a set of distance matrices.
- Interface changes to summarize_otu_by_cat.py. This allows the user to pass the output file name, rather than a directory where the output file should be written.
- Reassigned parameter -r in parallel_assign_taxonomy_rdp.py: -r is now used for reference_seqs_fp, whereas it was previously used for rdp_classifier_fp.
- Added script inflate_denoiser_output.py to expand clusters to fasta representing all sequences. This allows denoiser results to be passed directly to the OTU pickers (and OTU picking workflows) which should greatly reduce the complexity of denoiser runs. The "Denoising 454 Data" tutorial has been updated to reflect how the pipeline should now be run. The denoising functionality was removed from the pick_otus_through_otu_table.py workflow script as that could only be used in very special circumstances - this allows us to focus our attention on supporting the new pipeline described in the updated tutorial.
- Reorganized output from pick_otus_through_otu_table.py to get rid of the confusing output directory structure.
- Added script plot_semivariogram.py to plot semivariograms using two distance matrices. This script also plots a fitting curve of the data values.
- Changed beta diversity scripts to do unweighted_unifrac,weighted_unifrac by default.
- Changed output of summarize_taxa.py to a directory instead of a filepath. This allows for multiple levels to be processed simultaneously.
- The beta_diversity_through_3d_plots.py now contains some additional functionality -- 2d plots and distance histograms. It has therefore been renamed beta_diversity_through_plots.py. Any of the plots can be disabled by passing the options --suppress_distance_histograms, --suppress_2d_plots, and --suppress_3d_plots.
- Updated required version of FastTree to 2.1.3 as this version contains some bug fixes over version 2.1.0.
- Modified single_rarefaction.py so default is to include lineages (previously did not include these by default).
- Added split_otu_table.py script which splits a single OTU table into several OTU tables based on the values in a specified column of the mapping file. This is useful, for example, when a single OTU table is generated that covers multiple studies.
- Fixed bug in mouseovers in taxa area and bar charts. These were misaligned when a lot of samples were included.
- Added support for RDP classifier 2.2. Versions 2.0 and 2.2 are both supported.
- Added support for AmpliconNoise with the ampliconnoise.py script.
- Added new page to the documentation to cover upgrades between versions of QIIME.
- Updated the make_distance_histograms.py output filepaths and HTML layout to be more consistent with other plotting scripts.
- Added a new taxonomy summary workflow (summarize_taxa_through_plots.py).
- Modified workflow scripts so stdout and stderr are written to the log file. This is very useful for debugging.
- Added new script (simsam.py) to simulate samples using a phylogenetic tree.
- Complete overhaul of Illumina data processing code. QIIME now treats fastq format as the default for Illumina data, and various other formats can be converted to fastq using process_qseq.py and process_iseq.py. The "Processing Illumina Data" tutorial has also been completely overhauled and describes these changes. The primary script for demultiplexing Illumina data is now split_libraries_fastq.py.
- Dropped support for PyroNoise in favor of AmpliconNoise (the successor to PyroNoise) and the QIIME denoiser.
- Added inflate_denoiser_output.py script to simplify the integration of denoiser results into the QIIME pipeline. See the "Denoising 454 Data" tutorial, which has been overhauled in this release. To reduce the possible pathways through QIIME with denoising, support for denoising was removed from pick_otus_through_otu_table.py in favor of working with the pipeline presented in the tutorial.
- Changed default behavior of split_libraries.py so unassigned reads are not stored by default. There is now a --retain_unassigned_reads option to achieve the previous behavior.
- Many clean-ups to the script documentation throughout QIIME.
- Added scripts to plot semivariograms.
- Modified all workflow scripts so parameter files are now optional. This will simplify working with 'default' analyses in these scripts.
- Added more thorough support for floating point values in OTU tables. This was previously supported only in specific cases.
- Added support for users to pass jobs_to_start on the command line for all of the workflow scripts. This overrides this value in the parameters file and qiime_config, and is a more convenient way of controlling this.
- Added entropy filtering option to filter_alignment.py. This can be useful for position-filtering de novo alignments, or other alignments where no lanemask is available.
- Added new script (count_seqs.py), which counts the number of sequences in one or more fasta files, as well as the mean/stddev sequence lengths, and prints the results to stdout or a file.
- Added the plot_taxa_summary.py workflow script, which includes summarizing the OTU table by category.
- Overhauled the QIIME overview tutorial.
- Added new script (start_parallel_jobs_torque.py) which can be used for running parallel QIIME on clusters using torque for the queueing system. A new qiime_config value, torque_queue, can be specified to define the default queue.
- Integrated the QIIME Denoiser (Reeder and Knight, 2011) into QIIME.
- Added script (compare_alpha_diversity.py) for comparing rarefied alpha diversities across different mapping file categories.
- Fixed bug in pick_otus.py where reverse strand matching did not work for uclust/uclust_ref.
- Modified the location where temp files are written for more consistency throughout QIIME. Temp files are now written to the temp_dir (from qiime_config), or to /tmp/ if temp_dir is not defined. There may still be a few temp files written to other locations, but the goal is for all temp files to be written to the same user-defined (or default) directory.
- Added split_otu_table.py script which splits a single OTU table into several OTU tables based on the values in a specified column of the mapping file. This is useful, for example, when a single OTU table is generated that covers multiple studies.
- Added script (make_tep.py) that makes a TopiaryExplorer project file (.tep) from an OTU table, a sample metadata table, and a tree file.
- Removed the rdp_classifier_fp from qiime_config. This was used inconsistently throughout QIIME, so was somewhat buggy, and with the switch to RDP 2.2 in QIIME 1.3.0 I think it will save a lot of support headaches to just get rid of it.
- Added tutorial for processing 18S data, along with a small 3 domain sample sequence file in the qiime_tutorial/18S_tutorial_files/ folder.
- Added filter_tree.py script, which functions similarly to filter_fasta.py. Moved some functions from filter_fasta.py to filter_tree.py that were generally useful.
- Added submit_to_mgrast.py script which takes a post-split-libraries fasta file and submits it to the MG-RAST database.
- Added sort_otu_table.py script which allows for sorting samples in an OTU table based on their associated values in a mapping file.
- Removed DOTUR OTU picker. This was requested by Pat Schloss, as Mothur has replaced DOTUR.
- Removed support of SRA submission and processing scripts along with related documentation and tutorial. This included the following scripts: make_sra_submission, sra_spreadsheets_to_map_files, process_sra_submission (starting revision 1786).
- Added categorized_dist_scatterplot.py script.
- Added OTU gain as a new beta diversity metric to compute non-phylogenetic gain (G).
- Added features to split_libraries to allow truncation or removal of sequences with quality score windows, and increased information deposited in log file about sliding window quality score tests. Added unit test for quality score truncation/removal.
- Added reference-based OTU picking workflow script. This can be applied for database OTU picking, as well as for applying Shotgun UniFrac (Caporaso et al. 2011, PLoS One, accepted).
- Added a new list of distinct colors to the colors.py module.
- Added Area and Bar taxa summary plots to a new script plot_taxa_summary.py. This script allows for writing of Pie Charts as well, thereby deprecating the make_pie_charts.py script.
- Added support for output of biplot coords to make_3d_plots script (SF feature req. 3124713).
- --stable_sort option enabled by default for uclust OTU pickers.
- Changed defaults for uclust and uclust_ref OTU pickers. The new parameters make both OTU pickers about 2-3x slower, but the resulting clusters are significantly better in terms of making the best choice of OTU for a given sequence, and ensuring that cluster seeds are less than 97% identical to one another. The default rep seq picking method was also changed to "first" from "most_abundant", which ensures that the seed sequence is chosen as the representative for a cluster. Abundance is instead taken into account at the OTU picking stage (as it has been for a while) by pre-sorting the sequences by abundance so the most abundant sequences are more likely to be seeds. In practice, with presorting by abundance, the same sequence is usually chosen as the representative when passing first or most_abundant as the rep seq picking method.
- Added support for generating inVUE plots in make_3d_plots.py.
- Changed the default tree type for UPGMA comparisons to the consensus tree, rather than the UPGMA tree based on the full OTU table.
- Disabled the check that jobs_to_start > 1 in a user's qiime_config before allowing them to start parallel jobs. This check was inconvenient in several settings (e.g., EC2 images when used with n3phele), and after some discussion we decided that it should be up to the user to understand how parallel QIIME should be configured before using it.
- Added the ability to pool primers in mapping files passed to check_id_map and split_libraries.py. Multiple primers are separated by commas and are detected automatically.
- Added sort_otu_table.py for sorting the sample IDs in an OTU table based on their values in a mapping file.
- Changed the method for p-value calculation in Procrustes analysis Monte Carlo in response to SF bug # 3189200.
- When computing jackknife support for sample clustering (e.g., UPGMA sample trees), Qiime can now compute a consensus tree from the jackknife replicates, in addition to the existing functionality of using the full dataset as the master tree and annotating that tree with jackknife support values. See jackknifed_beta_diversity.py --master_tree and consensus_tree.py.
- Added the ability to write out the flowgram file in process_sff.py, to define an output directory, and to convert Titanium reads to FLX length.
- SRA submission protocol updated to perform human screening with uclust_ref against 16S reference sequences, rather than cdhit/blast against reference sequences. This can be considerably faster, and reduces the complexity of the code by requiring users to have uclust installed for the human screen rather than cdhit and blast.
- Updated SRA protocol to allow users to skip the human screening step as this takes about 2/3 or more of the total analysis time, and is not relevant for non-human-derived samples (e.g., soil samples).
- Added the ability to pass --max_accepts, --max_rejects, and --stable_sort through to the uclust OTU pickers.
- Added a -r parameter to pick_rep_set.py to allow users to pass "preferred" representative sequences in a fasta file. This is useful, for example, if users have picked OTUs with uclust_ref and would like to use the reference sequences as their representatives, rather than sequences from their sequencing run.
- Renamed Qiime/scripts/jackknifed_upgma.py to Qiime/scripts/jackknifed_beta_diversity.py to reflect the addition of generating jackknifed 2d and 3d plots to this workflow script.
- Updated parallel_multiple_rarefactions.py, parallel_alpha_diversity.py, and parallel_beta_diversity.py to use the jobs_to_start value for better control over the number of parallel runs.
- The uclust_ref OTU picker now outputs an additional failures file listing the sequences that failed to cluster if the user passed --suppress_new_clusters. This makes parsing easier for downstream applications that need to handle these sequences separately. The failures list is no longer written to the log file (although the failures count still is).
- Added the filter_fasta.py script which allows users to build a fasta file from an existing fasta where specified sequences are either included or excluded from the new file. The sequences to keep or exclude can be specified by a variety of different inputs, for example as a list of sequence identifiers in a text file.
- Added parallel version of uclust_ref OTU picker.
- Added negative screen option to process_sra_submission.py -- this allows users to screen by discarding all sequences that match a reference set, while the (default) positive screen allows users to screen by retaining only sequences that match a reference set.
- Added options to split_libraries.py to enable the detection and removal of reverse primers from input sequences, and an option to record a filtered quality score output file that matches the bases found in the output seqs.fna file.
- Added the trflp_file_to_otu_table.py script, which allows users to create an OTU table analog from a terminal restriction fragment length polymorphism (T-RFLP) text file.
- Added min_aligned_length parameter to the BLAST OTU picker. By default, BLAST alignments now must cover at least 50% of the input sequence for OTU assignment to occur.
- Changed the default randomization strategy in the Procrustes Monte Carlo test from shuffling within coordinate vectors to shuffling the labels on the vectors themselves. This doesn't appear to affect clearly significant cases at all, but it is more conservative and therefore favors non-significance of results in borderline cases.
- Added ability to run beta diversity calculations in parallel at the single OTU table level to improve performance when computing diversity on very large collections of samples. This functionality is now hooked up to the beta_diversity_through_3d_plots.py workflow script, and includes the new -r parameter to beta_diversity.py which allows users to specify samples to compute diversity vectors for (rather than requiring that the full all-against-all diversity matrix is created).
- uclust-based analyses now retain the .uc files as these contain a lot of useful information that was previously being discarded.
- Improved handling of blank lines in parse_otu_table; these are now ignored. Other improvements were made to parse_otu_table to better support OTU tables coming from sources other than QIIME (such as MG-RAST).
- Allowed the -R option to be passed to ChimeraSlayer. Closes feature request 3007445.
- Added pairwise sample-to-sample Monte Carlo significance tests. These are frequently run via the UniFrac web interface; users hitting size limits on the web can now thrash their own hardware.
- Fixed a bug in make_rarefaction_plots where the table below the plots had column labels sorted by natsort, while the values in the table were sorted arbitrarily by dict keys. The plots themselves were fine.
- Added a Procrustes analysis/plotting tutorial.
- Added the ability to exclude OTU IDs when building an OTU table. This allows users to discard OTUs that were identified as chimeric. Accessible by passing --exclude_otus_fp to make_otu_table.py.
- Modified identify_chimeric_seqs.py to no longer require the reference database in unaligned format when using ChimeraSlayer.
- Added a tutorial document on applying chimera checking in QIIME.
- Added ability to pass -F T/F to parallel_blast to allow disabling of the low-complexity filtering in BLAST.
- Added a new script (shared_phylotypes.py) for computing shared OTUs between pairs of samples. Batch mode can be used in combination with dissimilarity_mtx_stats.py to calculate stats for a set of rarefied OTU tables.
- Added min_aligned_percent parameter to the BLAST OTU picker workflow, with the default set at 50%. An alignment must now cover at least 50% of a sequence for OTU assignment to occur.
- Added a script to draw rank abundance graphs (plot_rank_abundance_graph.py).
- Modified interface of make_distance_histograms so --html_output is now the default. A new parameter, --suppress_html_output, was added to produce the old behavior.
- Added script (quality_scores_plot.py) to plot quality score by position given a .qual file. This is useful with another new script (truncate_fasta_qual_files.py) to truncate fasta/qual files at the point where quality begins to decrease, and has been useful in controlling for quality issues on 454 Ti runs.
- Added a binary SFF parsing module from PyCogent, and removed the sfftools dependency from the workflow tests, process_sff, and other areas of QIIME.
- Added ACE calculation to alpha_diversity.py.
- Updated documentation on file formats used by Qiime.
- Added more extensive error checking in parse_mapping_file to replace some cryptic error messages that arose when scripts were passed malformed mapping files.
- Added capability to perform supervised classification of metadata categories using the Random Forests classifier. Outputs include a ranking of OTUs by discriminatory power, and the estimated probability of each metadata category for each sample. The latter may be useful for detecting potentially mislabeled samples.
- Additional field added to BLAST assign taxonomy output to indicate the best BLAST hit of the query sequence -- this is in response to Sourceforge feature request 2988407.
- Added presorting by abundance to the uclust OTU picker. The idea here is that more abundant sequences are better representatives when clustering, so they should come first in the file. Also added the ability to pass the optimal flag to uclust, which should further improve uclust-picked OTUs at some cost in performance.
- Added confidence interval display (jackknifed PCoA) in make_2d_plots and make_3d_plots. After running multiple_rarefactions, beta_diversity, and principal_coordinates on an OTU table, the user can supply the resulting directory to both of these scripts. Currently the user can compute either the interquartile range (IQR) or the standard deviation (sdev) of the principal coordinate files, and ellipses are drawn around each point to represent the confidence interval in each PC. The user can also adjust the opacity of the ellipses.
- Updated the display for rarefaction plots so that the legend does not overlap the plots, and fixed the display of the rarefaction average table in the webpage. The user can now switch between plots with different metrics and categories using the drop-down menus, and can also display the samples that contribute to the average for each group. Below the plots, a table shows the rarefaction average data with all the metric values.
- Merged make_rarefaction_averages into the make_rarefaction_plots script, and removed the --rarefaction_ave and --ymax options, since these values are now determined by the script. Also restructured the output directory format and combined all metric data into one HTML file.
- Added the uclust_ref OTU picker, which uses uclust to pick OTUs against a reference collection. Sequences that are within the similarity threshold of a reference sequence will cluster into an OTU defined by that reference sequence, and sequences that are outside the similarity threshold of every reference sequence will form new OTUs.
- The interface for exclude_seqs_by_blast.py has changed. The -M and -W options are now lowercase to avoid conflicts with parallel scripts. Users can avoid formatting the database by passing --no_format_db. By default, the files created by formatdb are now cleaned up; users can choose not to clean up these files using the --no_clean option. Output file extensions have changed from ".excluded" to ".matching" and from ".screened" to ".non-matching", so they are clear regardless of whether the sequences matching or not matching the database are to be excluded. A check was also added for user-supplied BLAST databases in exclude_seqs_by_blast.py when run with --no_format_db: if the required files do not exist, a parser error is thrown.
- Added ability to chimera check sequences with ChimeraSlayer. See identify_chimeric_seqs.py for details.
- Added workflow script for second-stage SRA submission, process_sra_submission.py. The SRA submission tutorial has been extensively updated to reflect the use of this new script.
- Added the ability to supply a tree and sort the heatmap based on the supplied tree.
- Added the ability to handle variable length barcodes, variable length primers, and no primers with split_libraries.py. Error correction is not supported for barcode types other than golay_12 and hamming_8. split_libraries.py also now throws an error if the barcode length passed on the command line does not match the barcode length in the mapping file.
- Updated the print_qiime_config.py script to print useful debugging information about the QIIME environment.
- Added high-level logging functionality to the workflow scripts.
- Added RUN_ALIAS field to SRA experiment.txt spreadsheet in make_sra_submission.xml.
- uclust made default OTU picker (instead of cdhit).
- uclust made default pairwise aligner for PyNAST (instead of BLAST).
- Minimum PyNAST version requirement upgraded to PyNAST 1.1.
- Minimum PyCogent version requirement upgraded to PyCogent 1.4.1.
- tree_compare now can compare trees where some tips aren't present in all trees.
- --small_included option removed from rarefaction scripts.
- Added "remove outliers" functionality to filter_alignment.py. After removing lanemasked columns and gap columns, -r will remove outlying sequences, preventing odd spikes in phylogenetic trees when some sequences are poorly aligned.
- Absent samples are now included in the output of UniFrac-like metrics: the distance between two absent samples is 0, and the distance between an absent and a present sample is 1.
- make_phylogeny now performs proper midpoint rooting (still off by default).
- Consolidated parsing functionality to qiime.parse.
- Removed the dependence on several qiime_config values; users should run Qiime/scripts/print_qiime_config.py -t to identify parameter settings that are outdated.
- Added an example 'cluster_jobs' -- start_parallel_jobs.py -- script which will give users in multi-core or multi-proc environments very easy access to parallel QIIME. This also adds parallel support to the QIIME virtual box.
- Modified the default value of jobs_to_start to be 1 -- because of the addition of the example cluster_jobs script, the default value of 24 no longer makes sense (if it ever really did...). Because the new script is built for multi-core/multi-proc environments, 24 is too high for most cases. Users will need to modify this value from 1 (corresponding to no parallelization) to a value that makes sense for their environment (e.g., 2 for dual core, or 24 to get the previous default).
- Added colors module and tests to consolidate and standardize coloring code in QIIME - also updated the graphics scripts to use the colors module.
- Added ability for user to specify the background colors of plots in prefs files or on the command line.
- Tweaked SRA submission routines in accordance with accepted format from JCVI's survey of multiple body sites.
- Fixed SF bug #2971581, an issue where the path to QIIME's scripts directory was not determined correctly when QIIME was installed using setup.py. qiime_config now contains a key (empty by default) for the qiime_scripts_dir. If this is not specified by the user, it is determined from the QIIME project dir.
- Renamed scripts/make_3d_prefs_file.py as scripts/make_prefs_file.py to reflect that the prefs files are now used by other scripts.
- Changed the behavior of the color-by option to make_3d_plots, make_2d_plots, and make_rarefaction_plots, so if no -b option or prefs file is provided, the scripts default to coloring by all values. Consequently, mapping files are now required for these scripts.
- Added a split_libraries_illumina.py script to handle processing of Illumina GAIIx data.
- Added an additional rarefaction script for clarity. There are now 3 scripts to handle rarefaction: single_rarefaction rarefies one input OTU table into one output table and allows manual naming; multiple_rarefactions makes auto-named rarefied OTU tables at a range of depths; and multiple_rarefactions_even_depth.py makes auto-named tables all at the same depth.
- Added workflow unit tests (with timeout functionality).
- Added default alpha and beta diversity metrics to qiime_parameters.txt.
- Integrated Denoiser (Jens Reeder's 454 denoiser) wrappers, and tied this into the workflow scripts.
- Added biplot functionality. make_3d_plots now takes the -t option (off by default) to include taxa on the pcoa plot.
- Updated the QIIME tutorial to use the workflow scripts where possible. The tutorial data set was also added to the svn repository.
- Reorganization and expansion of the documentation throughout.
- Added sanity checks to print_qiime_config.py. This will now allow users to evaluate their environment, and should help with debugging.
- Added new field to qiime_config (temp_dir) which will be used to specify where temp files should be written. Currently this is only used by the workflow tests, and is intended to allow users to specify something other than /tmp for cases when /tmp is not shared between all nodes that might be working on a job. This will eventually be used for all temp dir creation.
- Added ability to make summary plots for a directory of coordinate files in make_3d_plots and make_2d_plots. The summary plot adds ellipsoidal confidence intervals around each point in the plot.
- Removed outdated documentation PDFs, along with references to those PDFs in the README and INSTALL documents.
- Addition of a uclust-based OTU picker.
- Transfer of all command line interfaces from Qiime/qiime to Qiime/scripts -- this was an important change as it allowed us to get away from the previously one-to-one relationship between files in our library code (in Qiime/qiime) and the command line interfaces.
- Standardized command line interfaces for all code in Qiime/scripts by using a new function, Qiime.qiime.util.parse_command_line_parameters to handle the command line interfaces.
- Moved to Sphinx for documentation, and developed a framework for extracting script documentation directly from the scripts to populate the web documentation.
- Bug fixes throughout the code base, including but not limited to fixes for SourceForge tickets: 2957503, 2953765, 2945548, 2942443, 2941925, 2941926, 2941717, 2941396, 2939588, 2939575, 2935939.
- Updated the all_tests.py script to perform a minimal test of the scripts (getting help text works as expected), and to alert users if unit tests may be failing due to missing external applications, in which case they may not be critical.
- Created a directory for pycogent_backports, where we can temporarily store new code that has been added to PyCogent, but which has not been added to a PyCogent release yet. This will allow us to keep QIIME's dependencies on the latest PyCogent version despite rapid and frequently related changes in both packages.
- Added code for performing Procrustes analyses of coordinate matrices, and graphing the results of those analyses in 3d plots (see transform_coordinate_matrices.py and compare_3d_plots.py).
- Performance enhancements related to golay barcode decoding.
- Added setup.py to help with installation of QIIME - this will put the library code in site-packages, and the scripts in /usr/local/bin (both locations can be changed via command line options to setup.py).
- Created a support_files directory to hold jar, js, png, and other required files.
- Added Pearson correlation to list of options in otu_category_significance.py.
- Workflow scripts added for running large repetitive processes with a single command rather than multiple commands -- in scripts, see beta_diversity_through_3d_plots.py, pick_otus_through_otu_table.py, alpha_rarefaction.py, jackknifed_upgma.py.
- Initial release