Commit

Add example for --subsample-max-sequences
victorlin committed Aug 15, 2024
1 parent 2f0b736 commit b379bb1
Showing 1 changed file with 17 additions and 0 deletions.
17 changes: 17 additions & 0 deletions src/guides/bioinformatics/filtering-and-subsampling.rst
@@ -69,6 +69,23 @@ sequence per month from each country:
        --output-sequences subsampled_sequences.fasta \
        --output-metadata subsampled_metadata.tsv

An alternative to ``--sequences-per-group`` is ``--subsample-max-sequences``.
This is useful if you don't know how many groups the metadata will be
partitioned into but you have a target sample size. For example, to target
100 total sequences:

.. code-block:: bash

    augur filter \
        --sequences data/sequences.fasta \
        --metadata data/metadata.tsv \
        --min-date 2012 \
        --exclude exclude.txt \
        --group-by country year month \
        --subsample-max-sequences 100 \
        --output-sequences subsampled_sequences.fasta \
        --output-metadata subsampled_metadata.tsv

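
Because ``--subsample-max-sequences`` sets a target rather than a per-group
count, it can be worth confirming how many sequences actually ended up in the
output (for instance, fewer than 100 if not enough sequences pass the other
filters). The following is a minimal sketch, not part of Augur: the helper
names ``count_sequences`` and ``check_sample_size`` are hypothetical, and it
assumes the output is an uncompressed FASTA file where each record starts with
a ``>`` header line.

```shell
# Hypothetical helper: count FASTA records by counting header lines ('>').
# `|| true` keeps the exit status zero when grep finds no headers (count 0).
count_sequences() {
    grep -c '^>' "$1" || true
}

# Hypothetical helper: report the number of sequences in a FASTA file and
# return non-zero if it exceeds the given target.
check_sample_size() {
    fasta="$1"
    target="$2"
    n=$(count_sequences "$fasta")
    echo "$fasta contains $n sequences (target: $target)"
    [ "$n" -le "$target" ]
}

# Usage, assuming the augur filter command above has already been run.
if [ -f subsampled_sequences.fasta ]; then
    check_sample_size subsampled_sequences.fasta 100 \
        || echo "warning: more sequences than the target"
fi
```

The check only counts header lines, so it says nothing about how the sequences
are distributed across the ``--group-by`` columns.
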
Subsampling using multiple ``augur filter`` commands
====================================================
