Fix tiered subsampling example

I started adjusting the sample sizes in "Adjust multiple augur filter section for weighted sampling" (c6084f3) but did not properly follow through with the rest of the section.

Changes:

- 100 → 200 sequences from Washington state
- 50 → 100 sequences from the rest of the United States
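
The corrected approximation in the first hunk can be checked with a quick calculation. This is a minimal Python sketch that assumes the figures shown in the diff (100 sequences spread over 49 other states). The variable names are illustrative only and are not part of augur.

```python
# Figures taken from the diff; names are hypothetical, not augur identifiers.
n_other_sequences = 100  # target for the rest of the United States
n_other_states = 49      # states other than Washington, as used in the guide

# Expected number of sequences per remaining state under uniform sampling.
per_state = n_other_sequences / n_other_states

# The previous value in the guide matches the old target of 50 sequences.
old_per_state = 50 / n_other_states

# Tiered targets after this change.
targets = {"state": 200, "country": 100}
total = sum(targets.values())

print(f"per state: {per_state:.2f}")
print(f"old per state: {old_per_state:.2f}")
print(f"combined target: {total}")
```

The old value of 1.02 is what 50 sequences over 49 states gives, which is why it no longer matched the surrounding text once the target became 100.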
nextstrain · Aug 28, 2024 · 8006eb9
1 parent 21e038d
Showing 1 changed file with 11 additions and 11 deletions.
diff --git a/src/guides/bioinformatics/filtering-and-subsampling.rst b/src/guides/bioinformatics/filtering-and-subsampling.rst
@@ -350,7 +350,7 @@ This approach has some caveats:
 
      {n_{\text{other sequences}}} * \frac{1}{{n_{\text{other states}}}}
      =                        100 * \frac{1}{49}
-     \approx                 1.02
+     \approx                 2.04
 
 2. Achieving a full *100 sequences from the rest of the United States* requires
    at least 2 sequences from each of the remaining states. This may not be
@@ -366,8 +366,8 @@ An alternative approach is to decompose this into multiple schemes, each handled
 by a single call to ``augur filter``. Additionally, there is an extra step to
 combine the intermediate samples.
 
-   1. Sample 100 sequences from Washington state.
-   2. Sample 50 sequences from the rest of the United States.
+   1. Sample 200 sequences from Washington state.
+   2. Sample 100 sequences from the rest of the United States.
    3. Combine the samples.
 
 Calling ``augur filter`` multiple times
@@ -378,20 +378,20 @@ well for ad-hoc analyses.
 
 .. code-block:: bash
 
-   # 1. Sample 100 sequences from Washington state
+   # 1. Sample 200 sequences from Washington state
    augur filter \
      --sequences sequences.fasta \
      --metadata metadata.tsv \
      --query "state == 'WA'" \
-     --subsample-max-sequences 100 \
+     --subsample-max-sequences 200 \
      --output-strains sample_strains_state.txt
  
-   # 2. Sample 50 sequences from the rest of the United States
+   # 2. Sample 100 sequences from the rest of the United States
    augur filter \
      --sequences sequences.fasta \
      --metadata metadata.tsv \
      --query "state != 'WA' & country == 'USA'" \
-     --subsample-max-sequences 50 \
+     --subsample-max-sequences 100 \
      --output-strains sample_strains_country.txt
  
    # 3. Combine using augur filter
@@ -428,8 +428,8 @@ system can be used. The following examples use `Snakemake`_.
    .. code-block:: yaml
 
       subsampling:
-        state: --query "state == 'WA'" --subsample-max-sequences 100
-        country: --query "state != 'WA' & country == 'USA'" --subsample-max-sequences 50
+        state: --query "state == 'WA'" --subsample-max-sequences 200
+        country: --query "state != 'WA' & country == 'USA'" --subsample-max-sequences 100
 
 2. Add two rules in a `Snakefile`_. If you are building a standard Nextstrain
    workflow, the output files should be used as input to sequence alignment. See
@@ -438,8 +438,8 @@ system can be used. The following examples use `Snakemake`_.
 
    .. code-block:: python
 
-      # 1. Sample 100 sequences from Washington state
-      # 2. Sample 50 sequences from the rest of the United States
+      # 1. Sample 200 sequences from Washington state
+      # 2. Sample 100 sequences from the rest of the United States
       rule intermediate_sample:
           input:
               metadata = "data/metadata.tsv",