NEW: peds-simulation action #58

Merged: 55 commits merged into qiime2:dev, Aug 14, 2024
Conversation

@cherman2 (Contributor) commented May 11, 2023

This PR adds peds-bootstrap functionality.

Bootstrapping PEDS shuffles the donors and tests whether the distribution of PEDS values from the real donors is higher than the distribution from the shuffled (fake) donors. A sketch of this idea follows the TODO list below.

TODO:

  • Implement an output for peds-bootstrap
  • Implement a visualization for peds-bootstrap
  • Add unit tests
  • Add a usage example
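
A minimal sketch of the shuffle-and-compare idea described above, assuming a compute_peds callable that returns a summary statistic (e.g. a median PEDS value) for a set of recipient/donor pairings; none of these names come from the plugin itself:

```python
import numpy as np


def shuffle_test(recipients, donors, compute_peds, replicates=999, seed=42):
    rng = np.random.default_rng(seed)
    # Statistic for the real (matched) recipient/donor pairings.
    real = compute_peds(recipients, donors)
    mismatched = np.empty(replicates)
    for i in range(replicates):
        # Shuffle the donor assignments to create "fake" pairings and
        # build up the null distribution.
        mismatched[i] = compute_peds(recipients, rng.permutation(donors))
    # One-sided p-value: how often do fake donors score at least as
    # high as the real donors? (+1 smoothing avoids p == 0.)
    p = (np.sum(mismatched >= real) + 1) / (replicates + 1)
    return real, p
```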

@cherman2 marked this pull request as draft May 11, 2023 23:22
@cherman2 marked this pull request as ready for review May 14, 2024 18:40
@cherman2 added this to the Manuscript Prep milestone May 23, 2024
@gregcaporaso (Member) left a comment:

@cherman2, this is looking good. I have a bunch of comments, but I don't think any should take too long. Maybe when I'm back from my trip and you've had a chance to work on this, we can sit together for a final review and get it all merged? I bet we can do that in a couple of hours when the time comes.

Resolved review threads on q2_fmt/plugin_setup.py (two) and q2_fmt/_peds.py (one, all outdated).

q2_fmt/_peds.py (outdated):
```python
            drop_incomplete_timepoint=drop_incomplete_timepoint)
        real_temp = peds["measure"]
    else:
        shifted_list = recipient[reference_column].sample(frac=1).to_list()
```
@gregcaporaso (Member):

I recommend calling this "shuffled" list, as "shifted" would imply something different. The operation we're carrying out here is like shuffling a deck of cards, not (for example) shifting every item's index in a list by adding one to it.

@gregcaporaso (Member):

And would it work to bypass creating this new list altogether? For example, this SO post shows how to do this operation in place. It looks like you don't use the metadata after this, so you could keep shuffling it over and over.

@cherman2 (Contributor, Author):

Re: not creating the list. Those examples don't quite do what I am doing, because those shuffled lists keep the ID and value tied together, so they are really just shuffling the rows in the DataFrame.

The list allows me to keep the sample IDs (the index) the same while shuffling the reference list. I will look into a simpler way to do this.
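
A toy illustration of the distinction (the column and index names are made up): `df.sample(frac=1)` shuffles whole rows, keeping each sample ID tied to its reference, while writing a shuffled list back into one column keeps the index fixed and breaks the ID-value pairing, which is what the mismatched-PEDS simulation needs:

```python
import pandas as pd

df = pd.DataFrame({'reference': ['donor1', 'donor2', 'donor3']},
                  index=['sample1', 'sample2', 'sample3'])

# Row shuffle: IDs travel with their values, so pairings survive.
rows_shuffled = df.sample(frac=1, random_state=0)

# Column shuffle: the index stays put and only the values move.
df['reference'] = df['reference'].sample(frac=1, random_state=0).to_list()
```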

Resolved review threads on q2_fmt/_peds.py (three) and q2_fmt/tests/test_engraftment.py (one, all outdated).

q2_fmt/tests/test_engraftment.py:
```python
        Fs1 = feature_peds_df.set_index("id").at['Feature 1 __',
                                                 'subject']
        Fs2 = feature_peds_df.set_index("id").at['Feature 2',
                                                 'subject']
        self.assertEqual("1", Fs1)
        self.assertEqual("2", Fs2)


class TestBoot(TestBase):
```
@gregcaporaso (Member):

Are there relevant boundary conditions that should be tested, such as a single donor, a single recipient, PEDS=1, PEDS=0, ...?
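
For illustration only, boundary tests might look like the sketch below, written against a toy stand-in for the PEDS computation (the plugin's real API isn't shown in this thread):

```python
import unittest

import numpy as np


def toy_peds(donor_features, recipient_features):
    # Stand-in: fraction of donor features also present in the recipient.
    donor = np.asarray(donor_features, dtype=bool)
    recip = np.asarray(recipient_features, dtype=bool)
    return (donor & recip).sum() / donor.sum()


class TestBoundaries(unittest.TestCase):
    def test_peds_is_one_when_every_donor_feature_engrafts(self):
        self.assertEqual(toy_peds([1, 1, 0], [1, 1, 1]), 1.0)

    def test_peds_is_zero_when_no_donor_feature_engrafts(self):
        self.assertEqual(toy_peds([1, 1, 0], [0, 0, 1]), 0.0)
```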

```python
            bootstrap_replicates=999)
        real_median = stats["A:measure"].values
        fake_median = stats["B:measure"].values
        self.assertGreater(real_median, fake_median)
```
@gregcaporaso (Member):

Do these tests pass all the time, or do you see intermittent failures? Ideally failures won't ever occur; if there are intermittent failures, they should at least be really rare.

@cherman2 (Contributor, Author) commented Jul 11, 2024:

I have not experienced intermittent failures. Did you? I ran the tests 20 times and all of them passed.
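
(As an aside, and not necessarily something this PR needs: fixing the generator's seed is a common way to make a stochastic test fully deterministic.)

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed: every test run sees
shuffled = rng.permutation(10)   # exactly the same shuffles
```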

@cherman2 (Contributor, Author) commented Jul 8, 2024

I am still working on this PR; I just wanted to jot down some notes in the interim:

  • I am going to change the name of this to peds-simulation
  • I am going to replace bootstrap-replicates with replicates
  • I am going to add a method description explaining that this is a Monte Carlo simulation and that it can serve as a rough replacement for a baseline comparison

```python
        'subject_column': T_subject,
        'filter_missing_references': Bool,
        'drop_incomplete_subjects': Bool,
        'drop_incomplete_timepoint': List[Str],
```
@gregcaporaso (Member):

The List[Str] type doesn't align with the description above; reminder to sort that out.

```python
        np.testing.assert_array_equal(recip_mask, exp_r_mask)

    def test_simulate_uniform_distro(self):
        mismatch_peds = [0, 0, 0, 0, 0, 0]
```
@gregcaporaso (Member):

I recommend adding some diversity to this test. For example, make mismatch_peds = [1, 2, 3], and then confirm that you see each of those values at least once in the results (intermittent failure is possible, so it's worth mentioning that in a comment, noting that it should be extremely rare). A sketch of that test follows below.
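
A sketch of that suggested test; rng.choice here stands in for the plugin's actual simulation of mismatched PEDS values:

```python
import unittest

import numpy as np


class TestSimulationDiversity(unittest.TestCase):
    def test_simulation_sees_every_value(self):
        mismatch_peds = [1, 2, 3]
        rng = np.random.default_rng(0)
        draws = rng.choice(mismatch_peds, size=999)
        for value in mismatch_peds:
            # Intermittent failure is possible in principle (each value
            # is absent with probability (2/3)**999) but extremely rare.
            self.assertIn(value, draws)
```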

@cherman2 assigned cherman2 and unassigned gregcaporaso Jul 31, 2024
@cherman2 (Contributor, Author) commented Aug 1, 2024

Back to you, @gregcaporaso

Thanks for all the feedback!

@cherman2 assigned gregcaporaso and unassigned cherman2 Aug 1, 2024
@cherman2 (Contributor, Author) commented Aug 8, 2024

Fixing merge conflicts from #88.

q2_fmt/_peds.py (outdated):

```python
    # Transforming to a Series for easy series math in _per_subject_stats()
    peds_iters = pd.Series(peds_iters)
    return peds_iters


def _per_subject_stats(mismatched_peds, actual_temp,
                       iterations, mismatched_pairs_n):
    def peds_sim_stats(value, peds_iters, num_iterations):
```
@gregcaporaso (Member):

Instead of passing in num_iterations, compute it internally here as len(peds_iters), to avoid a caller-provided num_iterations that doesn't match the number of values in peds_iters.
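
A sketch of that suggestion; the p-value-style statistic in the body is an assumption for illustration, not the PR's actual math:

```python
import numpy as np


def peds_sim_stats(value, peds_iters):
    # Derive the count from the data itself so it can never disagree
    # with the number of simulated values.
    peds_iters = np.asarray(peds_iters)
    num_iterations = len(peds_iters)
    # Illustrative one-sided p-value: fraction of simulated values at
    # least as large as the observed one, with +1 smoothing.
    count_ge = int(np.sum(peds_iters >= value))
    return (count_ge + 1) / (num_iterations + 1)
```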

@gregcaporaso (Member) left a comment:

I have one little comment that should be addressed before merge, but after that this should be good to go.

@cherman2, just as a reminder, you should write up notes from our discussion about how dependence between samples and comparisons of baseline samples to (their actual) donor samples can both make the test overly conservative. This will be good to document along with instructions on how the user could address this if they want to (but it requires large recipient numbers, which is why we're not doing it by default).

@cherman2 (Contributor, Author) commented Aug 9, 2024

There may be slight dependence issues in this method. @ebolyen, I would love your feedback on this.
From my understanding, dependence issues will make the test more conservative, which is less concerning (I think).
This data is longitudinal, and I think there are two issues here:

  1. I am not sure if it's an issue to have a sample from the same subject in the mismatched-PEDS values. I think this is similar to the fact that a real sample-donor pair can be compared to the same sample with a fake donor in mismatched PEDS, e.g. sample1-donor1 vs. sample1-donor2. This may be an issue if the user has maintenance donors that differ between subject samples, like: subject1sample1 has donor1, subject1sample2 has donor2. subject1sample2 could be randomized to have donor1 in mismatched PEDS and could have a relatively high overlap, because donor1 did donate to subject1, just not at that sampling.

  2. Baseline timepoints are currently included as real PEDS values but are expected to have low PEDS values because no transfer has occurred. This would be a negative control and might identify whether the selected donor has a microbiome composition similar to their recipient's (some studies are interested in that), but it could make my global p-value less significant than it should be. However, most studies should have enough post-FMT recipients to balance this out.

We have thought of a couple of solutions:

  1. This could be a per-timepoint method, but that really lowers our comparison N.
  2. We can document these possible issues that would cause conservative test statistics and let users decide how to filter (like filtering out their baseline or filtering to one timepoint); see the sketch after this list.
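
A hypothetical illustration of the user-side filtering in option 2; the column name and the convention that timepoint 0 is baseline are assumptions, not the plugin's actual metadata schema:

```python
import pandas as pd

md = pd.DataFrame(
    {'timepoint': [0, 1, 2, 0, 1, 2]},
    index=pd.Index(['s1', 's2', 's3', 's4', 's5', 's6'], name='sample-id'))

post_fmt = md[md['timepoint'] > 0]    # drop baseline (pre-FMT) samples
single_tp = md[md['timepoint'] == 1]  # or restrict to one timepoint
```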

@cherman2 (Contributor, Author) commented Aug 14, 2024

Evan and I discussed offline and decided that this is a limitation of the method, and we will document it accordingly. But, as discussed above, these limitations make the method more conservative, so these issues are less concerning.

@gregcaporaso (Member) left a comment:

Just two test edits and this is good to go.

Resolved review threads on q2_fmt/plugin_setup.py (two, outdated).
cherman2 and others added 2 commits August 14, 2024 14:12
Co-authored-by: Greg Caporaso <[email protected]>
Co-authored-by: Greg Caporaso <[email protected]>
@gregcaporaso merged commit 0335d7b into qiime2:dev Aug 14, 2024
4 checks passed
@lizgehret assigned cherman2 and unassigned gregcaporaso Aug 15, 2024
@cherman2 removed their assignment Sep 5, 2024