NEW: peds-simulation action #58
Conversation
@cherman2, this is looking good. I have a bunch of comments, but I don't think any should take too long. Maybe when I'm back from my trip and you've had a chance to work on this, we can sit together for a final review and get it all merged? I bet we can do that in a couple of hours when the time comes.
q2_fmt/_peds.py
Outdated
        drop_incomplete_timepoint=drop_incomplete_timepoint)
    real_temp = peds["measure"]
else:
    shifted_list = recipient[reference_column].sample(frac=1).to_list()
I recommend calling this "shuffled" list, as "shifted" would imply something different. The operation we're carrying out here is like shuffling a deck of cards, not (for example) shifting every item's index in a list by adding one to it.
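The card-deck analogy can be made concrete with a small illustration; the `refs` series here is a hypothetical stand-in for the reference column:

```python
import pandas as pd

refs = pd.Series(["d1", "d2", "d3", "d4"])

# Shuffling: a random reordering, like shuffling a deck of cards.
shuffled = refs.sample(frac=1).to_list()

# Shifting: every item moves by a fixed offset, which is what the
# name "shifted_list" would imply.
shifted = refs.to_list()[1:] + refs.to_list()[:1]
print(shifted)  # ['d2', 'd3', 'd4', 'd1']
```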
And would it work to bypass creating this new list altogether? For example, this SO post shows how to do this operation in place. It looks like you don't use the metadata after this, so you could keep shuffling it over and over.
Re: not creating the list. Those examples don't quite do what I am doing, because those shuffles keep the ID and value tied together, so they are really just shuffling the rows in the dataframe.
The list allows me to keep the sample IDs (index) the same and shuffle the reference list. I will look into a simpler way to do this.
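A minimal sketch of shuffling only the reference values while keeping the sample IDs (the index) fixed; the `recipient` frame and `reference` column here are hypothetical stand-ins for the real metadata:

```python
import numpy as np
import pandas as pd

# Hypothetical recipient metadata: sample IDs as the index, donor
# references as a column (standing in for `reference_column`).
recipient = pd.DataFrame(
    {"reference": ["d1", "d2", "d3"]},
    index=["sampleA", "sampleB", "sampleC"],
)

# Assigning a permuted *array* (not a Series, which pandas would
# realign on index) shuffles only the values; the IDs stay in place.
recipient["reference"] = np.random.permutation(
    recipient["reference"].to_numpy())
print(recipient.index.tolist())  # ['sampleA', 'sampleB', 'sampleC']
```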
q2_fmt/tests/test_engraftment.py
Outdated
Fs1 = feature_peds_df.set_index("id").at['Feature 1 __',
                                         'subject']
Fs2 = feature_peds_df.set_index("id").at['Feature 2',
                                         'subject']
self.assertEqual("1", Fs1)
self.assertEqual("2", Fs2)


class TestBoot(TestBase):
Are there relevant boundary conditions that should be tested, such as a single donor, a single recipient, PEDS=1, PEDS=0, ...?
                          bootstrap_replicates=999)
real_median = stats["A:measure"].values
fake_median = stats["B:measure"].values
self.assertGreater(real_median, fake_median)
Do these tests pass all the time, or do you see intermittent failures? Ideally failures won't ever occur, but they should be really rare if there are intermittent failures.
I have not experienced intermittent failures. Did you? I ran the tests 20 times and all of them passed.
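If intermittent failures ever do appear, one common mitigation (not part of this PR, just a general testing pattern) is pinning the randomness in the test; a sketch with made-up values:

```python
import pandas as pd

values = pd.Series([0.9, 0.8, 0.85, 0.1, 0.2])

# Passing random_state fixes the shuffle, so the test is deterministic
# and can never fail intermittently on an unlucky permutation.
shuffled = values.sample(frac=1, random_state=42)
again = values.sample(frac=1, random_state=42)
print(shuffled.equals(again))  # True
```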
I am still working on this PR, I just wanted to jot down some notes in the interim
'subject_column': T_subject,
'filter_missing_references': Bool,
'drop_incomplete_subjects': Bool,
'drop_incomplete_timepoint': List[Str],
The `List[Str]` type doesn't align with the description above - reminder to sort that out.
q2_fmt/tests/test_engraftment.py
Outdated
np.testing.assert_array_equal(recip_mask, exp_r_mask)

def test_simulate_uniform_distro(self):
    mismatch_peds = [0, 0, 0, 0, 0, 0]
I recommend adding some diversity to this test. For example, make mismatch_peds = [1, 2, 3], and then confirm that you see each of those values at least once in the results (intermittent failure is possible, so it's worth mentioning that in a comment, noting that it should be extremely rare).
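The suggested assertion can be sketched as follows (the draw count is a hypothetical choice; with 1000 uniform draws from three values, the chance of any value never appearing is about 3 * (2/3) ** 1000, effectively zero):

```python
import numpy as np

mismatch_peds = [1, 2, 3]
rng = np.random.default_rng()
simulated = rng.choice(mismatch_peds, size=1000)

# Each input value should appear at least once. Intermittent failure
# is theoretically possible but extremely rare (see note above).
for value in mismatch_peds:
    assert value in simulated
```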
Co-authored-by: Greg Caporaso <[email protected]>
Back to you, @gregcaporaso. Thanks for all the feedback!
fixing merge conflicts from #88
q2_fmt/_peds.py
Outdated
# Transforming to series for easy series math in _per_subject_stats()
peds_iters = pd.Series(peds_iters)
return peds_iters


def _per_subject_stats(mismatched_peds, actual_temp,
                       iterations, mismatched_pairs_n):

def peds_sim_stats(value, peds_iters, num_iterations):
Instead of passing in `num_iterations`, compute it internally here as `len(peds_iters)`, to avoid a provided `num_iterations` value that doesn't match the number of values in `peds_iters`.
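A sketch of the suggested change; the empirical p-value formula below is illustrative only, not necessarily the statistic this PR computes:

```python
import pandas as pd

def peds_sim_stats(value, peds_iters):
    # Derived from the data itself, so it can never disagree with the
    # number of simulated values actually provided.
    num_iterations = len(peds_iters)
    # Illustrative one-sided empirical p-value: the fraction of
    # simulated values at least as extreme as the observed one.
    count_gte = int((pd.Series(peds_iters) >= value).sum())
    return (count_gte + 1) / (num_iterations + 1)

print(peds_sim_stats(0.9, [0.1, 0.5, 0.95, 0.2]))  # 0.4
```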
I have one little comment that should be addressed before merge, but after that this should be good to go.
@cherman2, just as a reminder, you should write up notes from our discussion about how dependence between samples and comparisons of baseline samples to (their actual) donor samples can both make the test overly conservative. This will be good to document along with instructions on how the user could address this if they want to (but it requires large recipient numbers, which is why we're not doing it by default).
There may be slight dependence issues in this method. @ebolyen, I would love your feedback on this.
We have thought of a couple solutions:
Evan and I discussed offline and decided that this is a limitation of the method, and we will document it accordingly. But as discussed above, these limitations make the method more conservative, so these issues are less concerning.
Just two test edits and this is good to go.
Co-authored-by: Greg Caporaso <[email protected]>
This PR adds peds-bootstrap functionality.
Bootstrapping PEDS shuffles the donors and tests whether the distribution of PEDS values for the real donors is higher than the distribution for the fake (shuffled) donors.
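The comparison just described resembles a permutation test; a minimal self-contained sketch with made-up PEDS values (the actual implementation recomputes PEDS after shuffling the donor assignments, rather than drawing from a fixed pool as done here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up PEDS values for recipients paired with their real donors,
# and for mismatched ("fake") donor-recipient pairs.
real_peds = np.array([0.8, 0.9, 0.7, 0.85])
mismatched_peds = np.array([0.1, 0.0, 0.2, 0.15])

# Null distribution: medians of resampled fake-donor PEDS values.
fake_medians = [np.median(rng.choice(mismatched_peds, size=real_peds.size))
                for _ in range(999)]

real_median = np.median(real_peds)
print(real_median > max(fake_medians))  # True
```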
TODO: