Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Assigning transcript (points) to cells #132

Open
pakiessling opened this issue Sep 29, 2024 · 8 comments
Open

[Feature] Assigning transcript (points) to cells #132

pakiessling opened this issue Sep 29, 2024 · 8 comments

Comments

@pakiessling
Copy link
Contributor

pakiessling commented Sep 29, 2024

Hi Quentin,

I saw in #87 that you were making the choice to not update the points dataframe with the newly assigned cell id.
I can see the value in not modifying the original data when it comes to something like the image, however I do not really see what it adds in the case of the cell assignment. So far I never wanted to go back to the original assignment (or If I would like to I still have the sopa input) but I really need the point-to-polygon assignment for analysis I want to do.
Maybe just adding an additional column with the new cell_id would be a compromise?

@quentinblampey
Copy link
Collaborator

Hi @pakiessling,

Have you tried this _map_transcript_to_cell function? It should do exactly what you want.

If that works and if it's useful, then I could make it more visible in the docs!

@pakiessling
Copy link
Contributor Author

Hi, yes I ended up using this very fast & smooth.

It is very useful for things like subcellular analysis.

Would be cool to have this more visible!

@quentinblampey
Copy link
Collaborator

Nice!
Okay, I'll make it more visible!

@pakiessling
Copy link
Contributor Author

pakiessling commented Oct 14, 2024

@quentinblampey I actually have a question about _map_transcript_to_cell
I used it lie this:

_map_transcript_to_cell(
    sdata,
    "cellpose_boundaries_row_nr",
    sdata.points["my_points],
    "cellpose_boundaries",
)

I assumed that the values it added were the row numbers of the geopandas dataframe.
But:

sdata.points["my_points].compute().cellpose_boundaries_row_nr.value_counts()
cellpose_boundaries_row_nr
0        1331235
70150       3643
87365       3521
47877       3453
71058       3285
          ...   
92583         10
29033         10
65535         10
57416         10
44450         10

This doesnt match the values in the table:

sc.pp.calculate_qc_metrics(sdata.tables["table"],inplace=True)
sdata.tables["table"].obs.iloc[0]
total_counts                                        141
sdata.tables["table"].obs.iloc[70150]
total_counts                                       1048

Do you know what could be happening there?
This is fresh after Sopa Snakemake pipeline has been run. So everything should be correctly aligned?

@quentinblampey
Copy link
Collaborator

The IDs are shifted by 1 because the ID 0 is already used for transcripts that are not assigned to any cell.
Can you look at the counts of the cell at iloc 70151 instead?

I would prefer to use NA for the transcripts that are not assigned, this way we would not need to shift the indices. But this is the expected Baysor input. I can add an argument to choose how we handle the "non-assigned transcripts", what do you think?

@pakiessling
Copy link
Contributor Author

pakiessling commented Oct 14, 2024

Ok I understand now.

In my data I actually have to substract one from the added number.
70149 corresponds to the correct row in the table and shapes.

I think NA would be less confusing. My favorite would actually be NA + the index (cell_id) from shapes and tables if that would be possible

@quentinblampey
Copy link
Collaborator

70149 corresponds to the correct row in the table and shapes.

Yes, sorry, it's 70149 indeed.

Okay, I'll add the possibility to have NA + cell index!

@pakiessling
Copy link
Contributor Author

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants