Crosstalk between shapes layer and table #267

Closed
lopollar opened this issue May 17, 2023 · 3 comments · Fixed by #455

Comments

@lopollar

As the different shapes in the shapes layer are linked to the cells in the table, it would be fantastic if they cross-talked.
Assuming the same indexing, filtering one of the two objects could automatically filter the other one.
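
A minimal sketch of the cross-filtering idea, assuming both objects share the same index. The two pandas DataFrames are hypothetical stand-ins for a shapes layer and a table; the names `cell_id`, `radius` and `gene_a` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins: a shapes layer and a table sharing the same cell index.
shapes = pd.DataFrame(
    {"radius": [1.0, 2.5, 0.8, 3.1]},
    index=pd.Index([10, 11, 12, 13], name="cell_id"),
)
table = pd.DataFrame(
    {"gene_a": [5, 0, 7, 2]},
    index=pd.Index([10, 11, 12, 13], name="cell_id"),
)

# Filter the table, then propagate the surviving indices to the shapes.
table_filtered = table[table["gene_a"] > 1]
shapes_filtered = shapes[shapes.index.isin(table_filtered.index)]

print(shapes_filtered.index.tolist())
```

The same pattern works in the other direction: filter the shapes first, then keep only the table rows whose index survives.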

Even when adding a shapes layer, it would be great to have the option to retain only the objects in this layer that match the table layer.

Or, when adding a new shapes layer, to retain only the shapes that are matched between the two shapes layers.

As the indices of cells and nuclei don't always match, I wrote code to match up different shapes layers, and match the indices. This might be interesting too!

However, not all cells will have a counterpart in every other shapes layer, and sometimes cells are binucleated, meaning two nuclei will match the same whole cell. These things should be taken into account, as they will give rise to issues along the way.
How to index the cells in that case is another open question, as is adapting the labels layer.
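
To make these edge cases concrete, here is a small sketch with pandas. It assumes the nucleus-to-cell matches have already been computed elsewhere (for example by a spatial overlap test); all ids and column names are hypothetical.

```python
import pandas as pd

# Hypothetical nucleus -> cell matches. Cell 2 is binucleated (nuclei 101 and 102),
# nucleus 103 has no enclosing cell, and cell 3 has no nucleus.
matches = pd.DataFrame({"nucleus_id": [100, 101, 102], "cell_id": [1, 2, 2]})
nuclei = pd.DataFrame({"nucleus_id": [100, 101, 102, 103]})
cells = pd.DataFrame({"cell_id": [1, 2, 3]})

# Number of matched nuclei per cell; cells without any match get 0.
nuclei_per_cell = (
    matches.groupby("cell_id").size().reindex(cells["cell_id"].to_numpy(), fill_value=0)
)

binucleated = nuclei_per_cell[nuclei_per_cell > 1].index.tolist()
cells_without_nucleus = nuclei_per_cell[nuclei_per_cell == 0].index.tolist()
orphan_nuclei = nuclei.loc[
    ~nuclei["nucleus_id"].isin(matches["nucleus_id"]), "nucleus_id"
].tolist()

print(binucleated, cells_without_nucleus, orphan_nuclei)
```

Because the relation is one-to-many, a single shared index cannot represent it; some explicit mapping like `matches` is needed.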

This also brings up the issue of multiple tables. As different stainings will give different segmentation masks and thus different gene counts, they will lead to different tables. How do we solve this? Do we add a column in the table to address this? If so, we end up with cells that have duplicate indices, which might cause issues.
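
One possible reading of the "extra column" idea, sketched with pandas: the instance ids repeat across segmentations, but the pair of segmentation name and instance id stays unique. The column names `region` and `instance_id` and the segmentation names are assumptions chosen for this illustration.

```python
import pandas as pd

# Hypothetical single table annotating two segmentations of the same tissue.
table = pd.DataFrame(
    {
        "region": ["cells_dapi", "cells_dapi", "cells_membrane", "cells_membrane"],
        "instance_id": [1, 2, 1, 2],
        "gene_a": [5, 0, 3, 8],
    }
)

# The instance ids alone are duplicated...
ids_duplicated = bool(table["instance_id"].duplicated().any())
# ...but the (region, instance_id) pair is a unique composite key.
pairs_duplicated = bool(table.duplicated(subset=["region", "instance_id"]).any())

# Selecting one segmentation gives back a table with unique ids.
rows_membrane = table[table["region"] == "cells_membrane"].set_index("instance_id")

print(ids_duplicated, pairs_duplicated)
print(rows_membrane)
```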

@LucaMarconato
Member

Thanks @lopollar for the feedback and for describing your use cases. This is a crucial discussion, and we also encourage other users to provide feedback so that we can find a good solution. There is a tradeoff between the complexity/ergonomics of the code and the functionality that we provide.

At one extreme there is a single table with limited relationships, which would keep the objects and processing APIs simple to use but restrict functionality. At the other extreme there is full relational-database support, which could handle all the situations you described (and more) but would be overkill for most cases.

Currently we are leaning more toward the first extreme, but I think that there is some sort of sweet spot in the middle that we should aim for to provide a more powerful but still simple interface.

Furthermore, we should probably avoid building something that is too close to a relational database (or maybe not? It would be interesting to hear more opinions on this). We should definitely avoid adding a lot of complexity to support something that resembles a relational database but is built on custom, rewritten code, as in that case both the ergonomics/usability and the functionality would suffer.

@LucaMarconato
Member

@melonora and I are working on this now.

Preliminary code is already present in relational_query.py, but here is our proposed refactoring.

  • introduce a new function join_spatialelement_table(), described below; the existing function match_table_to_element() will be deprecated and will call this function internally.
  • we support left, right and inner joins, but not the outer join
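
For reference, the three supported join types behave as follows in plain pandas. This is only an illustration of the join semantics on two hypothetical DataFrames (element rows on the left, table rows on the right), not the proposed function itself.

```python
import pandas as pd

# Hypothetical element rows (left side) and table rows (right side).
element = pd.DataFrame({"instance_id": [1, 2, 3], "area": [10.0, 12.5, 9.1]})
table = pd.DataFrame({"instance_id": [2, 3, 4], "gene_a": [7, 0, 4]})

joined = {
    how: pd.merge(element, table, on="instance_id", how=how)
    for how in ("left", "right", "inner")
}

for how, df in joined.items():
    # left keeps every element row, right keeps every table row,
    # inner keeps only the ids present on both sides.
    print(how, df["instance_id"].tolist())
```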

The function:

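from __future__ import annotations

from anndata import AnnData
from spatialdata import SpatialData
from spatialdata.models import SpatialElement


def join_spatialelement_table(
    sdata: SpatialData,
    spatial_element_name: str | list[str],
    table_name: str,
    how: str = "left",
    order_from: str | None = None,
) -> tuple[SpatialElement | list[SpatialElement], AnnData]:
    # Proposed signature only; the behavior is described below.
    pass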

Behavior and edge cases:

  • for an 'inner' join, both the returned spatial element(s) and the AnnData table are new objects; for a 'left' or 'right' join, one of the two is returned by reference
  • when order_from is None and how is 'left' or 'right', order_from is set to how
  • the returned objects inherit their order from order_from, with two exceptions:
    • labels objects don't have rows that can be sorted, so they are left unchanged
    • when spatial_element_name is a list and order_from is 'right', each spatial element is reordered to match the order of the table, but the order of the spatial elements in the list doesn't change. In other words, when order_from is 'left', concatenating the rows of the elements gives the same order as the table rows; when order_from is 'right', this doesn't hold, because consecutive rows of an element may be non-contiguous in the table.
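
The second exception is easier to see on a toy example. The sketch below uses plain pandas DataFrames as hypothetical stand-ins for two spatial elements and one table annotating both; the column names `region` and `instance_id` are assumptions made for the illustration.

```python
import pandas as pd

# Hypothetical data: two elements and one table annotating both.
elements = {
    "A": pd.DataFrame({"value": [0.1, 0.2]}, index=[1, 2]),
    "B": pd.DataFrame({"value": [0.3, 0.4]}, index=[3, 4]),
}
table = pd.DataFrame(
    {
        "region": ["B", "A", "B", "A"],
        "instance_id": [4, 2, 3, 1],
        "gene_a": [9, 5, 7, 1],
    }
)


def element_keys(elems):
    """(element name, instance id) pairs obtained by stacking the elements in list order."""
    return [(name, i) for name, df in elems.items() for i in df.index.tolist()]


table_keys = list(zip(table["region"].tolist(), table["instance_id"].tolist()))

# order_from == "left": the table is reordered to follow the stacked elements.
left_keys = pd.DataFrame(element_keys(elements), columns=["region", "instance_id"])
table_left = left_keys.merge(table, on=["region", "instance_id"], how="left")

# order_from == "right": each element follows the table, the list order is unchanged.
elements_right = {
    name: df.loc[table.loc[table["region"] == name, "instance_id"].tolist()]
    for name, df in elements.items()
}

print(table_left["gene_a"].tolist())
print(element_keys(elements_right), table_keys)
```

With order_from 'left', the stacked element rows and the reordered table line up row by row. With order_from 'right', each element individually follows the table, but the stacked rows still group all of A before all of B, so they cannot reproduce the interleaved table order.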

Rough idea of the implementation:

  • set order_from
  • if the element(s) are labels, compute the unique indices and construct a temporary pandas dataframe that will be discarded
  • if the element(s) are shapes/points, we already have a (dask)(geo)dataframe
  • merge the dataframes using pd.merge
  • return the appropriate values
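
The steps above can be sketched on simplified inputs as follows. This is a hypothetical stand-in, not the spatialdata implementation: it works on a NumPy label array or a pandas DataFrame instead of SpatialData objects, handles a single element, assumes label 0 is background, assumes unique keys in the table, always returns copies (so the by-reference behavior is not reproduced), and only covers the default ordering for 'left' and 'right' joins. The names `join_sketch`, `resolve_order_from` and `instance_id` are made up for this example.

```python
import numpy as np
import pandas as pd


def resolve_order_from(how, order_from):
    """Default order_from to `how` for 'left'/'right' joins."""
    if order_from is None and how in ("left", "right"):
        return how
    return order_from


def join_sketch(element, table, how="left", order_from=None, key="instance_id"):
    """element: 2D integer label array, or a DataFrame indexed by instance id.
    table: DataFrame with a unique `key` column."""
    if how not in ("left", "right", "inner"):
        raise ValueError("only left, right and inner joins are supported")
    order_from = resolve_order_from(how, order_from)
    if how != "inner" and order_from != how:
        raise NotImplementedError("this sketch only covers the default ordering")

    is_labels = isinstance(element, np.ndarray)
    if is_labels:
        # Temporary dataframe of unique label values (assumption: 0 is background).
        ids = np.unique(element)
        left = pd.DataFrame({key: ids[ids != 0]})
    else:
        left = pd.DataFrame({key: element.index.to_numpy()})
    right = table[[key]]

    if how == "inner" and order_from == "right":
        merged = pd.merge(right, left, on=key, how="inner", indicator=True)
    else:
        merged = pd.merge(left, right, on=key, how=how, indicator=True)

    # Ids kept on each side, in the row order produced by the merge.
    element_ids = merged.loc[merged["_merge"] != "right_only", key].tolist()
    table_ids = merged.loc[merged["_merge"] != "left_only", key].tolist()

    new_table = table.set_index(key).loc[table_ids].reset_index()
    # Labels have no rows to sort or drop here, so they are returned unchanged.
    new_element = element if is_labels else element.loc[element_ids]
    return new_element, new_table


labels = np.array([[0, 1, 1], [2, 2, 0], [3, 0, 0]])
shapes = pd.DataFrame({"radius": [1.0, 2.0, 3.0]}, index=[1, 2, 3])
table = pd.DataFrame({"instance_id": [3, 2, 4], "gene_a": [7, 0, 4]})

shapes_left, table_left = join_sketch(shapes, table, how="left")
shapes_right, table_right = join_sketch(shapes, table, how="right")
labels_inner, table_inner = join_sketch(labels, table, how="inner")
```

Using the merge indicator keeps the bookkeeping in one place: the same merged frame tells which element ids and which table rows survive, and its row order gives the output order.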

Useful code:

@LucaMarconato
Member

This is now implemented, thanks @melonora, and will be available when #455 is merged.

@LucaMarconato linked a pull request on Feb 22, 2024 that will close this issue