Crosstalk between shapes layer and table #267

lopollar · 2023-05-17T11:10:27Z

As the different shapes in the shapes layer are linked to the cells in the table, it would be fantastic if they cross-talked.
Assuming the same indexing, filtering one of the two objects could automatically filter the other one.
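
A minimal sketch of that idea, using plain pandas stand-ins rather than the spatialdata API. The names `shapes`, `table`, `cell_id` and `n_counts` are hypothetical; the only assumption is that both objects share the same cell index.

```python
import pandas as pd

# Hypothetical stand-ins: one row per shape, one row per table observation.
shapes = pd.DataFrame(
    {"radius": [1.0, 2.0, 1.5, 3.0]},
    index=pd.Index([10, 11, 12, 13], name="cell_id"),
)
table = pd.DataFrame({"cell_id": [10, 11, 12, 13], "n_counts": [5, 120, 80, 2]})

# Filter the table, then keep only the shapes that the table still references.
filtered_table = table[table["n_counts"] >= 50]
filtered_shapes = shapes.loc[shapes.index.isin(filtered_table["cell_id"])]
```

Filtering in the other direction works the same way, with the roles of the two frames swapped.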

Even when adding a shapes layer, it would be great to have the option to retain the objects in this layer that match with the table layer.

Or, when adding a new shapes layer, to retain only the shapes that are joined between the two shapes layers.

As the indices of cells and nuclei don't always match, I wrote code to match up different shapes layers, and match the indices. This might be interesting too!

However, not all cells will have a counterpart in every other shapes layer, and sometimes cells are binucleated, meaning two nuclei will match the same whole cell. These things should be taken into account, as they will give rise to issues along the way.
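
To make the two problem cases concrete, here is a small pandas sketch. It assumes a nucleus-to-cell assignment has already been computed (for example by spatial overlap); the frames and the column names `cell_id` and `nucleus_id` are hypothetical.

```python
import pandas as pd

# Hypothetical data: three whole cells, four nuclei with an assigned cell.
cells = pd.DataFrame({"cell_id": [100, 101, 102]})
nuclei = pd.DataFrame({"nucleus_id": [1, 2, 3, 4], "cell_id": [100, 100, 101, 999]})

# A left merge keeps every cell and repeats a cell once per matching nucleus.
matches = cells.merge(nuclei, on="cell_id", how="left")

# count() ignores the NaN produced for cells without any nucleus.
nuclei_per_cell = matches.groupby("cell_id")["nucleus_id"].count()
binucleated = nuclei_per_cell[nuclei_per_cell > 1].index.tolist()
without_nucleus = nuclei_per_cell[nuclei_per_cell == 0].index.tolist()
```

Nucleus 4 points to a cell that does not exist in `cells`, so it is dropped by the left merge; an outer merge would be needed to keep it.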
How to index the cells in that case is another open question, as is how to adapt the labels layer.

This also brings up the issue of multiple tables. As different stainings give different segmentation masks and thus different gene counts, they lead to different tables. How do we solve this? Do we add a column in the table to address this? If so, we would have cells with double indices, which might cause issues.

LucaMarconato · 2023-05-17T16:05:10Z

Thanks @lopollar for the feedback and for describing your use cases. This is a crucial discussion, and we also encourage other users to provide feedback so that we can find a good solution. There is a tradeoff between the complexity/ergonomics of the code and the functionality that we provide.

At one extreme there is a single table with limited relationships, which would keep the objects and processing APIs simple to use but restrict functionality. At the other extreme there is support for a full relational database, which would handle all the situations you described (and more) but would be overkill for most cases.

Currently we are leaning more toward the first extreme, but I think that there is some sort of sweet spot in the middle that we should aim for to provide a more powerful but still simple interface.

Furthermore, we should probably avoid having something that is too close to a relational database (or maybe not? It would be interesting to hear more opinions on this). We should definitely avoid adding a lot of complexity to support something that is close to a relational database but built on custom rewritten code, as in that case both the ergonomics/usability and the functionality would suffer.

LucaMarconato · 2024-01-26T16:16:01Z

@melonora and I are working on this now.

Preliminary code is already present in relational_query.py, but here is our proposed refactoring.

- Introduce a new function join_spatialelement_table(), described below; the existing function match_table_to_element() will be deprecated and will call this function.
- We support left, right and inner joins, but not the outer join.

The function:

def join_spatialelement_table(
    sdata: SpatialData,
    spatial_element_name: str | list[str],
    table_name: str,
    how: str = "left",
    order_from: str | None = None,
) -> tuple[SpatialElement | list[SpatialElement], AnnData]:
    pass

Behavior and edge cases:

- For an 'inner' join, both the returned spatial element(s) and the AnnData are new objects; for 'left' or 'right', one of the two is returned by reference.
- When order_from is None and how is 'left' or 'right', order_from is set to how.
- The returned objects inherit their order from order_from, with two exceptions:
  - Labels objects don't have rows that can be sorted, therefore they don't change.
  - When spatial_element_name is a list and order_from is 'right', each spatial element is reordered to match the order of the table, but the order of the spatial elements themselves does not change. In other words, when order_from is 'left', concatenating the rows of the elements gives rows in the same order as the table rows; when order_from is 'right', this does not happen, because consecutive rows of an element may be non-contiguous in the table.
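
As an illustration of the ordering rules only, here is how pd.merge behaves on two toy frames. This is not the proposed function: `element`, `table` and the `instance_id` key are hypothetical stand-ins for a shapes element and the table's observations.

```python
import pandas as pd

# Hypothetical stand-ins, deliberately in different row orders.
element = pd.DataFrame({"instance_id": [3, 1, 2, 4], "area": [30, 10, 20, 40]})
table = pd.DataFrame({"instance_id": [2, 3, 5], "gene_a": [7, 8, 9]})

# how="left": all element rows, in the element's order.
left = element.merge(table, on="instance_id", how="left")

# how="right": all table rows, in the table's order.
right = element.merge(table, on="instance_id", how="right")

# how="inner": only the keys present in both frames.
inner = element.merge(table, on="instance_id", how="inner")
```

With 'left' or 'right' the row order follows one side naturally, which is why order_from can default to how in those cases. An inner join has no such natural side, so the order has to be chosen explicitly.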

Rough idea of the implementation:

- Set order_from.
- If the element(s) are labels, compute the unique indices and construct a temporary pandas dataframe that will be discarded.
- If the element(s) are shapes/points, we already have a (dask)(geo)dataframe.
- Merge the dataframes using pd.merge.
- Return the appropriate values.
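
The labels branch of these steps could look roughly like the sketch below. It is an assumption-laden illustration, not the code in relational_query.py: the `labels` array, the `obs` frame, the `instance_id` column and the convention that 0 is background are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical labels raster; 0 is assumed to be background.
labels = np.array([[0, 0, 1], [2, 2, 1], [0, 4, 4]])

# Labels have no rows, so build a temporary frame of the unique instance ids.
instance_ids = np.unique(labels)
instance_ids = instance_ids[instance_ids != 0]
labels_df = pd.DataFrame({"instance_id": instance_ids})

# Hypothetical table annotations keyed by the same instance ids.
obs = pd.DataFrame({"instance_id": [4, 1, 3], "cell_type": ["B", "T", "NK"]})

# Merge, then discard the temporary frame; only the matched ids are needed.
merged = pd.merge(labels_df, obs, on="instance_id", how="inner")
matched_ids = set(merged["instance_id"])
```

For shapes or points the temporary frame is unnecessary, since the element already is a dataframe that can be passed to pd.merge.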

Useful code:

implementation of the inner join for the 1 shapes 1 table case, reordering shapes to match the table: https://github.com/scverse/spatialdata-io/blob/39073973822a8cf30a858194f78c742eca416037/src/spatialdata_io/converters/legacy_anndata.py#L87C1-L96C42

LucaMarconato · 2024-02-22T23:07:02Z

This is now implemented, thanks @melonora, and will be available when #455 is merged.

LucaMarconato mentioned this issue Jun 14, 2023

New multiple table in-memory design #298

Closed

LucaMarconato linked a pull request Feb 22, 2024 that will close this issue

multi table support #455

Merged

melonora closed this as completed in #455 Mar 14, 2024
