RAB40B has the opposite phenotypes of PIK3R3/INSYN1: exploration for MorphMap paper (ORF) #4

Closed
AnneCarpenter opened this issue Dec 15, 2023 · 62 comments

Comments

@AnneCarpenter
Contributor

A cluster of RAB40B and XLOC_I2_008134 has the opposite phenotype of a cluster of PIK3R3 and INSYN1. I googled most of the connections and found nothing, so it may be novel.
Emailed Scott Soderling Duke Dec 1 https://mail.google.com/mail/u/0/#sent/QgrcJHsHnNjJtWszNdwGctjDfRPknnWjVHl

  • only 4 papers for INSYN1: Scott Soderling at Duke (plus a genetics lab) [email protected]
    Backups
  • RAB40B - cancer - 19 papers, R Prekeris is most common and one says “invadosome” which may be related to neuron projections (??)
  • PIK3R3 - cancer - 206 papers
    XLOC_I2_008134 - nothing in google search
@AnneCarpenter AnneCarpenter added Ardigen Vignettes that stem from Ardigen's findings waiting labels Dec 15, 2023
@AnneCarpenter
Contributor Author

Email correspondence so far...

Hello,
We've communicated before about spreading the word on faculty openings at Duke. I'm excited because I just discovered your papers about INSYN1. I wonder if our recent results might spark a collaboration?

To summarize a lot of work, we knocked down 8,000 genes one by one in U2OS human cells, and then clustered the genes based on having similar morphological impact (using the Cell Painting microscopy assay that labels major organelles).

We found a tight cluster of RAB40B and XLOC_I2_008134 which have the opposite morphological effects of a cluster of PIK3R3 and INSYN1.

I wonder if all of these connections are well-known already?

If not, we would be delighted to work together if you'd like to design an experiment to followup/confirm the connection and add to a paper we are beginning to write up about the large dataset. It always helps in such papers to make a new discovery that can be confirmed (even if it's not a dramatic finding).

All the best, please let me know if you would like to talk further!

Cheers,
Anne

Anne E. Carpenter, Ph.D. (she/her)
Institute Scientist & Senior Director, Imaging Platform
Fellow: Merkin Institute, AIMBE, SLAS, Massachusetts Academy of Sciences, Royal Microscopy Society
Broad Institute of Harvard and MIT
415 Main Street,
Cambridge MA 02142

phone: (617) 714-7750
[email protected]
http://www.broadinstitute.org/~anne

Anne Carpenter [email protected]
Fri, Dec 1, 10:22 AM
to scott.soderling

P.s. other interactions include: a connection between INSYN1 and RNF41, STYK1 and HOOK2.
And, MYT1 is another gene that looks like it has the opposite impact from INSYN1.

just in case those ring any bells or seem worth pursuing!

Scott Soderling, Ph.D.
Sat, Dec 2, 9:58 AM (13 days ago)
to me

Hi Anne,

Sounds like a REALLY cool project! We actually don’t have any active projects right now on Insyn1 in the lab. These look like totally new links. Our work on Insyn1 was related to its function in neuronal inhibitory synapses and I don’t recall these coming out in the proximity proteomics from neurons. It looks like STYK1 and PIK3R3 may both be related to the regulation of the PI3K pathway, so that is interesting.

We have some reagents in the lab for Insyn1 and would be happy to send you any of them. We could potentially perform BioID in U2OS cells for InSyn1 if that was helpful for you. We have a new method for doing proximity proteomics using CRISPR engineering of the native gene/protein. Or we would also be happy to send you constructs for InSyn1 BioID.

You may already be thinking about this, but we also have a fantastic new faculty member here from the MIT CSAIL that has developed new computational tools for PPI prediction using language models and protein structure (Rohit Singh). Might be interesting to run your clusters at large scale for PPIs. He would probably be interested in collaborating with you.

Best,

Scott

Anne Carpenter [email protected]
Sat, Dec 2, 7:31 PM (13 days ago)
to Ph.D.

Ah, too bad it's no longer a focus for you. My lab is computational, so sending reagents here won't help :D

Ideally, we'd test some functional relationships among these proteins - I'm not sure we would expect a direct protein protein interaction (at the least we ought to check in PPI databases for existing data to see if that is already known!). Will put that on my list.

Do you know other labs who still focus on InSyn1 function?

@AnneCarpenter
Contributor Author

STATUS: waiting to be sure that these results are actually in the freshest version of the data before proceeding.

@AnneCarpenter
Contributor Author

info provided here now #8

@tjetkaARD
Collaborator

Unfortunately,

  • [RAB40B vs. PIK3R3/INSYN1] correlation cannot be found within the CRISPR datasets; the correlation is seen only in ORFs.

Some additional facts:

  • Within Morphological CRISPR data, RAB40B vs. PIK3R3 has a negative correlation of ~ (-10%)
  • Within DepMap CRISPR data, RAB40B vs. PIK3R3 also has a negative correlation of ~ (-16%), both in the ~top20/top50 anti-correlated genes of each other
    image

@AnneCarpenter
Contributor Author

I pinged Soderling today to find another lab that studies this, because it sounds like you are confirming the relationship DOES exist in JUMP ORF data.

@tjetkaARD
Collaborator

tjetkaARD commented Jan 7, 2024

Yes, the relationship does exist in JUMP ORF:

JUMP ORF:

  • RAB40B, PIK3R3 and INSYN1 are fdr replicable
  • RAB40B vs. INSYN1, anti-correlated, coefficient: -0.62
  • RAB40B vs. PIK3R3, anti-correlated, coefficient: ~ -0.57
  • RAB40B, INSYN1, PIK3R3 all 'have the phenotype'. There are two separate ORF perturbations (each with its own replicates) with symbol PIK3R3; both have a replicable phenotype.

JUMP CRISPR:

  • RAB40B is p-value (less certain) replicable only, PIK3R3 is not replicable in terms of phenotype
  • INSYN1 is not measured in JUMP-CRISPR
  • RAB40B vs. PIK3R3, insignificant, coefficient: -0.1
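
For readers less familiar with the coefficients quoted above, here is a minimal sketch of how an anti-correlation between two perturbation profiles can be computed. It assumes the coefficient is a cosine similarity between per-gene feature vectors (the metric is only asked about later in this thread, so this is an assumption), and the gene names and numbers are made up for illustration, not JUMP data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of features")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / (norm_a * norm_b)


# Toy profiles with hypothetical gene names and made-up feature values.
profiles = {
    "GENE_A": [1.0, -2.0, 0.5, 3.0],
    "GENE_B": [-0.9, 2.1, -0.4, -2.8],  # roughly the mirror image of GENE_A
    "GENE_C": [1.1, -1.8, 0.6, 2.9],    # roughly the same direction as GENE_A
}

names = sorted(profiles)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = cosine_similarity(profiles[first], profiles[second])
        print(f"{first} vs {second}: {score:+.2f}")
```

A value near -1 means the two perturbations push the features in opposite directions, which is the sense of "opposite phenotype" used in this issue.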

@AnneCarpenter
Contributor Author

AnneCarpenter commented Jan 8, 2024

If we can find someone to do followup experiments, we should check:

  • are interactions among these genes already known in protein-protein interaction (PPI) networks? This cluster wasn't found through the Evotec route so we don't know for sure that these connections don't already have strong evidence in data/literature such as PPIs.
  • but, that is not worth doing until we can find someone who can think about experiments to do to follow this up.
  • I sent a note to Rytis Prekeris at CU Anschutz and also pinged Soderling.
  • Can @niranjchandrasekaran confirm that XLOC_I2_008134 is not in this ORF cluster of RAB40B, INSYN1, PIK3R3? (I suspect it was in the CRISPR results that we since have abandoned due to fresh versions of the connections)
  • Is there KG support for these interactions, or is it relatively unknown?

@AnneCarpenter
Contributor Author

AnneCarpenter commented Jan 17, 2024

@holgerhennig Can your team answer that last Q - is there KG support for these interactions? That will help us know if we should write this up as already-known validation or increase our efforts to pursue followup experiments if it is novel.

@holgerhennig
Collaborator

@AnneCarpenter We'll look into it, whether there's KG support for these interactions, and get back to you asap

@AnneCarpenter
Contributor Author

Got a response from Rytis in the meantime:
Hi Anne,

No, we did not know about potential connection between Rab40b and INSYN1 or PIK3R3. Would be happy to chat with you to see whether we can do couple experiments to confirm the connections. Let me know what times work for you next week and I can set up zoom link.

How about other Rab40 isoforms (Rab40b and Rab40c)? Do they cluster as well? From our experience, there is substantial amount of functional redundancy between Rab40a, Rab40b and Rab40c.
Rytis

We will schedule a meeting and depending whether Alan or Niranj can attend I can ask one of them to look up the other Rab40's.

@AnneCarpenter
Contributor Author

I will meet with Rytis Jan 24th to brainstorm followup experiments.

@tjetkaARD can you please edit your comment above? I think it's a typo because two lines say "RAB40B vs. INSYN1". Given we are only looking at ORFs here I think my previous query to Niranj about XLOC_I2_008134 is now irrelevant but if you can confirm XLOC_I2_008134 is not in the ORF data that would be helpful! He also wanted to see if we have data for Rab40b and Rab40c (ORF or CRISPR?)

It would be great to show him a heatmap of all of these genes (while marking which ones pass our threshold for "has a phenotype"). Probably for ORF data only but if it's easy to make CRISPR we can do that just to be complete, in case Rab40b or c is interesting here.

@AnneCarpenter
Contributor Author

@afermg Given this finding is in ORF data only, it would be great to use your web tool to look at images and most-similar/anti-similar genes. Please LMK when it's available. You're welcome to join the Jan 24 meeting as well if you like, similar to that FOXO one we had recently.

@auranic
Collaborator

auranic commented Jan 18, 2024

@AnneCarpenter @holgerhennig For the three genes PIK3R3, INSYN1, RAB40B we do not find "unsupervised" explanations in KG for any pair of these genes, so any connection between them is new from KG perspective. Of note, INSYN1 is not in the list of 4850 genes having a signal in ORF. As for XLOC_I2_008134, I can not find any trace of this gene (or pseudogene?) identity, can you point to its NCBI Gene ID?

@AnneCarpenter
Contributor Author

I copied that one from an early heatmap (that likely is 'wrong' / outdated) so I guess let's abandon it. I believe someone else tried to find it in our data and the reagent doesn't exist (making it possibly a typo). But anyway if we are not seeing it as a top-similar gene in ORF or CRISPR we can abandon it. Thank you for checking!

@AnneCarpenter AnneCarpenter removed the Ardigen Vignettes that stem from Ardigen's findings label Jan 18, 2024
@AnneCarpenter AnneCarpenter changed the title RAB40B, XLOC_I2_008134 have the opposite phenotypes of PIK3R3/INSYN1: exploration for MorphMap paper RAB40B has the opposite phenotypes of PIK3R3/INSYN1: exploration for MorphMap paper Jan 18, 2024
@tjetkaARD
Collaborator

tjetkaARD commented Jan 19, 2024

@AnneCarpenter Answering in order:

  1. I edited the comment above. In fact, the second line was to indicate anticorrelation against PIK3R3 - typo
  2. @auranic XLOC_l2_008134 is in the ORF under the ID: ccsbBroad304_14513 (according to orf meta file) I cannot find any reasonable functional definition of it. I think we should just discard it.
  3. We do have RAB40A/B/C in both datasets. They are all:
  • replicable in ORF data
  • not replicable in CRISPR data

Hence, showing correlation plot for ORFs (all replicable):

orfs_heatmap_cosine_chosen_RAB_

and correlation plot for CRISPRs (only RAB40B p-value replicable):

crispr_heatmap_cosine_chosen_RAB_

  4. @niranjchandrasekaran - could you please check the case of INSYN1? According to the file uploaded to GitHub: https://github.com/jump-cellpainting/morphmap/blob/main/05.retrieve-orf-annotations/output/replicate-retrieval-mAP-transformed-inf-eff-filtered.csv.gz it is replicable:
    image

Unless Niranj points out that the above file is the incorrect one, all the genes in the ORFs heatmap are replicable.

@afermg
Collaborator

afermg commented Jan 21, 2024

@AnneCarpenter The ORF data interface is now available on broad.io/orf. The easiest way to search for things is querying genes on the search box, but you can also run queries by editing the gene name in the following URL: https://lite.datasette.io/?install=datasette-json-html&parquet=https://zenodo.org/api/records/10542737/files/orf.parquet/content#/data/content?Gene%2FCompound__exact=RAB40B&_sort=Distance

Just replace "RAB40B" with other genes of interest.
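
As a small convenience, the URL edit described above can be scripted. This is only a sketch: the base URL and query parameters are copied verbatim from the link above, while the helper name `orf_query_url` is made up here.

```python
from urllib.parse import quote

# Base of the datasette-lite link quoted above (everything before the query).
BASE_URL = (
    "https://lite.datasette.io/?install=datasette-json-html"
    "&parquet=https://zenodo.org/api/records/10542737/files/orf.parquet/content"
    "#/data/content"
)


def orf_query_url(gene):
    """Return the link above with `gene` substituted for RAB40B (hypothetical helper)."""
    return f"{BASE_URL}?Gene%2FCompound__exact={quote(gene, safe='')}&_sort=Distance"


for symbol in ["RAB40B", "INSYN1", "PIK3R3"]:
    print(orf_query_url(symbol))
```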

@AnneCarpenter
Contributor Author

AMAZING! I wish Github allowed gifs to express Kermit-the-frog level excitement https://giphy.com/clips/buzzfeed-buzzfeed-celeb-the-muppets-find-out-which-muppet-they-really-are-zTF0aDwhF239JQzIXw

We talked about shifting distance back to the -1 to +1 similarity metric that we've always used... but now I see values of 1300 for RAB40B so I'm wondering if what you're showing is not just our correlation metric * 1000?

@afermg
Collaborator

afermg commented Jan 22, 2024

Yes, actually I started using the metric Alex uses (cosine distance), ranging from 0 to 2, to be consistent with his code. I was testing if it made a big difference to keep it in 1e3 units, but it doesn't look ideal. I'll set it back to decimals, but it will contain 10 digits (we can't limit them because it comes from the lite.datasette internal code). I'll update it and set a reminder here.
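
To relate the two scales discussed here, a quick sketch of the conversion. It assumes the displayed value was cosine distance (1 minus cosine similarity, so 0 to 2) multiplied by 1000, which is how I read the "1e3 units" remark; the function names are made up.

```python
def distance_to_similarity(distance):
    """Convert cosine distance (0 to 2) to cosine similarity (-1 to +1)."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must lie in [0, 2]")
    return 1.0 - distance


def scaled_distance_to_similarity(value, scale=1000.0):
    """Undo the assumed x1000 scaling, then convert to similarity."""
    return distance_to_similarity(value / scale)


# A displayed value of 1300 would be a distance of 1.3, i.e. a similarity of about -0.3.
print(round(scaled_distance_to_similarity(1300), 3))
```

Under that assumption, values above 1000 correspond to anti-similar genes and values below 1000 to similar ones.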

@afermg
Collaborator

afermg commented Jan 24, 2024

Hi @tjetkaARD, just to clarify; which correlation metric did you use for the analysis? Thanks!

@AnneCarpenter
Contributor Author

Maybe even more to the point: where is the file with the similarity metrics that we currently recommend using for this paper?

@shntnu
Contributor

shntnu commented Jan 31, 2024

Sorry, step 3. Feel free to tag Alex for his input when you've written it up.

@AnneCarpenter AnneCarpenter changed the title RAB40B has the opposite phenotypes of PIK3R3/INSYN1: exploration for MorphMap paper RAB40B has the opposite phenotypes of PIK3R3/INSYN1: exploration for MorphMap paper (ORF) Feb 1, 2024
@afermg afermg added the orf Uses ORF data label Feb 1, 2024
@AnneCarpenter
Contributor Author

I believe this issue is still waiting for three things (though the 2nd one is for ORF only, no crispr analysis needed at least for now because those profiles aren't finalized and we're pretty sure there aren't phenotypes there due to isoforms):

@niranjchandrasekaran - see "could you please check the case of INSYN1..." Q above
@afermg - check the data source Tomasz points to relative to your broad.io/orf and crispr sites to be sure they're generating the expected rank ordered similarities; this will tell us what genes to convey to the collaborators (if not the original cluster shown above). Would also be nice to generate the full list of similar/opposite genes for each gene here, esp for RAB40A which is not like the others here.
@afermg - generate a list of top morph features that distinguish RAB40B/C from INSYN1/PIK3R3 (or whatever the cluster is) and maybe grab images that demonstrate the phenotype if it's obvious/interpretable.

@afermg
Collaborator

afermg commented Feb 14, 2024

I don't think we are certain that we are using the same ORF dataset, I pointed towards it here. The simplest way to do so is for someone on the other side to check if the checksum of their file is e1ef247bf4725f35ab8e4793e6289600bdc71ff9, as mentioned above.
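
A minimal sketch of that check, for whoever picks it up. The hash algorithm is not stated in this thread; a 40-hex-digit checksum is consistent with SHA-1, so that is what is assumed here, and the function names are made up.

```python
import hashlib

# Checksum quoted in the comment above.
EXPECTED = "e1ef247bf4725f35ab8e4793e6289600bdc71ff9"


def file_sha1(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED):
    """True if the file's SHA-1 equals the expected checksum (assumed to be SHA-1)."""
    return file_sha1(path) == expected.lower()
```

Calling `matches_expected` on the local copy of the ORF file (whatever its path is on your side) returns `True` only if both sides hold the same bytes.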

@niranjchandrasekaran
Member

niranjchandrasekaran commented May 10, 2024

connections

Here is how these connections look with the newest set of ORF and CRISPR profiles

ORF

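| | INSYN1 | PIK3R3 | RAB40A | RAB40B | RAB40C | XLOC_l2_008134 |
|---|---|---|---|---|---|---|
| INSYN1 | 1 | 0.26 | -0.11 | -0.42 | -0.23 | -0.36 |
| PIK3R3 | | 1 | 0 | -0.34 | -0.21 | -0.41 |
| RAB40A | | | 1 | 0.19 | 0.16 | 0.02 |
| RAB40B | | | | 1 | 0.52 | 0.62 |
| RAB40C | | | | | 1 | 0.36 |
| XLOC_l2_008134 | | | | | | 1 |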

The previously seen relationships between the genes are still there, though some connections are weaker.
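
For anyone who wants to pull the anti-correlated pairs out of the ORF matrix programmatically, here is a small sketch. The values are copied from the upper triangle posted above; the helper name `pairs_from_upper` is made up.

```python
genes = ["INSYN1", "PIK3R3", "RAB40A", "RAB40B", "RAB40C", "XLOC_l2_008134"]

# Upper triangle (diagonal included) of the ORF matrix, row by row.
upper = [
    [1, 0.26, -0.11, -0.42, -0.23, -0.36],
    [1, 0, -0.34, -0.21, -0.41],
    [1, 0.19, 0.16, 0.02],
    [1, 0.52, 0.62],
    [1, 0.36],
    [1],
]


def pairs_from_upper(genes, upper):
    """Map each off-diagonal gene pair to its similarity value."""
    pairs = {}
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + offset
            if j > i:  # skip the diagonal
                pairs[(genes[i], genes[j])] = value
    return pairs


pairs = pairs_from_upper(genes, upper)

# Negative pairs, most anti-correlated first.
anti = sorted((value, a, b) for (a, b), value in pairs.items() if value < 0)
for value, a, b in anti:
    print(f"{a} vs {b}: {value:+.2f}")
```

Every negative entry involves INSYN1 or PIK3R3 on one side, and all but the weak INSYN1 vs RAB40A value pair them with RAB40B, RAB40C or XLOC_l2_008134, matching the two-cluster picture.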

CRISPR

Most of these connections are absent in CRISPR because the genes are either not present in the dataset or do not have a phenotype. Only the PIK3R3-RAB40B connection is present in CRISPR, but it is weak (cosine similarity: -0.11)

KG
Apart from the connection between RAB40A/B/C, all the other connections are not seen in the KG.

@niranjchandrasekaran
Member

Notebook

These results are the same as what I said in the above comment: #4 (comment)

The heatmap shows the percentile of the cosine similarities (1 → similar, 0 → anti-similar). The text is the maximum of the absolute KG score (gene_mf__go, gene_bp_go, gene_pathway). I set a KG threshold (like we previously had) of 0.4. If a connection has a score lower than this threshold, it is considered to be unknown. The KG scores were downloaded from Google Drive: ORF and CRISPR. The diagonal of the heatmap indicates whether a gene has a phenotype (False could also mean the gene is not present in the dataset).
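
The two quantities described above can be sketched as follows. This is not the notebook's code: the percentile definition (fraction of values less than or equal to the query) and the toy numbers are assumptions, and only the 0.4 threshold and the three KG score names come from the comment.

```python
from bisect import bisect_right

KG_THRESHOLD = 0.4  # threshold stated in the comment above


def percentile_rank(value, all_values):
    """Fraction of values <= value; near 0 is anti-similar, 1 is most similar."""
    ordered = sorted(all_values)
    return bisect_right(ordered, value) / len(ordered)


def max_abs_kg_score(scores):
    """Largest absolute KG score across the score types for one gene pair."""
    return max(abs(score) for score in scores.values())


def is_known(scores, threshold=KG_THRESHOLD):
    """A connection counts as known if its max absolute KG score reaches the threshold."""
    return max_abs_kg_score(scores) >= threshold


# Toy inputs (made-up values for illustration only).
similarities = [-0.42, -0.11, 0.0, 0.26, 0.62]
kg_pair_a = {"gene_mf__go": 0.1, "gene_bp_go": -0.55, "gene_pathway": 0.2}
kg_pair_b = {"gene_mf__go": 0.1, "gene_bp_go": 0.05, "gene_pathway": -0.3}

print(percentile_rank(-0.42, similarities), percentile_rank(0.62, similarities))
print(is_known(kg_pair_a), is_known(kg_pair_b))
```

With this definition the lowest value maps to 1/n rather than exactly 0, so the real notebook may use a slightly different convention.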

ORF

ORF-connections-INSYN1-PIK3R3-RAB40A-RAB40B-RAB40C-XLOC_l2_008134

CRISPR

CRISPR-connections-INSYN1-PIK3R3-RAB40A-RAB40B-RAB40C-XLOC_l2_008134

@AnneCarpenter
Contributor Author

This seems worth finalizing (i.e. re-creating the clusters based on what are the nearest neighbors of the genes involved rather than including genes just because they were in the original clusters with old profiles.)

@niranjchandrasekaran
Member

Notebook

Here are the recreated clusters

ORF-connections-HOXC8-INSYN1-NRBP1-PIK3R3-RAB40B-RAB40C-ZFP36L1

@jessica-ewald
Contributor

This latest re-created cluster is from the ORF data only. Is it worth querying the same genes ["INSYN1", "PIK3R3","RAB40B", "RAB40C"] in the CRISPR data?

@AnneCarpenter
Contributor Author

AnneCarpenter commented Jul 24, 2024 via email

@jessica-ewald
Contributor

Also I read through a bit of the Scott Soderling Science paper and they specifically knocked down INSYN1 with CRISPR to determine its function. They provide a list of proteins that were perturbed by doing this ... would be interesting to compare.

@jessica-ewald
Contributor

@niranjchandrasekaran The basic story here is that other than INSYN1, all genes have high-quality in vitro functional genomics and in vivo transcriptomics data linking their expression to cell proliferation, tumor size, and cancer prognosis. Many of these genes are specifically related to cell migration (invadosome, cytoskeleton projections, etc) and endosome/vesicle formation.

The only functional data for INSYN1 is the Soderling paper linked above, which implicates it in neuronal inhibitory synapses. It's interesting that these data suggest that INSYN1 may also have a link to cell proliferation/cancer. I searched INSYN1 in all of the cancer-focused databases that Anne had listed, and didn't come up with anything. Most of these resources profile a targeted list of several thousand genes/proteins to increase throughput, and INSYN1 isn't included in these lists.

My notes for this story are here: https://docs.google.com/document/d/1zKkDpBWbb3NnQhlX34LEWuuZxy5Rotre5uuuBMTxvxY/edit

I'm not sure what the next steps are. Are we waiting until @afermg and I go through all stories, and then write up the most interesting ones? Are we doing more follow-up with any wet lab researchers? I'm also not sure how limited we are for space.

@AnneCarpenter
Contributor Author

I think this resource shows an association of INSYN1 with glioma, which would link the Soderling results with cancer... could you dig in to that?

https://www.proteinatlas.org/ENSG00000205363-INSYN1/pathology

Screenshot 2024-07-30 at 9 14 54 AM

I also see on this page https://www.ncbi.nlm.nih.gov/gtr/genes/388135/ that this paper https://cgp.iiarjournals.org/content/11/4/201.long links INSYN1 to cancer, but INSYN1 doesn't show up in a search of the article and at first glance some of the Supp Data isn't available anymore!

And finally I think this paper is saying an antisense RNA is associated with glioma (brain cancer) which could provide another link:
https://journals.lww.com/md-journal/fulltext/2022/11040/foreboding_lncrna_markers_of_low_grade_gliomas.40.aspx

And this one more generally: https://europepmc.org/article/med/36552797
(both the latter two were via Open Targets)

Does any of that help? I think for this story the ideal scenario is to find some supporting info from papers or from databases that makes it a more promising result (which we do not pursue by lab work here in this paper, but that has 'enough' to suggest to others to do so).

@jessica-ewald
Contributor

Thanks for the articles, @AnneCarpenter. I found that https://cgp.iiarjournals.org/content/11/4/201.long was published before INSYN1 had any characterized function, and uses the synonym C15orf59. I did a search specifically of C15orf59 and found more hits.

New info on INSYN1 that tentatively links it to cancer:

Some random facts about INSYN1:

I think overall this is promising! We have 6 genes strongly linked to cell proliferation/cancer, plus INSYN1. Targeted searches of INSYN1 show tentative links to cancer in fairly low profile studies that re-analyzed large datasets to look for novel associations with cancer tissues. Almost all of these studies specifically call out INSYN1 as having no known function (when it was called by its ORF ID) or no known link to cancer.

@AnneCarpenter
Contributor Author

AnneCarpenter commented Jul 31, 2024 via email

@niranjchandrasekaran
Member

Notebook

Yes! It will be important to comment on whether the same pattern exists or doesn’t exist in the crispr data, so either way we should take a look at how it’s looking.

The pattern doesn't seem to exist in CRISPR.

Note: Missing genes either don't have a phenotype or are not present in the CRISPR dataset.

CRISPR-connections-PIK3R3-RAB40B-HOXC8-ZFP36L1-NRBP1

@AnneCarpenter
Contributor Author

I wouldn't say that exactly - of course we cannot say anything about INSYN1 which isn't in the CRISPR data, but for example NRBP1 and RAB40B are strongly anti-correlated here which is the same in ORF. There are some strongly opposite relationships too which are meaningful (RAB40B and PIK3R3 are strongly correlated in CRISPR and strongly anti-correlated in ORF).

Ok I just realized I messed up most of that directionality because I mis-read the color schemes, but anyway @jessica-ewald can take this plot and add a sentence explaining this - maybe it's literally just "Although INSYN1 isn't in the CRISPR dataset, several of the other genes' relationships can also be seen in CRISPR, albeit sometimes with reversed directionality"?

@niranjchandrasekaran
Member

Notebook

I looked at the similarity of features, grouped by feature groups, compartment and channels, between the two clusters in ORFs. The two clusters are antisimilar across all feature groups and channels.

HOXC8-INSYN1-NRBP1-PIK3R3-RAB40B-RAB40C-ZFP36L1_area_size_compartment

HOXC8-INSYN1-NRBP1-PIK3R3-RAB40B-RAB40C-ZFP36L1_feature_group_channel
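
A rough sketch of this kind of per-group comparison, for readers who want to try it. It assumes CellProfiler-style feature names of the form `Compartment_FeatureGroup_...`, groups features by those first two tokens, and computes one cosine similarity per group. The feature values are invented and this is not the notebook's implementation.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)


def group_key(feature_name):
    """('Cells', 'AreaShape') from 'Cells_AreaShape_Area' (assumed naming scheme)."""
    compartment, feature_group = feature_name.split("_")[:2]
    return compartment, feature_group


def similarity_by_group(profile_a, profile_b):
    """Cosine similarity between two profiles, computed separately per group."""
    grouped = defaultdict(lambda: ([], []))
    for name, value in profile_a.items():
        xs, ys = grouped[group_key(name)]
        xs.append(value)
        ys.append(profile_b[name])
    return {key: cosine_similarity(xs, ys) for key, (xs, ys) in grouped.items()}


# Made-up mean profiles for two hypothetical clusters.
cluster_1 = {
    "Cells_AreaShape_Area": 1.2, "Cells_AreaShape_Perimeter": 0.8,
    "Nuclei_Intensity_MeanIntensity_DNA": -0.5, "Nuclei_Intensity_StdIntensity_DNA": -0.7,
    "Cytoplasm_Texture_Contrast_RNA": 0.4, "Cytoplasm_Texture_Entropy_RNA": 0.9,
}
cluster_2 = {
    "Cells_AreaShape_Area": -1.0, "Cells_AreaShape_Perimeter": -0.9,
    "Nuclei_Intensity_MeanIntensity_DNA": 0.6, "Nuclei_Intensity_StdIntensity_DNA": 0.5,
    "Cytoplasm_Texture_Contrast_RNA": -0.5, "Cytoplasm_Texture_Entropy_RNA": -0.7,
}

for key, score in similarity_by_group(cluster_1, cluster_2).items():
    print(key, f"{score:+.2f}")
```

Negative values in every group would correspond to the "antisimilar across all feature groups" observation; a single strongly negative group would instead point to a localized phenotype.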

@jessica-ewald
Contributor

@niranjchandrasekaran, I'm going to wait here also until you pull the location info for these genes.

@niranjchandrasekaran
Member

Notebook

I am comfortable with the location of these genes in both ORF and CRISPR plates

ORF

| Metadata_Symbol | Metadata_Plate | Metadata_Well | Metadata_Batch |
|---|---|---|---|
| ZFP36L1 | BR00121543 | F05 | 2021_05_31_Batch2 |
| PIK3R3 | BR00123506 | A11 | 2021_05_10_Batch3 |
| NRBP1 | BR00123526 | M07 | 2021_05_31_Batch2 |
| NRBP1 | BR00124781 | N02 | 2021_08_30_Batch13 |
| PIK3R3 | BR00124781 | D21 | 2021_08_30_Batch13 |
| INSYN1 | BR00124794 | K03 | 2021_08_02_Batch10 |
| HOXC8 | BR00124794 | D06 | 2021_08_02_Batch10 |
| RAB40C | BR00126385 | D02 | 2021_08_02_Batch10 |
| RAB40B | BR00126538 | L23 | 2021_08_09_Batch11 |

CRISPR

| Metadata_Symbol | Metadata_Plate | Metadata_Well | Metadata_Batch |
|---|---|---|---|
| PIK3R3 | CP-CC9-R1-14 | I08 | 20220914_Run1 |
| ZFP36L1 | CP-CC9-R1-21 | N04 | 20220914_Run1 |
| HOXC8 | CP-CC9-R1-22 | P18 | 20220914_Run1 |
| NRBP1 | CP-CC9-R1-26 | A18 | 20220914_Run1 |
| RAB40B | CP-CC9-R1-27 | C03 | 20220914_Run1 |

@niranjchandrasekaran
Member

Notebook

This cluster is not affected by plate layout
ORF-plate-layout-INSYN1-PIK3R3-RAB40C-NRBP1-RAB40B-ZFP36L1-HOXC8
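
A simple sanity check along these lines can be sketched from the ORF table above: find genes that share a plate and measure how far apart their wells are. This is only an illustration of one possible check, not the analysis in the notebook; the helper names are made up.

```python
from collections import defaultdict

# (symbol, plate, well) rows copied from the ORF table above.
orf_wells = [
    ("ZFP36L1", "BR00121543", "F05"),
    ("PIK3R3", "BR00123506", "A11"),
    ("NRBP1", "BR00123526", "M07"),
    ("NRBP1", "BR00124781", "N02"),
    ("PIK3R3", "BR00124781", "D21"),
    ("INSYN1", "BR00124794", "K03"),
    ("HOXC8", "BR00124794", "D06"),
    ("RAB40C", "BR00126385", "D02"),
    ("RAB40B", "BR00126538", "L23"),
]


def well_to_row_col(well):
    """'K03' -> (11, 3): 1-based row from the letter, column from the digits."""
    return ord(well[0].upper()) - ord("A") + 1, int(well[1:])


def shared_plates(rows):
    """Plates that hold more than one gene of interest, with well coordinates."""
    by_plate = defaultdict(list)
    for symbol, plate, well in rows:
        by_plate[plate].append((symbol, well_to_row_col(well)))
    return {plate: members for plate, members in by_plate.items() if len(members) > 1}


def well_distance(first, second):
    """Chebyshev distance between two wells; 1 means directly adjacent."""
    return max(abs(first[0] - second[0]), abs(first[1] - second[1]))


for plate, members in shared_plates(orf_wells).items():
    (gene_a, pos_a), (gene_b, pos_b) = members
    print(plate, gene_a, gene_b, "well distance:", well_distance(pos_a, pos_b))
```

Only two plates hold more than one of these genes, and in both cases the wells are several positions apart, so neighboring-well effects look unlikely for these pairs.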

@AnneCarpenter
Contributor Author

@jessica-ewald Can you confirm this 'story' is done? (with possible exception of ensuring the final figures match the final text, which is a step we will do for all stories later)

If so you can close the issue, change from "Need to come up with a story" to one of the other options - either 'confirms' or 'new story we will include'
