Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Summarize attributes as annotations #163

Merged
merged 8 commits into from
Dec 22, 2023
Merged

Summarize attributes as annotations #163

merged 8 commits into from
Dec 22, 2023

Conversation

anngvu
Copy link
Contributor

@anngvu anngvu commented Dec 19, 2023

There are two things to this:

  • Mainly started this several weeks ago to fix the dataType augmentation job to work with new study table configuration; we should add that summary data as annotations, not as a table row. This replaces @allaway's original code.
  • I left this hanging a bit because I was cycling between work on different PRs, but coming back to it I realized it should be generalized to work for datasets (touched upon this with @cconrad8 yesterday). So hopefully it makes sense what I wrote in the docs. Here is some test code for the dataset use case.
check_fun <- function(values) {
  all(values %in% c("US", "DE", "CA")) # pretend only US, Germany, and Canada are considered valid
}
dataset_id <- "syn51809867"
summarize_attribute(summary_query = glue::glue("select group_concat(distinct GEO_location) as country from {dataset_id} where GEO_location is not null"),
                    attribute = "country", 
                    entity_id = dataset_id,
                    dry_run = FALSE, 
                    check_fun = check_fun)

Update: I think this is technically higher priority than the 3 other PRs outstanding.

@anngvu anngvu marked this pull request as ready for review December 19, 2023 17:37
@anngvu anngvu linked an issue Dec 19, 2023 that may be closed by this pull request
@allaway
Copy link
Collaborator

allaway commented Dec 19, 2023

but coming back to it I realized it should be generalized to work for datasets (touched upon this with @cconrad8 yesterday). So hopefully it makes sense what I wrote in the docs. Here is some test code for the dataset use case.

I totally agree with this. I hope that this can be eventually baked into synapse (https://sagebionetworks.jira.com/browse/PLFM-8197) but for now a util/job to handle this is great.

@allaway
Copy link
Collaborator

allaway commented Dec 20, 2023

Hey @anngvu , I created a copy of your dataset (syn53182927) for testing, but when I run summarize_attribute, I'm running into the same issue that I ran into in #164, presumably caused by the get/set_annotations function:

> summarize_attribute(summary_query = glue::glue("select group_concat(distinct GEO_location) as country from {dataset_id} where GEO_location is not null"),
+                     attribute = "country", 
+                     entity_id = dataset_id,
+                     dry_run = FALSE, 
+                     check_fun = check_fun)
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  TypeError: Expected a synapseclient.Annotations object
Run `reticulate::py_last_error()` for details.
In addition: Warning message:
In entity_meta[attribute] <- meta[[entity]] :
  number of items to replace is not a multiple of replacement length

I'm wondering - what version of reticulate and synapseclient are you using? I wonder if this is something that I've messed up on my mac or if it's actually related to a broader reticulate and/or synapseclient change.

Here's my sessionInfo:

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.7.7

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] synapser_1.0.0          nfportalutils_0.0.0.946 reticulate_1.34.0      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11       codetools_0.2-19  lattice_0.22-5    png_0.1-8         withr_2.5.2       grid_4.2.3       
 [7] jsonlite_1.8.7    magrittr_2.0.3    rlang_1.1.2       cli_3.6.1         data.table_1.14.8 rstudioapi_0.15.0
[13] Matrix_1.5-4.1    rjson_0.2.21      tools_4.2.3       glue_1.6.2        compiler_4.2.3   

I am using a conda environment with synapseclient 3.2.0.

@allaway
Copy link
Collaborator

allaway commented Dec 20, 2023

If I get a chance today, I'll spin up a SC instance and test this there.

@allaway
Copy link
Collaborator

allaway commented Dec 20, 2023

Per @anngvu lets try reticulate 1.28 - 1.34 might be the issue?

@allaway
Copy link
Collaborator

allaway commented Dec 21, 2023

OK! After rolling back to reticulate 1.28, I have no issues running this function. Works great!

Copy link
Collaborator

@allaway allaway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great - I added a suggested detail to the summarize_attribute function docs, but it's not critical.

R/assign_study_data_types.R Outdated Show resolved Hide resolved
Copy link
Contributor

@jaybee84 jaybee84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got similar reticulate related error as noted before with reticulate_1.32.0. Recommend updating reticulate for whole package to avoid breaking for users in the future. Added an issue #165 to track.

@allaway
Copy link
Collaborator

allaway commented Dec 21, 2023

@jaybee84, can you clarify the issue you faced and how you fixed it? The issue I got seemed to be because reticulate was updated the latest version on my end - with an older version of reticulate, the issue went away. I'm just trying to figure out if we encountered the same thing or somthing different!

@jaybee84
Copy link
Contributor

Here is the particular error I got:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  TypeError: Expected a synapseclient.Annotations object
Run `reticulate::py_last_error()` for details.
In addition: Warning message:
In entity_meta[attribute] <- meta[[entity]] :
  number of items to replace is not a multiple of replacement length

> reticulate::py_last_error()

Traceback (most recent call last):
  File "/Users/jineta/opt/miniconda3/envs/annotate/lib/python3.7/site-packages/synapseclient/client.py", line 1600, in set_annotations
    raise TypeError("Expected a synapseclient.Annotations object")
TypeError: Expected a synapseclient.Annotations object

My session info is here:

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] reticulate_1.32.0        synapser_1.1.0.119      
[3] nfportalutils_0.0.0.9411

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11       pillar_1.9.0      compiler_4.2.1   
 [4] tools_4.2.1       digest_0.6.33     jsonlite_1.8.7   
 [7] evaluate_0.21     lifecycle_1.0.3   tibble_3.2.1     
[10] lattice_0.20-45   pkgconfig_2.0.3   png_0.1-8        
[13] rlang_1.1.1       Matrix_1.5-3      cli_3.6.1        
[16] rstudioapi_0.14   yaml_2.3.7        xfun_0.39        
[19] fastmap_1.1.1     withr_2.5.1       dplyr_1.1.3      
[22] knitr_1.42        generics_0.1.3    vctrs_0.6.3      
[25] grid_4.2.1        tidyselect_1.2.0  glue_1.6.2       
[28] data.table_1.14.8 R6_2.5.1          fansi_1.0.5      
[31] rmarkdown_2.21    purrr_1.0.2       magrittr_2.0.3   
[34] codetools_0.2-18  htmltools_0.5.6.1 utf8_1.2.3       
[37] rjson_0.2.21   

I havent had a chance to fix it yet, but was looking into it.

@allaway
Copy link
Collaborator

allaway commented Dec 21, 2023 via email

@anngvu
Copy link
Contributor Author

anngvu commented Dec 21, 2023

@jaybee84 OK, I think this is same problem Robert encountered. It should be a reticulate downgrade to 1.28, anything beyond that won't work, according to what I came across in scidev, etc.

Edit: We can mention this in the docs and/or pin the version in package description.

@jaybee84
Copy link
Contributor

jaybee84 commented Dec 21, 2023

Interesting. rolling down reticulate did not fix it for me:

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] devtools_2.4.5           usethis_2.1.6           
[3] reticulate_1.28          synapser_1.1.0.119      
[5] nfportalutils_0.0.0.9411

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0  xfun_0.39         remotes_2.4.2    
 [4] purrr_1.0.2       lattice_0.20-45   vctrs_0.6.3      
 [7] generics_0.1.3    miniUI_0.1.1.1    htmltools_0.5.6.1
[10] yaml_2.3.7        utf8_1.2.3        rlang_1.1.1      
[13] pkgbuild_1.4.0    urlchecker_1.0.1  pillar_1.9.0     
[16] later_1.3.1       glue_1.6.2        withr_2.5.2      
[19] sessioninfo_1.2.2 lifecycle_1.0.3   stringr_1.5.0    
[22] htmlwidgets_1.6.2 codetools_0.2-18  evaluate_0.21    
[25] memoise_2.0.1     knitr_1.42        callr_3.7.3      
[28] fastmap_1.1.1     httpuv_1.6.11     ps_1.7.2         
[31] fansi_1.0.5       Rcpp_1.0.11       xtable_1.8-4     
[34] promises_1.2.1    cachem_1.0.8      pkgload_1.3.2    
[37] jsonlite_1.8.8    mime_0.12         fs_1.6.3         
[40] rjson_0.2.21      png_0.1-8         digest_0.6.33    
[43] stringi_1.7.12    processx_3.8.0    dplyr_1.1.3      
[46] shiny_1.7.5       grid_4.2.1        cli_3.6.1        
[49] tools_4.2.1       magrittr_2.0.3    tibble_3.2.1     
[52] profvis_0.3.7     crayon_1.5.2      pkgconfig_2.0.3  
[55] ellipsis_0.3.2    Matrix_1.5-3      prettyunits_1.2.0
[58] data.table_1.14.8 rmarkdown_2.21    rstudioapi_0.14  
[61] R6_2.5.1          compiler_4.2.1   
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  TypeError: Expected a synapseclient.Annotations object
Run `reticulate::py_last_error()` for details.
In addition: Warning message:
In entity_meta[attribute] <- meta[[entity]] :
  number of items to replace is not a multiple of replacement length

Then I noted that Robert has a different version of synapser. So I tried rolling my synapser version down to 1.0.0 but getting this error:

devtools::install_version("synapser", version = "1.0.0", repos = c("http://ran.synapse.org", "http://cran.fhcrc.org"))

Trying http://ran.synapse.org
Trying http://cran.fhcrc.org
Error in download_version_url(package, version, repos, type) : 
  version '1.0.0' is invalid for package 'synapser'

@anngvu
Copy link
Contributor Author

anngvu commented Dec 21, 2023

Hmm, synapser version shouldn't matter much at all since we don't interface with it (yet) -- though I also have synapser_1.1.0.119.

I'll try to do a little more research to see if there is another unknown factor in the combination, but let me know if you do a total restart and it does suddenly work.

@allaway
Copy link
Collaborator

allaway commented Dec 22, 2023

Discussed in standup - we'll merge this and then try and revisit the environment issues later when we get back.

@anngvu anngvu merged commit c432501 into develop Dec 22, 2023
7 checks passed
@jaybee84
Copy link
Contributor

but let me know if you do a total restart and it does suddenly work.

@anngvu sorry for my late response, I missed this message earlier. I restarted my R session and am still getting the same error.

@anngvu anngvu deleted the patch/issue-143 branch July 25, 2024 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Test and update assign_study_data_types with new view
3 participants