Automate case study creation | starting with rmd report #27

Open
4 tasks
jananiravi opened this issue Oct 2, 2024 · 21 comments
Assignees
Labels
bioinfo Bioinformatics related documentation Improvements or additions to documentation, incl. R docstring/roxygen2 good first issue Good for newcomers outreachy for outreachy interns

Comments

@jananiravi
Member

  • Start w/ @the-mayer's rmd report
  • Identify proteins to run through MolEvolvR
  • Create case studies
  • Which additional summarization/visualizations would help with these case studies?
@jananiravi jananiravi added the outreachy for outreachy interns label Oct 2, 2024
@Cateline
Collaborator

Cateline commented Oct 2, 2024

Kindly assign this task to me

@jananiravi jananiravi added documentation Improvements or additions to documentation, incl. R docstring/roxygen2 good first issue Good for newcomers bioinfo Bioinformatics related labels Oct 2, 2024
@jananiravi
Member Author

@the-mayer could you pass along your rmd doc to Cateline? @Cateline, you can get started with MolEvolvR web submissions in the meantime to help you understand what the different functions are doing (which summarizations and visualizations they result in).

@the-mayer
Collaborator

I'm attaching the report template and some sample output, for reference.
example_report.zip

Note, this report is parameterized, so figures can be supplied as parameters when rendering. As an example, the MolEvolvR web app renders this report by calling:

## List of graphics to include in report
        params <- list(
                    ## Results Summary
                    ### Domain Architecture
                    rs_interproscan_visualization = rs_IprGenes_rx(),
                    ### Proximity Network
                    proximity_network = rval_rs_network_layout_rx(), 
                    ## Phylogeny
                    ### Sunburst
                    sunburst = data()@df,
                    ### Data
                    data = rs_data_table_rx(),
                    ## Query Data
                    ### Data Table
                    queryDataTable = queryDataTable_rx(),
                    ### FASTA
                    fastaDataText = fastaDataText_rx(),
                    ### Query Heatmap
                    heatmap = query_heatmap_rx(),
                    ### Domain Architecture
                    query_data = query_data(),
                    query_domarch_cols = query_domarch_cols(),
                    query_iprDatabases = input$query_iprDatabases,
                    query_iprVisType = input$query_iprVisType,
                    ## Homolog Data
                    mainTable = mainTable_rx(),
                    ## Domain Architecture
                    ### Table
                    DALinTable = DALinTable_rx(),
                    ### Heatmap
                    DALinPlot = DALinPlot_rx(),
                    ### Network
                    DANetwork = DANetwork_rx(),
                    DA_Prot = DA_Prot(),
                    domarch_cols = domarch_cols(),
                    DA_Col = input$DA_Col,
                    DACutoff = DACutoff(),
                    ### Interproscan Viz
                    da_interproscan_visualization = da_IprGenes_rx(),
                    ### Upset Plot
                    # uses existing params
                    ## Phylogeny
                    ### Sunburst
                    phylo_sunburst_levels = input$levels,
                    phylo_sunburst = phylogeny_prot(),
                    ### Tree
                    tree_msa_tool = input$tree_msa_tool,
                    ### MSA
                    rep_accnums = rep_accnums(),
                    msa_rep_num = input$msa_rep_num, 
                    app_data = app_data(),
                    PhyloSelect = input$PhyloSelect, 
                    acc_to_name = acc_to_name(),
                    rval_phylo = rval_phylo(),
                    query_pin = query_pin(),
                    msa_reduce_by = input$msa_reduce_by
                  )
        ## Render RMarkdown report, with included graphics
        rmarkdown::render(tempReport, output_file = file, params = params, envir = new.env(parent = globalenv()))

As you become more familiar with the way reports are generated in the Web App, we can work together to supply the correct parameters to this report in an automated fashion. Let me know if you have any questions in the meantime!
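
For anyone new to parameterized reports, the same pattern can be tried outside Shiny. This is a minimal sketch, not the MolEvolvR report: the template contents and the parameter names `heading` and `values` are hypothetical stand-ins, and it assumes the `rmarkdown` package and pandoc are installed. Every parameter passed to `rmarkdown::render()` must be declared under `params:` in the template's YAML header.

```r
library(rmarkdown)

# Hypothetical stand-in template; the real one is in example_report.zip.
fence <- strrep("`", 3)  # chunk delimiter, built at runtime so this example nests cleanly
template <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Demo report\"",
  "output: html_document",
  "params:",
  "  heading: \"default heading\"",
  "  values: !r c(1, 2, 3)",
  "---",
  "",
  "## `r params$heading`",
  "",
  paste0(fence, "{r}"),
  "plot(params$values)",
  fence
), template)

# Render with parameter values supplied at call time, in a fresh environment
# so objects from the current session do not leak into the report.
out <- rmarkdown::render(
  template,
  output_file = "demo_report.html",
  output_dir = tempdir(),
  params = list(heading = "Case study: demo", values = c(4, 8, 15)),
  envir = new.env(parent = globalenv()),
  quiet = TRUE
)
```

In the web app the list entries come from reactive expressions; in a local script they would be plain R objects (data frames, plots) computed beforehand.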

@Cateline
Collaborator

Cateline commented Oct 4, 2024 via email

@MyleeeA
Collaborator

MyleeeA commented Oct 4, 2024

Hi @jananiravi
Is this a good first issue to start with?

I’d like to be assigned to this to get started.

@jananiravi
Member Author

Yes! @Cateline @MyleeeA which proteins are you both starting with, or do you want us to assign? If that's the case, give me a day to add them. Thanks!

@Cateline
Collaborator

Cateline commented Oct 5, 2024

Hi @jananiravi, please assign me some proteins I can work with. You can add them today.

@jananiravi
Member Author

Each of you can start with one of the 6 ESKAPE species in the CARD antibiotic resistance genes database: https://card.mcmaster.ca/download

  1. Enterobacter spp
  2. Staphylococcus aureus
  3. Klebsiella pneumoniae
  4. Acinetobacter baumannii
  5. Pseudomonas aeruginosa
  6. Enterococcus faecalis
  • Start with one drug/drug class at a time before moving into 1 drug across species or 1 species across drugs
  • Write generalizable functions to download the data and filter it by species/drug
  • Input the datasets into MolEvolvR to generate case study reports
  • Download the resulting analysis data systematically, toward populating knowledgebases.
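
A minimal sketch of what the download and filter helpers could look like, using only base R. The function names are hypothetical, and the archive URL and the `Drug Class` column name are assumptions to confirm against the CARD download page and the extracted index file.

```r
# Assumed URL of the latest CARD data archive; confirm on the CARD download page.
card_url <- "https://card.mcmaster.ca/latest/data"

# Download and unpack the CARD archive into `dest_dir`; returns the extracted file paths.
download_card <- function(dest_dir = file.path(tempdir(), "card"), url = card_url) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  archive <- file.path(dest_dir, "card-data.tar.bz2")
  utils::download.file(url, destfile = archive, mode = "wb")
  utils::untar(archive, exdir = dest_dir)
  list.files(dest_dir, full.names = TRUE)
}

# Read a tab-separated CARD index file, keeping column names exactly as written.
read_card_index <- function(path) {
  utils::read.delim(path, check.names = FALSE, stringsAsFactors = FALSE, quote = "")
}

# Keep rows whose `column` matches any of `patterns` (case-insensitive regex).
filter_card <- function(index, column, patterns) {
  stopifnot(column %in% names(index))
  keep <- grepl(paste(patterns, collapse = "|"), index[[column]], ignore.case = TRUE)
  index[keep, , drop = FALSE]
}

# Example usage (needs network access; file and column names are assumed):
# files <- download_card()
# index <- read_card_index(grep("aro_index\\.tsv$", files, value = TRUE))
# hits  <- filter_card(index, "Drug Class", "peptide antibiotic")
```

Keeping the column name as an argument lets the same filter serve both "1 drug across species" and "1 species across drugs".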

(Reach out via slack to those interested in bio/bioinfo to take other species -- those interested in bioinfo/r-pkg to work on well-annotated functions.)

Hope this helps!

@jananiravi
Member Author

@AbhirupaGhosh @charmvang @wolfeet1 @klterwelp if you have CARD data workflows or top genes readily available, please share those as starting points as well.

@jananiravi
Member Author

@KewalinSamart if you have top TB disease/drug genes, create a new spinoff issue for TB gene case study with the same tags as this one. We can request assignees for that, too.

@MyleeeA
Collaborator

MyleeeA commented Oct 5, 2024

Thank you so much @jananiravi

@AbhirupaGhosh

AbhirupaGhosh commented Oct 9, 2024

Title: Process CARD Data, Map Short Names, and Run MolEvolvR

  • Download CARD Data: Retrieve the latest CARD dataset. (DOWNLOAD)
  • Open ARO_index.tsv: Parse the file (in R).
  • Map CARD Short Name: Map the CARD Short Name column to shortname_antibiotics.tsv and shortname_pathogens.tsv. The CARD Short Name values follow the format pathogen_gene or pathogen_gene_drug.
  • Sort and Group the data by pathogens and antibiotics.
  • Filter Favorite Bug-Drug or Bug for further analysis.
  • Download FASTA Sequences for the list of protein accessions filtered. (use Entrez)
  • Run MolEvolvR: Run the protein sequences through the MolEvolvR tool for evolutionary analysis.
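
The parse, map, and filter steps above can be sketched in base R as follows. This is an illustration under stated assumptions: the function name, the lookup table contents, the accessions, and the short names in the toy data are placeholders, and names that do not match the two- or three-part format are left as `NA` rather than guessed.

```r
# Split CARD Short Name values of the form pathogen_gene or pathogen_gene_drug.
# Values with any other number of "_"-separated parts yield NA in every field.
parse_short_name <- function(short_name) {
  parts <- strsplit(short_name, "_", fixed = TRUE)
  ok <- lengths(parts) %in% c(2L, 3L)
  field <- function(i) {
    vapply(seq_along(parts), function(k) {
      if (ok[k] && length(parts[[k]]) >= i) parts[[k]][i] else NA_character_
    }, character(1))
  }
  data.frame(
    short_name = short_name,
    pathogen_code = field(1),
    gene = field(2),
    drug_code = field(3),
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for the parsed index file (placeholder accessions).
index <- data.frame(
  `Protein Accession` = c("ACC_0001", "ACC_0002", "ACC_0003"),
  `CARD Short Name` = c("Saur_mprF_DAP", "Kpne_ompK37", "vanA"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Toy stand-in for shortname_pathogens.tsv (column names are assumed).
pathogens <- data.frame(
  pathogen_code = c("Saur", "Kpne"),
  pathogen = c("Staphylococcus aureus", "Klebsiella pneumoniae"),
  stringsAsFactors = FALSE
)

# Attach the parsed fields, then map pathogen codes to full names.
parsed <- cbind(index, parse_short_name(index[["CARD Short Name"]])[-1])
parsed$pathogen <- pathogens$pathogen[match(parsed$pathogen_code, pathogens$pathogen_code)]

# Filter one bug-drug pair; which() drops rows where the comparison is NA.
keep <- which(parsed$pathogen == "Staphylococcus aureus" & parsed$drug_code == "DAP")
bug_drug <- parsed[keep, , drop = FALSE]
bug_drug[["Protein Accession"]]
```

The resulting accession vector is what would be passed to the Entrez download step.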

I hope this helps @Cateline

@AbhirupaGhosh AbhirupaGhosh mentioned this issue Oct 9, 2024
5 tasks
@Cateline
Collaborator

Cateline commented Oct 9, 2024

Yes, this is now clear. Does it mean that I need to cancel the pull request I had already made?

@AbhirupaGhosh

Yes that would be better.

@Cateline
Collaborator

Cateline commented Oct 9, 2024 via email

@MyleeeA
Collaborator

MyleeeA commented Oct 10, 2024

@Cateline

I see you have a better understanding now.
Would you be so kind as to guide me a bit?
Thank you 🙏🏾

@Cateline
Collaborator

Cateline commented Oct 10, 2024 via email

@MyleeeA
Collaborator

MyleeeA commented Oct 10, 2024

First of all, I'm struggling to get the right versions of R and RStudio so that I can clone the MolEvolvR repo and use its functions in R.
I understand these are the steps I need to follow.
I found the installation information earlier but can't seem to locate it again; I'd appreciate a link to it, @Cateline.

CC: @jananiravi

@Cateline
Collaborator

Cateline commented Oct 10, 2024

https://posit.co/download/rstudio-desktop/
Try downloading from this site
I also found this resource useful: https://happygitwithr.com/rstudio-git-github#rstudio-git-github

@MyleeeA
Collaborator

MyleeeA commented Oct 10, 2024

Thank you for helping out every time I need your help @Cateline
I'll check it out

@jananiravi
Member Author

jananiravi commented Oct 16, 2024

  • Phase 1: The first PR can be for the set of functions going from CARD SNPs to MolEvolvR input protein sequences (fasta). This commit will add a file to R/ with docstring/roxygen2 documentation as with other R functions in that folder.
  • Phase 2: Run these proteins through the MolEvolvR web-app and submit the reports for starters.
  • Phase 3: Create a qmd/rmd markdown (R/Quarto) report file that does everything the web-app report generator does but does so locally with the R-package functions (fully in-house within the MolEvolvR repo).
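
As a rough illustration of Phase 1, a roxygen2-documented function for the accession-to-FASTA step might look like the sketch below. The function names `fetch_fasta` and `split_into_batches` and the default batch size are hypothetical, not existing MolEvolvR functions; the sketch assumes the `rentrez` package is installed and calls its `entrez_fetch()` against the NCBI protein database.

```r
#' Split a vector into consecutive batches
#'
#' @param x Vector to split.
#' @param batch_size Maximum number of elements per batch.
#'
#' @return An unnamed list of vectors, each of length at most `batch_size`.
#' @noRd
split_into_batches <- function(x, batch_size) {
  unname(split(x, ceiling(seq_along(x) / batch_size)))
}

#' Fetch protein FASTA records from NCBI Entrez
#'
#' @param accessions Character vector of protein accession numbers.
#' @param out_file Path of the FASTA file to write.
#' @param batch_size Number of accessions requested per Entrez call.
#'
#' @return `out_file`, invisibly.
#' @export
fetch_fasta <- function(accessions, out_file, batch_size = 50) {
  if (!requireNamespace("rentrez", quietly = TRUE)) {
    stop("Package 'rentrez' is required to fetch sequences.")
  }
  # Drop missing, empty, and duplicated accessions before querying.
  accessions <- unique(accessions[!is.na(accessions) & nzchar(accessions)])
  records <- vapply(split_into_batches(accessions, batch_size), function(ids) {
    rentrez::entrez_fetch(db = "protein", id = ids, rettype = "fasta")
  }, character(1))
  writeLines(records, out_file)
  invisible(out_file)
}

# Example usage (needs network access); `accessions` comes from the filtered CARD table:
# fetch_fasta(accessions, "bug_drug.fasta")
```

Batching keeps each request small, which tends to be friendlier to the NCBI servers than one very long query.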

cc: @falquaddoomi @the-mayer @epbrenner

Cateline added a commit to Cateline/MolEvolvR that referenced this issue Oct 17, 2024
Add code for fetching and saving FASTA sequences for Staph-DA
Cateline added a commit to Cateline/MolEvolvR that referenced this issue Oct 17, 2024
Cateline added a commit to Cateline/MolEvolvR that referenced this issue Oct 21, 2024
@awasyn awasyn self-assigned this Oct 22, 2024
@jananiravi jananiravi added this to the Package release milestone Oct 27, 2024
Cateline added a commit to Cateline/MolEvolvR that referenced this issue Nov 24, 2024
Expanded Bug-Drug.R code to retrieve and save FASTA sequences for ESKAPE pathogens resistant to DAP (Daptomycin)
6 participants