Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Result field - propagation #429

Open
JohnCoene opened this issue Oct 7, 2024 · 4 comments
Open

Result field - propagation #429

JohnCoene opened this issue Oct 7, 2024 · 4 comments

Comments

@JohnCoene
Copy link
Member

Sharing some worries I have regarding the performances again, apologies.

A quick recap: we're going to have to handle 100+ stacks, probably far more than that. What is set is that each stack represents an output from the {composer} package by @karmatarap and @MikeJohnPage via {blockr.composer}.

One thing we don't have yet but will do soon (next iteration/sprint) is the concept of "globals" (we may need a better term) where each output/stack depends on one parent stack: this parent stack will allow having one place to store all the "global" data (data shared by many outputs, e.g.: footnotes).

Now to go back 1-2 weeks, in the app I originally rendered all stacks on load on a single tab (rendered hidden with d-none class from Bootstrap which apply display: none !important), then have a select input to toggle the visibility of the outputs.

However, this performed very poorly as all the tacks take quite some time to fire up with our examples of 11 stacks it took ~4-5 seconds for all of the blockr stacks to render and return control to the session.

So I refactored that entirely. Instead of rendering all stacks on load we render on demand as the user selects the stack from the dropdown and that fixed the issue.

However, this is with our current version of the outputs: these don't have the "globals"/parent stack, and I'm questioning whether this will work when we have that because all these updates are reactive that happen only in the shiny server, therefore all stacks server have to be running for the update to the parent stack to propagate.

The simplest example I could come up with is below, it's a bit much for a reprex but it simulates quite well what we do/will have in the application.

The first stack in the app is the "parent stack" on which all other stacks depend, select a number of stacks in the select input and see them being rendered, then change the dataset in the parent stack at the top and hit serialise: you will see that the result of the stacks have only been updated for the rendered stacks.

library(blockr)
library(shiny)

# our result block
new_results2_block <- function(value = character(), ...) {
  fields <- list(
    stack = new_result_field(value = iris)
  )

  new_block(
    fields = fields,
    expr = quote(.(stack)),
    ...,
    class = c("result_block", "data_block")
  )
}

# our parent stack that all stacks depend on
data_stack <- new_stack(
  data = new_dataset_block(selected = "iris"),
  name = "data-source",
  title = "Parent stack"
)

# simulate 10 stacks
N <- 1:10

# we create the stacks, these come from out database
# where they are individually serialised.
stacks <- N |>
  lapply(\(index) {
    new_stack(
      data = new_results2_block(value = "data-source"),
      title = paste("Stack", index),
      name = paste0("stack", index)
    )
  })

names(stacks) <- paste0("stack", N)

ui <- fluidPage(
  theme = bslib::bs_theme(5L),
  p("All stacks depend on the parent stack below"),
  p("Select a few inputs"),
  p("Do that for a few outputs, then change dataset in parent stack and serialise"),
  generate_ui(data_stack),
  hr(),
  div(
    class = "d-flex",
    div(
      class = "flex-grow-1",
      selectizeInput(
        "select",
        "select a stack",
        choices = paste0("stack", N),
        multiple = TRUE
      ),
    ),
    div(
      class = "flex-shrink-1",
      actionButton("render", "Render")
    ),
    div(
      class = "flex-shrink-1",
      actionButton("serialise", "Serialise")
    )
  ),
  div(id = "stacks")
)

server <- function(input, output, session) {
  stacks |>
    lapply(\(stack) {
      name <- attr(stack, "name")
      blockr::add_workspace_stack(name, stack)
    })

  generate_server(data_stack)

  rendered <- c()
  observeEvent(input$select, {
    sapply(input$select, \(stack_name) {
      # stack is already rendered, we return early
      if (stack_name %in% rendered) return()

      rendered <<- c(rendered, stack_name)

      stack <- stacks[[stack_name]]

      on.exit({
        generate_server(stack)
      })

      insertUI(
        "#stacks",
        "beforeEnd",
        generate_ui(stack)
      )
    })
  })

  observeEvent(input$serialise, {
    x <- get_workspace() |>
      ls() |>
      lapply(\(stack_name) {
        stack <- get(stack_name, envir = get_workspace())
        result <- attr(stack, "result")

        is_still_iris_dataset <- identical(result, iris)
        cat(
          "stack",
          stack_name,
          "is identical to iris dataset?",
          is_still_iris_dataset,
          "\n"
        )
      })
  })
}

shinyApp(ui, server)

In brief, either 1) I am missing something here and there's no issue, or 2) we need an efficient way to render hundreds of stacks, or 3) we need a way to propagate changes from parent stack in a different manner, 4) there is another solution.

@JohnCoene
Copy link
Member Author

To complete this, here's an example rendering all stacks on load.

library(shiny)
library(blockr)

stacks <- lapply(1:100, \(i) {
  new_stack(
    data = new_dataset_block(selected = "iris"),
    selected = new_select_block(columns = sample(names(iris), 1))
  )
})

ui <- fluidPage(
  theme = bslib::bs_theme(5L),
  div(id = "stacks")
)

server <- function(input, output, session) {
  observe({
    lapply(stacks, \(stack) {
      on.exit({
        generate_server(stack)
      })

      print(stack)

      insertUI(
        "#stacks",
        "beforeEnd",
        generate_ui(stack)
      )
    })
  })
}

shinyApp(ui, server)

@DivadNojnarg
Copy link
Collaborator

@JohnCoene this issue is expected at this scale. If you call 100 times a module server function that takes even only 0.5 sec to complete on the main thread, there are performance issues. I don't imagine when it's deployed on a server with multiple users.

It would make sense to start thinking about using ExtendedTask or some other async solution (coupled with mirai?) at the block level, so computations in 1 block don't freeze other independent elements. But this won't solve issues with coupled elements, that is, if you update a block and there are 25 blocks downstream, you'll have to wait to get the result.

We briefly talked about it with @nbenn.

@DivadNojnarg
Copy link
Collaborator

DivadNojnarg commented Oct 7, 2024

@JohnCoene @nbenn Little example with a fast stack on the left and slow on the right using ExtendedTask and mirai. They are running async so I can still do things on the first stack when the second one is running.

Screen.Recording.2024-10-07.at.18.44.57.mov

@JohnCoene
Copy link
Member Author

Thinking about this a bit more, for what we have at hand I'm not sure it's necessary.

Taking the usual stack A, B, C, ... depend on stack Z. If stack Z changes we don't need to update stack A, B, C, ... unless they are rendered. That is all dependent stacks only serialise a "reference" to stack Z.

It may indeed still cause issues if we have a lot of stacks rendered and stack Z sees even a minor change but I'm not sure this'll happen: we should not have to render this many stacks.

Something to think about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants