Result field - propagation #429

JohnCoene · 2024-10-07T14:14:35Z

Sharing some worries I have regarding the performances again, apologies.

A quick recap: we're going to have to handle 100+ stacks, probably far more than that. What is set is that each stack represents an output from the {composer} package by @karmatarap and @MikeJohnPage via {blockr.composer}.

One thing we don't have yet but will do soon (next iteration/sprint) is the concept of "globals" (we may need a better term) where each output/stack depends on one parent stack: this parent stack will allow having one place to store all the "global" data (data shared by many outputs, e.g.: footnotes).

Now to go back 1-2 weeks, in the app I originally rendered all stacks on load on a single tab (rendered hidden with d-none class from Bootstrap which apply display: none !important), then have a select input to toggle the visibility of the outputs.

However, this performed very poorly as all the tacks take quite some time to fire up with our examples of 11 stacks it took ~4-5 seconds for all of the blockr stacks to render and return control to the session.

So I refactored that entirely. Instead of rendering all stacks on load we render on demand as the user selects the stack from the dropdown and that fixed the issue.

However, this is with our current version of the outputs: these don't have the "globals"/parent stack, and I'm questioning whether this will work when we have that because all these updates are reactive that happen only in the shiny server, therefore all stacks server have to be running for the update to the parent stack to propagate.

The simplest example I could come up with is below, it's a bit much for a reprex but it simulates quite well what we do/will have in the application.

The first stack in the app is the "parent stack" on which all other stacks depend, select a number of stacks in the select input and see them being rendered, then change the dataset in the parent stack at the top and hit serialise: you will see that the result of the stacks have only been updated for the rendered stacks.

library(blockr)
library(shiny)

# our result block
new_results2_block <- function(value = character(), ...) {
  fields <- list(
    stack = new_result_field(value = iris)
  )

  new_block(
    fields = fields,
    expr = quote(.(stack)),
    ...,
    class = c("result_block", "data_block")
  )
}

# our parent stack that all stacks depend on
data_stack <- new_stack(
  data = new_dataset_block(selected = "iris"),
  name = "data-source",
  title = "Parent stack"
)

# simulate 10 stacks
N <- 1:10

# we create the stacks, these come from out database
# where they are individually serialised.
stacks <- N |>
  lapply(\(index) {
    new_stack(
      data = new_results2_block(value = "data-source"),
      title = paste("Stack", index),
      name = paste0("stack", index)
    )
  })

names(stacks) <- paste0("stack", N)

ui <- fluidPage(
  theme = bslib::bs_theme(5L),
  p("All stacks depend on the parent stack below"),
  p("Select a few inputs"),
  p("Do that for a few outputs, then change dataset in parent stack and serialise"),
  generate_ui(data_stack),
  hr(),
  div(
    class = "d-flex",
    div(
      class = "flex-grow-1",
      selectizeInput(
        "select",
        "select a stack",
        choices = paste0("stack", N),
        multiple = TRUE
      ),
    ),
    div(
      class = "flex-shrink-1",
      actionButton("render", "Render")
    ),
    div(
      class = "flex-shrink-1",
      actionButton("serialise", "Serialise")
    )
  ),
  div(id = "stacks")
)

server <- function(input, output, session) {
  stacks |>
    lapply(\(stack) {
      name <- attr(stack, "name")
      blockr::add_workspace_stack(name, stack)
    })

  generate_server(data_stack)

  rendered <- c()
  observeEvent(input$select, {
    sapply(input$select, \(stack_name) {
      # stack is already rendered, we return early
      if (stack_name %in% rendered) return()

      rendered <<- c(rendered, stack_name)

      stack <- stacks[[stack_name]]

      on.exit({
        generate_server(stack)
      })

      insertUI(
        "#stacks",
        "beforeEnd",
        generate_ui(stack)
      )
    })
  })

  observeEvent(input$serialise, {
    x <- get_workspace() |>
      ls() |>
      lapply(\(stack_name) {
        stack <- get(stack_name, envir = get_workspace())
        result <- attr(stack, "result")

        is_still_iris_dataset <- identical(result, iris)
        cat(
          "stack",
          stack_name,
          "is identical to iris dataset?",
          is_still_iris_dataset,
          "\n"
        )
      })
  })
}

shinyApp(ui, server)

In brief, either 1) I am missing something here and there's no issue, or 2) we need an efficient way to render hundreds of stacks, or 3) we need a way to propagate changes from parent stack in a different manner, 4) there is another solution.

The text was updated successfully, but these errors were encountered:

JohnCoene · 2024-10-07T14:25:17Z

To complete this, here's an example rendering all stacks on load.

library(shiny)
library(blockr)

stacks <- lapply(1:100, \(i) {
  new_stack(
    data = new_dataset_block(selected = "iris"),
    selected = new_select_block(columns = sample(names(iris), 1))
  )
})

ui <- fluidPage(
  theme = bslib::bs_theme(5L),
  div(id = "stacks")
)

server <- function(input, output, session) {
  observe({
    lapply(stacks, \(stack) {
      on.exit({
        generate_server(stack)
      })

      print(stack)

      insertUI(
        "#stacks",
        "beforeEnd",
        generate_ui(stack)
      )
    })
  })
}

shinyApp(ui, server)

DivadNojnarg · 2024-10-07T15:11:52Z

@JohnCoene this issue is expected at this scale. If you call 100 times a module server function that takes even only 0.5 sec to complete on the main thread, there are performance issues. I don't imagine when it's deployed on a server with multiple users.

It would make sense to start thinking about using ExtendedTask or some other async solution (coupled with mirai?) at the block level, so computations in 1 block don't freeze other independent elements. But this won't solve issues with coupled elements, that is, if you update a block and there are 25 blocks downstream, you'll have to wait to get the result.

We briefly talked about it with @nbenn.

DivadNojnarg · 2024-10-07T16:51:45Z

@JohnCoene @nbenn Little example with a fast stack on the left and slow on the right using ExtendedTask and mirai. They are running async so I can still do things on the first stack when the second one is running.

Screen.Recording.2024-10-07.at.18.44.57.mov

JohnCoene · 2024-10-13T23:57:37Z

Thinking about this a bit more, for what we have at hand I'm not sure it's necessary.

Taking the usual stack A, B, C, ... depend on stack Z. If stack Z changes we don't need to update stack A, B, C, ... unless they are rendered. That is all dependent stacks only serialise a "reference" to stack Z.

It may indeed still cause issues if we have a lot of stacks rendered and stack Z sees even a minor change but I'm not sure this'll happen: we should not have to render this many stacks.

Something to think about.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Result field - propagation #429

Result field - propagation #429

JohnCoene commented Oct 7, 2024

JohnCoene commented Oct 7, 2024

DivadNojnarg commented Oct 7, 2024

DivadNojnarg commented Oct 7, 2024 •

edited

Loading

JohnCoene commented Oct 13, 2024

Result field - propagation #429

Result field - propagation #429

Comments

JohnCoene commented Oct 7, 2024

JohnCoene commented Oct 7, 2024

DivadNojnarg commented Oct 7, 2024

DivadNojnarg commented Oct 7, 2024 • edited Loading

JohnCoene commented Oct 13, 2024

DivadNojnarg commented Oct 7, 2024 •

edited

Loading