Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r.session.objectLengthLimit seems not working well #811

Closed
Fred-Wu opened this issue Oct 7, 2021 · 20 comments · Fixed by #815
Closed

r.session.objectLengthLimit seems not working well #811

Fred-Wu opened this issue Oct 7, 2021 · 20 comments · Fixed by #815
Labels

Comments

@Fred-Wu
Copy link

Fred-Wu commented Oct 7, 2021

The delay after calling other objects in the terminal is still significant even setting r.session.objectLengthLimit to 100.
As long as the workspace can show the information from str() for a data.frame, the issue is still there.

Although I have not checked for the setting smaller than 100, the only time the issue is gone after loading 9 years data (reported in #174) was to set objectLengthLimit to 0 or 1. This time, str() information won't appear at all.

@Fred-Wu Fred-Wu added the bug label Oct 7, 2021
@renkun-ken
Copy link
Member

Could you provide a reproducible example?

It looks like calling str() on your object might have significant overhead. Could you run

system.time(str(ED))

and see if the timing is significant?

@Fred-Wu
Copy link
Author

Fred-Wu commented Oct 9, 2021

It appears to be the case.
This is from Vscode

   user  system elapsed
   0.12    0.03    0.42

And this is from RStudio. In RStudio, there is no delay at all.

  user  system elapsed 
   0.17    0.00    0.19 

@renkun-ken
Copy link
Member

It is quite strange that the timing of the same str(ED) could be so different, given that it does not include the overhead of the session watcher task callback.

Would you like to try the same in a vanilla R and a radian console (if you are using it in your vscode) without vscode?

@renkun-ken
Copy link
Member

I tried with the following example in vanilla R and radian:

n <- 10e4
ED <- lapply(1:20, function(i) {
  df <- lapply(1:60, function(j) rnorm(n))
  names(df) <- sprintf("x%d", seq_along(df))
  as.data.frame(df)
})

In radian

system.time(str(ED))
   user  system elapsed 
  0.681   0.023   0.706
system.time(utils::capture.output(str(ED)))
   user  system elapsed 
  0.447   0.004   0.452

In vanilla R:

system.time(str(ED))
   user  system elapsed 
  0.545   0.019   0.563
system.time(utils::capture.output(str(ED)))
   user  system elapsed 
  0.427   0.001   0.428 

It looks like the difference between system.time(str(ED)) mainly comes from the printing overhead of radian and vanilla R which does not actually the case as the session watcher only calls utils::capture.output(str(ED)).

@renkun-ken
Copy link
Member

It is now clear that r.session.objectLengthLimit does not cover the case where a list has a number of children with lengthy names. At the moment, you might need to set "r.session.levelOfObjectDetail": "Minimal".

Let me think if I could make r.session.objectLengthLimit also cover such case.

@Fred-Wu
Copy link
Author

Fred-Wu commented Oct 9, 2021

Thanks for the investigation. I did set r.session.levelOfObjectDetail to Minimal.
I will be waiting for the next update then.

@renkun-ken
Copy link
Member

renkun-ken commented Oct 9, 2021

I'll probably expand the values of r.session.levelOfObjectDetail so that it could take values corresponding to 0, 1, 2. Now only 0 and 2 are used as Minimal and Detailed respectively.

> system.time(utils::capture.output(str(ED, max.level = 0)))                                           
   user  system elapsed 
  0.001   0.000   0.001 

> system.time(utils::capture.output(str(ED, max.level = 1)))                                           
   user  system elapsed 
  0.009   0.000   0.008 

> system.time(utils::capture.output(str(ED, max.level = 2)))                                           
   user  system elapsed 
  0.489   0.002   0.491 

@renkun-ken
Copy link
Member

You might want to try the build of #815. Using "r.session.levelOfObjectDetail": "Normal" should be a better option.

@Fred-Wu
Copy link
Author

Fred-Wu commented Oct 11, 2021

Thanks for the new implementation. I have tried it, but not fully understand how it impacts on the delay issue.

The delay is still the same and the timing to call str() is still the same.
In the workspace area, I still can see the same information

system.time(str(ED))    
   user  system elapsed
   0.25    0.03    0.41
r$> system.time(utils::capture.output(str(ED, max.level = 2)))        
   user  system elapsed 
   0.13    0.00    0.12

r$> system.time(utils::capture.output(str(ED, max.level = 1)))        
   user  system elapsed 
   0.12    0.00    0.12

r$> system.time(utils::capture.output(str(ED, max.level = 0)))        
   user  system elapsed 
      0       0       0

Maybe I could make it more clear on the issue. What I observe so far is that, once those data are loaded, then when I just try something like the following in the terminal

a=1; b=2; d=3; g=4

One would expect that those objects should appear in the workspace immediately without any delay. However, it is not the case.
When those objects entered in the terminal, you can see in the workspace that they appear one after one., which is
a num 1 appears first, then b num 2 appears, and so on. This kind of delay is the one I am talking about. Same for calling a; b; d; g in the terminal, the numbers just appear one by one. Without those data loaded, they perform as expected.

@renkun-ken
Copy link
Member

Thanks for the testing of max.level = 1 which still has notable overhead in your case. Would you like to paste the output of str(ED, max.level = 1) so that I could make an example that is more similar with your data?

@Fred-Wu
Copy link
Author

Fred-Wu commented Oct 11, 2021

Here is a dummy str() information, sorry that I may not able to output the true information in the data at the moment.

Basically, this tells the number of characters in a variable name, data type for this variable and maximum number of characters for the value in that variable.

So var1.var_nchar.19. : chr "character max char: 4" means, variable name has 19 characters, this is a "character" variable with maximum 4 characters in its value.

Hope this helps.

r$> str(ED)
Classes 'data.table' and 'data.frame':  105535 obs. of  69 variables:
 $ var1.var_nchar.19. : chr "character max char: 4"
 $ var2.var_nchar.19. : chr "character max char: 10"
 $ var3.var_nchar.29. : chr "character max char: 50"
 $ var4.var_nchar.28. : chr "integer max char: 1"
 $ var5.var_nchar.19. : chr "integer max char: 1"
 $ var6.var_nchar.20. : chr "integer max char: 1"
 $ var7.var_nchar.15. : chr "integer max char: 1"
 $ var8.var_nchar.19. : chr "integer max char: 1"
 $ var9.var_nchar.20. : chr "integer max char: 1"
 $ var10.var_nchar.15.: chr "integer max char: 1"
 $ var11.var_nchar.27.: chr "integer max char: 1"
 $ var12.var_nchar.21.: chr "integer64 max char: 21"
 $ var13.var_nchar.3. : chr "character max char: 6"
 $ var14.var_nchar.26.: chr "character max char: 53"
 $ var15.var_nchar.15.: chr "character max char: 4"
 $ var16.var_nchar.22.: chr "character max char: 59"
 $ var17.var_nchar.17.: chr "character max char: 10"
 $ var18.var_nchar.16.: chr "character max char: 4"
 $ var19.var_nchar.14.: chr "character max char: 39"
 $ var20.var_nchar.14.: chr "integer64 max char: 21"
 $ var21.var_nchar.3. : chr "character max char: 10"
 $ var22.var_nchar.23.: chr "character max char: 4"
 $ var23.var_nchar.18.: chr "IDate max char: 5"
 $ var24.var_nchar.3. : chr "integer max char: 1"
 $ var25.var_nchar.19.: chr "character max char: 10"
 $ var26.var_nchar.17.: chr "character max char: 1"
 $ var27.var_nchar.17.: chr "character max char: 6"
 $ var28.var_nchar.16.: chr "character max char: 8"
 $ var29.var_nchar.21.: chr "character max char: 10"
 $ var30.var_nchar.12.: chr "IDate max char: 5"
 $ var31.var_nchar.12.: chr "POSIXct max char: 10"
 $ var32.var_nchar.17.: chr "integer max char: 1"
 $ var33.var_nchar.21.: chr "IDate max char: 5"
 $ var34.var_nchar.21.: chr "POSIXct max char: 10"
 $ var35.var_nchar.10.: chr "IDate max char: 7"
 $ var36.var_nchar.16.: chr "character max char: 4"
 $ var37.var_nchar.18.: chr "integer max char: 2"
 $ var38.var_nchar.13.: chr "integer max char: 1"
 $ var39.var_nchar.25.: chr "IDate max char: 5"
 $ var40.var_nchar.25.: chr "POSIXct max char: 10"
 $ var41.var_nchar.23.: chr "character max char: 4"
 $ var42.var_nchar.3. : chr "integer max char: 1"
 $ var43.var_nchar.15.: chr "character max char: 15"
 $ var44.var_nchar.18.: chr "character max char: 4"
 $ var45.var_nchar.15.: chr "integer max char: 2"
 $ var46.var_nchar.24.: chr "integer max char: 2"
 $ var47.var_nchar.24.: chr "character max char: 1"
 $ var48.var_nchar.21.: chr "IDate max char: 5"
 $ var49.var_nchar.21.: chr "POSIXct max char: 10"
 $ var50.var_nchar.3. : chr "numeric max char: 16"
 $ var51.var_nchar.24.: chr "character max char: 3"
 $ var52.var_nchar.21.: chr "integer max char: 2"
 $ var53.var_nchar.18.: chr "character max char: 56"
 $ var54.var_nchar.6. : chr "character max char: 5"
 $ var55.var_nchar.4. : chr "integer max char: 4"
 $ var56.var_nchar.8. : chr "integer max char: 2"
 $ var57.var_nchar.10.: chr "IDate max char: 5"
 $ var58.var_nchar.8. : chr "IDate max char: 5"
 $ var59.var_nchar.5. : chr "character max char: 3"
 $ var60.var_nchar.6. : chr "integer max char: 1"
 $ var61.var_nchar.7. : chr "character max char: 23"
 $ var62.var_nchar.4. : chr "character max char: 18"
 $ var63.var_nchar.8. : chr "character max char: 7"
 $ var64.var_nchar.10.: chr "character max char: 7"
 $ var65.var_nchar.5. : chr "integer max char: 2"
 $ var66.var_nchar.5. : chr "character max char: 9"
 $ var67.var_nchar.9. : chr "integer max char: 1"
 $ var68.var_nchar.15.: chr "integer max char: 1"
 $ var69.var_nchar.21.: chr "integer max char: 1"

@renkun-ken
Copy link
Member

A setting of r.session.objectTimeout is added (default 50ms) so that each call of capture.output(str(obj, max.level=)) takes at most 50ms or otherwise it will fallback to max.level=0 which takes minimal time.

Would you like to try the latest build of the #815?

@Fred-Wu
Copy link
Author

Fred-Wu commented Oct 11, 2021

Thanks, would like to have another try.

@Fred-Wu
Copy link
Author

Fred-Wu commented Oct 11, 2021

Thanks for having another implementation again quickly.

However, I think still something strange is going on here.

With the latest option, and my current settings : objectLengthLimit: 100 and levelOfObjectDetail: "Normal, there definitely has some improvements. But still can see a minor delay.

This time, data in the workspace does not show any information.

 system.time(str(ED)) 
   user  system elapsed
   0.16    0.03    0.38

I also tried to set objectLengthLimit: 0, doing this the workspace does not show any data information either.
There is no more delay, but timing is still the same.

 system.time(str(ED)) 
   user  system elapsed
   0.23    0.04    0.42

Calling system.time(utils::capture.output(str(ED, max.level = X))) returned exactly the same as in the previous post.

@renkun-ken
Copy link
Member

Calling system.time(str(ED)) and system.time(utils::capture.output(str(ED, max.level = X))) are just ways of quick diagnostics of the problem. You should only observe the difference of the delay after user input under different settings of r.session.objectLengthLimit, r.session.levelOfObjectDetail and r.session.objectTimeout. Calling these timings should always return similar results.

@Fred-Wu
Copy link
Author

Fred-Wu commented Oct 11, 2021

Find a good combination is hard. Now, the only time there is no delay if there is no str() information in the workspace or hover.

When I try to look at the str() information for those data in the RStudio's environment panel, I notice that it only evaluates and shows the information when I click on that arrow button for the first time. I can see that spinning arrow when I click on it.

Is this something that can be implemented in VSCode-R as well?

@renkun-ken
Copy link
Member

renkun-ken commented Oct 11, 2021

It requires a two-way communication between vscode-R and user R session. And the click is only responsive when R is not busy. Currently, we only rely on a mostly one-way communication approach.

@renkun-ken
Copy link
Member

The two-way communication method is discussed at #440, #594.

@Fred-Wu
Copy link
Author

Fred-Wu commented Oct 11, 2021

What about disabling the hover for data.frame, just showing in the workspace?
Will it make a difference?

@renkun-ken
Copy link
Member

It will not make a difference. The hover and the workspace viewer both share the same information from globalenv.json written by the session watcher task callback, which basically calls utils::capture.output(str(obj, max.level = N)) on each global object on each user input.

The overhead comes from that if the call is too slow for a single object, the delay after user input will be notable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants