Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Virtual Thread support #125

Closed
wants to merge 13 commits into from
Closed

Virtual Thread support #125

wants to merge 13 commits into from

Conversation

He-Pin
Copy link
Contributor

@He-Pin He-Pin commented May 19, 2024

Quick Summary:
230.51 / S (Worker Thread)

VS

1545.18 / S (Virtual Threads)

Motivation:
Fix #90

I did some analyse on cask, and found it's using undertow for now. and the undertow itself is using Object lock as a lock in the http://www.jboss.org/xnio codebase, which we can do some contribution there to remove the monitor and migrating to ReentrantLock. Which should be better than waiting next Java LTS (maybe Java 25).

So, for now, I think we can just add the Virtual Thread support in cask, and then do some contribution upstream or migrating to another network layer maybe Netty?

The current changes is quite straightfoward.

Modifcation:

  1. Update the dependecies, Mill, libraries, Scala
  2. Because I'm using Windows 11, and chore: Rename page 1 #124 cause some issue, so I have to rename/delete it locally for now.
  3. I fixed some hint problem to make the code looks better before starting ,mainly inside Main.scala, others can follow after this PR.
  4. Add VirtualThreadSupport which only works on the Runtime where Virtual Thread is available, otherwise, it returns null..
  5. Add NewThreadPerTaskExecutor which will start a new Virtual Thread for every Command/Runnable, the Undertow needs this.
  6. Add VirtualThreadBlockingHandler which works like BlockingHandler but delegates to NewThreadPerTaskExecutor in 5 , instead of the connections' worker executor.
  7. Add a cask.virtualThread.enabled system property with expected value true|false, and only using the Virtual Thread to run Routes if and only if both the Runtime supported and cask.virtualThread.enabled is set to true. I'm not exposing an executor or etc thing, because that's would not be the philosophy of cask, but hiding the runtime behavior behind of cask.virtualThread.enabled, as I was not knowing cask is running with undertow.
  8. Do the some testing with ab, I changed a code with Thread.sleep(1000) to simulates.
  @cask.get("/")
  def hello(): String = {
    Thread.sleep(1000) // Simulate a slow endpoint
    "Hello World! Hello World! Hello World!"
  }

Results:
For the first run, I think it works just as expected, I did some assertion about the thread type Thread.isVirtual too during the testing.

Quick Summary: 230.51 / S (Worker Thread) VS 1545.18 / S (Virtual Threads)

My Laptop : i9-14900hx + 64G RAM

Result with Connection Worker Thread Java 21:

C:\Users\hepin>ab -n 10000 -c 10000 "http://localhost:8080/"
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        38 bytes

Concurrency Level:      10000
Time taken for tests:   43.381 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1740000 bytes
HTML transferred:       380000 bytes
Requests per second:    230.51 [#/sec] (mean)
Time per request:       43381.358 [ms] (mean)
Time per request:       4.338 [ms] (mean, across all concurrent requests)
Transfer rate:          39.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.2      0      16
Processing:  1435 19619 10914.5  19634   38933
Waiting:     1011 19598 10928.7  19604   38933
Total:       1435 19619 10914.6  19634   38933

Percentage of the requests served within a certain time (ms)
  50%  19634
  66%  25433
  75%  29280
  80%  31199
  90%  35040
  95%  36965
  98%  37949
  99%  37957
 100%  38933 (longest request)

VS

Result with Virtual Threads on Java 21

C:\Users\hepin>ab -n 10000 -c 10000 "http://localhost:8080/"
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        38 bytes

Concurrency Level:      10000
Time taken for tests:   6.472 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1740000 bytes
HTML transferred:       380000 bytes
Requests per second:    1545.18 [#/sec] (mean)
Time per request:       6471.734 [ms] (mean)
Time per request:       0.647 [ms] (mean, across all concurrent requests)
Transfer rate:          262.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.2      0      16
Processing:  1576 2120 202.6   2175    2527
Waiting:     1005 1631 364.6   1517    2279
Total:       1576 2120 202.6   2175    2527

Percentage of the requests served within a certain time (ms)
  50%   2175
  66%   2255
  75%   2306
  80%   2311
  90%   2327
  95%   2337
  98%   2341
  99%   2342
 100%   2527 (longest request)

Future Actions:

  1. Add testing about the NewThreadPerTaskExecutor and testing the Routes is running inside a VT when all setup.
  2. Add benchmarks with gathing/ab or doing it with Github Action (nightly build)
  3. Add Github Actions for Java 21 and Virtual thread.
  4. Adjust the code with feedback.

}

private[cask] object VirtualThreadBlockingHandler {
private lazy val EXECUTOR = new NewThreadPerTaskExecutor(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this Executor live in the Main object, rather than global. That would let it be more easily configured separately for unit tests or other multi-server scenarios

Copy link
Contributor Author

@He-Pin He-Pin May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, I was thinking to make it simpler, but with vt, this should be some kind of concurrency limiter.

Copy link
Contributor Author

@He-Pin He-Pin May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question about this:
When we expose the executor with handlerExecutor and then user can just override it with a Executors.newThreadPerTaskExecutor on Java 21, then the MethodHandle thing can be avoid.

I was expecting to support virtual thread out of box with cask.virtualThread.enabled if and only if this systeam property set to true and running with Java 21.

So I think the solution should be:

  1. add handlerExecutor just as you suggested.
  2. add a cask.virtualThread.enabled system property and run with Java 21 when user set that to true and running with Java 21, otherwise, without 2, all the MethodHandle things seems unneeded.

I was busy on otherthings, sorry for the delay, what's your input on 2?

Update: It's possible to wrap the virtual thread on top of an executor, with method handle too, I think that would b e nice way to go, will update this pr with that way.

@lihaoyi
Copy link
Member

lihaoyi commented May 19, 2024

Left some comments on the code. Thanks for looking into this!

Some high-level things to do:

  1. Include virtual thread configuration in a dedicated example in the examples/ folder, and then similarly include it in the doc-site. That would help ensure that it remains discoverable for people who want to try it out, which I expect will be an increasing number as Java 21+ is adopted.

  2. Embed your benchmark logic into the build system somehow, so other people can reproduce it. You can download and cache the ab binary via requests.get/os.write in Mill and then define a small ScalaModule to spin up Mill and run the benchmark with and without virtual threads. This will easily let people reproduce the benchmark or make adjustments to it as necessary

  3. Once you've got your benchmarking logic setup, run the benchmarks with/without virtual threads on a few of our other examples: perhaps on minimalApplication as a tiny example, and todo as a big example, and staticFiles as a non-compute bound example. We expect the benefits to be not as dramatic as with your synthetic example using Thread.sleep, but it is still good to know whether there is any improvement, no improvement, or even regression. That would help us advise users what kind of use cases would benefit from using virtual threads and which ones won't

  4. How much of the improvement is due to virtual threads, and how much is due to the NewThreadPerTaskExecutor we began using? We should run a benchmark with NewThreadPerTaskExecutor enabled but virtual threads disabled to verify that the virtual threads are the benefit, and not just the swapping out of the Executor strategy. Does the default one also spawn a NewThreadPerTask, or is it using a fixed thread pool of some sort?

@He-Pin
Copy link
Contributor Author

He-Pin commented May 20, 2024

Great, thanks for the quic and detailed review .

I will update this after work, we do see 10~15pt performance improvement and 5pt RAM reduce at internal stress testing, that numbers vary due to different workloads, but in most case,VT is a good choice ,and Nima is using virtual thread by default now.

@He-Pin
Copy link
Contributor Author

He-Pin commented May 27, 2024

Update:
As @lihaoyi once said , we need comparing the performance with and witout virtual thread, so I have to screen the executor, I tested it locally, it works well.

Even the backing scheduler of Virtual thread is not the default ForkJoinPool, it still works.

VirtualThread[#91,cask-handler-executor-virtual-thread-0]/runnable@pool-1-thread-1
VirtualThread[#93,cask-handler-executor-virtual-thread-1]/runnable@pool-1-thread-2

With this setup, I think I can continue other part of this PR.

object Compress extends cask.MainRoutes {

  protected override val handlerExecutor: Executor = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors())

  @cask.decorators.compress
  @cask.get("/")
  def hello(): String = {
    println(Thread.currentThread())
    "Hello World! Hello World! Hello World!"
  }

  initialize()
}

else if (System.getProperty("cask.virtualThread.enabled", "true").toBoolean) {
Util.createVirtualThreadExecutor(executor).getOrElse(executor)
} else executor
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lihaoyi Wdyt about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the PR later, and still need update the CI to Java 21 to run the benchmarks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should run CI on both 11 and 21, with the virtual threads test suites only enabled on 21

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the plan.

@lihaoyi
Copy link
Member

lihaoyi commented Jun 5, 2024

@He-Pin let me know when this is ready for review, looking forward to using virtual threads in cask!

@He-Pin
Copy link
Contributor Author

He-Pin commented Jun 5, 2024

@lihaoyi I will continue it this weekend, but currently, I need a PR [https://github.com//pull/124] get merged, because I'm using Windows 11, and I don't have a mac/linux for now.

@lihaoyi
Copy link
Member

lihaoyi commented Jun 5, 2024

@He-Pin merged it

@lihaoyi
Copy link
Member

lihaoyi commented Jul 2, 2024

Bump on this @He-Pin :)

@He-Pin
Copy link
Contributor Author

He-Pin commented Jul 2, 2024

@lihaoyi Sorry for being late, We encountered a virtual thread deadlocking at work, related to the classloader. When the virtual thread and the other platform thread(not the carrier thread) both try to load the same classes or a virtual thread is been notified but there is no carrier thread(been pinned) to run the virtual thread, then deadlock.

So we just changed the classloader, both in the JVM cpp and the lib.

But, cask is a library, So the best we can do is add some documents about this case, and a simple way is limiting the Max concurrent Virtual threads number.

I'm sorry for not updating this soon, I was a little busy at work and we had this issue and fixed it with JVM modification.

So what do you think about :

  1. add some documentation about this.
  2. add some best practices about virtual threads.

@lihaoyi
Copy link
Member

lihaoyi commented Jul 4, 2024

@He-Pin I think documentation and best practices are fine. We do not have the power to change the JVM, so best we can do is tell people if they want to use JVM virtual threads with Cask what the best way of doing so is

@lihaoyi
Copy link
Member

lihaoyi commented Jul 16, 2024

Bump on this. @He-Pin if you're not able to pick this up, I'll put it up again in the next set of bounties

@He-Pin
Copy link
Contributor Author

He-Pin commented Jul 17, 2024

Sorry for the delay, I was working with our internal JVM team to workaround the synchronization in the Classloader, and another issue I currently encounter is the com-lihaoyi/mill#3168 on Windows.

I will continue this this week, did spend sometime to learn the toolchian.

@He-Pin
Copy link
Contributor Author

He-Pin commented Jul 17, 2024

Because privateLookupIn starts in Java 9, and Cask currnetly support java8, so I have to fallback to Reflect I think.

@lihaoyi
Copy link
Member

lihaoyi commented Jul 17, 2024

Got it, if you're still on it I'll leave it to you then!

Feel free to bump the required JVM version of necessary. We dont need to support Java 8 forever, e.g. requests-scala already moved to Java 11. Java is already on 21 or 22, so even jumping to Java 17 would be reasonable

@lihaoyi
Copy link
Member

lihaoyi commented Oct 21, 2024

Closing due to inactivity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use Java Virtual Threads (Loom) (500USD Bounty)
2 participants