Skip to content

Commit

Permalink
Merge pull request #635 from talex5/trace
Browse files Browse the repository at this point in the history
Switch from CTF to OCaml 5.1 runtime events
  • Loading branch information
talex5 authored Oct 26, 2023
2 parents 9e1872c + c721a07 commit 6e6ae31
Show file tree
Hide file tree
Showing 29 changed files with 478 additions and 573 deletions.
77 changes: 58 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -221,35 +221,73 @@ Calling an operation that performs an effect (such as `yield`) can switch to a d

## Tracing

The library can write traces in CTF format, showing when threads (fibers) are created, when they run, and how they interact.
We can run the previous code with tracing enabled (writing to a new `trace.ctf` file) like this:
When OCaml's tracing is turned on, Eio writes events about many actions,
such as creating fibers or resolving promises.

```ocaml
# let () =
Eio_unix.Trace.with_tracing "trace.ctf" @@ fun () ->
Eio_main.run main;;
+x = 1
+y = 1
+x = 2
+y = 2
+x = 3
+y = 3
[examples/trace](./examples/trace/) shows how to consume the events manually:

<!-- $MDX skip -->
```sh
$ dune exec -- ./examples/trace/main.exe
+tracer: starting
30926487700447:ring 0: create fiber 0
30926487702032:ring 0: running fiber 0
30926487705057:ring 0: create switch 1
30926487707264:ring 0: create fiber 2
30926487707512:ring 0: running fiber 2
30926487720213:ring 0: log "tracer: starting"
+server: starting
+client: connecting socket...
+server: got connection from client
+server: read "Hello" from socket
30926487769298:ring 0: running fiber 0
30926487769877:ring 0: create fiber 3
30926487770083:ring 0: running fiber 3
30926487771198:ring 0: create switch 4
30926487807888:ring 0: create switch 5
30926487808329:ring 0: create fiber 6
30926487808555:ring 0: running fiber 6
30926487812219:ring 0: log "server: starting"
30926487818883:ring 0: running fiber 3
30926487819091:ring 0: create fiber 7
30926487819155:ring 0: running fiber 7
30926487822428:ring 0: log "client: connecting socket..."
30926487901604:ring 0: running fiber 3
30926487904947:ring 0: running fiber 0
30926487907318:ring 0: running fiber 6
30926487917202:ring 0: log "server: got connection from client"
30926487929993:ring 0: running fiber 6
30926487941403:ring 0: running fiber 7
30926487948000:ring 0: running fiber 7
30926487971898:ring 0: resolve 7
30926487974810:ring 0: running fiber 6
30926487975215:ring 0: running fiber 6
30926487977869:ring 0: running fiber 6
30926487984514:ring 0: log "server: read \"Hello\" from socket"
30926487990785:ring 0: resolve 6
30926487991752:ring 0: running fiber 3
30926488022310:ring 0: resolve 3
30926497839725:ring 0: running fiber 2
+tracer: stopping
```

The trace can be viewed using [mirage-trace-viewer][].
This should work even while the program is still running.
The file is a ring buffer, so when it gets full, old events will start to be overwritten with new ones.
Note that the output from `traceln` appears in the trace as well as on the console.

There are various third-party tools that can consume this data
(but may currently require patches to support the new system):

- [Meio][] (Monitoring for Eio) provides an interactive console-based UI for exploring running fibers.
- [Olly][] can save Perfetto traces and report statistics.
- [mirage-trace-viewer][] renders traces visually.

For example, this is how mirage-trace-viewer renders the counting example above:

<p align='center'>
<img src="./doc/trace.svg"/>
</p>

This shows the two counting threads as two horizonal lines.
The white regions indicate when each thread was running.
Note that the output from `traceln` appears in the trace as well as on the console.

The [Meio][] (Monitoring for Eio) project provides an interactive console-based UI for exploring running fibers,
using the new runtime events support in OCaml 5.1.

## Cancellation

Expand Down Expand Up @@ -1849,3 +1887,4 @@ Some background about the effects system can be found in:
[Lambda Capabilities]: https://roscidus.com/blog/blog/2023/04/26/lambda-capabilities/
[Eio.Process]: https://ocaml-multicore.github.io/eio/eio/Eio/Process/index.html
[Dev meetings]: https://docs.google.com/document/d/1ZBfbjAkvEkv9ldumpZV5VXrEc_HpPeYjHPW_TiwJe4Q
[Olly]: https://github.com/tarides/runtime_events_tools
3 changes: 3 additions & 0 deletions examples/trace/dune
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
(executable
(name main)
(libraries eio.runtime_events eio_main))
77 changes: 77 additions & 0 deletions examples/trace/main.ml
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
(* This example shows how to trace an Eio program.
The [main] function creates a listening socket and has a client connect and send a message,
which is handled by a server fiber.
At the same time, another fiber is displaying trace events.
For simplicity, this example runs the tracer in the same process as the program being traced,
but typically they would be separate processes. *)

open Eio.Std

(* [handle pp] handles an event by writing the timestamp, ring ID and user data with [traceln].
[pp] is used to format the user data. *)
let handle pp : _ Eio_runtime_events.handler =
fun ring ts v ->
(* Note: don't use traceln here, as it will just generate more log events! *)
Fmt.epr "%9Ld:ring %d: %a@." (Runtime_events.Timestamp.to_int64 ts) ring pp v

let callbacks =
Runtime_events.Callbacks.create ()
(* Uncomment to trace GC events too: *)
(*
~runtime_begin:(handle (fun f phase -> Fmt.pf f "begin %s" (Runtime_events.runtime_phase_name phase)))
~runtime_end:(handle (fun f phase -> Fmt.pf f "end %s" (Runtime_events.runtime_phase_name phase)))
*)
~lost_events:(fun ring n -> traceln "ring %d lost %d events" ring n)
|> Eio_runtime_events.add_callbacks
~fiber:(handle (Fmt.fmt "running fiber %d"))
~create:(handle (fun f (id, ty) -> Fmt.pf f "create %s %d" (Eio_runtime_events.ty_to_string ty) id))
~resolve:(handle (Fmt.fmt "resolve %d"))
~log:(handle (Fmt.fmt "log %S"))
~name:(handle (fun f (id, name) -> Fmt.pf f "%d is named %S" id name))
(* (see lib_eio/runtime_events/eio_runtime_events.mli for more event types) *)

(* Read and display trace events from [cursor] until [finished]. *)
let trace ~finished (clock, delay) cursor =
traceln "tracer: starting";
let rec aux () =
let _ : int = Runtime_events.read_poll cursor callbacks None in
if !finished then (
traceln "tracer: stopping"
) else (
Eio.Time.Mono.sleep clock delay;
aux ()
)
in
aux ()

(* The program to be traced. *)
let main net =
Switch.run @@ fun sw ->
let addr = `Tcp (Eio.Net.Ipaddr.V4.loopback, 8123) in
let s = Eio.Net.listen ~sw ~backlog:1 ~reuse_addr:true net addr in
Fiber.both
(fun () ->
traceln "server: starting";
let c, _addr = Eio.Net.accept ~sw s in
traceln "server: got connection from client";
let msg = Eio.Flow.read_all c in
traceln "server: read %S from socket" msg
)
(fun () ->
traceln "client: connecting socket...";
let c = Eio.Net.connect ~sw net addr in
Eio.Flow.copy_string "Hello" c;
Eio.Flow.close c
)

(* Enable tracing then run the [main] and [trace] fibers. *)
let () =
Runtime_events.start ();
let cursor = Runtime_events.create_cursor None in (* Create a in-process cursor *)
Eio_main.run @@ fun env ->
let finished = ref false in
Fiber.both
(fun () -> trace ~finished (env#mono_clock, 0.01) cursor)
(fun () -> main env#net; finished := true)
2 changes: 1 addition & 1 deletion lib_eio/core/debug.ml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ let default_traceln ?__POS__:pos fmt =
Format.pp_close_box f ();
Format.pp_print_flush f ();
let msg = Buffer.contents b in
Trace.label msg;
Trace.log msg;
let lines = String.split_on_char '\n' msg in
Mutex.lock traceln_mutex;
Fun.protect ~finally:(fun () -> Mutex.unlock traceln_mutex) @@ fun () ->
Expand Down
2 changes: 1 addition & 1 deletion lib_eio/core/dune
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
(library
(name eio__core)
(public_name eio.core)
(libraries cstruct hmap lwt-dllist fmt optint domain-local-await))
(libraries cstruct hmap lwt-dllist fmt optint domain-local-await eio.runtime_events))
10 changes: 5 additions & 5 deletions lib_eio/core/fiber.ml
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,10 @@ let fork ~sw f =
Switch.with_op sw @@ fun () ->
match f () with
| () ->
Trace.resolve (Cancel.Fiber_context.tid new_fiber) ~ex:None
Trace.resolve (Cancel.Fiber_context.tid new_fiber)
| exception ex ->
Switch.fail sw ex; (* The [with_op] ensures this will succeed *)
Trace.resolve (Cancel.Fiber_context.tid new_fiber) ~ex:(Some ex)
Trace.resolve_error (Cancel.Fiber_context.tid new_fiber) ex
) (* else the fiber should report the error to [sw], but [sw] is failed anyway *)

let fork_daemon ~sw f =
Expand All @@ -35,13 +35,13 @@ let fork_daemon ~sw f =
match f () with
| `Stop_daemon ->
(* The daemon asked to stop. *)
Trace.resolve (Cancel.Fiber_context.tid new_fiber) ~ex:None
Trace.resolve (Cancel.Fiber_context.tid new_fiber)
| exception Cancel.Cancelled Exit when not (Cancel.is_on sw.cancel) ->
(* The daemon was cancelled because all non-daemon fibers are finished. *)
Trace.resolve (Cancel.Fiber_context.tid new_fiber) ~ex:None
Trace.resolve (Cancel.Fiber_context.tid new_fiber)
| exception ex ->
Switch.fail sw ex; (* The [with_daemon] ensures this will succeed *)
Trace.resolve (Cancel.Fiber_context.tid new_fiber) ~ex:(Some ex)
Trace.resolve_error (Cancel.Fiber_context.tid new_fiber) ex
) (* else the fiber should report the error to [sw], but [sw] is failed anyway *)

let fork_promise ~sw f =
Expand Down
2 changes: 1 addition & 1 deletion lib_eio/core/promise.ml
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ let resolve t v =
| Resolved _ -> invalid_arg "Can't resolve already-resolved promise"
| Unresolved b as prev ->
if Atomic.compare_and_set t.state prev (Resolved v) then (
Trace.resolve t.id ~ex:None;
Trace.resolve t.id;
Broadcast.resume_all b
) else (
(* Otherwise, the promise was already resolved. Retry (to get the error). *)
Expand Down
25 changes: 14 additions & 11 deletions lib_eio/core/single_waiter.ml
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,17 @@ let create () = { wake = ignore }
let wake t v = t.wake v

let await t id =
Suspend.enter @@ fun ctx enqueue ->
Cancel.Fiber_context.set_cancel_fn ctx (fun ex ->
t.wake <- ignore;
enqueue (Error ex)
);
t.wake <- (fun x ->
Cancel.Fiber_context.clear_cancel_fn ctx;
t.wake <- ignore;
Trace.read ~reader:id ctx.tid;
enqueue x
)
let x =
Suspend.enter @@ fun ctx enqueue ->
Cancel.Fiber_context.set_cancel_fn ctx (fun ex ->
t.wake <- ignore;
enqueue (Error ex)
);
t.wake <- (fun x ->
Cancel.Fiber_context.clear_cancel_fn ctx;
t.wake <- ignore;
enqueue x
)
in
Trace.read id;
x
2 changes: 1 addition & 1 deletion lib_eio/core/switch.ml
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ let combine_exn ex = function
let fail ?(bt=Printexc.get_raw_backtrace ()) t ex =
check_our_domain t;
if t.exs = None then
Trace.resolve t.id ~ex:(Some ex);
Trace.resolve_error t.id ex;
t.exs <- Some (combine_exn (ex, bt) t.exs);
try
Cancel.cancel t.cancel ex
Expand Down
Loading

0 comments on commit 6e6ae31

Please sign in to comment.