forked from ocaml/odoc
Merge pull request #4 from art-w/jsoo
Jsoo sherlodoc
Showing 132 changed files with 9,268 additions and 1,719 deletions.
@@ -17,3 +17,5 @@ _build/ | |
_doc/ | ||
_coverage/ | ||
_opam/ | ||
**/perf.data | ||
**/perf.data.old |
@@ -1,17 +1,9 @@ | ||
version = 0.24.1 | ||
version = 0.26.1 | ||
profile = janestreet | ||
let-binding-spacing = compact | ||
sequence-style = separator | ||
doc-comments = after-when-possible | ||
exp-grouping = preserve | ||
break-cases = toplevel | ||
break-separators = before | ||
cases-exp-indent = 4 | ||
cases-matching-exp-indent = normal | ||
if-then-else = keyword-first | ||
parens-tuple = multi-line-only | ||
type-decl = sparse | ||
field-space = loose | ||
space-around-arrays = true | ||
space-around-lists = true | ||
space-around-records = true | ||
dock-collection-brackets = false |
@@ -0,0 +1,6 @@ | ||
{ | ||
"ocaml.sandbox": { | ||
"kind": "opam", | ||
"switch": "sherlodoc" | ||
} | ||
} |
@@ -1,19 +1,91 @@ | ||
**Try it online at [doc.sherlocode.com](https://doc.sherlocode.com) !** | ||
|
||
A rough prototype of a Hoogle-like search engine for OCaml documentation. It's full of bugs and todos, but works well enough for my purpose: Perhaps it will be useful to you too. | ||
- The fuzzy type search is supported by a polarity search. As an example, the type `string -> int -> char` gets simplified to `{ -string, -int, +char }` which means that it consumes a `string` and an `int` and produces a `char` (irrespective of the order of the arguments). This yields good candidates which are then sorted by similarity with the query. | ||
- The real magic is all the package documentation generated for [`ocaml.org/packages`](https://ocaml.org/packages), which I got my hands on thanks to insider trading (but don't have the bandwidth to share back... sorry!) | ||
Sherlodoc is a search engine for OCaml documentation (inspired by [Hoogle](https://hoogle.haskell.org/)), which allows you to search through OCaml libraries by names and approximate type signatures: | ||
|
||
- Search by name: [`list map`](https://doc.sherlocode.com/?q=list%20map) | ||
- Search inside documentation comments: [`raise Not_found`](https://doc.sherlocode.com/?q=raise%20Not_found) | ||
- Fuzzy type search is introduced with a colon, e.g. [`: map -> list`](https://doc.sherlocode.com/?q=%3A%20map%20-%3E%20list) | ||
- Search by name and type with a colon separator [`Bogue : Button.t`](https://doc.sherlocode.com/?q=Bogue%20%3A%20Button.t) | ||
- An underscore `_` can be used as a wildcard in type queries: [`(int -> _) -> list -> _`](https://doc.sherlocode.com/?q=(int%20-%3E%20_)%20-%3E%20list%20-%3E%20_) | ||
- Type search supports products and reordering of function arguments: [`array -> ('a * int -> bool) -> array`](https://doc.sherlocode.com/?q=%3A%20array%20-%3E%20(%27a%20*%20int%20-%3E%20bool)%20-%3E%20array) | ||
|
||
## Local usage | ||
|
||
First, install sherlodoc and odig: | ||
|
||
```bash | ||
$ opam pin add 'https://github.com/art-w/sherlodoc.git' # optional | ||
|
||
$ opam install sherlodoc odig | ||
``` | ||
|
||
[Odig](https://erratique.ch/software/odig) can generate the odoc documentation of your current switch with: | ||
|
||
```bash | ||
$ odig odoc # followed by `odig doc` to browse your switch documentation | ||
``` | ||
|
||
Sherlodoc can then index these files to create a search database:
|
||
```bash | ||
# name your sherlodoc database | ||
$ export SHERLODOC_DB=/tmp/sherlodoc.marshal | ||
|
||
# if you are using OCaml 4, we recommend the `ancient` database format: | ||
$ opam install ancient | ||
$ export SHERLODOC_DB=/tmp/sherlodoc.ancient | ||
|
||
# index all odoc files generated by odig for your current switch: | ||
$ sherlodoc index $(find $OPAM_SWITCH_PREFIX/var/cache/odig/odoc -name '*.odocl') | ||
``` | ||
$ opam install --deps-only ./sherlodoc.opam | ||
# Note: your odoc version must match your odocl files | ||
|
||
# To index all the odocl files in `/path/to/doc`: | ||
$ dune exec -- ./index/index.exe /path/to/doc /path/to/result.db | ||
# `/path/to/doc` should contain a hierarchy of subfolders `libname/1.2.3/**/*.odocl` | ||
# `result.db` will be created or replaced | ||
Enjoy searching from the command-line or run the webserver: | ||
|
||
# To run the website: | ||
$ dune exec -- ./www/www.exe /path/to/result.db | ||
22.10.22 17:17:33.102 Running at http://localhost:1234 | ||
```bash | ||
$ sherlodoc search "map : list" | ||
$ sherlodoc search # interactive CLI
|
||
$ opam install dream | ||
$ sherlodoc serve # webserver at http://localhost:1234 | ||
``` | ||
|
||
The different commands support a `--help` argument for more details/options. | ||
|
||
In particular, sherlodoc supports three different file formats for its database, which can be specified either through the filename extension or through the `--format=` flag:
- `ancient` loads the database quickly using mmap, but is only compatible with OCaml 4.
- `marshal` is the fallback when ancient is unavailable, with slower database opening.
- `js` integrates with odoc's static HTML documentation for client-side search without a server.
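
As a rough illustration of how a format can be derived from the filename extension, here is a minimal sketch. The `db_format` type and the contents of `available_backends` are assumptions made for this example (the real list depends on which backends are installed), and the sketch returns a `result` instead of exiting the program.

```ocaml
(* Hypothetical stand-ins for the backends listed above. *)
type db_format = Ancient | Marshal | Js

let available_backends = [ "ancient", Ancient; "marshal", Marshal; "js", Js ]

(* An explicit format takes precedence; otherwise the extension decides. *)
let guess_db_format ?format filename =
  match format with
  | Some fmt -> Ok fmt
  | None ->
    let ext = Filename.extension filename in
    (* Drop the leading dot, if there is an extension at all. *)
    let ext = if ext = "" then ext else String.sub ext 1 (String.length ext - 1) in
    (match List.assoc_opt ext available_backends with
     | Some fmt -> Ok fmt
     | None -> Error (Printf.sprintf "Unknown db format extension %S" ext))
```

With these assumptions, `guess_db_format "/tmp/sherlodoc.marshal"` selects the marshal backend, while a filename without a recognised extension yields an error unless a format is passed explicitly.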
|
||
## Integration with Odoc | ||
|
||
Odoc 2.4.0 adds a search bar to the statically generated HTML documentation. [Integration with dune is in progress](https://github.com/ocaml/dune/pull/9772). You can try it in a fresh opam switch as follows (warning: this will recompile every installed package that depends on dune):
|
||
```bash | ||
$ opam pin https://github.com/emileTrotignon/dune.git#search-odoc-new | ||
|
||
$ dune build @doc # in your favorite project | ||
``` | ||
|
||
Otherwise, manual integration with odoc requires adding the flags `--search-uri sherlodoc.js --search-uri db.js` to every call of `odoc html-generate` to activate the search bar. You will also need to generate a search database `db.js` and provide the `sherlodoc.js` dependency (a version of the sherlodoc search engine with odoc support, compiled to JavaScript):
|
||
```bash | ||
$ sherlodoc index --db=_build/default/_doc/_html/YOUR_LIB/db.js \ | ||
$(find _build/default/_doc/_odocls/YOUR_LIB -name '*.odocl') | ||
|
||
$ sherlodoc js > _build/default/_doc/_html/sherlodoc.js | ||
``` | ||
|
||
## How it works | ||
|
||
The sherlodoc database uses [Suffix Trees](https://en.wikipedia.org/wiki/Suffix_tree) to search for substrings in value names, documentation and types. During indexing, the suffix trees are compressed into state machine automata. The children of every node are also sorted, so that a sub-tree can be used as a priority queue during search enumeration.
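
To give an intuition for substring search over suffixes, here is a minimal sketch using an uncompressed suffix trie. It is not sherlodoc's data structure: there is no compression into an automaton and no priority ordering of children, and the names `node`, `add` and `find` are made up for this example.

```ocaml
module CharMap = Map.Make (Char)

(* Each node remembers every entry that has a suffix passing through it. *)
type node = { children : node CharMap.t; hits : string list }

let empty = { children = CharMap.empty; hits = [] }

(* Insert the suffix of [s] starting at index [i], tagging nodes with [entry]. *)
let rec add_suffix node s i entry =
  let hits = if List.mem entry node.hits then node.hits else entry :: node.hits in
  if i = String.length s then { node with hits }
  else
    let child =
      match CharMap.find_opt s.[i] node.children with
      | Some child -> child
      | None -> empty
    in
    let children = CharMap.add s.[i] (add_suffix child s (i + 1) entry) node.children in
    { children; hits }

(* Index an entry by inserting all of its suffixes. *)
let add trie entry =
  let rec go trie i =
    if i > String.length entry then trie else go (add_suffix trie entry i entry) (i + 1)
  in
  go trie 0

(* A query matches every entry containing it as a substring. *)
let find trie query =
  let rec go node i =
    if i = String.length query then List.sort String.compare node.hits
    else
      match CharMap.find_opt query.[i] node.children with
      | None -> []
      | Some child -> go child (i + 1)
  in
  go trie 0
```

For example, after indexing `fold_left`, `fold_right` and `map`, the query `fold` reaches a node tagged with both fold functions, while `ight` only reaches `fold_right`.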
|
||
To rank the search results, sherlodoc computes a static evaluation of each candidate during indexing. This static scoring biases the search to favor short names, short types, the presence of documentation, etc. When searching, a dynamic evaluation that depends on the user query is used to adjust the static ordering of the results:
|
||
- How similar is the result name to the search query? (to e.g. prefer results which respect the case: [`map`](https://doc.sherlocode.com/?q=map) vs [`Map`](https://doc.sherlocode.com/?q=Map)) | ||
- How similar are the types? (using a tree diff algorithm, as for example [`('a -> 'b -> 'a) -> 'a -> 'b list -> 'a`](https://doc.sherlocode.com/?q=(%27a%20-%3E%20%27b%20-%3E%20%27a)%20-%3E%20%27a%20-%3E%20%27b%20list%20-%3E%20%27a) and [`('a -> 'b -> 'b) -> 'a list -> 'b -> 'b`](https://doc.sherlocode.com/?q=(%27a%20-%3E%20%27b%20-%3E%20%27b)%20-%3E%20%27a%20list%20-%3E%20%27b%20-%3E%20%27b) are isomorphic yet point to `fold_left` and `fold_right` respectively) | ||
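
The two-stage ranking described above could be sketched as follows. The `entry` record, the weights and the penalties are all hypothetical and chosen only to make the idea concrete. They are not sherlodoc's actual scoring, and the type similarity (tree diff) part is left out.

```ocaml
(* Hypothetical entry: a name, the size of its type, and a documentation flag. *)
type entry = { name : string; type_size : int; has_doc : bool }

(* Static cost: computed once at indexing time, independent of any query.
   Short names, short types and documented entries get a lower cost. *)
let static_cost e =
  let doc_penalty = if e.has_doc then 0 else 10 in
  String.length e.name + e.type_size + doc_penalty

(* Dynamic cost: depends on the query, e.g. prefer results that respect case. *)
let name_cost ~query e =
  if String.equal e.name query then 0
  else if String.equal (String.lowercase_ascii e.name) (String.lowercase_ascii query)
  then 5
  else 20

(* Lower total cost ranks first; ties keep their original order. *)
let rank ~query entries =
  let cost e = static_cost e + name_cost ~query e in
  List.stable_sort (fun a b -> Int.compare (cost a) (cost b)) entries
```

Under these made-up weights, the query `map` ranks an entry named `map` before one named `Map`, and the query `Map` reverses the order.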
|
||
For fuzzy type search, sherlodoc aims to provide good results without requiring a precise search query, on the basis that users don't know the exact type of what they are looking for (e.g. [`string -> file_descr`](https://doc.sherlocode.com/?q=string%20-%3E%20file_descr) is incomplete but should still point in the right direction). In particular, when exploring a package's documentation, the common question "how do I produce a value of type `foo`" can be answered with the query `: foo` (and "which functions consume a value of type `bar`" with `: bar -> _`). This should also work when the type can only be produced indirectly through a callback (for example [`: Eio.Switch.t`](https://doc.sherlocode.com/?q=%3A%20Eio.Switch.t) has no direct constructor). To achieve this, sherlodoc performs a type decomposition based on the polarity of each term: a value produced by a function is said to be positive, while an argument consumed by a function is negative. This simplifies away the tree shape of types, which allows them to be indexed in the suffix trees. The cardinality of each value type is also indexed, e.g. to differentiate between [`list -> list`](https://doc.sherlocode.com/?q=list%20-%3E%20list) and [`list -> list -> list`](https://doc.sherlocode.com/?q=list%20-%3E%20list%20-%3E%20list).
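
The polarity decomposition can be illustrated with a minimal sketch over a toy type representation. The `ty` type and the `+name` / `-name` labels are assumptions made for this example, and type parameters are simply given the sign of their parent, which is a simplification.

```ocaml
(* A deliberately small model of OCaml types, for illustration only. *)
type ty =
  | Arrow of ty * ty (* argument -> result *)
  | Constr of string * ty list (* a type constructor and its parameters *)
  | Any (* the wildcard _ *)

type sign = Pos | Neg

let flip = function Pos -> Neg | Neg -> Pos

(* Collect every type constructor, labelled with the polarity of its position.
   Going to the left of an arrow flips the sign, so callback arguments
   end up positive again. *)
let rec collect sign ty acc =
  match ty with
  | Any -> acc
  | Arrow (arg, res) -> collect (flip sign) arg (collect sign res acc)
  | Constr (name, params) ->
    let label = (match sign with Pos -> "+" | Neg -> "-") ^ name in
    List.fold_left (fun acc p -> collect sign p acc) (label :: acc) params

(* Group identical labels and count them: this is the cardinality. *)
let polarities ty =
  let sorted = List.sort String.compare (collect Pos ty []) in
  List.fold_left
    (fun acc label ->
      match acc with
      | (l, n) :: rest when String.equal l label -> (l, n + 1) :: rest
      | _ -> (label, 1) :: acc)
    [] sorted
  |> List.rev
```

In this model `string -> int -> char` decomposes to `+char`, `-int` and `-string`, each with count 1, regardless of argument order. The counts are what separate `list -> list` (one `-list`) from `list -> list -> list` (two `-list`).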
|
||
While the polarity search results are satisfying, sherlodoc offers very limited support for polymorphic variables, type aliases and true type isomorphisms. You should check out the extraordinary [Dowsing](https://github.com/Drup/dowsing) project for this! | ||
|
||
And if you speak French, a more detailed [presentation of Sherlodoc](https://www.irill.org/videos/OUPS/2023-03/wendling.html) (and [Sherlocode](https://sherlocode.com)) was given at the [OCaml Users in PariS (OUPS)](https://oups.frama.io/) in March 2023. |
@@ -0,0 +1,20 @@ | ||
(ocamllex unescape) | ||
|
||
(executable | ||
(name main) | ||
(public_name sherlodoc) | ||
(package sherlodoc) | ||
(libraries | ||
cmdliner | ||
index | ||
query | ||
db_store | ||
unix | ||
(select | ||
serve.ml | ||
from | ||
(www -> serve.available.ml) | ||
(!www -> serve.unavailable.ml))) | ||
(preprocess | ||
(pps ppx_blob)) | ||
(preprocessor_deps ../jsoo/sherlodoc.js)) |
@@ -0,0 +1,82 @@ | ||
let guess_db_format db_format db_filename = | ||
match db_format with | ||
| Some db_format -> db_format | ||
| None -> begin | ||
let ext = Filename.extension db_filename in | ||
let ext_len = String.length ext in | ||
let ext = if ext_len = 0 then ext else String.sub ext 1 (ext_len - 1) in | ||
try List.assoc ext Db_store.available_backends with | ||
| Not_found -> | ||
Format.fprintf | ||
Format.err_formatter | ||
"Unknown db format extension %S (expected: %s)@." | ||
ext | ||
(String.concat ", " @@ List.map fst Db_store.available_backends) ; | ||
exit 1 | ||
end | ||
|
||
open Cmdliner | ||
|
||
let db_format = | ||
let env = | ||
let doc = "Database format" in | ||
Cmd.Env.info "SHERLODOC_FORMAT" ~doc | ||
in | ||
let kind = Arg.enum Db_store.available_backends in | ||
Arg.(value & opt (some kind) None & info [ "format" ] ~docv:"DB_FORMAT" ~env) | ||
|
||
let db_filename = | ||
let env = | ||
let doc = "The database to query" in | ||
Cmd.Env.info "SHERLODOC_DB" ~doc | ||
in | ||
Arg.(required & opt (some string) None & info [ "db"; "o" ] ~docv:"DB" ~env) | ||
|
||
let db_path = | ||
let env = | ||
let doc = "The database to query" in | ||
Cmd.Env.info "SHERLODOC_DB" ~doc | ||
in | ||
Arg.(required & opt (some file) None & info [ "db" ] ~docv:"DB" ~env) | ||
|
||
let with_db fn db_path = | ||
let apply fn db_format db_filename = | ||
let db_format = guess_db_format db_format db_filename in | ||
fn db_format db_filename | ||
in | ||
Term.(const apply $ fn $ db_format $ db_path) | ||
|
||
let cmd_search = | ||
let info = Cmd.info "search" ~doc:"Command-line search" in | ||
Cmd.v info (with_db Search.term db_path) | ||
|
||
let cmd_index = | ||
let doc = "Index odocl files to create a Sherlodoc database" in | ||
let info = Cmd.info "index" ~doc in | ||
Cmd.v info (with_db Index.term db_filename) | ||
|
||
let cmd_serve = | ||
let doc = "Webserver interface" in | ||
let info = Cmd.info "serve" ~doc in | ||
Cmd.v info (with_db Serve.term db_path) | ||
|
||
let cmd_jsoo = | ||
let doc = "For dune/odoc integration, sherlodoc compiled as javascript" in | ||
let info = Cmd.info "js" ~doc in | ||
let target = | ||
let doc = "Name of the file to create" in | ||
Arg.(value & pos 0 string "" & info [] ~docv:"QUERY" ~doc) | ||
in | ||
let emit_js_dep filename = | ||
let close, h = if filename = "" then false, stdout else true, open_out filename in | ||
output_string h [%blob "jsoo/sherlodoc.js"] ; | ||
if close then close_out h | ||
in | ||
Cmd.v info Term.(const emit_js_dep $ target) | ||
|
||
let cmd = | ||
let doc = "Sherlodoc" in | ||
let info = Cmd.info "sherlodoc" ~doc in | ||
Cmd.group info [ cmd_search; cmd_index; cmd_serve; cmd_jsoo ] | ||
|
||
let () = exit (Cmd.eval cmd) |
@@ -0,0 +1,123 @@ | ||
let header = | ||
{|Sherlodoc v0.2 -- search OCaml documentation by name and type (use CTRL-D to exit)|} | ||
|
||
let string_of_kind = | ||
let open Db.Entry.Kind in | ||
function | ||
| Doc -> "doc" | ||
| Type_decl _ -> "type" | ||
| Module -> "mod" | ||
| Exception _ -> "exn" | ||
| Class_type -> "class" | ||
| Method -> "meth" | ||
| Class -> "class" | ||
| Type_extension -> "type" | ||
| Extension_constructor _ -> "cons" | ||
| Module_type -> "sig" | ||
| Constructor _ -> "cons" | ||
| Field _ -> "field" | ||
| Val _ -> "val" | ||
|
||
let print_result ~print_cost ~no_rhs (elt : Db.Entry.t) = | ||
let cost = if print_cost then string_of_int elt.cost ^ " " else "" in | ||
let typedecl_params = | ||
(match elt.kind with | ||
| Type_decl args -> args | ||
| _ -> None) | ||
|> Option.map (fun str -> str ^ " ") | ||
|> Option.value ~default:"" | ||
in | ||
let kind = elt.kind |> string_of_kind |> Unescape.string in | ||
let name = Unescape.string elt.name in | ||
let pp_rhs h = function | ||
| None -> () | ||
| Some _ when no_rhs -> () | ||
| Some rhs -> Format.fprintf h "%s" (Unescape.string rhs) | ||
in | ||
Format.printf "%s%s %s%s%a@." cost kind typedecl_params name pp_rhs elt.rhs | ||
|
||
let search ~print_cost ~static_sort ~limit ~db ~no_rhs ~pretty_query ~time query = | ||
let query = Query.{ query; packages = []; limit } in | ||
if pretty_query then print_endline (Query.pretty query) ; | ||
let t0 = Unix.gettimeofday () in | ||
let r = Query.Blocking.search ~shards:db ~dynamic_sort:(not static_sort) query in | ||
let t1 = Unix.gettimeofday () in | ||
match r with | ||
| [] -> print_endline "[No results]" | ||
| _ :: _ as results -> | ||
List.iter (print_result ~print_cost ~no_rhs) results ; | ||
flush stdout ; | ||
if time then Format.printf "Search in %f@." (t1 -. t0) | ||
|
||
let rec search_loop ~print_cost ~no_rhs ~pretty_query ~static_sort ~limit ~time ~db = | ||
Printf.printf "%ssearch>%s %!" "\027[0;36m" "\027[0;0m" ; | ||
match Stdlib.input_line stdin with | ||
| query -> | ||
search ~print_cost ~static_sort ~limit ~db ~no_rhs ~pretty_query ~time query ; | ||
search_loop ~print_cost ~no_rhs ~pretty_query ~static_sort ~limit ~time ~db | ||
| exception End_of_file -> Printf.printf "\n%!" | ||
|
||
let search | ||
query | ||
print_cost | ||
no_rhs | ||
static_sort | ||
limit | ||
pretty_query | ||
time | ||
db_format | ||
db_filename | ||
= | ||
let module Storage = (val Db_store.storage_module db_format) in | ||
let db = Storage.load db_filename in | ||
match query with | ||
| None -> | ||
print_endline header ; | ||
search_loop ~print_cost ~no_rhs ~pretty_query ~static_sort ~limit ~time ~db | ||
| Some query -> | ||
search ~print_cost ~no_rhs ~pretty_query ~static_sort ~limit ~time ~db query | ||
|
||
open Cmdliner | ||
|
||
let limit = | ||
let doc = "The maximum number of results per query" in | ||
Arg.(value & opt int 25 & info [ "limit"; "n" ] ~docv:"N" ~doc) | ||
|
||
let query = | ||
let doc = "The query. If absent, queries will be read interactively." in | ||
Arg.(value & pos 0 (some string) None & info [] ~docv:"QUERY" ~doc) | ||
|
||
let print_cost = | ||
let doc = "For debugging purposes: prints the cost of each result" in | ||
Arg.(value & flag & info [ "print-cost" ] ~doc) | ||
|
||
let print_time = | ||
let doc = "For debugging purposes: prints the search time" in | ||
Arg.(value & flag & info [ "print-time" ] ~doc) | ||
|
||
let static_sort = | ||
let doc = | ||
"Sort the results without looking at the query.\n\ | ||
Enabling it allows to look at the static costs of elements.\n\ | ||
Mainly for testing purposes." | ||
in | ||
Arg.(value & flag & info [ "static-sort" ] ~doc) | ||
|
||
let no_rhs = | ||
let doc = "Do not print the right-hand side of results." in | ||
Arg.(value & flag & info [ "no-rhs"; "no-right-hand-side" ] ~doc) | ||
|
||
let pretty_query = | ||
let doc = "Prints the query itself as it was parsed" in | ||
Arg.(value & flag & info [ "pretty-query" ] ~doc) | ||
|
||
let term = | ||
Term.( | ||
const search | ||
$ query | ||
$ print_cost | ||
$ no_rhs | ||
$ static_sort | ||
$ limit | ||
$ pretty_query | ||
$ print_time) |
@@ -0,0 +1 @@ | ||
val term : (Db_store.db_format -> string -> unit) Cmdliner.Term.t |
@@ -0,0 +1 @@ | ||
let term = Www.term |
@@ -0,0 +1 @@ | ||
val term : (Db_store.db_format -> string -> unit) Cmdliner.Term.t |
@@ -0,0 +1,6 @@ | ||
let main _ _ = | ||
Format.fprintf | ||
Format.err_formatter | ||
"Webserver unavailable: please install dream and retry.@." | ||
|
||
let term = Cmdliner.Term.const main |