-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a marshalled output for index generation #1084
Add a marshalled output for index generation #1084
Conversation
I do not like the |
I must admit I'm quite confused at the difference between *.odoc and *.odocl (are there others) ? |
@Drup In term of command line, you need a single module (cmti ?) to create a |
I think the idea is that In addition to the
Implementation and source trees are produced during the compile phase, and are meant to be linked. Contrary to this, occurrences and indexes are generated after the link phase, so I definitely agree that their extension should not be About the
So, I think we have to options:
I would prefer the first option. |
I know it is quite common in odoc, but at least for files names I really do not like it because it is a weird convention that fails to communicate its intent, so this requires knowledge of odoc to understand. It might also restrict names available for users, in a way that will be unexpected : no one will try to name a page I also do not like that file with the same extension are in fact of a different "type". I think you should be able to trust that the extension tells you what kind of file a file is, and in (1), all the Regarding implementation and sources, it seems to me that they are used in the same context (linking) and as such can share an extension. Their content could even be of the same ocaml type if we wanted to (maybe they are ?). I deeply favor (2), even though I understand that it requires changes out of the scope of this PR. The only benefits for (1) I found in your message is the fact it is done like that in other places in odoc, but as it seems insufficient with regard to the drawbacks expressed above. Maybe there are other benefits I did not consider ? |
The interface changes from the first commit are fine, sherlodoc only uses Fold.unit and Fold.page |
I'm not decided on which is better, but let me still answer to some of you comments
Using
This is not a problem at all. I don't think there can be any issue:
So:
|
(sherlodoc uses |
|
Yes, but it that case, we make up our own convention when a perfectly fine one already exists, so even though it's not unreasonable to ask people to learn things, my impression is that here, we are making it harder when its not necessary.
You are right that it will not cause conflicts, but it could be confusing for people who might be trying to understand what is happening looking at
Yes, I do not mean that they should be used completely interchangeably, this is not possible if two file have different content, but that there exists a context where they are all accepted. This is the case for json (you can always pass a json file to jq), and I hope it is for Regarding consistency, I think that this convention is more appropriate for To me, the main benefit of option (2) is that it makes it clear what you can do with the file, in a way that is understandable with minimal friction. This feel like a way more important information about a file than when it was produced. This is also more important because these files will used by tools external to odoc. I think I have expressed myself fully, maybe a third person could take the final decision if you still disagree after reading this. |
src/odoc/bin/main.ml
Outdated
in | ||
Arg.( | ||
value & opt_all convert_fpath [] | ||
& info ~doc ~docv:"FILE" [ "file-list" ]) | ||
in | ||
let marshall = | ||
let doc = "whether to output a json file, or an .odoc file" in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nitpick : I think it should be "a .odoc file" (pronounced "a dot odoc file")
I have read the diff of the second commit and it looks good to me aside from a very small nitpick. |
2861f69
to
f3919b0
Compare
205afd9
to
4d28c05
Compare
This has been overhauled significantly, and a re-review would be appropriate. The extension for index files is now .odoc-index, and you give .odocl files with |
a09268a
to
c745cb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the driver part!
It looks good to me. It would be nice to:
- add some parallelism, which is easy here (see eg panglesd@2dadfb5)
- Add a progress bar for indexes, since it can take quite a lot of time
src/driver/compile.ml
Outdated
ignore @@ Bos.OS.Dir.create Fpath.(html_dir / pkgname); | ||
let format = `js in | ||
let inputs = [ Fpath.(odoc_dir / pkgname / "index.odoc-index") ] in | ||
let dst = Fpath.(html_dir / pkgname / "sherlodoc_db.js") in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This filename is hardcoded and reused. It would be better to return it or something like that.
src/driver/compile.ml
Outdated
let sherlodoc ~html_dir ~odoc_dir pkgs = | ||
ignore @@ Bos.OS.Dir.create html_dir; | ||
Sherlodoc.js Fpath.(html_dir / "sherlodoc.js"); | ||
Util.StringMap.iter (sherlodoc_index_one ~html_dir ~odoc_dir) pkgs; | ||
let format = `marshal in | ||
let dst = Fpath.(html_dir / "sherlodoc_db.marshal") in | ||
let inputs = | ||
pkgs |> Util.StringMap.bindings | ||
|> List.map (fun (pkgname, _pkg) -> | ||
Fpath.(odoc_dir / pkgname / "index.odoc-index")) | ||
in | ||
Sherlodoc.index ~format ~inputs ~dst () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several hardcoded path there, which should be returned to avoid having them hardcoded in distinct places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it is a good idea to return them, because --search_uri wants relative paths. I have factored them in functions and values and its not repeated now.
src/driver/odoc.ml
Outdated
@@ -1,31 +1,11 @@ | |||
open Bos | |||
open Cmd_outputs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not open that globally...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
test/search/html_search.t/run.t
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests and prose around needs to be updated!
odoc-driver.opam
Outdated
@@ -43,6 +43,7 @@ depends: [ | |||
"eio_main" | |||
"progress" | |||
"cmdliner" | |||
"sherlodoc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pinned version is needed, for now, I think.
:: impl | ||
in | ||
Fiber.List.map link compiled |> List.concat | ||
|
||
let odoc_index_path ~odoc_dir pkgname = | ||
Fpath.(odoc_dir / pkgname / "index.odoc-index") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be cleaner to output index files outside of the directory made for .odocl
files. That is:
Fpath.(odoc_dir / pkgname / "index.odoc-index") | |
Fpath.(index_dir / pkgname / "index.odoc-index") |
or
Fpath.(odoc_dir / pkgname / "index.odoc-index") | |
Fpath.(index_dir / pkgname ^ ".odoc-index") |
Although, this is not mandatory. Just resolve the comment if you believe it is better like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :) Made some comments but nothing that can't be changed later.
@@ -511,13 +527,28 @@ module Indexing = struct | |||
value & opt_all convert_fpath [] | |||
& info ~doc ~docv:"FILE" [ "file-list" ]) | |||
in | |||
let include_rec = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this option ? The inputs
term could also accept directories, which removes an exception case and is common among CLI tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, this option will change to -P
and -L
soon: we will use the index for sidebars too, and sidebars require those options.
src/odoc/bin/main.ml
Outdated
"At least one of --file-list or --include-rec or an .odocl file \ | ||
must be passed to odoc compile-index") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already checked in Indexing.compile
but in a slightly different way: If --file-list
points to an empty file, this passes but Indexing.compile
has an other check that do not.
Why shouldn't making an empty index be allowed ? For example, drivers might want to make an index for every packages, even empty or bugous packages. This restriction increases complexity in drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
src/odoc/indexing.ml
Outdated
[] include_rec) | ||
|> List.concat) | ||
in | ||
if files = [] then Error (`Msg "No .odocl files were included") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As there's several indirect ways to pass the list of inputs, to reassure driver authors, we could log the number of odocl files considered for indexing:
Indexing N odocl files.
('a, [> msg ]) result | ||
(** This function is exposed for custom indexers that uses [odoc] as a library | ||
to generate their search index *) | ||
|
||
val compile : | ||
[ `JSON | `Marshall ] -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for each formats ? It would be nice to write that down for maintenance in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use cases for json are for external use of indexes, which include using external search engine but is not really limited...
The usecase for the marshalled format is for odoc: currently search, soon table of contents, maybe indices of values/types/... so it is going to evolve.
I'm not sure how much it would help maintenance or hurt it to have the usecases written there, as they are likely going to evolve and the explanation will be outdated. Maybe a simple sentence (`Marshall
is for uses of the index by odoc
, `JSON
is for external uses) would be sufficient?
Standalone documentation comments currently do not have an id. This id was carried as the accumulator of the field, which yielded wrong results! Signed-off-by: Paul-Elliot <[email protected]>
eb16010
to
f0acee4
Compare
The CI is not passing due to the new dependency: https://ocaml.ci.dev/github/ocaml/odoc/commit/f0acee4b79947a0a4ff1643bc675e877045b98a3/variant/debian-12-5.0_opam-2.1 Otherwise, this is good to merge ! |
23cff2f
to
906c96d
Compare
I opened an issue on ocaml-ci : ocurrent/ocaml-ci#954 As soon as it is solved, we should reintroduce the driver's dependency on sherlodoc. |
This PR adds a new marshalled output for the generation of index files.
This is desirable for several reasons:
The first commit fixes a bug in the computation of the ID of the page containing a standalone comment. The second commit is the actual change.
There are some design to discuss:
--marshall
flag to thecompile-index
: by default the output is json, but it is marshalled with this flag.index-
and have to have the.odoc
extension.But I think that is ok.