Dev meeting 2020 05 19
First dev meeting, concentrating on roadmap to replace ocamldoc.
Present:
- Jon Ludlam (@jonludlam)
- Anil Madhavapeddy (@avsm)
- Anton Bachin (@antron)
- Gabriel Radanne (@drup)
- Florian Angeletti (@octachron)
- Thomas Refis (@trefis)
- Leo White (@lpw25)
- Jules Aguillon (@julow)
Anton (@antron) is stepping down as the current chief maintainer of Odoc, and Jon Ludlam (@jonludlam) will take over.
Anton discussed the current state of the odoc codebase. Maintainability needs to improve: it is difficult to make changes in some places in the code, in particular the model and the html frontend. The parser is in much better shape. We need a better test framework (similar to Dune's), including a decent end-to-end test that checks all output files. The html layer has recently been hugely improved from a maintainability point of view by @drup, and a replacement for the current model layer is also being prepared (@jonludlam and @julow). We currently have bisect-ppx coverage stats for the parser, but not for the model or html output, and coverage should be extended to those parts too. We would also benefit from more unit tests and tests of the command line.
The model layer is responsible for expansions, where the signatures of modules and module types are computed. For example, given the code
module type X = Y with type t = int
odoc can compute the signature of X. It is also responsible for the resolution of paths. For example, given the code
type t = X(Y).t
odoc will compute where to link to when you click on X(Y).t, and also what text the link will have (e.g. replacing hidden modules containing double underscores with their canonical path). The semantics have to be very complex to match the language, and the layer amounts to a pretty complete reimplementation of the module system in the core OCaml system. It has extra bits: it handles canonical paths better than the OCaml type checker (it avoids the hidden underscores present in many projects). When run on a complex codebase like Base or Core_kernel, it works better than the currently released odoc, but doesn't quite cover the last 1% of cases.
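To make the two tasks concrete, here is a minimal sketch in plain OCaml. The module type Y, the functor F and the module Arg are hypothetical stand-ins invented for illustration (the notes do not define them), and X_expanded is only a hand-written picture of what an expansion amounts to, not odoc's internal representation.

```ocaml
(* Hypothetical module type standing in for Y in the notes. *)
module type Y = sig
  type t
  val zero : t
  val add : t -> t -> t
end

(* The declaration from the notes. *)
module type X = Y with type t = int

(* What expanding X amounts to, written out by hand: the items of Y
   with the constraint on t applied. *)
module type X_expanded = sig
  type t = int
  val zero : t
  val add : t -> t -> t
end

module Int_impl : X = struct
  type t = int
  let zero = 0
  let add = ( + )
end

(* The type checker accepts an implementation of X at the hand-expanded
   signature, which is a sanity check that the two agree. *)
module Check : X_expanded = Int_impl

(* Path resolution: F and Arg play the roles of X and Y in
   "type t = X(Y).t". Resolving the path means finding that it points
   at the type t in the body of F, instantiated with Arg. *)
module F (A : sig type t end) = struct
  type t = A.t list
end

module Arg = struct
  type t = int
end

type u = F(Arg).t

(* u is int list, so this definition type checks. *)
let sample : u = [ 1; 2; 3 ]
```

A documentation tool has to reach the same conclusions as the type checker here, and additionally remember where each item came from so that it can link to it.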
Two corner cases are not done yet:
- include module type of struct include X end (used for strengthening and extending existing modules), which is complicated by the possibility of shadowing previous definitions.
- module type X = Y with module M = N introduces a new alias, and this alias is not currently being propagated. (Leo: introducing aliases is what OCaml should do, but doesn't at the moment.)
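The two corner cases can be reproduced in a few lines of OCaml. This is only an illustrative sketch: Base_list, Extended, Ext, N, Y, X and Impl are hypothetical names chosen for the example, and the shadowing of an included value assumes a reasonably recent compiler (OCaml 4.08 or later).

```ocaml
(* Case 1: strengthening and extending an existing module. *)
module Base_list = struct
  type 'a t = 'a list
  let length = List.length
  let hd = List.hd
end

module type Extended = sig
  (* Brings in the strengthened signature of Base_list. *)
  include module type of struct include Base_list end

  (* Shadows the included hd with a safer variant. A documentation
     tool has to notice that the earlier hd is no longer visible. *)
  val hd : 'a t -> 'a option
end

module Ext : Extended = struct
  include Base_list
  let hd = function [] -> None | x :: _ -> Some x
end

(* Case 2: a module constraint that ties M to an existing module N. *)
module N = struct
  type t = int
  let v = 1
end

module type Y = sig
  module M : sig
    type t
    val v : t
  end
end

module type X = Y with module M = N

module Impl : X = struct
  module M = N
end
```

In the second case the types of Impl.M are known to be those of N, which is why Impl.M.v can be used as an int; whether M is also recorded as an alias of N is the point raised in the notes.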
The other main area that needs finishing is to complete the resolution of all other types of references, such as record fields or exceptions and so on, which is currently being tackled by @julow. This will complete the coverage. Once this is done, we can make a PR to odoc with the new model.
The bulk of the testing has been done using Jane Street's Base, and once this was looking good we moved on to Core_kernel. Leo has also been helping with the testing of this on JS's internal libraries. @drup has also suggested tyxml as a useful library that has good coverage of OCaml's type system. This has been helpful for the development of the new model, but we need to extract a useful test suite from it.
Performance is now reasonable (1m30s for Core_kernel and all of its dependencies on a laptop). It could be faster, but it is reasonable compared with the existing model. It has been observed to throw Out_of_memory on the JS codebase.
@drup has recently contributed a replacement for the html renderer. The previous implementation required the developer to have both the semantic and the rendering logic in mind when modifying it, which made it quite complex. These have now been separated by an intermediate document representation: the model is translated to the document representation, which is in turn transformed into a concrete output. This output could be HTML but could also be something else, and importantly someone who does not know the detailed semantics of modules can now write an output.
The type definitions for the document IR are available here:
https://github.com/ocaml/odoc/blob/master/src/document/types.ml
This is very similar to the agnostic format found in (e.g.) Pandoc, with the notion of blocks and inline elements. There are two renderers that can output OCaml and ReasonML, but we can eventually just have one renderer to avoid code duplication. After the document IR was merged, @drup added a new man page renderer without difficulty. We need to have a new LaTeX renderer, as ocamldoc has one and people rely on it. This will be more challenging as links on paper don't work like links on a webpage, so things like submodules will want to be rendered inline. To some extent this problem has already been addressed in the man page renderer. @octachron has volunteered to do this work, which will help increase the number of people familiar with the code.
After this, users would like a Docusaurus output. They have suggested a json output, but there are some ongoing discussions about this.
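The separation between the document representation and the renderers can be illustrated with a toy version. The types below are a deliberately small, hypothetical IR made up for this sketch; they are not the definitions in src/document/types.ml, and the HTML renderer omits escaping for brevity.

```ocaml
(* A toy block/inline document IR (hypothetical, much smaller than a
   real one). *)
type inline =
  | Text of string
  | Code of string
  | Link of { target : string; label : inline list }

type block =
  | Heading of int * inline list
  | Paragraph of inline list
  | Source of string

type document = block list

(* Renderer 1: HTML. No escaping is done in this sketch. *)
let rec html_inline = function
  | Text s -> s
  | Code s -> "<code>" ^ s ^ "</code>"
  | Link { target; label } ->
    Printf.sprintf "<a href=\"%s\">%s</a>" target (html_inlines label)

and html_inlines is = String.concat "" (List.map html_inline is)

let html_block = function
  | Heading (n, is) -> Printf.sprintf "<h%d>%s</h%d>" n (html_inlines is) n
  | Paragraph is -> "<p>" ^ html_inlines is ^ "</p>"
  | Source s -> "<pre>" ^ s ^ "</pre>"

let to_html (doc : document) = String.concat "\n" (List.map html_block doc)

(* Renderer 2: plain text. Links cannot be followed, so only the label
   is kept. *)
let rec text_inline = function
  | Text s | Code s -> s
  | Link { label; _ } -> text_inlines label

and text_inlines is = String.concat "" (List.map text_inline is)

let text_block = function
  | Heading (_, is) -> String.uppercase_ascii (text_inlines is)
  | Paragraph is -> text_inlines is
  | Source s -> s

let to_text (doc : document) = String.concat "\n" (List.map text_block doc)

(* Usage: one document value, two outputs. *)
let doc =
  [ Heading (1, [ Text "Module X" ]);
    Paragraph
      [ Text "See "; Link { target = "Y.html"; label = [ Code "Y.t" ] } ] ]

let () = print_endline (to_html doc)
let () = print_endline (to_text doc)
```

Neither renderer needs to know anything about modules or signatures; each only pattern matches on blocks and inlines, which is the property the notes describe.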
There was discussion about how to make odoc more modular such that new output formatters could be added without modifying the core of odoc. @drup suggested a driver model similar to ppx, pointing out that dune already has support for this style of processing. @avsm asked whether we need this complexity for a 1.0 release, though @drup pointed out it's not going to change much. @lpw25 argued that it's hard to design an extensible system, and that much effort could be put in only to find that it's hardly used in practice. He suggested waiting for some users to contribute the code. @drup said it was a two-phase process: first we factor out the changes internally, then externally. @antron mentioned that the repo is quite close already and that the datatypes we exchange are clean internally. @jonludlam pointed out that we don't just produce documents and that we may want to produce output for code search, for example for merlin. @avsm also mentioned vscode as another output format with specialised html to fit in with its UI framework. The html-fragments output was then discussed, and @antron said that @dbuenzli is using the fragments in odig.
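One possible shape for the "factor out internally first" step is a registry of renderers behind a common signature. This is purely an illustrative sketch of the idea under discussion, not a design agreed at the meeting or anything present in odoc: RENDERER, register, render_with and the string list stand-in for a document are all hypothetical.

```ocaml
(* Every output format implements the same small interface. The
   document type is a stand-in (a list of paragraphs). *)
module type RENDERER = sig
  val name : string
  val render : string list -> string
end

(* Registry mapping a format name to its renderer. *)
let registry : (string, (module RENDERER)) Hashtbl.t = Hashtbl.create 8

let register (module R : RENDERER) =
  Hashtbl.replace registry R.name (module R : RENDERER)

(* Look up a renderer by name; unknown formats are reported as errors
   rather than raising. *)
let render_with name doc =
  match Hashtbl.find_opt registry name with
  | Some (module R : RENDERER) -> Ok (R.render doc)
  | None -> Error (Printf.sprintf "unknown output format: %s" name)

(* A renderer defined outside the core only needs to call register. *)
module Text_renderer = struct
  let name = "text"
  let render paragraphs = String.concat "\n\n" paragraphs
end

let () = register (module Text_renderer : RENDERER)
```

With this shape, adding a format does not touch render_with, which is the kind of modularity the discussion was about; whether the extra indirection is worth it before there are external users is exactly the concern @lpw25 raised.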
We need to make a release of Odoc that supports OCaml 4.11. There is a very small patch in master that does this but we would need to complete the css work related to the new document code. The suggestion is to release from a 1.5.0 branch as the 4.11 compatibility patch is so small.
There are several issues that need addressing on the road to replacing ocamldoc.
We need to consider the use of plugins in ocamldoc - these tend to be tricky to find as they often aren't published on opam. Ocsigen uses one to output wiki creole. xapi-project may be using one to do the documentation for the xen api - @jonludlam to check with @robhoes. Bap also has one (@ivg). @avsm volunteered to post on discuss to find users and to give warning that ocamldoc will be deprecated.
@lpw25 has been working on some patches (to pre-unified odoc) that will help with the --use-code flag. The patches are a pre-requisite to doing decent code search, which will require ranking based on usage (hence analysing ml rather than mli files). Someone is needed to finish this work. Requires cmt files, which can be many gigs in size - therefore not built by default but only as a dependency of the @doc target in dune. There was a discussion about whether ocamlformat should be used to format the code prior to inclusion in the html. @lpw25 was strongly against doing this without at least an opt out. @avsm pointed out that we need some mechanism to handle custom epub outputs with different line wrapping and described the difficulties that had been observed with Real World OCaml. We need a bidirectional ocamldoc syntax <-> markdown bridge. @avsm to put this on the RWO tasklist. He also pointed out the need for an lsp mode to help with writing ocamldoc.
Currently odoc is driven by either odig or dune, as it is harder to sequence the commands than ocamldoc. What will we do for running it to produce the ocaml manual? @octachron suggested simply extending the makefiles with the new rules. There was some discussion around using dune, and it was pointed out that building the manual already has more dependencies than building the binaries.
There was then some discussion on whether odoc could drive itself. It's not practical yet. There's not enough information in the cmt(i) files to completely recreate the dependency graph, and we'd need to replicate a lot of the build description in the inputs to odoc. @antron reported that Odig has some hacks in for similar reasons. We discussed whether it might work for a simple project, but the set of things for which this would work is very restrictive (for example, the standard library). The consensus was that there is no immediate need for this.
Ocamldoc will not stay in ocaml.git. We will keep it in a separate repository, similar to camlp4. There's no need to preserve the old CLI driver and no attempt will be made to preserve CLI compatibility. Ocamldoc has some dependence on the ASTs in ocaml.git, but @lpw25 thought we might be able to minimize this. @antron pointed out that it's far less invasive to replace the documentation generator for code than it was to move from camlp4 to ppx, so we don't expect the out-of-tree ocamldoc to live as long as camlp4 has.
tl;dr roadmap for Odoc:
- Anything absolutely essential for replacing ocamldoc
- Everything else
- Mdx parsing/promoting fragments in doc comments.
@avsm keen to have instead a syntax for promoting external files into doc comments. @lpw25 sure that people will want to write toplevel fragments inline and have mdx test/promote within them. Editor support (including merlin) most important property that @avsm cares about. Need to add toplevel support to merlin's roadmap.
- Regenerating docs.mirage.io
@avsm reported difficulties with odig not producing index files when
generating docs.mirage.io. @jonludlam said he hadn't noticed any
problems with the current version of odig and the new model branch.
@avsm asked about the status of running dune build @doc in a duniverse.
Works in new model tree, though we won't get the stdlib documentation.
Fixing this in dune requires functionality similar to jsoo - it will
run odoc on the installed external libraries and write the odoc files
into the _build tree, similar to how js_of_ocaml is run on installed
libraries and the resulting cmjs file written into _build. A start
was made on this during the dune retreat, more work required.
@ulrikstrid was interested, see https://github.com/ocaml/dune/issues/3436
@avsm to investigate further why odig wasn't working.
@lpw25 reported this issue has been blocking him. @jonludlam pleaded to wait for the new model as much time had been sunk into the issue without success.
Actions:
- @avsm to post on discuss about plans for replacing ocamldoc plugins
- @avsm to investigate odig issue on docs.mirage.io
- @jonludlam to start a definitive list of what's required to replace ocamldoc
- @jonludlam to cherry-pick 4.11 fix onto 1.5 release branch and make release
- @avsm to add toplevel support to merlin's roadmap
- @jonludlam to find out status of dune build @doc supporting external libs
- @lpw25 to dig out the branches relevant to --keep-code