Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stable anchors in links to implementation. #993

Merged
merged 43 commits into from
Sep 18, 2023

Conversation

panglesd
Copy link
Collaborator

Instead of relying on internals of the compiler to generate anchors in source code rendering, this PR makes "semantic" anchors; thus improving stability between OCaml versions/changes in the implementation.

It is done in the following way:

  • For local identifiers (unqualified identifiers):
    • For definition, we generate an id from the name and the position, since names might not be unique. Example of such id: local_elt_7165
    • we build a table from Ident.t to id,
    • that we use for occurrences (since occurrences don't have access to the location of the definition, an information needed to generate the anchor)
  • For global identifiers (that have a Uid.t)
    • When traversing the typedtree, we record the parent structure to generate semantic id from the names. Example of such id: module-Reference.value-render_resolved.
    • We build a table loc_to_id that we combine with the uid_to_loc table to have a uid_to_id table, and store this table in .odoc files.
    • After resolving shapes, we use the corresponding table to get the id and the link right for occurrences.

Note: This PR is extracted from https://github.com/jonludlam/odoc/tree/source-locations-3, leaving the "Global jump" part aside. It contains some refactoring.

test/sources/source.t/run.t Outdated Show resolved Hide resolved
test/sources/source.t/run.t Outdated Show resolved Hide resolved
src/loader/implementation.ml Outdated Show resolved Hide resolved
src/syntax_highlighter/syntax_highlighter.ml Outdated Show resolved Hide resolved
src/loader/ident_env.cppo.ml Outdated Show resolved Hide resolved
src/loader/ident_env.cppo.ml Show resolved Hide resolved
@panglesd
Copy link
Collaborator Author

Thanks @Julow for the review!

I've added one "nontrivial" commit: it seems that functor parameter do have a Uid in the uid_to_loc table, so I added that to the traverse. But I had to change a bit the environment so that ids for functor parameter are only created once.

@panglesd panglesd force-pushed the semantic-anchors branch 2 times, most recently from 4766011 to 37bc8de Compare August 31, 2023 07:21
Copy link
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few remarks but otherwise this looks ready to be merged :)

src/loader/shape_.mli Outdated Show resolved Hide resolved
in
defs @ poses

let string_of_full_name_ty : Odoc_model.Paths.Identifier.full_name_ty -> string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have strings for the different kinds and they are different: https://github.com/ocaml/odoc/blob/master/src/document/url.ml#L109, https://github.com/ocaml/odoc/blob/master/src/xref2/ref_tools.ml#L60 and https://github.com/ocaml/odoc/blob/master/src/model/reference.ml#L33.

It would be weird to see different strings for the same things in the generated doc and in future uses of the new full_name function.

Copy link
Collaborator Author

@panglesd panglesd Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed that in 7804bc0 if you want to have a look!
Basically, I removed full_name (in any case, it did not included enough information to generate unique anchors), and reused the Odoc_document.Url.Anchor.kind.

src/model/paths_types.ml Outdated Show resolved Hide resolved
src/loader/shape_.mli Outdated Show resolved Hide resolved
@panglesd
Copy link
Collaborator Author

panglesd commented Sep 5, 2023

I added support for having anchors to constructors, but they will only get one when ocaml/ocaml#12508 is merged (more specifically, ocaml/ocaml@6c1fa73).

@panglesd panglesd force-pushed the semantic-anchors branch 2 times, most recently from fdb99d9 to 23ec596 Compare September 6, 2023 08:43
@jonludlam
Copy link
Member

I know I suggested the foo_<location> syntax for anchors for local identifiers, but I'm now wondering whether this is actually stable in the presence of line-ending normalization on windows? @dra27?

| DefJmp of Shape.Uid.t

module Analysis = struct
let ( @ ) = List.rev_append
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the redefinition of @ to be semantically different from the usual @! It's confusing.

Also I'm not sure that we care too much about the ordering of things do we? There are a bunch of List.revs below that can be removed if we don't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can turn the "lift" into a "fold" and carry an accumulator around, that would be the best option.

Yes, the order does not matter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned the traverse into a fold with an accumulator in 9361f0a. Hopefully it's better like that, with less list concatenations!

jonludlam and others added 16 commits September 13, 2023 22:00
These should only be referred to by elements on that same source page.
This is in contrast with the `SourceLocation identifier, which uses an
anchor found using the Shapes API, and can refer across source pages.

This commit introduces the constructor but it is currently unused.
This fixes the problem of having anchors that are too dependent on
compiler internals by naming the anchor after the position in the
source of the particular value.
This is a general function that is intended to be used for naming specific
identifiers uniquely.
Signed-off-by: Paul-Elliot <[email protected]>
Co-authored-by: Paul-Elliot <[email protected]>
"Implementation loading" and "shape resolution" go to their own module.
Cleaned up some test

Signed-off-by: Paul-Elliot <[email protected]>
They get an entry in the `uid_to_loc` table, so we need to add them to the
`loc_to_ident` table as well for semantic anchor to work.

Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
panglesd and others added 11 commits September 13, 2023 22:08
Co-authored-by: Jules Aguillon <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
The anchors are different, since source code anchors needs the full path, but at
least the qualification is similar.

Replaces `full_name` which did not included enough information to disambiguate
functor parameters.

Signed-off-by: Paul-Elliot <[email protected]>
This will allow to have correct anchors to constructors, as soon as they are
added to the `uid_to_loc` table.

Signed-off-by: Paul-Elliot <[email protected]>
for more self-explanatoriness

Signed-off-by: Paul-Elliot <[email protected]>
Now that anchors are semantic, they should not depend on the OCaml
version.
Also simplify the interfaces of the loader a bit.
@jonludlam jonludlam mentioned this pull request Sep 14, 2023
|> handle_warnings ~input_warnings ~warnings_options
>>= fun (m, warnings) ->
Odoc_file.save_unit output ~warnings (m, shape);
let m = {m with Odoc_model.Lang.Compilation_unit.shape = None} in
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it make more sense to modify the compilation unit record only in the link_unit function?
When a record is modified in many place, I find it difficult to have an idea of which field will go into the file...

Moreover, it would be nice to add a small comment, just in case it is not clear later (shapes are not used after the link phase).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was aiming for 'as close as possible to the save function' - the comment is a good idea.

@panglesd
Copy link
Collaborator Author

I know I suggested the foo_ syntax for anchors for local identifiers, but I'm now wondering whether this is actually stable in the presence of line-ending normalization on windows?

I tested and yes, the "same" file with windows and unix ending won't have the same locations. (One could argue that it's not really the same file...)

We cannot use filename/line number/pos_bol since the anchor might not be unique, due to line directives.
Maybe the best option is to use our own counter for those anchors!

panglesd and others added 2 commits September 15, 2023 15:08
So that adding some content at the end of the file does not modify
anchors that are above.

Signed-off-by: Paul-Elliot <[email protected]>
Copy link
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot has changed in a small amount of time making the review difficult.
But I think this is ready to merge!

the compiler and load it in odoc x.y compiled with a different
version of the compiler. This is an important use case for
voodoo. *)
let m = { m with Odoc_model.Lang.Compilation_unit.shape = None } in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And uid_to_id as well ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh yes, I had originally thought that wouldn't be needed but you're right, we do need to sort that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, I'm rather more nervous about putting an empty map in there than a None. Might need some more options...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, fixed, I think!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice :)

save an odocl file with odoc x.y compiled with one version of
the compiler and load it in odoc x.y compiled with a different
version of the compiler. This is an important use case for
voodoo. *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these assumptions are true. Perhaps just say that we avoid adding more constraint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworded it.

@@ -45,24 +42,44 @@ exception Not_an_interface

exception Make_root_error of string

#if OCAML_VERSION >= (4, 14, 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this cppo contamination is not needed, this function could be moved to Implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a nice idea, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dra27
Copy link
Member

dra27 commented Sep 17, 2023

I know I suggested the foo_ syntax for anchors for local identifiers, but I'm now wondering whether this is actually stable in the presence of line-ending normalization on windows?

I tested and yes, the "same" file with windows and unix ending won't have the same locations. (One could argue that it's not really the same file...)

We cannot use filename/line number/pos_bol since the anchor might not be unique, due to line directives. Maybe the best option is to use our own counter for those anchors!

Does the compiler keep actual locations as well? Not looked, but presumably Merlin uses actual locations (insensitive to #line directives)?

In general, the key thing for being LF/CRLF compatible is to ensure that the No measurement can ever span a new line. So pos_bol is unusable but pos_cnum is fine. Similarly, for spans you need to start lnum+cnum and end lnum+cnum rather than just end offset.

@jonludlam
Copy link
Member

OK, I think any other issues with this are minor and can be fixed up in follow-up PRs. In it goes!

@jonludlam jonludlam merged commit 86f33b1 into ocaml:master Sep 18, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants