Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

odoc html mld processing introduces spurious <p>s with newlines #55

Closed
dbuenzli opened this issue Jan 5, 2017 · 11 comments · May be fixed by ocaml-doc/doc-ock-html#10
Closed

odoc html mld processing introduces spurious <p>s with newlines #55

dbuenzli opened this issue Jan 5, 2017 · 11 comments · May be fixed by ocaml-doc/doc-ock-html#10
Labels

Comments

@dbuenzli
Copy link
Contributor

dbuenzli commented Jan 5, 2017

The <p> in the output below should not exist at all (they are prettified by xmltrip they contain in fact a single newline):

> cat bla.mld
{%html:<h1>Hey</h1>%}
{!modules: String}
{!modules: String}
> odoc html -o /tmp/ --index-for Bla bla.mld
> cat /tmp/Bla/index.html | xmltrip -indent
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      Bla
    </title>
    <link rel="stylesheet" href="../odoc.css"/>
    <meta charset="utf-8"/>
    <meta name="viewport" content="width=device-width,initial-scale=1.0"/>
    <meta name="generator" content="odoc 8f4e403"/>
  </head>
  <body>
    <h1>
      Hey
    </h1>
    <p>
      

    </p>
    <table class="modules">
      <tr id="listing-unresolved-String" class="anchored">
        <td class="unresolved">
          <a href="#listing-unresolved-String" class="anchor"/>
          String
        </td>
        <td class="doc"/>
      </tr>
    </table>
    <p>
      

    </p>
    <table class="modules">
      <tr id="listing-unresolved-String" class="anchored">
        <td class="unresolved">
          <a href="#listing-unresolved-String" class="anchor"/>
          String
        </td>
        <td class="doc"/>
      </tr>
    </table>
  </body>
</html>
@dbuenzli dbuenzli added the output label Jan 5, 2017
@dbuenzli
Copy link
Contributor Author

dbuenzli commented Jan 5, 2017

Having instead:

{%html:<h1>Hey</h1>%}{!modules: String}{!modules: String}

is a workaround.

@trefis
Copy link
Contributor

trefis commented Jan 6, 2017

Not sure if this is a doc-ock-html, doc-ock or an octavius problem or no problem at all.
Basically what doc-ock-html receives is:

[ Target (Some "html", "<h1>Hey</h1>")
; Raw "\n"
; Special _
; Raw "\n"
; Special _
]

Should doc-ock-html receive this input? Should it treat it the way it does?
I don't really know.
@lpw25 ?

@dbuenzli
Copy link
Contributor Author

dbuenzli commented Jan 6, 2017

FWIW IIRC the same content interpreted in an ocamldoc comment does not introduce these empty paragraphs (paragraphs with whitespace only --- excluding nbsp --- should anyways never be emitted).

@trefis
Copy link
Contributor

trefis commented Jan 9, 2017

The fix is a bit crude, this should perhaps be done inside octavius itself at the same time as #39 .

@aantron
Copy link
Contributor

aantron commented Oct 19, 2017

I looked at this issue more deeply, and a few others, and wrote a bunch of tests for Octavius. The result is that I think Octavius would be much more useful if it emitted a representation that is aware of the difference between a block element and an inline element, and thus had a notion of paragraphs (and thus trimmed newlines, avoided the Newline constructor, etc).

I intend to add it, but please let me know any objections. I can write out a long-form argument, but in the interest of time efficiency, the short version is that generators for all output formats I've ever dealt with will like to know the block structure of doc comments. At the same time, manually reconstructing block structure from what Octavius currently produces is an error-prone process, so it's better to have Octavius provide it as either a helper format, or the only format. Further, there is really only one "answer" (e.g. section headers are always blocks, aligned text is always a separate block, etc.), so we won't be making any premature decisions for any Octavius user.

The current Octavius output has also a number of leading/trailing newline inconsistencies, and inconsistencies in when it emits Newline. With a block-aware representation, those will be fixed or become meaningless.

@aantron
Copy link
Contributor

aantron commented Oct 19, 2017

Almost forgot. It would also make "natural" the choice of representing tags as a kind of block, which would allow the representation to "remember" the order in which tags had appeared in the input relative to the rest of the text. This might be useful if we ever want to add Doxygen-style @warning, @note, etc., tags later.

@lpw25
Copy link
Contributor

lpw25 commented Oct 20, 2017

It's difficult to object without you being more concrete in what you think the problem is and what your solution entails. In general my aim is to make Octavius less aware of the different kinds of tags and elements, ideally reaching a point where it knows nothing about them, so I'm not keen on forcing it to know even more about them. It also seems unlikely to be necessary, as long as Octavius preserves sufficient information from the source the consumer should be able to process the different elements as appropriate.

@lpw25
Copy link
Contributor

lpw25 commented Oct 20, 2017

Perhaps things would be clearer if you give some concrete examples where Octavius is misbehaving. Or even better submit your tests/examples as a PR.

@lpw25
Copy link
Contributor

lpw25 commented Oct 20, 2017

Having looked at the issue a bit. I think I don't object to the underlying idea of your solution. I would just like to have the bit of code you're proposing adding be in doc-ock rather than in octavius. It looks to me like the difference between the different elements would really only affect the convert function in OctParser. This is in a way the "last" bit of code in octavius before the data is passed to the outside world. It's job is essentially from a type like:

type text_item =
    Blank
  | Newline
  | Blank_line
  | String of string
  | Element of text_element

to a type equivalent to:

type text_item =
  | Blank_line
  | String of string
  | Element of text_element

I would suggest that we move this function outside of octavius and leave the Blank and Newline constructors in Octavius' output type. Then the handling of paragraphs/block elements can be put inside of odoc or doc-ock.

@lpw25
Copy link
Contributor

lpw25 commented Oct 20, 2017

I would just like to have the bit of code you're proposing adding be in odoc rather than in octavius

Since the changes will probably be restricted to the convert function, I'm also happy for them to go into octavius in the short-term -- as it would be easy to lift them out to odoc later when I finally get around to trying to make Octavius more agnostic.

@aantron
Copy link
Contributor

aantron commented Oct 20, 2017

I noticed there is a pretty rigid coupling between almost all of Octavius.Types.t and a few of the types in DocOckTypes.Documentation, for which reason I probably won't do the above change now. It would require refactoring doc-ock a bit, to treat Octavius.Types.t (or an additional, more-structured type) in a somewhat more abstract fashion. This refactoring is probably not worth doing immediately.

Without fully considering everything, my impression is that doc-ock should, instead of translating Octavius.Types.t to a very similar variant that apparently differs in the reference type, pass Octavius.Types.t (or new type) straight through (or translate, abstractly, between some functorized forms provided by Octavius). Right now, there are two expansive sets of variant types, both in the APIs of two different libraries, both potentially inconvenient, and coupled to each other, so the overhead for a contributor for adjusting them in any way is pretty high.

I'll look at fixing the mentioned markup problems in doc-ock-html or somewhere in doc-ock for now, even though I suspect I'll believe it's the wrong place to fix them.

jonludlam pushed a commit to jonludlam/odoc-parser-cleaned that referenced this issue Jul 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants