`odoc html` mld processing introduces spurious <p>s with newlines #55

dbuenzli · 2017-01-05T21:00:49Z

The <p> in the output below should not exist at all (they are ~~prettified~~ by xmltrip they contain in fact a single newline):

> cat bla.mld
{%html:<h1>Hey</h1>%}
{!modules: String}
{!modules: String}
> odoc html -o /tmp/ --index-for Bla bla.mld
> cat /tmp/Bla/index.html | xmltrip -indent
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      Bla
    </title>
    <link rel="stylesheet" href="../odoc.css"/>
    <meta charset="utf-8"/>
    <meta name="viewport" content="width=device-width,initial-scale=1.0"/>
    <meta name="generator" content="odoc 8f4e403"/>
  </head>
  <body>
    <h1>
      Hey
    </h1>
    <p>
      

    </p>
    <table class="modules">
      <tr id="listing-unresolved-String" class="anchored">
        <td class="unresolved">
          <a href="#listing-unresolved-String" class="anchor"/>
          String
        </td>
        <td class="doc"/>
      </tr>
    </table>
    <p>
      

    </p>
    <table class="modules">
      <tr id="listing-unresolved-String" class="anchored">
        <td class="unresolved">
          <a href="#listing-unresolved-String" class="anchor"/>
          String
        </td>
        <td class="doc"/>
      </tr>
    </table>
  </body>
</html>

The text was updated successfully, but these errors were encountered:

dbuenzli · 2017-01-05T21:07:03Z

Having instead:

{%html:<h1>Hey</h1>%}{!modules: String}{!modules: String}

is a workaround.

trefis · 2017-01-06T13:48:53Z

Not sure if this is a doc-ock-html, doc-ock or an octavius problem or no problem at all.
Basically what doc-ock-html receives is:

[ Target (Some "html", "<h1>Hey</h1>")
; Raw "\n"
; Special _
; Raw "\n"
; Special _
]

Should doc-ock-html receive this input? Should it treat it the way it does?
I don't really know.
@lpw25 ?

dbuenzli · 2017-01-06T14:08:15Z

FWIW IIRC the same content interpreted in an ocamldoc comment does not introduce these empty paragraphs (paragraphs with whitespace only --- excluding nbsp --- should anyways never be emitted).

trefis · 2017-01-09T09:52:53Z

The fix is a bit crude, this should perhaps be done inside octavius itself at the same time as #39 .

aantron · 2017-10-19T22:58:48Z

I looked at this issue more deeply, and a few others, and wrote a bunch of tests for Octavius. The result is that I think Octavius would be much more useful if it emitted a representation that is aware of the difference between a block element and an inline element, and thus had a notion of paragraphs (and thus trimmed newlines, avoided the Newline constructor, etc).

I intend to add it, but please let me know any objections. I can write out a long-form argument, but in the interest of time efficiency, the short version is that generators for all output formats I've ever dealt with will like to know the block structure of doc comments. At the same time, manually reconstructing block structure from what Octavius currently produces is an error-prone process, so it's better to have Octavius provide it as either a helper format, or the only format. Further, there is really only one "answer" (e.g. section headers are always blocks, aligned text is always a separate block, etc.), so we won't be making any premature decisions for any Octavius user.

The current Octavius output has also a number of leading/trailing newline inconsistencies, and inconsistencies in when it emits Newline. With a block-aware representation, those will be fixed or become meaningless.

aantron · 2017-10-19T23:08:52Z

Almost forgot. It would also make "natural" the choice of representing tags as a kind of block, which would allow the representation to "remember" the order in which tags had appeared in the input relative to the rest of the text. This might be useful if we ever want to add Doxygen-style @warning, @note, etc., tags later.

lpw25 · 2017-10-20T08:23:56Z

It's difficult to object without you being more concrete in what you think the problem is and what your solution entails. In general my aim is to make Octavius less aware of the different kinds of tags and elements, ideally reaching a point where it knows nothing about them, so I'm not keen on forcing it to know even more about them. It also seems unlikely to be necessary, as long as Octavius preserves sufficient information from the source the consumer should be able to process the different elements as appropriate.

lpw25 · 2017-10-20T08:26:05Z

Perhaps things would be clearer if you give some concrete examples where Octavius is misbehaving. Or even better submit your tests/examples as a PR.

lpw25 · 2017-10-20T09:04:34Z

Having looked at the issue a bit. I think I don't object to the underlying idea of your solution. I would just like to have the bit of code you're proposing adding be in doc-ock rather than in octavius. It looks to me like the difference between the different elements would really only affect the convert function in OctParser. This is in a way the "last" bit of code in octavius before the data is passed to the outside world. It's job is essentially from a type like:

type text_item =
    Blank
  | Newline
  | Blank_line
  | String of string
  | Element of text_element

to a type equivalent to:

type text_item =
  | Blank_line
  | String of string
  | Element of text_element

I would suggest that we move this function outside of octavius and leave the Blank and Newline constructors in Octavius' output type. Then the handling of paragraphs/block elements can be put inside of odoc or doc-ock.

lpw25 · 2017-10-20T10:50:08Z

I would just like to have the bit of code you're proposing adding be in odoc rather than in octavius

Since the changes will probably be restricted to the convert function, I'm also happy for them to go into octavius in the short-term -- as it would be easy to lift them out to odoc later when I finally get around to trying to make Octavius more agnostic.

aantron · 2017-10-20T13:24:53Z

I noticed there is a pretty rigid coupling between almost all of Octavius.Types.t and a few of the types in DocOckTypes.Documentation, for which reason I probably won't do the above change now. It would require refactoring doc-ock a bit, to treat Octavius.Types.t (or an additional, more-structured type) in a somewhat more abstract fashion. This refactoring is probably not worth doing immediately.

Without fully considering everything, my impression is that doc-ock should, instead of translating Octavius.Types.t to a very similar variant that apparently differs in the reference type, pass Octavius.Types.t (or new type) straight through (or translate, abstractly, between some functorized forms provided by Octavius). Right now, there are two expansive sets of variant types, both in the APIs of two different libraries, both potentially inconvenient, and coupled to each other, so the overhead for a contributor for adjusting them in any way is pretty high.

I'll look at fixing the mentioned markup problems in doc-ock-html or somewhere in doc-ock for now, even though I suspect I'll believe it's the wrong place to fix them.

dbuenzli added the output label Jan 5, 2017

trefis closed this as completed in ocaml-doc/doc-ock-html@bfe79a1 Jan 9, 2017

aantron mentioned this issue Sep 27, 2017

Don't drop whitespace between formatted elements ocaml-doc/doc-ock-html#10

Open

jonludlam pushed a commit to jonludlam/odoc-parser-cleaned that referenced this issue Jul 1, 2021

don't emit empty paragraphs (fix ocaml/odoc#55)

23e4d33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`odoc html` mld processing introduces spurious <p>s with newlines #55

`odoc html` mld processing introduces spurious <p>s with newlines #55

dbuenzli commented Jan 5, 2017 •

edited

Loading

dbuenzli commented Jan 5, 2017

trefis commented Jan 6, 2017

dbuenzli commented Jan 6, 2017

trefis commented Jan 9, 2017

aantron commented Oct 19, 2017

aantron commented Oct 19, 2017

lpw25 commented Oct 20, 2017

lpw25 commented Oct 20, 2017

lpw25 commented Oct 20, 2017 •

edited

Loading

lpw25 commented Oct 20, 2017

aantron commented Oct 20, 2017

odoc html mld processing introduces spurious <p>s with newlines #55

odoc html mld processing introduces spurious <p>s with newlines #55

Comments

dbuenzli commented Jan 5, 2017 • edited Loading

dbuenzli commented Jan 5, 2017

trefis commented Jan 6, 2017

dbuenzli commented Jan 6, 2017

trefis commented Jan 9, 2017

aantron commented Oct 19, 2017

aantron commented Oct 19, 2017

lpw25 commented Oct 20, 2017

lpw25 commented Oct 20, 2017

lpw25 commented Oct 20, 2017 • edited Loading

lpw25 commented Oct 20, 2017

aantron commented Oct 20, 2017

`odoc html` mld processing introduces spurious <p>s with newlines #55

`odoc html` mld processing introduces spurious <p>s with newlines #55

dbuenzli commented Jan 5, 2017 •

edited

Loading

lpw25 commented Oct 20, 2017 •

edited

Loading