goldmark-jupyter: support cell attachments in markdown cells (#5)
This new package provides 2 extensions:
- jupyter.Attachments (goldmark)
- jupyter.Goldmark (nb)
bevzzz authored Jan 26, 2024
1 parent b833cf0 commit 574bb1e
Showing 9 changed files with 484 additions and 0 deletions.
21 changes: 21 additions & 0 deletions extension/extra/goldmark-jupyter/LICENCE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Dmytro Solovei

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
85 changes: 85 additions & 0 deletions extension/extra/goldmark-jupyter/README.md
@@ -0,0 +1,85 @@
# goldmark-jupyter

From `nbformat` documentation:

```txt
Markdown (and raw) cells can have a number of attachments, typically inline images, that can be referenced in the markdown content of a cell. 🖇
(punctuation mine)
```

`goldmark-jupyter` helps [`goldmark`](https://github.com/yuin/goldmark) recognise [cell attachments](https://nbformat.readthedocs.io/en/latest/format_description.html#cell-attachments) and include them in the rendered markdown correctly.


| `goldmark` | `goldmark-jupyter` |
| ----------- | ----------- |
| ![img](./assets/goldmark.png) | ![img](./assets/goldmark-jupyter.png) |

## Installation

```sh
go get github.com/bevzzz/nb/extension/extra/goldmark-jupyter
```

## Usage

Package `goldmark-jupyter` exports 2 dedicated extensions for `goldmark` and `nb`, which should be used together like so:

```go
import (
	"os"

	"github.com/bevzzz/nb"
	jupyter "github.com/bevzzz/nb/extension/extra/goldmark-jupyter"
	"github.com/yuin/goldmark"
)

md := goldmark.New(
	goldmark.WithExtensions(
		jupyter.Attachments(),
	),
)

c := nb.New(
	nb.WithExtensions(
		jupyter.Goldmark(md),
	),
)

// b holds the raw contents of the notebook being converted.
if err := c.Convert(os.Stdout, b); err != nil {
	panic(err)
}
```

`Attachments` extends the default `goldmark.Markdown` with a custom link parser and an image renderer. The renderer accepts goldmark's `html.Option` values, which can be passed to the constructor:

```go
import (
	jupyter "github.com/bevzzz/nb/extension/extra/goldmark-jupyter"
	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/renderer/html"
)

md := goldmark.New(
	goldmark.WithExtensions(
		jupyter.Attachments(
			html.WithXHTML(),
			html.WithUnsafe(),
		),
	),
)
```

Note, however, that options not applicable to image rendering will have no effect. At the time of writing, `goldmark v1.6.0` references these options when rendering images:

- `WithXHTML()`
- `WithUnsafe()`
- `WithWriter(w)`
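
As a rough illustration of which image destinations the custom link parser treats as attachments, the sketch below reuses the `attachedFile` pattern from `attachment.go` in this commit. The `attachmentName` helper is a hypothetical wrapper added for the example and is not part of the package. Note that the pattern only accepts names made of word characters with a single extension, so a destination such as `attachment:my-image.png` is not recognised and would be rendered as an ordinary image URL.

```go
package main

import (
	"fmt"
	"regexp"
)

// attachedFile is the pattern used in attachment.go to extract
// the attached file's name from an image destination.
var attachedFile = regexp.MustCompile(`attachment:(\w+\.\w+)$`)

// attachmentName returns the file name referenced by dest and reports
// whether dest points to a cell attachment at all.
// Hypothetical helper for this sketch only.
func attachmentName(dest string) (string, bool) {
	m := attachedFile.FindStringSubmatch(dest)
	if len(m) < 2 {
		return "", false
	}
	return m[1], true
}

func main() {
	destinations := []string{
		"attachment:image.png",    // attachment reference
		"./assets/goldmark.png",   // regular relative URL
		"attachment:my-image.png", // "-" is not a word character
	}
	for _, dest := range destinations {
		name, ok := attachmentName(dest)
		fmt.Printf("%q: name=%q attachment=%v\n", dest, name, ok)
	}
}
```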

## Contributing

Thank you for giving `goldmark-jupyter` a run!

If you find a bug that needs fixing or a feature that needs adding, please consider describing it in an issue or opening a PR.

## License

This software is released under [the MIT License](https://opensource.org/license/mit/).
206 changes: 206 additions & 0 deletions extension/extra/goldmark-jupyter/attachment.go
@@ -0,0 +1,206 @@
// Package jupyter provides extensions for goldmark and nb. Together they add support
// for inline images, which have their data stored as cell attachments, in markdown cells.
//
// How it is achieved:
//
// 1. Goldmark extends nb with a custom "markdown" cell renderer which
// stores cell attachments to the parser.Context on every render.
//
// 2. Attachments extends goldmark with a custom link parser (ast.KindLink)
// and an image NodeRenderFunc.
//
// The parser is context-aware: for every image link whose destination looks like
// "attachment:image.png", it gets the related mime-bundle from the context
// and stores it in the node's attributes.
//
// The custom image renderer writes base64-encoded data from the mime-bundle if one is present,
// falling back to the destination URL otherwise.
package jupyter

import (
	"io"
	"regexp"

	"github.com/bevzzz/nb"
	"github.com/bevzzz/nb/extension"
	"github.com/bevzzz/nb/schema"
	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/ast"
	"github.com/yuin/goldmark/parser"
	"github.com/yuin/goldmark/renderer"
	"github.com/yuin/goldmark/renderer/html"
	"github.com/yuin/goldmark/text"
	"github.com/yuin/goldmark/util"
)

// Attachments adds support for Jupyter [cell attachments] to goldmark parser and renderer.
//
// [cell attachments]: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-attachments
func Attachments(opts ...html.Option) goldmark.Extender {
	c := html.NewConfig()
	for _, opt := range opts {
		opt.SetHTMLOption(&c)
	}
	return &attachments{
		config: c,
	}
}

// Goldmark overrides the default rendering function for markdown cells
// and stores cell attachments in the parser.Context on every render.
func Goldmark(md goldmark.Markdown) nb.Extension {
	return extension.NewMarkdown(
		func(w io.Writer, c schema.Cell) error {
			ctx := newContext(c)
			return md.Convert(c.Text(), w, parser.WithContext(ctx))
		},
	)
}

var (
	// key is a context key for storing cell attachments.
	key = parser.NewContextKey()

	// name is the name of a node attribute that holds the mime-bundle.
	// This package uses node attributes as a proxy for rendering context,
	// so <mime-bundle> will never be added to the HTML output. The name is
	// intentionally [invalid] to avoid name clashes with other potential attributes.
	//
	// [invalid]: https://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#attributes-0
	name = []byte("<mime-bundle>")
)

// newContext adds mime-bundles from cell attachments to a new parser.Context.
func newContext(cell schema.Cell) parser.Context {
	ctx := parser.NewContext()
	if c, ok := cell.(schema.HasAttachments); ok {
		ctx.Set(key, c.Attachments())
	}
	return ctx
}

// linkParser adds base64-encoded image data from parser.Context to node's attributes.
type linkParser struct {
	link parser.InlineParser // link is goldmark's default link parser.
}

func newLinkParser() *linkParser {
	return &linkParser{
		link: parser.NewLinkParser(),
	}
}

var _ parser.InlineParser = (*linkParser)(nil)

func (p *linkParser) Trigger() []byte {
	return p.link.Trigger()
}

// attachedFile retrieves the name of the attached file from the link's destination.
var attachedFile = regexp.MustCompile(`attachment:(\w+\.\w+)$`)

// Parse stores mime-bundle in node attributes for links whose destination is an attachment.
func (p *linkParser) Parse(parent ast.Node, block text.Reader, pc parser.Context) (n ast.Node) {
	n = p.link.Parse(parent, block, pc)

	img, ok := n.(*ast.Image)
	if !ok {
		// goldmark's default link parser will return a "state node" whenever it's triggered
		// by the opening bracket of the link's alt-text "[" or any intermediate characters.
		// We only want to intercept when the link is done parsing and we get a valid *ast.Image.
		return n
	}

	submatch := attachedFile.FindSubmatch(img.Destination)
	if len(submatch) < 2 {
		return
	}
	filename := submatch[1]

	att, ok := pc.Get(key).(schema.Attachments)
	if att == nil || !ok {
		return
	}

	// Look up the mime-bundle for the attached file and store it in the node's
	// attributes, where the image renderer can find it.
	data := att.MimeBundle(string(filename))
	n.SetAttribute(name, data)
	return
}

// image renders inline images from cell attachments.
type image struct {
	html.Config
}

var _ renderer.NodeRenderer = (*image)(nil)

func (img *image) RegisterFuncs(reg renderer.NodeRendererFuncRegisterer) {
	reg.Register(ast.KindImage, img.render)
}

// render borrows heavily from goldmark's [renderImage].
//
// [renderImage]: https://github.com/yuin/goldmark/blob/90c46e0829c11ca8d1010856b2a6f6f88bfc68a3/renderer/html/html.go#L673
func (img *image) render(w util.BufWriter, source []byte, node ast.Node, entering bool) (ast.WalkStatus, error) {
	if !entering {
		return ast.WalkContinue, nil
	}

	n := node.(*ast.Image)
	_, _ = w.WriteString("<img src=\"")

	attr, hasAttachments := n.Attribute(name)
	if !hasAttachments {
		if img.Unsafe || !html.IsDangerousURL(n.Destination) {
			_, _ = w.Write(util.EscapeHTML(util.URLEscape(n.Destination, true)))
		}
	} else if mb, ok := attr.(schema.MimeBundle); ok {
		// Here we do not need to extract the filename again, as it is sufficient
		// that the mime-bundle is present in the attributes.
		io.WriteString(w, "data:")
		io.WriteString(w, mb.MimeType())
		io.WriteString(w, ";base64, ")
		w.Write(mb.Text())
	}

	_, _ = w.WriteString(`" alt="`)
	_, _ = w.Write(nodeToHTMLText(n, source))
	_ = w.WriteByte('"')

	if n.Title != nil {
		_, _ = w.WriteString(` title="`)
		img.Writer.Write(w, n.Title)
		_ = w.WriteByte('"')
	}

	if n.Attributes() != nil {
		html.RenderAttributes(w, n, html.ImageAttributeFilter)
	}

	if img.XHTML {
		_, _ = w.WriteString(" />")
	} else {
		_, _ = w.WriteString(">")
	}

	return ast.WalkSkipChildren, nil
}

// attachments implements goldmark.Extender.
type attachments struct {
	config html.Config
}

var _ goldmark.Extender = (*attachments)(nil)

// Extend adds a custom link parser and an image renderer.
//
// Priorities are selected based on the ones used in goldmark.
func (a *attachments) Extend(md goldmark.Markdown) {
	md.Parser().AddOptions(
		parser.WithInlineParsers(util.Prioritized(newLinkParser(), 199)), // default: 200
	)
	md.Renderer().AddOptions(
		renderer.WithNodeRenderers(util.Prioritized(&image{Config: a.config}, 999)), // default: 1000
	)
}