Sentence-based word wrapping #4

hukkin · 2020-07-15T10:26:27Z

Experiment with something like

from nltk import tokenize
sentences = tokenize.sent_tokenize(paragraph)

and find out if we can implement sentence-based word wrapping to reduce diffs.
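To make the "reduce diffs" motivation concrete, here is a minimal sketch that compares how many lines change after a small edit under fixed-width wrapping versus one sentence per line. It uses only the standard library. `naive_sentences` and `changed_lines` are hypothetical helpers written for this illustration, and the regex splitter is a crude stand-in for `sent_tokenize`, not a replacement for it.

```python
import difflib
import re
import textwrap


def naive_sentences(paragraph: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace.

    Assumption: a crude stand-in for a real sentence tokenizer; it will
    mis-split abbreviations such as "Dr.".
    """
    return re.split(r"(?<=[.!?])\s+", paragraph.strip())


def changed_lines(before: list[str], after: list[str]) -> int:
    """Count the lines a line-based diff reports as added or removed."""
    return sum(
        1
        for line in difflib.ndiff(before, after)
        if line.startswith(("+ ", "- "))
    )


old = (
    "The parser reads the file. It then builds a tree of tokens. "
    "Finally the renderer writes the result back to disk."
)
# A small edit confined to the first sentence.
new = old.replace("reads the file", "reads the whole input file")

# Fixed-width wrapping: the edit shifts every following line break.
width_diff = changed_lines(textwrap.wrap(old, 40), textwrap.wrap(new, 40))
# One sentence per line: only the edited sentence shows up in the diff.
sentence_diff = changed_lines(naive_sentences(old), naive_sentences(new))

print(width_diff, sentence_diff)
```

With this input the sentence-per-line diff touches only the edited sentence (one line removed, one added), while the width-based wrap reflows the lines after it as well.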

Implement as an option; don't change the default mode (which preserves wrapping).

hukkin · 2021-04-27T22:29:23Z

Due to the nltk dependency and the many, many bugs that I expect, I think any work should be started (and most likely stay) in a plugin.

choldgraf · 2021-06-09T21:12:35Z

100%, a plugin is the right place to experiment with this. I suspect it will lead to many unexpected or unpredictable side effects, which is the last thing you want in a black-style program :-)

jspaezp · 2022-09-15T00:06:37Z

Hello there!

I thought A LOT about this issue in the last couple of days and wanted to pitch an idea.
Inspired by the way black handles docstrings, where a lot of the time "if it can fit in a line of less than 88 chars it should", could a "lazy" implementation be to break on punctuation unless the resulting section would be shorter than X characters?

I think it would give predictable behaviour and greatly reduce diff sizes. It will sometimes lead to uglier docstrings, but they will not look ugly when rendered to HTML.

start an empty chunk
for a given paragraph, start from the end
    append to the chunk until a punctuation mark is found
        if the chunk is longer than X (I don't know ... 42 characters?)
            yield the chunk (on a separate new line)
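The pseudocode above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code of any published plugin: `lazy_sentence_wrap` and `MIN_CHUNK` are hypothetical names, the sketch scans forward rather than from the end, it treats `. , ; : ! ?` followed by whitespace as break candidates, and it folds a too-short trailing remainder into the previous line.

```python
import re

# Hypothetical threshold; 42 is only the guess floated in the pseudocode.
MIN_CHUNK = 42


def lazy_sentence_wrap(paragraph: str, min_chunk: int = MIN_CHUNK) -> list[str]:
    """Break after punctuation, but only once the line has min_chunk characters."""
    # Collapse existing line breaks and runs of whitespace into single spaces.
    text = " ".join(paragraph.split())
    # Split after punctuation followed by whitespace; the mark stays on the left piece.
    pieces = re.split(r"(?<=[.,;:!?])\s+", text)

    lines: list[str] = []
    chunk = ""
    for piece in pieces:
        chunk = f"{chunk} {piece}" if chunk else piece
        if len(chunk) >= min_chunk:
            lines.append(chunk)
            chunk = ""

    if chunk:
        if lines:
            # Assumption: a short leftover joins the previous line instead of standing alone.
            lines[-1] = f"{lines[-1]} {chunk}"
        else:
            lines.append(chunk)
    return lines


example = (
    "The first sentence is comfortably longer than the threshold. "
    "So is the second one, which has a comma in it. Ok."
)
for line in lazy_sentence_wrap(example):
    print(line)
```

Because the split is purely punctuation-based, an abbreviation followed by a space is also a break candidate, so the threshold only hides that problem for short lines.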

Let me know what you think (I am trying to find problems with my approach).

Temple of Doom was discovered by Dr. Jones.

Ended up implementing a version of this, regex-based and including support for some other stuff.
I want to play with it a bit more before publishing it, but it looks promising.
LMK if there are things you feel it should support that I have not tested.

https://github.com/jspaezp/mdformat-sentencebreak

hukkin added the "enhancement" label (New feature or request) Jul 15, 2020

hukkin added the "research" label (Research needs to be done) Jan 21, 2021

hukkin changed the title ~~Implement word wrapping that optimizes for smaller diffs~~ Word wrapping that optimizes for smaller diffs Jan 21, 2021

hukkin added the "plugin" label (A plugin should be created or updated) Apr 27, 2021

hukkin mentioned this issue Jun 9, 2021

Support enforcing line breaks after "end of sentence" #222

Closed

hukkin changed the title ~~Word wrapping that optimizes for smaller diffs~~ Sentence-based word wrapping Jun 9, 2021

hukkin mentioned this issue Dec 15, 2022

Allow wrapping option of 1 sentence per line #374

Closed

yarikoptic mentioned this issue Nov 18, 2023

leads to "breakage" according to mdformat internal check: jspaezp/mdformat-sentencebreak#1

Open

hukkin mentioned this issue Jan 30, 2024

Word-wrap on sentence (punctuation) and width. #422

Closed

