Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too much blankspace gets stripped #91

Open
hugopeixoto opened this issue Sep 15, 2020 · 1 comment
Open

Too much blankspace gets stripped #91

hugopeixoto opened this issue Sep 15, 2020 · 1 comment

Comments

@hugopeixoto
Copy link

Coming from forem/forem#8457, I think that ReverseMarkdown strips too much blankspace in some scenarios. Here's a failing test:

  it 'keeps whitespace surrounding links' do
    result = ReverseMarkdown.convert("a\n<a href='1'>link</a>\nis good\nbut blankspace is better")
    expect(result).to eq "a [link](1) is good but blankspace is better\n\n"
  end

The output is a[link](1)is good but blankspace is better\n\n. This happens because the text converter calls remove_border_newlines, and the fact that the middle line is a link means that it will be its own nokogiri node, and the three nodes will be joined with no whitespace. I tried changing remove_border_lines to squeeze instead of removing everything, but this doesn't work: it keeps whitespace in scenarios where it shouldn't:

first<p>second</p>third becomes first\n\nsecond\n\n third\n\n.

I still want to investigate this further, but I decided to post this now to share my findings.

@hugopeixoto
Copy link
Author

I was curious as to why this didn't happen with <strong> tags. The following test does succeed:

it 'keeps whitespace surrounding bolds' do
  result = ReverseMarkdown.convert("a\n<strong>potato</strong>\nis good\nbut blankspace is better")
  expect(result).to eq "a **potato** is good but blankspace is better\n\n"
end

After some puts debuggering, I see that this one also gets its blankspace trimmed in the first pass, but then cleaner.tidy adds it back again in clean_tag_borders:

converters.convert: "\n\na**potato**is good but blankspace is better\n\n"
cleaner.tidy: "a **potato** is good but blankspace is better\n\n"

So it seems that we're losing information during convert, and assuming that ** would always be preceeded by a blankspace, which gets added in tidy (clean_tag_borders). The same technique wouldn't work for links, since the surrounding blankspace doesn't need to be there for markdown links.

My previous attempt of changing remove_border_lines to use squeeze doesn't work because it would break this test:

it 'squeezes whitespace when using paragraph tags' do
  result = ReverseMarkdown.convert("first paragraph\n<p>second paragraph</p>\nthird\nin separate lines")
  expect(result).to eq "first paragraph\n\nsecond paragraph\n\nthird in separate lines\n\n"
end

with the following diff:

 first paragraph

 second paragraph

-third in separate lines
+ third in separate lines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant