Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiline strings (or block scalars) are not preserved when they have trailing white space. #566

Open
bryant-ferguson opened this issue Oct 16, 2020 · 7 comments
Labels

Comments

@bryant-ferguson
Copy link

Describe the bug
Multiline strings (or block scalars) are not preserved when they have trailing white space. They are converted to quoted strings.

version of yq: 3.3.4
operating system: Linux Mint 19

Input Yaml

this: |
  should 
  really work

Command
The command you ran:

yq merge data.yml

Actual behavior

this: "should \nreally work\n"

Expected behavior

this: |
  should 
  really work
@Orrimp
Copy link

Orrimp commented Jun 20, 2023

We have the same issue and its important for us because it destroys the configuration.

@arikkfir
Copy link

arikkfir commented Sep 7, 2023

Same here - is this something to be expected in yq?

@arikkfir
Copy link

arikkfir commented Sep 8, 2023

Ok just discovered this is a duplicate of #1681 and #1277, which both depend on go-yaml/yaml#880

@mikefarah perhaps it would be best to close #1681 and #1277 as duplicates of this issue (since this one is the oldest)

arikkfir added a commit to arikkfir/delivery that referenced this issue Sep 8, 2023
The reason #9312c1f did not preserve YAML multi-line block is due to a
trailing space in one of the lines; this is a bug in "yq" documented in
mikefarah/yq#566 which depends on go-yaml/yaml#880
@rattboi
Copy link

rattboi commented Oct 4, 2023

I spent some time investigating this yesterday, and found the related code within go-yaml/yaml that could potentially be changed to fix this. However, I also see that go-yaml/yaml currently has 112 PRs that have gone unreviewed, most without comment.

I almost spent time working out a fix in the upstream library, but I'm wondering if @mikefarah would be open to another direction, i.e. using https://github.com/goccy/go-yaml/ for more of the standard parsing / lexing / encoding / decoding within yq. I see it's already in use for one specific use-case (colorized output), but it could likely be used to replace all go-yaml/yaml.

It may be an undertaking, but as upstream of go-yaml/yaml seems effectively dead, and benchmarks show https://github.com/goccy/go-yaml/ as being generally 2x faster in operations, it seems like a potential solution for these yaml-decoding issues that are blocked by upstream.

There's a simple ycat command bundled with goccy/go-yaml, but it was enough to see how it would handle literal scalars with trailing spaces:

echo -ne 'foo: |\n  bar:2    \n  baz:3\n  #foo' | go run ycat.go /dev/stdin
 1 | foo: |
 2 |   bar:2    
 3 |   baz:3
 4 |   #foo

@bradonkanyid
Copy link

bradonkanyid commented Oct 4, 2023

fwiw, these are the bits of the code in go-yaml/yaml that are involved in this decision

Determining if there are trailing spaces in a scalar:
https://github.com/go-yaml/yaml/blob/f6f7691b1fdeb513f56608cd2c32c51f8194bf51/emitterc.go#L1335-L1346

                if is_space(value, i) {
                        if i == 0 {
                                leading_space = true
                        }
                        if i+width(value[i]) == len(value) {
                                trailing_space = true
                        }
                        ...
                }
                ...

Disabling various ways to render based on trailing spaces being identified:
https://github.com/go-yaml/yaml/blob/f6f7691b1fdeb513f56608cd2c32c51f8194bf51/emitterc.go#L1375-L1381

        if leading_space || leading_break || trailing_space || trailing_break {
                emitter.scalar_data.flow_plain_allowed = false
                emitter.scalar_data.block_plain_allowed = false
        }
        if trailing_space {
                emitter.scalar_data.block_allowed = false
        }

The actual decision to render in double-quoted scalar format:
https://github.com/go-yaml/yaml/blob/f6f7691b1fdeb513f56608cd2c32c51f8194bf51/emitterc.go#L1039-L1043

        if style == yaml_LITERAL_SCALAR_STYLE || style == yaml_FOLDED_SCALAR_STYLE {
                if !emitter.scalar_data.block_allowed || emitter.flow_level > 0 || emitter.simple_key_context {
                        style = yaml_DOUBLE_QUOTED_SCALAR_STYLE
                }
        }

@vit-zikmund
Copy link
Contributor

Hi there, Mike!

While mr. Niemeyer has sensible reasons not to allow other maintainers on the project, in fact the repo hasn't been touched for 2 years now. The suggested project seems a bit more alive, but in the end it's yet another single-person-team project :( It has a half of opened issues, although that also doesn't have to mean anything and I understand the risk of porting yq to it (if that's even a possibility without a major rewrite). But where to go from here? go-yaml has a ridiculous number of forks... I guess I know why 🤕

I'd say yq became so better these days, running on those rusty wheels must be getting challenging to say the least. Not being able to provide any real advice 🙈, I'm also curious, whether moving to goccy's go-yaml is being considered.

Thank you!

@pascal-hofmann
Copy link

I use the following as a workaround for now. It removes all trailing spaces from block literals.

yq -i '... |= (select(tag == "!!str" and style == "literal") | sub(" *\n", "\n"))' file.yaml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants