Unpacked string bytes are null-padded, which affects string comparisons #42

jblackman · 2016-03-04T14:41:18Z

This is an interesting side-effect. If an encoded character array is shorter than the field length, it will be padded out with byte(0), as you would expect. When you unpack into a string, these extra bytes make it into the string variable's underlying byte array. As a result, a comparison of this string with a string literal will return false.

For example:

package main

import (
    "bytes"
    "fmt"
    "github.com/lunixbochs/struc"
)

type Test struct {
    Str string `struc:"[10]byte"`
}

func main() {
    payload := "Foo"
    payloadSlice := make([]byte, 10)
    copy(payloadSlice, []byte(payload))
    buf := bytes.NewBuffer(payloadSlice)
    t := &Test{}
    if err := struc.Unpack(buf, t); err != nil {
        fmt.Println("Unpack failed:", err)
        return
    }
    if t.Str == "Foo" {
        fmt.Println("Comparison is OK")
    } else {
        fmt.Println(t.Str, "does not equal Foo")
        fmt.Println("t.Str = ", []byte(t.Str))
    }
}

If you run this, you will get:

Foo does not equal Foo
t.Str =  [70 111 111 0 0 0 0 0 0 0]

So, is it worth fixing this? IMO, yes. Hunting into field.go, we could alter this line:

val.SetString(string(buf))

perhaps to:

val.SetString( bytes.Trim(buf, "\x00") )

or, if that will add too much overhead, loop to the first null and use a truncated slice:

idx := bytes.IndexByte(buf, 0)
if idx > length {
    idx = length
}
val.SetString(buf[:idx])

wdyt?

The text was updated successfully, but these errors were encountered:

lunixbochs · 2016-03-04T17:35:50Z

duplicate of #31

you could want to unpack a value into a string which legitimately contains null characters.

jblackman · 2016-03-04T17:56:34Z

A fairly edge case, though - wouldn't you generally be unpacking that into a byte array?

lunixbochs · 2016-03-04T18:05:36Z

Strings can be slightly easier to work with. The main difference in Go is mutability. We shouldn't force C's shortcomings on every application - someone might want to, say, unpack then decode utf16 data, which wouldn't work if I stopped at the first null byte.

jblackman · 2016-03-04T18:09:22Z

A very good point.

jchv · 2016-03-21T15:22:25Z

Wait, that argument doesn't really hold. UTF-16 would very logically be []uint16 - and if you were to ignore that fact, you'd be in for some pain when trying to use I.E. unicode/utf16. Further, while it is explicitly true that Go strings are not defined to be UTF-8 data, using them to hold arbitrary data very seldom makes sense; as you probably know, performing a range over a string actually pulls one codepoint at a time, making the assumption that the string is in fact UTF-8.

But I don't think that's the crux of the problem here. I think we need to think about this statement more:

We shouldn't force C's shortcomings on every application

On its own, true: that wouldn't make any sense to do so. However, it's worth noting that this is not that. Supporting C-style strings using the string type would be at most an idiosyncrasy rather than a limitation, given that otherwise []byte would work fine. On the other hand, I think there's a compelling argument for why it should, and it's in the title of the library and first line of the README:

Struc exists to pack and unpack C-style structures from bytes

And I'd argue C is the primary reason why we're jamming null-terminated strings into fixed-size arrays, making it compelling to support the case better.

At the same time, I also would shy away from making this sort of magic happen automatically, especially since people already use the library. Instead, I am in favor of supporting a flag that would turn on this behavior.

Disclaimer and shameless plug: I developed a similar library that behaves the same in this case. It does not support a flag for this either, as of writing.

jblackman · 2016-03-21T23:19:36Z

In general, the application that originated the structure will be quite prescriptive about the encoding of the string/byte array. It also happens (has happened to me) that the rule may change (as they choose to adopt a more recent encoding standard rather than chucking the whole mess away and using JSON). I would be more inclined to introduce an "encoding" tag of some sort to manage the transition, and give the developer some sugar. For example:

 Str string `struc:"[100]byte,utf16"`

which would say that you're receiving 100 bytes, to be interpreted as UTF-16 (and null padding stripped accordingly). The default could then be 8-bit ASCII and null padding auto-stripped.

lunixbochs mentioned this issue Mar 14, 2018

Zero padded strings #53

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unpacked string bytes are null-padded, which affects string comparisons #42

Unpacked string bytes are null-padded, which affects string comparisons #42

jblackman commented Mar 4, 2016

lunixbochs commented Mar 4, 2016

jblackman commented Mar 4, 2016

lunixbochs commented Mar 4, 2016

jblackman commented Mar 4, 2016

jchv commented Mar 21, 2016

jblackman commented Mar 21, 2016

Unpacked string bytes are null-padded, which affects string comparisons #42

Unpacked string bytes are null-padded, which affects string comparisons #42

Comments

jblackman commented Mar 4, 2016

lunixbochs commented Mar 4, 2016

jblackman commented Mar 4, 2016

lunixbochs commented Mar 4, 2016

jblackman commented Mar 4, 2016

jchv commented Mar 21, 2016

jblackman commented Mar 21, 2016