Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpacked string bytes are null-padded, which affects string comparisons #42

Open
jblackman opened this issue Mar 4, 2016 · 6 comments

Comments

@jblackman
Copy link

This is an interesting side-effect. If an encoded character array is shorter than the field length, it will be padded out with byte(0), as you would expect. When you unpack into a string, these extra bytes make it into the string variable's underlying byte array. As a result, a comparison of this string with a string literal will return false.

For example:

package main

import (
    "bytes"
    "fmt"
    "github.com/lunixbochs/struc"
)

type Test struct {
    Str string `struc:"[10]byte"`
}

func main() {
    payload := "Foo"
    payloadSlice := make([]byte, 10)
    copy(payloadSlice, []byte(payload))
    buf := bytes.NewBuffer(payloadSlice)
    t := &Test{}
    if err := struc.Unpack(buf, t); err != nil {
        fmt.Println("Unpack failed:", err)
        return
    }
    if t.Str == "Foo" {
        fmt.Println("Comparison is OK")
    } else {
        fmt.Println(t.Str, "does not equal Foo")
        fmt.Println("t.Str = ", []byte(t.Str))
    }
}

If you run this, you will get:

Foo does not equal Foo
t.Str =  [70 111 111 0 0 0 0 0 0 0]

So, is it worth fixing this? IMO, yes. Hunting into field.go, we could alter this line:

val.SetString(string(buf))

perhaps to:

val.SetString( bytes.Trim(buf, "\x00") )

or, if that will add too much overhead, loop to the first null and use a truncated slice:

idx := bytes.IndexByte(buf, 0)
if idx > length {
    idx = length
}
val.SetString(buf[:idx])

wdyt?

@lunixbochs
Copy link
Owner

duplicate of #31

you could want to unpack a value into a string which legitimately contains null characters.

@jblackman
Copy link
Author

A fairly edge case, though - wouldn't you generally be unpacking that into a byte array?

@lunixbochs
Copy link
Owner

Strings can be slightly easier to work with. The main difference in Go is mutability. We shouldn't force C's shortcomings on every application - someone might want to, say, unpack then decode utf16 data, which wouldn't work if I stopped at the first null byte.

@jblackman
Copy link
Author

A very good point.

@jchv
Copy link

jchv commented Mar 21, 2016

Wait, that argument doesn't really hold. UTF-16 would very logically be []uint16 - and if you were to ignore that fact, you'd be in for some pain when trying to use I.E. unicode/utf16. Further, while it is explicitly true that Go strings are not defined to be UTF-8 data, using them to hold arbitrary data very seldom makes sense; as you probably know, performing a range over a string actually pulls one codepoint at a time, making the assumption that the string is in fact UTF-8.

But I don't think that's the crux of the problem here. I think we need to think about this statement more:

We shouldn't force C's shortcomings on every application

On its own, true: that wouldn't make any sense to do so. However, it's worth noting that this is not that. Supporting C-style strings using the string type would be at most an idiosyncrasy rather than a limitation, given that otherwise []byte would work fine. On the other hand, I think there's a compelling argument for why it should, and it's in the title of the library and first line of the README:

Struc exists to pack and unpack C-style structures from bytes

And I'd argue C is the primary reason why we're jamming null-terminated strings into fixed-size arrays, making it compelling to support the case better.

At the same time, I also would shy away from making this sort of magic happen automatically, especially since people already use the library. Instead, I am in favor of supporting a flag that would turn on this behavior.

Disclaimer and shameless plug: I developed a similar library that behaves the same in this case. It does not support a flag for this either, as of writing.

@jblackman
Copy link
Author

In general, the application that originated the structure will be quite prescriptive about the encoding of the string/byte array. It also happens (has happened to me) that the rule may change (as they choose to adopt a more recent encoding standard rather than chucking the whole mess away and using JSON). I would be more inclined to introduce an "encoding" tag of some sort to manage the transition, and give the developer some sugar. For example:

 Str string `struc:"[100]byte,utf16"`

which would say that you're receiving 100 bytes, to be interpreted as UTF-16 (and null padding stripped accordingly). The default could then be 8-bit ASCII and null padding auto-stripped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants