Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GoString and CString functions to match C package API #121

Open
TotallyGamerJet opened this issue Apr 1, 2023 · 26 comments
Open

Add GoString and CString functions to match C package API #121

TotallyGamerJet opened this issue Apr 1, 2023 · 26 comments
Labels
enhancement New feature or request

Comments

@TotallyGamerJet
Copy link
Collaborator

When using the the special C package there are a few functions that convert from Go types to C types and vice versa. When porting code that uses C it's annoying not having an equivalent to these functions. Especially, GoString and CString which convert a C string to a Go string and vice versa. Even ebitengine had to have it's own implementation and writing these over and over is begging for mistakes. Purego already uses it's own internal versions.

I purpose adding these two functions to purego with the following signatures.

// Cstring converts a Go string to a null-terminated C string.
// The C string is allocated in the C heap using malloc.
// It is the caller's responsibility to arrange for it to be
// freed, such as by calling free.
func CString(string) *byte

// GoString converts a null-terminated C string and copies it into
// a Go allocated string.
func GoString(*byte) string

I believe these are the most used functions so should be added first. If there is strong demand later for the other ones a new issue can be created.

@hajimehoshi
Copy link
Member

hajimehoshi commented Apr 1, 2023

Does CString add a null-termination automatically?

// It is the caller's responsibility to arrange for it to be
// freed, such as by calling free.

How does a user call free with purego?

@TotallyGamerJet
Copy link
Collaborator Author

Does CString add a null-termination automatically?

Yes. That's what the C package does. Matching it's behavior would be the least surprising

How does a user call free with purego?

var free func(unsafe.Pointer)
purego.RegisterFunc(&free, freeHandle)
free(unsafe.Pointer(cStr))

And freeHandle can be gotten using Dlsym

@hajimehoshi
Copy link
Member

hajimehoshi commented Apr 1, 2023

Yes. That's what the C package does. Matching it's behavior would be the least surprising

Sure, I'm fine if this is explicitly documented.

EDIT: I missed that the comment already said so: "Cstring converts a Go string to a null-terminated C string." Thanks!

And freeHandle can be gotten using Dlsym

So do we dlopen libc?

@TotallyGamerJet
Copy link
Collaborator Author

So do we dlopen libc?

I don't think purego should export free nor libc in its API. How someone wants to free it is up to the user

@hajimehoshi
Copy link
Member

hajimehoshi commented Apr 1, 2023

I don't think purego should export free nor libc in its API. How someone wants to free it is up to the user

Doesn't this depend on how CString allocates the string data?

@hajimehoshi
Copy link
Member

If CString uses malloc from libc, a user would have to use free from libc. I'd like to know your plan how to implement CString.

@TotallyGamerJet
Copy link
Collaborator Author

If CString uses malloc from libc, a user would have to use free from libc.

That is true. I'm not sure what signature to use though so leaving it up to the user seems best. Does it take a *byte since that's what CString returns or unsafe.Pointer but then there are a bunch of unsafe.Pointer casts everywhere? I think part of purego is to make it more type safe to call into C not less.

Here is an example of what I was thinking.

func getSystemLibrary() string {
	switch runtime.GOOS {
	case "darwin":
		return "/usr/lib/libSystem.B.dylib"
	case "linux":
		return "libc.so.6"
	default:
		panic(fmt.Errorf("GOOS=%s is not supported", runtime.GOOS))
	}
}

var malloc func(size int) *byte

func init() {
	libc, err := purego.Dlopen(getSystemLibrary(), purego.RTLD_LAZY|purego.RTLD_GLOBAL)
	if err != nil {
		panic(fmt.Errorf("could not open libc: %w", err))
	}
	purego.RegisterLibFunc(&malloc, libc, "malloc")
}

// CString converts a Go string to a null-terminated C string.
// The C string is allocated in the C heap using malloc.
// It is the caller's responsibility to arrange for it to be
// freed, such as by calling free.
func CString(s string) *byte {
	if len(s) == 0 {
		return nil
	}
	size := len(s) +1
	c := unsafe.Slice(malloc(size), size)
	copy(c, s)
	return &c[0]
}

@hajimehoshi
Copy link
Member

There is no way for users to know how to free the allocated memory, other than knowing what DLL/so is used, right? I don't think this is a good API...

By the way, does CString have to allocate the memory on C heap, not Go heap? If so, I was wondering why.

@TotallyGamerJet
Copy link
Collaborator Author

Perhaps then returning a Free function that frees the string?

func CString(s string) (cStr *byte, free func())

By the way, does CString have to allocate the memory on C heap, not Go heap? If so, I was wondering why.

It's the same reason for Cgo. If the C code retains the pointer but the Go code thinks it needs to be garbage collected (since it can't see into the C code) then memory corruption can happen. It's safe to pass a Go string into C if the C code doesn't keep the pointer.

@hajimehoshi
Copy link
Member

Perhaps then returning a Free function that frees the string?

I think this makes sense, though this would vary from the original Cgo's CString.

It's the same reason for Cgo. If the C code retains the pointer but the Go code thinks it needs to be garbage collected (since it can't see into the C code) then memory corruption can happen. It's safe to pass a Go string into C if the C code doesn't keep the pointer.

As the users have to control freeing the value, would this work for example?

func CString(s string) (*byte, func()) {
    bs := []byte(s)
    if len(bs) == 0 || bs[len(bs)-1] == 0 {
        bs = append(bs, 0)
    }
    return &bs[0], func() { runtime.KeepAlive(bs) }
}

@TotallyGamerJet
Copy link
Collaborator Author

I think this makes sense, though this would vary from the original Cgo's CString.

True. I'm not sure there is a nice way to free while keeping the original API.

As the users have to control freeing the value, would this work for example?

I don't see why not since either way it is the responsibility of the caller to ensure the string is freed properly. Although what happens if the free function is ignored (cStr, _ = CString(...))? I do think it's better to make([]byte, len(s)+1) instead of casting since casting then appending is likely to be two allocations instead of one.

Perhaps, something like this?

func CString(str string) (*byte, func()) {
        var b []byte
        if len(str) == 0 {
               b = []byte{0x00}
	} else if hasSuffix(str, "\x00") {
                b = []byte(str)
	} else {
                b = make([]byte, len(str)+1)
                copy(b, str)
        }
	return &b[0], func(){ runtime.KeepAlive(b) }
}

Also, do we want the implementation to turn an empty string into a C string that only has null character instead of returning nil?

@TotallyGamerJet
Copy link
Collaborator Author

Although. What if the C function never releases its access to the string? Shouldn't it than be allocated on the C heap?

@hajimehoshi
Copy link
Member

hajimehoshi commented Apr 3, 2023

I don't see why not since either way it is the responsibility of the caller to ensure the string is freed properly. Although what happens if the free function is ignored (cStr, _ = CString(...))?

I see. Then what about keeping the string in a global slice or a global map in Go?

I do think it's better to make([]byte, len(s)+1) instead of casting since casting then appending is likely to be two allocations instead of one.

My code snippet is just a prototype so if there is a more efficient way, let's adopt this :-)

Also, do we want the implementation to turn an empty string into a C string that only has null character instead of returning nil?

Not sure. Let's sync with the original CString.

Although. What if the C function never releases its access to the string? Shouldn't it than be allocated on the C heap?

I don't see the point. Freeing is still the user's responsibility, right?

@hajimehoshi
Copy link
Member

hajimehoshi commented Apr 3, 2023

For example:

func CString(str string) (*byte, func()) {
        var b []byte
        if len(str) == 0 {
               b = []byte{0x00}
	} else if hasSuffix(str, "\x00") {
                b = []byte(str)
	} else {
                b = make([]byte, len(str)+1)
                copy(b, str)
        }
        key := uintptr(unsafe.Pointer(&b[0]))
        // TODO: Need to protect this by a mutex
        globalStringMap[key] = b
	return &b[0], func(){ delete(globalStringMap, key) }
}

@TotallyGamerJet
Copy link
Collaborator Author

I just remembered why the string needs to be on the C heap. The reason is because of Go's growing and shrinking stack. If C saves the string and then later Go grows the stack the C string pointer will point to random memory. This is bad.

@hajimehoshi
Copy link
Member

The reason is because of Go's growing and shrinking stack. If C saves the string and then later Go grows the stack the C string pointer will point to random memory.

Does this problem occur even if a Go string is allocated in a Go heap?

@TotallyGamerJet
Copy link
Collaborator Author

Hmm maybe it isn't an issue cuz the heap isn't affected by the stack growth.

@hajimehoshi
Copy link
Member

My guess is that allocated memory by the original CString might be freed in the C side, though I have never seen such usage. Unfortunately, in our purego, it might be impossible to guarantee that the memory allocator by libc selected by purego matches with the library's allocator.

@TotallyGamerJet
Copy link
Collaborator Author

Well, Cgo assumes the default implementation that is used by the system libc. So if they assume that can't we?

@hajimehoshi
Copy link
Member

Well, Cgo assumes the default implementation that is used by the system libc. So if they assume that can't we?

Maybe yes? Anyway, I'm not so strongly against using C allocators, but I feel like using Go allocators is easier and more efficient.

@CannibalVox
Copy link

return &b[0], func(){ delete(globalStringMap, key) }

This will be 4 allocations due to the closure, won't it?

re: auto-libc/malloc/free, I think if people have strong opinions about how they want their C memory allocated, they are likely to import the library of their choice themselves, I think having a common default malloc/free available in purego would be uncontroversial.

Personally, I find that mallocs tend to be more efficient than go allocations and so if I'm working in an environment where I have to manually free memory anyway and it wont' confuse the garbage collector, I'll use C memory. Passing go memory you had to allocate anyway into C instead of C memory is a performance win, but if you're already having to allocate a separate string, and you already have to manually free it, and you're already having to play games with the garbage collector, I'd say just allocate it in C and be done with it.

@hajimehoshi
Copy link
Member

I see. Allocating memory in the C world seems better in terms of the amount of memory allocations. I'm fine with it now.

@hajimehoshi
Copy link
Member

Another possible suggestion is to use syscall.Mmap, but I don't think this is feasible

@TotallyGamerJet TotallyGamerJet added the enhancement New feature or request label Apr 10, 2023
@TotallyGamerJet
Copy link
Collaborator Author

code
package purego_test

import (
	"fmt"
	"runtime"
	"strings"
	"testing"
	"unsafe"

	"github.com/ebitengine/purego"
)

const str = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce scelerisque fringilla ligula ut finibus. Aenean iaculis convallis erat, ornare tristique turpis. Fusce scelerisque dolor quis faucibus varius. In condimentum risus pharetra augue laoreet convallis. Quisque bibendum consequat ante sit amet molestie. Curabitur vel quam posuere, condimentum turpis sit amet, dapibus massa. In egestas tellus sed ante laoreet tempor.

Fusce ipsum velit, viverra in porttitor in, ultrices quis augue. Nulla urna ante, tincidunt eu ante nec, tincidunt sagittis felis. Curabitur ut imperdiet augue. Etiam nec ligula convallis, convallis leo ut, pulvinar tellus. Proin imperdiet augue ac ipsum scelerisque bibendum. Quisque id egestas turpis. Sed nec elit convallis, lacinia mi a, sagittis purus. Morbi euismod ut nunc at tempus. Cras rhoncus ut nisi at consequat. Suspendisse pellentesque erat arcu, nec volutpat sapien gravida ut.

Vestibulum elementum et nulla vitae malesuada. Suspendisse ut diam mi. Nullam non urna dignissim, cursus ipsum sed, pellentesque enim. Aliquam erat volutpat. Morbi id maximus ligula. Fusce lacinia est et nulla congue euismod. Quisque pulvinar ante magna, sed dictum nibh porta non. Etiam interdum nisl sed feugiat lobortis. Aenean in nunc nunc. Praesent mollis dapibus velit nec porttitor. Ut dictum sem nec condimentum vehicula. Nullam tempor lectus vitae venenatis euismod. Suspendisse potenti. Phasellus consectetur tincidunt nulla id aliquet. Pellentesque sem lorem, commodo at nisi a, iaculis aliquam nisi. In quis lacinia purus, ut rhoncus orci.

Nam varius pellentesque eros ac ullamcorper. Aenean nec nisl ultricies, iaculis sem vel, iaculis ex. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Sed vel hendrerit dolor. Suspendisse blandit venenatis iaculis. Phasellus vel convallis purus. Sed gravida sit amet ipsum id fringilla. In at risus vel lorem faucibus imperdiet. Curabitur id mattis est. Cras cursus egestas turpis, a aliquet dolor semper nec. Praesent dictum orci non diam convallis consectetur. Duis imperdiet, ante non tempus lacinia, nunc nisi feugiat elit, at euismod urna erat a nibh. Vestibulum vehicula vestibulum nulla sed dignissim. Sed quis nisi mauris. Morbi euismod tellus est, eget aliquam justo lacinia sed. Donec id dignissim dolor.

Cras blandit facilisis nulla, ut tincidunt purus vehicula vel. Suspendisse potenti. Maecenas porta vehicula tortor, ac porta eros lobortis vitae. Donec egestas diam velit, nec posuere enim vulputate eu. Nulla vulputate tempus dolor, eu convallis purus pellentesque sit amet. Mauris feugiat quam libero, sed molestie neque maximus nec. Mauris finibus vehicula eros sed euismod.`

var globalStringMap = map[uintptr][]byte{}

//go:noinline
func CStringMap(str string) (*byte, func()) {
	var b []byte
	if len(str) == 0 {
		b = []byte{0x00}
	} else if strings.HasSuffix(str, "\x00") {
		b = []byte(str)
	} else {
		b = make([]byte, len(str)+1)
		copy(b, str)
	}
	key := uintptr(unsafe.Pointer(&b[0]))
	// TODO: Need to protect this by a mutex
	globalStringMap[key] = b
	return &b[0], func() { delete(globalStringMap, key) }
}

//go:noinline
func CStringGo(str string) (*byte, func()) {
	var b []byte
	if len(str) == 0 {
		b = []byte{0x00}
	} else if strings.HasSuffix(str, "\x00") {
		b = []byte(str)
	} else {
		b = make([]byte, len(str)+1)
		copy(b, str)
	}
	return &b[0], func() { runtime.KeepAlive(b) }
}

func libc() string {
	switch runtime.GOOS {
	case "darwin":
		return "/usr/lib/libSystem.B.dylib"
	case "linux":
		return "libc.so.6"
	default:
		panic(fmt.Errorf("GOOS=%s is not supported", runtime.GOOS))
	}
}

var (
	malloc func(size int) *byte
	free   func(*byte)
)

func init() {
	libc, err := purego.Dlopen(libc(), purego.RTLD_LAZY|purego.RTLD_GLOBAL)
	if err != nil {
		panic(fmt.Errorf("could not open libc: %w", err))
	}
	purego.RegisterLibFunc(&malloc, libc, "malloc")
	purego.RegisterLibFunc(&free, libc, "free")
}

//go:noinline
func CStringC(s string) (*byte, func()) {
	size := len(s) + 1
	c := unsafe.Slice(malloc(size), size)
	copy(c, s)
	return &c[0], func() { free(&c[0]) }
}

func BenchmarkCStringMap(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_, free := CStringMap(str)
		free()
	}
}

func BenchmarkCStringGo(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_, free := CStringGo(str)
		free()
	}
}

func BenchmarkCStringC(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_, free := CStringC(str)
		free()
	}
}

Instead of guessing which is the fastest implementation I went ahead and wrote benchmarks for each of the discussed implementations. Here are my results:

goos: darwin
goarch: arm64
pkg: github.com/ebitengine/purego
BenchmarkCStringMap
BenchmarkCStringMap-10    	 4506553	       273.7 ns/op	    3088 B/op	       2 allocs/op
BenchmarkCStringGo
BenchmarkCStringGo-10     	 4570737	       265.9 ns/op	    3104 B/op	       2 allocs/op
BenchmarkCStringC
BenchmarkCStringC-10      	 1704427	       702.8 ns/op	     512 B/op	      12 allocs/op
PASS

Both the map and Go version only have two allocs: one for the slice and the other for the closure. The C version is significantly slower probably due to the overhead of RegisterLibFunc. Perhaps its performance could be improved but I doubt it will get anywhere near the two go versions. Personally, I like the plain Go version because it is easy to understand. The only reason that we were leaning away from it is because if the free function is ignored the GC could remove the pointer at any time. It would only be a problem if one wanted to allocate a C string and then never free it later (for a global). Either way, map is still quite fast and doesn't have the downside of undefined behavior when ignoring the free function.

I just thought of another possibility. Say, the C string is allocated in Go but then later some C code will choose to free it. With this mechanism of allocating in the Go heap that won't work.

TotallyGamerJet added a commit to TotallyGamerJet/purego that referenced this issue Apr 10, 2023
This rewrites part of the RegisterFunc to have less code escape to the heap. Running the benchmark inside ebitengine#121 shows a speed increase and alloc decrease.

goos: darwin
goarch: arm64
pkg: github.com/ebitengine/purego
BenchmarkCStringC
BenchmarkCStringC-10    	 2294838	       500.3 ns/op	     472 B/op	       8 allocs/op
PASS
TotallyGamerJet added a commit that referenced this issue Apr 10, 2023
This rewrites part of the RegisterFunc to have less code escape to the heap. Running the benchmark inside #121 shows a speed increase and alloc decrease.

goos: darwin
goarch: arm64
pkg: github.com/ebitengine/purego
BenchmarkCStringC
BenchmarkCStringC-10    	 2294838	       500.3 ns/op	     472 B/op	       8 allocs/op
PASS
@hajimehoshi
Copy link
Member

hajimehoshi commented Sep 13, 2023

From #161 (comment)

In RegisterFunc if the Go string ends in a null terminated byte then it is not copied and just passed as is. Only if it doesn't is it copied into Go memory and kept alive for the call but the API makes no guarantee when it will be freed. This behavior is documented in #154

Right, I was misunderstanding. Thanks. So now

(arguments)
Go string w/o \0 termination -> C char* (no copy)
Go string w/ \0 termination -> C char* (copy. the lifetime is the function call)
Go *byte -> C char* (no copy)

(returns)
C char* w/o \0 termination -> Go *byte (no copy)
C char* w/ \0 termination -> Go string (copy) or Go *byte (no copy)
(The termination is checked when a Go string is specified)

is my understanding. Is this correct?

@TotallyGamerJet
Copy link
Collaborator Author

Yes your understanding is correct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants