-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encode file contents as base64 when the result is smaller #13
base: flatcar-master
Are you sure you want to change the base?
Conversation
… smaller Ignition encodes file contents in the dataurl format. Dataurl supports either url-escaped or base64 encoded content. CLCT always url-escapes content, but there are cases where base64 results in smaller text. One example is compressed content, which ignition should support but ours doesn't due to a check which can be removed (see [1]). So this is step 1 in enabling CLCT to transparently compress data. Data is encoded as both url-escaped and base64 and the result lengths are tested to see which one to use. It is possible to get CLCT to ingest gzipped data by manually doing 'gzip+base64', using some yaml magic and abusing CLCT parsing (slightly): storage: files: - path: /file/path mode: 0644 contents: remote: compression: gzip inline: !!binary | H4sIAAAAAAAAA8tIzcnJ5wIAIDA6NgYAAAA= This results in spec compliant ignition configuration, that is not yet handled by our version of the ignition program. This is based on similar work done in butane[2]. [1]: coreos/ignition@f770885 [2]: https://github.com/coreos/butane/blob/07daeed0f1113acffbc21b47e2ecc997d44e7943/base/util/url.go#L31-L37
@invidian I added a test for the |
Signed-off-by: Jeremi Piotrowski <[email protected]>
122ae99
to
324132f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some extra suggestions, sorry if it's too much
"github.com/vincent-petithory/dataurl" | ||
"strings" | ||
"testing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3rd party imports are usually separated from stdlib imports.
"github.com/vincent-petithory/dataurl" | |
"strings" | |
"testing" | |
"strings" | |
"testing" | |
"github.com/vincent-petithory/dataurl" |
@@ -129,6 +130,19 @@ func (fc FileContents) Validate() report.Report { | |||
return report.Report{} | |||
} | |||
|
|||
func makeDataUrl(contents []byte) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest putting private function below the init() function, which tends to be at the top of the file.
func makeDataUrl(contents []byte) string { | ||
opaque := "," + dataurl.Escape(contents) | ||
b64 := ";base64," + base64.StdEncoding.EncodeToString(contents) | ||
if len(b64) < len(opaque) { | ||
opaque = b64 | ||
} | ||
uri := (&url.URL{ | ||
Scheme: "data", | ||
Opaque: opaque, | ||
}).String() | ||
return uri | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think formatting can be more compact:
func makeDataUrl(contents []byte) string { | |
opaque := "," + dataurl.Escape(contents) | |
b64 := ";base64," + base64.StdEncoding.EncodeToString(contents) | |
if len(b64) < len(opaque) { | |
opaque = b64 | |
} | |
uri := (&url.URL{ | |
Scheme: "data", | |
Opaque: opaque, | |
}).String() | |
return uri | |
} | |
func makeDataUrl(contents []byte) string { | |
opaque := "," + dataurl.Escape(contents) | |
if b64 := ";base64," + base64.StdEncoding.EncodeToString(contents); len(b64) < len(opaque) { | |
opaque = b64 | |
} | |
return (&url.URL{ | |
Scheme: "data", | |
Opaque: opaque, | |
}).String() | |
} |
b64
declaration can be moved outside of if
block though, no preferences here.
package types | ||
|
||
import ( | ||
"github.com/vincent-petithory/dataurl" | ||
"strings" | ||
"testing" | ||
) | ||
|
||
func TestDataURL(t *testing.T) { | ||
testString1 := "hello world" | ||
dataUrl1 := makeDataUrl([]byte(testString1)) | ||
decodedUrl1, err := dataurl.DecodeString(dataUrl1) | ||
if err != nil { | ||
t.Fatalf("DecodeString #1 failed: %v", err) | ||
} | ||
resultStr1 := string(decodedUrl1.Data) | ||
if strings.HasPrefix(dataUrl1, "data:;base64,") { | ||
t.Errorf("base64 encoding used instead of url encoding: %v", dataUrl1) | ||
} | ||
if resultStr1 != testString1 { | ||
t.Errorf("wanted %v, got %v", testString1, resultStr1) | ||
} | ||
|
||
testString2 := ` | ||
#!/bin/bash | ||
msg="hello" | ||
f() { | ||
( echo "${msg}" ) & | ||
} | ||
f` | ||
dataUrl2 := makeDataUrl([]byte(testString2)) | ||
if !strings.HasPrefix(dataUrl2, "data:;base64,") { | ||
t.Errorf("base64 encoding missing: %v", dataUrl2) | ||
} | ||
decodedUrl2, err := dataurl.DecodeString(dataUrl2) | ||
if err != nil { | ||
t.Fatalf("DecodeString #2 failed: %v", err) | ||
} | ||
resultStr2 := string(decodedUrl2.Data) | ||
if resultStr2 != testString2 { | ||
t.Errorf("wanted %v, got %v", testString2, resultStr2) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about some property-based blackbox tests like below? I believe they communicate more about the underlying code and they are not dependent on the implementation so much.
They also capture the case when base64 encoding results in the same data as url encoding, which in this case I guess we should choose simpler url encoding? We could benchmark here, but that seems like an overkill 😄
package types | |
import ( | |
"github.com/vincent-petithory/dataurl" | |
"strings" | |
"testing" | |
) | |
func TestDataURL(t *testing.T) { | |
testString1 := "hello world" | |
dataUrl1 := makeDataUrl([]byte(testString1)) | |
decodedUrl1, err := dataurl.DecodeString(dataUrl1) | |
if err != nil { | |
t.Fatalf("DecodeString #1 failed: %v", err) | |
} | |
resultStr1 := string(decodedUrl1.Data) | |
if strings.HasPrefix(dataUrl1, "data:;base64,") { | |
t.Errorf("base64 encoding used instead of url encoding: %v", dataUrl1) | |
} | |
if resultStr1 != testString1 { | |
t.Errorf("wanted %v, got %v", testString1, resultStr1) | |
} | |
testString2 := ` | |
#!/bin/bash | |
msg="hello" | |
f() { | |
( echo "${msg}" ) & | |
} | |
f` | |
dataUrl2 := makeDataUrl([]byte(testString2)) | |
if !strings.HasPrefix(dataUrl2, "data:;base64,") { | |
t.Errorf("base64 encoding missing: %v", dataUrl2) | |
} | |
decodedUrl2, err := dataurl.DecodeString(dataUrl2) | |
if err != nil { | |
t.Fatalf("DecodeString #2 failed: %v", err) | |
} | |
resultStr2 := string(decodedUrl2.Data) | |
if resultStr2 != testString2 { | |
t.Errorf("wanted %v, got %v", testString2, resultStr2) | |
} | |
} | |
package types_test | |
import ( | |
"encoding/base64" | |
"strings" | |
"testing" | |
"github.com/vincent-petithory/dataurl" | |
"github.com/flatcar-linux/container-linux-config-transpiler/config/types" | |
) | |
const ( | |
base64overhead = len(";base64") | |
baseOverhead = len("data:,") | |
) | |
func Test_When_converting_files_inline_content_gets(t *testing.T) { | |
t.Parallel() | |
t.Run("url_encoded_when_base64_encoding", func(t *testing.T) { | |
t.Parallel() | |
for name, testInput := range map[string]string{ | |
"increase_contents_size": "hello world", | |
"does_not_change_contents_size": "1234567 aa", | |
} { | |
testInput := testInput | |
t.Run(name, func(t *testing.T) { | |
t.Parallel() | |
c := types.Config{ | |
Storage: types.Storage{ | |
Files: []types.File{ | |
{ | |
Path: "/foo", | |
Contents: types.FileContents{ | |
Inline: testInput, | |
}, | |
}, | |
}, | |
}, | |
} | |
f, r := types.Convert(c, "", nil) | |
if r.IsFatal() { | |
t.Fatalf("Unexpected conversion error: %v", r) | |
} | |
if strings.Contains(f.Storage.Files[0].Contents.Source, "base64") { | |
t.Fatalf("Expected data to be URl-encoded, got: %q", f.Storage.Files[0].Contents.Source) | |
} | |
b64encodedData := base64.StdEncoding.EncodeToString([]byte(testInput)) | |
b64encodedDataLengthWithOverhead := base64overhead + baseOverhead + len(b64encodedData) | |
if b64encodedDataLengthWithOverhead < len(f.Storage.Files[0].Contents.Source) { | |
t.Fatalf("Expected encoded file contents to be smaller than base64 encoded data with overhead") | |
} | |
}) | |
} | |
}) | |
t.Run("base64_encoded_when_it_reduces_content_size", func(t *testing.T) { | |
base64reducibleData := ` | |
#!/bin/bash | |
msg="hello" | |
f() { | |
( echo "${msg}" ) & | |
} | |
f` | |
c := types.Config{ | |
Storage: types.Storage{ | |
Files: []types.File{ | |
{ | |
Path: "/foo", | |
Contents: types.FileContents{ | |
Inline: base64reducibleData, | |
}, | |
}, | |
}, | |
}, | |
} | |
f, r := types.Convert(c, "", nil) | |
if r.IsFatal() { | |
t.Fatalf("Unexpected conversion error: %v", r) | |
} | |
if !strings.Contains(f.Storage.Files[0].Contents.Source, "base64") { | |
t.Fatalf("Expected data to be base64-encoded, got: %q", f.Storage.Files[0].Contents.Source) | |
} | |
urlEncodedDataLengthWithOverhead := len(dataurl.Escape([]byte(base64reducibleData))) + baseOverhead | |
if urlEncodedDataLengthWithOverhead <= len(f.Storage.Files[0].Contents.Source) { | |
t.Fatalf("Expected encoded file contents to be smaller than base64 encoded data with overhead.") | |
} | |
}) | |
} |
encode file contents as base64 when the result is smaller
This is supported by ignition and dataurl since forever, but clct did not produce base64 encoded strings.
How to use
Try encoding a binary file with clct:
and observe the ignition json.
Testing done
Successfully booted stable images with resulting ignition.