[bugfix] Fix handling of bytes #124

brighton1101 · 2021-07-22T16:39:49Z

Closes #79

One of our operators has an optional field that requires bytes. If that field is populated, it will error out when it gets formatted. This should fix that.

vchiapaikeo · 2021-07-22T16:55:10Z

@brighton1101, won't the leading b cause issues with decoding?

vchiapaikeo · 2021-07-22T16:58:44Z

I think this will turn it into a string, so we don't have to worry about bytes after that:

https://github.com/etsy/boundary-layer/pull/125/files

vchiapaikeo

This should resolve but really appreciate the help here!

https://github.com/etsy/boundary-layer/pull/125/files

brighton1101 · 2021-07-22T17:31:44Z

@vchiapaikeo I think that hack is alright, but the leading b means that it is a bytes literal. By design it has nothing to do with encoding/decoding, since the contents are only bytes. From the Python docs:

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

AFAIK, calling str on a bytes object just outputs the bytes literal as a string, like below:

>>> tm = "™".encode("utf-8")
>>> print(tm)
b'\xe2\x84\xa2'
>>> print(str(tm)) # here is what we'd be doing
b'\xe2\x84\xa2'
>>> str(tm)
"b'\\xe2\\x84\\xa2'"
>>> tm.decode("utf-8")
'™'

That operator is expecting bytes. I feel like it is a regression for boundary-layer to pass a parameter of the wrong type just because it currently works.
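
The distinction shown in the transcript above can be condensed into a short, self-contained check. This is only a sketch for Python 3; the variable names are illustrative and are not part of boundary-layer.

```python
# Python 3 only: str() on a bytes object yields the text of the literal,
# while .decode() yields the characters the bytes stand for.
tm = "™".encode("utf-8")

as_str = str(tm)               # text of the literal, including the leading b and quotes
as_text = tm.decode("utf-8")   # the original one-character string

assert as_str == repr(tm)
assert as_str.startswith("b'") and as_str.endswith("'")
assert as_text == "™"
assert as_str != as_text
```

In other words, `str()` never interprets the contents; it only describes them, which is why the leading b and the quotes end up inside the resulting string.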

brighton1101 · 2021-07-22T17:37:42Z

Also, for context: this likely wasn't in boundary-layer before because in Python 2.x bytes was just an alias for str, so there was no distinct bytes type to handle. Since we don't support Python 2 anymore, we don't have to worry about that case.

vchiapaikeo · 2021-07-22T17:40:40Z

I think the problem here is that if you wrap a bytes value in the str function, it will fail to base64-decode later on. Example:

>>> import base64
>>> encoded_bytes = base64.b64encode('{"name": "admin_threads", "version": 1, "send_to_bigquery": 1, "export_from_bigquery": 1, "copy_from_bigquery": 1}'.encode('utf-8'))
>>> 
>>> encoded_bytes
b'eyJuYW1lIjogImFkbWluX3RocmVhZHMiLCAidmVyc2lvbiI6IDEsICJzZW5kX3RvX2JpZ3F1ZXJ5IjogMSwgImV4cG9ydF9mcm9tX2JpZ3F1ZXJ5IjogMSwgImNvcHlfZnJvbV9iaWdxdWVyeSI6IDEsICJkYXRhZmxvd19zZXJ2aWNlX2FjY291bnQiOiAiZGF0YWZsb3ctZGV2LXBpaUBldHN5LWhhZG9vcC1zYW5kYm94LWRldi5pYW0uZ3NlcnZpY2VhY2NvdW50LmNvbSIsICJicV9wcm9qZWN0IjogImV0c3ktZGF0YS13YXJlaG91c2UtcHJvZCJ9'
>>> 
>>> str(encoded_bytes)
"b'eyJuYW1lIjogImFkbWluX3RocmVhZHMiLCAidmVyc2lvbiI6IDEsICJzZW5kX3RvX2JpZ3F1ZXJ5IjogMSwgImV4cG9ydF9mcm9tX2JpZ3F1ZXJ5IjogMSwgImNvcHlfZnJvbV9iaWdxdWVyeSI6IDEsICJkYXRhZmxvd19zZXJ2aWNlX2FjY291bnQiOiAiZGF0YWZsb3ctZGV2LXBpaUBldHN5LWhhZG9vcC1zYW5kYm94LWRldi5pYW0uZ3NlcnZpY2VhY2NvdW50LmNvbSIsICJicV9wcm9qZWN0IjogImV0c3ktZGF0YS13YXJlaG91c2UtcHJvZCJ9'"
>>> 
>>> # Now try to decode this
... 
>>> 
>>> base64.b64decode(str(encoded_bytes))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

Alternatively:

>>> encoded_bytes = base64.b64encode('{"name": "admin_threads", "version": 1, "send_to_bigquery": 1, "export_from_bigquery": 1, "copy_from_bigquery": 1, "dataflow_service_account": "[email protected]", "bq_project": "etsy-data-warehouse-prod"}'.encode('utf-8'))
>>> 
>>> encoded_bytes
b'eyJuYW1lIjogImFkbWluX3RocmVhZHMiLCAidmVyc2lvbiI6IDEsICJzZW5kX3RvX2JpZ3F1ZXJ5IjogMSwgImV4cG9ydF9mcm9tX2JpZ3F1ZXJ5IjogMSwgImNvcHlfZnJvbV9iaWdxdWVyeSI6IDEsICJkYXRhZmxvd19zZXJ2aWNlX2FjY291bnQiOiAiZGF0YWZsb3ctZGV2LXBpaUBldHN5LWhhZG9vcC1zYW5kYm94LWRldi5pYW0uZ3NlcnZpY2VhY2NvdW50LmNvbSIsICJicV9wcm9qZWN0IjogImV0c3ktZGF0YS13YXJlaG91c2UtcHJvZCJ9'
>>> 
>>> encoded_bytes.decode('utf-8')
'eyJuYW1lIjogImFkbWluX3RocmVhZHMiLCAidmVyc2lvbiI6IDEsICJzZW5kX3RvX2JpZ3F1ZXJ5IjogMSwgImV4cG9ydF9mcm9tX2JpZ3F1ZXJ5IjogMSwgImNvcHlfZnJvbV9iaWdxdWVyeSI6IDEsICJkYXRhZmxvd19zZXJ2aWNlX2FjY291bnQiOiAiZGF0YWZsb3ctZGV2LXBpaUBldHN5LWhhZG9vcC1zYW5kYm94LWRldi5pYW0uZ3NlcnZpY2VhY2NvdW50LmNvbSIsICJicV9wcm9qZWN0IjogImV0c3ktZGF0YS13YXJlaG91c2UtcHJvZCJ9'
>>> 
>>> decoded_bytes = encoded_bytes.decode('utf-8')
>>> 
>>> # This will decode properly
... 
>>> base64.b64decode(decoded_bytes)
b'{"name": "admin_threads", "version": 1, "send_to_bigquery": 1, "export_from_bigquery": 1, "copy_from_bigquery": 1}'
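
Both outcomes above can be reproduced with a much smaller payload. The following is a sketch assuming Python 3; the helper names `round_trip_via_str` and `round_trip_via_decode` are hypothetical and exist only for this illustration. Depending on the payload length, the `str()` path either raises `binascii.Error` or decodes to bytes that do not match the input, so the sketch treats both as a failed round trip.

```python
import base64
import binascii


def round_trip_via_str(payload):
    """Encode, stringify with str(), then try to decode. Returns None on error."""
    encoded = base64.b64encode(payload)
    try:
        # str(encoded) looks like "b'...'"; the extra characters corrupt the input.
        return base64.b64decode(str(encoded))
    except (binascii.Error, ValueError):
        return None


def round_trip_via_decode(payload):
    """Encode, convert the base64 bytes to text with .decode(), then decode."""
    encoded = base64.b64encode(payload)
    return base64.b64decode(encoded.decode("utf-8"))


payload = b'{"name": "admin_threads"}'

# .decode() preserves the base64 text exactly, so the round trip succeeds.
assert round_trip_via_decode(payload) == payload
# str() prepends a stray "b" (and quotes), so the original payload is not recovered.
assert round_trip_via_str(payload) != payload
```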

vchiapaikeo · 2021-07-22T17:41:47Z

Ah, I think your code snippet only works this way in Py2 and not Py3:

>>> tm = "™".encode("utf-8")
>>> print(tm)
b'\xe2\x84\xa2'
>>> print(str(tm)) # here is what we'd be doing
b'\xe2\x84\xa2'
>>> str(tm)
"b'\\xe2\\x84\\xa2'"
>>> tm.decode("utf-8")
'™'

vchiapaikeo · 2021-07-22T17:47:24Z

boundary_layer/builders/util.py

@@ -120,7 +120,7 @@ def format_value(value):

        return '{{ {items} }}'.format(items=','.join(pairs))

-    if isinstance(value, (int, float, type(None))):
+    if isinstance(value, (int, float, type(None), bytes)):


If we want to support bytes, we should just return the bytes and not pass them through the str function. For example:

if isinstance(value, bytes): return value

brighton1101 · 2021-07-22

@vchiapaikeo I considered that; however, since this function otherwise returns only strs, I wanted to keep the return type uniform. I understand that with Jinja this is not currently causing problems. However, what if someone wrapped this method call in more logic, expecting a str, and got bytes instead? Maybe this is not a problem and I am overthinking it.
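
One way to look at the trade-off discussed here: if the formatted value is ultimately pasted into generated Python source (an assumption about how the result is consumed, which this thread does not confirm), then a str containing the bytes literal keeps the return type uniform and still evaluates back to the original bytes. The sketch below uses a hypothetical helper, `format_bytes_as_source`, which is not boundary-layer's actual `format_value`.

```python
import ast


def format_bytes_as_source(value):
    """Hypothetical helper: render bytes as the text of a Python bytes literal."""
    if not isinstance(value, bytes):
        raise TypeError("expected bytes, got {}".format(type(value).__name__))
    # repr() of a bytes object is its literal form, so the result is always a str.
    return repr(value)


original = "™".encode("utf-8")
rendered = format_bytes_as_source(original)

# The return type stays uniform (str) ...
assert isinstance(rendered, str)
# ... and evaluating the rendered text as Python source restores the bytes.
assert ast.literal_eval(rendered) == original
```

Whether this is preferable to returning the bytes unchanged depends on what the caller does with the value; the sketch only shows that a str return does not have to lose the bytes type once the source is evaluated.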

Add handling of bytes

09b4756

lepe92 requested a review from vchiapaikeo July 22, 2021 16:43

vchiapaikeo reviewed Jul 22, 2021
