Code from `lib/saito/transaction.ts` (lines 190...) quoted in the post:

```ts
const transaction_type = buffer[start_of_transaction_data + 92];
const start_of_inputs = start_of_transaction_data + TRANSACTION_SIZE; // Tx size (given on line 8) is 95, times 1 input = 95
const start_of_outputs = start_of_inputs + inputs_len * SLIP_SIZE; // Slip size (given on line 9) is 75, times 1 output = 75
const start_of_message = start_of_outputs + outputs_len * SLIP_SIZE; // 95 + 75 + 75
const start_of_path = start_of_message + message_len;
// ...
const message = buffer.slice(start_of_message, start_of_message + message_len); // 95 + 75 + 75
```
Update Below
Interested in anyone that can help:
I'm on a mission to decode the binary block data stored from the saito-lite-rust client written in JS. I've made some decent progress:
Firstly I've gotten the binary data into Haskell using:
This gives a list of `GHC.Word.Word8` values - in practice, a list of unsigned integers that wrap at 256.
According to lines 6 and 158 of `lib/saito/block.ts`, the header bytes end at position 229.
This is where the following code from `lib/saito/transaction.ts` dictates the metadata for the transaction whose bytes come right after the 229 header bytes:
It says that four values are read from the first four groups of 4 bytes, starting 229 bytes past the start of the block. It seems I am on the correct track. This is the corresponding point in the list from Haskell:
[0,0,0,1,0,0,0,1,0,0,3,228,0,0,0,1...]
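Reading those 16 bytes as four 32-bit integers can be sketched as follows. This is only an illustration, not Saito's own parser: it assumes big-endian byte order (which is what the arithmetic 3 * 256 + 228 implies) and the field order described in the post. `readTxLengths`, `HEADER_SIZE`, and the zero-filled sample header are hypothetical names and data introduced for the example.

```typescript
// Header length the post derives from block.ts (assumption carried over from the post).
const HEADER_SIZE = 229;

interface TxLengths {
  inputs_len: number;
  outputs_len: number;
  message_len: number;
  path_len: number;
}

// Read four consecutive big-endian u32 values starting at `offset`.
function readTxLengths(block: Uint8Array, offset: number = HEADER_SIZE): TxLengths {
  const view = new DataView(block.buffer, block.byteOffset, block.byteLength);
  return {
    inputs_len: view.getUint32(offset, false), // false = big-endian
    outputs_len: view.getUint32(offset + 4, false),
    message_len: view.getUint32(offset + 8, false),
    path_len: view.getUint32(offset + 12, false),
  };
}

// Hypothetical block: 229 zero bytes standing in for the header,
// followed by the 16 bytes shown in the post.
const sample = new Uint8Array(HEADER_SIZE + 16);
sample.set([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 3, 228, 0, 0, 0, 1], HEADER_SIZE);

const lengths = readTxLengths(sample);
console.log(lengths);
```

With the bytes from the post, `message_len` comes out as 996, matching the hand calculation.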
Here is what they should mean according to that TypeScript code:
- `0,0,0,1` means one tx input - that's correct
- `0,0,0,1` means one tx output - that's correct
- `0,0,3,228` is the message length: 3 * 256 + 228 = 996 bytes
- `0,0,0,1` is the `path_len` - not sure exactly, but it seems to line up
The test message I included in the transaction is meant to be easy to spot in any encoding or format:
That message is 247 characters long. With each character encoded in 4 bytes, 247 * 4 = 988, which is just under the message length of 996 encoded as `0,0,3,228`. So that makes sense.
I was expecting to see clean repetitions that made the message and the relationship between characters obvious.
In practice, while repeating sequences do occur in the binary data, their lengths do not match the lengths of the runs in the plaintext above, and they often vary in unexpected ways.
Using this block of code from lines 190... of the `transaction.ts` file, I figured the actual message bytes should begin (95+75+75) bytes past the start of the transaction metadata.
This could have a mistake, but it certainly puts us in the ballpark of where the message should be sitting within the bytes. Here is the first notable repeating sequence that shows up, which might represent repeating plaintext characters:
But it isn't long enough to simply decode as "each repetition is one 'a'." Generally there is a lot more repetition around this area, where counting bytes indicates the message should live, but it doesn't seem to match the test message above in any simple way.
I believed it would be as simple as taking repeating groups and figuring out which character from the test message they correspond to, but the groups never repeat consistently, and the runs are often much shorter than the number of characters repeated in the plaintext. I can post the full list if it's helpful to anyone.
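The offset arithmetic behind the (95+75+75) estimate can be written out as a small sketch. The two sizes are the ones quoted in the post from `transaction.ts`. `locateMessage` is a hypothetical helper, and treating byte 229 as the start of the transaction data is an assumption taken from the post's reading of the header length.

```typescript
// Sizes quoted in the post from transaction.ts.
const TRANSACTION_SIZE = 95;
const SLIP_SIZE = 75;

interface MessageSpan {
  start_of_message: number;
  start_of_path: number;
}

// Mirror the offset chain from the post: fixed transaction fields,
// then the input slips, then the output slips, then the message.
function locateMessage(
  start_of_transaction_data: number,
  inputs_len: number,
  outputs_len: number,
  message_len: number
): MessageSpan {
  const start_of_inputs = start_of_transaction_data + TRANSACTION_SIZE;
  const start_of_outputs = start_of_inputs + inputs_len * SLIP_SIZE;
  const start_of_message = start_of_outputs + outputs_len * SLIP_SIZE;
  return { start_of_message, start_of_path: start_of_message + message_len };
}

// Assumed inputs: transaction data at byte 229, one input, one output, 996 message bytes.
const span = locateMessage(229, 1, 1, 996);
console.log(span);
```

Under those assumptions the message would start at byte 229 + 245 = 474 of the block and run for 996 bytes.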
This is fairly niche, but if someone is willing to point me in the correct direction to finish decoding this, it would be useful for building apps that require more low-level functionality. I'm looking to decode the `/blocks` folder in real time, modularly, and feed relevant information into other programs like smart contract nodes, and, well, the sky's the limit.
It may be as simple as digging a little deeper into the source code and JS primitives, but in the meantime I'll drop this here.
Update
I've started fresh with cleaner results. The test message this time is a Saitolicious post with:
587 lowercase 'a's
linebreak
559 lowercase 'b's
linebreak
561 lowercase 'c's
I've grouped the bytes into fours since every indication seems to point that way. There are now three runs in the byte array that clearly delineate these long lines of characters.
The line of 'a' is this sequence repeated 195 times:
[89,87,70,104]
195 x 3 = 585 ~ 587 'a's
The line of 'b' is this sequence repeated 184 times:
[89,109,74,105]
184 x 3 = 552 ~ 559 'b's
The line of 'c' is this sequence repeated 186 times:
[89,50,78,106]
186 x 3 = 558 ~ 561 'c's
Assuming these byte groupings correspond to three of each character gets us very close. The minor differences are probably explained by similar but not exactly equal byte arrays which sometimes come at the start and/or end of the repeating sequences.
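The counting step above (group the bytes into fours, then count how often each group repeats back to back) can be sketched like this. `runsOfFour` is a hypothetical helper, and the demo bytes are synthetic rather than taken from a real block. The sketch assumes the grouping starts on a group boundary; a misaligned start would break up the runs.

```typescript
type Run = { group: number[]; count: number };

// Split the bytes into consecutive groups of four and merge identical
// neighbouring groups into runs with a repeat count.
function runsOfFour(bytes: Uint8Array): Run[] {
  const runs: Run[] = [];
  for (let i = 0; i + 4 <= bytes.length; i += 4) {
    const group = Array.from(bytes.subarray(i, i + 4));
    const last = runs[runs.length - 1];
    if (last !== undefined && last.group.every((b, k) => b === group[k])) {
      last.count += 1;
    } else {
      runs.push({ group, count: 1 });
    }
  }
  return runs;
}

// Synthetic data: three copies of the 'a' group, then two copies of the 'b' group.
const demo = new Uint8Array([
  89, 87, 70, 104, 89, 87, 70, 104, 89, 87, 70, 104,
  89, 109, 74, 105, 89, 109, 74, 105,
]);
const demoRuns = runsOfFour(demo);
console.log(demoRuns);
```

Run over the real message bytes, the long runs and the odd groups at their edges should show up as separate entries.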
The 'a' group of bytes ends with
[89,87,70,104],[89,87,69,56]
The 'b' group of bytes ends with
[89,109,74,105],[89,109,73,56]
The 'c' group of bytes starts with
[80,109,78,106],[89,50,78,106]
and ends with
[89,50,78,106],[89,122,120,105]
These are surely just the cases where the run of characters doesn't divide neatly into groups of three, since it seems fairly clear that there are three characters to every four bytes.
I haven't figured out how to get the text out just yet, but this should be a much clearer starting point.
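One observation that may help: the bytes `[89,87,70,104]` are the ASCII codes for `YWFh`, which is the standard base64 encoding of `aaa`, and base64 always maps three bytes to four characters. That suggests, though nothing in the Saito source quoted here confirms it, that the message field holds base64 text. The sketch below decodes one four-byte group under that assumption. `decodeGroup` is a hypothetical helper written by hand to stay self-contained, and it does not handle `=` padding; in Node, `Buffer.from(text, "base64")` does the same job for whole strings.

```typescript
// Standard base64 alphabet (RFC 4648).
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Treat four bytes as ASCII base64 characters and decode them to three characters.
function decodeGroup(group: number[]): string {
  if (group.length !== 4) throw new Error("expected exactly 4 bytes");
  let bits = 0;
  for (const byte of group) {
    const value = B64.indexOf(String.fromCharCode(byte));
    if (value < 0) throw new Error(`byte ${byte} is not a base64 character`);
    bits = (bits << 6) | value; // each base64 character carries 6 bits
  }
  // 4 x 6 bits = 24 bits = 3 bytes
  return String.fromCharCode((bits >> 16) & 0xff, (bits >> 8) & 0xff, bits & 0xff);
}

console.log(decodeGroup([89, 87, 70, 104]));  // repeating 'a' group
console.log(decodeGroup([89, 109, 74, 105])); // repeating 'b' group
console.log(decodeGroup([89, 50, 78, 106]));  // repeating 'c' group
console.log(decodeGroup([89, 87, 69, 56]));   // group at the end of the 'a' run
```

The three repeating groups decode to `aaa`, `bbb`, and `ccc`, and the group at the end of the 'a' run decodes to `aa<`. That fits the idea that the odd groups at the edges are just where a run meets whatever separates the lines.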