We want to add a function to `RowBatch` which can estimate the batch size in bytes. There are a couple of options that could be taken here:
Convert the row batch to bytes using JSON marshalling and calculate the size of the resulting byte slice. This is implemented in the following PR: Add estimated row batch size in bytes to state #230. @shuhaowu did some performance analysis and generated a flame graph which clearly showed that the JSON marshalling method is slow and needs optimization.
`reflect`: just estimating the size of the different types in the row data, which is of type `[]interface{}`, might suffice. The downside is that this can give us a very inaccurate estimate. For example, consider this row data: `["mkm0", "mkm0a", 1234]`. If we use the reflect package to calculate the size of this row data in bytes, it gives us 40, counting each of the two strings as 16 bytes and the int as 8 bytes. But we can see that the two strings `mkm0` and `mkm0a` have different lengths, and the reflect package doesn't account for this.

On the same row data, if we calculate the byte size using JSON marshalling we get 21, which is accurate.

So if we have a batch of 200 of the rows above, the reflect package would give a batch size of 8000 bytes, while the JSON marshalling method would give ~4200 bytes. This is too large a deviation from the actual value. We need to decide whether or not we want to prioritize performance over accuracy. The code I ran to demonstrate the problem with the reflect package is here: https://play.golang.org/p/dWGh-M-qu_j
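To make the comparison concrete, here is a minimal sketch (not the exact playground code) contrasting the two approaches on that row data. It assumes a 64-bit platform, where `reflect.Type.Size()` reports 16 bytes for a string header and 8 bytes for an int, regardless of string contents:

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// reflectSize sums the static type sizes of each value. On 64-bit this
// counts every string as 16 bytes (the string header) and every int as
// 8 bytes, ignoring the actual string contents.
func reflectSize(rowData []interface{}) uintptr {
	var size uintptr
	for _, v := range rowData {
		size += reflect.TypeOf(v).Size()
	}
	return size
}

// jsonSize marshals the row data and measures the serialized length.
func jsonSize(rowData []interface{}) int {
	b, err := json.Marshal(rowData)
	if err != nil {
		return 0
	}
	return len(b)
}

func main() {
	rowData := []interface{}{"mkm0", "mkm0a", 1234}
	fmt.Println(reflectSize(rowData)) // 40: 16 + 16 + 8
	fmt.Println(jsonSize(rowData))    // 21: ["mkm0","mkm0a",1234]
}
```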
The mysqldriver, when it creates the MySQL packet, calculates the size in bytes of the data before performing the syscall: https://github.com/go-sql-driver/mysql/blob/master/packets.go#L1011-L1132. The obvious option is to modify the internal function `writeExecutePacket` in the mysqldriver to return the data size in bytes, but that would mean we have to maintain our own feature branch, which might be cumbersome.

We could also take the logic used in the mysqldriver code to calculate the data size and put it into our own `rowbatch.EstimatedByteSize` function. This would clearly be more performant than JSON marshalling and more accurate than the reflect package.
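A sketch of what such a driver-style `EstimatedByteSize` could look like. The function name and the type switch below are illustrative assumptions, not the driver's actual code: the idea is simply to size strings and byte slices by their real length and fixed-width numeric types by their width, without any marshalling or reflection:

```go
package main

import "fmt"

// estimatedByteSize is a hypothetical sketch of a driver-style sizing
// function: it switches on the concrete type of each column value, counting
// variable-length values by their actual length and numeric values by a
// fixed width, similar in spirit to how the mysqldriver sizes values when
// building an execute packet.
func estimatedByteSize(rowData []interface{}) int {
	size := 0
	for _, v := range rowData {
		switch d := v.(type) {
		case nil:
			// NULL contributes no payload bytes.
		case []byte:
			size += len(d)
		case string:
			size += len(d)
		case int, int64, uint64, float64:
			size += 8
		case bool:
			size++
		default:
			size += 8 // assumption: word-sized fallback for unknown types
		}
	}
	return size
}

func main() {
	// Strings counted by actual length (4 + 5) plus 8 for the int.
	fmt.Println(estimatedByteSize([]interface{}{"mkm0", "mkm0a", 1234})) // 17
}
```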
Do some sampling: if there are X rows in a batch, calculate the size in bytes for only X/N rows and extrapolate to the full batch.
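The sampling idea could be sketched like this. The function and parameter names are illustrative; `sizeOf` stands for whichever per-row sizing method we settle on (JSON, driver-style, etc.):

```go
package main

import "fmt"

// sampledBatchSize estimates a batch's total size by measuring only every
// n-th row with sizeOf and extrapolating linearly to the full batch.
func sampledBatchSize(rows [][]interface{}, n int, sizeOf func([]interface{}) int) int {
	if len(rows) == 0 || n <= 0 {
		return 0
	}
	sampled, total := 0, 0
	for i := 0; i < len(rows); i += n {
		total += sizeOf(rows[i])
		sampled++
	}
	return total * len(rows) / sampled
}

func main() {
	// 200 identical rows; pretend each serializes to 21 bytes (the JSON
	// size of ["mkm0","mkm0a",1234] from the example above).
	rows := make([][]interface{}, 200)
	for i := range rows {
		rows[i] = []interface{}{"mkm0", "mkm0a", 1234}
	}
	perRow := func(r []interface{}) int { return 21 }
	fmt.Println(sampledBatchSize(rows, 10, perRow)) // 4200
}
```

This trades a small amount of accuracy for doing 1/N of the sizing work, which may be a reasonable middle ground between the JSON and reflect approaches.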
@fjordan @tiwilliam @shuhaowu