Fix conversion of non-utf8 sequences to AnyValue #1253

Merged
merged 1 commit into from
Mar 13, 2024

@Nevay Nevay commented Mar 8, 2024

Converting to AnyValue:

String values which are not valid Unicode sequences SHOULD be converted to AnyValue's bytes_value with the bytes representing the string in the original order and format of the source string.
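The rule quoted above can be sketched as follows. This is a minimal illustration, not the SDK's converter: `toAnyValueArray()` is a hypothetical helper, and the plain array with `string_value` / `bytes_value` keys only stands in for the protobuf `AnyValue` message. Validity is checked with `preg_match()` and the `/u` modifier, which is one possible way to detect invalid UTF-8.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the SDK): returns an array shaped like
 * the AnyValue oneof, with either a 'string_value' or a 'bytes_value' key.
 */
function toAnyValueArray(string $value): array
{
    // With the /u modifier, preg_match() returns false when the subject is
    // not valid UTF-8 and 1 otherwise (the empty pattern always matches).
    $isUtf8 = \preg_match('//u', $value) === 1;

    return $isUtf8
        ? ['string_value' => $value]
        // Invalid sequences are passed through untouched, so the bytes keep
        // their original order and format.
        : ['bytes_value' => $value];
}
```

Under these assumptions, `toAnyValueArray("h\u{e9}llo")` yields a `string_value` entry, while `toAnyValueArray("\xC3\x28abc")` yields a `bytes_value` entry carrying the same bytes.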

@Nevay Nevay requested a review from a team March 8, 2024 19:50

codecov bot commented Mar 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.62%. Comparing base (bb07aca) to head (a9b76b2).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##               main    #1253   +/-   ##
=========================================
  Coverage     84.62%   84.62%           
- Complexity     2136     2140    +4     
=========================================
  Files           284      284           
  Lines          6054     6062    +8     
=========================================
+ Hits           5123     5130    +7     
- Misses          931      932    +1     
Flag Coverage Δ
8.0 84.57% <100.00%> (+<0.01%) ⬆️
8.1 84.60% <100.00%> (+<0.01%) ⬆️
8.2 84.60% <100.00%> (+<0.01%) ⬆️
8.3 84.60% <100.00%> (+<0.01%) ⬆️
8.4 84.60% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
src/Contrib/Otlp/AttributesConverter.php 100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bb07aca...a9b76b2.

@Nevay Nevay force-pushed the fix/non-utf8-sequences branch from f107d6e to 228102c Compare March 8, 2024 19:54
    }

    return $result;
}

private static function isUtf8(string $value): bool
{
    return \extension_loaded('mbstring')

Technically this condition might not be necessary since symfony/polyfill-mbstring is pulled in transitively from the SDK. Not sure if it's best practice to rely on that though.
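For readers without the full diff, the condition under discussion can be sketched roughly as below. This is an assumption-laden illustration rather than the merged code: the function name `isUtf8Sketch()` is hypothetical, and both the use of `mb_check_encoding()` in the native branch and the exact `preg_match()` pattern in the fallback branch are guesses based on the review comments and benchmark names in this thread.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a UTF-8 check that avoids the userland polyfill.
 */
function isUtf8Sketch(string $value): bool
{
    if (\extension_loaded('mbstring')) {
        // Native ext-mbstring is loaded, so this call does not hit the polyfill.
        return \mb_check_encoding($value, 'UTF-8');
    }

    // Without ext-mbstring, mb_check_encoding() may still exist via
    // symfony/polyfill-mbstring; this branch deliberately bypasses it.
    // preg_match() with /u returns false for subjects that are not valid UTF-8.
    return \preg_match('//u', $value) === 1;
}
```

The explicit `extension_loaded()` check is what keeps the slower polyfill out of the export path when the native extension is missing.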

Collaborator

@Nevay should we rely on the polyfill, or leave it as-is? Otherwise, I'm happy to approve and merge.

Contributor Author

We should leave it as-is. The polyfill is slower, and exports can contain a large number of attributes.
Creating and exporting spans with 10 string attributes (16 bytes each; 8 valid and 2 invalid UTF-8 sequences) to a local collector with the default batch size was around 10% slower with the polyfill than with the preg_match() fallback (the fallback was incorrect until the latest push).

Benchmark results for creating an AnyValue from a 16-byte string (the polyfill is especially costly for invalid UTF-8 sequences because of its iconv fallback):

+------------------------------------+---------+---------+--------+---------+
| subject                            | memory  | mode    | rstdev | stdev   |
+------------------------------------+---------+---------+--------+---------+
| benchCheckEncoding (valid)         | 1.959mb | 0.245μs | ±1.95% | 0.005μs |
| benchCheckEncoding (invalid-begin) | 1.959mb | 0.248μs | ±1.68% | 0.004μs |
| benchCheckEncoding (invalid-end)   | 1.959mb | 0.250μs | ±1.94% | 0.005μs |
| benchPregMatch (valid)             | 1.959mb | 0.307μs | ±2.06% | 0.006μs |
| benchPregMatch (invalid-begin)     | 1.959mb | 0.262μs | ±1.53% | 0.004μs |
| benchPregMatch (invalid-end)       | 1.959mb | 0.270μs | ±1.62% | 0.004μs |
| benchIsUtf8 (valid)                | 1.959mb | 0.250μs | ±1.62% | 0.004μs |
| benchIsUtf8 (invalid-begin)        | 1.959mb | 0.252μs | ±1.64% | 0.004μs |
| benchIsUtf8 (invalid-end)          | 1.959mb | 0.253μs | ±1.31% | 0.003μs |
| benchPolyfill (valid)              | 1.959mb | 0.383μs | ±1.67% | 0.006μs |
| benchPolyfill (invalid-begin)      | 1.959mb | 0.830μs | ±1.51% | 0.013μs |
| benchPolyfill (invalid-end)        | 1.959mb | 0.945μs | ±1.51% | 0.014μs |
+------------------------------------+---------+---------+--------+---------+
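
A comparison along these lines can be reproduced with a small timing loop. The sketch below is not the harness that produced the table above: `meanMicroseconds()` is a hypothetical helper, the byte contents of the three 16-byte inputs are assumptions that only mirror the table's labels, and the polyfill case is omitted because it needs an extra dependency.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: mean wall-clock time per call, in microseconds.
 */
function meanMicroseconds(callable $check, string $input, int $iterations = 100000): float
{
    $start = \hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $check($input);
    }

    return (\hrtime(true) - $start) / ($iterations * 1000.0);
}

// 16-byte inputs; the labels follow the table, the contents are assumed.
$inputs = [
    'valid'         => '0123456789abcdef',
    'invalid-begin' => "\xFF123456789abcdef",
    'invalid-end'   => "0123456789abcde\xFF",
];

$checks = [
    'preg_match' => static fn (string $v): bool => \preg_match('//u', $v) === 1,
];
if (\extension_loaded('mbstring')) {
    $checks['mb_check_encoding'] = static fn (string $v): bool => \mb_check_encoding($v, 'UTF-8');
}

foreach ($checks as $name => $check) {
    foreach ($inputs as $label => $input) {
        \printf("%-18s %-14s %.3f us\n", $name, $label, meanMicroseconds($check, $input));
    }
}
```

Absolute numbers from such a loop include call overhead and will differ from the figures in the table; only the relative ordering between checks is meaningful.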

@Nevay Nevay force-pushed the fix/non-utf8-sequences branch from 228102c to a9b76b2 Compare March 13, 2024 14:00
@brettmc brettmc merged commit 1753fbe into open-telemetry:main Mar 13, 2024
11 checks passed
5 participants