Test Rust optimization when target-features has "allow-non-simd" and the CPU flag is set #188
Comments
I tried this and my hunch was wrong, at least insofar as I could not detect any performance change when adding the
@Licenser oh so close this then? I feel like this is likely my code. Non SIMD code
SIMD code
As you can see, the only difference is the line
Let's keep it open until we find the solution :) Even if it's not the issue I first thought it was, it's no less of an issue for you. Could it be related to #189?
That did not work for me. Also, I tried with new version of
Sorry for the wrong suggestion and the late reply. The derive crate offers direct functions, for example: https://github.com/simd-lite/simd-json-derive/blob/main/tests/struct.rs#L74
So strange, because of this
I have a feeling I am not doing something right. The same code works for you but for me it throws an error if https://github.com/amanjeev/simdjson-playground/blob/simd-search/src/main.rs |
You did nothing wrong, I was just being silly and forgot to package/release the -int crate 🤦 I forget it every time :( I downloaded your example and with
Phew! I should have checked. I did look at the -int crate but did not check whether it was updated in this repo. Sorry, and thank you so much! Cool, it compiles and runs fine, but the SIMD version is still slower. Which makes me think that maybe my code sucks. As in, I am using the SIMD library incorrectly. The large JSON file looks like this in structure. I can upload it somewhere if you want. As a whole it is not valid JSON, but each line is an object.
You couldn't have caught it :) The crate wasn't published, so there was no chance to catch that from your end. I'll give that a try tomorrow; I can generate a file like that :D
I got this working with example data, but even with 10k entries, things finish so fast that it's hard for me to measure anything :( That said, since it prints on every entry, I'm pretty sure that for this code the time spent in How are you measuring 'faster' or 'slower' exactly?
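On the measuring question, a minimal sketch of one way to time a run with only the standard library. The `best_of` helper and the stand-in workload are hypothetical, not something from this thread; the idea is to time the parse loop several times and keep the fastest run, ideally with the per-entry `println!` removed so that printing does not dominate.

```rust
use std::time::{Duration, Instant};

/// Runs `work` `runs` times and returns the fastest wall-clock duration.
/// (Hypothetical helper, for illustration only.)
fn best_of<F: FnMut()>(runs: usize, mut work: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        work();
        best = best.min(start.elapsed());
    }
    best
}

fn main() {
    // Stand-in workload: replace the closure body with the parse loop.
    let mut sum = 0u64;
    let elapsed = best_of(5, || {
        for i in 0..1_000_000u64 {
            // black_box keeps the optimizer from deleting the loop.
            sum = sum.wrapping_add(std::hint::black_box(i));
        }
    });
    println!("best of 5: {:?} (checksum {})", elapsed, sum);
}
```

Taking the best of several runs reduces noise from disk caching and other processes; for anything finer grained, a benchmarking harness would be the better tool.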
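Based on these thoughts, this is going to be the optimal code:

    use std::fs::File;
    use std::io::{BufRead, BufReader};
    // AlignedBuf (from simd-json) and SIMDExample (the derive struct from the
    // playground repo) are assumed to be in scope.

    fn main() {
        let data_file = File::open("data/fake_data.json").unwrap();
        let mut reader = BufReader::new(data_file);
        // Buffers are allocated once and reused for every line.
        let mut data = Vec::with_capacity(1024);
        let mut string_buffer = Vec::with_capacity(2048);
        let mut input_buffer = AlignedBuf::with_capacity(1024);
        // Read raw bytes up to each newline instead of calling lines().
        while reader.read_until(b'\n', &mut data).unwrap() > 0 {
            let row =
                SIMDExample::from_slice_with_buffers(&mut data, &mut input_buffer, &mut string_buffer)
                    .unwrap();
            if row.id == 2807149942735425369 {
                println!("look ma! a match! - {}", row.id_str)
            }
            // Clearing keeps the capacity, so the next line reuses the allocation.
            data.clear();
        }
    }

It: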
The SIMD version changes were suggested here - simd-lite/simd-json#188 (comment)
These changes make the code faster by:
* avoiding allocations by using preallocated buffers
* avoiding the rather expensive lines() call
* taking advantage of simd_json's faster UTF-8 validation by skipping the separate validation step and reading the file as bytes, not strings
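To illustrate the first two points without any JSON library, here is a small std-only sketch. The `count_matches` function and its byte-substring search are hypothetical stand-ins for the real parsing step; the part that matters is the `read_until` loop with one reused byte buffer, which avoids the `String` that `lines()` allocates and validates for every line.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

/// Counts the lines that contain `needle` (must be non-empty), reusing a
/// single byte buffer for all lines. Hypothetical helper for illustration.
fn count_matches<R: Read>(input: R, needle: &[u8]) -> usize {
    assert!(!needle.is_empty(), "needle must not be empty");
    let mut reader = BufReader::new(input);
    let mut line = Vec::with_capacity(1024); // allocated once, reused below
    let mut matches = 0;
    while reader.read_until(b'\n', &mut line).unwrap() > 0 {
        // `line` holds raw bytes: no String is built and no UTF-8 check runs here.
        if line.windows(needle.len()).any(|w| w == needle) {
            matches += 1;
        }
        line.clear(); // drops the contents but keeps the capacity
    }
    matches
}

fn main() {
    let data = b"{\"id\":1}\n{\"id\":2}\n{\"id\":1}\n";
    let n = count_matches(Cursor::new(&data[..]), b"\"id\":1");
    println!("{} matching lines", n);
}
```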
Your code is so slick that it cut about 4 seconds off of the 5GB file I am using for this experiment. However, the non-SIMD version is still faster. For the sake of "similarity" I changed the non-SIMD code to match the style of your faster SIMD code. Pasting it here inline
The SIMD run
Non SIMD run
What do you think I should do to proceed? SIMD search compilation has this flag https://github.com/amanjeev/simdjson-playground/blob/simd-search/.cargo/config and these features https://github.com/amanjeev/simdjson-playground/blob/simd-search/Cargo.toml#L9
Calling simd-json on individual lines is really not a good use case for it. Ideally we should have special support for newline-delimited JSON (NDJSON) like upstream.
Yes, and since the lines are fairly short too, the advantages are a lot smaller (sometimes the effect is even detrimental): there is an initial cost to pay for filling the registers, doing multiple runs, etc., and that can overshadow the performance gain for very small payloads. NDJSON support would be incredibly cool (especially if we manage to realize it in a streaming fashion / as an iterator). Sadly, so far no use case has presented itself that justified the time 😭 but the want is there :D
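For the "as an iterator" idea, a rough sketch of what the shape could look like. `NdjsonLines` is a hypothetical type, not part of simd-json, and it only splits an in-memory buffer on newlines; a real implementation would presumably want to amortize the per-call setup cost across many documents instead of handing each short line to the parser separately.

```rust
/// Hypothetical iterator over the documents of an NDJSON buffer.
/// Yields each non-empty line as a byte slice borrowed from the input.
struct NdjsonLines<'a> {
    rest: &'a [u8],
}

impl<'a> NdjsonLines<'a> {
    fn new(buf: &'a [u8]) -> Self {
        NdjsonLines { rest: buf }
    }
}

impl<'a> Iterator for NdjsonLines<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        while !self.rest.is_empty() {
            let rest = self.rest; // copy the reference to keep the 'a lifetime
            let end = rest.iter().position(|&b| b == b'\n').unwrap_or(rest.len());
            let line = &rest[..end];
            // Step past the newline, or to the end if this was the last line.
            self.rest = &rest[(end + 1).min(rest.len())..];
            if !line.is_empty() {
                return Some(line);
            }
        }
        None
    }
}

fn main() {
    let buf = b"{\"id\":1}\n\n{\"id\":2}\n";
    for (i, doc) in NdjsonLines::new(buf).enumerate() {
        println!("doc {}: {}", i, String::from_utf8_lossy(doc));
    }
}
```

Each yielded slice could then be copied into a reusable mutable buffer and handed to the parser, as in the loop earlier in this thread.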
Yes! Thank you! I was unable to enunciate or explain. I did think something was off, as in, naturally there is more work being done per line in SIMD version than normal Serde version. I wrote this earlier xD -
Do you think we can repurpose this ticket or open a new one for the upstream alignment? Also, who will be doing that? That is, I am assuming that's a good way for me to achieve this SIMD test with my JSON. edit: they missed the chance in calling the thing
Absolutely, both repurposing the ticket and opening a new one are fine :) And documenting that the need is now there is a really good way to improve simd-json.
#194 has been opened where this work needs to be done. Closing this.
Summary
To enable SIMD one has to do these things
Cargo.toml
set the target-features
.cargo/config
But sometimes, despite adding the above, and especially with allow-non-simd in the features, the result is still not SIMD-compatible code.
To do
Write tests that help identify whether the rustc optimization is prevented by modifying Cargo.toml
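One possible starting point for such a test, as a hedged sketch: compare what the compiler was allowed to emit with what the CPU supports. The function names are hypothetical and AVX2 is only an example feature; `cfg!(target_feature = ...)` reflects the build's target features (set, for instance, through rustflags such as `-C target-cpu=native`), while `is_x86_feature_detected!` checks the running CPU. This does not prove which code path a library takes, only whether the build was allowed to use the feature.

```rust
/// True if this build was compiled with AVX2 enabled as a target feature.
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU running this binary reports AVX2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx2() -> bool {
    std::is_x86_feature_detected!("avx2")
}

/// On other architectures there is no AVX2 to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx2() -> bool {
    false
}

fn main() {
    match (compiled_with_avx2(), cpu_has_avx2()) {
        (true, _) => println!("build enables AVX2"),
        (false, true) => println!("CPU has AVX2, but the build does not enable it"),
        (false, false) => println!("no AVX2 on this CPU or architecture"),
    }
}
```

A test in the spirit of the to-do item could fail on the second case, which is the situation described above: the CPU flag is there, but the compiled code does not use it.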