-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement and use vmx_strchr() #440
Comments
… for js_strchr_limit Latin-1 strings
The network code caused JPEG artifacts on Macintosh Repository, probably due to bad HTTP chunking. Disabling it and flushing the cache removed the artifacts. I suspect the problem is the first |
Did you receive my mail a couple weeks ago? Anyway, I should've commented here at first. Since assembly is completely innapropriate for |
Do you have a diff? If it seems reasonable, I'll incorporate it into FPR12 (I'm already typing this in what will be the FPR11 beta). |
Actually, I just finished reading it. Clever. It definitely needs a full beta bake time, but it's pretty ingenious, I must say. Can you turn it into a PR? |
Thanks a lot :) Do you mind if I delay the PR a little bit? There's one case where it's detrimental vs the current vmx_strchr. If the character c is the first element of the first vector load, it'll make strchr 4-5% slower than currently and I'm curious if there's a way I can avoid this that's near gratis. I'll PR soon. My handwritten assembly routines are still significantly faster than what gcc produces but.. I spent about two weeks finishing the whole thing in asm. That was for fun, at least. |
Sure, take your time. This won't get integrated for a few weeks until I'm into FPR12. What are you using to do your benchmarks? |
It will look silly, but currently I use the unix |
To be more precise: the resulting executable accepts one command-line argument: the offset of the byte that it has to look for. Then it prints the address of the array(alignment), the address of the pointer returned by |
Okay, I believe I'm done. I have a few more changes to commit. I removed The makefile will produce executables Otherwise I made my type usage more consistent. Thanks! |
I haven't forgotten about this, I've just been on the Talos II all week. |
Well, obviously SIMD is SIMD, so there's no major speedup, but the difference was tempting enough that I relinked, silently dropping the assembly version plvmx.o in the objects and using it once I passed all JS tests. It cuts libnss by about 2KB. I think for what it does it's worth it, though I'm not sure what should be "want" and "not worthwhile". |
Sorry for the delay. I made and attached a quick chart filled with time results from 10 millions calls for each function with 3 machines (one for each altivec-enabled arch) I had at hand to try to give a somewhat overall view. There's a couple things you'll notice. Every C source was compiled with gcc-4.8.5 from macports with the good Just as a note, exceptionally for the G5 in assembly memchr and haschr I've added |
@classilla, I know you're busy right now. Given the blog post, I believe you're still interested, and FPRb12 is next on the road map. So I expect no more than a little "ok" before converting the thing to If it can help reduce hesitation on taking the patch, it's been part of my own build for over 1 month and I have yet to find any problem with it. |
Yes, I'm still interested, I just don't have a large number of cycles just now (between fixing Talos II issues with Firefox like https://bugzilla.mozilla.org/show_bug.cgi?id=1512162 and the holiday) but the idea is to get it into FPR12 when I get back on the G5. So ... "ok" ;) |
Okay. We're back to this now that I have most of the other stuff for FPR12 landed and I have a testing G5 build up. |
I'm looking at your |
You're right, I can observe this. In this case the in-tree version becomes slightly faster than the C version with my modifications. Though similar happens with the asm version, it's already far ahead so it still completes faster. I'm no longer convinced the changes brought to the C versions are worthwhile. It represents well the algorithm from the asm version, that said. |
On the topic of readability. I suspect adding comments on top of an asm function would make it even more a mess. So how about I add the original .S files for each function somewhere appropriate in the tree? |
Or, just attach them here (no PR, just a naked set of files or a |
Sorry if it's taking awhile, I was reviewing everything, and now I'm re-passing JS tests on the G5. As soon as it's done I'm uploading. |
|
Very nice! Some really clever stuff in the assembly. I'm patching NSPR for the build. |
Thanks! I put a good amount of thought in this. I'm repeating myself, but if there's anything I can do to help, just let me know. I don't want this to end up being a burden. |
Add a SSE2 optimized variant of `strchr`.
Like it says. Not as dramatic as
memchr
but should still improve things.The text was updated successfully, but these errors were encountered: