Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use MaybeUninit in DecInt #1201

Merged
merged 8 commits into from
Nov 21, 2024
Merged

Conversation

SUPERCILEX
Copy link
Contributor

Double checked with cargo miri t --workspace --features all-apis -- dec_int. Also got rid of the fmt::Write impl as that's confusingly complicated.

@sunfishcode
Copy link
Member

Looking at the generated machine code in --release, this patch makes DecInt::new slightly longer. It eliminates the zeroing of the array at the beginning, but adds a bounds check. Is there a way this code could be structured to avoid that?

@SUPERCILEX
Copy link
Contributor Author

Eyyyyyy, we're doing compiler optimization driven development! So in trying to remove all panics I discovered that since we accept the itoa::Integer trait we technically also accept i128s which meant we could crash if their representation was too large. I changed the number of bytes in DecInt to match itoa::Buffer+1 so that all panics are removed. However, I'm not sure if this is the right choice given that we're targeting file descriptors... maybe we should actually only have space for 10 str digits and assert that fact (which removes the double panics)?

@sunfishcode
Copy link
Member

Would it work to add a type parameter to DecInt? Decint<T> that calculates the size of the buffer as a function of the size of T?

@SUPERCILEX
Copy link
Contributor Author

Damn, I love that idea but don't it's possible in stable rust. This is an associated type:

error: generic parameters may not be used in const operations
  --> src/path/dec_int.rs:66:28
   |
66 |     buf: [MaybeUninit<u8>; T::BUF_SIZE + 1],
   |                            ^^^^^^^^^^^ cannot perform const operation using `T`
   |
   = note: type parameters may not be used in const expressions
   = help: add `#![feature(generic_const_exprs)]` to allow generic const expressions

I also tried a lookup table version:

error: generic parameters may not be used in const operations
  --> src/path/dec_int.rs:66:53
   |
66 |     buf: [MaybeUninit<u8>; BUF_SIZES[mem::size_of::<T>()] + 1],
   |                                                     ^ cannot perform const operation using `T`
   |
   = note: type parameters may not be used in const expressions
   = help: add `#![feature(generic_const_exprs)]` to allow generic const expressions

@sunfishcode
Copy link
Member

Hrm. I'm not sure what to suggest here then.

We do also use DecInt for PID values, which are usually u32, but is usize on redox.

@SUPERCILEX
Copy link
Contributor Author

Ok I think I figured out the best we can do. I went back to a buffer size of 21 but added explicit assertions that fail if you try to print a huge number. I also added buffer sizing guarantees for the optimizer so panics and the assertion are compiled out for anything smaller than an i64. It's hella ugly but the assembly is super clean so there's that.

@SUPERCILEX SUPERCILEX force-pushed the decint branch 4 times, most recently from 25a332a to 22e9bf9 Compare November 9, 2024 17:58
@SUPERCILEX
Copy link
Contributor Author

Errors are unrelated:

home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rustix-0.38.39/src/backend/libc/fs/types.rs:957:30
    |
957 |     LockShared = bitcast!(c::LOCK_SH),
    |                              ^^^^^^^ not found in `c`

error[E0425]: cannot find value `LOCK_EX` in module `c`
   --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rustix-0.38.39/src/backend/libc/fs/types.rs:959:33
    |
959 |     LockExclusive = bitcast!(c::LOCK_EX),
    |                                 ^^^^^^^ not found in `c`

error[E0425]: cannot find value `LOCK_UN` in module `c`
   --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rustix-0.38.39/src/backend/libc/fs/types.rs:961:26
    |
961 |     Unlock = bitcast!(c::LOCK_UN),
    |                          ^^^^^^^ not found in `c`

error[E0425]: cannot find value `LOCK_SH` in module `c`
   --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rustix-0.38.39/src/backend/libc/fs/types.rs:963:41
    |
963 |     NonBlockingLockShared = bitcast!(c::LOCK_SH | c::LOCK_NB),
    |                                         ^^^^^^^ not found in `c`

@sunfishcode
Copy link
Member

Cool, that approach sounds good. Those errors are now fixed on main.

@SUPERCILEX
Copy link
Contributor Author

Hmmm, are you sure? I just rebased 20 mins ago.

@sunfishcode
Copy link
Member

Oh, this is the issue where rustix dev-depends on itself. The libc crate broke semver on solaris, breaking the latest release version of rustix, which is what gets pulled in in a cargo test. I've now released v0.38.40 to fix this.

@SUPERCILEX
Copy link
Contributor Author

Thank you! Just re-triggered the build.

@sunfishcode
Copy link
Member

And I've now fixed the test build in #1214.

@SUPERCILEX
Copy link
Contributor Author

We have a green build! :)

// 20 `u8`s is enough to hold the decimal ASCII representation of any
// `u64`, and we add one for a NUL terminator for `as_c_str`.
buf: [u8; 20 + 1],
// Enough to hold an i64 and NUL terminator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth commenting that this also happens to be enough to hold a u64 and NUL terminator, because even though u64::MAX needs one more decimal digit than i64::MAX, it doesn't need the "-", so it works out.

};
if str_buf.len() > max_buf_size {
unsafe { core::hint::unreachable_unchecked() }
}
Copy link
Member

@sunfishcode sunfishcode Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is pretty subtle, so I've attempted to analyze the different cases:

If the condition here is false, this code executes undefined behavior. If this condition is true, in release builds, the assert below is optimized out. If this condition is true, in debug builds, the assert below is not optimized out.

If I'm not mistaken, this is all effectively the same as if we removed the unreachable_unchecked() and all the code it depends on above here, and just changed the assert below to a debug_assert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, the assertion is a bit of a red herring. It's there to make the error message nicer on {u,i}128s because otherwise we panic on bounds check failures. If you remove the hint, you'll get bounds check panics on all ints (or rather the assertion will be compiled in, but if we made it debug_assert then release mode would see bounds checks).

It's probably easier to see this by playing with https://rust.godbolt.org/z/dqb53Pj8o. Change the param in the function foo at the top to see how the different monomorphizations look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks, I get it now. I played around with the link a little, and came up with this: https://rust.godbolt.org/z/4TEvYc4x8. It codegens to the same code, but does the assert before the unreachable hint so that we don't invoke UB before the assert in debug builds, and it doesn't use the TypeId checks. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that has UB if you pass in a u128::MAX for example (str_buf will be 40 long while buf is only 21). For debug builds they should already be getting an assertion as long as they don't enable optimizations: https://doc.rust-lang.org/src/core/hint.rs.html#100. One idea is to change the TypeId stuff to panic on 128 bit values so that we can use your simplified hint condition, but that'll still have very ugly TypeId checks.

@SUPERCILEX
Copy link
Contributor Author

Alrighty, thoughts now that all the icky stuff was removed?

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, this looks much prettier, thanks for working on this with upstream! Just a few minor comments:

tests/path/dec_int.rs Show resolved Hide resolved
src/path/dec_int.rs Outdated Show resolved Hide resolved
@sunfishcode sunfishcode merged commit 61f17d6 into bytecodealliance:main Nov 21, 2024
45 checks passed
sunfishcode pushed a commit that referenced this pull request Dec 8, 2024
* Use MaybeUninit in DecInt

* Fix DecInt::new panicking on integers greater than 64 bits and optimize out any panics in DecInt::new

* Add tests so people messing with DecInt can catch UB with miri

* Use new itoa

---------

Co-authored-by: Alex Saveau <[email protected]>
sunfishcode pushed a commit that referenced this pull request Dec 8, 2024
* Use MaybeUninit in DecInt

* Fix DecInt::new panicking on integers greater than 64 bits and optimize out any panics in DecInt::new

* Add tests so people messing with DecInt can catch UB with miri

* Use new itoa

---------

Co-authored-by: Alex Saveau <[email protected]>
@sunfishcode
Copy link
Member

This is now released in rustix 0.38.42.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants