Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Our RNG interface and implementation seem outdated #6171

Open
david-christiansen opened this issue Nov 22, 2024 · 0 comments
Open

Our RNG interface and implementation seem outdated #6171

david-christiansen opened this issue Nov 22, 2024 · 0 comments

Comments

@david-christiansen
Copy link
Contributor

david-christiansen commented Nov 22, 2024

It seems that our standard library's randomness support is not up to our usual standards. I think it's using old technology, and perhaps also old Lean naming conventions from before most of the current library was designed.

We have two separate interfaces to randomness: there's IO.getRandomBytes, and a splittable RNG based on RandomGen. Here, I'm talking about the latter.

First off, the namespacing is inconsistent. Some of the operators are in the IO namespace, like IO.setRandSeed and IO.rand. Most of them are in no defined namespace at all, like randBool, randNat, and stdSplit, making it harder to find them with autocomplete.

But the big issue seems to be that the StdGen in the Lean library uses an algorithm that's known to have poor randomness properties (see paper). It seems to have been based on an old version of the Haskell libraries. I'm far from an expert here, but I believe that SplitMix (which the Haskell libraries have used for the last four or so years) is the state of the art for splittable RNGs, though when I last looked into this a few years ago for a different project, I also ran across a blog post about common implementation mistakes in SplitMix. Git blame says that the Haskell libraries were used as inspiration five years ago, so Lean would have imported them just before they were updated.

For reference: here's the PR that updated the Haskell library in 2020. They considered, but rejected, having separate classes for splittable and non-splittable RNGs, deciding against it primarily due to backwards compatibility. It seems that ours is not used much, so perhaps we should have both?

I found one use of StdGen in Mathlib. Its comments point out what seem to be implementation issues with seeding.

Plausible also uses it. IIRC tf-random, which was a slower predecessor to SplitMix that used the Threefish block cipher to generate new states, came about when the statistical issues with the old Haskell StdGen started causing problems with QuickCheck generators in practice.

Impact

I don't know of anyone being actively blocked by this. It came up while working on the language reference, and I figured it was worth writing down what little I know about the matter for the stdlib team to prioritize as you see fit.

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant