-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Balloon implementation #232
Conversation
balloon/src/lib.rs
Outdated
// block_t idx_block = ints_to_block(t, m, i) | ||
// int other = to_int(hash(cnt++, salt, idx_block)) mod s_cost | ||
digest.update(&cnt.to_le_bytes()); | ||
cnt += 1; | ||
digest.update(salt); | ||
digest.update(&u32::try_from(t).unwrap().to_le_bytes()); | ||
digest.update(&u32::try_from(m).unwrap().to_le_bytes()); | ||
digest.update(&i.to_le_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what ints_to_block
is supposed to mean, the reference implementation does something with AES (I think) that I didn't understand: libballoon/hash_state.c#L171 -> libballoon/bitstream.c#L152 -> libballoon/bitstream.c#L134 -> libballoon/bitstream.c#L118 -> EVP_EncryptUpdate
I took some inspiration from a Python implementation instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not an expert, but ints_to_block seems to just be for type safety than an actual cryptographic operation. Feeding directly into the hash seems fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to revisit this for compatibility reasons.
My thinking was that ints_to_block
is supposed to be some functions that can take these integers and turn them into a "block" of bytes, which is done in the prototype implementations by encrypting them. As an alternative we could simply use the hash already in use to do that.
We could also possibly use SHA-1 here instead of the possibly more expensive hash used in D
.
balloon/src/lib.rs
Outdated
/// - Default set of [`Params`] to be used | ||
/// - (Optional) Secret key a.k.a. "pepper" to be used | ||
#[derive(Clone, Default)] | ||
pub struct Balloon<'key, D: Digest> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to store the hash algorithm to use in Balloon
because I can't pass it into PasswordHasher
otherwise.
I could also move it to Params
, but I was unsure if I should store the hash algorithm in the PHC string and I don't know how to store it in the PHC string anyway, where do I get an ID or name for every hash algorithm from?
Another idea would be to make PasswordHasher
always use SHA-256, or some other algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought: the key size is fixed to the size of a block_t
, which is the hash digest size. So instead of making Balloon
depend on a 'key
lifetime, you can just make the secret a GenericArray<u8, D::OutputSize>
or something
Sorry that made no sense. I mistook the password for the key. What precisely would be the purpose of a "pepper" in this scheme? Honest question because I've never seen it used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re the hasher: parameterizing by D
and using a PhantomData
is a pretty standard pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What precisely would be the purpose of a "pepper" in this scheme? Honest question because I've never seen it used.
I thought the reason it was used in Argon2 was because it is considered good practice. See OWASP for example.
re the hasher: parameterizing by
D
and using aPhantomData
is a pretty standard pattern
Yes, I'm familiar with it, I'm just uncertain because which hash algorithm is used is quiet important, considering Params
isn't storing it, it's being left out of the PHC string too. In any case, as noted in Zulip, we might wanna remove the whole PasswordHasher
anyway.
balloon/src/lib.rs
Outdated
algorithm: Ident::new("balloon"), | ||
version: Some(1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a standard or spec, it's "just" a research paper, it never mentions PHC strings. I just took those from the reference implementation.
Error::MemoryTooLittle => "memory cost is too small", | ||
Error::ThreadsTooFew => "not enough threads", | ||
Error::ThreadsTooMany => "too many threads", | ||
Error::TimeTooSmall => "time cost is too small", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add more descriptive messages here. MemoryTooLittle
could mention how much memory is actually needed. ThreadsTooMany
could mention that you need the parallel
crate feature to use threads. I don't know how to translate any of this to password_hash::Error
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added how much memory is needed in the documentation.
Maybe some baseline questions to guide the discussion of whether to put this in this repo:
|
So far we've been fairly inclusive of algorithms which have some standardization effort behind them. We don't exactly have a policy regarding who's allowed to describe standards. This algorithm does seem a bit nonstandard in that regard. |
Not as far as I know, and the research paper really leaves a lot to desire that is usually expected in a standard, which is to be expected.
I couldn't find anybody who does. The biggest motivation I found is that it is the only alternative mentioned by NIST to PBKDF2.
Almost half of the research paper compares it to Argon2. I'm no cryptographer, but from what I could make out the biggest advantages are:
To me the biggest disadvantages are:
|
If we can generate test vectors and demonstrate some sort of interoperability between implementations in multiple languages, I think we could potentially consider including it. Curious what @newpavlov thinks. |
I tested a couple, none of them are compatible with each other as far as I could determine:
From what I could gather there are some minor differences in the implementation that keep things from being interoperable. I currently started working on getting the prototype implementation into a Rust crate to test against: https://github.com/daxpedda/balloon-proto. Other implementations really followed the paper closely, I'm unsure what to do now, should I just try to reach compatibility with the prototype implementation, or should I try to follow the paper? |
Wow, that's not a great situation. I'm not sure what to do if literally no two implementations interoperate. Perhaps it would be good to open some issues on the respective repos and try to coordinate some form of interoperability. |
I don't mind following up on that, but all of these implementations are years old by now, the prototype implementation is officially unsupported now: henrycg/balloon#5 (comment). The only recently active implementation is the Rust implementation: https://github.com/stichtingorganism/balloon. |
Lack of some kind of standardization, at the very least in the form of test vectors is certainly not great... I agree that it may be worth to try to coordinate some form of interoperability first. I wonder if @henrycg can chime in and recommend whether we should treat the prototype code as a reference implementation. Also I wonder if @goldenMetteyya is open to sharing with us access to the |
Yes, I am happy to add some context here, even if it is a bit unsatisfying. For background: The Balloon algorithm never ended up being standardized in an RFC or similar standards document. The goal in the research paper was just to give a high-level description of the algorithm, without concern for the low-level details of encoding, endianness, ordering of arguments, etc. Our hope at the time of writing the paper was that someone would be interested enough in the algorithm to formalize it in an RFC or similar standard at the level of precision that one would need to write an interoperable implementation. Unfortunately, that never happened. While the prototype code gives one way to map the high-level algorithm into code, there are lots of other ways that are equally good, or perhaps even better. People who have used the algorithm in practice tend to make the low-level implementation decisions (about endianness, encoding, ordering of arguments, etc.) themselves. Since there is no standard implementation and since the algorithm is not widely used anyways, interoperability is not a concern for most users. I'm not sure if this answers your question exactly, but I hope that the background is helpful. |
So I see three options:
I'm leaning towards option 1. |
Option 1 seems the best to me, but ideally producing a set of test vectors and ensuring interop with at least one other implementation, preferably implemented in something other than Rust |
I believe I addressed all remaining issues except the compatibility one. I will open a PR or issue at https://github.com/nachonavarro/balloon-hashing, which is the implementation with the most recent activity. If everything goes well I can add it to the tests. |
I'm now using https://github.com/daxpedda/nachonavarro-balloon, which uses https://github.com/nachonavarro/balloon-hashing, to test against a Python implementation. Compatibility was provided here: nachonavarro/balloon-hashing#2. I added some test vectors that I generated. I believe I addressed all outstanding issues. |
Re-based and updated to digest and sha2 0.10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have several minor nits. I haven't looked deeply into implementation of the algorithm itself and I probably would've done some things a bit differently (e.g. I prefer free-standing functions, on top of which abstractions are built), but I think we can merge the crate after the nits will be fixed. Unless, of course, @tarcieri has something to add.
balloon/tests/balloon_m.rs
Outdated
]; | ||
|
||
#[test] | ||
fn test() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have it's worth to have a more descriptive test name, e.g. test_balloon_m
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named it test_vectors
, it already is in a module called balloon_m
, what do you think?
I could still do that, it sounds quiet attractive to me too. |
I hope I addressed all outstanding issues. Please feel free to ask for any kind of improvement, even if it's just a preference. |
Thank you! |
I am currently toying with a java impl, and i will try to make it compatible to this, thanks for the test vectors. there are some open issues I tried to raise upstream, but maybe we can resolve it here?
Still interested to look into this or has the argon fraction won? ,) |
IANAC but as I am generally interested in this algorithm here are my 2 cents:
The balloon-hash crate here already supports any hash with the
Also supported. Never heard of a usage label in a password hash.
Does this belong in a password hash?
Does that achieve anything?
I thought we are doing that already?
No clue what these are.
Honestly if Balloon is as good as the research says it is I think it's sad that it was forgotten like this. |
yes, But there is no reference impl and no (official?) test vectors, so it would be good if „community“ can agree on one alternate hash which is supported by all and does not have the blocksize problem of the AB16b paper (bw is worse than argon2).
if you use it for KDF it might be relevant. Not sure how safe the idea of using it as a KDF would be though. For password hashes a „site distinguisher“ would be a Pepper with lower secrecy requirements. It would prevent reusing „found hashes“ (known from the md5 correlation attacks (shucking)). Not sure how important that is for modern/salted hashes, but still it does not cost much. Anyway, I will implement a pepper interface I balloon compatible to your rust api - if I can find it ,)
the balloon paper suggests a hash or HMAC as a final step. It is just a small extension after the xor step (or to replace the xor step). If you use a real HMAC in this step it would introduce a good place for a „NIST compliant“ peppering step.
it would add confusion, but more seriously it looks like it was a moving target in the beginning and the xx paper identified a much larger delta as a possibility to gain some protection back (in their proposed xor alternative).
Yes the parallel phase uses it already, but I meant in the mixing phase as discussed by AB16b. (But i know it comes with its own problems as the xor-block states could be used as a optimized way of storage. So that would require some more advanced crypto knowledge to design it it, I am curious if there is some interest for follow up research)
currently the mix function retrieves a 64Bit pseudorandom integer and reduces it to the neighbor blockindex with integer module, this introduces some bias as the block numbers are not power of two. This hohle be fixed but would require a new algorithm version. I also wonder if 32bit for Block Index would be enough.
when hashing password and seed into the initial state with the HMAC iterations of PBKDF2 you have nearly no increase in complexity but an „approved“ function which would „firewall“ password from the relative uncommon and under-analysed balloon function.
It is a general trick to „forget“ some parts of the salt or randomize the iteration numbers. This forces the validator to try all combinations (for example 4 different 2 bit combinations of the salt). As a defender you normally can validate a correct password after half the attempts by chance, as a attacker you would have to try all 4 wrong attempts - effectively halving the attackers cracking advantage (with the cost of making the verification times a bit more unpredictable if not run in parallel)
we should see which hash, which algorithm modifications, maybe which bitstream method, pre/posthashinng and random-module is used. For example we can define another parameter would be good to name the pepper key for rotation, but that one most likely does not need standardization.
i also like its construction simplicity, but regarding „as good as research“, there is not much (and it’s really weird NIST recommended it even without any production ready implementwtion) did you check the alwen,blocki 16 paper? AB16b: https://eprint.iacr.org/2016/759 BTW if you or any reader/researcher also are on the balloon-hashing mailinglist, should we continue the discussion there? |
TIL there's a mailing list. I guess it's this one? https://mailman.stanford.edu/mailman/listinfo/balloon-hashing |
@ecki honestly most of this stuff goes way over my head, as I said, I am not a cryptographer. I am happy to discuss or adapt the implementation, or add or provide more test vectors if you have any specific suggestions. My only strong opinion on this whole matter is that if we really want to improve the current situation, some cryptographer(s) will have to get together and push for standardization at the IETF, like Argon2 or anything else really. |
Hi folks, I'm writing an Internet-Draft for Balloon so there's an actual specification and would like to ask for feedback on the current draft and input on the open Issues. Some ideas are breaking changes but would improve the algorithm. Another important aspect is recommending parameters, which requires proper benchmarking like what Steve Thomas has done. Any help would be greatly appreciated. I reached out to Henry Corrigan-Gibbs, but he's busy with teaching/research commitments. I tried emailing some people on the PHC panel, but nobody replied. A few people don't have a public email address, and I thought there was no point emailing you @tarcieri when leaving this message here. I haven't yet tried contacting the authors of the papers looking at Balloon, but I'm not sure how successful that will be either given they are a few years old now and the algorithm has been modified. I'm not expecting the Internet-Draft to become an RFC given Argon2's success, but the algorithm could still see some usage like XChaCha20. |
@samuel-lucas6 i would like an I-D, especially given NISTs premature recommendation of the algorithm. A incompatible variant would allow to fix some of the issues, so it’s a good starting point:
do you also plan to address the scheduling weakness found in the follow-up paper? (This night however require some academic involvement?) |
Not sure what you mean by pre-hashing here. I originally specified that the password should be UTF-8 encoded, but I noticed that the Argon2 and scrypt RFCs don't mention this so removed it.
See this issue.
This hasn't been addressed. The current draft is compatible with existing interoperable implementations.
This has been attempted minus the pepper part, but it's quite messy because of Balloon vs Balloon-M. I would prefer if there was just one algorithm, as mentioned here.
I haven't listed that down as I haven't tried to read all the third-party papers yet. That may not be feasible as the designers have moved on. I was planning on making minor design modifications at most as I'm not a cryptographer and no cryptographers seem to be that interested in Balloon. |
See #1.
There are a lot of small open questions, but the really important ones that need to be answered are:
ints_to_block
is supposed to be (Balloon implementation #232 (comment))to_int
is unclear (Balloon implementation #232 (comment))||
operator to describe modifying the salt for parallel execution, I don't know what this is (Balloon implementation #232 (comment))cnt
can overflow, the paper and the reference implementation say nothing about that (Balloon implementation #232 (comment))u32
values, the reference implementation usesi32
(I think), the paper says nothing about it (Balloon implementation #232 (comment))For reference: