Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hrp: add constructor from an arbitrary T: Display #196

Merged
merged 1 commit into from
Aug 15, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
117 changes: 117 additions & 0 deletions src/primitives/hrp.rs
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,80 @@ impl Hrp {
Ok(new)
}

/// Parses the human-readable part from an object which can be formatted.
///
/// The formatted form of the object is subject to all the same rules as [`Self::parse`].
/// This method is semantically equivalent to `Hrp::parse(&data.to_string())` but avoids
/// allocating an intermediate string.
pub fn parse_display<T: core::fmt::Display>(data: T) -> Result<Self, Error> {
use Error::*;

struct ByteFormatter {
arr: [u8; MAX_HRP_LEN],
index: usize,
error: Option<Error>,
}

impl core::fmt::Write for ByteFormatter {
fn write_str(&mut self, s: &str) -> fmt::Result {
let mut has_lower: bool = false;
let mut has_upper: bool = false;
for ch in s.chars() {
let b = ch as u8; // cast ok, `b` unused until `ch` is checked to be ASCII

// Break after finding an error so that we report the first invalid
// character, not the last.
if !ch.is_ascii() {
self.error = Some(Error::NonAsciiChar(ch));
break;
} else if !(33..=126).contains(&b) {
self.error = Some(InvalidAsciiByte(b));
break;
}

if ch.is_ascii_lowercase() {
if has_upper {
self.error = Some(MixedCase);
break;
}
has_lower = true;
} else if ch.is_ascii_uppercase() {
if has_lower {
self.error = Some(MixedCase);
break;
}
has_upper = true;
};
}

// However, an invalid length error will take priority over an
// invalid character error.
if self.index + s.len() > self.arr.len() {
self.error = Some(Error::TooLong(self.index + s.len()));
} else {
// Only do the actual copy if we passed the index check.
self.arr[self.index..self.index + s.len()].copy_from_slice(s.as_bytes());
}

// Unconditionally update self.index so that in the case of a too-long
// string, our error return will reflect the full length.
Comment on lines +177 to +178
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This comment is stale now I believe.

Copy link
Member

@tcharding tcharding Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could just check the length at the start and early return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this is stale. The line is called every time the write_str function is called, for the reason that the comment describes.

We could do a length check at the start and early return but I don't see what that would accomplish other than increasing the cyclotomic complexity of the function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooo, I see. Don't mind me.

self.index += s.len();
Ok(())
}
}

let mut byte_formatter = ByteFormatter { arr: [0; MAX_HRP_LEN], index: 0, error: None };

write!(byte_formatter, "{}", data).expect("custom Formatter cannot fail");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this macro call have to allocate a String because data is not know until runtime?

My understanding is that write! calls format_args! which creates an Arguments type which has an as_str method but only if there are no runtime args.

In other words isn't this the same performance as:

    pub fn parse_display<T: core::fmt::Display>(data: T) -> Result<Self, Error> {
        let s = data.to_string();
        Self::parse(&s)
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code works without any allocator. It's not allocating a string. It would also be insane for the fmt::Write API to work this way, where every single call to write_str involved a gratuitous allocation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or indeed, for every call to write! anywhere to involve gratuitous allocations.

If you think this could be true we'd need to change our fmt::Display implementations throughout every one of our crates to avoid this macro.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a little while to trace through the code but the core of write! is the write function which iterates over the "pieces" of an Arguments and does not allocate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, TIL. Thanks for the link (I was unconvinced before it, you saved me trawling through the source :)

if byte_formatter.index == 0 {
Err(Empty)
} else if let Some(err) = byte_formatter.error {
Err(err)
} else {
Ok(Self { buf: byte_formatter.arr, size: byte_formatter.index })
}
}

/// Parses the human-readable part (see [`Hrp::parse`] for full docs).
///
/// Does not check that `hrp` is valid according to BIP-173 but does check for valid ASCII
Expand Down Expand Up @@ -424,6 +498,7 @@ mod tests {
#[test]
fn $test_name() {
assert!(Hrp::parse($hrp).is_ok());
assert!(Hrp::parse_display($hrp).is_ok());
}
)*
}
Expand All @@ -445,6 +520,7 @@ mod tests {
#[test]
fn $test_name() {
assert!(Hrp::parse($hrp).is_err());
assert!(Hrp::parse_display($hrp).is_err());
}
)*
}
Expand Down Expand Up @@ -538,4 +614,45 @@ mod tests {
let hrp = Hrp::parse_unchecked(s);
assert_eq!(hrp.as_bytes(), s.as_bytes());
}

#[test]
fn parse_display() {
let hrp = Hrp::parse_display(format_args!("{}_{}", 123, "abc")).unwrap();
assert_eq!(hrp.as_str(), "123_abc");

let hrp = Hrp::parse_display(format_args!("{:083}", 1)).unwrap();
assert_eq!(
hrp.as_str(),
"00000000000000000000000000000000000000000000000000000000000000000000000000000000001"
);

assert_eq!(Hrp::parse_display(format_args!("{:084}", 1)), Err(Error::TooLong(84)),);

assert_eq!(
Hrp::parse_display(format_args!("{:83}", 1)),
Err(Error::InvalidAsciiByte(b' ')),
);
}

#[test]
fn parse_non_ascii() {
assert_eq!(Hrp::parse("❤").unwrap_err(), Error::NonAsciiChar('❤'));
}

#[test]
fn parse_display_non_ascii() {
assert_eq!(Hrp::parse_display("❤").unwrap_err(), Error::NonAsciiChar('❤'));
}

#[test]
fn parse_display_returns_first_error() {
assert_eq!(Hrp::parse_display("❤ ").unwrap_err(), Error::NonAsciiChar('❤'));
}

// This test shows that the error does not contain heart.
#[test]
fn parse_display_iterates_chars() {
assert_eq!(Hrp::parse_display(" ❤").unwrap_err(), Error::InvalidAsciiByte(b' '));
assert_eq!(Hrp::parse_display("_❤").unwrap_err(), Error::NonAsciiChar('❤'));
}
}
Loading