Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure correctness of formatted output conversion #169

Merged
merged 3 commits into from
Dec 2, 2024

Conversation

DrXiao
Copy link
Collaborator

@DrXiao DrXiao commented Dec 1, 2024

Consider the following code printing the several formatted integers:

void print_conversion(int num) {
    printf("%#032x\n%#32x\n%032x\n%32x\n%#x\n%x\n", num, num, num, num, num, num);
    printf("%#012x\n%#12x\n%012x\n%12x\n", num, num, num, num);
    printf("%#032o\n%#32o\n%032o\n%32o\n%#o\n%o\n", num, num, num, num, num, num);
    printf("%#012o\n%#12o\n%012o\n%12o\n", num, num, num, num);
    printf("%032d\n%32d\n%d\n", num, num, num);
    printf("%012d\n%12d\n", num, num);
}
int main() {
    int a = 0xFF00CDE1, b = 0xFFFFF204, c = 65548, d = 0;
    print_conversion(a);
    print_conversion(b);
    print_conversion(c);
    print_conversion(d);
    return 0;
}

I found some incorrect results for certain outputs when using specifier, field width and alternate form:

1. Incorrect conversion from integers to hexadecimal strings

Consider a 32-bit integer -16724511 (0xFF00CDE1 in hexadecimal), if using printf() with the formatted string "%#12x", it outputs ffffff00cde1, and the result has redundent f characters. However, because the integer is just 4 bytes, the result should be ff00cde1 containing 4 whitespace characters.

2. Incorrect conversion from integers to octal strings

As the previous point, consider the negative integer -3580 (0xFFFFF204 in hexadecimal), it produces 77777771004 when using "%o" to print this integer.

Because the function accepts a 32-bit integer, the most significant digit of the octal should be obtained by the leftmost 2 bits only, but the original implementation doesn't consider this point and produce incorrect results.

Thus, the correct result is 37777771004, which obtains 3 from the leftmost 2 bits of the integer.

3. Prepended characters may be placed in incorrect positions

When using field width and flag characters, it may output unexpected strings for certain cases.
printf("%#12x\n", 0xff00cde1) --> "0x ff00cde1\n"
printf("%#12o\n", 0x1000c) --> "0 200014\n"
printf("%12d\n", -3580) --> "0000000-3580\n"

Conclusion

Through the given test code, the current built-in C library has been found to have the above issues, so this pull request resolves them and ensures correctness when using formatted output conversion.

lib/c.c Outdated
* Finally, the remaining 2 bits are processed after
* the loop.
* */
for (int i = 0; i < (sizeof(int) << 3) / 3; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can lift the rhs expression of condition expression out of loop to squeeze last bit of performance?

tests/driver.sh Outdated
0"

try_output 0 "$fmt_ans" << EOF
void print_conversion(int num) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use printf_conversion for consistency.

The original implementation of __str_base16(), which converts
integers to hexadecimal strings, causes errors when processing
negative integers.

It produces redundant 'f' characters at the beginning of the
converted string due to sign extension and results in unexpected
output when calling printf() or sprintf().

Therefore, this commit fixes it and ensure correct conversion.
The implementation of __str_base8() produces incorrect
conversions when handling negative integers.

For example, a 32-bit negative integer '0xfffff204', it will
produce '7' at the most significant digit due to sign extension.
However, it should produce '3' by using the leftmost 2 bits only.

Therefore, this commit modifies the conversion process to fix
the mentioned problem.
The implementation of __format() produces incorrect results
when using field width and flag characters.

For example, consider the following printf calls, where the
prepended characters are placed in incorrect positions.

    printf("%#12x\n", 0xff00cde1) --> "0x  ff00cde1\n"
    printf("%#12o\n", 0x1000c)    --> "0     200014\n"
    printf("%12d\n", -3580)	  --> "0000000-3580\n"

Thus, this commit improves __format() to ensure correctness
and to make the results consistent with GCC, and adds a test
case to validate the changes.
@jserv jserv merged commit 18e005d into sysprog21:master Dec 2, 2024
4 checks passed
@jserv
Copy link
Collaborator

jserv commented Dec 2, 2024

Thank @DrXiao for contributing!

@DrXiao DrXiao deleted the fix-libc branch December 2, 2024 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants