Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe unsound in c2str_vec #1776

Open
lwz23 opened this issue Dec 9, 2024 · 13 comments · May be fixed by #1781
Open

Maybe unsound in c2str_vec #1776

lwz23 opened this issue Dec 9, 2024 · 13 comments · May be fixed by #1781
Assignees
Labels
bug Something isn't working runtime Issues or PRs related to kcl runtime including value and value opertions

Comments

@lwz23
Copy link

lwz23 commented Dec 9, 2024

hello, thank you for your contribution in this project, I am scanning the unsoundness problem in rust project.
I notice the following code:

pub fn c2str_vec(ptr_array: *const *const c_char) -> Vec<String> {
    let mut result = Vec::new();
    let mut index = 0;

    unsafe {
        loop {
            let current_ptr = *ptr_array.offset(index);
            if current_ptr.is_null() {
                break;
            }
            let c_str = std::ffi::CStr::from_ptr(current_ptr);
            let rust_string = c_str.to_string_lossy().to_string();
            result.push(rust_string);
            index += 1;
        }
    }

    result
}

Considering pub mod api and this is a pub function, I assume user can directly call to this function, if it's this case , I think there may exist a unsound problem in this code, eg. maybe ptr_array is null pointer? It will lead to UB. I suggest mark this function as unsafe or add additional check to varify the pointer. I chose to report this issue for security reasons, but please don't mind if the function is not intended for external use and should be marked as pub(crate), or if this is an error report and there is actually no unsound problem.

@Peefy Peefy added the bug Something isn't working label Dec 9, 2024
@Peefy Peefy added this to the v0.11.0 Release milestone Dec 9, 2024
@Peefy Peefy added the runtime Issues or PRs related to kcl runtime including value and value opertions label Dec 9, 2024
@He1pa
Copy link
Contributor

He1pa commented Dec 10, 2024

Thanks for your suggestion, I will review this code

@He1pa
Copy link
Contributor

He1pa commented Dec 10, 2024

As far as I know, derefer a null pointer in Rust will cause panic, not ub. But I think it is reasonable to change to pub(crate). If you have a better suggestion, thank you for any kind contribution.

@lwz23
Copy link
Author

lwz23 commented Dec 10, 2024

Thank you for your response! I appreciate your insights. However, I would like to clarify that dereferencing a null pointer in Rust does not cause a unwinding panic but instead results in an abort, this is a long story to tell the difference........ But in short, as this action is considered undefined behavior (UB) according to Rust's safety guarantees. This is because null pointer dereferencing violates Rust's memory safety rules, and the compiler does not insert runtime checks for such cases. I hope this explanation helps, and I look forward to contributing further!

@lwz23
Copy link
Author

lwz23 commented Dec 10, 2024

By the way, if you can confirm that all calls to c2str_vec are internal and trustworthy, such as ensuring that no null pointers are involved, then marking c2str_vec as pub(crate) would be sufficient. However, if the reliability of ptr's source cannot be guaranteed, I would suggest the following:

  1. Mark the function as unsafe and add documentation to clarify that the caller is responsible for ensuring safety.
    or
  2. Add internal checks to verify that ptr is valid.

I hope this suggestion is helpful!

@He1pa
Copy link
Contributor

He1pa commented Dec 10, 2024

Thank you for your response! I appreciate your insights. However, I would like to clarify that dereferencing a null pointer in Rust does not cause a unwinding panic but instead results in an abort, this is a long story to tell the difference........ But in short, as this action is considered undefined behavior (UB) according to Rust's safety guarantees. This is because null pointer dereferencing violates Rust's memory safety rules, and the compiler does not insert runtime checks for such cases. I hope this explanation helps, and I look forward to contributing further!

Thanks for your advice. I did a simple test.

fn test_c2str_vec_null_ptr() {
    let ptr_array: *const *const c_char = std::ptr::null();

    let result = c2str_vec(ptr_array);

    assert!(result.is_empty(), "The result should be an empty array");
}

and i got
process didn't exit successfully: kcl/kclvm/target/debug/deps/kclvm_runtime-338e7c8890527cc1 'api::utils::test_c2str_vec_null_ptr' --exact --show-output (signal: 11, SIGSEGV: invalid memory reference)

I'm not very familiar with this, maybe I should learn more about this :)
But now I have no idea how to check if ptr is valid. Do you have some examples, or can you just submit a PR to modify them directly?

@lwz23
Copy link
Author

lwz23 commented Dec 10, 2024

@He1pa Thank you for your trust! However, since I am completely unfamiliar with this project, I am concerned that submitting a PR might encounter various compilation issues or even negatively affect the functionality of the project. Therefore, I would like to propose two possible solutions, and you can choose the one that you find more suitable for addressing this issue.

Solution 1: Mark the Function as unsafe and Add Documentation
By marking the function as unsafe, it explicitly communicates to the caller that they are responsible for ensuring the validity of the input pointer when invoking this function. This approach is suitable for performance-critical scenarios where the caller can be trusted to ensure safety.

/// Converts a null-terminated array of C strings to a `Vec<String>`.
///
/// # Safety
/// - `ptr_array` must point to a valid, null-terminated array of C string pointers.
/// - Each C string pointer in the array must be valid and null-terminated.
pub unsafe fn c2str_vec(ptr_array: *const *const c_char) -> Vec<String> {
    let mut result = Vec::new();
    let mut index = 0;

    loop {
        let current_ptr = *ptr_array.offset(index);
        if current_ptr.is_null() {
            break;
        }
        let c_str = std::ffi::CStr::from_ptr(current_ptr);
        let rust_string = c_str.to_string_lossy().to_string();
        result.push(rust_string);
        index += 1;
    }

    result
}

Solution 2: Add Internal Checks to Ensure Pointer Validity
This approach validates the legality of the pointer and ensures safety internally, reducing the risk of misuse by the caller. It is suitable for scenarios prioritizing safety, though it may introduce some performance overhead.

pub fn c2str_vec(ptr_array: *const *const c_char) -> Vec<String> {
    if ptr_array.is_null() {
        panic!("ptr_array is null");
    }

    let mut result = Vec::new();
    let mut index = 0;

    unsafe {
        loop {
            let current_ptr = *ptr_array.offset(index);
            if current_ptr.is_null() {
                break;
            }

            // Validate the current pointer before accessing it
            if current_ptr.is_null() {
                panic!("Encountered a null pointer in ptr_array at index {}", index);
            }

            let c_str = std::ffi::CStr::from_ptr(current_ptr);
            let rust_string = c_str.to_string_lossy().to_string();
            result.push(rust_string);
            index += 1;
        }
    }

    result
}

Comparison
Solution 1 has lower performance overhead and is straightforward, making it suitable for performance-critical scenarios. However, it might involve significant code changes, as all calls to c2str_vec would need to be wrapped in unsafe blocks.

Solution 2 is safer, as the caller does not need to ensure the validity of the input pointer, and it provides explicit error messages (e.g., through panic) when encountering issues like null pointers. However, it comes at the cost of additional runtime checks.

You can decide which solution to adopt based on the specific use cases of this function :)

@lwz23
Copy link
Author

lwz23 commented Dec 10, 2024

If you think option 2 is more appropriate, then I'm happy to submit a PR because it doesn't involve refactoring the code elsewhere. Haha.

@He1pa
Copy link
Contributor

He1pa commented Dec 10, 2024

cc @zong-zhe what do you think about it? I prefer option 2

@zong-zhe
Copy link
Contributor

Hi @lwz23 @He1pa 😄

Oh, this is really cool! The second option is better.

And we are actively seeking the community's support to help us build the KCL community together. If you, @lwz23, as a tech expert, are interested, you are more than welcome to submit a PR to help us add this part. We welcome you to join us as a contributor to KCL. Furthermore, we could invite you to become a community maintainer if opportunities arise.

Thanks again for the feedback you provided for KCL; it has immensely benefited us. 👍👍👍

@lwz23
Copy link
Author

lwz23 commented Dec 11, 2024

Thanks for your trust! I will submit a PR to fix this issue. And I am happy to contribue to KCL in the future.

@lwz23
Copy link
Author

lwz23 commented Dec 11, 2024

In the same rs file I found the following code:

/// Copy str to mutable pointer with length
pub(crate) fn copy_str_to(v: &str, p: *mut c_char, size: *mut kclvm_size_t) {
    unsafe {
        let c_str_ptr = v.as_ptr() as *const c_char;
        let c_str_len = v.len() as i32;
        if c_str_len <= *size {
            std::ptr::copy(c_str_ptr, p, c_str_len as usize);
            *size = c_str_len
        }
    }
}

/// Convert a C str pointer to a Rust &str.
/// Safety: The caller must ensure that `s` is a valid null-terminated C string.
pub fn c2str<'a>(p: *const c_char) -> &'a str {
    let s = unsafe { std::ffi::CStr::from_ptr(p) }.to_str().unwrap();
    s
}

It seems to me that there is also the possibility of a null pointer. Especially this part is very strange to me:

/// Convert a C str pointer to a Rust &str.
/// Safety: The caller must ensure that `s` is a valid null-terminated C string.
pub fn c2str<'a>(p: *const c_char) -> &'a str {
    let s = unsafe { std::ffi::CStr::from_ptr(p) }.to_str().unwrap();
    s
}

In Rust, it is generally expected that if a /// Safety comment is explicitly provided, the function should either be marked as unsafe, or it should include internal checks to ensure safety, accompanied by a // Safety comment explaining the rationale. This particular section, however, seems to deviate from both conventions. It neither guarantees that the operation is entirely safe nor marks the function as unsafe. Perhaps I should also consider adding assert checks in these parts for clarity and safety? Do you think that's reasonable?

@lwz23 lwz23 linked a pull request Dec 11, 2024 that will close this issue
16 tasks
@lwz23
Copy link
Author

lwz23 commented Dec 11, 2024

I have submitted a PR to address this issue. If you find anything that could be improved or adjusted, please feel free to either modify it directly in the PR or let me know your suggestions.

@zong-zhe
Copy link
Contributor

cc @He1pa , we need to review at this PR together

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working runtime Issues or PRs related to kcl runtime including value and value opertions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants