Re: [RFC PATCH v2 1/2] rust: add synchronous message digest support

From: Benno Lossin <benno.lossin@proton.me>
To: FUJITA Tomonori <fujita.tomonori@gmail.com>
Cc: rust-for-linux@vger.kernel.org, Gary Guo <gary@garyguo.net>
Subject: Re: [RFC PATCH v2 1/2] rust: add synchronous message digest support
Date: Sun, 25 Jun 2023 10:08:29 +0000	[thread overview]
Message-ID: <0a9af5fa-4df2-11da-b3cb-0a6b1d27fdc2@proton.me> (raw)
In-Reply-To: <20230622.111419.241422502377572827.ubuntu@gmail.com>

On 6/22/23 04:14, FUJITA Tomonori wrote:
> Hi, thanks a lot for reviewing!
> 
> On Mon, 19 Jun 2023 11:40:40 +0000
> Benno Lossin <benno.lossin@proton.me> wrote:
> 
>>> diff --git a/rust/kernel/crypto/hash.rs b/rust/kernel/crypto/hash.rs
>>> new file mode 100644
>>> index 000000000000..53f4a311b3b2
>>> --- /dev/null
>>> +++ b/rust/kernel/crypto/hash.rs
>>> @@ -0,0 +1,118 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Cryptographic Hash operations.
>>> +//!
>>> +//! C headers: [`include/crypto/hash.h`](../../../../include/crypto/hash.h)
>>> +
>>> +use crate::{
>>> +    error::{code::ENOMEM, from_err_ptr, to_result, Result},
>>> +    str::CStr,
>>> +};
>>> +use alloc::alloc::{alloc, dealloc};
>>> +use core::alloc::Layout;
>>> +
>>> +/// Corresponds to the kernel's `struct crypto_shash`.
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// The pointer is valid.
>>> +pub struct Shash(*mut bindings::crypto_shash);
>>> +
>>> +impl Drop for Shash {
>>> +    fn drop(&mut self) {
>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>> +        unsafe { bindings::crypto_free_shash(self.0) }
>>> +    }
>>> +}
>>> +
>>> +impl Shash {
>>> +    /// Creates a [`Shash`] object for a message digest handle.
>>> +    pub fn new(name: &CStr, t: u32, mask: u32) -> Result<Shash> {
>>> +        // SAFETY: FFI call.
>>
>> If there are no requirements then say so.
> 
> Understood. Looks like we have variation. Might be better to add the
> standard to Documentation/rust/coding-guidelines.rst or such?
> 
> ubuntu@ip-172-30-47-114:~/git/linux$ grep -r FFI rust/kernel|grep SAFE
> rust/kernel/task.rs:        // SAFETY: Just an FFI call with no additional safety requirements.
> rust/kernel/lib.rs:    // SAFETY: FFI call.
> rust/kernel/sync/arc.rs:            // SAFETY: There are no safety requirements for this FFI call.
> rust/kernel/sync/arc.rs:            // SAFETY: There are no safety requirements for this FFI call.
> rust/kernel/error.rs:        // SAFETY: Just an FFI call, there are no extra safety requirements.
> rust/kernel/error.rs:///     // SAFETY: FFI call.
> rust/kernel/error.rs:    // SAFETY: The FFI function does not deref the pointer.
> rust/kernel/error.rs:        // SAFETY: The FFI function does not deref the pointer.

I am currently working on more standardized safety comments. Until we 
agree on how it should be, you can just choose a wording that you think 
is OK.

> 
>>> +        let ptr =
>>> +            unsafe { from_err_ptr(bindings::crypto_alloc_shash(name.as_char_ptr(), t, mask)) }?;
>>> +        // INVARIANT: `ptr` is valid and non-null since `crypto_alloc_shash`
>>> +        // returned a valid pointer which was null-checked.
>>> +        Ok(Self(ptr))
>>> +    }
>>> +
>>> +    /// Sets optional key used by the hashing algorithm.
>>> +    pub fn setkey(&mut self, data: &[u8]) -> Result {
>>
>> This should be called `set_key`.
> 
> I thought that using C function names is a recommended way because
> it's easier for subsystem maintainers to review.

IMO having a `_` that separates words helps a lot with readability. 
Especially with `digestsize`. I also think that adding an `_` will not 
confuse the subsystem maintainers, so we should just do it.

> 
> 
>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>> +        to_result(unsafe {
>>> +            bindings::crypto_shash_setkey(self.0, data.as_ptr(), data.len() as u32)
>>> +        })
>>> +    }
>>> +
>>> +    /// Returns the size of the result of the transformation.
>>> +    pub fn digestsize(&self) -> u32 {
>>
>> This should be called `digest_size`.
> 
> Ditto.
> 
>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>> +        unsafe { bindings::crypto_shash_digestsize(self.0) }
>>> +    }
>>> +}
>>> +
>>> +/// Corresponds to the kernel's `struct shash_desc`.
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// The field `ptr` is valid.
>>> +pub struct ShashDesc<'a> {
>>> +    ptr: *mut bindings::shash_desc,
>>> +    tfm: &'a Shash,
>>> +    size: usize,
>>> +}
>>> +
>>> +impl Drop for ShashDesc<'_> {
>>> +    fn drop(&mut self) {
>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>> +        unsafe {
>>> +            dealloc(
>>> +                self.ptr.cast(),
>>> +                Layout::from_size_align(self.size, 2).unwrap(),
>>> +            );
>>
>> Why do we own the pointer (i.e. why can we deallocate the memory)? Add as
>> a TI (type invariant). Why are you using `dealloc`? Is there no C
>> function that allocates a `struct shash_desc`? Why is the alignment 2?
> 
> No C function that allocates `struct shash_desc`. kmalloc() family is
> used in the C side (or stack is used).
> 
> IIUC, the alignment isn't used in the kernel but dealloc() still
> requires, right? I'm not sure what number should be used here.

CC'ing Gary, since I am not familiar with `dealloc` in the kernel.
I think the value of the alignment should still be correct if at some 
point in the future `dealloc` starts to use it again.

> 
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +impl<'a> ShashDesc<'a> {
>>> +    /// Creates a [`ShashDesc`] object for a request data structure for message digest.
>>> +    pub fn new(tfm: &'a Shash) -> Result<Self> {
>>> +        // SAFETY: The type invariant guarantees that `tfm.0` pointer is valid.
>>> +        let size = core::mem::size_of::<bindings::shash_desc>()
>>> +            + unsafe { bindings::crypto_shash_descsize(tfm.0) } as usize;
>>> +        let layout = Layout::from_size_align(size, 2)?;
>>> +        let ptr = unsafe { alloc(layout) } as *mut bindings::shash_desc;
>>
>> Several things:
>> - The `SAFETY` comment for `crypto_shash_descsize` should be directly above
>>    the `unsafe` block,maybe factor that out into its own variable.
> 
> Ok.
> 
>> - Why is 2 the right alignment?
> 
> As long as the size is larger than alignment, alignment arugment is
> meaningless. Like dealloc, not sure what should be used.
> 
> 
>> - Missing `SAFETY` comment for `alloc`.
> 
> Will be fixed.
> 
>> - Why are you manually creating this layout from size and alignment? Is it
>>    not possible to do it via the `Layout` API?
> 
> What function should be used?

Maybe `Layout::new()`, `Layout::extend` and `Layout::repeat` might be 
enough?

> 
>>> +        if ptr.is_null() {
>>> +            return Err(ENOMEM);
>>> +        }
>>> +        // INVARIANT: `ptr` is valid and non-null since `alloc`
>>> +        // returned a valid pointer which was null-checked.
>>> +        let mut desc = ShashDesc { ptr, tfm, size };
>>> +        // SAFETY: `desc.ptr` is valid and non-null since `alloc`
>>> +        // returned a valid pointer which was null-checked.
>>> +        // Additionally, The type invariant guarantees that `tfm.0` is valid.
>>> +        unsafe { (*desc.ptr).tfm = desc.tfm.0 };
>>> +        desc.reset()?;
>>> +        Ok(desc)
>>> +    }
>>> +
>>> +    /// Re-initializes message digest.
>>> +    pub fn reset(&mut self) -> Result {
>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>> +        to_result(unsafe { bindings::crypto_shash_init(self.ptr) })
>>> +    }
>>> +
>>> +    /// Adds data to message digest for processing.
>>> +    pub fn update(&mut self, data: &[u8]) -> Result {
>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>> +        to_result(unsafe {
>>> +            bindings::crypto_shash_update(self.ptr, data.as_ptr(), data.len() as u32)
>>> +        })
>>
>> What if `data.len() > u32::MAX`?
> 
> The buffer might not be updated properly, I guess. Should check the case?

Not sure what we should do in that case, will bring it up at the next 
team meeting. In Rust, `write` and `read` functions often output the 
number of bytes that were actually read/written. So maybe we should also 
do that here? Then you could just return `u32::MAX` and the user would 
have to call again. We could also call the C side multiple times until 
the entire buffer has been processed. But as the C side only supports 
u32 anyway, I think it would be a rare occurrence for `data` to be large.

-- 
Cheers,
Benno

> 
>>> +    }
>>> +
>>> +    /// Calculates message digest.
>>> +    pub fn finalize(&mut self, output: &mut [u8]) -> Result {
>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>> +        to_result(unsafe { bindings::crypto_shash_final(self.ptr, output.as_mut_ptr()) })
>>
>> As already mentioned by Alex, this needs a size check.
> 
> Will be fixed.
> 
> 
> thanks,