Re: [RFC PATCH v2 1/2] rust: add synchronous message digest support

From: FUJITA Tomonori <fujita.tomonori@gmail.com>
To: benno.lossin@proton.me
Cc: fujita.tomonori@gmail.com, rust-for-linux@vger.kernel.org,
	gary@garyguo.net
Subject: Re: [RFC PATCH v2 1/2] rust: add synchronous message digest support
Date: Sun, 25 Jun 2023 20:55:07 +0900 (JST)
Message-ID: <20230625.205507.24200574349942230.ubuntu@gmail.com>
In-Reply-To: <0a9af5fa-4df2-11da-b3cb-0a6b1d27fdc2@proton.me>

Hi,

On Sun, 25 Jun 2023 10:08:29 +0000
Benno Lossin <benno.lossin@proton.me> wrote:

(snip)

>>>> +        let ptr =
>>>> +            unsafe { from_err_ptr(bindings::crypto_alloc_shash(name.as_char_ptr(), t, mask)) }?;
>>>> +        // INVARIANT: `ptr` is valid and non-null since `crypto_alloc_shash`
>>>> +        // returned a valid pointer which was null-checked.
>>>> +        Ok(Self(ptr))
>>>> +    }
>>>> +
>>>> +    /// Sets optional key used by the hashing algorithm.
>>>> +    pub fn setkey(&mut self, data: &[u8]) -> Result {
>>>
>>> This should be called `set_key`.
>> 
>> I thought that using C function names is a recommended way because
>> it's easier for subsystem maintainers to review.
> 
> IMO having a `_` that separates words helps a lot with readability. 
> Especially with `digestsize`. I also think that adding an `_` will not 
> confuse the subsystem maintainers, so we should just do it.

Looks like `digestsize` is more common in the tree, so let's wait for
review from the crypto maintainers:

ubuntu@ip-172-30-47-114:~/git/linux$ grep -or digestsize crypto/|wc -l
112
ubuntu@ip-172-30-47-114:~/git/linux$ grep -or digest_size crypto/|wc -l
37

>>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>>> +        to_result(unsafe {
>>>> +            bindings::crypto_shash_setkey(self.0, data.as_ptr(), data.len() as u32)
>>>> +        })
>>>> +    }
>>>> +
>>>> +    /// Returns the size of the result of the transformation.
>>>> +    pub fn digestsize(&self) -> u32 {
>>>
>>> This should be called `digest_size`.
>> 
>> Ditto.
>> 
>>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>>> +        unsafe { bindings::crypto_shash_digestsize(self.0) }
>>>> +    }
>>>> +}
>>>> +
>>>> +/// Corresponds to the kernel's `struct shash_desc`.
>>>> +///
>>>> +/// # Invariants
>>>> +///
>>>> +/// The field `ptr` is valid.
>>>> +pub struct ShashDesc<'a> {
>>>> +    ptr: *mut bindings::shash_desc,
>>>> +    tfm: &'a Shash,
>>>> +    size: usize,
>>>> +}
>>>> +
>>>> +impl Drop for ShashDesc<'_> {
>>>> +    fn drop(&mut self) {
>>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>>> +        unsafe {
>>>> +            dealloc(
>>>> +                self.ptr.cast(),
>>>> +                Layout::from_size_align(self.size, 2).unwrap(),
>>>> +            );
>>>
>>> Why do we own the pointer (i.e. why can we deallocate the memory)? Add as
>>> a TI (type invariant). Why are you using `dealloc`? Is there no C
>>> function that allocates a `struct shash_desc`? Why is the alignment 2?
>> 
>> No C function that allocates `struct shash_desc`. kmalloc() family is
>> used in the C side (or stack is used).
>> 
>> IIUC, the alignment isn't used in the kernel but dealloc() still
>> requires, right? I'm not sure what number should be used here.
> 
> CC'ing Gary, since I am not familiar with `dealloc` in the kernel.
> I think the value of the alignment should still be correct if at some 
> point in the future `dealloc` starts to use it again.
> 
>> 
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +impl<'a> ShashDesc<'a> {
>>>> +    /// Creates a [`ShashDesc`] object for a request data structure for message digest.
>>>> +    pub fn new(tfm: &'a Shash) -> Result<Self> {
>>>> +        // SAFETY: The type invariant guarantees that `tfm.0` pointer is valid.
>>>> +        let size = core::mem::size_of::<bindings::shash_desc>()
>>>> +            + unsafe { bindings::crypto_shash_descsize(tfm.0) } as usize;
>>>> +        let layout = Layout::from_size_align(size, 2)?;
>>>> +        let ptr = unsafe { alloc(layout) } as *mut bindings::shash_desc;
>>>
>>> Several things:
>>> - The `SAFETY` comment for `crypto_shash_descsize` should be directly above
>>>    the `unsafe` block, maybe factor that out into its own variable.
>> 
>> Ok.
>> 
>>> - Why is 2 the right alignment?
>> 
>> As long as the size is larger than alignment, alignment argument is
>> meaningless. Like dealloc, not sure what should be used.
>> 
>> 
>>> - Missing `SAFETY` comment for `alloc`.
>> 
>> Will be fixed.
>> 
>>> - Why are you manually creating this layout from size and alignment? Is it
>>>    not possible to do it via the `Layout` API?
>> 
>> What function should be used?
> 
> Maybe `Layout::new()`, `Layout::extend` and `Layout::repeat` might be 
> enough?

new() needs a type, and extend() and repeat() need self; neither
applies here.

>>>> +        if ptr.is_null() {
>>>> +            return Err(ENOMEM);
>>>> +        }
>>>> +        // INVARIANT: `ptr` is valid and non-null since `alloc`
>>>> +        // returned a valid pointer which was null-checked.
>>>> +        let mut desc = ShashDesc { ptr, tfm, size };
>>>> +        // SAFETY: `desc.ptr` is valid and non-null since `alloc`
>>>> +        // returned a valid pointer which was null-checked.
>>>> +        // Additionally, The type invariant guarantees that `tfm.0` is valid.
>>>> +        unsafe { (*desc.ptr).tfm = desc.tfm.0 };
>>>> +        desc.reset()?;
>>>> +        Ok(desc)
>>>> +    }
>>>> +
>>>> +    /// Re-initializes message digest.
>>>> +    pub fn reset(&mut self) -> Result {
>>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>>> +        to_result(unsafe { bindings::crypto_shash_init(self.ptr) })
>>>> +    }
>>>> +
>>>> +    /// Adds data to message digest for processing.
>>>> +    pub fn update(&mut self, data: &[u8]) -> Result {
>>>> +        // SAFETY: The type invariant guarantees that the pointer is valid.
>>>> +        to_result(unsafe {
>>>> +            bindings::crypto_shash_update(self.ptr, data.as_ptr(), data.len() as u32)
>>>> +        })
>>>
>>> What if `data.len() > u32::MAX`?
>> 
>> The buffer might not be updated properly, I guess. Should check the case?
> 
> Not sure what we should do in that case, will bring it up at the next 
> team meeting. In Rust, `write` and `read` functions often output the 
> number of bytes that were actually read/written. So maybe we should also 
> do that here? Then you could just return `u32::MAX` and the user would 
> have to call again. We could also call the C side multiple times until 
> the entire buffer has been processed. But as the C side only supports 
> u32 anyway, I think it would be a rare occurrence for `data` to be large.

I'll change the code to return an error in this case. I prefer not to
extend C logic (like calling a C function multiple times) but if there
is an official policy for Rust bindings, I'll change the code to
follow the policy.
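
A minimal sketch of such a check, written as standalone userspace Rust
rather than kernel code. The `Error` enum is a hypothetical stand-in for
the kernel's error type, `checked_len` is a made-up helper name, and the
choice of EINVAL as the error code is an assumption, not something that
has been decided:

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the kernel's error type; only the one
/// variant needed for this sketch. The choice of EINVAL is an assumption.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    EINVAL,
}

/// Converts a slice length into the `u32` that the C side takes.
///
/// Unlike `len as u32`, which silently truncates, this fails when the
/// length does not fit in 32 bits.
pub fn checked_len(len: usize) -> Result<u32, Error> {
    u32::try_from(len).map_err(|_| Error::EINVAL)
}
```

In `update()` the result of such a helper would replace
`data.len() as u32`, with `?` propagating the error before the C
function is called.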

thanks,