rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
@ 2024-02-14 17:24 Danilo Krummrich
  2024-02-14 19:27 ` Boqun Feng
  2024-02-16 16:53 ` Alice Ryhl
  0 siblings, 2 replies; 10+ messages in thread
From: Danilo Krummrich @ 2024-02-14 17:24 UTC (permalink / raw)
  To: ojeda, alex.gaynor, wedsonaf, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl
  Cc: rust-for-linux, linux-kernel, Danilo Krummrich

Add functions to convert a CString to upper- / lowercase, either
in-place or by creating a copy of the original CString.

Naming followes the one from the Rust stdlib, where functions starting
with 'to' create a copy and functions starting with 'make' perform an
in-place conversion.

This is required by the Nova project (GSP only Rust successor of
Nouveau) to convert stringified enum values (representing different GPU
chipsets) to strings in order to generate the corresponding firmware
paths. See also [1].

[1] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/String.20manipulation.20in.20kernel.20Rust

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
Changes in V3:
  - add an `impl DerefMut for CString`, such that these functions can be defined
    for `CStr` as `&mut self` and still be called on a `CString`
Changes in V2:
  - expand commit message mentioning the use case
  - expand function doc comments to match the ones from Rust's stdlib
  - rename to_* to make_* and add the actual to_* implementations
---
 rust/kernel/str.rs | 81 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index 7d848b83add4..02d6e510b852 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -5,7 +5,7 @@
 use alloc::alloc::AllocError;
 use alloc::vec::Vec;
 use core::fmt::{self, Write};
-use core::ops::{self, Deref, Index};
+use core::ops::{self, Deref, DerefMut, Index};
 
 use crate::{
     bindings,
@@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
         unsafe { core::mem::transmute(bytes) }
     }
 
+    /// Creates a mutable [`CStr`] from a `[u8]` without performing any
+    /// additional checks.
+    ///
+    /// # Safety
+    ///
+    /// `bytes` *must* end with a `NUL` byte, and should only have a single
+    /// `NUL` byte (or the string will be truncated).
+    #[inline]
+    pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
+        // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
+        unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
+    }
+
     /// Returns a C pointer to the string.
     #[inline]
     pub const fn as_char_ptr(&self) -> *const core::ffi::c_char {
@@ -206,6 +219,32 @@ pub unsafe fn as_str_unchecked(&self) -> &str {
     pub fn to_cstring(&self) -> Result<CString, AllocError> {
         CString::try_from(self)
     }
+
+    /// Converts this [`CStr`] to its ASCII lower case equivalent in-place.
+    ///
+    /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
+    /// but non-ASCII letters are unchanged.
+    ///
+    /// To return a new lowercased value without modifying the existing one, use
+    /// [`to_ascii_lowercase()`].
+    ///
+    /// [`to_ascii_lowercase()`]: #method.to_ascii_lowercase
+    pub fn make_ascii_lowercase(&mut self) {
+        self.0.make_ascii_lowercase();
+    }
+
+    /// Converts this [`CStr`] to its ASCII upper case equivalent in-place.
+    ///
+    /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
+    /// but non-ASCII letters are unchanged.
+    ///
+    /// To return a new uppercased value without modifying the existing one, use
+    /// [`to_ascii_uppercase()`].
+    ///
+    /// [`to_ascii_uppercase()`]: #method.to_ascii_uppercase
+    pub fn make_ascii_uppercase(&mut self) {
+        self.0.make_ascii_uppercase();
+    }
 }
 
 impl fmt::Display for CStr {
@@ -581,6 +620,40 @@ pub fn try_from_fmt(args: fmt::Arguments<'_>) -> Result<Self, Error> {
         // exist in the buffer.
         Ok(Self { buf })
     }
+
+    /// Returns a copy of this [`CString`] where each character is mapped to its
+    /// ASCII lower case equivalent.
+    ///
+    /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
+    /// but non-ASCII letters are unchanged.
+    ///
+    /// To lowercase the value in-place, use [`make_ascii_lowercase`].
+    ///
+    /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
+    pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
+        let mut s = (*self).to_cstring()?;
+
+        s.make_ascii_lowercase();
+
+        return Ok(s);
+    }
+
+    /// Returns a copy of this [`CString`] where each character is mapped to its
+    /// ASCII upper case equivalent.
+    ///
+    /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
+    /// but non-ASCII letters are unchanged.
+    ///
+    /// To uppercase the value in-place, use [`make_ascii_uppercase`].
+    ///
+    /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
+    pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
+        let mut s = (*self).to_cstring()?;
+
+        s.make_ascii_uppercase();
+
+        return Ok(s);
+    }
 }
 
 impl Deref for CString {
@@ -593,6 +666,12 @@ fn deref(&self) -> &Self::Target {
     }
 }
 
+impl DerefMut for CString {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
+    }
+}
+
 impl<'a> TryFrom<&'a CStr> for CString {
     type Error = AllocError;
 

base-commit: 7e90b5c295ec1e47c8ad865429f046970c549a66
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-14 17:24 [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString Danilo Krummrich
@ 2024-02-14 19:27 ` Boqun Feng
  2024-02-14 19:59   ` Alice Ryhl
  2024-02-16 16:53 ` Alice Ryhl
  1 sibling, 1 reply; 10+ messages in thread
From: Boqun Feng @ 2024-02-14 19:27 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: ojeda, alex.gaynor, wedsonaf, gary, bjorn3_gh, benno.lossin,
	a.hindborg, aliceryhl, rust-for-linux, linux-kernel

On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> Add functions to convert a CString to upper- / lowercase, either
> in-place or by creating a copy of the original CString.
> 
> Naming followes the one from the Rust stdlib, where functions starting
> with 'to' create a copy and functions starting with 'make' perform an
> in-place conversion.
> 
> This is required by the Nova project (GSP only Rust successor of
> Nouveau) to convert stringified enum values (representing different GPU
> chipsets) to strings in order to generate the corresponding firmware
> paths. See also [1].
> 
> [1] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/String.20manipulation.20in.20kernel.20Rust
> 
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
> Changes in V3:
>   - add an `impl DerefMut for CString`, such that these functions can be defined
>     for `CStr` as `&mut self` and still be called on a `CString`
> Changes in V2:
>   - expand commit message mentioning the use case
>   - expand function doc comments to match the ones from Rust's stdlib
>   - rename to_* to make_* and add the actual to_* implementations
> ---
>  rust/kernel/str.rs | 81 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
> index 7d848b83add4..02d6e510b852 100644
> --- a/rust/kernel/str.rs
> +++ b/rust/kernel/str.rs
> @@ -5,7 +5,7 @@
>  use alloc::alloc::AllocError;
>  use alloc::vec::Vec;
>  use core::fmt::{self, Write};
> -use core::ops::{self, Deref, Index};
> +use core::ops::{self, Deref, DerefMut, Index};
>  
>  use crate::{
>      bindings,
> @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
>          unsafe { core::mem::transmute(bytes) }
>      }
>  
> +    /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> +    /// additional checks.
> +    ///
> +    /// # Safety
> +    ///
> +    /// `bytes` *must* end with a `NUL` byte, and should only have a single
> +    /// `NUL` byte (or the string will be truncated).
> +    #[inline]
> +    pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> +        // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> +        unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }

First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
I think the dereference (or reborrow) is only safe if `CStr` is
`#[repr(transparent)]. I.e.

	#[repr(transparent)]
	pub struct CStr([u8]);

with that you can implement the function as (you can still use `cast()`
implementation, but I sometimes find `transmute` is more simple).

    pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
	// SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
	// safe to do, and per the function safety requirement, `bytes`
	// is a valid `CStr`.
	unsafe { core::mem::transmute(bytes) }
    }

but this is just my thought, better wait for others' feedback as well.

Regards,
Boqun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-14 19:27 ` Boqun Feng
@ 2024-02-14 19:59   ` Alice Ryhl
  2024-02-15  1:18     ` Boqun Feng
  0 siblings, 1 reply; 10+ messages in thread
From: Alice Ryhl @ 2024-02-14 19:59 UTC (permalink / raw)
  To: Boqun Feng, Danilo Krummrich
  Cc: ojeda, alex.gaynor, wedsonaf, gary, bjorn3_gh, benno.lossin,
	a.hindborg, aliceryhl, rust-for-linux, linux-kernel

On 2/14/24 20:27, Boqun Feng wrote:
> On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
>> Add functions to convert a CString to upper- / lowercase, either
>> in-place or by creating a copy of the original CString.
>>
>> Naming followes the one from the Rust stdlib, where functions starting
>> with 'to' create a copy and functions starting with 'make' perform an
>> in-place conversion.
>>
>> This is required by the Nova project (GSP only Rust successor of
>> Nouveau) to convert stringified enum values (representing different GPU
>> chipsets) to strings in order to generate the corresponding firmware
>> paths. See also [1].
>>
>> [1] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/String.20manipulation.20in.20kernel.20Rust
>>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>> Changes in V3:
>>    - add an `impl DerefMut for CString`, such that these functions can be defined
>>      for `CStr` as `&mut self` and still be called on a `CString`
>> Changes in V2:
>>    - expand commit message mentioning the use case
>>    - expand function doc comments to match the ones from Rust's stdlib
>>    - rename to_* to make_* and add the actual to_* implementations
>> ---
>>   rust/kernel/str.rs | 81 +++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 80 insertions(+), 1 deletion(-)
>>
>> diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
>> index 7d848b83add4..02d6e510b852 100644
>> --- a/rust/kernel/str.rs
>> +++ b/rust/kernel/str.rs
>> @@ -5,7 +5,7 @@
>>   use alloc::alloc::AllocError;
>>   use alloc::vec::Vec;
>>   use core::fmt::{self, Write};
>> -use core::ops::{self, Deref, Index};
>> +use core::ops::{self, Deref, DerefMut, Index};
>>   
>>   use crate::{
>>       bindings,
>> @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
>>           unsafe { core::mem::transmute(bytes) }
>>       }
>>   
>> +    /// Creates a mutable [`CStr`] from a `[u8]` without performing any
>> +    /// additional checks.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// `bytes` *must* end with a `NUL` byte, and should only have a single
>> +    /// `NUL` byte (or the string will be truncated).
>> +    #[inline]
>> +    pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
>> +        // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
>> +        unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> 
> First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> I think the dereference (or reborrow) is only safe if `CStr` is
> `#[repr(transparent)]. I.e.
> 
> 	#[repr(transparent)]
> 	pub struct CStr([u8]);
> 
> with that you can implement the function as (you can still use `cast()`
> implementation, but I sometimes find `transmute` is more simple).
> 
>      pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> 	// SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> 	// safe to do, and per the function safety requirement, `bytes`
> 	// is a valid `CStr`.
> 	unsafe { core::mem::transmute(bytes) }
>      }
> 
> but this is just my thought, better wait for others' feedback as well.

Transmuting references is generally frowned upon. It's better to use a 
pointer cast.

As for .cast() vs the `as` operator, I'm not sure you can use .cast() in 
this case since the pointers are unsized. So you might have to use `as` 
instead.

Alice

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-14 19:59   ` Alice Ryhl
@ 2024-02-15  1:18     ` Boqun Feng
  2024-02-15  9:38       ` Alice Ryhl
  0 siblings, 1 reply; 10+ messages in thread
From: Boqun Feng @ 2024-02-15  1:18 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Danilo Krummrich, ojeda, alex.gaynor, wedsonaf, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, rust-for-linux,
	linux-kernel

On Wed, Feb 14, 2024 at 08:59:06PM +0100, Alice Ryhl wrote:
> On 2/14/24 20:27, Boqun Feng wrote:
> > On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> > > Add functions to convert a CString to upper- / lowercase, either
> > > in-place or by creating a copy of the original CString.
> > > 
> > > Naming followes the one from the Rust stdlib, where functions starting
> > > with 'to' create a copy and functions starting with 'make' perform an
> > > in-place conversion.
> > > 
> > > This is required by the Nova project (GSP only Rust successor of
> > > Nouveau) to convert stringified enum values (representing different GPU
> > > chipsets) to strings in order to generate the corresponding firmware
> > > paths. See also [1].
> > > 
> > > [1] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/String.20manipulation.20in.20kernel.20Rust
> > > 
> > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > ---
> > > Changes in V3:
> > >    - add an `impl DerefMut for CString`, such that these functions can be defined
> > >      for `CStr` as `&mut self` and still be called on a `CString`
> > > Changes in V2:
> > >    - expand commit message mentioning the use case
> > >    - expand function doc comments to match the ones from Rust's stdlib
> > >    - rename to_* to make_* and add the actual to_* implementations
> > > ---
> > >   rust/kernel/str.rs | 81 +++++++++++++++++++++++++++++++++++++++++++++-
> > >   1 file changed, 80 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
> > > index 7d848b83add4..02d6e510b852 100644
> > > --- a/rust/kernel/str.rs
> > > +++ b/rust/kernel/str.rs
> > > @@ -5,7 +5,7 @@
> > >   use alloc::alloc::AllocError;
> > >   use alloc::vec::Vec;
> > >   use core::fmt::{self, Write};
> > > -use core::ops::{self, Deref, Index};
> > > +use core::ops::{self, Deref, DerefMut, Index};
> > >   use crate::{
> > >       bindings,
> > > @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> > >           unsafe { core::mem::transmute(bytes) }
> > >       }
> > > +    /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> > > +    /// additional checks.
> > > +    ///
> > > +    /// # Safety
> > > +    ///
> > > +    /// `bytes` *must* end with a `NUL` byte, and should only have a single
> > > +    /// `NUL` byte (or the string will be truncated).
> > > +    #[inline]
> > > +    pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > +        // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> > > +        unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> > 
> > First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> > I think the dereference (or reborrow) is only safe if `CStr` is
> > `#[repr(transparent)]. I.e.
> > 
> > 	#[repr(transparent)]
> > 	pub struct CStr([u8]);
> > 
> > with that you can implement the function as (you can still use `cast()`
> > implementation, but I sometimes find `transmute` is more simple).
> > 
> >      pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > 	// SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> > 	// safe to do, and per the function safety requirement, `bytes`
> > 	// is a valid `CStr`.
> > 	unsafe { core::mem::transmute(bytes) }
> >      }
> > 
> > but this is just my thought, better wait for others' feedback as well.
> 
> Transmuting references is generally frowned upon. It's better to use a
> pointer cast.
> 

Ok, but honestly, I don't think the pointer casting is better ;-) What
wants to be done here is simply converting a `&mut [u8]` to `&mut CStr`,
adding two levels of pointer casting is kinda noise. (Also
`from_bytes_with_nul` uses `transmute` as well).

> As for .cast() vs the `as` operator, I'm not sure you can use .cast() in
> this case since the pointers are unsized. So you might have to use `as`
> instead.
> 

You're right, that's a bit unfortunate..

Regards,
Boqun

> Alice

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-15  1:18     ` Boqun Feng
@ 2024-02-15  9:38       ` Alice Ryhl
  2024-02-15 16:51         ` Boqun Feng
  0 siblings, 1 reply; 10+ messages in thread
From: Alice Ryhl @ 2024-02-15  9:38 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Alice Ryhl, Danilo Krummrich, ojeda, alex.gaynor, wedsonaf, gary,
	bjorn3_gh, benno.lossin, a.hindborg, rust-for-linux,
	linux-kernel

On Thu, Feb 15, 2024 at 2:18 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Wed, Feb 14, 2024 at 08:59:06PM +0100, Alice Ryhl wrote:
> > On 2/14/24 20:27, Boqun Feng wrote:
> > > On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> > > > --- a/rust/kernel/str.rs
> > > > +++ b/rust/kernel/str.rs
> > > > @@ -5,7 +5,7 @@
> > > >   use alloc::alloc::AllocError;
> > > >   use alloc::vec::Vec;
> > > >   use core::fmt::{self, Write};
> > > > -use core::ops::{self, Deref, Index};
> > > > +use core::ops::{self, Deref, DerefMut, Index};
> > > >   use crate::{
> > > >       bindings,
> > > > @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> > > >           unsafe { core::mem::transmute(bytes) }
> > > >       }
> > > > +    /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> > > > +    /// additional checks.
> > > > +    ///
> > > > +    /// # Safety
> > > > +    ///
> > > > +    /// `bytes` *must* end with a `NUL` byte, and should only have a single
> > > > +    /// `NUL` byte (or the string will be truncated).
> > > > +    #[inline]
> > > > +    pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > +        // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> > > > +        unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> > >
> > > First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> > > I think the dereference (or reborrow) is only safe if `CStr` is
> > > `#[repr(transparent)]. I.e.
> > >
> > >     #[repr(transparent)]
> > >     pub struct CStr([u8]);
> > >
> > > with that you can implement the function as (you can still use `cast()`
> > > implementation, but I sometimes find `transmute` is more simple).
> > >
> > >      pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > >     // SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> > >     // safe to do, and per the function safety requirement, `bytes`
> > >     // is a valid `CStr`.
> > >     unsafe { core::mem::transmute(bytes) }
> > >      }
> > >
> > > but this is just my thought, better wait for others' feedback as well.
> >
> > Transmuting references is generally frowned upon. It's better to use a
> > pointer cast.
> >
>
> Ok, but honestly, I don't think the pointer casting is better ;-) What
> wants to be done here is simply converting a `&mut [u8]` to `&mut CStr`,
> adding two levels of pointer casting is kinda noise. (Also
> `from_bytes_with_nul` uses `transmute` as well).

Here's my logic for preferring pointer casts: Transmute raises
questions about the layout of fat pointers, whereas pointer casts are
obviously okay.

Alice

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-15  9:38       ` Alice Ryhl
@ 2024-02-15 16:51         ` Boqun Feng
  2024-02-16  9:09           ` Alice Ryhl
  0 siblings, 1 reply; 10+ messages in thread
From: Boqun Feng @ 2024-02-15 16:51 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Alice Ryhl, Danilo Krummrich, ojeda, alex.gaynor, wedsonaf, gary,
	bjorn3_gh, benno.lossin, a.hindborg, rust-for-linux,
	linux-kernel

On Thu, Feb 15, 2024 at 10:38:07AM +0100, Alice Ryhl wrote:
> On Thu, Feb 15, 2024 at 2:18 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Wed, Feb 14, 2024 at 08:59:06PM +0100, Alice Ryhl wrote:
> > > On 2/14/24 20:27, Boqun Feng wrote:
> > > > On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> > > > > --- a/rust/kernel/str.rs
> > > > > +++ b/rust/kernel/str.rs
> > > > > @@ -5,7 +5,7 @@
> > > > >   use alloc::alloc::AllocError;
> > > > >   use alloc::vec::Vec;
> > > > >   use core::fmt::{self, Write};
> > > > > -use core::ops::{self, Deref, Index};
> > > > > +use core::ops::{self, Deref, DerefMut, Index};
> > > > >   use crate::{
> > > > >       bindings,
> > > > > @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> > > > >           unsafe { core::mem::transmute(bytes) }
> > > > >       }
> > > > > +    /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> > > > > +    /// additional checks.
> > > > > +    ///
> > > > > +    /// # Safety
> > > > > +    ///
> > > > > +    /// `bytes` *must* end with a `NUL` byte, and should only have a single
> > > > > +    /// `NUL` byte (or the string will be truncated).
> > > > > +    #[inline]
> > > > > +    pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > > +        // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> > > > > +        unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> > > >
> > > > First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> > > > I think the dereference (or reborrow) is only safe if `CStr` is
> > > > `#[repr(transparent)]. I.e.
> > > >
> > > >     #[repr(transparent)]
> > > >     pub struct CStr([u8]);
> > > >
> > > > with that you can implement the function as (you can still use `cast()`
> > > > implementation, but I sometimes find `transmute` is more simple).
> > > >
> > > >      pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > >     // SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> > > >     // safe to do, and per the function safety requirement, `bytes`
> > > >     // is a valid `CStr`.
> > > >     unsafe { core::mem::transmute(bytes) }
> > > >      }
> > > >
> > > > but this is just my thought, better wait for others' feedback as well.
> > >
> > > Transmuting references is generally frowned upon. It's better to use a
> > > pointer cast.
> > >
> >
> > Ok, but honestly, I don't think the pointer casting is better ;-) What
> > wants to be done here is simply converting a `&mut [u8]` to `&mut CStr`,
> > adding two levels of pointer casting is kinda noise. (Also
> > `from_bytes_with_nul` uses `transmute` as well).
> 
> Here's my logic for preferring pointer casts: Transmute raises
> questions about the layout of fat pointers, whereas pointer casts are
> obviously okay.
> 

But in this case, eventually you need to worry about fat pointer layout
when you dereference the `*mut CStr`, right? In other words, the
dereference is only safe if `*mut [u8]` has the same fat pointer layout
as `*mut CStr`. I prefer to transmute here because it's a newtype
paradigm, and transmute kinda makes that clear.

Regards,
Boqun

> Alice

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-15 16:51         ` Boqun Feng
@ 2024-02-16  9:09           ` Alice Ryhl
  0 siblings, 0 replies; 10+ messages in thread
From: Alice Ryhl @ 2024-02-16  9:09 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Alice Ryhl, Danilo Krummrich, ojeda, alex.gaynor, wedsonaf, gary,
	bjorn3_gh, benno.lossin, a.hindborg, rust-for-linux,
	linux-kernel

On Thu, Feb 15, 2024 at 5:51 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Thu, Feb 15, 2024 at 10:38:07AM +0100, Alice Ryhl wrote:
> > On Thu, Feb 15, 2024 at 2:18 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > >
> > > On Wed, Feb 14, 2024 at 08:59:06PM +0100, Alice Ryhl wrote:
> > > > On 2/14/24 20:27, Boqun Feng wrote:
> > > > > On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> > > > > > --- a/rust/kernel/str.rs
> > > > > > +++ b/rust/kernel/str.rs
> > > > > > @@ -5,7 +5,7 @@
> > > > > >   use alloc::alloc::AllocError;
> > > > > >   use alloc::vec::Vec;
> > > > > >   use core::fmt::{self, Write};
> > > > > > -use core::ops::{self, Deref, Index};
> > > > > > +use core::ops::{self, Deref, DerefMut, Index};
> > > > > >   use crate::{
> > > > > >       bindings,
> > > > > > @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> > > > > >           unsafe { core::mem::transmute(bytes) }
> > > > > >       }
> > > > > > +    /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> > > > > > +    /// additional checks.
> > > > > > +    ///
> > > > > > +    /// # Safety
> > > > > > +    ///
> > > > > > +    /// `bytes` *must* end with a `NUL` byte, and should only have a single
> > > > > > +    /// `NUL` byte (or the string will be truncated).
> > > > > > +    #[inline]
> > > > > > +    pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > > > +        // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> > > > > > +        unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> > > > >
> > > > > First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> > > > > I think the dereference (or reborrow) is only safe if `CStr` is
> > > > > `#[repr(transparent)]. I.e.
> > > > >
> > > > >     #[repr(transparent)]
> > > > >     pub struct CStr([u8]);
> > > > >
> > > > > with that you can implement the function as (you can still use `cast()`
> > > > > implementation, but I sometimes find `transmute` is more simple).
> > > > >
> > > > >      pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > >     // SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> > > > >     // safe to do, and per the function safety requirement, `bytes`
> > > > >     // is a valid `CStr`.
> > > > >     unsafe { core::mem::transmute(bytes) }
> > > > >      }
> > > > >
> > > > > but this is just my thought, better wait for others' feedback as well.
> > > >
> > > > Transmuting references is generally frowned upon. It's better to use a
> > > > pointer cast.
> > > >
> > >
> > > Ok, but honestly, I don't think the pointer casting is better ;-) What
> > > wants to be done here is simply converting a `&mut [u8]` to `&mut CStr`,
> > > adding two levels of pointer casting is kinda noise. (Also
> > > `from_bytes_with_nul` uses `transmute` as well).
> >
> > Here's my logic for preferring pointer casts: Transmute raises
> > questions about the layout of fat pointers, whereas pointer casts are
> > obviously okay.
> >
>
> But in this case, eventually you need to worry about fat pointer layout
> when you dereference the `*mut CStr`, right? In other words, the
> dereference is only safe if `*mut [u8]` has the same fat pointer layout
> as `*mut CStr`. I prefer to transmute here because it's a newtype
> paradigm, and transmute kinda makes that clear.

No, if the `*mut CStr` and `*mut [u8]` types disagree on whether the
data or vtable pointer is first in the layout, then an as cast should
swap them.

The question of whether their vtables (well I guess it's just a length
in this case) are compatible is separate.

Alice

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-14 17:24 [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString Danilo Krummrich
  2024-02-14 19:27 ` Boqun Feng
@ 2024-02-16 16:53 ` Alice Ryhl
  2024-02-16 17:11   ` Danilo Krummrich
  1 sibling, 1 reply; 10+ messages in thread
From: Alice Ryhl @ 2024-02-16 16:53 UTC (permalink / raw)
  To: dakr
  Cc: a.hindborg, alex.gaynor, aliceryhl, benno.lossin, bjorn3_gh,
	boqun.feng, gary, linux-kernel, ojeda, rust-for-linux, wedsonaf

> +    pub fn make_ascii_lowercase(&mut self) {
> +        self.0.make_ascii_lowercase();
> +    }

It's important to note here that this doesn't remove or introduce NUL
bytes.

pub fn make_ascii_lowercase(&mut self) {
    // INVARIANT: This doesn't introduce or remove NUL bytes in the c
    // string.
    self.0.make_ascii_lowercase();
}

Ditto for make_ascii_uppercase. (But not the to_* methods.)

> +    /// Returns a copy of this [`CString`] where each character is mapped to its
> +    /// ASCII lower case equivalent.
> +    ///
> +    /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
> +    /// but non-ASCII letters are unchanged.
> +    ///
> +    /// To lowercase the value in-place, use [`make_ascii_lowercase`].
> +    ///
> +    /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
> +    pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
> +        let mut s = (*self).to_cstring()?;
> +
> +        s.make_ascii_lowercase();
> +
> +        return Ok(s);
> +    }
> +
> +    /// Returns a copy of this [`CString`] where each character is mapped to its
> +    /// ASCII upper case equivalent.
> +    ///
> +    /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
> +    /// but non-ASCII letters are unchanged.
> +    ///
> +    /// To uppercase the value in-place, use [`make_ascii_uppercase`].
> +    ///
> +    /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
> +    pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
> +        let mut s = (*self).to_cstring()?;
> +
> +        s.make_ascii_uppercase();
> +
> +        return Ok(s);
> +    }

Please move these to `CStr` as well.

> +impl DerefMut for CString {
> +    fn deref_mut(&mut self) -> &mut Self::Target {
> +        unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
> +    }
> +}

Needs a safety comment.

impl DerefMut for CString {
    fn deref_mut(&mut self) -> &mut Self::Target {
        // SAFETY: A `CString` is always NUL-terminated and contains no
	// other NUL bytes.
        unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
    }
}

Alice

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-16 16:53 ` Alice Ryhl
@ 2024-02-16 17:11   ` Danilo Krummrich
  2024-02-16 19:25     ` Alice Ryhl
  0 siblings, 1 reply; 10+ messages in thread
From: Danilo Krummrich @ 2024-02-16 17:11 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: a.hindborg, alex.gaynor, benno.lossin, bjorn3_gh, boqun.feng,
	gary, linux-kernel, ojeda, rust-for-linux, wedsonaf

On 2/16/24 17:53, Alice Ryhl wrote:
>> +    pub fn make_ascii_lowercase(&mut self) {
>> +        self.0.make_ascii_lowercase();
>> +    }
> 
> It's important to note here that this doesn't remove or introduce NUL
> bytes.
> 
> pub fn make_ascii_lowercase(&mut self) {
>      // INVARIANT: This doesn't introduce or remove NUL bytes in the c
>      // string.
>      self.0.make_ascii_lowercase();
> }
> 
> Ditto for make_ascii_uppercase. (But not the to_* methods.)
> 
>> +    /// Returns a copy of this [`CString`] where each character is mapped to its
>> +    /// ASCII lower case equivalent.
>> +    ///
>> +    /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
>> +    /// but non-ASCII letters are unchanged.
>> +    ///
>> +    /// To lowercase the value in-place, use [`make_ascii_lowercase`].
>> +    ///
>> +    /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
>> +    pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
>> +        let mut s = (*self).to_cstring()?;
>> +
>> +        s.make_ascii_lowercase();
>> +
>> +        return Ok(s);
>> +    }
>> +
>> +    /// Returns a copy of this [`CString`] where each character is mapped to its
>> +    /// ASCII upper case equivalent.
>> +    ///
>> +    /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
>> +    /// but non-ASCII letters are unchanged.
>> +    ///
>> +    /// To uppercase the value in-place, use [`make_ascii_uppercase`].
>> +    ///
>> +    /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
>> +    pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
>> +        let mut s = (*self).to_cstring()?;
>> +
>> +        s.make_ascii_uppercase();
>> +
>> +        return Ok(s);
>> +    }
> 
> Please move these to `CStr` as well.

That would result into two copies if I actually want a CString, wouldn't it?

Also, what would be the use case? And even if someone wants to have a CStr
again, couldn't we just deref the resulting CString?

- Danilo

> 
>> +impl DerefMut for CString {
>> +    fn deref_mut(&mut self) -> &mut Self::Target {
>> +        unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
>> +    }
>> +}
> 
> Needs a safety comment.
> 
> impl DerefMut for CString {
>      fn deref_mut(&mut self) -> &mut Self::Target {
>          // SAFETY: A `CString` is always NUL-terminated and contains no
> 	// other NUL bytes.
>          unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
>      }
> }
> 
> Alice
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString
  2024-02-16 17:11   ` Danilo Krummrich
@ 2024-02-16 19:25     ` Alice Ryhl
  0 siblings, 0 replies; 10+ messages in thread
From: Alice Ryhl @ 2024-02-16 19:25 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl
  Cc: a.hindborg, alex.gaynor, benno.lossin, bjorn3_gh, boqun.feng,
	gary, linux-kernel, ojeda, rust-for-linux, wedsonaf

On 2/16/24 18:11, Danilo Krummrich wrote:
> On 2/16/24 17:53, Alice Ryhl wrote:
>>> +    /// Returns a copy of this [`CString`] where each character is 
>>> mapped to its
>>> +    /// ASCII upper case equivalent.
>>> +    ///
>>> +    /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
>>> +    /// but non-ASCII letters are unchanged.
>>> +    ///
>>> +    /// To uppercase the value in-place, use [`make_ascii_uppercase`].
>>> +    ///
>>> +    /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
>>> +    pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
>>> +        let mut s = (*self).to_cstring()?;
>>> +
>>> +        s.make_ascii_uppercase();
>>> +
>>> +        return Ok(s);
>>> +    }
>>
>> Please move these to `CStr` as well.
> 
> That would result into two copies if I actually want a CString, wouldn't 
> it?
> 
> Also, what would be the use case? And even if someone wants to have a CStr
> again, couldn't we just deref the resulting CString?

To clarify, I want you to move it to the `impl CStr` block. That changes 
the type of the `self` argument. I don't want you to change the return 
type - that should still be `CString`.

Currently, if I have a `&CStr` and I want an uppercase `CString`, I 
can't do that with this method.

Alice

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-02-16 19:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-14 17:24 [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString Danilo Krummrich
2024-02-14 19:27 ` Boqun Feng
2024-02-14 19:59   ` Alice Ryhl
2024-02-15  1:18     ` Boqun Feng
2024-02-15  9:38       ` Alice Ryhl
2024-02-15 16:51         ` Boqun Feng
2024-02-16  9:09           ` Alice Ryhl
2024-02-16 16:53 ` Alice Ryhl
2024-02-16 17:11   ` Danilo Krummrich
2024-02-16 19:25     ` Alice Ryhl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).