rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] Rust block device driver API and null block driver
@ 2024-03-13 11:05 Andreas Hindborg
  2024-03-13 11:05 ` [RFC PATCH 1/5] rust: block: introduce `kernel::block::mq` module Andreas Hindborg
                   ` (6 more replies)
  0 siblings, 7 replies; 50+ messages in thread
From: Andreas Hindborg @ 2024-03-13 11:05 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Keith Busch, Damien Le Moal,
	Bart Van Assche, Hannes Reinecke, linux-block
  Cc: Andreas Hindborg, Niklas Cassel, Greg KH, Matthew Wilcox,
	Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
	Chaitanya Kulkarni, Luis Chamberlain, Yexuan Yang,
	Sergio González Collado, Joel Granados,
	Pankaj Raghav (Samsung),
	Daniel Gomez, open list, rust-for-linux, lsf-pc, gost.dev

From: Andreas Hindborg <a.hindborg@samsung.com>

Hi All!

This is the second version of the Rust block device driver API and the Rust null
block driver. The context and motivation can be seen in cover letter of the RFC
v1 [1]. If more context is required, a talk about this effort was recorded at
LPC [2]. I hope to be able to discuss this series at LSF this year [3].

The series is still in RFC state, as many dependencies have not yet been merged.

Changes from v1:
 - Improved request lifetime tracking
 - Changed rust null block driver name to `rnull`
 - Added timer completion support to `rnull`
 - Added softirq completion support to `rnull`
 - Moved to `xarray` for memory backing
 - Adopted the folio abstraction where applicable
 - Added `SAFETY` comments to all unsafe blocks
 - Improved documentation and added examples
 - Fixed up `vtable` default method behavior
 - Dropped `unsafe` from vtable builder
 - Fixed a safety issue in `RawWriter`
 - Dropped the `TagSetRef` abstraction
 - Renamed `Bio::iter` -> `Bio::raw_iter`
 - Updated `core::fmt::Display for Bio` for readability
 - Simplified `BioIterator::next`
 - Documented and refactored `bvec_iter` functions
 - Moved cache alignment out of `Lock`
 - Added MAINTAINER entry

Thanks for all the input on v1!


Performance
===========

Rather than include an abundance of performance numbers in this letter, I would
refer to [4] for a comparison of performance to the C null block driver. In
general across all of the benchmark configurations, the mean of the difference
is -2.5%, with individual configurations showing [-10%;30%] difference in IO/s.


Request lifetime modeling
=========================

While implementing timer completion, it became clear that the request lifetime
modeling applied in the v1 RFC was not adequate. With that modeling, combined
with the timer callback feature, a developer would be able to write safe Rust
code that violates the Rust invariants for references, thus leading to undefined
behavior. A similar situation would happen in the completion path when
developers write code to convert from a completion id to a `&Request`.

To make these situations safe, and thus free from undefined behavior, the
`Request` type now applies reference counting. The logic piggybacks on the
`atomic_t ref` field of the C `struct request`. In order to make this work, the
Rust code has to be able to free requests when the refcount reaches zero.
Therefore, the series exposes (EXPORT_SYMBOL_GPL) the internal blk-mq symbol
`__blk_mq_free_request`.

I am curious what the community thinks of this approach and whether it is OK to
expose this symbol so that Rust can call it.

One consequence of these changes is that requests that are processed by a Rust
block device driver, are freed at a different place in the completion path, than
requests processed by C drivers. Requests processed by C drivers are typically
freed and recycled inline during the call to `blk_mq_complete_request`. The
requests processed by a Rust driver will be recycled when the `RequestRef` is
dropped, which is usually at some time after the request has been completed.

There does not seem to be any statistically significant effect on performance
for the rust null block implementation due to this change.


Dependencies
============

This series is based on the branch `rnull-deps-v6.8` of [5]. This tree is based
on the upstream `v6.8` tag and has been prepared with dependencies for this
series:

- [17] rust: add `CacheAligned` for easy cache line alignment of values
- [16] rust: xarray: add mutable access through `Guard`
- [15] rust: hrtimer: introduce hrtimer support
- [14] rust: add `Opaque::try_ffi_init`
- [13] rust: make `ARef` repr transparent
- [12] rust: folio: add `with_slice_into_page()`
- [11] rust: page: add `from_raw()`
- [10] rust: page: add `with_slice_into_page()`
- [ 9] IM: implement `ForeignOwnable` for `Pin`
- [ 8] IM: add `module_params` macro
- [ 7] rust: xarray: fix example
- [ 6] LIST: rust: xarray: Add an abstraction for XArray
- [ 5] LIST: rust: types: add FOREIGN_ALIGN to ForeignOwnable
- [ 4] LIST: rust: lock: implement `IrqSaveBackend` for `SpinLock`
- [ 3] LIST: rust: lock: add support for `Lock::lock_irqsave`
- [ 2] LIST: rust: add improved version of `ForeignOwnable::borrow_mut`
- [ 1] LIST: rust: folio: introduce basic support for folios
- [ 0] LIST: rust: add abstraction for `struct page`

Dependencies 0-6 are patches that have appeared on the list but are not yet
merged. Dependencies 8-9 are imports from the `rust` branch [6,7] that have not
yet been submitted. Dependencies 10-17 I will submit independently in the near
future.

If the upstream maintainers agree, then when the dependencies are merged, I will
eventually start submitting PRs to Jens. I fully commit to maintain this code as
indicated in the MAINTAINERS entry.

Best regards,
Andreas Hindborg


[1] https://lore.kernel.org/rust-for-linux/20230503090708.2524310-1-nmi@metaspace.dk
[2] https://lpc.events/event/17/contributions/1425
[3] https://lore.kernel.org/rust-for-linux/87v87cgsp8.fsf@metaspace.dk
[4] https://rust-for-linux.com/null-block-driver
[5] git https://github.com/metaspace/linux.git rnull-deps-v6.8
[6] git https://github.com/rust-for-Linux/linux.git rust
[7] https://rust-for-linux.com/branches#rust


Andreas Hindborg (5):
  rust: block: introduce `kernel::block::mq` module
  rust: block: introduce `kernel::block::bio` module
  rust: block: allow `hrtimer::Timer` in `RequestData`
  rust: block: add rnull, Rust null_blk implementation
  MAINTAINERS: add entry for Rust block device driver API

 MAINTAINERS                        |  14 ++
 block/blk-mq.c                     |   3 +-
 drivers/block/Kconfig              |   4 +
 drivers/block/Makefile             |   3 +
 drivers/block/rnull.rs             | 323 +++++++++++++++++++++++++++
 include/linux/blk-mq.h             |   1 +
 rust/bindings/bindings_helper.h    |   2 +
 rust/helpers.c                     |  46 ++++
 rust/kernel/block.rs               |   6 +
 rust/kernel/block/bio.rs           | 112 ++++++++++
 rust/kernel/block/bio/vec.rs       | 279 +++++++++++++++++++++++
 rust/kernel/block/mq.rs            | 131 +++++++++++
 rust/kernel/block/mq/gen_disk.rs   | 174 +++++++++++++++
 rust/kernel/block/mq/operations.rs | 346 +++++++++++++++++++++++++++++
 rust/kernel/block/mq/raw_writer.rs |  60 +++++
 rust/kernel/block/mq/request.rs    | 269 ++++++++++++++++++++++
 rust/kernel/block/mq/tag_set.rs    | 117 ++++++++++
 rust/kernel/error.rs               |   5 +
 rust/kernel/lib.rs                 |   1 +
 scripts/Makefile.build             |   2 +-
 20 files changed, 1896 insertions(+), 2 deletions(-)
 create mode 100644 drivers/block/rnull.rs
 create mode 100644 rust/kernel/block.rs
 create mode 100644 rust/kernel/block/bio.rs
 create mode 100644 rust/kernel/block/bio/vec.rs
 create mode 100644 rust/kernel/block/mq.rs
 create mode 100644 rust/kernel/block/mq/gen_disk.rs
 create mode 100644 rust/kernel/block/mq/operations.rs
 create mode 100644 rust/kernel/block/mq/raw_writer.rs
 create mode 100644 rust/kernel/block/mq/request.rs
 create mode 100644 rust/kernel/block/mq/tag_set.rs


base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
prerequisite-patch-id: 299c2cc48c45c8149e7cb18c6146a0308e5d0a44
prerequisite-patch-id: 5153ee1c410dbdf22a6fd40667712943b4b89a97
prerequisite-patch-id: 4d025bab9bc9741aedecc5327ad18f88f9105271
prerequisite-patch-id: a5e932c86fa6c68234764aa3d7f314e5b534b1d9
prerequisite-patch-id: aef3042976c4c678b7aa96154fc280f9061ebaf7
prerequisite-patch-id: 8bf108ad0af2a3ec89acb8d99ee1b49ca2f51c69
prerequisite-patch-id: a803b221c3232db3258406a4075558e85acefd09
prerequisite-patch-id: 5e9cbcd0dc56a83353f0e4a3b5d4e8d5b51f3160
prerequisite-patch-id: 28bae4a7fe83b36afed9892515a6dde1ea51e98a
prerequisite-patch-id: 5b5ea2a21b37afb05fdf655396a6f74d83bb99c4
prerequisite-patch-id: dc53b6ce21f74726d5d13988398c2954da07bcb6
prerequisite-patch-id: b86d2b14f1770c1d4756ca10b93efaada643e560
prerequisite-patch-id: 6a155859eb9a18afcd22a5bda3350d45d92e2fc7
prerequisite-patch-id: c8ca075008f50d3cf1781c1ea1130a8ee735e7d2
prerequisite-patch-id: b000cd190fe80dea3f4dd9172ecf8787f23b72be
prerequisite-patch-id: b72f1fc3bd44b60911d9d91ecc5a45812a75eba3
prerequisite-patch-id: 167c7770e124b9afa44bead742f90a57ac73f2d7
prerequisite-patch-id: cc24a3135b4f258f4b1ea83ac91c2be4ffe31772
-- 
2.44.0


^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [RFC PATCH 1/5] rust: block: introduce `kernel::block::mq` module
@ 2024-03-22 23:40 Benno Lossin
  2024-03-23  6:32 ` Andreas Hindborg
  0 siblings, 1 reply; 50+ messages in thread
From: Benno Lossin @ 2024-03-22 23:40 UTC (permalink / raw)
  To: Andreas Hindborg, Jens Axboe, Christoph Hellwig, Keith Busch,
	Damien Le Moal, Bart Van Assche, Hannes Reinecke, linux-block
  Cc: Andreas Hindborg, Wedson Almeida Filho, Niklas Cassel, Greg KH,
	Matthew Wilcox, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Alice Ryhl, Chaitanya Kulkarni,
	Luis Chamberlain, Yexuan Yang, Sergio González Collado,
	Joel Granados, Pankaj Raghav (Samsung),
	Daniel Gomez, open list, rust-for-linux, lsf-pc, gost.dev

On 3/13/24 12:05, Andreas Hindborg wrote:
> From: Andreas Hindborg <a.hindborg@samsung.com>
> 
> Add initial abstractions for working with blk-mq.
> 
> This patch is a maintained, refactored subset of code originally published by
> Wedson Almeida Filho <wedsonaf@gmail.com> [1].
> 
> [1] https://github.com/wedsonaf/linux/tree/f2cfd2fe0e2ca4e90994f96afe268bbd4382a891/rust/kernel/blk/mq.rs
> 
> Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
> Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com>
> ---
>   block/blk-mq.c                     |   3 +-
>   include/linux/blk-mq.h             |   1 +
>   rust/bindings/bindings_helper.h    |   2 +
>   rust/helpers.c                     |  45 ++++
>   rust/kernel/block.rs               |   5 +
>   rust/kernel/block/mq.rs            | 131 +++++++++++
>   rust/kernel/block/mq/gen_disk.rs   | 174 +++++++++++++++
>   rust/kernel/block/mq/operations.rs | 346 +++++++++++++++++++++++++++++
>   rust/kernel/block/mq/raw_writer.rs |  60 +++++
>   rust/kernel/block/mq/request.rs    | 182 +++++++++++++++
>   rust/kernel/block/mq/tag_set.rs    | 117 ++++++++++
>   rust/kernel/error.rs               |   5 +
>   rust/kernel/lib.rs                 |   1 +
>   13 files changed, 1071 insertions(+), 1 deletion(-)

Do you think that it's possible to split this into smaller
patches? It would make review a lot easier.

>   create mode 100644 rust/kernel/block.rs
>   create mode 100644 rust/kernel/block/mq.rs
>   create mode 100644 rust/kernel/block/mq/gen_disk.rs
>   create mode 100644 rust/kernel/block/mq/operations.rs
>   create mode 100644 rust/kernel/block/mq/raw_writer.rs
>   create mode 100644 rust/kernel/block/mq/request.rs
>   create mode 100644 rust/kernel/block/mq/tag_set.rs

[...]

> diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
> new file mode 100644
> index 000000000000..4c93317a568a
> --- /dev/null
> +++ b/rust/kernel/block.rs
> @@ -0,0 +1,5 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Types for working with the block layer

Missing '.'.

> +
> +pub mod mq;
> diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
> new file mode 100644
> index 000000000000..08de1cc114ff
> --- /dev/null
> +++ b/rust/kernel/block/mq.rs
> @@ -0,0 +1,131 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! This module provides types for implementing block drivers that interface the
> +//! blk-mq subsystem.
> +//!
> +//! To implement a block device driver, a Rust module must do the following:
> +//!
> +//! - Implement [`Operations`] for a type `T`

I think it would be nicer to use `Driver` (or `MyBlkDevice`) instead of
`T`.

> +//! - Create a [`TagSet<T>`]
> +//! - Create a [`GenDisk<T>`], passing in the `TagSet` reference
> +//! - Add the disk to the system by calling [`GenDisk::add`]
> +//!
> +//! The types available in this module that have direct C counterparts are:
> +//!
> +//! - The `TagSet` type that abstracts the C type `struct tag_set`.
> +//! - The `GenDisk` type that abstracts the C type `struct gendisk`.
> +//! - The `Request` type that abstracts the C type `struct request`.
> +//!
> +//! Many of the C types that this module abstracts allow a driver to carry
> +//! private data, either embedded in the stuct directly, or as a C `void*`. In
> +//! these abstractions, this data is typed. The types of the data is defined by
> +//! associated types in `Operations`, see [`Operations::RequestData`] for an
> +//! example.
> +//!
> +//! The kernel will interface with the block evice driver by calling the method

Typo: "block evice driver"

> +//! implementations of the `Operations` trait.
> +//!
> +//! IO requests are passed to the driver as [`Request`] references. The
> +//! `Request` type is a wrapper around the C `struct request`. The driver must
> +//! mark start of request processing by calling [`Request::start`] and end of
> +//! processing by calling one of the [`Request::end`], methods. Failure to do so
> +//! can lead to IO failures.

I am unfamiliar with this, what are "IO failures"?
Do you think that it might be better to change the API to use a
callback? So instead of calling start and end, you would do

     request.handle(|req| {
         // do the stuff that would be done between start and end
     });

I took a quick look at the rnull driver and there you are calling
`Request::end_ok` from a different function. So my suggestion might not
be possible, since you really need the freedom.

Do you think that a guard approach might work better? ie `start` returns
a guard that when dropped will call `end` and you need the guard to
operate on the request.

> +//!
> +//! The `TagSet` is responsible for creating and maintaining a mapping between
> +//! `Request`s and integer ids as well as carrying a pointer to the vtable
> +//! generated by `Operations`. This mapping is useful for associating
> +//! completions from hardware with the correct `Request` instance. The `TagSet`
> +//! determines the maximum queue depth by setting the number of `Request`
> +//! instances available to the driver, and it determines the number of queues to
> +//! instantiate for the driver. If possible, a driver should allocate one queue
> +//! per core, to keep queue data local to a core.
> +//!
> +//! One `TagSet` instance can be shared between multiple `GenDisk` instances.
> +//! This can be useful when implementing drivers where one piece of hardware
> +//! with one set of IO resources are represented to the user as multiple disks.
> +//!
> +//! One significant difference between block device drivers implemented with
> +//! these Rust abstractions and drivers implemented in C, is that the Rust
> +//! drivers have to own a reference count on the `Request` type when the IO is
> +//! in flight. This is to ensure that the C `struct request` instances backing
> +//! the Rust `Request` instances are live while the Rust driver holds a
> +//! reference to the `Request`. In addition, the conversion of an ineger tag to

Typo: "of an ineger tag"

> +//! a `Request` via the `TagSet` would not be sound without this bookkeeping.
> +//!
> +//! # ⚠ Note
> +//!
> +//! For Rust block device drivers, the point in time where a request
> +//! is freed and made available for recycling is usualy at the point in time
> +//! when the last `ARef<Request>` is dropped. For C drivers, this event usually
> +//! occurs when `bindings::blk_mq_end_request` is called.
> +//!
> +//! # Example
> +//!
> +//! ```rust
> +//! use kernel::{
> +//!     block::mq::*,
> +//!     new_mutex,
> +//!     prelude::*,
> +//!     sync::{Arc, Mutex},
> +//!     types::{ARef, ForeignOwnable},
> +//! };
> +//!
> +//! struct MyBlkDevice;
> +//!
> +//! #[vtable]
> +//! impl Operations for MyBlkDevice {
> +//!     type RequestData = ();
> +//!     type RequestDataInit = impl PinInit<()>;
> +//!     type QueueData = ();
> +//!     type HwData = ();
> +//!     type TagSetData = ();
> +//!
> +//!     fn new_request_data(
> +//!         _tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
> +//!     ) -> Self::RequestDataInit {
> +//!         kernel::init::zeroed()
> +//!     }
> +//!
> +//!     fn queue_rq(_hw_data: (), _queue_data: (), rq: ARef<Request<Self>>, _is_last: bool) -> Result {
> +//!         rq.start();
> +//!         rq.end_ok();
> +//!         Ok(())
> +//!     }
> +//!
> +//!     fn commit_rqs(
> +//!         _hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>,
> +//!         _queue_data: <Self::QueueData as ForeignOwnable>::Borrowed<'_>,
> +//!     ) {
> +//!     }
> +//!
> +//!     fn complete(rq: &Request<Self>) {
> +//!         rq.end_ok();
> +//!     }
> +//!
> +//!     fn init_hctx(
> +//!         _tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
> +//!         _hctx_idx: u32,
> +//!     ) -> Result<Self::HwData> {
> +//!         Ok(())
> +//!     }
> +//! }
> +//!
> +//! let tagset: Arc<TagSet<MyBlkDevice>> = Arc::pin_init(TagSet::try_new(1, (), 256, 1))?;
> +//! let mut disk = GenDisk::try_new(tagset, ())?;
> +//! disk.set_name(format_args!("myblk"))?;
> +//! disk.set_capacity_sectors(4096);
> +//! disk.add()?;
> +//!
> +//! # Ok::<(), kernel::error::Error>(())
> +//! ```

This piece of documentation is **really** valuable, thanks a lot for
taking the time to write it.

> +
> +mod gen_disk;
> +mod operations;
> +mod raw_writer;
> +mod request;
> +mod tag_set;
> +
> +pub use gen_disk::GenDisk;
> +pub use operations::Operations;
> +pub use request::{Request, RequestDataRef};
> +pub use tag_set::TagSet;
> diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
> new file mode 100644
> index 000000000000..b7845fc9e39f
> --- /dev/null
> +++ b/rust/kernel/block/mq/gen_disk.rs
> @@ -0,0 +1,174 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Generic disk abstraction.
> +//!
> +//! C header: [`include/linux/blkdev.h`](srctree/include/linux/blkdev.h)
> +//! C header: [`include/linux/blk_mq.h`](srctree/include/linux/blk_mq.h)
> +
> +use crate::block::mq::{raw_writer::RawWriter, Operations, TagSet};
> +use crate::{
> +    bindings, error::from_err_ptr, error::Result, sync::Arc, types::ForeignOwnable,
> +    types::ScopeGuard,
> +};
> +use core::fmt::{self, Write};
> +
> +/// A generic block device
> +///
> +/// # Invariants
> +///
> +///  - `gendisk` must always point to an initialized and valid `struct gendisk`.
> +pub struct GenDisk<T: Operations> {
> +    _tagset: Arc<TagSet<T>>,
> +    gendisk: *mut bindings::gendisk,
> +}
> +
> +// SAFETY: `GenDisk` is an owned pointer to a `struct gendisk` and an `Arc` to a
> +// `TagSet` It is safe to send this to other threads as long as T is Send.
> +unsafe impl<T: Operations + Send> Send for GenDisk<T> {}
> +
> +impl<T: Operations> GenDisk<T> {
> +    /// Try to create a new `GenDisk`
> +    pub fn try_new(tagset: Arc<TagSet<T>>, queue_data: T::QueueData) -> Result<Self> {
> +        let data = queue_data.into_foreign();
> +        let recover_data = ScopeGuard::new(|| {
> +            // SAFETY: T::QueueData was created by the call to `into_foreign()` above
> +            unsafe { T::QueueData::from_foreign(data) };
> +        });
> +
> +        let lock_class_key = crate::sync::LockClassKey::new();
> +
> +        // SAFETY: `tagset.raw_tag_set()` points to a valid and initialized tag set
> +        let gendisk = from_err_ptr(unsafe {
> +            bindings::__blk_mq_alloc_disk(
> +                tagset.raw_tag_set(),
> +                data.cast_mut(),
> +                lock_class_key.as_ptr(),
> +            )
> +        })?;
> +
> +        const TABLE: bindings::block_device_operations = bindings::block_device_operations {
> +            submit_bio: None,
> +            open: None,
> +            release: None,
> +            ioctl: None,
> +            compat_ioctl: None,
> +            check_events: None,
> +            unlock_native_capacity: None,
> +            getgeo: None,
> +            set_read_only: None,
> +            swap_slot_free_notify: None,
> +            report_zones: None,
> +            devnode: None,
> +            alternative_gpt_sector: None,
> +            get_unique_id: None,
> +            // TODO: Set to THIS_MODULE. Waiting for const_refs_to_static feature to be merged
> +            // https://github.com/rust-lang/rust/issues/119618
> +            owner: core::ptr::null_mut(),
> +            pr_ops: core::ptr::null_mut(),
> +            free_disk: None,
> +            poll_bio: None,
> +        };
> +
> +        // SAFETY: gendisk is a valid pointer as we initialized it above
> +        unsafe { (*gendisk).fops = &TABLE };
> +
> +        recover_data.dismiss();
> +        Ok(Self {
> +            _tagset: tagset,
> +            gendisk,

Missing INVARIANT comment.

> +        })
> +    }
> +
> +    /// Set the name of the device

Missing '.'.

> +    pub fn set_name(&mut self, args: fmt::Arguments<'_>) -> Result {
> +        let mut raw_writer = RawWriter::from_array(
> +            // SAFETY: By type invariant `self.gendisk` points to a valid and initialized instance
> +            unsafe { &mut (*self.gendisk).disk_name },

To create a `&mut` reference, you need exclusive access, it should be
sufficient to add to the invariant that `gendisk` is owned/unique.

> +        );
> +        raw_writer.write_fmt(args)?;
> +        raw_writer.write_char('\0')?;
> +        Ok(())
> +    }

[...]

> +impl<T: Operations> Drop for GenDisk<T> {
> +    fn drop(&mut self) {
> +        // SAFETY: By type invariant of `Self`, `self.gendisk` points to a valid
> +        // and initialized instance of `struct gendisk`. As such, `queuedata`
> +        // was initialized by the initializer returned by `try_new` with a call
> +        // to `ForeignOwnable::into_foreign`.

This should also be an invariant of `GenDisk`.

> +        let queue_data = unsafe { (*(*self.gendisk).queue).queuedata };
> +
> +        // SAFETY: By type invariant, `self.gendisk` points to a valid and
> +        // initialized instance of `struct gendisk`
> +        unsafe { bindings::del_gendisk(self.gendisk) };
> +
> +        // SAFETY: `queue.queuedata` was created by `GenDisk::try_new()` with a
> +        // call to `ForeignOwnable::into_pointer()` to create `queuedata`.
> +        // `ForeignOwnable::from_foreign()` is only called here.
> +        let _queue_data = unsafe { T::QueueData::from_foreign(queue_data) };
> +    }
> +}
> diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
> new file mode 100644
> index 000000000000..53c6ad663208
> --- /dev/null
> +++ b/rust/kernel/block/mq/operations.rs
> @@ -0,0 +1,346 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! This module provides an interface for blk-mq drivers to implement.
> +//!
> +//! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h)
> +
> +use crate::{
> +    bindings,
> +    block::mq::Request,
> +    error::{from_result, Result},
> +    init::PinInit,
> +    types::{ARef, ForeignOwnable},
> +};
> +use core::{marker::PhantomData, ptr::NonNull};
> +
> +use super::TagSet;
> +
> +/// Implement this trait to interface blk-mq as block devices
> +#[macros::vtable]
> +pub trait Operations: Sized {
> +    /// Data associated with a request. This data is located next to the request
> +    /// structure.
> +    ///
> +    /// To be able to handle accessing this data from interrupt context, this
> +    /// data must be `Sync`.
> +    type RequestData: Sized + Sync;
> +
> +    /// Initializer for `Self::RequestDta`. Used to initialize private data area
> +    /// when requst structure is allocated.
> +    type RequestDataInit: PinInit<Self::RequestData>;

Just to let you know, this dance with the associated types is not needed
any longer. RPITIT (return position impl trait in trait) has been
stabilized in 1.75 and you should be able to just write this:

     fn new_request_data(
         //rq: ARef<Request<Self>>,
         tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
     ) -> impl PinInit<Self::RequestData>;


> +
> +    /// Data associated with the `struct request_queue` that is allocated for
> +    /// the `GenDisk` associated with this `Operations` implementation.
> +    type QueueData: ForeignOwnable;
> +
> +    /// Data associated with a dispatch queue. This is stored as a pointer in
> +    /// the C `struct blk_mq_hw_ctx` that represents a hardware queue.
> +    type HwData: ForeignOwnable;
> +
> +    /// Data associated with a `TagSet`. This is stored as a pointer in `struct
> +    /// blk_mq_tag_set`.
> +    type TagSetData: ForeignOwnable;
> +
> +    /// Called by the kernel to get an initializer for a `Pin<&mut RequestData>`.
> +    fn new_request_data(
> +        //rq: ARef<Request<Self>>,
> +        tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,

Since you need to use this pattern a lot, it might be a good idea to
introduce a type alias to help with this:

     type ForeignBorrowed<'a, T> = <T as ForeignOwnable>::Borrowed<'a>;

What do the others think?

The function would then become (with the RPITIT improvement as well):

     fn new_request_data(
         //rq: ARef<Request<Self>>,
         tagset_data: ForeignBorrowed<'_, Self::TagSetData>,
     ) -> impl PinInit<Self::RequestData>;


> +    ) -> Self::RequestDataInit;
> +
> +    /// Called by the kernel to queue a request with the driver. If `is_last` is
> +    /// `false`, the driver is allowed to defer commiting the request.
> +    fn queue_rq(
> +        hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>,
> +        queue_data: <Self::QueueData as ForeignOwnable>::Borrowed<'_>,
> +        rq: ARef<Request<Self>>,
> +        is_last: bool,
> +    ) -> Result;
> +
> +    /// Called by the kernel to indicate that queued requests should be submitted
> +    fn commit_rqs(
> +        hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>,
> +        queue_data: <Self::QueueData as ForeignOwnable>::Borrowed<'_>,
> +    );
> +
> +    /// Called by the kernel when the request is completed
> +    fn complete(_rq: &Request<Self>);
> +
> +    /// Called by the kernel to allocate and initialize a driver specific hardware context data
> +    fn init_hctx(
> +        tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
> +        hctx_idx: u32,
> +    ) -> Result<Self::HwData>;
> +
> +    /// Called by the kernel to poll the device for completed requests. Only
> +    /// used for poll queues.
> +    fn poll(_hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>) -> bool {
> +        crate::build_error(crate::error::VTABLE_DEFAULT_ERROR)
> +    }
> +
> +    /// Called by the kernel to map submission queues to CPU cores.
> +    fn map_queues(_tag_set: &TagSet<Self>) {
> +        crate::build_error(crate::error::VTABLE_DEFAULT_ERROR)
> +    }
> +
> +    // There is no need for exit_request() because `drop` will be called.

I think it would be a good idea to mention this in the documentation of
the trait.

> +}

[...]

> diff --git a/rust/kernel/block/mq/raw_writer.rs b/rust/kernel/block/mq/raw_writer.rs
> new file mode 100644
> index 000000000000..f7857740af29
> --- /dev/null
> +++ b/rust/kernel/block/mq/raw_writer.rs
> @@ -0,0 +1,60 @@
> +use core::{
> +    fmt::{self, Write},
> +    marker::PhantomData,
> +};
> +
> +/// A mutable reference to a byte buffer where a string can be written into
> +///
> +/// # Invariants
> +///
> +/// * `ptr` is not aliased and valid for read and write for `len` bytes

You probably also want to add "for the duration of `'a`".

> +///
> +pub(crate) struct RawWriter<'a> {
> +    ptr: *mut u8,
> +    len: usize,
> +    _p: PhantomData<&'a ()>,
> +}
> +
> +impl<'a> RawWriter<'a> {
> +    /// Create a new `RawWriter` instance.
> +    ///
> +    /// # Safety
> +    ///
> +    /// * `ptr` must be valid for read and write for `len` consecutive `u8` elements
> +    /// * `ptr` must not be aliased
> +    unsafe fn new(ptr: *mut u8, len: usize) -> RawWriter<'a> {
> +        Self {
> +            ptr,
> +            len,
> +            _p: PhantomData,
> +        }
> +    }

Since this function is not used (except in the function below), what is
the reason for using a raw pointer?
I looked in your other patches, but did not find another user, so could
this be improved by using mutable references?

> +
> +    pub(crate) fn from_array<const N: usize>(a: &'a mut [core::ffi::c_char; N]) -> RawWriter<'a> {
> +        // SAFETY: the buffer of `a` is valid for read and write for at least `N` bytes
> +        unsafe { Self::new(a.as_mut_ptr().cast::<u8>(), N) }
> +    }
> +}
> +
> +impl Write for RawWriter<'_> {
> +    fn write_str(&mut self, s: &str) -> fmt::Result {
> +        let bytes = s.as_bytes();
> +        let len = bytes.len();
> +        if len > self.len {
> +            return Err(fmt::Error);
> +        }
> +
> +        // SAFETY:
> +        // * `bytes` is valid for reads of `bytes.len()` size because we hold a shared reference to `s`
> +        // * By type invariant `self.ptr` is valid for writes for at lest `self.len` bytes
> +        // * The regions are not overlapping as `ptr` is not aliased
> +        unsafe { core::ptr::copy_nonoverlapping(&bytes[0], self.ptr, len) };
> +
> +        // SAFETY: By type invariant of `Self`, `ptr` is in bounds of an
> +        // allocation. Also by type invariant, the pointer resulting from this
> +        // addition is also in bounds.
> +        self.ptr = unsafe { self.ptr.add(len) };
> +        self.len -= len;
> +        Ok(())
> +    }
> +}
> diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
> new file mode 100644
> index 000000000000..b4dacac5e091
> --- /dev/null
> +++ b/rust/kernel/block/mq/request.rs
> @@ -0,0 +1,182 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! This module provides a wrapper for the C `struct request` type.
> +//!
> +//! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h)
> +
> +use crate::{
> +    bindings,
> +    block::mq::Operations,
> +    error::{Error, Result},
> +    types::{ARef, AlwaysRefCounted, Opaque},
> +};
> +use core::{ffi::c_void, marker::PhantomData, ops::Deref};
> +
> +/// A wrapper around a blk-mq `struct request`. This represents an IO request.
> +///
> +/// # Invariants
> +///
> +/// * `self.0` is a valid `struct request` created by the C portion of the kernel
> +/// * `self` is reference counted. a call to `req_ref_inc_not_zero` keeps the
> +///    instance alive at least until a matching call to `req_ref_put_and_test`
> +///
> +#[repr(transparent)]
> +pub struct Request<T: Operations>(Opaque<bindings::request>, PhantomData<T>);
> +
> +impl<T: Operations> Request<T> {
> +    /// Create a `&mut Request` from a `bindings::request` pointer
> +    ///
> +    /// # Safety
> +    ///
> +    /// * `ptr` must be aligned and point to a valid `bindings::request` instance
> +    /// * Caller must ensure that the pointee of `ptr` is live and owned
> +    ///   exclusively by caller for at least `'a`
> +    ///
> +    pub(crate) unsafe fn from_ptr_mut<'a>(ptr: *mut bindings::request) -> &'a mut Self {
> +        // SAFETY:
> +        // * The cast is valid as `Self` is transparent.
> +        // * By safety requirements of this function, the reference will be
> +        //   valid for 'a.
> +        unsafe { &mut *(ptr.cast::<Self>()) }
> +    }
> +
> +    /// Get the command identifier for the request
> +    pub fn command(&self) -> u32 {
> +        // SAFETY: By C API contract and type invariant, `cmd_flags` is valid for read
> +        unsafe { (*self.0.get()).cmd_flags & ((1 << bindings::REQ_OP_BITS) - 1) }
> +    }
> +
> +    /// Call this to indicate to the kernel that the request has been issued by the driver

I am a bit confused, is this not supposed to signal that the processing
of the request is going to start now? cf C documentation:

     /**
      * blk_mq_start_request - Start processing a request
      * @rq: Pointer to request to be started
      *
      * Function used by device drivers to notify the block layer that a request
      * is going to be processed now, so blk layer can do proper initializations
      * such as starting the timeout timer.
      */

> +    pub fn start(&self) {
> +        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
> +        // existence of `&mut self` we have exclusive access.
> +        unsafe { bindings::blk_mq_start_request(self.0.get()) };
> +    }
> +
> +    /// Call this to indicate to the kernel that the request has been completed without errors

I dislike the "Call this", it feels redundant, what about "Signal the
block layer that the request has been completed without errors.".

Also with the other functions below.

> +    pub fn end_ok(&self) {
> +        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
> +        // existence of `&mut self` we have exclusive access.
> +        unsafe { bindings::blk_mq_end_request(self.0.get(), bindings::BLK_STS_OK as _) };
> +    }
> +
> +    /// Call this to indicate to the kernel that the request completed with an error
> +    pub fn end_err(&self, err: Error) {
> +        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
> +        // existence of `&mut self` we have exclusive access.
> +        unsafe { bindings::blk_mq_end_request(self.0.get(), err.to_blk_status()) };
> +    }
> +
> +    /// Call this to indicate that the request completed with the status indicated by `status`
> +    pub fn end(&self, status: Result) {
> +        if let Err(e) = status {
> +            self.end_err(e);
> +        } else {
> +            self.end_ok();
> +        }
> +    }
> +
> +    /// Call this to schedule defered completion of the request
> +    pub fn complete(&self) {
> +        // SAFETY: By type invariant, `self.0` is a valid `struct request`
> +        if !unsafe { bindings::blk_mq_complete_request_remote(self.0.get()) } {
> +            T::complete(self);
> +        }
> +    }
> +
> +    /// Get the target sector for the request
> +    #[inline(always)]
> +    pub fn sector(&self) -> usize {
> +        // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
> +        unsafe { (*self.0.get()).__sector as usize }
> +    }
> +
> +    /// Returns an owned reference to the per-request data associated with this
> +    /// request
> +    pub fn owned_data_ref(request: ARef<Self>) -> RequestDataRef<T> {
> +        RequestDataRef::new(request)
> +    }
> +
> +    /// Returns a reference to the oer-request data associated with this request

Typo: "the oer-request"

> +    pub fn data_ref(&self) -> &T::RequestData {
> +        let request_ptr = self.0.get().cast::<bindings::request>();
> +
> +        // SAFETY: `request_ptr` is a valid `struct request` because `ARef` is
> +        // `repr(transparent)`
> +        let p: *mut c_void = unsafe { bindings::blk_mq_rq_to_pdu(request_ptr) };
> +
> +        let p = p.cast::<T::RequestData>();
> +
> +        // SAFETY: By C API contract, `p` is initialized by a call to
> +        // `OperationsVTable::init_request_callback()`. By existence of `&self`
> +        // it must be valid for use as a shared reference.
> +        unsafe { &*p }
> +    }
> +}
> +
> +// SAFETY: It is impossible to obtain an owned or mutable `Request`, so we can

What do you mean by "mutable `Request`"? There is the function to obtain
a `&mut Request`.
Also this should probably be an invariant if you use it as a
justification here.

> +// mark it `Send`.
> +unsafe impl<T: Operations> Send for Request<T> {}
> +
> +// SAFETY: `Request` references can be shared across threads.

Does not explain why that is the case.

> +unsafe impl<T: Operations> Sync for Request<T> {}
> +
> +/// An owned reference to a `Request<T>`
> +#[repr(transparent)]
> +pub struct RequestDataRef<T: Operations> {
> +    request: ARef<Request<T>>,
> +}

Is this extra type really needed? I have not yet taken a look at patch 3,
which uses this. But would it hurt if you implemented the traits
there directly on `ARef<Request<T>>`?

> +
> +impl<T> RequestDataRef<T>
> +where
> +    T: Operations,
> +{
> +    /// Create a new instance.
> +    fn new(request: ARef<Request<T>>) -> Self {
> +        Self { request }
> +    }
> +
> +    /// Get a reference to the underlying request
> +    pub fn request(&self) -> &Request<T> {
> +        &self.request
> +    }
> +}

I really like how you improved the safety comments and documentation. It
was a lot easier to wrap my head around this time (might also me getting
more experienced, but I think it helped a lot).

-- 
Cheers,
Benno


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2024-04-05  9:40 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-13 11:05 [RFC PATCH 0/5] Rust block device driver API and null block driver Andreas Hindborg
2024-03-13 11:05 ` [RFC PATCH 1/5] rust: block: introduce `kernel::block::mq` module Andreas Hindborg
2024-03-13 23:55   ` Boqun Feng
2024-03-14  8:58     ` Andreas Hindborg
2024-03-14 18:55       ` Miguel Ojeda
2024-03-14 19:22         ` Andreas Hindborg
2024-03-14 19:41           ` Andreas Hindborg
2024-03-14 19:41           ` Miguel Ojeda
2024-03-14 20:56             ` Miguel Ojeda
2024-03-15  7:52             ` Andreas Hindborg
2024-03-15 12:17               ` Ming Lei
2024-03-15 12:46                 ` Andreas Hindborg
2024-03-15 15:24                   ` Ming Lei
2024-03-15 17:49                     ` Andreas Hindborg
2024-03-16 14:48                       ` Ming Lei
2024-03-16 17:27                         ` Andreas Hindborg
2024-03-13 11:05 ` [RFC PATCH 2/5] rust: block: introduce `kernel::block::bio` module Andreas Hindborg
2024-03-13 11:05 ` [RFC PATCH 3/5] rust: block: allow `hrtimer::Timer` in `RequestData` Andreas Hindborg
2024-03-23 10:51   ` Benno Lossin
2024-04-02 12:43     ` Andreas Hindborg
2024-03-13 11:05 ` [RFC PATCH 4/5] rust: block: add rnull, Rust null_blk implementation Andreas Hindborg
2024-03-23 11:33   ` Benno Lossin
2024-04-02 12:52     ` Andreas Hindborg
2024-04-02 22:35       ` Benno Lossin
2024-04-03  9:47         ` Andreas Hindborg
2024-04-03 10:29           ` Benno Lossin
2024-03-13 11:05 ` [RFC PATCH 5/5] MAINTAINERS: add entry for Rust block device driver API Andreas Hindborg
2024-03-13 18:02 ` [RFC PATCH 0/5] Rust block device driver API and null block driver Bart Van Assche
2024-03-13 18:22   ` Boqun Feng
2024-03-13 19:03     ` Andreas Hindborg
2024-03-13 19:11       ` Bart Van Assche
2024-03-13 19:12   ` Matthew Wilcox
2024-03-14 12:14   ` Philipp Stanner
2024-03-14 17:03     ` Bart Van Assche
2024-03-14 17:16       ` Conor Dooley
2024-03-14 17:43       ` Andreas Hindborg
2024-03-17  2:50 ` Matthew Wilcox
2024-03-17  7:09   ` Andreas Hindborg
2024-03-17 21:34     ` Matthew Wilcox
2024-03-22 23:40 [RFC PATCH 1/5] rust: block: introduce `kernel::block::mq` module Benno Lossin
2024-03-23  6:32 ` Andreas Hindborg
2024-04-02 23:09   ` Benno Lossin
2024-04-03  8:46     ` Andreas Hindborg
2024-04-03 19:37       ` Benno Lossin
2024-04-04  5:44         ` Andreas Hindborg
2024-04-04  8:46           ` Benno Lossin
2024-04-04  9:30             ` Andreas Hindborg
2024-04-04 13:20               ` Benno Lossin
2024-04-05  8:43                 ` Andreas Hindborg
2024-04-05  9:40                   ` Benno Lossin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).