rust-for-linux.vger.kernel.org archive mirror
* [RFC PATCH v2 00/30] Rust abstractions for VFS
@ 2024-05-14 13:16 Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 01/30] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
                   ` (30 more replies)
  0 siblings, 31 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

This series introduces Rust abstractions that allow read-only file systems to
be written in Rust.

There are three file system implementations using these abstractions: ext2,
tarfs, and puzzlefs. The first two are part of this series.

Rust file system modules can be declared with the `module_fs` macro and are
required to implement the following functions (which are part of the
`FileSystem` trait):

    fn fill_super(
        sb: &mut SuperBlock<Self, sb::New>,
        mapper: Option<inode::Mapper>,
    ) -> Result<Self::Data>;

    fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>>;

They can optionally implement the following:

    fn read_xattr(
        _dentry: &DEntry<Self>,
        _inode: &INode<Self>,
        _name: &CStr,
        _outbuf: &mut [u8],
    ) -> Result<usize>;

    fn statfs(_dentry: &DEntry<Self>) -> Result<Stat>;

They may also choose the type of the data they can attach to superblocks and/or
inodes.

Lastly, file systems can implement inode, file, and address space operations
and attach them to inodes when they're created, similar to how C does it. They
can obtain a read-only address space operations table from an implementation
of iomap operations, for use with the generic read-only file operations.
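
The split between required and optional functions can be illustrated with a
small userspace sketch. It is not the kernel API: `SuperBlock`, `Stat`,
`Error`, `EOPNOTSUPP`, and `mount_and_stat` below are simplified stand-ins
invented for illustration, and the default error returned by the optional
function is an assumption. Required functions have no body in the trait, while
optional ones carry a default that reports "not supported".

```rust
// Userspace sketch only: every type here is a stand-in, not the kernel crate's.
#[derive(Debug, PartialEq)]
pub struct Error(pub i32);

// Assumed default error for operations a file system does not implement.
pub const EOPNOTSUPP: Error = Error(-95);

pub type Result<T = ()> = core::result::Result<T, Error>;

pub struct SuperBlock {
    pub magic: u64,
}

#[derive(Debug, PartialEq)]
pub struct Stat {
    pub magic: u64,
    pub block_size: u64,
}

pub trait FileSystem {
    /// Data the file system chooses to attach to a superblock.
    type Data;

    const NAME: &'static str;

    /// Required: no default body, so every implementer must provide it.
    fn fill_super(sb: &mut SuperBlock) -> Result<Self::Data>;

    /// Optional: implementers may rely on this default.
    fn statfs(_sb: &SuperBlock) -> Result<Stat> {
        Err(EOPNOTSUPP)
    }
}

pub struct MinimalFs;
impl FileSystem for MinimalFs {
    type Data = ();
    const NAME: &'static str = "minimal";

    fn fill_super(sb: &mut SuperBlock) -> Result<()> {
        sb.magic = 0x52555354;
        Ok(())
    }
}

pub struct StatFs;
impl FileSystem for StatFs {
    type Data = u32;
    const NAME: &'static str = "with_statfs";

    fn fill_super(sb: &mut SuperBlock) -> Result<u32> {
        sb.magic = 0x1234;
        Ok(7)
    }

    fn statfs(sb: &SuperBlock) -> Result<Stat> {
        Ok(Stat { magic: sb.magic, block_size: 4096 })
    }
}

/// Hypothetical driver: fills a fresh superblock, then queries `statfs`.
pub fn mount_and_stat<T: FileSystem>() -> (T::Data, Result<Stat>) {
    let mut sb = SuperBlock { magic: 0 };
    let data = T::fill_super(&mut sb).expect("fill_super failed in this sketch");
    (data, T::statfs(&sb))
}
```

With these definitions, `mount_and_stat::<MinimalFs>()` falls back to the
default `statfs`, whereas `mount_and_stat::<StatFs>()` reaches the override.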

A git tree is available here:
    git://github.com/wedsonaf/linux.git vfs-v2

Web:
    https://github.com/wedsonaf/linux/commits/vfs-v2

---

Changes in v2:

- Rebased to latest rust-next tree
- Removed buffer heads
- Added iomap support
- Removed `_pin` field from `Registration` as it's not needed anymore
- Renamed sample filesystem to match the module's name
- Using typestate instead of a separate type for superblock/new-superblock
- Created separate submodules for superblocks, inodes, dentries, and files
- Split out operations from FileSystem to inode/file/address_space ops, similar to how C does it
- Removed usages of folio_set_error
- Removed UniqueFolio, for now reading blocks from devices via the pagecache
- Changed map() to return the entire folio if not in highmem
- Added support for unlocking the folio asynchronously
- Added `from_raw` to all new ref-counted types
- Added explicit types in calls to cast()
- Added typestate to folio
- Added support for implementing get_link
- Fixed data race when reading inode->i_state
- Added nofs scope support during allocation
- Link to v1: https://lore.kernel.org/rust-for-linux/20231018122518.128049-1-wedsonaf@gmail.com/
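
For readers unfamiliar with the typestate change listed above, here is a
minimal userspace sketch of the idea. The names (`New`, `Ready`, `finish`,
`build`) and the fields are assumptions made for illustration and are not taken
from the series: a single `SuperBlock` type carries a state marker, and
initialisation-only methods exist only for the `New` state.

```rust
use std::marker::PhantomData;

/// Marker: the superblock is still being initialised.
pub struct New;
/// Marker: the superblock is fully initialised.
pub struct Ready;

/// One type for both phases; the phase lives in the type parameter.
pub struct SuperBlock<State = Ready> {
    magic: u64,
    _state: PhantomData<State>,
}

impl SuperBlock<New> {
    pub fn new() -> Self {
        SuperBlock { magic: 0, _state: PhantomData }
    }

    /// Only callable while the superblock is in the `New` state.
    pub fn set_magic(&mut self, magic: u64) -> &mut Self {
        self.magic = magic;
        self
    }

    /// Consumes the `New` superblock and yields a `Ready` one.
    pub fn finish(self) -> SuperBlock<Ready> {
        SuperBlock { magic: self.magic, _state: PhantomData }
    }
}

impl<State> SuperBlock<State> {
    /// Available in every state.
    pub fn magic(&self) -> u64 {
        self.magic
    }
}

pub fn build() -> SuperBlock<Ready> {
    let mut sb = SuperBlock::<New>::new();
    sb.set_magic(0x52555354);
    // `sb.finish().set_magic(1)` would not compile, because `set_magic` is
    // only implemented for `SuperBlock<New>`.
    sb.finish()
}
```

The marker types are zero-sized, so the distinction costs nothing at run time;
misuse is rejected by the compiler instead.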

---

Wedson Almeida Filho (30):
  rust: fs: add registration/unregistration of file systems
  rust: fs: introduce the `module_fs` macro
  samples: rust: add initial ro file system sample
  rust: fs: introduce `FileSystem::fill_super`
  rust: fs: introduce `INode<T>`
  rust: fs: introduce `DEntry<T>`
  rust: fs: introduce `FileSystem::init_root`
  rust: file: move `kernel::file` to `kernel::fs::file`
  rust: fs: generalise `File` for different file systems
  rust: fs: add empty file operations
  rust: fs: introduce `file::Operations::read_dir`
  rust: fs: introduce `file::Operations::seek`
  rust: fs: introduce `file::Operations::read`
  rust: fs: add empty inode operations
  rust: fs: introduce `inode::Operations::lookup`
  rust: folio: introduce basic support for folios
  rust: fs: add empty address space operations
  rust: fs: introduce `address_space::Operations::read_folio`
  rust: fs: introduce `FileSystem::read_xattr`
  rust: fs: introduce `FileSystem::statfs`
  rust: fs: introduce more inode types
  rust: fs: add per-superblock data
  rust: fs: allow file systems backed by a block device
  rust: fs: allow per-inode data
  rust: fs: export file type from mode constants
  rust: fs: allow populating i_lnk
  rust: fs: add `iomap` module
  rust: fs: add memalloc_nofs support
  tarfs: introduce tar fs
  WIP: fs: ext2: add rust ro ext2 implementation

 fs/Kconfig                        |   2 +
 fs/Makefile                       |   2 +
 fs/rust-ext2/Kconfig              |  13 +
 fs/rust-ext2/Makefile             |   8 +
 fs/rust-ext2/defs.rs              | 173 +++++++
 fs/rust-ext2/ext2.rs              | 551 +++++++++++++++++++++
 fs/tarfs/Kconfig                  |  15 +
 fs/tarfs/Makefile                 |   8 +
 fs/tarfs/defs.rs                  |  80 +++
 fs/tarfs/tar.rs                   | 394 +++++++++++++++
 rust/bindings/bindings_helper.h   |  11 +
 rust/helpers.c                    | 182 +++++++
 rust/kernel/block.rs              |  10 +-
 rust/kernel/error.rs              |   8 +-
 rust/kernel/file.rs               | 251 ----------
 rust/kernel/folio.rs              | 305 ++++++++++++
 rust/kernel/fs.rs                 | 492 +++++++++++++++++++
 rust/kernel/fs/address_space.rs   |  90 ++++
 rust/kernel/fs/dentry.rs          | 136 ++++++
 rust/kernel/fs/file.rs            | 607 +++++++++++++++++++++++
 rust/kernel/fs/inode.rs           | 780 ++++++++++++++++++++++++++++++
 rust/kernel/fs/iomap.rs           | 281 +++++++++++
 rust/kernel/fs/sb.rs              | 194 ++++++++
 rust/kernel/lib.rs                |   6 +-
 rust/kernel/mem_cache.rs          |   2 -
 rust/kernel/user.rs               |   1 -
 samples/rust/Kconfig              |  10 +
 samples/rust/Makefile             |   1 +
 samples/rust/rust_rofs.rs         | 202 ++++++++
 scripts/generate_rust_analyzer.py |   2 +-
 30 files changed, 4555 insertions(+), 262 deletions(-)
 create mode 100644 fs/rust-ext2/Kconfig
 create mode 100644 fs/rust-ext2/Makefile
 create mode 100644 fs/rust-ext2/defs.rs
 create mode 100644 fs/rust-ext2/ext2.rs
 create mode 100644 fs/tarfs/Kconfig
 create mode 100644 fs/tarfs/Makefile
 create mode 100644 fs/tarfs/defs.rs
 create mode 100644 fs/tarfs/tar.rs
 delete mode 100644 rust/kernel/file.rs
 create mode 100644 rust/kernel/folio.rs
 create mode 100644 rust/kernel/fs.rs
 create mode 100644 rust/kernel/fs/address_space.rs
 create mode 100644 rust/kernel/fs/dentry.rs
 create mode 100644 rust/kernel/fs/file.rs
 create mode 100644 rust/kernel/fs/inode.rs
 create mode 100644 rust/kernel/fs/iomap.rs
 create mode 100644 rust/kernel/fs/sb.rs
 create mode 100644 samples/rust/rust_rofs.rs


base-commit: 183ea65d1fcd71039cf4d111a22d69c337bfd344
-- 
2.34.1



* [RFC PATCH v2 01/30] rust: fs: add registration/unregistration of file systems
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 02/30] rust: fs: introduce the `module_fs` macro Wedson Almeida Filho
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow basic registration and unregistration of Rust file system types.
Unregistration happens automatically when a registration variable is
dropped (e.g., when it goes out of scope).

File systems registered this way are visible in `/proc/filesystems` but
cannot be mounted yet because `init_fs_context` fails.
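
The drop-based unregistration described above can be sketched in userspace as
follows. This is only an illustration of the RAII pattern: `Registry` is a
hypothetical stand-in for the kernel's list of file system types, and nothing
here calls `register_filesystem` or `unregister_filesystem`.

```rust
use std::cell::RefCell;
use std::collections::BTreeSet;
use std::rc::Rc;

/// Hypothetical stand-in for the kernel's table of registered file systems.
#[derive(Clone, Default)]
pub struct Registry(Rc<RefCell<BTreeSet<&'static str>>>);

impl Registry {
    pub fn is_registered(&self, name: &str) -> bool {
        self.0.borrow().contains(name)
    }
}

/// Holding a `Registration` means the name is registered.
pub struct Registration {
    registry: Registry,
    name: &'static str,
}

impl Registration {
    /// Registers `name`; fails if it is already present.
    pub fn new(registry: &Registry, name: &'static str) -> Result<Self, String> {
        if !registry.0.borrow_mut().insert(name) {
            return Err(format!("{} is already registered", name));
        }
        Ok(Self { registry: registry.clone(), name })
    }
}

impl Drop for Registration {
    /// Unregistration runs automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.registry.0.borrow_mut().remove(self.name);
    }
}
```

A failed `new` never constructs a `Registration`, so `drop` cannot remove an
entry that belongs to someone else.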

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/error.rs |  2 --
 rust/kernel/fs.rs    | 75 ++++++++++++++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs   |  1 +
 3 files changed, 76 insertions(+), 2 deletions(-)
 create mode 100644 rust/kernel/fs.rs

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index b248a4c22fb4..f4fa2847e210 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -308,8 +308,6 @@ pub(crate) fn from_err_ptr<T>(ptr: *mut T) -> Result<*mut T> {
 ///     })
 /// }
 /// ```
-// TODO: Remove `dead_code` marker once an in-kernel client is available.
-#[allow(dead_code)]
 pub(crate) fn from_result<T, F>(f: F) -> T
 where
     T: From<i16>,
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
new file mode 100644
index 000000000000..cc1ed7ed2f54
--- /dev/null
+++ b/rust/kernel/fs.rs
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Kernel file systems.
+//!
+//! This module allows Rust code to register new kernel file systems.
+//!
+//! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
+
+use crate::error::{code::*, from_result, to_result, Error};
+use crate::types::Opaque;
+use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
+use core::{ffi, pin::Pin};
+use macros::{pin_data, pinned_drop};
+
+/// A file system type.
+pub trait FileSystem {
+    /// The name of the file system type.
+    const NAME: &'static CStr;
+}
+
+/// A registration of a file system.
+#[pin_data(PinnedDrop)]
+pub struct Registration {
+    #[pin]
+    fs: Opaque<bindings::file_system_type>,
+}
+
+// SAFETY: `Registration` doesn't provide any `&self` methods, so it is safe to pass references
+// to it around.
+unsafe impl Sync for Registration {}
+
+// SAFETY: Both registration and unregistration are implemented in C and safe to be performed
+// from any thread, so `Registration` is `Send`.
+unsafe impl Send for Registration {}
+
+impl Registration {
+    /// Creates the initialiser of a new file system registration.
+    pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<Self, Error> {
+        try_pin_init!(Self {
+            fs <- Opaque::try_ffi_init(|fs_ptr: *mut bindings::file_system_type| {
+                // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write.
+                unsafe { fs_ptr.write(bindings::file_system_type::default()) };
+
+                // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write, and it has
+                // just been initialised above, so it's also valid for read.
+                let fs = unsafe { &mut *fs_ptr };
+                fs.owner = module.0;
+                fs.name = T::NAME.as_char_ptr();
+                fs.init_fs_context = Some(Self::init_fs_context_callback);
+                fs.kill_sb = Some(Self::kill_sb_callback);
+                fs.fs_flags = 0;
+
+                // SAFETY: Pointers stored in `fs` are static so will live for as long as the
+                // registration is active (it is undone in `drop`).
+                to_result(unsafe { bindings::register_filesystem(fs_ptr) })
+            }),
+        })
+    }
+
+    unsafe extern "C" fn init_fs_context_callback(_fc: *mut bindings::fs_context) -> ffi::c_int {
+        from_result(|| Err(ENOTSUPP))
+    }
+
+    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
+}
+
+#[pinned_drop]
+impl PinnedDrop for Registration {
+    fn drop(self: Pin<&mut Self>) {
+        // SAFETY: If an instance of `Self` has been successfully created, a call to
+        // `register_filesystem` has necessarily succeeded. So it's ok to call
+        // `unregister_filesystem` on the previously registered fs.
+        unsafe { bindings::unregister_filesystem(self.fs.get()) };
+    }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 4b629aa94735..e664f80b8141 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -31,6 +31,7 @@
 mod build_assert;
 pub mod error;
 pub mod file;
+pub mod fs;
 pub mod init;
 pub mod ioctl;
 #[cfg(CONFIG_KUNIT)]
-- 
2.34.1



* [RFC PATCH v2 02/30] rust: fs: introduce the `module_fs` macro
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 01/30] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 03/30] samples: rust: add initial ro file system sample Wedson Almeida Filho
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Simplify the declaration of modules that only expose a file system type.
They can now do it using the `module_fs` macro.
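
The forwarding technique can be tried outside the kernel with the sketch below.
It is a simplified illustration, not the macro from this patch: the `module!`
macro here is a hypothetical declarative stand-in for the kernel's `module!`,
and `Module`, `DeclaredModule`, `MODULE_NAME`, and `MODULE_LICENSE` are
invented names. The wrapper peels off the `type:` argument, wraps it in
`Module<...>`, and passes the remaining tokens through untouched.

```rust
use std::marker::PhantomData;

pub trait FileSystem {
    const NAME: &'static str;
}

/// Generic module type that exposes exactly one file system.
pub struct Module<T: FileSystem>(PhantomData<T>);

impl<T: FileSystem> Module<T> {
    pub fn fs_name() -> &'static str {
        T::NAME
    }
}

// Hypothetical stand-in for the kernel's `module!` macro.
macro_rules! module {
    (type: $type:ty, name: $name:expr, license: $license:expr $(,)?) => {
        pub type DeclaredModule = $type;
        pub const MODULE_NAME: &str = $name;
        pub const MODULE_LICENSE: &str = $license;
    };
}

// Wrapper: rewrites `type:` and forwards every other token as-is.
macro_rules! module_fs {
    (type: $type:ty, $($f:tt)*) => {
        module! {
            type: Module<$type>,
            $($f)*
        }
    };
}

pub struct MyFs;
impl FileSystem for MyFs {
    const NAME: &'static str = "myfs";
}

module_fs! {
    type: MyFs,
    name: "myfs",
    license: "GPL",
}
```

Because the trailing arguments are captured as raw token trees, the wrapper
does not need to know which metadata keys the inner macro accepts.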

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs.rs | 56 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index cc1ed7ed2f54..fb7a9b200b85 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -9,7 +9,7 @@
 use crate::error::{code::*, from_result, to_result, Error};
 use crate::types::Opaque;
 use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
-use core::{ffi, pin::Pin};
+use core::{ffi, marker::PhantomData, pin::Pin};
 use macros::{pin_data, pinned_drop};
 
 /// A file system type.
@@ -73,3 +73,57 @@ fn drop(self: Pin<&mut Self>) {
         unsafe { bindings::unregister_filesystem(self.fs.get()) };
     }
 }
+
+/// Kernel module that exposes a single file system implemented by `T`.
+#[pin_data]
+pub struct Module<T: FileSystem + ?Sized> {
+    #[pin]
+    fs_reg: Registration,
+    _p: PhantomData<T>,
+}
+
+impl<T: FileSystem + ?Sized + Sync + Send> crate::InPlaceModule for Module<T> {
+    fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
+        try_pin_init!(Self {
+            fs_reg <- Registration::new::<T>(module),
+            _p: PhantomData,
+        })
+    }
+}
+
+/// Declares a kernel module that exposes a single file system.
+///
+/// The `type` argument must be a type which implements the [`FileSystem`] trait. Also accepts
+/// various forms of kernel metadata.
+///
+/// # Examples
+///
+/// ```
+/// # mod module_fs_sample {
+/// use kernel::fs;
+/// use kernel::prelude::*;
+///
+/// kernel::module_fs! {
+///     type: MyFs,
+///     name: "myfs",
+///     author: "Rust for Linux Contributors",
+///     description: "My Rust fs",
+///     license: "GPL",
+/// }
+///
+/// struct MyFs;
+/// impl fs::FileSystem for MyFs {
+///     const NAME: &'static CStr = kernel::c_str!("myfs");
+/// }
+/// # }
+/// ```
+#[macro_export]
+macro_rules! module_fs {
+    (type: $type:ty, $($f:tt)*) => {
+        type ModuleType = $crate::fs::Module<$type>;
+        $crate::macros::module! {
+            type: ModuleType,
+            $($f)*
+        }
+    }
+}
-- 
2.34.1



* [RFC PATCH v2 03/30] samples: rust: add initial ro file system sample
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 01/30] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 02/30] rust: fs: introduce the `module_fs` macro Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 04/30] rust: fs: introduce `FileSystem::fill_super` Wedson Almeida Filho
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Introduce a basic sample that for now only registers the file system and
doesn't really provide any functionality beyond having it listed in
`/proc/filesystems`. New functionality will be added to the sample in
subsequent patches as their abstractions are introduced.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 samples/rust/Kconfig      | 10 ++++++++++
 samples/rust/Makefile     |  1 +
 samples/rust/rust_rofs.rs | 19 +++++++++++++++++++
 3 files changed, 30 insertions(+)
 create mode 100644 samples/rust/rust_rofs.rs

diff --git a/samples/rust/Kconfig b/samples/rust/Kconfig
index 59f44a8b6958..2f26c5c52813 100644
--- a/samples/rust/Kconfig
+++ b/samples/rust/Kconfig
@@ -41,6 +41,16 @@ config SAMPLE_RUST_PRINT
 
 	  If unsure, say N.
 
+config SAMPLE_RUST_ROFS
+	tristate "Read-only file system"
+	help
+	  This option builds the Rust read-only file system sample.
+
+	  To compile this as a module, choose M here:
+	  the module will be called rust_rofs.
+
+	  If unsure, say N.
+
 config SAMPLE_RUST_HOSTPROGS
 	bool "Host programs"
 	help
diff --git a/samples/rust/Makefile b/samples/rust/Makefile
index 791fc18180e9..df1e4341ae95 100644
--- a/samples/rust/Makefile
+++ b/samples/rust/Makefile
@@ -3,5 +3,6 @@
 obj-$(CONFIG_SAMPLE_RUST_MINIMAL)		+= rust_minimal.o
 obj-$(CONFIG_SAMPLE_RUST_INPLACE)		+= rust_inplace.o
 obj-$(CONFIG_SAMPLE_RUST_PRINT)			+= rust_print.o
+obj-$(CONFIG_SAMPLE_RUST_ROFS)			+= rust_rofs.o
 
 subdir-$(CONFIG_SAMPLE_RUST_HOSTPROGS)		+= hostprogs
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
new file mode 100644
index 000000000000..d465b107a07d
--- /dev/null
+++ b/samples/rust/rust_rofs.rs
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Rust read-only file system sample.
+
+use kernel::prelude::*;
+use kernel::{c_str, fs};
+
+kernel::module_fs! {
+    type: RoFs,
+    name: "rust_rofs",
+    author: "Rust for Linux Contributors",
+    description: "Rust read-only file system sample",
+    license: "GPL",
+}
+
+struct RoFs;
+impl fs::FileSystem for RoFs {
+    const NAME: &'static CStr = c_str!("rust_rofs");
+}
-- 
2.34.1



* [RFC PATCH v2 04/30] rust: fs: introduce `FileSystem::fill_super`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (2 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 03/30] samples: rust: add initial ro file system sample Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-20 19:38   ` Darrick J. Wong
  2024-05-14 13:16 ` [RFC PATCH v2 05/30] rust: fs: introduce `INode<T>` Wedson Almeida Filho
                   ` (26 subsequent siblings)
  30 siblings, 1 reply; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to initialise superblocks, which allows them
to be mounted (though they are still empty).

Some scaffolding code is added to create an empty directory as the root.
It is replaced by proper inode creation in a subsequent patch in this
series.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/bindings/bindings_helper.h |   5 ++
 rust/kernel/fs.rs               | 147 ++++++++++++++++++++++++++++++--
 rust/kernel/fs/sb.rs            |  50 +++++++++++
 samples/rust/rust_rofs.rs       |   6 ++
 4 files changed, 202 insertions(+), 6 deletions(-)
 create mode 100644 rust/kernel/fs/sb.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 1bef4dff3019..dabb5a787e0d 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -12,6 +12,7 @@
 #include <linux/ethtool.h>
 #include <linux/file.h>
 #include <linux/fs.h>
+#include <linux/fs_context.h>
 #include <linux/jiffies.h>
 #include <linux/mdio.h>
 #include <linux/phy.h>
@@ -32,3 +33,7 @@ const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO;
 
 const slab_flags_t RUST_CONST_HELPER_SLAB_RECLAIM_ACCOUNT = SLAB_RECLAIM_ACCOUNT;
 const slab_flags_t RUST_CONST_HELPER_SLAB_ACCOUNT = SLAB_ACCOUNT;
+
+const unsigned long RUST_CONST_HELPER_SB_RDONLY = SB_RDONLY;
+
+const loff_t RUST_CONST_HELPER_MAX_LFS_FILESIZE = MAX_LFS_FILESIZE;
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index fb7a9b200b85..263b4b6186ae 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -6,16 +6,30 @@
 //!
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
-use crate::error::{code::*, from_result, to_result, Error};
+use crate::error::{code::*, from_result, to_result, Error, Result};
 use crate::types::Opaque;
 use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
 use core::{ffi, marker::PhantomData, pin::Pin};
 use macros::{pin_data, pinned_drop};
+use sb::SuperBlock;
+
+pub mod sb;
+
+/// The offset of a file in a file system.
+///
+/// This is C's `loff_t`.
+pub type Offset = i64;
+
+/// Maximum size of an inode.
+pub const MAX_LFS_FILESIZE: Offset = bindings::MAX_LFS_FILESIZE;
 
 /// A file system type.
 pub trait FileSystem {
     /// The name of the file system type.
     const NAME: &'static CStr;
+
+    /// Initialises the new superblock.
+    fn fill_super(sb: &mut SuperBlock<Self>) -> Result;
 }
 
 /// A registration of a file system.
@@ -46,7 +60,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
                 let fs = unsafe { &mut *fs_ptr };
                 fs.owner = module.0;
                 fs.name = T::NAME.as_char_ptr();
-                fs.init_fs_context = Some(Self::init_fs_context_callback);
+                fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
                 fs.kill_sb = Some(Self::kill_sb_callback);
                 fs.fs_flags = 0;
 
@@ -57,11 +71,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
         })
     }
 
-    unsafe extern "C" fn init_fs_context_callback(_fc: *mut bindings::fs_context) -> ffi::c_int {
-        from_result(|| Err(ENOTSUPP))
+    unsafe extern "C" fn init_fs_context_callback<T: FileSystem + ?Sized>(
+        fc_ptr: *mut bindings::fs_context,
+    ) -> ffi::c_int {
+        from_result(|| {
+            // SAFETY: The C callback API guarantees that `fc_ptr` is valid.
+            let fc = unsafe { &mut *fc_ptr };
+            fc.ops = &Tables::<T>::CONTEXT;
+            Ok(0)
+        })
     }
 
-    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
+    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
+        // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
+        // the appropriate function to call for cleanup.
+        unsafe { bindings::kill_anon_super(sb_ptr) };
+    }
 }
 
 #[pinned_drop]
@@ -74,6 +99,113 @@ fn drop(self: Pin<&mut Self>) {
     }
 }
 
+struct Tables<T: FileSystem + ?Sized>(T);
+impl<T: FileSystem + ?Sized> Tables<T> {
+    const CONTEXT: bindings::fs_context_operations = bindings::fs_context_operations {
+        free: None,
+        parse_param: None,
+        get_tree: Some(Self::get_tree_callback),
+        reconfigure: None,
+        parse_monolithic: None,
+        dup: None,
+    };
+
+    unsafe extern "C" fn get_tree_callback(fc: *mut bindings::fs_context) -> ffi::c_int {
+        // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
+        // the right type and is a valid callback.
+        unsafe { bindings::get_tree_nodev(fc, Some(Self::fill_super_callback)) }
+    }
+
+    unsafe extern "C" fn fill_super_callback(
+        sb_ptr: *mut bindings::super_block,
+        _fc: *mut bindings::fs_context,
+    ) -> ffi::c_int {
+        from_result(|| {
+            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
+            // newly-created superblock.
+            let new_sb = unsafe { SuperBlock::from_raw_mut(sb_ptr) };
+
+            // SAFETY: The callback contract guarantees that `sb_ptr`, from which `new_sb` is
+            // derived, is valid for write.
+            let sb = unsafe { &mut *new_sb.0.get() };
+            sb.s_op = &Tables::<T>::SUPER_BLOCK;
+            sb.s_flags |= bindings::SB_RDONLY;
+
+            T::fill_super(new_sb)?;
+
+            // The following is scaffolding code that will be removed in a subsequent patch. It is
+            // needed to build a root dentry, otherwise core code will BUG().
+            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
+            let inode = unsafe { bindings::new_inode(sb) };
+            if inode.is_null() {
+                return Err(ENOMEM);
+            }
+
+            // SAFETY: `inode` is valid for write.
+            unsafe { bindings::set_nlink(inode, 2) };
+
+            {
+                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
+                // safe to mutably dereference it.
+                let inode = unsafe { &mut *inode };
+                inode.i_ino = 1;
+                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
+
+                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
+                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
+
+                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
+                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
+            }
+
+            // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
+            // case for this call.
+            //
+            // It takes over the inode, even on failure, so we don't need to clean it up.
+            let dentry = unsafe { bindings::d_make_root(inode) };
+            if dentry.is_null() {
+                return Err(ENOMEM);
+            }
+
+            sb.s_root = dentry;
+
+            Ok(0)
+        })
+    }
+
+    const SUPER_BLOCK: bindings::super_operations = bindings::super_operations {
+        alloc_inode: None,
+        destroy_inode: None,
+        free_inode: None,
+        dirty_inode: None,
+        write_inode: None,
+        drop_inode: None,
+        evict_inode: None,
+        put_super: None,
+        sync_fs: None,
+        freeze_super: None,
+        freeze_fs: None,
+        thaw_super: None,
+        unfreeze_fs: None,
+        statfs: None,
+        remount_fs: None,
+        umount_begin: None,
+        show_options: None,
+        show_devname: None,
+        show_path: None,
+        show_stats: None,
+        #[cfg(CONFIG_QUOTA)]
+        quota_read: None,
+        #[cfg(CONFIG_QUOTA)]
+        quota_write: None,
+        #[cfg(CONFIG_QUOTA)]
+        get_dquots: None,
+        nr_cached_objects: None,
+        free_cached_objects: None,
+        shutdown: None,
+    };
+}
+
 /// Kernel module that exposes a single file system implemented by `T`.
 #[pin_data]
 pub struct Module<T: FileSystem + ?Sized> {
@@ -100,7 +232,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// ```
 /// # mod module_fs_sample {
-/// use kernel::fs;
+/// use kernel::fs::{sb::SuperBlock, self};
 /// use kernel::prelude::*;
 ///
 /// kernel::module_fs! {
@@ -114,6 +246,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 /// struct MyFs;
 /// impl fs::FileSystem for MyFs {
 ///     const NAME: &'static CStr = kernel::c_str!("myfs");
+///     fn fill_super(_: &mut SuperBlock<Self>) -> Result {
+///         todo!()
+///     }
 /// }
 /// # }
 /// ```
diff --git a/rust/kernel/fs/sb.rs b/rust/kernel/fs/sb.rs
new file mode 100644
index 000000000000..113d3c0d8148
--- /dev/null
+++ b/rust/kernel/fs/sb.rs
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! File system super blocks.
+//!
+//! This module allows Rust code to use superblocks.
+//!
+//! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
+
+use super::FileSystem;
+use crate::{bindings, types::Opaque};
+use core::marker::PhantomData;
+
+/// A file system super block.
+///
+/// Wraps the kernel's `struct super_block`.
+#[repr(transparent)]
+pub struct SuperBlock<T: FileSystem + ?Sized>(
+    pub(crate) Opaque<bindings::super_block>,
+    PhantomData<T>,
+);
+
+impl<T: FileSystem + ?Sized> SuperBlock<T> {
+    /// Creates a new superblock mutable reference from the given raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that:
+    ///
+    /// * `ptr` is valid and remains so for the lifetime of the returned object.
+    /// * `ptr` has the correct file system type.
+    /// * `ptr` is the only active pointer to the superblock.
+    pub(crate) unsafe fn from_raw_mut<'a>(ptr: *mut bindings::super_block) -> &'a mut Self {
+        // SAFETY: The safety requirements guarantee that the cast below is ok.
+        unsafe { &mut *ptr.cast::<Self>() }
+    }
+
+    /// Returns whether the superblock is mounted in read-only mode.
+    pub fn rdonly(&self) -> bool {
+        // SAFETY: `s_flags` only changes during init, so it is safe to read it.
+        unsafe { (*self.0.get()).s_flags & bindings::SB_RDONLY != 0 }
+    }
+
+    /// Sets the magic number of the superblock.
+    pub fn set_magic(&mut self, magic: usize) -> &mut Self {
+        // SAFETY: This is a new superblock that is being initialised, so it's ok to write to its
+        // fields.
+        unsafe { (*self.0.get()).s_magic = magic as core::ffi::c_ulong };
+        self
+    }
+}
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index d465b107a07d..022addf68891 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -2,6 +2,7 @@
 
 //! Rust read-only file system sample.
 
+use kernel::fs::sb;
 use kernel::prelude::*;
 use kernel::{c_str, fs};
 
@@ -16,4 +17,9 @@
 struct RoFs;
 impl fs::FileSystem for RoFs {
     const NAME: &'static CStr = c_str!("rust_rofs");
+
+    fn fill_super(sb: &mut sb::SuperBlock<Self>) -> Result {
+        sb.set_magic(0x52555354);
+        Ok(())
+    }
 }
-- 
2.34.1



* [RFC PATCH v2 05/30] rust: fs: introduce `INode<T>`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (3 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 04/30] rust: fs: introduce `FileSystem::fill_super` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 06/30] rust: fs: introduce `DEntry<T>` Wedson Almeida Filho
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to handle typed and ref-counted inodes.

This is in preparation for creating new inodes (for example, to create
the root inode of a new superblock), which comes in the next patch in
the series.
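
As a rough userspace illustration of what "typed and ref-counted" means here,
the sketch below tags an inode with its file system type and adjusts a counter
whenever a handle is created or dropped. All names (`RawInode`, `INodeRef`,
`Ext2`, `TarFs`, `ext2_only`) are assumptions for this example; the mechanism
the series actually uses is not reproduced here.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::ops::Deref;

/// Stand-in for the C inode: an inode number plus a reference count.
pub struct RawInode {
    pub ino: u64,
    pub refcount: Cell<u32>,
}

/// An inode tagged with the file system type `T` it belongs to.
pub struct INode<T> {
    raw: RawInode,
    _fs: PhantomData<T>,
}

impl<T> INode<T> {
    pub fn new(ino: u64) -> Self {
        INode { raw: RawInode { ino, refcount: Cell::new(1) }, _fs: PhantomData }
    }

    pub fn ino(&self) -> u64 {
        self.raw.ino
    }

    pub fn refcount(&self) -> u32 {
        self.raw.refcount.get()
    }

    /// Takes an extra reference, released when the handle is dropped.
    pub fn get(&self) -> INodeRef<'_, T> {
        self.raw.refcount.set(self.raw.refcount.get() + 1);
        INodeRef(self)
    }
}

/// A counted handle to an inode.
pub struct INodeRef<'a, T>(&'a INode<T>);

impl<'a, T> Clone for INodeRef<'a, T> {
    fn clone(&self) -> Self {
        let inode: &'a INode<T> = self.0;
        inode.get()
    }
}

impl<T> Drop for INodeRef<'_, T> {
    fn drop(&mut self) {
        let count = &self.0.raw.refcount;
        count.set(count.get() - 1);
    }
}

impl<T> Deref for INodeRef<'_, T> {
    type Target = INode<T>;
    fn deref(&self) -> &INode<T> {
        self.0
    }
}

pub struct Ext2;
pub struct TarFs;

/// Accepts only inodes that belong to `Ext2`; passing an `INode<TarFs>` is a
/// compile-time error.
pub fn ext2_only(inode: &INode<Ext2>) -> u64 {
    inode.ino()
}
```

The type parameter prevents one file system from being handed another's
inodes, and the handle type keeps the count balanced without manual calls.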

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/helpers.c          |  7 ++++
 rust/kernel/block.rs    |  9 +++++
 rust/kernel/fs.rs       | 20 +++++++++++
 rust/kernel/fs/inode.rs | 78 +++++++++++++++++++++++++++++++++++++++++
 rust/kernel/fs/sb.rs    | 15 +++++++-
 5 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 rust/kernel/fs/inode.rs

diff --git a/rust/helpers.c b/rust/helpers.c
index 318e3e85dddd..c697c1c4c9d7 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -164,6 +164,13 @@ struct file *rust_helper_get_file(struct file *f)
 }
 EXPORT_SYMBOL_GPL(rust_helper_get_file);
 
+
+loff_t rust_helper_i_size_read(const struct inode *inode)
+{
+	return i_size_read(inode);
+}
+EXPORT_SYMBOL_GPL(rust_helper_i_size_read);
+
 unsigned long rust_helper_copy_to_user(void __user *to, const void *from,
 				       unsigned long n)
 {
diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
index 38d2a3089ae7..868623d7c873 100644
--- a/rust/kernel/block.rs
+++ b/rust/kernel/block.rs
@@ -5,6 +5,7 @@
 //! C headers: [`include/linux/blk_types.h`](../../include/linux/blk_types.h)
 
 use crate::bindings;
+use crate::fs::inode::INode;
 use crate::types::Opaque;
 
 /// The type used for indexing onto a disc or disc partition.
@@ -35,4 +36,12 @@ pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::block_device) -> &'a Self
         // SAFETY: The safety requirements guarantee that the cast below is ok.
         unsafe { &*ptr.cast::<Self>() }
     }
+
+    /// Returns the inode associated with this block device.
+    pub fn inode(&self) -> &INode {
+        // SAFETY: `bd_inode` is never reassigned.
+        let ptr = unsafe { (*self.0.get()).bd_inode };
+        // SAFETY: `ptr` is valid as long as the block device remains valid as well.
+        unsafe { INode::from_raw(ptr) }
+    }
 }
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 263b4b6186ae..89dcd5537830 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -13,6 +13,7 @@
 use macros::{pin_data, pinned_drop};
 use sb::SuperBlock;
 
+pub mod inode;
 pub mod sb;
 
 /// The offset of a file in a file system.
@@ -28,10 +29,29 @@ pub trait FileSystem {
     /// The name of the file system type.
     const NAME: &'static CStr;
 
+    /// Determines if an implementation doesn't specify the required types.
+    ///
+    /// This is meant for internal use only.
+    #[doc(hidden)]
+    const IS_UNSPECIFIED: bool = false;
+
     /// Initialises the new superblock.
     fn fill_super(sb: &mut SuperBlock<Self>) -> Result;
 }
 
+/// A file system that is unspecified.
+///
+/// Attempting to get super-block or inode data from it will result in a build error.
+pub struct UnspecifiedFS;
+
+impl FileSystem for UnspecifiedFS {
+    const NAME: &'static CStr = crate::c_str!("unspecified");
+    const IS_UNSPECIFIED: bool = true;
+    fn fill_super(_: &mut SuperBlock<Self>) -> Result {
+        Err(ENOTSUPP)
+    }
+}
+
 /// A registration of a file system.
 #[pin_data(PinnedDrop)]
 pub struct Registration {
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
new file mode 100644
index 000000000000..bcb9c8ce59a9
--- /dev/null
+++ b/rust/kernel/fs/inode.rs
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! File system inodes.
+//!
+//! This module allows Rust code to implement inodes.
+//!
+//! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
+
+use super::{sb::SuperBlock, FileSystem, Offset, UnspecifiedFS};
+use crate::bindings;
+use crate::types::{AlwaysRefCounted, Opaque};
+use core::{marker::PhantomData, ptr};
+
+/// The number of an inode.
+pub type Ino = u64;
+
+/// A node (inode) in the file index.
+///
+/// Wraps the kernel's `struct inode`.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `ihold` ensures that the
+/// allocation remains valid at least until the matching call to `iput`.
+#[repr(transparent)]
+pub struct INode<T: FileSystem + ?Sized = UnspecifiedFS>(
+    pub(crate) Opaque<bindings::inode>,
+    PhantomData<T>,
+);
+
+impl<T: FileSystem + ?Sized> INode<T> {
+    /// Creates a new inode reference from the given raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that:
+    ///
+    /// * `ptr` is valid and remains so for the lifetime of the returned object.
+    /// * `ptr` has the correct file system type, or `T` is [`super::UnspecifiedFS`].
+    #[allow(dead_code)]
+    pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::inode) -> &'a Self {
+        // SAFETY: The safety requirements guarantee that the cast below is ok.
+        unsafe { &*ptr.cast::<Self>() }
+    }
+
+    /// Returns the number of the inode.
+    pub fn ino(&self) -> Ino {
+        // SAFETY: `i_ino` is immutable, and `self` is guaranteed to be valid by the existence of a
+        // shared reference (&self) to it.
+        unsafe { (*self.0.get()).i_ino }
+    }
+
+    /// Returns the super-block that owns the inode.
+    pub fn super_block(&self) -> &SuperBlock<T> {
+        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
+        // shared reference (&self) to it.
+        unsafe { SuperBlock::from_raw((*self.0.get()).i_sb) }
+    }
+
+    /// Returns the size of the inode contents.
+    pub fn size(&self) -> Offset {
+        // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference.
+        unsafe { bindings::i_size_read(self.0.get()) }
+    }
+}
+
+// SAFETY: The type invariants guarantee that `INode` is always ref-counted.
+unsafe impl<T: FileSystem + ?Sized> AlwaysRefCounted for INode<T> {
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::ihold(self.0.get()) };
+    }
+
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::iput(obj.as_ref().0.get()) }
+    }
+}
diff --git a/rust/kernel/fs/sb.rs b/rust/kernel/fs/sb.rs
index 113d3c0d8148..f48e0e2695fa 100644
--- a/rust/kernel/fs/sb.rs
+++ b/rust/kernel/fs/sb.rs
@@ -20,6 +20,19 @@ pub struct SuperBlock<T: FileSystem + ?Sized>(
 );
 
 impl<T: FileSystem + ?Sized> SuperBlock<T> {
+    /// Creates a new superblock reference from the given raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that:
+    ///
+    /// * `ptr` is valid and remains so for the lifetime of the returned object.
+    /// * `ptr` has the correct file system type, or `T` is [`super::UnspecifiedFS`].
+    pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::super_block) -> &'a Self {
+        // SAFETY: The safety requirements guarantee that the cast below is ok.
+        unsafe { &*ptr.cast::<Self>() }
+    }
+
     /// Creates a new superblock mutable reference from the given raw pointer.
     ///
     /// # Safety
@@ -27,7 +40,7 @@ impl<T: FileSystem + ?Sized> SuperBlock<T> {
     /// Callers must ensure that:
     ///
     /// * `ptr` is valid and remains so for the lifetime of the returned object.
-    /// * `ptr` has the correct file system type.
+    /// * `ptr` has the correct file system type, or `T` is [`super::UnspecifiedFS`].
     /// * `ptr` is the only active pointer to the superblock.
     pub(crate) unsafe fn from_raw_mut<'a>(ptr: *mut bindings::super_block) -> &'a mut Self {
         // SAFETY: The safety requirements guarantee that the cast below is ok.
-- 
2.34.1



* [RFC PATCH v2 06/30] rust: fs: introduce `DEntry<T>`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (4 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 05/30] rust: fs: introduce `INode<T>` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 07/30] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/helpers.c           |   6 ++
 rust/kernel/error.rs     |   2 -
 rust/kernel/fs.rs        |   1 +
 rust/kernel/fs/dentry.rs | 137 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 144 insertions(+), 2 deletions(-)
 create mode 100644 rust/kernel/fs/dentry.rs

diff --git a/rust/helpers.c b/rust/helpers.c
index c697c1c4c9d7..c7fe6917251e 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -165,6 +165,12 @@ struct file *rust_helper_get_file(struct file *f)
 EXPORT_SYMBOL_GPL(rust_helper_get_file);
 
 
+struct dentry *rust_helper_dget(struct dentry *dentry)
+{
+	return dget(dentry);
+}
+EXPORT_SYMBOL_GPL(rust_helper_dget);
+
 loff_t rust_helper_i_size_read(const struct inode *inode)
 {
 	return i_size_read(inode);
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index f4fa2847e210..bb13bd4a7fa6 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -261,8 +261,6 @@ pub fn to_result(err: core::ffi::c_int) -> Result {
 ///     from_err_ptr(unsafe { bindings::devm_platform_ioremap_resource(pdev.to_ptr(), index) })
 /// }
 /// ```
-// TODO: Remove `dead_code` marker once an in-kernel client is available.
-#[allow(dead_code)]
 pub(crate) fn from_err_ptr<T>(ptr: *mut T) -> Result<*mut T> {
     // CAST: Casting a pointer to `*const core::ffi::c_void` is always valid.
     let const_ptr: *const core::ffi::c_void = ptr.cast();
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 89dcd5537830..4f07da71e1ec 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -13,6 +13,7 @@
 use macros::{pin_data, pinned_drop};
 use sb::SuperBlock;
 
+pub mod dentry;
 pub mod inode;
 pub mod sb;
 
diff --git a/rust/kernel/fs/dentry.rs b/rust/kernel/fs/dentry.rs
new file mode 100644
index 000000000000..6a36a48cd28b
--- /dev/null
+++ b/rust/kernel/fs/dentry.rs
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! File system directory entries.
+//!
+//! This module allows Rust code to use dentries.
+//!
+//! C headers: [`include/linux/dcache.h`](srctree/include/linux/dcache.h)
+
+use super::{inode::INode, FileSystem, SuperBlock};
+use crate::bindings;
+use crate::error::{code::*, from_err_ptr, Result};
+use crate::types::{ARef, AlwaysRefCounted, Opaque};
+use core::{marker::PhantomData, mem::ManuallyDrop, ops::Deref, ptr};
+
+/// A directory entry.
+///
+/// Wraps the kernel's `struct dentry`.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `dget` ensures that the
+/// allocation remains valid at least until the matching call to `dput`.
+#[repr(transparent)]
+pub struct DEntry<T: FileSystem + ?Sized>(pub(crate) Opaque<bindings::dentry>, PhantomData<T>);
+
+// SAFETY: The type invariants guarantee that `DEntry` is always ref-counted.
+unsafe impl<T: FileSystem + ?Sized> AlwaysRefCounted for DEntry<T> {
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::dget(self.0.get()) };
+    }
+
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::dput(obj.as_ref().0.get()) }
+    }
+}
+
+impl<T: FileSystem + ?Sized> DEntry<T> {
+    /// Creates a new [`DEntry`] from a raw C pointer.
+    ///
+    /// # Safety
+    ///
+    /// * `ptr` must be valid for at least the lifetime of the returned reference.
+    /// * `ptr` has the correct file system type, or `T` is [`super::UnspecifiedFS`].
+    #[allow(dead_code)]
+    pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::dentry) -> &'a Self {
+        // SAFETY: The safety requirements guarantee that the reference is and remains valid.
+        unsafe { &*ptr.cast::<Self>() }
+    }
+
+    /// Returns the superblock of the dentry.
+    pub fn super_block(&self) -> &SuperBlock<T> {
+        // SAFETY: `d_sb` is immutable, so it's safe to read it.
+        unsafe { SuperBlock::from_raw((*self.0.get()).d_sb) }
+    }
+}
+
+/// A dentry that is known to be unhashed.
+pub struct Unhashed<'a, T: FileSystem + ?Sized>(pub(crate) &'a DEntry<T>);
+
+impl<T: FileSystem + ?Sized> Unhashed<'_, T> {
+    /// Splices a disconnected dentry into the tree if one exists.
+    pub fn splice_alias(self, inode: Option<ARef<INode<T>>>) -> Result<Option<ARef<DEntry<T>>>> {
+        let inode_ptr = if let Some(i) = inode {
+            // Reject inode if it belongs to a different superblock.
+            if !ptr::eq(i.super_block(), self.0.super_block()) {
+                return Err(EINVAL);
+            }
+
+            ManuallyDrop::new(i).0.get()
+        } else {
+            ptr::null_mut()
+        };
+
+        // SAFETY: Both inode and dentry are known to be valid.
+        let ptr = from_err_ptr(unsafe { bindings::d_splice_alias(inode_ptr, self.0 .0.get()) })?;
+
+        // SAFETY: The C API guarantees that if a dentry is returned, the refcount has been
+        // incremented.
+        Ok(ptr::NonNull::new(ptr).map(|v| unsafe { ARef::from_raw(v.cast::<DEntry<T>>()) }))
+    }
+
+    /// Returns the name of the dentry.
+    ///
+    /// Being unhashed guarantees that the name won't change.
+    pub fn name(&self) -> &[u8] {
+        // SAFETY: The name is immutable, so it is ok to read it.
+        let name = unsafe { &*ptr::addr_of!((*self.0 .0.get()).d_name) };
+
+        // This ensures that a `u32` is representable in `usize`. If it isn't, we'll get a build
+        // break.
+        const _: usize = 0xffffffff;
+
+        // SAFETY: The union just allows an easy way to get the `hash` and `len` at once. `len`
+        // is always valid.
+        let len = unsafe { name.__bindgen_anon_1.__bindgen_anon_1.len } as usize;
+
+        // SAFETY: The name is immutable, so it is ok to read it.
+        unsafe { core::slice::from_raw_parts(name.name, len) }
+    }
+}
+
+impl<T: FileSystem + ?Sized> Deref for Unhashed<'_, T> {
+    type Target = DEntry<T>;
+
+    fn deref(&self) -> &Self::Target {
+        self.0
+    }
+}
+
+/// A dentry that is meant to be used as the root of a file system.
+pub struct Root<T: FileSystem + ?Sized>(ARef<DEntry<T>>);
+
+impl<T: FileSystem + ?Sized> Root<T> {
+    /// Creates a root dentry.
+    pub fn try_new(inode: ARef<INode<T>>) -> Result<Root<T>> {
+        // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
+        // case for this call.
+        //
+        // It takes over the inode, even on failure, so we don't need to clean it up.
+        let dentry_ptr = unsafe { bindings::d_make_root(ManuallyDrop::new(inode).0.get()) };
+        let dentry = ptr::NonNull::new(dentry_ptr).ok_or(ENOMEM)?;
+
+        // SAFETY: `dentry` is valid and referenced. Its reference ownership is transferred to
+        // `ARef`.
+        Ok(Root(unsafe { ARef::from_raw(dentry.cast::<DEntry<T>>()) }))
+    }
+}
+
+impl<T: FileSystem + ?Sized> Deref for Root<T> {
+    type Target = DEntry<T>;
+
+    fn deref(&self) -> &Self::Target {
+        &self.0
+    }
+}
-- 
2.34.1



* [RFC PATCH v2 07/30] rust: fs: introduce `FileSystem::init_root`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (5 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 06/30] rust: fs: introduce `DEntry<T>` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 08/30] rust: file: move `kernel::file` to `kernel::fs::file` Wedson Almeida Filho
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to specify their root directory. Also allow them
to create (and do cache lookups of) directory inodes. (More types of
inodes are added in subsequent patches in the series.)
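
The lookup-or-create flow can be pictured with the userspace sketch below.
It is only a model: `InodeCache`, `Inode`, `NewInode`, `Error` and `publish`
are hypothetical names, and a `HashMap` stands in for the inode cache. In
particular, the real `iget_locked` inserts the new inode into the cache
right away with `I_NEW` set, whereas this model only records it once it is
published.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Minimal stand-in for `kernel::types::Either`.
pub enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Hypothetical cached inode; only the number is modelled.
#[derive(Debug)]
pub struct Inode {
    pub ino: u64,
}

/// Hypothetical placeholder for an inode that still has to be initialised.
pub struct NewInode {
    pub ino: u64,
}

/// Hypothetical error type; `Exists` plays the role of `EEXIST`.
#[derive(Debug, PartialEq)]
pub enum Error {
    Exists,
}

/// Hypothetical inode cache keyed by inode number.
#[derive(Default)]
pub struct InodeCache {
    live: HashMap<u64, Rc<Inode>>,
}

impl InodeCache {
    /// Returns the cached inode if there is one, otherwise a new one to initialise.
    pub fn get_or_create(&self, ino: u64) -> Either<Rc<Inode>, NewInode> {
        match self.live.get(&ino) {
            Some(existing) => Either::Left(Rc::clone(existing)),
            None => Either::Right(NewInode { ino }),
        }
    }

    /// Like `get_or_create`, but a cache hit is treated as an error.
    pub fn create(&self, ino: u64) -> Result<NewInode, Error> {
        match self.get_or_create(ino) {
            Either::Right(new) => Ok(new),
            Either::Left(_) => Err(Error::Exists),
        }
    }

    /// Finishes a new inode and makes it visible to later lookups.
    pub fn publish(&mut self, new: NewInode) -> Rc<Inode> {
        let inode = Rc::new(Inode { ino: new.ino });
        self.live.insert(new.ino, Rc::clone(&inode));
        inode
    }
}
```

Returning `Either` forces callers to handle both outcomes explicitly: a
cached inode can be used as is, while a new one has to be initialised first.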

The `inode::New` type ensures that a new inode is properly initialised
before it is marked so. It also facilitates error paths by automatically
marking inodes as failed if they're not properly initialised.
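
The guard pattern behind `inode::New` can be sketched in userspace as
follows. This is a model under stated assumptions, not the kernel code:
`RawInode`, `State` and the mode check are hypothetical, `Rc` stands in for
the inode refcount, and setting `State::Failed` plays the role of
`iget_failed`. (The real `init` masks the mode instead of rejecting it; the
check here only exists to give the sketch an error path.)

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::ptr;
use std::rc::Rc;

/// Life-cycle states of the hypothetical inode model.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    New,
    Ready,
    Failed,
}

/// Hypothetical stand-in for a `struct inode`; not the kernel type.
pub struct RawInode {
    state: Cell<State>,
    mode: Cell<u16>,
}

impl RawInode {
    /// Creates an inode model in the `New` state.
    pub fn new() -> Self {
        Self { state: Cell::new(State::New), mode: Cell::new(0) }
    }

    /// Returns the current life-cycle state.
    pub fn state(&self) -> State {
        self.state.get()
    }

    /// Returns the access mode stored by `New::init`.
    pub fn mode(&self) -> u16 {
        self.mode.get()
    }
}

/// Guard for an inode that has not been initialised yet.
pub struct New(Rc<RawInode>);

impl New {
    /// Wraps a freshly created inode model.
    pub fn new(raw: Rc<RawInode>) -> Self {
        New(raw)
    }

    /// Initialises the inode, consuming the guard.
    pub fn init(self, mode: u16) -> Result<Rc<RawInode>, &'static str> {
        if mode & !0o777 != 0 {
            // Early return: `self` is dropped here, which marks the inode as failed.
            return Err("mode has bits outside 0o777");
        }
        self.0.mode.set(mode);
        self.0.state.set(State::Ready);

        // Success: suppress `Drop` so the inode is not marked as failed.
        let this = ManuallyDrop::new(self);
        // SAFETY: `this` is never dropped or used again, so moving the `Rc`
        // out of it does not duplicate ownership of the reference.
        Ok(unsafe { ptr::read(&this.0) })
    }
}

impl Drop for New {
    fn drop(&mut self) {
        // Stand-in for `iget_failed`: the guard went away without `init` succeeding.
        self.0.state.set(State::Failed);
    }
}
```

Because `init` takes `self` by value, every exit path either hands out an
initialised inode or runs `Drop`; there is no way to leave the inode in the
"new" state by accident.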

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/helpers.c            |  11 ++++
 rust/kernel/fs.rs         |  56 ++++++++-----------
 rust/kernel/fs/inode.rs   | 111 +++++++++++++++++++++++++++++++++++++-
 rust/kernel/fs/sb.rs      |  48 ++++++++++++++++-
 samples/rust/rust_rofs.rs |  25 +++++++--
 5 files changed, 211 insertions(+), 40 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index c7fe6917251e..87301e1ace65 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -164,6 +164,17 @@ struct file *rust_helper_get_file(struct file *f)
 }
 EXPORT_SYMBOL_GPL(rust_helper_get_file);
 
+void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
+{
+	i_uid_write(inode, uid);
+}
+EXPORT_SYMBOL_GPL(rust_helper_i_uid_write);
+
+void rust_helper_i_gid_write(struct inode *inode, gid_t gid)
+{
+	i_gid_write(inode, gid);
+}
+EXPORT_SYMBOL_GPL(rust_helper_i_gid_write);
 
 struct dentry *rust_helper_dget(struct dentry *dentry)
 {
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 4f07da71e1ec..f32c2f89f781 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -9,7 +9,7 @@
 use crate::error::{code::*, from_result, to_result, Error, Result};
 use crate::types::Opaque;
 use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
-use core::{ffi, marker::PhantomData, pin::Pin};
+use core::{ffi, marker::PhantomData, mem::ManuallyDrop, pin::Pin, ptr};
 use macros::{pin_data, pinned_drop};
 use sb::SuperBlock;
 
@@ -38,6 +38,12 @@ pub trait FileSystem {
 
     /// Initialises the new superblock.
     fn fill_super(sb: &mut SuperBlock<Self>) -> Result;
+
+    /// Initialises and returns the root inode of the given superblock.
+    ///
+    /// This is called during initialisation of a superblock after [`FileSystem::fill_super`] has
+    /// completed successfully.
+    fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>>;
 }
 
 /// A file system that is unspecified.
@@ -51,6 +57,10 @@ impl FileSystem for UnspecifiedFS {
     fn fill_super(_: &mut SuperBlock<Self>) -> Result {
         Err(ENOTSUPP)
     }
+
+    fn init_root(_: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
+        Err(ENOTSUPP)
+    }
 }
 
 /// A registration of a file system.
@@ -154,41 +164,18 @@ impl<T: FileSystem + ?Sized> Tables<T> {
 
             T::fill_super(new_sb)?;
 
-            // The following is scaffolding code that will be removed in a subsequent patch. It is
-            // needed to build a root dentry, otherwise core code will BUG().
-            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
-            let inode = unsafe { bindings::new_inode(sb) };
-            if inode.is_null() {
-                return Err(ENOMEM);
-            }
-
-            // SAFETY: `inode` is valid for write.
-            unsafe { bindings::set_nlink(inode, 2) };
+            let root = T::init_root(new_sb)?;
 
-            {
-                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
-                // safe to mutably dereference it.
-                let inode = unsafe { &mut *inode };
-                inode.i_ino = 1;
-                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
-
-                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
-                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
-
-                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
-                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
+            // Reject root inode if it belongs to a different superblock.
+            if !ptr::eq(root.super_block(), new_sb) {
+                return Err(EINVAL);
             }
 
-            // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
-            // case for this call.
-            //
-            // It takes over the inode, even on failure, so we don't need to clean it up.
-            let dentry = unsafe { bindings::d_make_root(inode) };
-            if dentry.is_null() {
-                return Err(ENOMEM);
-            }
+            let dentry = ManuallyDrop::new(root).0.get();
 
-            sb.s_root = dentry;
+            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
+            // newly-created (and initialised above) superblock.
+            unsafe { (*sb_ptr).s_root = dentry };
 
             Ok(0)
         })
@@ -253,7 +240,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// ```
 /// # mod module_fs_sample {
-/// use kernel::fs::{sb::SuperBlock, self};
+/// use kernel::fs::{dentry, inode::INode, sb::SuperBlock, self};
 /// use kernel::prelude::*;
 ///
 /// kernel::module_fs! {
@@ -270,6 +257,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///     fn fill_super(_: &mut SuperBlock<Self>) -> Result {
 ///         todo!()
 ///     }
+///     fn init_root(_sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
+///         todo!()
+///     }
 /// }
 /// # }
 /// ```
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index bcb9c8ce59a9..4ccbb4145918 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -7,8 +7,10 @@
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
 use super::{sb::SuperBlock, FileSystem, Offset, UnspecifiedFS};
-use crate::bindings;
-use crate::types::{AlwaysRefCounted, Opaque};
+use crate::error::Result;
+use crate::types::{ARef, AlwaysRefCounted, Opaque};
+use crate::{bindings, block, time::Timespec};
+use core::mem::ManuallyDrop;
 use core::{marker::PhantomData, ptr};
 
 /// The number of an inode.
@@ -76,3 +78,108 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
         unsafe { bindings::iput(obj.as_ref().0.get()) }
     }
 }
+
+/// An inode that is locked and hasn't been initialised yet.
+///
+/// # Invariants
+///
+/// The inode is a new one, locked, and valid for write.
+pub struct New<T: FileSystem + ?Sized>(
+    pub(crate) ptr::NonNull<bindings::inode>,
+    pub(crate) PhantomData<T>,
+);
+
+impl<T: FileSystem + ?Sized> New<T> {
+    /// Initialises the new inode with the given parameters.
+    pub fn init(mut self, params: Params) -> Result<ARef<INode<T>>> {
+        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
+        let inode = unsafe { self.0.as_mut() };
+        let mode = match params.typ {
+            Type::Dir => {
+                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
+                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
+
+                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
+                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
+
+                bindings::S_IFDIR
+            }
+        };
+
+        inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
+        inode.i_size = params.size;
+        inode.i_blocks = params.blocks;
+
+        inode.__i_ctime = params.ctime.into();
+        inode.__i_mtime = params.mtime.into();
+        inode.__i_atime = params.atime.into();
+
+        // SAFETY: inode is a new inode, so it is valid for write.
+        unsafe {
+            bindings::set_nlink(inode, params.nlink);
+            bindings::i_uid_write(inode, params.uid);
+            bindings::i_gid_write(inode, params.gid);
+            bindings::unlock_new_inode(inode);
+        }
+
+        let manual = ManuallyDrop::new(self);
+        // SAFETY: We transferred ownership of the refcount to `ARef` by preventing `drop` from
+        // being called with the `ManuallyDrop` instance created above.
+        Ok(unsafe { ARef::from_raw(manual.0.cast::<INode<T>>()) })
+    }
+}
+
+impl<T: FileSystem + ?Sized> Drop for New<T> {
+    fn drop(&mut self) {
+        // SAFETY: The new inode failed to be turned into an initialised inode, so it's safe (and
+        // in fact required) to call `iget_failed` on it.
+        unsafe { bindings::iget_failed(self.0.as_ptr()) };
+    }
+}
+
+/// The type of an inode.
+#[derive(Copy, Clone)]
+pub enum Type {
+    /// Directory type.
+    Dir,
+}
+
+/// Required inode parameters.
+///
+/// This is used when creating new inodes.
+pub struct Params {
+    /// The access mode. It's a mask that grants execute (1), write (2) and read (4) access to
+    /// everyone, the owner group, and the owner.
+    pub mode: u16,
+
+    /// Type of inode.
+    ///
+    /// Also carries additional per-type data.
+    pub typ: Type,
+
+    /// Size of the contents of the inode.
+    ///
+    /// Its maximum value is [`super::MAX_LFS_FILESIZE`].
+    pub size: Offset,
+
+    /// Number of blocks.
+    pub blocks: block::Count,
+
+    /// Number of links to the inode.
+    pub nlink: u32,
+
+    /// User id.
+    pub uid: u32,
+
+    /// Group id.
+    pub gid: u32,
+
+    /// Last status change time.
+    pub ctime: Timespec,
+
+    /// Last modification time.
+    pub mtime: Timespec,
+
+    /// Last access time.
+    pub atime: Timespec,
+}
diff --git a/rust/kernel/fs/sb.rs b/rust/kernel/fs/sb.rs
index f48e0e2695fa..fa10f3db5593 100644
--- a/rust/kernel/fs/sb.rs
+++ b/rust/kernel/fs/sb.rs
@@ -6,9 +6,12 @@
 //!
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
+use super::inode::{self, INode, Ino};
 use super::FileSystem;
-use crate::{bindings, types::Opaque};
-use core::marker::PhantomData;
+use crate::bindings;
+use crate::error::{code::*, Result};
+use crate::types::{ARef, Either, Opaque};
+use core::{marker::PhantomData, ptr};
 
 /// A file system super block.
 ///
@@ -60,4 +63,45 @@ pub fn set_magic(&mut self, magic: usize) -> &mut Self {
         unsafe { (*self.0.get()).s_magic = magic as core::ffi::c_ulong };
         self
     }
+
+    /// Tries to get an existing inode or create a new one if it doesn't exist yet.
+    pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, inode::New<T>>> {
+        // SAFETY: All superblock-related state needed by `iget_locked` is initialised by C code
+        // before calling `fill_super_callback`, or by `fill_super_callback` itself before calling
+        // `super_params`, which is the first function to see a new superblock.
+        let inode =
+            ptr::NonNull::new(unsafe { bindings::iget_locked(self.0.get(), ino) }).ok_or(ENOMEM)?;
+
+        // SAFETY: `inode` is a valid pointer returned by `iget_locked`.
+        unsafe { bindings::spin_lock(ptr::addr_of_mut!((*inode.as_ptr()).i_lock)) };
+
+        // SAFETY: `inode` is valid and `i_lock` is held, having been taken by `spin_lock` above.
+        let state = unsafe { *ptr::addr_of!((*inode.as_ptr()).i_state) };
+
+        // SAFETY: `inode` is a valid pointer returned by `iget_locked`.
+        unsafe { bindings::spin_unlock(ptr::addr_of_mut!((*inode.as_ptr()).i_lock)) };
+
+        if state & u64::from(bindings::I_NEW) == 0 {
+            // The inode is cached. Just return it.
+            //
+            // SAFETY: `inode` had its refcount incremented by `iget_locked`; this increment is now
+            // owned by `ARef`.
+            Ok(Either::Left(unsafe { ARef::from_raw(inode.cast()) }))
+        } else {
+            // SAFETY: The new inode is valid but not fully initialised yet, so it's ok to create a
+            // `inode::New`.
+            Ok(Either::Right(inode::New(inode, PhantomData)))
+        }
+    }
+
+    /// Creates an inode with the given inode number.
+    ///
+    /// Fails with `EEXIST` if an inode with the given number already exists.
+    pub fn create_inode(&self, ino: Ino) -> Result<inode::New<T>> {
+        if let Either::Right(new) = self.get_or_create_inode(ino)? {
+            Ok(new)
+        } else {
+            Err(EEXIST)
+        }
+    }
 }
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 022addf68891..d32c4645ebe8 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -2,9 +2,9 @@
 
 //! Rust read-only file system sample.
 
-use kernel::fs::sb;
+use kernel::fs::{dentry, inode, sb::SuperBlock};
 use kernel::prelude::*;
-use kernel::{c_str, fs};
+use kernel::{c_str, fs, time::UNIX_EPOCH, types::Either};
 
 kernel::module_fs! {
     type: RoFs,
@@ -18,8 +18,27 @@
 impl fs::FileSystem for RoFs {
     const NAME: &'static CStr = c_str!("rust_rofs");
 
-    fn fill_super(sb: &mut sb::SuperBlock<Self>) -> Result {
+    fn fill_super(sb: &mut SuperBlock<Self>) -> Result {
         sb.set_magic(0x52555354);
         Ok(())
     }
+
+    fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
+        let inode = match sb.get_or_create_inode(1)? {
+            Either::Left(existing) => existing,
+            Either::Right(new) => new.init(inode::Params {
+                typ: inode::Type::Dir,
+                mode: 0o555,
+                size: 1,
+                blocks: 1,
+                nlink: 2,
+                uid: 0,
+                gid: 0,
+                atime: UNIX_EPOCH,
+                ctime: UNIX_EPOCH,
+                mtime: UNIX_EPOCH,
+            })?,
+        };
+        dentry::Root::try_new(inode)
+    }
 }
-- 
2.34.1



* [RFC PATCH v2 08/30] rust: file: move `kernel::file` to `kernel::fs::file`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (6 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 07/30] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 09/30] rust: fs: generalise `File` for different file systems Wedson Almeida Filho
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

This is in preparation for making `File` parametrised on the file system
type, so we can get a typed inode in file system implementations that
have data attached to inodes.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs.rs            | 1 +
 rust/kernel/{ => fs}/file.rs | 2 +-
 rust/kernel/lib.rs           | 1 -
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename rust/kernel/{ => fs}/file.rs (99%)

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index f32c2f89f781..20fb6107eb4b 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -14,6 +14,7 @@
 use sb::SuperBlock;
 
 pub mod dentry;
+pub mod file;
 pub mod inode;
 pub mod sb;
 
diff --git a/rust/kernel/file.rs b/rust/kernel/fs/file.rs
similarity index 99%
rename from rust/kernel/file.rs
rename to rust/kernel/fs/file.rs
index b7ded0cdd063..908e2672676f 100644
--- a/rust/kernel/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -76,7 +76,7 @@ pub mod flags {
     /// # Examples
     ///
     /// ```
-    /// use kernel::file;
+    /// use kernel::fs::file;
     /// # fn do_something() {}
     /// # let flags = 0;
     /// if (flags & file::flags::O_ACCMODE) == file::flags::O_RDONLY {
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index e664f80b8141..81065d1bd679 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -30,7 +30,6 @@
 pub mod block;
 mod build_assert;
 pub mod error;
-pub mod file;
 pub mod fs;
 pub mod init;
 pub mod ioctl;
-- 
2.34.1



* [RFC PATCH v2 09/30] rust: fs: generalise `File` for different file systems
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (7 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 08/30] rust: file: move `kernel::file` to `kernel::fs::file` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 10/30] rust: fs: add empty file operations Wedson Almeida Filho
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

This is in preparation for allowing file operation implementations for
different file systems.

Also add an unspecified file system so that users of the `File` type
that don't care about the file system can keep using it unchanged.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs/file.rs  | 53 ++++++++++++++++++++++++++++-------------
 rust/kernel/fs/inode.rs |  1 -
 2 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 908e2672676f..b8386a396251 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -2,15 +2,18 @@
 
 //! Files and file descriptors.
 //!
-//! C headers: [`include/linux/fs.h`](../../../../include/linux/fs.h) and
-//! [`include/linux/file.h`](../../../../include/linux/file.h)
+//! This module allows Rust code to interact with and implement files.
+//!
+//! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h) and
+//! [`include/linux/file.h`](srctree/include/linux/file.h)
 
+use super::{dentry::DEntry, inode::INode, FileSystem, UnspecifiedFS};
 use crate::{
     bindings,
     error::{code::*, Error, Result},
     types::{ARef, AlwaysRefCounted, Opaque},
 };
-use core::ptr;
+use core::{marker::PhantomData, ptr};
 
 /// Flags associated with a [`File`].
 pub mod flags {
@@ -95,6 +98,8 @@ pub mod flags {
     pub const O_RDWR: u32 = bindings::O_RDWR;
 }
 
+/// A file.
+///
 /// Wraps the kernel's `struct file`.
 ///
 /// # Refcounting
@@ -139,7 +144,7 @@ pub mod flags {
 /// * The Rust borrow-checker normally ensures this by enforcing that the `ARef<File>` from which
 ///   a `&File` is created outlives the `&File`.
 ///
-/// * Using the unsafe [`File::from_ptr`] means that it is up to the caller to ensure that the
+/// * Using the unsafe [`File::from_raw`] means that it is up to the caller to ensure that the
 ///   `&File` only exists while the reference count is positive.
 ///
 /// * You can think of `fdget` as using an fd to look up an `ARef<File>` in the `struct
@@ -154,20 +159,20 @@ pub mod flags {
 ///   closed.
 /// * A light refcount must be dropped before returning to userspace.
 #[repr(transparent)]
-pub struct File(Opaque<bindings::file>);
+pub struct File<T: FileSystem + ?Sized = UnspecifiedFS>(Opaque<bindings::file>, PhantomData<T>);
 
 // SAFETY: By design, the only way to access a `File` is via an immutable reference or an `ARef`.
 // This means that the only situation in which a `File` can be accessed mutably is when the
 // refcount drops to zero and the destructor runs. It is safe for that to happen on any thread, so
 // it is ok for this type to be `Send`.
-unsafe impl Send for File {}
+unsafe impl<T: FileSystem + ?Sized> Send for File<T> {}
 
 // SAFETY: All methods defined on `File` that take `&self` are safe to call even if other threads
 // are concurrently accessing the same `struct file`, because those methods either access immutable
 // properties or have proper synchronization to ensure that such accesses are safe.
-unsafe impl Sync for File {}
+unsafe impl<T: FileSystem + ?Sized> Sync for File<T> {}
 
-impl File {
+impl<T: FileSystem + ?Sized> File<T> {
     /// Constructs a new `struct file` wrapper from a file descriptor.
     ///
     /// The file descriptor belongs to the current process.
@@ -187,15 +192,17 @@ pub fn fget(fd: u32) -> Result<ARef<Self>, BadFdError> {
     ///
     /// # Safety
     ///
-    /// The caller must ensure that `ptr` points at a valid file and that the file's refcount is
-    /// positive for the duration of 'a.
-    pub unsafe fn from_ptr<'a>(ptr: *const bindings::file) -> &'a File {
+    /// Callers must ensure that:
+    ///
+    /// * `ptr` is valid and remains so for the duration of 'a.
+    /// * `ptr` has the correct file system type, or `T` is [`UnspecifiedFS`].
+    pub unsafe fn from_raw<'a>(ptr: *const bindings::file) -> &'a Self {
         // SAFETY: The caller guarantees that the pointer is not dangling and stays valid for the
         // duration of 'a. The cast is okay because `File` is `repr(transparent)`.
         //
         // INVARIANT: The safety requirements guarantee that the refcount does not hit zero during
         // 'a.
-        unsafe { &*ptr.cast() }
+        unsafe { &*ptr.cast::<Self>() }
     }
 
     /// Returns a raw pointer to the inner C struct.
@@ -215,20 +222,32 @@ pub fn flags(&self) -> u32 {
         // TODO: Replace with `read_once` when available on the Rust side.
         unsafe { core::ptr::addr_of!((*self.as_ptr()).f_flags).read_volatile() }
     }
+
+    /// Returns the inode associated with the file.
+    pub fn inode(&self) -> &INode<T> {
+        // SAFETY: `f_inode` is an immutable field, so it's safe to read it.
+        unsafe { INode::from_raw((*self.0.get()).f_inode) }
+    }
+
+    /// Returns the dentry associated with the file.
+    pub fn dentry(&self) -> &DEntry<T> {
+        // SAFETY: `f_path` is an immutable field, so it's safe to read it, and it will remain
+        // safe to read while `&self` is valid.
+        unsafe { DEntry::from_raw((*self.0.get()).f_path.dentry) }
+    }
 }
 
 // SAFETY: The type invariants guarantee that `File` is always ref-counted. This implementation
 // makes `ARef<File>` own a normal refcount.
-unsafe impl AlwaysRefCounted for File {
+unsafe impl<T: FileSystem + ?Sized> AlwaysRefCounted for File<T> {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference means that the refcount is nonzero.
         unsafe { bindings::get_file(self.as_ptr()) };
     }
 
-    unsafe fn dec_ref(obj: ptr::NonNull<File>) {
-        // SAFETY: To call this method, the caller passes us ownership of a normal refcount, so we
-        // may drop it. The cast is okay since `File` has the same representation as `struct file`.
-        unsafe { bindings::fput(obj.cast().as_ptr()) }
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::fput(obj.as_ref().0.get()) }
     }
 }
 
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index 4ccbb4145918..11df493314ea 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -39,7 +39,6 @@ impl<T: FileSystem + ?Sized> INode<T> {
     ///
     /// * `ptr` is valid and remains so for the lifetime of the returned object.
     /// * `ptr` has the correct file system type, or `T` is [`super::UnspecifiedFS`].
-    #[allow(dead_code)]
     pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::inode) -> &'a Self {
         // SAFETY: The safety requirements guarantee that the cast below is ok.
         unsafe { &*ptr.cast::<Self>() }
-- 
2.34.1
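
For readers following along outside the kernel tree, the core of this patch
(a `PhantomData` marker plus a defaulted type parameter) can be tried in plain
userspace Rust. The sketch below is only an illustration under assumptions:
`RawFile`, `UnspecifiedFs`, `RoFs`, `fs_name` and `is_read_only` are
hypothetical stand-ins, not the kernel's `bindings::file` or the real
`FileSystem` trait.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the kernel's `FileSystem` trait, reduced to a name.
pub trait FileSystem {
    const NAME: &'static str;
}

/// Stand-in for `UnspecifiedFS`: used when the caller does not care about the type.
pub struct UnspecifiedFs;
impl FileSystem for UnspecifiedFs {
    const NAME: &'static str = "unspecified";
}

/// Hypothetical concrete file system.
pub struct RoFs;
impl FileSystem for RoFs {
    const NAME: &'static str = "rust_rofs";
}

/// Stand-in for the C `struct file` (assumption: a single flags field).
#[repr(C)]
pub struct RawFile {
    pub flags: u32,
}

/// Typed wrapper. `PhantomData<T>` is zero-sized, so with `repr(transparent)` the
/// wrapper has the same layout as `RawFile` and the pointer cast below is valid.
#[repr(transparent)]
pub struct File<T: FileSystem + ?Sized = UnspecifiedFs>(RawFile, PhantomData<T>);

impl<T: FileSystem + ?Sized> File<T> {
    /// # Safety
    ///
    /// `ptr` must be valid for `'a`, and must belong to file system `T` unless `T`
    /// is `UnspecifiedFs`.
    pub unsafe fn from_raw<'a>(ptr: *const RawFile) -> &'a Self {
        // SAFETY: Validity is guaranteed by the caller; the layouts are identical.
        unsafe { &*ptr.cast::<Self>() }
    }

    /// Returns the flags stored in the wrapped object.
    pub fn flags(&self) -> u32 {
        self.0.flags
    }

    /// Returns the name of the file system this reference is typed with.
    pub fn fs_name(&self) -> &'static str {
        T::NAME
    }
}

/// Existing users can keep writing `&File`; the default parameter fills in
/// `UnspecifiedFs`. The access-mode mask of 0b11 is an assumption of this sketch.
pub fn is_read_only(file: &File) -> bool {
    file.flags() & 0b11 == 0
}
```

The default only applies where `File` is written as a type, which is why
signatures such as `fn is_read_only(file: &File)` need no changes, while file
system implementations can name `File<RoFs>` and get typed accessors.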



* [RFC PATCH v2 10/30] rust: fs: add empty file operations
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (8 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 09/30] rust: fs: generalise `File` for different file systems Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 11/30] rust: fs: introduce `file::Operations::read_dir` Wedson Almeida Filho
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

This is in preparation for allowing modules to implement different file
callbacks, which will be introduced in subsequent patches.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs/file.rs | 57 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index b8386a396251..67dd3ecf7d98 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -14,6 +14,7 @@
     types::{ARef, AlwaysRefCounted, Opaque},
 };
 use core::{marker::PhantomData, ptr};
+use macros::vtable;
 
 /// Flags associated with a [`File`].
 pub mod flags {
@@ -268,3 +269,59 @@ fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
         f.pad("EBADF")
     }
 }
+
+/// Operations implemented by files.
+#[vtable]
+pub trait Operations {
+    /// File system that these operations are compatible with.
+    type FileSystem: FileSystem + ?Sized;
+}
+
+/// Represents file operations.
+#[allow(dead_code)]
+pub struct Ops<T: FileSystem + ?Sized>(pub(crate) *const bindings::file_operations, PhantomData<T>);
+
+impl<T: FileSystem + ?Sized> Ops<T> {
+    /// Creates file operations from a type that implements the [`Operations`] trait.
+    pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
+        struct Table<T: Operations + ?Sized>(PhantomData<T>);
+        impl<T: Operations + ?Sized> Table<T> {
+            const TABLE: bindings::file_operations = bindings::file_operations {
+                owner: ptr::null_mut(),
+                llseek: None,
+                read: None,
+                write: None,
+                read_iter: None,
+                write_iter: None,
+                iopoll: None,
+                iterate_shared: None,
+                poll: None,
+                unlocked_ioctl: None,
+                compat_ioctl: None,
+                mmap: None,
+                mmap_supported_flags: 0,
+                open: None,
+                flush: None,
+                release: None,
+                fsync: None,
+                fasync: None,
+                lock: None,
+                get_unmapped_area: None,
+                check_flags: None,
+                flock: None,
+                splice_write: None,
+                splice_read: None,
+                splice_eof: None,
+                setlease: None,
+                fallocate: None,
+                show_fdinfo: None,
+                copy_file_range: None,
+                remap_file_range: None,
+                fadvise: None,
+                uring_cmd: None,
+                uring_cmd_iopoll: None,
+            };
+        }
+        Self(&Table::<U>::TABLE, PhantomData)
+    }
+}
-- 
2.34.1
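
The `Table<T>` trick above (one constant table per implementing type, with
each slot filled in only when the implementation provides the method) can be
sketched in userspace as follows. This is an illustration under assumptions:
`RawOps`, `ops_for` and `DirOnly` are hypothetical, the callbacks use
simplified signatures, and the `HAS_*` constants are written by hand here
whereas in the kernel they come from the `#[vtable]` macro.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `bindings::file_operations` with two slots.
pub struct RawOps {
    pub llseek: Option<fn(i64) -> i64>,
    pub iterate_shared: Option<fn() -> i32>,
}

/// Operations trait. The `HAS_*` constants are hand-written in this sketch.
pub trait Operations {
    const HAS_SEEK: bool = false;
    const HAS_READ_DIR: bool = false;

    /// Default body: unsupported (negative EINVAL).
    fn seek(_offset: i64) -> i64 {
        -22
    }

    /// Default body: unsupported (negative EINVAL).
    fn read_dir() -> i32 {
        -22
    }
}

/// Zero-sized helper that exists only to carry a per-type associated constant.
struct Table<T: Operations + ?Sized>(PhantomData<T>);

impl<T: Operations + ?Sized> Table<T> {
    /// Built at compile time; unimplemented operations stay `None`.
    const TABLE: RawOps = RawOps {
        llseek: if T::HAS_SEEK {
            Some(Self::seek_callback)
        } else {
            None
        },
        iterate_shared: if T::HAS_READ_DIR {
            Some(Self::read_dir_callback)
        } else {
            None
        },
    };

    fn seek_callback(offset: i64) -> i64 {
        T::seek(offset)
    }

    fn read_dir_callback() -> i32 {
        T::read_dir()
    }
}

/// Returns the table for `U`; the reference to the constant is promoted to `'static`.
pub fn ops_for<U: Operations + ?Sized>() -> &'static RawOps {
    &Table::<U>::TABLE
}

/// Hypothetical implementation that only provides `read_dir`.
pub struct DirOnly;

impl Operations for DirOnly {
    const HAS_READ_DIR: bool = true;

    fn read_dir() -> i32 {
        0
    }
}
```

With this shape, `ops_for::<DirOnly>()` yields a table whose `llseek` slot is
`None`, so the caller of the table sees an absent operation rather than a stub
that returns an error.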



* [RFC PATCH v2 11/30] rust: fs: introduce `file::Operations::read_dir`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (9 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 10/30] rust: fs: add empty file operations Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 12/30] rust: fs: introduce `file::Operations::seek` Wedson Almeida Filho
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allows Rust file systems to report the contents of their directory
inodes. The reported entries cannot be opened yet.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/helpers.c            |  12 +++
 rust/kernel/fs/file.rs    | 176 ++++++++++++++++++++++++++++++++++++--
 rust/kernel/fs/inode.rs   |  31 +++++--
 samples/rust/rust_rofs.rs |  85 +++++++++++++++---
 4 files changed, 279 insertions(+), 25 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index 87301e1ace65..deb2d21f3096 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -195,6 +195,18 @@ unsigned long rust_helper_copy_to_user(void __user *to, const void *from,
 }
 EXPORT_SYMBOL_GPL(rust_helper_copy_to_user);
 
+void rust_helper_inode_lock_shared(struct inode *inode)
+{
+	inode_lock_shared(inode);
+}
+EXPORT_SYMBOL_GPL(rust_helper_inode_lock_shared);
+
+void rust_helper_inode_unlock_shared(struct inode *inode)
+{
+	inode_unlock_shared(inode);
+}
+EXPORT_SYMBOL_GPL(rust_helper_inode_unlock_shared);
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 67dd3ecf7d98..6d61723f440d 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -7,13 +7,13 @@
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h) and
 //! [`include/linux/file.h`](srctree/include/linux/file.h)
 
-use super::{dentry::DEntry, inode::INode, FileSystem, UnspecifiedFS};
+use super::{dentry::DEntry, inode, inode::INode, inode::Ino, FileSystem, Offset, UnspecifiedFS};
 use crate::{
     bindings,
-    error::{code::*, Error, Result},
-    types::{ARef, AlwaysRefCounted, Opaque},
+    error::{code::*, from_result, Error, Result},
+    types::{ARef, AlwaysRefCounted, Locked, Opaque},
 };
-use core::{marker::PhantomData, ptr};
+use core::{marker::PhantomData, mem::ManuallyDrop, ptr};
 use macros::vtable;
 
 /// Flags associated with a [`File`].
@@ -275,10 +275,20 @@ fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
 pub trait Operations {
     /// File system that these operations are compatible with.
     type FileSystem: FileSystem + ?Sized;
+
+    /// Reads directory entries from directory files.
+    ///
+    /// [`DirEmitter::pos`] holds the current position of the directory reader.
+    fn read_dir(
+        _file: &File<Self::FileSystem>,
+        _inode: &Locked<&INode<Self::FileSystem>, inode::ReadSem>,
+        _emitter: &mut DirEmitter,
+    ) -> Result {
+        Err(EINVAL)
+    }
 }
 
 /// Represents file operations.
-#[allow(dead_code)]
 pub struct Ops<T: FileSystem + ?Sized>(pub(crate) *const bindings::file_operations, PhantomData<T>);
 
 impl<T: FileSystem + ?Sized> Ops<T> {
@@ -294,7 +304,11 @@ impl<T: Operations + ?Sized> Table<T> {
                 read_iter: None,
                 write_iter: None,
                 iopoll: None,
-                iterate_shared: None,
+                iterate_shared: if T::HAS_READ_DIR {
+                    Some(Self::read_dir_callback)
+                } else {
+                    None
+                },
                 poll: None,
                 unlocked_ioctl: None,
                 compat_ioctl: None,
@@ -321,7 +335,157 @@ impl<T: Operations + ?Sized> Table<T> {
                 uring_cmd: None,
                 uring_cmd_iopoll: None,
             };
+
+            unsafe extern "C" fn read_dir_callback(
+                file_ptr: *mut bindings::file,
+                ctx_ptr: *mut bindings::dir_context,
+            ) -> core::ffi::c_int {
+                from_result(|| {
+                    // SAFETY: The C API guarantees that `file_ptr` is valid for the duration of
+                    // the callback. Since this callback is specifically for file system `T`,
+                    // we know `T` is the right file system.
+                    let file = unsafe { File::from_raw(file_ptr) };
+
+                    // SAFETY: The C API guarantees that this is the only reference to the
+                    // `dir_context` instance.
+                    let emitter = unsafe { &mut *ctx_ptr.cast::<DirEmitter>() };
+                    let orig_pos = emitter.pos();
+
+                    // SAFETY: The C API guarantees that the inode's rw semaphore is locked in read
+                    // mode. It does not expect callees to unlock it, so we make the locked object
+                    // manually dropped to avoid unlocking it.
+                    let locked = ManuallyDrop::new(unsafe { Locked::new(file.inode()) });
+
+                    // Call the module implementation. We ignore errors if directory entries have
+                    // been successfully emitted: this is because we want users to see them before
+                    // the error.
+                    match T::read_dir(file, &locked, emitter) {
+                        Ok(_) => Ok(0),
+                        Err(e) => {
+                            if emitter.pos() == orig_pos {
+                                Err(e)
+                            } else {
+                                Ok(0)
+                            }
+                        }
+                    }
+                })
+            }
         }
         Self(&Table::<U>::TABLE, PhantomData)
     }
 }
+
+/// The types of directory entries reported by [`Operations::read_dir`].
+#[repr(u32)]
+#[derive(Copy, Clone)]
+pub enum DirEntryType {
+    /// Unknown type.
+    Unknown = bindings::DT_UNKNOWN,
+
+    /// Named pipe (first-in, first-out) type.
+    Fifo = bindings::DT_FIFO,
+
+    /// Character device type.
+    Chr = bindings::DT_CHR,
+
+    /// Directory type.
+    Dir = bindings::DT_DIR,
+
+    /// Block device type.
+    Blk = bindings::DT_BLK,
+
+    /// Regular file type.
+    Reg = bindings::DT_REG,
+
+    /// Symbolic link type.
+    Lnk = bindings::DT_LNK,
+
+    /// Named unix-domain socket type.
+    Sock = bindings::DT_SOCK,
+
+    /// White-out type.
+    Wht = bindings::DT_WHT,
+}
+
+impl From<inode::Type> for DirEntryType {
+    fn from(value: inode::Type) -> Self {
+        match value {
+            inode::Type::Dir => DirEntryType::Dir,
+        }
+    }
+}
+
+impl TryFrom<u32> for DirEntryType {
+    type Error = crate::error::Error;
+
+    fn try_from(v: u32) -> Result<Self> {
+        match v {
+            v if v == Self::Unknown as u32 => Ok(Self::Unknown),
+            v if v == Self::Fifo as u32 => Ok(Self::Fifo),
+            v if v == Self::Chr as u32 => Ok(Self::Chr),
+            v if v == Self::Dir as u32 => Ok(Self::Dir),
+            v if v == Self::Blk as u32 => Ok(Self::Blk),
+            v if v == Self::Reg as u32 => Ok(Self::Reg),
+            v if v == Self::Lnk as u32 => Ok(Self::Lnk),
+            v if v == Self::Sock as u32 => Ok(Self::Sock),
+            v if v == Self::Wht as u32 => Ok(Self::Wht),
+            _ => Err(EDOM),
+        }
+    }
+}
+
+/// Directory entry emitter.
+///
+/// This is used in [`Operations::read_dir`] implementations to report directory entries.
+#[repr(transparent)]
+pub struct DirEmitter(bindings::dir_context);
+
+impl DirEmitter {
+    /// Returns the current position of the emitter.
+    pub fn pos(&self) -> Offset {
+        self.0.pos
+    }
+
+    /// Emits a directory entry.
+    ///
+    /// `pos_inc` is the number with which to increment the current position on success.
+    ///
+    /// `name` is the name of the entry.
+    ///
+    /// `ino` is the inode number of the entry.
+    ///
+    /// `etype` is the type of the entry.
+    ///
+    /// Returns `false` when the entry could not be emitted, possibly because the user-provided
+    /// buffer is full.
+    pub fn emit(&mut self, pos_inc: Offset, name: &[u8], ino: Ino, etype: DirEntryType) -> bool {
+        let Ok(name_len) = i32::try_from(name.len()) else {
+            return false;
+        };
+
+        let Some(actor) = self.0.actor else {
+            return false;
+        };
+
+        let Some(new_pos) = self.0.pos.checked_add(pos_inc) else {
+            return false;
+        };
+
+        // SAFETY: `name` is valid at least for the duration of the `actor` call.
+        let ret = unsafe {
+            actor(
+                &mut self.0,
+                name.as_ptr().cast::<i8>(),
+                name_len,
+                self.0.pos,
+                ino,
+                etype as _,
+            )
+        };
+        if ret {
+            self.0.pos = new_pos;
+        }
+        ret
+    }
+}
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index 11df493314ea..d84d8d2f7076 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -6,9 +6,9 @@
 //!
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
-use super::{sb::SuperBlock, FileSystem, Offset, UnspecifiedFS};
+use super::{file, sb::SuperBlock, FileSystem, Offset, UnspecifiedFS};
 use crate::error::Result;
-use crate::types::{ARef, AlwaysRefCounted, Opaque};
+use crate::types::{ARef, AlwaysRefCounted, Lockable, Opaque};
 use crate::{bindings, block, time::Timespec};
 use core::mem::ManuallyDrop;
 use core::{marker::PhantomData, ptr};
@@ -78,6 +78,22 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
     }
 }
 
+/// Indicates that an inode's rw semaphore is locked in read (shared) mode.
+pub struct ReadSem;
+
+unsafe impl<T: FileSystem + ?Sized> Lockable<ReadSem> for INode<T> {
+    fn raw_lock(&self) {
+        // SAFETY: Since there's a reference to the inode, it must be valid.
+        unsafe { bindings::inode_lock_shared(self.0.get()) };
+    }
+
+    unsafe fn unlock(&self) {
+        // SAFETY: Since there's a reference to the inode, it must be valid. Additionally, the
+        // safety requirements of this function require that the inode be locked in read mode.
+        unsafe { bindings::inode_unlock_shared(self.0.get()) };
+    }
+}
+
 /// An inode that is locked and hasn't been initialised yet.
 ///
 /// # Invariants
@@ -95,9 +111,6 @@ pub fn init(mut self, params: Params) -> Result<ARef<INode<T>>> {
         let inode = unsafe { self.0.as_mut() };
         let mode = match params.typ {
             Type::Dir => {
-                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
-                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
-
                 // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
                 inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
 
@@ -126,6 +139,14 @@ pub fn init(mut self, params: Params) -> Result<ARef<INode<T>>> {
         // being called with the `ManuallyDrop` instance created above.
         Ok(unsafe { ARef::from_raw(manual.0.cast::<INode<T>>()) })
     }
+
+    /// Sets the file operations on this new inode.
+    pub fn set_fops(&mut self, fops: file::Ops<T>) -> &mut Self {
+        // SAFETY: By the type invariants, it's ok to modify the inode.
+        let inode = unsafe { self.0.as_mut() };
+        inode.__bindgen_anon_3.i_fop = fops.0;
+        self
+    }
 }
 
 impl<T: FileSystem + ?Sized> Drop for New<T> {
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index d32c4645ebe8..9da01346d8f8 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -2,9 +2,9 @@
 
 //! Rust read-only file system sample.
 
-use kernel::fs::{dentry, inode, sb::SuperBlock};
+use kernel::fs::{dentry, file, file::File, inode, inode::INode, sb::SuperBlock};
 use kernel::prelude::*;
-use kernel::{c_str, fs, time::UNIX_EPOCH, types::Either};
+use kernel::{c_str, fs, time::UNIX_EPOCH, types::Either, types::Locked};
 
 kernel::module_fs! {
     type: RoFs,
@@ -14,6 +14,32 @@
     license: "GPL",
 }
 
+struct Entry {
+    name: &'static [u8],
+    ino: u64,
+    etype: inode::Type,
+}
+
+const ENTRIES: [Entry; 3] = [
+    Entry {
+        name: b".",
+        ino: 1,
+        etype: inode::Type::Dir,
+    },
+    Entry {
+        name: b"..",
+        ino: 1,
+        etype: inode::Type::Dir,
+    },
+    Entry {
+        name: b"subdir",
+        ino: 2,
+        etype: inode::Type::Dir,
+    },
+];
+
+const DIR_FOPS: file::Ops<RoFs> = file::Ops::new::<RoFs>();
+
 struct RoFs;
 impl fs::FileSystem for RoFs {
     const NAME: &'static CStr = c_str!("rust_rofs");
@@ -26,19 +52,50 @@ fn fill_super(sb: &mut SuperBlock<Self>) -> Result {
     fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
         let inode = match sb.get_or_create_inode(1)? {
             Either::Left(existing) => existing,
-            Either::Right(new) => new.init(inode::Params {
-                typ: inode::Type::Dir,
-                mode: 0o555,
-                size: 1,
-                blocks: 1,
-                nlink: 2,
-                uid: 0,
-                gid: 0,
-                atime: UNIX_EPOCH,
-                ctime: UNIX_EPOCH,
-                mtime: UNIX_EPOCH,
-            })?,
+            Either::Right(mut new) => {
+                new.set_fops(DIR_FOPS);
+                new.init(inode::Params {
+                    typ: inode::Type::Dir,
+                    mode: 0o555,
+                    size: ENTRIES.len().try_into()?,
+                    blocks: 1,
+                    nlink: 2,
+                    uid: 0,
+                    gid: 0,
+                    atime: UNIX_EPOCH,
+                    ctime: UNIX_EPOCH,
+                    mtime: UNIX_EPOCH,
+                })?
+            }
         };
         dentry::Root::try_new(inode)
     }
 }
+
+#[vtable]
+impl file::Operations for RoFs {
+    type FileSystem = Self;
+
+    fn read_dir(
+        _file: &File<Self>,
+        inode: &Locked<&INode<Self>, inode::ReadSem>,
+        emitter: &mut file::DirEmitter,
+    ) -> Result {
+        if inode.ino() != 1 {
+            return Ok(());
+        }
+
+        let pos = emitter.pos();
+        if pos >= ENTRIES.len().try_into()? {
+            return Ok(());
+        }
+
+        for e in ENTRIES.iter().skip(pos.try_into()?) {
+            if !emitter.emit(1, e.name, e.ino, e.etype.into()) {
+                break;
+            }
+        }
+
+        Ok(())
+    }
+}
-- 
2.34.1
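
Two behaviours in this patch are easy to miss when reading the diff: the
sample resumes from `emitter.pos()` when the caller's buffer fills up, and the
callback swallows an error if at least one entry was already emitted. The
userspace sketch below models both. It is an illustration under assumptions:
this `DirEmitter` records entries in a `Vec` and uses a hypothetical
`capacity` field in place of the C `actor` callback and the user buffer, and
errors are plain positive `i32` errno values.

```rust
/// Errno value used by this sketch (EIO on Linux).
pub const EIO: i32 = 5;
/// Errno value used by this sketch (EINVAL on Linux).
pub const EINVAL: i32 = 22;

/// Simplified emitter: `capacity` models a user buffer that can fill up.
pub struct DirEmitter {
    pos: i64,
    capacity: usize,
    pub emitted: Vec<(Vec<u8>, u64)>,
}

impl DirEmitter {
    pub fn new(pos: i64, capacity: usize) -> Self {
        Self { pos, capacity, emitted: Vec::new() }
    }

    /// Returns the current position of the emitter.
    pub fn pos(&self) -> i64 {
        self.pos
    }

    /// Emits one entry; the position only advances when the entry was accepted.
    pub fn emit(&mut self, pos_inc: i64, name: &[u8], ino: u64) -> bool {
        let new_pos = match self.pos.checked_add(pos_inc) {
            Some(p) => p,
            None => return false,
        };
        if self.emitted.len() >= self.capacity {
            return false;
        }
        self.emitted.push((name.to_vec(), ino));
        self.pos = new_pos;
        true
    }
}

pub struct Entry {
    pub name: &'static [u8],
    pub ino: u64,
}

/// Same entries as the sample file system in the patch.
pub static ENTRIES: [Entry; 3] = [
    Entry { name: b".", ino: 1 },
    Entry { name: b"..", ino: 1 },
    Entry { name: b"subdir", ino: 2 },
];

/// Resumes at the emitter's position and stops once the buffer is full.
pub fn read_dir(emitter: &mut DirEmitter) -> Result<(), i32> {
    let pos = usize::try_from(emitter.pos()).map_err(|_| EINVAL)?;
    for e in ENTRIES.iter().skip(pos) {
        if !emitter.emit(1, e.name, e.ino) {
            break;
        }
    }
    Ok(())
}

/// Mirrors the error policy of `read_dir_callback`: an error is reported (as a
/// negative errno) only if no entry was emitted during this call.
pub fn read_dir_callback(
    emitter: &mut DirEmitter,
    read_dir: impl FnOnce(&mut DirEmitter) -> Result<(), i32>,
) -> i32 {
    let orig_pos = emitter.pos();
    match read_dir(emitter) {
        Ok(()) => 0,
        Err(e) if emitter.pos() == orig_pos => -e,
        Err(_) => 0,
    }
}
```

Because the position advances only on a successful `emit`, a second call that
starts at the saved position continues exactly where the previous one stopped,
which is the property the `skip(pos)` loop in the sample relies on.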



* [RFC PATCH v2 12/30] rust: fs: introduce `file::Operations::seek`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (10 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 11/30] rust: fs: introduce `file::Operations::read_dir` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 13/30] rust: fs: introduce `file::Operations::read` Wedson Almeida Filho
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

This allows file systems to customise their behaviour when callers want
to seek to a different file location, which may also be used when
reading directory entries.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs/file.rs    | 73 ++++++++++++++++++++++++++++++++++++++-
 samples/rust/rust_rofs.rs |  6 +++-
 2 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 6d61723f440d..77eb6d230568 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -270,12 +270,65 @@ fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
     }
 }
 
+/// Indicates how to interpret the `offset` argument in [`Operations::seek`].
+#[repr(u32)]
+pub enum Whence {
+    /// `offset` bytes from the start of the file.
+    Set = bindings::SEEK_SET,
+
+    /// `offset` bytes from the end of the file.
+    End = bindings::SEEK_END,
+
+    /// `offset` bytes from the current location.
+    Cur = bindings::SEEK_CUR,
+
+    /// The next location greater than or equal to `offset` that contains data.
+    Data = bindings::SEEK_DATA,
+
+    /// The next location greater than or equal to `offset` that contains a hole.
+    Hole = bindings::SEEK_HOLE,
+}
+
+impl TryFrom<i32> for Whence {
+    type Error = crate::error::Error;
+
+    fn try_from(v: i32) -> Result<Self> {
+        match v {
+            v if v == Self::Set as i32 => Ok(Self::Set),
+            v if v == Self::End as i32 => Ok(Self::End),
+            v if v == Self::Cur as i32 => Ok(Self::Cur),
+            v if v == Self::Data as i32 => Ok(Self::Data),
+            v if v == Self::Hole as i32 => Ok(Self::Hole),
+            _ => Err(EDOM),
+        }
+    }
+}
+
+/// Generic implementation of [`Operations::seek`].
+pub fn generic_seek(
+    file: &File<impl FileSystem + ?Sized>,
+    offset: Offset,
+    whence: Whence,
+) -> Result<Offset> {
+    let n = unsafe { bindings::generic_file_llseek(file.0.get(), offset, whence as i32) };
+    if n < 0 {
+        Err(Error::from_errno(n.try_into()?))
+    } else {
+        Ok(n)
+    }
+}
+
 /// Operations implemented by files.
 #[vtable]
 pub trait Operations {
     /// File system that these operations are compatible with.
     type FileSystem: FileSystem + ?Sized;
 
+    /// Seeks the file to the given offset.
+    fn seek(_file: &File<Self::FileSystem>, _offset: Offset, _whence: Whence) -> Result<Offset> {
+        Err(EINVAL)
+    }
+
     /// Reads directory entries from directory files.
     ///
     /// [`DirEmitter::pos`] holds the current position of the directory reader.
@@ -298,7 +351,11 @@ pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
         impl<T: Operations + ?Sized> Table<T> {
             const TABLE: bindings::file_operations = bindings::file_operations {
                 owner: ptr::null_mut(),
-                llseek: None,
+                llseek: if T::HAS_SEEK {
+                    Some(Self::seek_callback)
+                } else {
+                    None
+                },
                 read: None,
                 write: None,
                 read_iter: None,
@@ -336,6 +393,20 @@ impl<T: Operations + ?Sized> Table<T> {
                 uring_cmd_iopoll: None,
             };
 
+            unsafe extern "C" fn seek_callback(
+                file_ptr: *mut bindings::file,
+                offset: bindings::loff_t,
+                whence: i32,
+            ) -> bindings::loff_t {
+                from_result(|| {
+                    // SAFETY: The C API guarantees that `file_ptr` is valid for the duration of
+                    // the callback. Since this callback is specifically for file system `T`,
+                    // we know `T` is the right file system.
+                    let file = unsafe { File::from_raw(file_ptr) };
+                    T::seek(file, offset, whence.try_into()?)
+                })
+            }
+
             unsafe extern "C" fn read_dir_callback(
                 file_ptr: *mut bindings::file,
                 ctx_ptr: *mut bindings::dir_context,
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 9da01346d8f8..abec084360da 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -2,7 +2,7 @@
 
 //! Rust read-only file system sample.
 
-use kernel::fs::{dentry, file, file::File, inode, inode::INode, sb::SuperBlock};
+use kernel::fs::{dentry, file, file::File, inode, inode::INode, sb::SuperBlock, Offset};
 use kernel::prelude::*;
 use kernel::{c_str, fs, time::UNIX_EPOCH, types::Either, types::Locked};
 
@@ -76,6 +76,10 @@ fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
 impl file::Operations for RoFs {
     type FileSystem = Self;
 
+    fn seek(file: &File<Self>, offset: Offset, whence: file::Whence) -> Result<Offset> {
+        file::generic_seek(file, offset, whence)
+    }
+
     fn read_dir(
         _file: &File<Self>,
         inode: &Locked<&INode<Self>, inode::ReadSem>,
-- 
2.34.1



* [RFC PATCH v2 13/30] rust: fs: introduce `file::Operations::read`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (11 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 12/30] rust: fs: introduce `file::Operations::seek` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 14/30] rust: fs: add empty inode operations Wedson Almeida Filho
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

This allows file systems to customise their behaviour when callers want
to read from a file.
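
As a rough illustration of how an optional callback such as `read` ends up
in the operations table, here is a minimal standalone sketch. It models by
hand the `HAS_READ` constant that the diff below checks. The types
`FileOperations`, `NoRead` and `Hello`, and the simplified `read` signature
taking a `Vec<u8>`, are hypothetical stand-ins and not the kernel API.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a C operations table with a single slot.
pub struct FileOperations {
    pub read: Option<fn(&mut Vec<u8>, &mut i64) -> Result<usize, i32>>,
}

/// Illustrative errno value.
pub const EINVAL: i32 = 22;

pub trait Operations {
    /// Written by hand in this sketch; the patch reads an equivalent
    /// `T::HAS_READ` constant when filling the table.
    const HAS_READ: bool = false;

    /// Default body; it is never installed when `HAS_READ` is false.
    fn read(_buf: &mut Vec<u8>, _offset: &mut i64) -> Result<usize, i32> {
        Err(EINVAL)
    }
}

/// Builds the table at compile time, one table per implementing type.
pub struct Table<T: Operations>(PhantomData<T>);

impl<T: Operations> Table<T> {
    pub const TABLE: FileOperations = FileOperations {
        // The slot stays empty unless the implementer opted in.
        read: if T::HAS_READ { Some(T::read) } else { None },
    };
}

/// Implements nothing, so its `read` slot is `None`.
pub struct NoRead;
impl Operations for NoRead {}

/// Serves a fixed byte string and advances the offset.
pub struct Hello;
impl Operations for Hello {
    const HAS_READ: bool = true;

    fn read(buf: &mut Vec<u8>, offset: &mut i64) -> Result<usize, i32> {
        if *offset < 0 {
            return Err(EINVAL);
        }
        let data = b"hello";
        let start = *offset as usize;
        // Reading at or past the end yields zero bytes.
        let rest = data.get(start..).unwrap_or(&[]);
        buf.extend_from_slice(rest);
        *offset += rest.len() as i64;
        Ok(rest.len())
    }
}
```

With this model, `Table::<NoRead>::TABLE.read` is `None`, so a caller sees
the operation as unsupported, while `Table::<Hello>::TABLE.read` holds a
function pointer to `Hello::read`.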

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs/file.rs    | 35 ++++++++++++++++++++++++++++++++++-
 rust/kernel/user.rs       |  1 -
 samples/rust/rust_rofs.rs |  6 +++++-
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 77eb6d230568..2ba456a1eee1 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -12,6 +12,7 @@
     bindings,
     error::{code::*, from_result, Error, Result},
     types::{ARef, AlwaysRefCounted, Locked, Opaque},
+    user,
 };
 use core::{marker::PhantomData, mem::ManuallyDrop, ptr};
 use macros::vtable;
@@ -324,6 +325,15 @@ pub trait Operations {
     /// File system that these operations are compatible with.
     type FileSystem: FileSystem + ?Sized;
 
+    /// Reads data from this file into the caller's buffer.
+    fn read(
+        _file: &File<Self::FileSystem>,
+        _buffer: &mut user::Writer,
+        _offset: &mut Offset,
+    ) -> Result<usize> {
+        Err(EINVAL)
+    }
+
     /// Seeks the file to the given offset.
     fn seek(_file: &File<Self::FileSystem>, _offset: Offset, _whence: Whence) -> Result<Offset> {
         Err(EINVAL)
@@ -356,7 +366,11 @@ impl<T: Operations + ?Sized> Table<T> {
                 } else {
                     None
                 },
-                read: None,
+                read: if T::HAS_READ {
+                    Some(Self::read_callback)
+                } else {
+                    None
+                },
                 write: None,
                 read_iter: None,
                 write_iter: None,
@@ -407,6 +421,25 @@ impl<T: Operations + ?Sized> Table<T> {
                 })
             }
 
+            unsafe extern "C" fn read_callback(
+                file_ptr: *mut bindings::file,
+                ptr: *mut core::ffi::c_char,
+                len: usize,
+                offset: *mut bindings::loff_t,
+            ) -> isize {
+                from_result(|| {
+                    // SAFETY: The C API guarantees that `file_ptr` is valid for the duration of
+                    // the callback. Since this callback is specifically for filesystem T, we know
+                    // `T` is the right filesystem.
+                    let file = unsafe { File::from_raw(file_ptr) };
+                    let mut writer = user::Writer::new(ptr, len);
+
+                    // SAFETY: The C API guarantees that `offset` is valid for read and write.
+                    let read = T::read(file, &mut writer, unsafe { &mut *offset })?;
+                    Ok(isize::try_from(read)?)
+                })
+            }
+
             unsafe extern "C" fn read_dir_callback(
                 file_ptr: *mut bindings::file,
                 ctx_ptr: *mut bindings::dir_context,
diff --git a/rust/kernel/user.rs b/rust/kernel/user.rs
index 35a673ebcd58..20fb887f4640 100644
--- a/rust/kernel/user.rs
+++ b/rust/kernel/user.rs
@@ -11,7 +11,6 @@ pub struct Writer {
 }
 
 impl Writer {
-    #[allow(dead_code)]
     pub(crate) fn new(ptr: *mut i8, len: usize) -> Self {
         Self {
             ptr: ptr.cast::<u8>(),
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index abec084360da..f4be5908369c 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -4,7 +4,7 @@
 
 use kernel::fs::{dentry, file, file::File, inode, inode::INode, sb::SuperBlock, Offset};
 use kernel::prelude::*;
-use kernel::{c_str, fs, time::UNIX_EPOCH, types::Either, types::Locked};
+use kernel::{c_str, fs, time::UNIX_EPOCH, types::Either, types::Locked, user};
 
 kernel::module_fs! {
     type: RoFs,
@@ -80,6 +80,10 @@ fn seek(file: &File<Self>, offset: Offset, whence: file::Whence) -> Result<Offse
         file::generic_seek(file, offset, whence)
     }
 
+    fn read(_: &File<Self>, _: &mut user::Writer, _: &mut Offset) -> Result<usize> {
+        Err(EISDIR)
+    }
+
     fn read_dir(
         _file: &File<Self>,
         inode: &Locked<&INode<Self>, inode::ReadSem>,
-- 
2.34.1



* [RFC PATCH v2 14/30] rust: fs: add empty inode operations
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (12 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 13/30] rust: fs: introduce `file::Operations::read` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 15/30] rust: fs: introduce `inode::Operations::lookup` Wedson Almeida Filho
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

This is in preparation for allowing modules to implement different inode
callbacks, which will be introduced in subsequent patches.
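
The shape of the `Ops` type added here can be sketched outside the kernel as
follows. This is a simplified model under stated assumptions:
`InodeOperations` is a hypothetical one-slot table, a `&'static` reference
replaces the raw pointer used in the patch, and `new` is a plain function
rather than a `const fn`. `MyFs`, `MyOps` and the `table` accessor are
illustrative names only.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the C inode operations table.
pub struct InodeOperations {
    pub lookup: Option<fn(&str) -> Option<u64>>,
}

/// Marker trait for file system types.
pub trait FileSystem {}

/// Operations that belong to exactly one file system type.
pub trait Operations {
    type FileSystem: FileSystem;
}

/// A table handle tagged with the file system it is valid for.
pub struct Ops<T: FileSystem>(&'static InodeOperations, PhantomData<T>);

impl<T: FileSystem> Ops<T> {
    /// Only accepts operations whose `FileSystem` is `T`.
    pub fn new<U: Operations<FileSystem = T>>() -> Self {
        // One table per `U`, computed at compile time.
        struct Table<U: Operations>(PhantomData<U>);
        impl<U: Operations> Table<U> {
            const TABLE: InodeOperations = InodeOperations { lookup: None };
        }
        Ops(&Table::<U>::TABLE, PhantomData)
    }

    /// Accessor added for this sketch only.
    pub fn table(&self) -> &'static InodeOperations {
        self.0
    }
}

/// Hypothetical file system and its operations.
pub struct MyFs;
impl FileSystem for MyFs {}

pub struct MyOps;
impl Operations for MyOps {
    type FileSystem = MyFs;
}
```

The type parameter on `Ops` is what lets the compiler reject attaching one
file system's operations to another file system's inode; every slot starts
out empty, matching the all-`None` table in this patch.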

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs/inode.rs | 48 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index d84d8d2f7076..3d65b917af0e 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -12,10 +12,18 @@
 use crate::{bindings, block, time::Timespec};
 use core::mem::ManuallyDrop;
 use core::{marker::PhantomData, ptr};
+use macros::vtable;
 
 /// The number of an inode.
 pub type Ino = u64;
 
+/// Operations implemented by inodes.
+#[vtable]
+pub trait Operations {
+    /// File system that these operations are compatible with.
+    type FileSystem: FileSystem + ?Sized;
+}
+
 /// A node (inode) in the file index.
 ///
 /// Wraps the kernel's `struct inode`.
@@ -203,3 +211,43 @@ pub struct Params {
     /// Last access time.
     pub atime: Timespec,
 }
+
+/// Represents inode operations.
+pub struct Ops<T: FileSystem + ?Sized>(*const bindings::inode_operations, PhantomData<T>);
+
+impl<T: FileSystem + ?Sized> Ops<T> {
+    /// Creates the inode operations from a type that implements the [`Operations`] trait.
+    pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
+        struct Table<T: Operations + ?Sized>(PhantomData<T>);
+        impl<T: Operations + ?Sized> Table<T> {
+            const TABLE: bindings::inode_operations = bindings::inode_operations {
+                lookup: None,
+                get_link: None,
+                permission: None,
+                get_inode_acl: None,
+                readlink: None,
+                create: None,
+                link: None,
+                unlink: None,
+                symlink: None,
+                mkdir: None,
+                rmdir: None,
+                mknod: None,
+                rename: None,
+                setattr: None,
+                getattr: None,
+                listxattr: None,
+                fiemap: None,
+                update_time: None,
+                atomic_open: None,
+                tmpfile: None,
+                get_acl: None,
+                set_acl: None,
+                fileattr_set: None,
+                fileattr_get: None,
+                get_offset_ctx: None,
+            };
+        }
+        Self(&Table::<U>::TABLE, PhantomData)
+    }
+}
-- 
2.34.1



* [RFC PATCH v2 15/30] rust: fs: introduce `inode::Operations::lookup`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (13 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 14/30] rust: fs: add empty inode operations Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 16/30] rust: folio: introduce basic support for folios Wedson Almeida Filho
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to create inodes that are children of a
directory inode when they're looked up by name.
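
The lookup flow can be modelled in a few lines of standalone Rust. This is
only a sketch: the entry names, inode numbers, the `Raw` enum and the
`to_raw` helper are hypothetical and exist to show the three outcomes the
callback in this patch distinguishes (error, negative entry, found entry).

```rust
/// Hypothetical directory entry; the sample's real table is not shown here.
pub struct Entry {
    pub name: &'static [u8],
    pub ino: u64,
}

/// Illustrative contents of a single flat root directory.
pub static ENTRIES: [Entry; 3] = [
    Entry { name: b".", ino: 1 },
    Entry { name: b"..", ino: 1 },
    Entry { name: b"a.txt", ino: 2 },
];

pub const ROOT_INO: u64 = 1;
/// Illustrative errno value.
pub const EIO: i32 = 5;

/// `Ok(Some(ino))` when the name exists, `Ok(None)` when it does not.
pub fn lookup(parent_ino: u64, name: &[u8]) -> Result<Option<u64>, i32> {
    // Only the root directory has children in this model.
    if parent_ino != ROOT_INO {
        return Ok(None);
    }
    Ok(ENTRIES.iter().find(|e| e.name == name).map(|e| e.ino))
}

/// Models the three shapes of the C return value.
#[derive(Debug, PartialEq)]
pub enum Raw {
    /// Stands in for an error-encoded pointer.
    ErrPtr(i32),
    /// Stands in for a null return.
    Null,
    /// Stands in for a pointer to a dentry, identified here by inode number.
    Dentry(u64),
}

/// Mirrors the `match` at the end of the lookup callback.
pub fn to_raw(res: Result<Option<u64>, i32>) -> Raw {
    match res {
        Err(e) => Raw::ErrPtr(e),
        Ok(None) => Raw::Null,
        Ok(Some(ino)) => Raw::Dentry(ino),
    }
}
```

A missing name and a non-root parent both map to `Raw::Null` in this model,
while a failure such as `Err(EIO)` maps to `Raw::ErrPtr(EIO)`.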

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/error.rs      |  1 -
 rust/kernel/fs/dentry.rs  |  1 -
 rust/kernel/fs/inode.rs   | 58 ++++++++++++++++++++++++-----
 samples/rust/rust_rofs.rs | 77 +++++++++++++++++++++++++++++----------
 4 files changed, 105 insertions(+), 32 deletions(-)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index bb13bd4a7fa6..15628d2fa3b2 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -129,7 +129,6 @@ pub fn to_errno(self) -> core::ffi::c_int {
     }
 
     /// Returns the error encoded as a pointer.
-    #[allow(dead_code)]
     pub(crate) fn to_ptr<T>(self) -> *mut T {
         // SAFETY: `self.0` is a valid error due to its invariant.
         unsafe { bindings::ERR_PTR(self.0.into()) as *mut _ }
diff --git a/rust/kernel/fs/dentry.rs b/rust/kernel/fs/dentry.rs
index 6a36a48cd28b..c93debb70ea3 100644
--- a/rust/kernel/fs/dentry.rs
+++ b/rust/kernel/fs/dentry.rs
@@ -43,7 +43,6 @@ impl<T: FileSystem + ?Sized> DEntry<T> {
     ///
     /// * `ptr` must be valid for at least the lifetime of the returned reference.
     /// * `ptr` has the correct file system type, or `T` is [`super::UnspecifiedFS`].
-    #[allow(dead_code)]
     pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::dentry) -> &'a Self {
         // SAFETY: The safety requirements guarantee that the reference is and remains valid.
         unsafe { &*ptr.cast::<Self>() }
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index 3d65b917af0e..c314d036c87e 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -6,9 +6,9 @@
 //!
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
-use super::{file, sb::SuperBlock, FileSystem, Offset, UnspecifiedFS};
-use crate::error::Result;
-use crate::types::{ARef, AlwaysRefCounted, Lockable, Opaque};
+use super::{dentry, dentry::DEntry, file, sb::SuperBlock, FileSystem, Offset, UnspecifiedFS};
+use crate::error::{code::*, Result};
+use crate::types::{ARef, AlwaysRefCounted, Lockable, Locked, Opaque};
 use crate::{bindings, block, time::Timespec};
 use core::mem::ManuallyDrop;
 use core::{marker::PhantomData, ptr};
@@ -22,6 +22,14 @@
 pub trait Operations {
     /// File system that these operations are compatible with.
     type FileSystem: FileSystem + ?Sized;
+
+    /// Returns the inode corresponding to the directory entry with the given name.
+    fn lookup(
+        _parent: &Locked<&INode<Self::FileSystem>, ReadSem>,
+        _dentry: dentry::Unhashed<'_, Self::FileSystem>,
+    ) -> Result<Option<ARef<DEntry<Self::FileSystem>>>> {
+        Err(ENOTSUPP)
+    }
 }
 
 /// A node (inode) in the file index.
@@ -118,12 +126,7 @@ pub fn init(mut self, params: Params) -> Result<ARef<INode<T>>> {
         // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
         let inode = unsafe { self.0.as_mut() };
         let mode = match params.typ {
-            Type::Dir => {
-                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
-                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
-
-                bindings::S_IFDIR
-            }
+            Type::Dir => bindings::S_IFDIR,
         };
 
         inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
@@ -148,6 +151,14 @@ pub fn init(mut self, params: Params) -> Result<ARef<INode<T>>> {
         Ok(unsafe { ARef::from_raw(manual.0.cast::<INode<T>>()) })
     }
 
+    /// Sets the inode operations on this new inode.
+    pub fn set_iops(&mut self, iops: Ops<T>) -> &mut Self {
+        // SAFETY: By the type invariants, it's ok to modify the inode.
+        let inode = unsafe { self.0.as_mut() };
+        inode.i_op = iops.0;
+        self
+    }
+
     /// Sets the file operations on this new inode.
     pub fn set_fops(&mut self, fops: file::Ops<T>) -> &mut Self {
         // SAFETY: By the type invariants, it's ok to modify the inode.
@@ -221,7 +232,11 @@ pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
         struct Table<T: Operations + ?Sized>(PhantomData<T>);
         impl<T: Operations + ?Sized> Table<T> {
             const TABLE: bindings::inode_operations = bindings::inode_operations {
-                lookup: None,
+                lookup: if T::HAS_LOOKUP {
+                    Some(Self::lookup_callback)
+                } else {
+                    None
+                },
                 get_link: None,
                 permission: None,
                 get_inode_acl: None,
@@ -247,6 +262,29 @@ impl<T: Operations + ?Sized> Table<T> {
                 fileattr_get: None,
                 get_offset_ctx: None,
             };
+
+            extern "C" fn lookup_callback(
+                parent_ptr: *mut bindings::inode,
+                dentry_ptr: *mut bindings::dentry,
+                _flags: u32,
+            ) -> *mut bindings::dentry {
+                // SAFETY: The C API guarantees that `parent_ptr` is a valid inode.
+                let parent = unsafe { INode::from_raw(parent_ptr) };
+
+                // SAFETY: The C API guarantees that `dentry_ptr` is a valid dentry.
+                let dentry = unsafe { DEntry::from_raw(dentry_ptr) };
+
+                // SAFETY: The C API guarantees that the inode's rw semaphore is locked at least in
+                // read mode. It does not expect callees to unlock it, so we make the locked object
+                // manually dropped to avoid unlocking it.
+                let locked = ManuallyDrop::new(unsafe { Locked::new(parent) });
+
+                match T::lookup(&locked, dentry::Unhashed(dentry)) {
+                    Err(e) => e.to_ptr(),
+                    Ok(None) => ptr::null_mut(),
+                    Ok(Some(ret)) => ManuallyDrop::new(ret).0.get(),
+                }
+            }
         }
         Self(&Table::<U>::TABLE, PhantomData)
     }
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index f4be5908369c..2a87e524e0e1 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -2,9 +2,11 @@
 
 //! Rust read-only file system sample.
 
-use kernel::fs::{dentry, file, file::File, inode, inode::INode, sb::SuperBlock, Offset};
+use kernel::fs::{
+    dentry, dentry::DEntry, file, file::File, inode, inode::INode, sb::SuperBlock, Offset,
+};
 use kernel::prelude::*;
-use kernel::{c_str, fs, time::UNIX_EPOCH, types::Either, types::Locked, user};
+use kernel::{c_str, fs, time::UNIX_EPOCH, types::ARef, types::Either, types::Locked, user};
 
 kernel::module_fs! {
     type: RoFs,
@@ -39,8 +41,36 @@ struct Entry {
 ];
 
 const DIR_FOPS: file::Ops<RoFs> = file::Ops::new::<RoFs>();
+const DIR_IOPS: inode::Ops<RoFs> = inode::Ops::new::<RoFs>();
 
 struct RoFs;
+
+impl RoFs {
+    fn iget(sb: &SuperBlock<Self>, e: &'static Entry) -> Result<ARef<INode<Self>>> {
+        let mut new = match sb.get_or_create_inode(e.ino)? {
+            Either::Left(existing) => return Ok(existing),
+            Either::Right(new) => new,
+        };
+
+        match e.etype {
+            inode::Type::Dir => new.set_iops(DIR_IOPS).set_fops(DIR_FOPS),
+        };
+
+        new.init(inode::Params {
+            typ: e.etype,
+            mode: 0o555,
+            size: ENTRIES.len().try_into()?,
+            blocks: 1,
+            nlink: 2,
+            uid: 0,
+            gid: 0,
+            atime: UNIX_EPOCH,
+            ctime: UNIX_EPOCH,
+            mtime: UNIX_EPOCH,
+        })
+    }
+}
+
 impl fs::FileSystem for RoFs {
     const NAME: &'static CStr = c_str!("rust_rofs");
 
@@ -50,28 +80,35 @@ fn fill_super(sb: &mut SuperBlock<Self>) -> Result {
     }
 
     fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
-        let inode = match sb.get_or_create_inode(1)? {
-            Either::Left(existing) => existing,
-            Either::Right(mut new) => {
-                new.set_fops(DIR_FOPS);
-                new.init(inode::Params {
-                    typ: inode::Type::Dir,
-                    mode: 0o555,
-                    size: ENTRIES.len().try_into()?,
-                    blocks: 1,
-                    nlink: 2,
-                    uid: 0,
-                    gid: 0,
-                    atime: UNIX_EPOCH,
-                    ctime: UNIX_EPOCH,
-                    mtime: UNIX_EPOCH,
-                })?
-            }
-        };
+        let inode = Self::iget(sb, &ENTRIES[0])?;
         dentry::Root::try_new(inode)
     }
 }
 
+#[vtable]
+impl inode::Operations for RoFs {
+    type FileSystem = Self;
+
+    fn lookup(
+        parent: &Locked<&INode<Self>, inode::ReadSem>,
+        dentry: dentry::Unhashed<'_, Self>,
+    ) -> Result<Option<ARef<DEntry<Self>>>> {
+        if parent.ino() != 1 {
+            return dentry.splice_alias(None);
+        }
+
+        let name = dentry.name();
+        for e in &ENTRIES {
+            if name == e.name {
+                let inode = Self::iget(parent.super_block(), e)?;
+                return dentry.splice_alias(Some(inode));
+            }
+        }
+
+        dentry.splice_alias(None)
+    }
+}
+
 #[vtable]
 impl file::Operations for RoFs {
     type FileSystem = Self;
-- 
2.34.1



* [RFC PATCH v2 16/30] rust: folio: introduce basic support for folios
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (14 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 15/30] rust: fs: introduce `inode::Operations::lookup` Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 17/30] rust: fs: add empty address space operations Wedson Almeida Filho
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to handle ref-counted folios.

Provide the minimum needed to implement `read_folio` (part of `struct
address_space_operations`) in read-only file systems and to read
uncached blocks.
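
To make the page-at-a-time mapping concrete, the sketch below reproduces
only the chunking arithmetic used by `for_each_page` in this patch, without
touching any kernel API. `PAGE_SIZE` is assumed to be 4096, `chunks` is a
hypothetical helper, the string errors are placeholders, and the way the
loop advances is an assumption, since that part of the loop body is not
shown here.

```rust
/// Assumed page size for this sketch.
pub const PAGE_SIZE: usize = 4096;

/// Returns the `(offset, length)` pieces that would each be mapped once.
///
/// With `highmem` set, every piece stops at a page boundary; otherwise the
/// whole range is covered by a single piece.
pub fn chunks(
    offset: usize,
    len: usize,
    folio_size: usize,
    highmem: bool,
) -> Result<Vec<(usize, usize)>, &'static str> {
    // Reject ranges that overflow or run past the end of the folio.
    let end = offset.checked_add(len).ok_or("overflow")?;
    if end > folio_size {
        return Err("out of bounds");
    }

    let mut out = Vec::new();
    let mut next_offset = offset;
    let mut remaining = len;

    while remaining > 0 {
        let map_size = if highmem {
            // Bytes left in the page that contains `next_offset`.
            PAGE_SIZE - (next_offset & (PAGE_SIZE - 1))
        } else {
            // The folio is contiguous, so map up to its end.
            folio_size - next_offset
        };
        let usable = std::cmp::min(remaining, map_size);
        out.push((next_offset, usable));
        // Assumed advance: move past the bytes just handled.
        next_offset += usable;
        remaining -= usable;
    }
    Ok(out)
}
```

For a 16384-byte highmem folio, `chunks(4000, 5000, 16384, true)` yields
three pieces that split at page boundaries 4096 and 8192; the same range on
a non-highmem folio is a single piece.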

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/bindings/bindings_helper.h |   3 +
 rust/helpers.c                  |  94 ++++++++++
 rust/kernel/folio.rs            | 306 ++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs              |   1 +
 4 files changed, 404 insertions(+)
 create mode 100644 rust/kernel/folio.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index dabb5a787e0d..fd22b1eafb1d 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -15,6 +15,7 @@
 #include <linux/fs_context.h>
 #include <linux/jiffies.h>
 #include <linux/mdio.h>
+#include <linux/pagemap.h>
 #include <linux/phy.h>
 #include <linux/refcount.h>
 #include <linux/sched.h>
@@ -37,3 +38,5 @@ const slab_flags_t RUST_CONST_HELPER_SLAB_ACCOUNT = SLAB_ACCOUNT;
 const unsigned long RUST_CONST_HELPER_SB_RDONLY = SB_RDONLY;
 
 const loff_t RUST_CONST_HELPER_MAX_LFS_FILESIZE = MAX_LFS_FILESIZE;
+
+const size_t RUST_CONST_HELPER_PAGE_SIZE = PAGE_SIZE;
diff --git a/rust/helpers.c b/rust/helpers.c
index deb2d21f3096..acff58e6caff 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -23,10 +23,14 @@
 #include <kunit/test-bug.h>
 #include <linux/bug.h>
 #include <linux/build_bug.h>
+#include <linux/cacheflush.h>
 #include <linux/err.h>
 #include <linux/errname.h>
 #include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/pagemap.h>
 #include <linux/refcount.h>
 #include <linux/sched/signal.h>
 #include <linux/spinlock.h>
@@ -164,6 +168,96 @@ struct file *rust_helper_get_file(struct file *f)
 }
 EXPORT_SYMBOL_GPL(rust_helper_get_file);
 
+void *rust_helper_kmap(struct page *page)
+{
+	return kmap(page);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kmap);
+
+void rust_helper_kunmap(struct page *page)
+{
+	kunmap(page);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kunmap);
+
+void rust_helper_folio_get(struct folio *folio)
+{
+	folio_get(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_get);
+
+void rust_helper_folio_put(struct folio *folio)
+{
+	folio_put(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_put);
+
+struct folio *rust_helper_folio_alloc(gfp_t gfp, unsigned int order)
+{
+	return folio_alloc(gfp, order);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_alloc);
+
+struct page *rust_helper_folio_page(struct folio *folio, size_t n)
+{
+	return folio_page(folio, n);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_page);
+
+loff_t rust_helper_folio_pos(struct folio *folio)
+{
+	return folio_pos(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_pos);
+
+size_t rust_helper_folio_size(struct folio *folio)
+{
+	return folio_size(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_size);
+
+void rust_helper_folio_lock(struct folio *folio)
+{
+	folio_lock(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_lock);
+
+bool rust_helper_folio_test_uptodate(struct folio *folio)
+{
+	return folio_test_uptodate(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_test_uptodate);
+
+void rust_helper_folio_mark_uptodate(struct folio *folio)
+{
+	folio_mark_uptodate(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_mark_uptodate);
+
+bool rust_helper_folio_test_highmem(struct folio *folio)
+{
+	return folio_test_highmem(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_test_highmem);
+
+void rust_helper_flush_dcache_folio(struct folio *folio)
+{
+	flush_dcache_folio(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_flush_dcache_folio);
+
+void *rust_helper_kmap_local_folio(struct folio *folio, size_t offset)
+{
+	return kmap_local_folio(folio, offset);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kmap_local_folio);
+
+void rust_helper_kunmap_local(const void *vaddr)
+{
+	kunmap_local(vaddr);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
+
 void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
 {
 	i_uid_write(inode, uid);
diff --git a/rust/kernel/folio.rs b/rust/kernel/folio.rs
new file mode 100644
index 000000000000..20f51db920e4
--- /dev/null
+++ b/rust/kernel/folio.rs
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Groups of contiguous pages, folios.
+//!
+//! C headers: [`include/linux/mm.h`](srctree/include/linux/mm.h)
+
+use crate::error::{code::*, Result};
+use crate::fs::{self, inode::INode, FileSystem};
+use crate::types::{self, ARef, AlwaysRefCounted, Locked, Opaque, ScopeGuard};
+use core::{cmp::min, marker::PhantomData, ops::Deref, ptr};
+
+/// The type of a [`Folio`] is unspecified.
+pub struct Unspecified;
+
+/// The [`Folio`] instance is a page-cache one.
+pub struct PageCache<T: FileSystem + ?Sized>(PhantomData<T>);
+
+/// A folio.
+///
+/// The `S` type parameter specifies the type of folio.
+///
+/// Wraps the kernel's `struct folio`.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `folio_get` ensures that the
+/// allocation remains valid at least until the matching call to `folio_put`.
+#[repr(transparent)]
+pub struct Folio<S = Unspecified>(pub(crate) Opaque<bindings::folio>, PhantomData<S>);
+
+// SAFETY: The type invariants guarantee that `Folio` is always ref-counted.
+unsafe impl<S> AlwaysRefCounted for Folio<S> {
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::folio_get(self.0.get()) };
+    }
+
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::folio_put(obj.as_ref().0.get()) }
+    }
+}
+
+impl<S> Folio<S> {
+    /// Creates a new folio reference from the given raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that:
+    /// * `ptr` is valid and remains so for the lifetime of the returned reference.
+    /// * The folio has the right state.
+    #[allow(dead_code)]
+    pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::folio) -> &'a Self {
+        // SAFETY: The safety requirements guarantee that the cast below is ok.
+        unsafe { &*ptr.cast::<Self>() }
+    }
+
+    /// Returns the byte position of this folio in its file.
+    pub fn pos(&self) -> fs::Offset {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_pos(self.0.get()) }
+    }
+
+    /// Returns the byte size of this folio.
+    pub fn size(&self) -> usize {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_size(self.0.get()) }
+    }
+
+    /// Flushes the data cache for the pages that make up the folio.
+    pub fn flush_dcache(&self) {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::flush_dcache_folio(self.0.get()) }
+    }
+
+    /// Returns true if the folio is in highmem.
+    pub fn test_highmem(&self) -> bool {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_test_highmem(self.0.get()) }
+    }
+
+    /// Returns whether the folio is up to date.
+    pub fn test_uptodate(&self) -> bool {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_test_uptodate(self.0.get()) }
+    }
+
+    /// Consumes the folio and returns an owned mapped reference.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that the folio is not concurrently mapped for write.
+    pub unsafe fn map_owned(folio: ARef<Self>, offset: usize) -> Result<Mapped<'static, S>> {
+        // SAFETY: The safety requirements of this function satisfy those of `map`.
+        let guard = unsafe { folio.map(offset)? };
+        let to_unmap = guard.page;
+        let data = &guard[0] as *const u8;
+        let data_len = guard.len();
+        core::mem::forget(guard);
+        Ok(Mapped {
+            _folio: folio,
+            to_unmap,
+            data,
+            data_len,
+            _p: PhantomData,
+        })
+    }
+
+    /// Maps the contents of a folio page into a slice.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that the folio is not concurrently mapped for write.
+    pub unsafe fn map(&self, offset: usize) -> Result<MapGuard<'_>> {
+        if offset >= self.size() {
+            return Err(EDOM);
+        }
+
+        let page_index = offset / bindings::PAGE_SIZE;
+        let page_offset = offset % bindings::PAGE_SIZE;
+
+        // SAFETY: We just checked that the index is within bounds of the folio.
+        let page = unsafe { bindings::folio_page(self.0.get(), page_index) };
+
+        // SAFETY: `page` is valid because it was returned by `folio_page` above.
+        let ptr = unsafe { bindings::kmap(page) };
+
+        let size = if self.test_highmem() {
+            bindings::PAGE_SIZE
+        } else {
+            self.size()
+        };
+
+        // SAFETY: We just mapped `ptr`, so it's valid for read.
+        let data = unsafe {
+            core::slice::from_raw_parts(ptr.cast::<u8>().add(page_offset), size - page_offset)
+        };
+        Ok(MapGuard { data, page })
+    }
+}
+
+impl<T: FileSystem + ?Sized> Folio<PageCache<T>> {
+    /// Returns the inode for which this folio holds data.
+    pub fn inode(&self) -> &INode<T> {
+        // SAFETY: The type parameter guarantees that this is a page-cache folio, so host is
+        // populated.
+        unsafe {
+            INode::from_raw((*(*self.0.get()).__bindgen_anon_1.__bindgen_anon_1.mapping).host)
+        }
+    }
+}
+
+/// An owned mapped folio.
+///
+/// That is, a mapped version of a folio that holds a reference to it.
+///
+/// The lifetime is used to tie the mapping to another lifetime, for example, the lifetime of a
+/// lock guard. This allows the mapping to exist only while a lock is held.
+///
+/// # Invariants
+///
+/// `to_unmap` is a mapped page of the folio. The byte range starting at `data` and extending for
+/// `data_len` bytes is within the mapped page.
+pub struct Mapped<'a, S = Unspecified> {
+    _folio: ARef<Folio<S>>,
+    to_unmap: *mut bindings::page,
+    data: *const u8,
+    data_len: usize,
+    _p: PhantomData<&'a ()>,
+}
+
+impl<S> Mapped<'_, S> {
+    /// Limits the length of the mapped region.
+    pub fn cap_len(&mut self, new_len: usize) {
+        if new_len < self.data_len {
+            self.data_len = new_len;
+        }
+    }
+}
+
+impl<S> Deref for Mapped<'_, S> {
+    type Target = [u8];
+
+    fn deref(&self) -> &Self::Target {
+        // SAFETY: By the type invariant, we know that `data` and `data_len` form a valid slice.
+        unsafe { core::slice::from_raw_parts(self.data, self.data_len) }
+    }
+}
+
+impl<S> Drop for Mapped<'_, S> {
+    fn drop(&mut self) {
+        // SAFETY: By the type invariant, we know that `to_unmap` is mapped.
+        unsafe { bindings::kunmap(self.to_unmap) };
+    }
+}
+
+/// A mapped [`Folio`].
+pub struct MapGuard<'a> {
+    data: &'a [u8],
+    page: *mut bindings::page,
+}
+
+impl Deref for MapGuard<'_> {
+    type Target = [u8];
+
+    fn deref(&self) -> &Self::Target {
+        self.data
+    }
+}
+
+impl Drop for MapGuard<'_> {
+    fn drop(&mut self) {
+        // SAFETY: A `MapGuard` instance is only created when `kmap` succeeds, so it's ok to unmap
+        // it when the guard is dropped.
+        unsafe { bindings::kunmap(self.page) };
+    }
+}
+
+// SAFETY: `raw_lock` calls folio_lock, which actually locks the folio.
+unsafe impl<S> types::Lockable for Folio<S> {
+    fn raw_lock(&self) {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_lock(self.0.get()) }
+    }
+
+    unsafe fn unlock(&self) {
+        // SAFETY: The safety requirements guarantee that the folio is locked.
+        unsafe { bindings::folio_unlock(self.0.get()) }
+    }
+}
+
+impl<T: Deref<Target = Folio<S>>, S> Locked<T> {
+    /// Marks the folio as being up to date.
+    pub fn mark_uptodate(&mut self) {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_mark_uptodate(self.deref().0.get()) }
+    }
+
+    /// Runs `cb` with the mapped folio for `len` bytes starting at `offset`.
+    ///
+    /// It may require more than one callback if the folio needs to be mapped one page at a time
+    /// (for example, when in highmem).
+    fn for_each_page(
+        &mut self,
+        offset: usize,
+        len: usize,
+        mut cb: impl FnMut(&mut [u8]) -> Result,
+    ) -> Result {
+        let mut remaining = len;
+        let mut next_offset = offset;
+
+        if self.test_uptodate() {
+            return Err(EIO);
+        }
+
+        // Check that we don't overflow the folio.
+        let end = offset.checked_add(len).ok_or(EDOM)?;
+        if end > self.deref().size() {
+            return Err(EINVAL);
+        }
+
+        while remaining > 0 {
+            let map_size = if self.test_highmem() {
+                bindings::PAGE_SIZE - (next_offset & (bindings::PAGE_SIZE - 1))
+            } else {
+                self.size() - next_offset
+            };
+            let usable = min(remaining, map_size);
+            // SAFETY: The folio is valid because the shared reference implies a non-zero refcount;
+            // `next_offset` is also guaranteed to be less than the folio size.
+            let ptr = unsafe { bindings::kmap_local_folio(self.deref().0.get(), next_offset) };
+
+            // SAFETY: `ptr` was just returned by the `kmap_local_folio` above.
+            let _guard = ScopeGuard::new(|| unsafe { bindings::kunmap_local(ptr) });
+
+            // SAFETY: `kmap_local_folio` maps a whole page so we know it's mapped for at least
+            // `usable` bytes.
+            let s = unsafe { core::slice::from_raw_parts_mut(ptr.cast::<u8>(), usable) };
+            cb(s)?;
+
+            next_offset += usable;
+            remaining -= usable;
+        }
+
+        Ok(())
+    }
+
+    /// Writes the given slice into the folio.
+    pub fn write(&mut self, offset: usize, data: &[u8]) -> Result {
+        let mut remaining = data;
+
+        self.for_each_page(offset, data.len(), |s| {
+            s.copy_from_slice(&remaining[..s.len()]);
+            remaining = &remaining[s.len()..];
+            Ok(())
+        })
+    }
+
+    /// Writes zeroes into the folio.
+    pub fn zero_out(&mut self, offset: usize, len: usize) -> Result {
+        self.for_each_page(offset, len, |s| {
+            s.fill(0);
+            Ok(())
+        })
+    }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 81065d1bd679..445599d4bff6 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -30,6 +30,7 @@
 pub mod block;
 mod build_assert;
 pub mod error;
+pub mod folio;
 pub mod fs;
 pub mod init;
 pub mod ioctl;
-- 
2.34.1



* [RFC PATCH v2 17/30] rust: fs: add empty address space operations
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (15 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 16/30] rust: folio: introduce basic support for folios Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:16 ` [RFC PATCH v2 18/30] rust: fs: introduce `address_space::Operations::read_folio` Wedson Almeida Filho
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

This is in preparation for allowing modules to implement different
address space callbacks, which will be introduced in subsequent patches.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
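A minimal userspace sketch of the pattern this patch prepares for, not
the kernel code itself: a per-type constant table whose callbacks stay
`None` until the trait gains a method to back them. `RawOps`, `HAS_READ`,
`Empty` and `Zeroes` are hypothetical names; in the kernel the `HAS_*`
constants are generated by `#[vtable]` rather than written by hand.

```rust
use std::marker::PhantomData;

/// Stand-in for a C operations struct: every callback is optional.
struct RawOps {
    read: Option<fn(&mut [u8]) -> i32>,
    write: Option<fn(&[u8]) -> i32>,
}

/// Hypothetical trait; `HAS_READ` mimics what `#[vtable]` would generate.
trait Operations {
    const HAS_READ: bool = false;

    fn read(_buf: &mut [u8]) -> i32 {
        -38 // -ENOSYS; never reached through the table when `HAS_READ` is false.
    }
}

/// Carrier type so that each implementer gets its own constant table.
struct Table<T: Operations>(PhantomData<T>);

impl<T: Operations> Table<T> {
    const TABLE: RawOps = RawOps {
        // Only expose the callback if the implementer provides one.
        read: if T::HAS_READ { Some(T::read) } else { None },
        // Not backed by any trait method yet, so always absent.
        write: None,
    };
}

/// Implements nothing: every slot in its table is `None`.
struct Empty;
impl Operations for Empty {}

/// Implements `read`: only that slot is populated.
struct Zeroes;
impl Operations for Zeroes {
    const HAS_READ: bool = true;

    fn read(buf: &mut [u8]) -> i32 {
        buf.fill(0);
        0
    }
}
```

With this shape, `Table::<Empty>::TABLE.read` is `None` while
`Table::<Zeroes>::TABLE.read` is `Some(..)`, so the C side can tell an
unimplemented operation from an implemented one.
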
 rust/kernel/fs.rs               |  1 +
 rust/kernel/fs/address_space.rs | 58 +++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
 create mode 100644 rust/kernel/fs/address_space.rs

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 20fb6107eb4b..f1c1972fabcf 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -13,6 +13,7 @@
 use macros::{pin_data, pinned_drop};
 use sb::SuperBlock;
 
+pub mod address_space;
 pub mod dentry;
 pub mod file;
 pub mod inode;
diff --git a/rust/kernel/fs/address_space.rs b/rust/kernel/fs/address_space.rs
new file mode 100644
index 000000000000..5b4fcb568f46
--- /dev/null
+++ b/rust/kernel/fs/address_space.rs
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! File system address spaces.
+//!
+//! This module allows Rust code to implement address space operations.
+//!
+//! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
+
+use super::FileSystem;
+use crate::bindings;
+use core::marker::PhantomData;
+use macros::vtable;
+
+/// Operations implemented by address spaces.
+#[vtable]
+pub trait Operations {
+    /// File system that these operations are compatible with.
+    type FileSystem: FileSystem + ?Sized;
+}
+
+/// Represents address space operations.
+#[allow(dead_code)]
+pub struct Ops<T: FileSystem + ?Sized>(
+    pub(crate) *const bindings::address_space_operations,
+    pub(crate) PhantomData<T>,
+);
+
+impl<T: FileSystem + ?Sized> Ops<T> {
+    /// Creates the address space operations from a type that implements the [`Operations`] trait.
+    pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
+        struct Table<T: Operations + ?Sized>(PhantomData<T>);
+        impl<T: Operations + ?Sized> Table<T> {
+            const TABLE: bindings::address_space_operations = bindings::address_space_operations {
+                writepage: None,
+                read_folio: None,
+                writepages: None,
+                dirty_folio: None,
+                readahead: None,
+                write_begin: None,
+                write_end: None,
+                bmap: None,
+                invalidate_folio: None,
+                release_folio: None,
+                free_folio: None,
+                direct_IO: None,
+                migrate_folio: None,
+                launder_folio: None,
+                is_partially_uptodate: None,
+                is_dirty_writeback: None,
+                error_remove_folio: None,
+                swap_activate: None,
+                swap_deactivate: None,
+                swap_rw: None,
+            };
+        }
+        Self(&Table::<U>::TABLE, PhantomData)
+    }
+}
-- 
2.34.1



* [RFC PATCH v2 18/30] rust: fs: introduce `address_space::Operations::read_folio`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (16 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 17/30] rust: fs: add empty address space operations Wedson Almeida Filho
@ 2024-05-14 13:16 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 19/30] rust: fs: introduce `FileSystem::read_xattr` Wedson Almeida Filho
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:16 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to create regular file inodes backed by the page
cache. The contents of such files are read into folios via `read_folio`.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
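The copy-then-zero logic that the sample's `read_folio` performs can be
modelled in userspace as below. This is only a sketch: a plain byte
slice stands in for the locked folio, `pos` stands in for `folio.pos()`,
and `fill_folio` is a hypothetical name that does not exist in the
series.

```rust
use std::cmp::min;

/// Fills `folio` (a buffer standing in for a page-cache folio that starts
/// at byte offset `pos` of the file) from `data`, zeroing everything that
/// lies past the end of the file. Returns the number of bytes copied.
fn fill_folio(data: &[u8], pos: usize, folio: &mut [u8]) -> usize {
    let copied = if pos >= data.len() {
        // The folio starts at or beyond EOF: nothing to copy.
        0
    } else {
        let to_copy = min(data.len() - pos, folio.len());
        folio[..to_copy].copy_from_slice(&data[pos..][..to_copy]);
        to_copy
    };

    // The remainder of the folio must not expose stale memory.
    folio[copied..].fill(0);
    copied
}
```

For the sample's `test.txt` (12 bytes), a 16-byte buffer at `pos == 0`
ends up holding the file contents followed by four zero bytes. In the
kernel version the folio is then marked up to date.
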
 rust/helpers.c                  |  6 +++
 rust/kernel/folio.rs            |  1 -
 rust/kernel/fs/address_space.rs | 40 ++++++++++++++++++--
 rust/kernel/fs/file.rs          |  7 ++++
 rust/kernel/fs/inode.rs         | 20 +++++++++-
 samples/rust/rust_rofs.rs       | 67 ++++++++++++++++++++++++++-------
 6 files changed, 122 insertions(+), 19 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index acff58e6caff..2db5df578df2 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -282,6 +282,12 @@ loff_t rust_helper_i_size_read(const struct inode *inode)
 }
 EXPORT_SYMBOL_GPL(rust_helper_i_size_read);
 
+void rust_helper_mapping_set_large_folios(struct address_space *mapping)
+{
+	mapping_set_large_folios(mapping);
+}
+EXPORT_SYMBOL_GPL(rust_helper_mapping_set_large_folios);
+
 unsigned long rust_helper_copy_to_user(void __user *to, const void *from,
 				       unsigned long n)
 {
diff --git a/rust/kernel/folio.rs b/rust/kernel/folio.rs
index 20f51db920e4..077328b733e4 100644
--- a/rust/kernel/folio.rs
+++ b/rust/kernel/folio.rs
@@ -49,7 +49,6 @@ impl<S> Folio<S> {
     /// Callers must ensure that:
     /// * `ptr` is valid and remains so for the lifetime of the returned reference.
     /// * The folio has the right state.
-    #[allow(dead_code)]
     pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::folio) -> &'a Self {
         // SAFETY: The safety requirements guarantee that the cast below is ok.
         unsafe { &*ptr.cast::<Self>() }
diff --git a/rust/kernel/fs/address_space.rs b/rust/kernel/fs/address_space.rs
index 5b4fcb568f46..e539d690235b 100644
--- a/rust/kernel/fs/address_space.rs
+++ b/rust/kernel/fs/address_space.rs
@@ -6,8 +6,9 @@
 //!
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
-use super::FileSystem;
-use crate::bindings;
+use super::{file::File, FileSystem};
+use crate::error::{from_result, Result};
+use crate::{bindings, folio::Folio, folio::PageCache, types::Locked};
 use core::marker::PhantomData;
 use macros::vtable;
 
@@ -16,10 +17,15 @@
 pub trait Operations {
     /// File system that these operations are compatible with.
     type FileSystem: FileSystem + ?Sized;
+
+    /// Reads the contents of the inode into the given folio.
+    fn read_folio(
+        file: Option<&File<Self::FileSystem>>,
+        folio: Locked<&Folio<PageCache<Self::FileSystem>>>,
+    ) -> Result;
 }
 
 /// Represents address space operations.
-#[allow(dead_code)]
 pub struct Ops<T: FileSystem + ?Sized>(
     pub(crate) *const bindings::address_space_operations,
     pub(crate) PhantomData<T>,
@@ -32,7 +38,11 @@ pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
         impl<T: Operations + ?Sized> Table<T> {
             const TABLE: bindings::address_space_operations = bindings::address_space_operations {
                 writepage: None,
-                read_folio: None,
+                read_folio: if T::HAS_READ_FOLIO {
+                    Some(Self::read_folio_callback)
+                } else {
+                    None
+                },
                 writepages: None,
                 dirty_folio: None,
                 readahead: None,
@@ -52,6 +62,28 @@ impl<T: Operations + ?Sized> Table<T> {
                 swap_deactivate: None,
                 swap_rw: None,
             };
+
+            extern "C" fn read_folio_callback(
+                file_ptr: *mut bindings::file,
+                folio_ptr: *mut bindings::folio,
+            ) -> i32 {
+                from_result(|| {
+                    let file = if file_ptr.is_null() {
+                        None
+                    } else {
+                        // SAFETY: The C API guarantees that `file_ptr` is a valid file if non-null.
+                        Some(unsafe { File::from_raw(file_ptr) })
+                    };
+
+                    // SAFETY: The C API guarantees that `folio_ptr` is a valid folio.
+                    let folio = unsafe { Folio::from_raw(folio_ptr) };
+
+                    // SAFETY: The C contract guarantees that the folio is valid and locked, with
+                    // ownership of the lock transferred to the callee (this function).
+                    T::read_folio(file, unsafe { Locked::new(folio) })?;
+                    Ok(0)
+                })
+            }
         }
         Self(&Table::<U>::TABLE, PhantomData)
     }
diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 2ba456a1eee1..0828676eae1c 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -355,6 +355,12 @@ fn read_dir(
 pub struct Ops<T: FileSystem + ?Sized>(pub(crate) *const bindings::file_operations, PhantomData<T>);
 
 impl<T: FileSystem + ?Sized> Ops<T> {
+    /// Returns file operations for page-cache-based ro files.
+    pub fn generic_ro_file() -> Self {
+        // SAFETY: This is a constant in C, it never changes.
+        Self(unsafe { &bindings::generic_ro_fops }, PhantomData)
+    }
+
     /// Creates file operations from a type that implements the [`Operations`] trait.
     pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
         struct Table<T: Operations + ?Sized>(PhantomData<T>);
@@ -516,6 +522,7 @@ impl From<inode::Type> for DirEntryType {
     fn from(value: inode::Type) -> Self {
         match value {
             inode::Type::Dir => DirEntryType::Dir,
+            inode::Type::Reg => DirEntryType::Reg,
         }
     }
 }
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index c314d036c87e..1a41c824d30d 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -6,7 +6,9 @@
 //!
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
-use super::{dentry, dentry::DEntry, file, sb::SuperBlock, FileSystem, Offset, UnspecifiedFS};
+use super::{
+    address_space, dentry, dentry::DEntry, file, sb::SuperBlock, FileSystem, Offset, UnspecifiedFS,
+};
 use crate::error::{code::*, Result};
 use crate::types::{ARef, AlwaysRefCounted, Lockable, Locked, Opaque};
 use crate::{bindings, block, time::Timespec};
@@ -127,6 +129,11 @@ pub fn init(mut self, params: Params) -> Result<ARef<INode<T>>> {
         let inode = unsafe { self.0.as_mut() };
         let mode = match params.typ {
             Type::Dir => bindings::S_IFDIR,
+            Type::Reg => {
+                // SAFETY: The `i_mapping` pointer doesn't change and is valid.
+                unsafe { bindings::mapping_set_large_folios(inode.i_mapping) };
+                bindings::S_IFREG
+            }
         };
 
         inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
@@ -166,6 +173,14 @@ pub fn set_fops(&mut self, fops: file::Ops<T>) -> &mut Self {
         inode.__bindgen_anon_3.i_fop = fops.0;
         self
     }
+
+    /// Sets the address space operations on this new inode.
+    pub fn set_aops(&mut self, aops: address_space::Ops<T>) -> &mut Self {
+        // SAFETY: By the type invariants, it's ok to modify the inode.
+        let inode = unsafe { self.0.as_mut() };
+        inode.i_data.a_ops = aops.0;
+        self
+    }
 }
 
 impl<T: FileSystem + ?Sized> Drop for New<T> {
@@ -181,6 +196,9 @@ fn drop(&mut self) {
 pub enum Type {
     /// Directory type.
     Dir,
+
+    /// Regular file type.
+    Reg,
 }
 
 /// Required inode parameters.
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 2a87e524e0e1..8005fd14b2e1 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -3,10 +3,11 @@
 //! Rust read-only file system sample.
 
 use kernel::fs::{
-    dentry, dentry::DEntry, file, file::File, inode, inode::INode, sb::SuperBlock, Offset,
+    address_space, dentry, dentry::DEntry, file, file::File, inode, inode::INode, sb, Offset,
 };
 use kernel::prelude::*;
-use kernel::{c_str, fs, time::UNIX_EPOCH, types::ARef, types::Either, types::Locked, user};
+use kernel::types::{ARef, Either, Locked};
+use kernel::{c_str, folio::Folio, folio::PageCache, fs, time::UNIX_EPOCH, user};
 
 kernel::module_fs! {
     type: RoFs,
@@ -20,6 +21,7 @@ struct Entry {
     name: &'static [u8],
     ino: u64,
     etype: inode::Type,
+    contents: &'static [u8],
 }
 
 const ENTRIES: [Entry; 3] = [
@@ -27,41 +29,53 @@ struct Entry {
         name: b".",
         ino: 1,
         etype: inode::Type::Dir,
+        contents: b"",
     },
     Entry {
         name: b"..",
         ino: 1,
         etype: inode::Type::Dir,
+        contents: b"",
     },
     Entry {
-        name: b"subdir",
+        name: b"test.txt",
         ino: 2,
-        etype: inode::Type::Dir,
+        etype: inode::Type::Reg,
+        contents: b"hello world\n",
     },
 ];
 
 const DIR_FOPS: file::Ops<RoFs> = file::Ops::new::<RoFs>();
 const DIR_IOPS: inode::Ops<RoFs> = inode::Ops::new::<RoFs>();
+const FILE_AOPS: address_space::Ops<RoFs> = address_space::Ops::new::<RoFs>();
 
 struct RoFs;
 
 impl RoFs {
-    fn iget(sb: &SuperBlock<Self>, e: &'static Entry) -> Result<ARef<INode<Self>>> {
+    fn iget(sb: &sb::SuperBlock<Self>, e: &'static Entry) -> Result<ARef<INode<Self>>> {
         let mut new = match sb.get_or_create_inode(e.ino)? {
             Either::Left(existing) => return Ok(existing),
             Either::Right(new) => new,
         };
 
-        match e.etype {
-            inode::Type::Dir => new.set_iops(DIR_IOPS).set_fops(DIR_FOPS),
+        let (mode, nlink, size) = match e.etype {
+            inode::Type::Dir => {
+                new.set_iops(DIR_IOPS).set_fops(DIR_FOPS);
+                (0o555, 2, ENTRIES.len().try_into()?)
+            }
+            inode::Type::Reg => {
+                new.set_fops(file::Ops::generic_ro_file())
+                    .set_aops(FILE_AOPS);
+                (0o444, 1, e.contents.len().try_into()?)
+            }
         };
 
         new.init(inode::Params {
             typ: e.etype,
-            mode: 0o555,
-            size: ENTRIES.len().try_into()?,
-            blocks: 1,
-            nlink: 2,
+            mode,
+            size,
+            blocks: (u64::try_from(size)? + 511) / 512,
+            nlink,
             uid: 0,
             gid: 0,
             atime: UNIX_EPOCH,
@@ -74,12 +88,12 @@ fn iget(sb: &SuperBlock<Self>, e: &'static Entry) -> Result<ARef<INode<Self>>> {
 impl fs::FileSystem for RoFs {
     const NAME: &'static CStr = c_str!("rust_rofs");
 
-    fn fill_super(sb: &mut SuperBlock<Self>) -> Result {
+    fn fill_super(sb: &mut sb::SuperBlock<Self>) -> Result {
         sb.set_magic(0x52555354);
         Ok(())
     }
 
-    fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
+    fn init_root(sb: &sb::SuperBlock<Self>) -> Result<dentry::Root<Self>> {
         let inode = Self::iget(sb, &ENTRIES[0])?;
         dentry::Root::try_new(inode)
     }
@@ -109,6 +123,33 @@ fn lookup(
     }
 }
 
+#[vtable]
+impl address_space::Operations for RoFs {
+    type FileSystem = Self;
+
+    fn read_folio(_: Option<&File<Self>>, mut folio: Locked<&Folio<PageCache<Self>>>) -> Result {
+        let data = match folio.inode().ino() {
+            2 => ENTRIES[2].contents,
+            _ => return Err(EINVAL),
+        };
+
+        let pos = usize::try_from(folio.pos()).unwrap_or(usize::MAX);
+        let copied = if pos >= data.len() {
+            0
+        } else {
+            let to_copy = core::cmp::min(data.len() - pos, folio.size());
+            folio.write(0, &data[pos..][..to_copy])?;
+            to_copy
+        };
+
+        folio.zero_out(copied, folio.size() - copied)?;
+        folio.mark_uptodate();
+        folio.flush_dcache();
+
+        Ok(())
+    }
+}
+
 #[vtable]
 impl file::Operations for RoFs {
     type FileSystem = Self;
-- 
2.34.1



* [RFC PATCH v2 19/30] rust: fs: introduce `FileSystem::read_xattr`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (17 preceding siblings ...)
  2024-05-14 13:16 ` [RFC PATCH v2 18/30] rust: fs: introduce `address_space::Operations::read_folio` Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 20/30] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to expose xattrs associated with inodes.
`overlayfs` uses an xattr to indicate that a directory is opaque (i.e.,
that lower layers should not be looked up). The planned file systems
need to support opaque directories, so they must be able to implement
this.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
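To illustrate the return-value contract documented for `read_xattr`
(bytes written, or the required size if the buffer is too small), here
is a userspace sketch. `copy_xattr`, `XattrError` and the `&str` name
are assumptions made for the example; the kernel version takes a
`&CStr` and returns a kernel `Result`. The attribute shown is the one
overlayfs uses for opaque directories.

```rust
/// Hypothetical error type standing in for the kernel's `ENODATA`.
#[derive(Debug, PartialEq)]
enum XattrError {
    NoData,
}

/// Copies `value` into `outbuf` if it fits. Either way, returns the full
/// length of the attribute, so a caller with a short (or empty) buffer
/// learns how much space is needed.
fn copy_xattr(value: &[u8], outbuf: &mut [u8]) -> usize {
    if let Some(dst) = outbuf.get_mut(..value.len()) {
        dst.copy_from_slice(value);
    }
    value.len()
}

/// Sketch of a file system that marks every directory as opaque.
fn read_xattr(name: &str, outbuf: &mut [u8]) -> Result<usize, XattrError> {
    match name {
        "trusted.overlay.opaque" => Ok(copy_xattr(b"y", outbuf)),
        _ => Err(XattrError::NoData),
    }
}
```

An empty `outbuf` models the size query, which the callback in this
patch maps from a null `buffer` to a zero-length slice.
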
 rust/bindings/bindings_helper.h |  1 +
 rust/kernel/error.rs            |  2 ++
 rust/kernel/fs.rs               | 59 +++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index fd22b1eafb1d..2133f95e8be5 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -22,6 +22,7 @@
 #include <linux/slab.h>
 #include <linux/wait.h>
 #include <linux/workqueue.h>
+#include <linux/xattr.h>
 
 /* `bindgen` gets confused at certain things. */
 const size_t RUST_CONST_HELPER_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 15628d2fa3b2..f40a2bdf28d4 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -77,6 +77,8 @@ macro_rules! declare_err {
     declare_err!(EIOCBQUEUED, "iocb queued, will get completion event.");
     declare_err!(ERECALLCONFLICT, "Conflict with recalled state.");
     declare_err!(ENOGRACE, "NFS file lock reclaim refused.");
+    declare_err!(ENODATA, "No data available.");
+    declare_err!(EOPNOTSUPP, "Operation not supported on transport endpoint.");
     declare_err!(ESTALE, "Stale file handle.");
     declare_err!(EUCLEAN, "Structure needs cleaning.");
 }
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index f1c1972fabcf..5b8f9c346767 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -10,6 +10,8 @@
 use crate::types::Opaque;
 use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
 use core::{ffi, marker::PhantomData, mem::ManuallyDrop, pin::Pin, ptr};
+use dentry::DEntry;
+use inode::INode;
 use macros::{pin_data, pinned_drop};
 use sb::SuperBlock;
 
@@ -46,6 +48,19 @@ pub trait FileSystem {
     /// This is called during initialisation of a superblock after [`FileSystem::fill_super`] has
     /// completed successfully.
     fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>>;
+
+    /// Reads an xattr.
+    ///
+    /// Returns the number of bytes written to `outbuf`. If it is too small, returns the number of
+    /// bytes needed to hold the attribute.
+    fn read_xattr(
+        _dentry: &DEntry<Self>,
+        _inode: &INode<Self>,
+        _name: &CStr,
+        _outbuf: &mut [u8],
+    ) -> Result<usize> {
+        Err(EOPNOTSUPP)
+    }
 }
 
 /// A file system that is unspecified.
@@ -162,6 +177,7 @@ impl<T: FileSystem + ?Sized> Tables<T> {
             // derived, is valid for write.
             let sb = unsafe { &mut *new_sb.0.get() };
             sb.s_op = &Tables::<T>::SUPER_BLOCK;
+            sb.s_xattr = &Tables::<T>::XATTR_HANDLERS[0];
             sb.s_flags |= bindings::SB_RDONLY;
 
             T::fill_super(new_sb)?;
@@ -214,6 +230,49 @@ impl<T: FileSystem + ?Sized> Tables<T> {
         free_cached_objects: None,
         shutdown: None,
     };
+
+    const XATTR_HANDLERS: [*const bindings::xattr_handler; 2] = [&Self::XATTR_HANDLER, ptr::null()];
+
+    const XATTR_HANDLER: bindings::xattr_handler = bindings::xattr_handler {
+        name: ptr::null(),
+        prefix: crate::c_str!("").as_char_ptr(),
+        flags: 0,
+        list: None,
+        get: Some(Self::xattr_get_callback),
+        set: None,
+    };
+
+    unsafe extern "C" fn xattr_get_callback(
+        _handler: *const bindings::xattr_handler,
+        dentry_ptr: *mut bindings::dentry,
+        inode_ptr: *mut bindings::inode,
+        name: *const ffi::c_char,
+        buffer: *mut ffi::c_void,
+        size: usize,
+    ) -> ffi::c_int {
+        from_result(|| {
+            // SAFETY: The C API guarantees that `dentry_ptr` is a valid dentry.
+            let dentry = unsafe { DEntry::from_raw(dentry_ptr) };
+
+            // SAFETY: The C API guarantees that `inode_ptr` is a valid inode.
+            let inode = unsafe { INode::from_raw(inode_ptr) };
+
+            // SAFETY: The C API guarantees that `name` is a valid null-terminated string. It
+            // also guarantees that it's valid for the duration of the callback.
+            let name = unsafe { CStr::from_char_ptr(name) };
+
+            let (buf_ptr, size) = if buffer.is_null() {
+                (ptr::NonNull::dangling().as_ptr(), 0)
+            } else {
+                (buffer.cast::<u8>(), size)
+            };
+
+            // SAFETY: The C API guarantees that `buffer` is at least `size` bytes in length.
+            let buf = unsafe { core::slice::from_raw_parts_mut(buf_ptr, size) };
+            let len = T::read_xattr(dentry, inode, name, buf)?;
+            Ok(len.try_into()?)
+        })
+    }
 }
 
 /// Kernel module that exposes a single file system implemented by `T`.
-- 
2.34.1



* [RFC PATCH v2 20/30] rust: fs: introduce `FileSystem::statfs`
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (18 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 19/30] rust: fs: introduce `FileSystem::read_xattr` Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 21/30] rust: fs: introduce more inode types Wedson Almeida Filho
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to expose their stats. `overlayfs` requires that
this be implemented by all file systems that are part of an overlay.
The planned file systems need to be overlaid with overlayfs, so they
must be able to implement this.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
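As a rough idea of what a `statfs` implementation might return, the
userspace sketch below builds a `Stat`-shaped value for a read-only
file system with a fixed set of files. The struct copies the field
names from this patch; `BSIZE`, the name length limit and `stat_for`
are assumptions made for the example.

```rust
/// Userspace copy of the `Stat` fields introduced by this patch.
#[derive(Debug, PartialEq)]
struct Stat {
    magic: usize,
    namelen: isize,
    bsize: isize,
    files: u64,
    blocks: u64,
}

/// Assumed block size for the illustration.
const BSIZE: u64 = 4096;

/// Builds stats for a file system whose files have the given sizes in bytes.
fn stat_for(file_sizes: &[u64]) -> Stat {
    Stat {
        // Same value the rust_rofs sample passes to `set_magic`.
        magic: 0x52555354,
        // Assumed limit (NAME_MAX on Linux).
        namelen: 255,
        bsize: BSIZE as isize,
        files: file_sizes.len() as u64,
        // Each file occupies a whole number of blocks, rounded up.
        blocks: file_sizes.iter().map(|s| (s + BSIZE - 1) / BSIZE).sum(),
    }
}
```

For files of 12, 4097 and 0 bytes this reports three files occupying
three blocks in total (one, two and zero respectively).
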
 rust/bindings/bindings_helper.h |  1 +
 rust/kernel/error.rs            |  1 +
 rust/kernel/fs.rs               | 47 ++++++++++++++++++++++++++++++++-
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 2133f95e8be5..f4c7c3951dbe 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -20,6 +20,7 @@
 #include <linux/refcount.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/statfs.h>
 #include <linux/wait.h>
 #include <linux/workqueue.h>
 #include <linux/xattr.h>
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index f40a2bdf28d4..edada157879a 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -79,6 +79,7 @@ macro_rules! declare_err {
     declare_err!(ENOGRACE, "NFS file lock reclaim refused.");
     declare_err!(ENODATA, "No data available.");
     declare_err!(EOPNOTSUPP, "Operation not supported on transport endpoint.");
+    declare_err!(ENOSYS, "Invalid system call number.");
     declare_err!(ESTALE, "Stale file handle.");
     declare_err!(EUCLEAN, "Structure needs cleaning.");
 }
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 5b8f9c346767..51de73008857 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -61,6 +61,31 @@ fn read_xattr(
     ) -> Result<usize> {
         Err(EOPNOTSUPP)
     }
+
+    /// Get filesystem statistics.
+    fn statfs(_dentry: &DEntry<Self>) -> Result<Stat> {
+        Err(ENOSYS)
+    }
+}
+
+/// File system stats.
+///
+/// A subset of C's `kstatfs`.
+pub struct Stat {
+    /// Magic number of the file system.
+    pub magic: usize,
+
+    /// The maximum length of a file name.
+    pub namelen: isize,
+
+    /// Block size.
+    pub bsize: isize,
+
+    /// Number of files in the file system.
+    pub files: u64,
+
+    /// Number of blocks in the file system.
+    pub blocks: u64,
 }
 
 /// A file system that is unspecified.
@@ -213,7 +238,7 @@ impl<T: FileSystem + ?Sized> Tables<T> {
         freeze_fs: None,
         thaw_super: None,
         unfreeze_fs: None,
-        statfs: None,
+        statfs: Some(Self::statfs_callback),
         remount_fs: None,
         umount_begin: None,
         show_options: None,
@@ -231,6 +256,26 @@ impl<T: FileSystem + ?Sized> Tables<T> {
         shutdown: None,
     };
 
+    unsafe extern "C" fn statfs_callback(
+        dentry_ptr: *mut bindings::dentry,
+        buf: *mut bindings::kstatfs,
+    ) -> ffi::c_int {
+        from_result(|| {
+            // SAFETY: The C API guarantees that `dentry_ptr` is a valid dentry.
+            let dentry = unsafe { DEntry::from_raw(dentry_ptr) };
+            let s = T::statfs(dentry)?;
+
+            // SAFETY: The C API guarantees that `buf` is valid for read and write.
+            let buf = unsafe { &mut *buf };
+            buf.f_type = s.magic as ffi::c_long;
+            buf.f_namelen = s.namelen as ffi::c_long;
+            buf.f_bsize = s.bsize as ffi::c_long;
+            buf.f_files = s.files;
+            buf.f_blocks = s.blocks;
+            Ok(0)
+        })
+    }
+
     const XATTR_HANDLERS: [*const bindings::xattr_handler; 2] = [&Self::XATTR_HANDLER, ptr::null()];
 
     const XATTR_HANDLER: bindings::xattr_handler = bindings::xattr_handler {
-- 
2.34.1



* [RFC PATCH v2 21/30] rust: fs: introduce more inode types
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (19 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 20/30] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 22/30] rust: fs: add per-superblock data Wedson Almeida Filho
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file system modules to create inodes that are symlinks,
pipes, sockets, char devices and block devices (in addition to the
already-supported directories and regular files).

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
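For orientation, the userspace sketch below shows how each inode type
could map to its `S_IF*` file-type bits and, for device nodes, to a
device number. The variant names follow the `inode::Type` arms visible
in this patch; the `u32` payloads, `mkdev` and `mode_and_rdev` are
assumptions for the example and not the kernel implementation.

```rust
/// Variant names follow the patch; the payload types are assumed to be
/// (major, minor).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Type {
    Fifo,
    Chr(u32, u32),
    Dir,
    Blk(u32, u32),
    Reg,
    Lnk,
    Sock,
}

// File-type bits as defined in `include/uapi/linux/stat.h`.
const S_IFIFO: u32 = 0o010000;
const S_IFCHR: u32 = 0o020000;
const S_IFDIR: u32 = 0o040000;
const S_IFBLK: u32 = 0o060000;
const S_IFREG: u32 = 0o100000;
const S_IFLNK: u32 = 0o120000;
const S_IFSOCK: u32 = 0o140000;

/// Mirrors the kernel-internal `MKDEV`: the major number sits above a
/// 20-bit minor number.
fn mkdev(major: u32, minor: u32) -> u32 {
    (major << 20) | minor
}

/// Returns the file-type bits and the device number (0 for non-devices).
fn mode_and_rdev(typ: Type) -> (u32, u32) {
    match typ {
        Type::Fifo => (S_IFIFO, 0),
        Type::Chr(major, minor) => (S_IFCHR, mkdev(major, minor)),
        Type::Dir => (S_IFDIR, 0),
        Type::Blk(major, minor) => (S_IFBLK, mkdev(major, minor)),
        Type::Reg => (S_IFREG, 0),
        Type::Lnk => (S_IFLNK, 0),
        Type::Sock => (S_IFSOCK, 0),
    }
}
```

For example, a character device with major 1 and minor 3 (the numbers
used by /dev/null) yields `S_IFCHR` and the device number 0x100003.
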
 rust/helpers.c            |  13 ++++
 rust/kernel/fs/file.rs    |   5 ++
 rust/kernel/fs/inode.rs   | 131 +++++++++++++++++++++++++++++++++++++-
 samples/rust/rust_rofs.rs |  43 ++++++++++++-
 4 files changed, 187 insertions(+), 5 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index 2db5df578df2..360a1d38ac19 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -288,6 +288,12 @@ void rust_helper_mapping_set_large_folios(struct address_space *mapping)
 }
 EXPORT_SYMBOL_GPL(rust_helper_mapping_set_large_folios);
 
+unsigned int rust_helper_MKDEV(unsigned int major, unsigned int minor)
+{
+	return MKDEV(major, minor);
+}
+EXPORT_SYMBOL_GPL(rust_helper_MKDEV);
+
 unsigned long rust_helper_copy_to_user(void __user *to, const void *from,
 				       unsigned long n)
 {
@@ -307,6 +313,13 @@ void rust_helper_inode_unlock_shared(struct inode *inode)
 }
 EXPORT_SYMBOL_GPL(rust_helper_inode_unlock_shared);
 
+void rust_helper_set_delayed_call(struct delayed_call *call,
+				  void (*fn)(void *), void *arg)
+{
+	set_delayed_call(call, fn, arg);
+}
+EXPORT_SYMBOL_GPL(rust_helper_set_delayed_call);
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 0828676eae1c..a819724b75f8 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -521,8 +521,13 @@ pub enum DirEntryType {
 impl From<inode::Type> for DirEntryType {
     fn from(value: inode::Type) -> Self {
         match value {
+            inode::Type::Fifo => DirEntryType::Fifo,
+            inode::Type::Chr(_, _) => DirEntryType::Chr,
             inode::Type::Dir => DirEntryType::Dir,
+            inode::Type::Blk(_, _) => DirEntryType::Blk,
             inode::Type::Reg => DirEntryType::Reg,
+            inode::Type::Lnk => DirEntryType::Lnk,
+            inode::Type::Sock => DirEntryType::Sock,
         }
     }
 }
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index 1a41c824d30d..75b68d697a6e 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -10,8 +10,8 @@
     address_space, dentry, dentry::DEntry, file, sb::SuperBlock, FileSystem, Offset, UnspecifiedFS,
 };
 use crate::error::{code::*, Result};
-use crate::types::{ARef, AlwaysRefCounted, Lockable, Locked, Opaque};
-use crate::{bindings, block, time::Timespec};
+use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Lockable, Locked, Opaque};
+use crate::{bindings, block, str::CStr, str::CString, time::Timespec};
 use core::mem::ManuallyDrop;
 use core::{marker::PhantomData, ptr};
 use macros::vtable;
@@ -25,6 +25,18 @@ pub trait Operations {
     /// File system that these operations are compatible with.
     type FileSystem: FileSystem + ?Sized;
 
+    /// Returns the string that represents the name of the file a symbolic link inode points to.
+    ///
+    /// When `dentry` is `None`, `get_link` is called with the RCU read-side lock held, so it may
+    /// not sleep. In that case, implementations that need to sleep must return `Err(ECHILD)` so
+    /// that `get_link` is called again without the RCU lock held.
+    fn get_link<'a>(
+        _dentry: Option<&DEntry<Self::FileSystem>>,
+        _inode: &'a INode<Self::FileSystem>,
+    ) -> Result<Either<CString, &'a CStr>> {
+        Err(ENOTSUPP)
+    }
+
     /// Returns the inode corresponding to the directory entry with the given name.
     fn lookup(
         _parent: &Locked<&INode<Self::FileSystem>, ReadSem>,
@@ -134,6 +146,52 @@ pub fn init(mut self, params: Params) -> Result<ARef<INode<T>>> {
                 unsafe { bindings::mapping_set_large_folios(inode.i_mapping) };
                 bindings::S_IFREG
             }
+            Type::Lnk => {
+                // If we are using `page_get_link`, we need to prevent the use of high mem.
+                if !inode.i_op.is_null() {
+                    // SAFETY: We just checked that `i_op` is non-null, and we always just set it
+                    // to valid values.
+                    if unsafe {
+                        (*inode.i_op).get_link == bindings::page_symlink_inode_operations.get_link
+                    } {
+                        // SAFETY: `inode` is valid for write as it's a new inode.
+                        unsafe { bindings::inode_nohighmem(inode) };
+                    }
+                }
+                bindings::S_IFLNK
+            }
+            Type::Fifo => {
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe { bindings::init_special_inode(inode, bindings::S_IFIFO as _, 0) };
+                bindings::S_IFIFO
+            }
+            Type::Sock => {
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe { bindings::init_special_inode(inode, bindings::S_IFSOCK as _, 0) };
+                bindings::S_IFSOCK
+            }
+            Type::Chr(major, minor) => {
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe {
+                    bindings::init_special_inode(
+                        inode,
+                        bindings::S_IFCHR as _,
+                        bindings::MKDEV(major, minor & bindings::MINORMASK),
+                    )
+                };
+                bindings::S_IFCHR
+            }
+            Type::Blk(major, minor) => {
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe {
+                    bindings::init_special_inode(
+                        inode,
+                        bindings::S_IFBLK as _,
+                        bindings::MKDEV(major, minor & bindings::MINORMASK),
+                    )
+                };
+                bindings::S_IFBLK
+            }
         };
 
         inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
@@ -194,11 +252,26 @@ fn drop(&mut self) {
 /// The type of an inode.
 #[derive(Copy, Clone)]
 pub enum Type {
+    /// Named pipe (first-in, first-out) type.
+    Fifo,
+
+    /// Character device type.
+    Chr(u32, u32),
+
     /// Directory type.
     Dir,
 
+    /// Block device type.
+    Blk(u32, u32),
+
     /// Regular file type.
     Reg,
+
+    /// Symbolic link type.
+    Lnk,
+
+    /// Named Unix-domain socket type.
+    Sock,
 }
 
 /// Required inode parameters.
@@ -245,6 +318,15 @@ pub struct Params {
 pub struct Ops<T: FileSystem + ?Sized>(*const bindings::inode_operations, PhantomData<T>);
 
 impl<T: FileSystem + ?Sized> Ops<T> {
+    /// Returns inode operations for symbolic links that are stored in a single page.
+    pub fn page_symlink_inode() -> Self {
+        // SAFETY: This is a constant in C, it never changes.
+        Self(
+            unsafe { &bindings::page_symlink_inode_operations },
+            PhantomData,
+        )
+    }
+
     /// Creates the inode operations from a type that implements the [`Operations`] trait.
     pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
         struct Table<T: Operations + ?Sized>(PhantomData<T>);
@@ -255,7 +337,11 @@ impl<T: Operations + ?Sized> Table<T> {
                 } else {
                     None
                 },
-                get_link: None,
+                get_link: if T::HAS_GET_LINK {
+                    Some(Self::get_link_callback)
+                } else {
+                    None
+                },
                 permission: None,
                 get_inode_acl: None,
                 readlink: None,
@@ -303,6 +389,45 @@ extern "C" fn lookup_callback(
                     Ok(Some(ret)) => ManuallyDrop::new(ret).0.get(),
                 }
             }
+
+            extern "C" fn get_link_callback(
+                dentry_ptr: *mut bindings::dentry,
+                inode_ptr: *mut bindings::inode,
+                delayed_call: *mut bindings::delayed_call,
+            ) -> *const core::ffi::c_char {
+                extern "C" fn drop_cstring(ptr: *mut core::ffi::c_void) {
+                    // SAFETY: The argument came from a previous call to `into_foreign` below.
+                    unsafe { CString::from_foreign(ptr) };
+                }
+
+                let dentry = if dentry_ptr.is_null() {
+                    None
+                } else {
+                    // SAFETY: The C API guarantees that `dentry_ptr` is a valid dentry when it's
+                    // non-null.
+                    Some(unsafe { DEntry::from_raw(dentry_ptr) })
+                };
+
+                // SAFETY: The C API guarantees that `inode_ptr` is a valid inode.
+                let inode = unsafe { INode::from_raw(inode_ptr) };
+
+                match T::get_link(dentry, inode) {
+                    Err(e) => e.to_ptr::<core::ffi::c_char>(),
+                    Ok(Either::Right(str)) => str.as_char_ptr(),
+                    Ok(Either::Left(str)) => {
+                        let ptr = str.into_foreign();
+                        unsafe {
+                            bindings::set_delayed_call(
+                                delayed_call,
+                                Some(drop_cstring),
+                                ptr.cast_mut(),
+                            )
+                        };
+
+                        ptr.cast::<core::ffi::c_char>()
+                    }
+                }
+            }
         }
         Self(&Table::<U>::TABLE, PhantomData)
     }
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 8005fd14b2e1..7a09e2db878d 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -7,7 +7,7 @@
 };
 use kernel::prelude::*;
 use kernel::types::{ARef, Either, Locked};
-use kernel::{c_str, folio::Folio, folio::PageCache, fs, time::UNIX_EPOCH, user};
+use kernel::{c_str, folio::Folio, folio::PageCache, fs, str::CString, time::UNIX_EPOCH, user};
 
 kernel::module_fs! {
     type: RoFs,
@@ -24,7 +24,7 @@ struct Entry {
     contents: &'static [u8],
 }
 
-const ENTRIES: [Entry; 3] = [
+const ENTRIES: [Entry; 4] = [
     Entry {
         name: b".",
         ino: 1,
@@ -43,11 +43,18 @@ struct Entry {
         etype: inode::Type::Reg,
         contents: b"hello world\n",
     },
+    Entry {
+        name: b"link.txt",
+        ino: 3,
+        etype: inode::Type::Lnk,
+        contents: b"./test.txt",
+    },
 ];
 
 const DIR_FOPS: file::Ops<RoFs> = file::Ops::new::<RoFs>();
 const DIR_IOPS: inode::Ops<RoFs> = inode::Ops::new::<RoFs>();
 const FILE_AOPS: address_space::Ops<RoFs> = address_space::Ops::new::<RoFs>();
+const LNK_IOPS: inode::Ops<RoFs> = inode::Ops::new::<Link>();
 
 struct RoFs;
 
@@ -68,6 +75,11 @@ fn iget(sb: &sb::SuperBlock<Self>, e: &'static Entry) -> Result<ARef<INode<Self>
                     .set_aops(FILE_AOPS);
                 (0o444, 1, e.contents.len().try_into()?)
             }
+            inode::Type::Lnk => {
+                new.set_iops(LNK_IOPS);
+                (0o444, 1, e.contents.len().try_into()?)
+            }
+            _ => return Err(ENOENT),
         };
 
         new.init(inode::Params {
@@ -123,6 +135,33 @@ fn lookup(
     }
 }
 
+struct Link;
+#[vtable]
+impl inode::Operations for Link {
+    type FileSystem = RoFs;
+
+    fn get_link<'a>(
+        dentry: Option<&DEntry<RoFs>>,
+        inode: &'a INode<RoFs>,
+    ) -> Result<Either<CString, &'a CStr>> {
+        if dentry.is_none() {
+            return Err(ECHILD);
+        }
+
+        let name_buf = match inode.ino() {
+            3 => ENTRIES[3].contents,
+            _ => return Err(EINVAL),
+        };
+        let mut name = Box::new_slice(
+            name_buf.len().checked_add(1).ok_or(ENOMEM)?,
+            b'\0',
+            GFP_NOFS,
+        )?;
+        name[..name_buf.len()].copy_from_slice(name_buf);
+        Ok(Either::Left(name.try_into()?))
+    }
+}
+
 #[vtable]
 impl address_space::Operations for RoFs {
     type FileSystem = Self;
-- 
2.34.1



* [RFC PATCH v2 22/30] rust: fs: add per-superblock data
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (20 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 21/30] rust: fs: introduce more inode types Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 23/30] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to associate [typed] data with super blocks when
they're created. Since we only have a pointer-sized field in which to
store the state, it must implement the `ForeignOwnable` trait.
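
The typestate idea can be sketched in plain Rust outside the kernel. This is
only a model under stated assumptions: `Box` stands in for `ForeignOwnable`,
the `fs_info` field stands in for `s_fs_info`, and the `new`, `attach`, and
`magic` methods are hypothetical names that do not exist in the patch. The
point it illustrates is that `data()` is only callable once the typestate
says the pointer has been initialised.

```rust
use std::marker::PhantomData;
use std::ptr;

/// Typestate: the superblock is still being initialised.
pub struct New;
/// Typestate: the superblock has its data attached.
pub struct Ready;

/// Marker for typestates in which the data pointer is initialised.
pub trait DataInited {}
impl DataInited for Ready {}

/// Hypothetical stand-in for a superblock with one pointer-sized data slot.
pub struct SuperBlock<D, S = Ready> {
    magic: usize,
    fs_info: *mut D,
    _state: PhantomData<S>,
}

impl<D> SuperBlock<D, New> {
    /// Creates a superblock with no data attached (null pointer).
    pub fn new() -> Self {
        Self { magic: 0, fs_info: ptr::null_mut(), _state: PhantomData }
    }

    /// Only a new superblock may have its magic number changed.
    pub fn set_magic(&mut self, magic: usize) -> &mut Self {
        self.magic = magic;
        self
    }

    /// Stores the data as a raw pointer and moves to the `Ready` state.
    pub fn attach(self, data: Box<D>) -> SuperBlock<D, Ready> {
        SuperBlock::<D, Ready> {
            magic: self.magic,
            fs_info: Box::into_raw(data),
            _state: PhantomData,
        }
    }
}

impl<D, S> SuperBlock<D, S> {
    /// Reading the magic number is allowed in every state.
    pub fn magic(&self) -> usize {
        self.magic
    }
}

impl<D, S: DataInited> SuperBlock<D, S> {
    /// Borrows the attached data; not callable on `SuperBlock<D, New>`.
    pub fn data(&self) -> &D {
        // SAFETY: `DataInited` states are only produced by `attach`, which
        // stores a pointer obtained from `Box::into_raw`.
        unsafe { &*self.fs_info }
    }
}

impl<D, S> Drop for SuperBlock<D, S> {
    fn drop(&mut self) {
        // The pointer is null if the data was never attached.
        if !self.fs_info.is_null() {
            // SAFETY: A non-null pointer always comes from `Box::into_raw`.
            drop(unsafe { Box::from_raw(self.fs_info) });
        }
    }
}
```

Calling `data()` on a `SuperBlock<D, New>` is rejected at compile time, which
is the property the `DataInited` bound is meant to provide.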

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs.rs         | 46 ++++++++++++++++++++++++++++---------
 rust/kernel/fs/sb.rs      | 48 +++++++++++++++++++++++++++++++++++----
 samples/rust/rust_rofs.rs |  3 ++-
 3 files changed, 81 insertions(+), 16 deletions(-)

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 51de73008857..387e87e3edaf 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -7,7 +7,7 @@
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
 use crate::error::{code::*, from_result, to_result, Error, Result};
-use crate::types::Opaque;
+use crate::types::{ForeignOwnable, Opaque};
 use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
 use core::{ffi, marker::PhantomData, mem::ManuallyDrop, pin::Pin, ptr};
 use dentry::DEntry;
@@ -31,6 +31,9 @@
 
 /// A file system type.
 pub trait FileSystem {
+    /// Data associated with each file system instance (super-block).
+    type Data: ForeignOwnable + Send + Sync;
+
     /// The name of the file system type.
     const NAME: &'static CStr;
 
@@ -40,8 +43,8 @@ pub trait FileSystem {
     #[doc(hidden)]
     const IS_UNSPECIFIED: bool = false;
 
-    /// Initialises the new superblock.
-    fn fill_super(sb: &mut SuperBlock<Self>) -> Result;
+    /// Initialises the new superblock and returns the data to attach to it.
+    fn fill_super(sb: &mut SuperBlock<Self, sb::New>) -> Result<Self::Data>;
 
     /// Initialises and returns the root inode of the given superblock.
     ///
@@ -94,9 +97,10 @@ pub struct Stat {
 pub struct UnspecifiedFS;
 
 impl FileSystem for UnspecifiedFS {
+    type Data = ();
     const NAME: &'static CStr = crate::c_str!("unspecified");
     const IS_UNSPECIFIED: bool = true;
-    fn fill_super(_: &mut SuperBlock<Self>) -> Result {
+    fn fill_super(_: &mut SuperBlock<Self, sb::New>) -> Result {
         Err(ENOTSUPP)
     }
 
@@ -134,7 +138,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
                 fs.owner = module.0;
                 fs.name = T::NAME.as_char_ptr();
                 fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
-                fs.kill_sb = Some(Self::kill_sb_callback);
+                fs.kill_sb = Some(Self::kill_sb_callback::<T>);
                 fs.fs_flags = 0;
 
                 // SAFETY: Pointers stored in `fs` are static so will live for as long as the
@@ -155,10 +159,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
         })
     }
 
-    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
+    unsafe extern "C" fn kill_sb_callback<T: FileSystem + ?Sized>(
+        sb_ptr: *mut bindings::super_block,
+    ) {
         // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
         // the appropriate function to call for cleanup.
         unsafe { bindings::kill_anon_super(sb_ptr) };
+
+        // SAFETY: The C API contract guarantees that `sb_ptr` is valid for read.
+        let ptr = unsafe { (*sb_ptr).s_fs_info };
+        if !ptr.is_null() {
+            // SAFETY: The only place where `s_fs_info` is assigned is `fill_super_callback`, where
+            // it's initialised with the result of an `into_foreign` call. We checked above that
+            // `ptr` is non-null because it would be null if we never reached the point where we
+            // init the field.
+            unsafe { T::Data::from_foreign(ptr) };
+        }
     }
 }
 
@@ -205,12 +221,19 @@ impl<T: FileSystem + ?Sized> Tables<T> {
             sb.s_xattr = &Tables::<T>::XATTR_HANDLERS[0];
             sb.s_flags |= bindings::SB_RDONLY;
 
-            T::fill_super(new_sb)?;
+            let data = T::fill_super(new_sb)?;
+
+            // N.B.: Even on failure, `kill_sb` is called and frees the data.
+            sb.s_fs_info = data.into_foreign().cast_mut();
 
-            let root = T::init_root(new_sb)?;
+            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
+            // newly-created (and initialised above) superblock. And we have just initialised
+            // `s_fs_info`.
+            let sb = unsafe { SuperBlock::from_raw(sb_ptr) };
+            let root = T::init_root(sb)?;
 
             // Reject root inode if it belongs to a different superblock.
-            if !ptr::eq(root.super_block(), new_sb) {
+            if !ptr::eq(root.super_block(), sb) {
                 return Err(EINVAL);
             }
 
@@ -346,7 +369,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// ```
 /// # mod module_fs_sample {
-/// use kernel::fs::{dentry, inode::INode, sb::SuperBlock, self};
+/// use kernel::fs::{dentry, inode::INode, sb, sb::SuperBlock, self};
 /// use kernel::prelude::*;
 ///
 /// kernel::module_fs! {
@@ -359,8 +382,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// struct MyFs;
 /// impl fs::FileSystem for MyFs {
+///     type Data = ();
 ///     const NAME: &'static CStr = kernel::c_str!("myfs");
-///     fn fill_super(_: &mut SuperBlock<Self>) -> Result {
+///     fn fill_super(_: &mut SuperBlock<Self, sb::New>) -> Result {
 ///         todo!()
 ///     }
 ///     fn init_root(_sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
diff --git a/rust/kernel/fs/sb.rs b/rust/kernel/fs/sb.rs
index fa10f3db5593..7c0c52e6da0a 100644
--- a/rust/kernel/fs/sb.rs
+++ b/rust/kernel/fs/sb.rs
@@ -10,19 +10,37 @@
 use super::FileSystem;
 use crate::bindings;
 use crate::error::{code::*, Result};
-use crate::types::{ARef, Either, Opaque};
+use crate::types::{ARef, Either, ForeignOwnable, Opaque};
 use core::{marker::PhantomData, ptr};
 
+/// A typestate for [`SuperBlock`] that indicates that it's a new one, so not fully initialized
+/// yet.
+pub struct New;
+
+/// A typestate for [`SuperBlock`] that indicates that it's ready to be used.
+pub struct Ready;
+
+// SAFETY: Instances of `SuperBlock<T, Ready>` are only created after initialising the data.
+unsafe impl DataInited for Ready {}
+
+/// Indicates that a superblock in this typestate has data initialized.
+///
+/// # Safety
+///
+/// Implementers must ensure that `s_fs_info` is properly initialised in this state.
+#[doc(hidden)]
+pub unsafe trait DataInited {}
+
 /// A file system super block.
 ///
 /// Wraps the kernel's `struct super_block`.
 #[repr(transparent)]
-pub struct SuperBlock<T: FileSystem + ?Sized>(
+pub struct SuperBlock<T: FileSystem + ?Sized, S = Ready>(
     pub(crate) Opaque<bindings::super_block>,
-    PhantomData<T>,
+    PhantomData<(S, T)>,
 );
 
-impl<T: FileSystem + ?Sized> SuperBlock<T> {
+impl<T: FileSystem + ?Sized, S> SuperBlock<T, S> {
     /// Creates a new superblock reference from the given raw pointer.
     ///
     /// # Safety
@@ -31,6 +49,7 @@ impl<T: FileSystem + ?Sized> SuperBlock<T> {
     ///
     /// * `ptr` is valid and remains so for the lifetime of the returned object.
     /// * `ptr` has the correct file system type, or `T` is [`super::UnspecifiedFS`].
+    /// * `ptr` is in the right typestate.
     pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::super_block) -> &'a Self {
         // SAFETY: The safety requirements guarantee that the cast below is ok.
         unsafe { &*ptr.cast::<Self>() }
@@ -44,6 +63,7 @@ pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::super_block) -> &'a Self {
     ///
     /// * `ptr` is valid and remains so for the lifetime of the returned object.
     /// * `ptr` has the correct file system type, or `T` is [`super::UnspecifiedFS`].
+    /// * `ptr` is in the right typestate.
     /// * `ptr` is the only active pointer to the superblock.
     pub(crate) unsafe fn from_raw_mut<'a>(ptr: *mut bindings::super_block) -> &'a mut Self {
         // SAFETY: The safety requirements guarantee that the cast below is ok.
@@ -55,7 +75,9 @@ pub fn rdonly(&self) -> bool {
         // SAFETY: `s_flags` only changes during init, so it is safe to read it.
         unsafe { (*self.0.get()).s_flags & bindings::SB_RDONLY != 0 }
     }
+}
 
+impl<T: FileSystem + ?Sized> SuperBlock<T, New> {
     /// Sets the magic number of the superblock.
     pub fn set_magic(&mut self, magic: usize) -> &mut Self {
         // SAFETY: This is a new superblock that is being initialised, so it's ok to write to its
@@ -63,8 +85,26 @@ pub fn set_magic(&mut self, magic: usize) -> &mut Self {
         unsafe { (*self.0.get()).s_magic = magic as core::ffi::c_ulong };
         self
     }
+}
+
+impl<T: FileSystem + ?Sized, S: DataInited> SuperBlock<T, S> {
+    /// Returns the data associated with the superblock.
+    pub fn data(&self) -> <T::Data as ForeignOwnable>::Borrowed<'_> {
+        if T::IS_UNSPECIFIED {
+            crate::build_error!("super block data type is unspecified");
+        }
+
+        // SAFETY: This method is only available if the typestate implements `DataInited`, whose
+        // safety requirements include `s_fs_info` being properly initialised.
+        let ptr = unsafe { (*self.0.get()).s_fs_info };
+        unsafe { T::Data::borrow(ptr) }
+    }
 
     /// Tries to get an existing inode or create a new one if it doesn't exist yet.
+    ///
+    /// This method is not callable from a superblock where data isn't inited yet because it would
+    /// allow one to get access to the uninited data via `inode::New::init()` ->
+    /// `INode::super_block()` -> `SuperBlock::data()`.
     pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, inode::New<T>>> {
         // SAFETY: All superblock-related state needed by `iget_locked` is initialised by C code
         // before calling `fill_super_callback`, or by `fill_super_callback` itself before calling
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 7a09e2db878d..7027ca067f8f 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -98,9 +98,10 @@ fn iget(sb: &sb::SuperBlock<Self>, e: &'static Entry) -> Result<ARef<INode<Self>
 }
 
 impl fs::FileSystem for RoFs {
+    type Data = ();
     const NAME: &'static CStr = c_str!("rust_rofs");
 
-    fn fill_super(sb: &mut sb::SuperBlock<Self>) -> Result {
+    fn fill_super(sb: &mut sb::SuperBlock<Self, sb::New>) -> Result {
         sb.set_magic(0x52555354);
         Ok(())
     }
-- 
2.34.1



* [RFC PATCH v2 23/30] rust: fs: allow file systems backed by a block device
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (21 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 22/30] rust: fs: add per-superblock data Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 24/30] rust: fs: allow per-inode data Wedson Almeida Filho
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems that are backed by block devices (in addition to
in-memory ones).
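
A small model of the selection logic, in plain Rust outside the kernel, may
help when reading the diff. It is a sketch only: `SuperType`, `Plan`, and
`plan` are hypothetical names, and the C helpers appear as strings instead of
being called. It shows how a single associated constant with a default value
decides which `get_tree_*` and `kill_*_super` pair is used, whether
`FS_REQUIRES_DEV` is set, and whether `fill_super` receives a mapper.

```rust
/// How superblocks are keyed, modelled on the two `sb::Type` values used here.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum SuperType {
    Independent,
    BlockDev,
}

/// Model of the trait: file systems are in-memory unless they opt in.
pub trait FileSystem {
    const SUPER_TYPE: SuperType = SuperType::Independent;
}

/// Hypothetical summary of the choices derived from `SUPER_TYPE`.
#[derive(Debug, PartialEq)]
pub struct Plan {
    pub get_tree: &'static str,
    pub kill_sb: &'static str,
    pub requires_dev: bool,
    pub has_mapper: bool,
}

/// Picks the matching setup and cleanup helpers for a file system type.
pub fn plan<T: FileSystem>() -> Plan {
    match T::SUPER_TYPE {
        SuperType::BlockDev => Plan {
            get_tree: "get_tree_bdev",
            kill_sb: "kill_block_super",
            requires_dev: true,
            has_mapper: true,
        },
        SuperType::Independent => Plan {
            get_tree: "get_tree_nodev",
            kill_sb: "kill_anon_super",
            requires_dev: false,
            has_mapper: false,
        },
    }
}

/// An in-memory file system: relies on the default.
pub struct MemFs;
impl FileSystem for MemFs {}

/// A block-device-backed file system: overrides the constant.
pub struct DiskFs;
impl FileSystem for DiskFs {
    const SUPER_TYPE: SuperType = SuperType::BlockDev;
}
```

Because the constant is known at compile time, each instantiation of the
callbacks only ever takes one of the two branches.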

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/helpers.c            |  14 +++
 rust/kernel/block.rs      |   1 -
 rust/kernel/fs.rs         |  60 ++++++++---
 rust/kernel/fs/inode.rs   | 221 +++++++++++++++++++++++++++++++++++++-
 rust/kernel/fs/sb.rs      |  49 ++++++++-
 samples/rust/rust_rofs.rs |   2 +-
 6 files changed, 328 insertions(+), 19 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index 360a1d38ac19..6c6d18df055f 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -21,6 +21,7 @@
  */
 
 #include <kunit/test-bug.h>
+#include <linux/blkdev.h>
 #include <linux/bug.h>
 #include <linux/build_bug.h>
 #include <linux/cacheflush.h>
@@ -258,6 +259,13 @@ void rust_helper_kunmap_local(const void *vaddr)
 }
 EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
 
+struct folio *rust_helper_read_mapping_folio(struct address_space *mapping,
+					     pgoff_t index, struct file *file)
+{
+	return read_mapping_folio(mapping, index, file);
+}
+EXPORT_SYMBOL_GPL(rust_helper_read_mapping_folio);
+
 void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
 {
 	i_uid_write(inode, uid);
@@ -294,6 +302,12 @@ unsigned int rust_helper_MKDEV(unsigned int major, unsigned int minor)
 }
 EXPORT_SYMBOL_GPL(rust_helper_MKDEV);
 
+sector_t rust_helper_bdev_nr_sectors(struct block_device *bdev)
+{
+	return bdev_nr_sectors(bdev);
+}
+EXPORT_SYMBOL_GPL(rust_helper_bdev_nr_sectors);
+
 unsigned long rust_helper_copy_to_user(void __user *to, const void *from,
 				       unsigned long n)
 {
diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
index 868623d7c873..4d669bd5dce9 100644
--- a/rust/kernel/block.rs
+++ b/rust/kernel/block.rs
@@ -31,7 +31,6 @@ impl Device {
     ///
     /// Callers must ensure that `ptr` is valid and remains so for the lifetime of the returned
     /// object.
-    #[allow(dead_code)]
     pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::block_device) -> &'a Self {
         // SAFETY: The safety requirements guarantee that the cast below is ok.
         unsafe { &*ptr.cast::<Self>() }
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 387e87e3edaf..864aca24d12c 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -26,6 +26,11 @@
 /// This is C's `loff_t`.
 pub type Offset = i64;
 
+/// An index into the page cache.
+///
+/// This is C's `pgoff_t`.
+pub type PageOffset = usize;
+
 /// Maximum size of an inode.
 pub const MAX_LFS_FILESIZE: Offset = bindings::MAX_LFS_FILESIZE;
 
@@ -37,6 +42,9 @@ pub trait FileSystem {
     /// The name of the file system type.
     const NAME: &'static CStr;
 
+    /// Determines how superblocks for this file system type are keyed.
+    const SUPER_TYPE: sb::Type = sb::Type::Independent;
+
     /// Determines if an implementation doesn't specify the required types.
     ///
     /// This is meant for internal use only.
@@ -44,7 +52,10 @@ pub trait FileSystem {
     const IS_UNSPECIFIED: bool = false;
 
     /// Initialises the new superblock and returns the data to attach to it.
-    fn fill_super(sb: &mut SuperBlock<Self, sb::New>) -> Result<Self::Data>;
+    fn fill_super(
+        sb: &mut SuperBlock<Self, sb::New>,
+        mapper: Option<inode::Mapper>,
+    ) -> Result<Self::Data>;
 
     /// Initialises and returns the root inode of the given superblock.
     ///
@@ -100,7 +111,7 @@ impl FileSystem for UnspecifiedFS {
     type Data = ();
     const NAME: &'static CStr = crate::c_str!("unspecified");
     const IS_UNSPECIFIED: bool = true;
-    fn fill_super(_: &mut SuperBlock<Self, sb::New>) -> Result {
+    fn fill_super(_: &mut SuperBlock<Self, sb::New>, _: Option<inode::Mapper>) -> Result {
         Err(ENOTSUPP)
     }
 
@@ -139,7 +150,9 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
                 fs.name = T::NAME.as_char_ptr();
                 fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
                 fs.kill_sb = Some(Self::kill_sb_callback::<T>);
-                fs.fs_flags = 0;
+                fs.fs_flags = if let sb::Type::BlockDev = T::SUPER_TYPE {
+                    bindings::FS_REQUIRES_DEV as i32
+                } else { 0 };
 
                 // SAFETY: Pointers stored in `fs` are static so will live for as long as the
                 // registration is active (it is undone in `drop`).
@@ -162,9 +175,16 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
     unsafe extern "C" fn kill_sb_callback<T: FileSystem + ?Sized>(
         sb_ptr: *mut bindings::super_block,
     ) {
-        // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
-        // the appropriate function to call for cleanup.
-        unsafe { bindings::kill_anon_super(sb_ptr) };
+        match T::SUPER_TYPE {
+            // SAFETY: In `get_tree_callback` we always call `get_tree_bdev` for
+            // `sb::Type::BlockDev`, so `kill_block_super` is the appropriate function to call
+            // for cleanup.
+            sb::Type::BlockDev => unsafe { bindings::kill_block_super(sb_ptr) },
+            // SAFETY: In `get_tree_callback` we always call `get_tree_nodev` for
+            // `sb::Type::Independent`, so `kill_anon_super` is the appropriate function to call
+            // for cleanup.
+            sb::Type::Independent => unsafe { bindings::kill_anon_super(sb_ptr) },
+        }
 
         // SAFETY: The C API contract guarantees that `sb_ptr` is valid for read.
         let ptr = unsafe { (*sb_ptr).s_fs_info };
@@ -200,9 +220,18 @@ impl<T: FileSystem + ?Sized> Tables<T> {
     };
 
     unsafe extern "C" fn get_tree_callback(fc: *mut bindings::fs_context) -> ffi::c_int {
-        // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
-        // the right type and is a valid callback.
-        unsafe { bindings::get_tree_nodev(fc, Some(Self::fill_super_callback)) }
+        match T::SUPER_TYPE {
+            // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
+            // the right type and is a valid callback.
+            sb::Type::BlockDev => unsafe {
+                bindings::get_tree_bdev(fc, Some(Self::fill_super_callback))
+            },
+            // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
+            // the right type and is a valid callback.
+            sb::Type::Independent => unsafe {
+                bindings::get_tree_nodev(fc, Some(Self::fill_super_callback))
+            },
+        }
     }
 
     unsafe extern "C" fn fill_super_callback(
@@ -221,7 +250,14 @@ impl<T: FileSystem + ?Sized> Tables<T> {
             sb.s_xattr = &Tables::<T>::XATTR_HANDLERS[0];
             sb.s_flags |= bindings::SB_RDONLY;
 
-            let data = T::fill_super(new_sb)?;
+            let mapper = if matches!(T::SUPER_TYPE, sb::Type::BlockDev) {
+                // SAFETY: This is the only mapper created for this inode, so it is unique.
+                Some(unsafe { new_sb.bdev().inode().mapper() })
+            } else {
+                None
+            };
+
+            let data = T::fill_super(new_sb, mapper)?;
 
             // N.B.: Even on failure, `kill_sb` is called and frees the data.
             sb.s_fs_info = data.into_foreign().cast_mut();
@@ -369,7 +405,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// ```
 /// # mod module_fs_sample {
-/// use kernel::fs::{dentry, inode::INode, sb, sb::SuperBlock, self};
+/// use kernel::fs::{dentry, inode::INode, inode::Mapper, sb, sb::SuperBlock, self};
 /// use kernel::prelude::*;
 ///
 /// kernel::module_fs! {
@@ -384,7 +420,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 /// impl fs::FileSystem for MyFs {
 ///     type Data = ();
 ///     const NAME: &'static CStr = kernel::c_str!("myfs");
-///     fn fill_super(_: &mut SuperBlock<Self, sb::New>) -> Result {
+///     fn fill_super(_: &mut SuperBlock<Self, sb::New>, _: Option<Mapper>) -> Result {
 ///         todo!()
 ///     }
 ///     fn init_root(_sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index 75b68d697a6e..5b3602362521 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -7,13 +7,16 @@
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
 use super::{
-    address_space, dentry, dentry::DEntry, file, sb::SuperBlock, FileSystem, Offset, UnspecifiedFS,
+    address_space, dentry, dentry::DEntry, file, sb::SuperBlock, FileSystem, Offset, PageOffset,
+    UnspecifiedFS,
 };
-use crate::error::{code::*, Result};
+use crate::error::{code::*, from_err_ptr, Result};
 use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Lockable, Locked, Opaque};
-use crate::{bindings, block, str::CStr, str::CString, time::Timespec};
+use crate::{
+    bindings, block, build_error, folio, folio::Folio, str::CStr, str::CString, time::Timespec,
+};
 use core::mem::ManuallyDrop;
-use core::{marker::PhantomData, ptr};
+use core::{cmp, marker::PhantomData, ops::Deref, ptr};
 use macros::vtable;
 
 /// The number of an inode.
@@ -93,6 +96,129 @@ pub fn size(&self) -> Offset {
         // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference.
         unsafe { bindings::i_size_read(self.0.get()) }
     }
+
+    /// Returns a mapper for this inode.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that mappers are unique for a given inode and range. For inodes that
+    /// back a block device, a mapper is always created when the filesystem is mounted; so callers
+    /// in such situations must ensure that that mapper is never used.
+    pub unsafe fn mapper(&self) -> Mapper<T> {
+        Mapper {
+            inode: self.into(),
+            begin: 0,
+            end: Offset::MAX,
+        }
+    }
+
+    /// Returns a mapped folio at the given offset.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that there are no concurrent mutable mappings of the folio.
+    pub unsafe fn mapped_folio(
+        &self,
+        offset: Offset,
+    ) -> Result<folio::Mapped<'_, folio::PageCache<T>>> {
+        let page_index = offset >> bindings::PAGE_SHIFT;
+        let page_offset = offset & ((bindings::PAGE_SIZE - 1) as Offset);
+        let folio = self.read_mapping_folio(page_index.try_into()?)?;
+
+        // SAFETY: The safety requirements guarantee that there are no concurrent mutable mappings
+        // of the folio.
+        unsafe { Folio::map_owned(folio, page_offset.try_into()?) }
+    }
+
+    /// Returns the folio at the given page index.
+    pub fn read_mapping_folio(
+        &self,
+        index: PageOffset,
+    ) -> Result<ARef<Folio<folio::PageCache<T>>>> {
+        let folio = from_err_ptr(unsafe {
+            bindings::read_mapping_folio(
+                (*self.0.get()).i_mapping,
+                index.try_into()?,
+                ptr::null_mut(),
+            )
+        })?;
+        let ptr = ptr::NonNull::new(folio)
+            .ok_or(EIO)?
+            .cast::<Folio<folio::PageCache<T>>>();
+        // SAFETY: The folio returned by read_mapping_folio has had its refcount incremented.
+        Ok(unsafe { ARef::from_raw(ptr) })
+    }
+
+    /// Iterate over the given range, one folio at a time.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that there are no concurrent mutable mappings of the folio.
+    pub unsafe fn for_each_page<U>(
+        &self,
+        first: Offset,
+        len: Offset,
+        mut cb: impl FnMut(&[u8]) -> Result<Option<U>>,
+    ) -> Result<Option<U>> {
+        if first >= self.size() {
+            return Ok(None);
+        }
+        let mut remain = cmp::min(len, self.size() - first);
+        first.checked_add(remain).ok_or(EIO)?;
+
+        let mut next = first;
+        while remain > 0 {
+            // SAFETY: The safety requirements of this function satisfy those of `mapped_folio`.
+            let data = unsafe { self.mapped_folio(next)? };
+            let avail = cmp::min(data.len(), remain.try_into().unwrap_or(usize::MAX));
+            let ret = cb(&data[..avail])?;
+            if ret.is_some() {
+                return Ok(ret);
+            }
+
+            next += avail as Offset;
+            remain -= avail as Offset;
+        }
+
+        Ok(None)
+    }
+}
+
+impl<T: FileSystem + ?Sized, U: Deref<Target = INode<T>>> Locked<U, ReadSem> {
+    /// Returns a mapped folio at the given offset.
+    // TODO: This conflicts with Locked<Folio>::write. Once we settle on a way to handle reading
+    // the contents of certain inodes (e.g., directories, links), then we switch to that and
+    // remove this.
+    pub fn mapped_folio<'a>(
+        &'a self,
+        offset: Offset,
+    ) -> Result<folio::Mapped<'a, folio::PageCache<T>>>
+    where
+        T: 'a,
+    {
+        if T::IS_UNSPECIFIED {
+            build_error!("unspecified file systems cannot safely map folios");
+        }
+
+        // SAFETY: The inode is locked in read mode, so it's ok to map its contents.
+        unsafe { self.deref().mapped_folio(offset) }
+    }
+
+    /// Iterate over the given range, one folio at a time.
+    // TODO: This has the same issue as mapped_folio above.
+    pub fn for_each_page<V>(
+        &self,
+        first: Offset,
+        len: Offset,
+        cb: impl FnMut(&[u8]) -> Result<Option<V>>,
+    ) -> Result<Option<V>> {
+        if T::IS_UNSPECIFIED {
+            build_error!("unspecified file systems cannot safely map folios");
+        }
+
+        // SAFETY: The inode is locked in read mode, so it's ok to map its contents.
+        unsafe { self.deref().for_each_page(first, len, cb) }
+    }
 }
 
 // SAFETY: The type invariants guarantee that `INode` is always ref-counted.
@@ -111,6 +237,7 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
 /// Indicates that an inode's rw semaphore is locked in read (shared) mode.
 pub struct ReadSem;
 
+// SAFETY: `raw_lock` calls `inode_lock_shared` which locks the inode in shared mode.
 unsafe impl<T: FileSystem + ?Sized> Lockable<ReadSem> for INode<T> {
     fn raw_lock(&self) {
         // SAFETY: Since there's a reference to the inode, it must be valid.
@@ -432,3 +559,89 @@ extern "C" fn drop_cstring(ptr: *mut core::ffi::c_void) {
         Self(&Table::<U>::TABLE, PhantomData)
     }
 }
+
+/// Allows mapping the contents of the inode.
+///
+/// # Invariants
+///
+/// Mappers are unique per range per inode.
+pub struct Mapper<T: FileSystem + ?Sized = UnspecifiedFS> {
+    inode: ARef<INode<T>>,
+    begin: Offset,
+    end: Offset,
+}
+
+// SAFETY: All inode and folio operations are safe from any thread.
+unsafe impl<T: FileSystem + ?Sized> Send for Mapper<T> {}
+
+// SAFETY: All inode and folio operations are safe from any thread.
+unsafe impl<T: FileSystem + ?Sized> Sync for Mapper<T> {}
+
+impl<T: FileSystem + ?Sized> Mapper<T> {
+    /// Splits the mapper into two ranges.
+    ///
+    /// The first range is from the beginning of `self` up to and including `offset - 1`. The
+    /// second range is from `offset` to the end of `self`.
+    pub fn split_at(mut self, offset: Offset) -> (Self, Self) {
+        let inode = self.inode.clone();
+        if offset <= self.begin {
+            (
+                Self {
+                    inode,
+                    begin: offset,
+                    end: offset,
+                },
+                self,
+            )
+        } else if offset >= self.end {
+            (
+                self,
+                Self {
+                    inode,
+                    begin: offset,
+                    end: offset,
+                },
+            )
+        } else {
+            let end = self.end;
+            self.end = offset;
+            (
+                self,
+                Self {
+                    inode,
+                    begin: offset,
+                    end,
+                },
+            )
+        }
+    }
+
+    /// Returns a mapped folio at the given offset.
+    pub fn mapped_folio(&self, offset: Offset) -> Result<folio::Mapped<'_, folio::PageCache<T>>> {
+        if offset < self.begin || offset >= self.end {
+            return Err(ERANGE);
+        }
+
+        // SAFETY: By the type invariant, there are no other mutable mappings of the folio.
+        let mut map = unsafe { self.inode.mapped_folio(offset) }?;
+        map.cap_len((self.end - offset).try_into()?);
+        Ok(map)
+    }
+
+    /// Iterate over the given range, one folio at a time.
+    pub fn for_each_page<U>(
+        &self,
+        first: Offset,
+        len: Offset,
+        cb: impl FnMut(&[u8]) -> Result<Option<U>>,
+    ) -> Result<Option<U>> {
+        if first < self.begin || first >= self.end {
+            return Err(ERANGE);
+        }
+
+        let actual_len = cmp::min(len, self.end - first);
+
+        // SAFETY: By the type invariant, there are no other mutable mappings of the folio.
+        unsafe { self.inode.for_each_page(first, actual_len, cb) }
+    }
+}
diff --git a/rust/kernel/fs/sb.rs b/rust/kernel/fs/sb.rs
index 7c0c52e6da0a..93c7b2770163 100644
--- a/rust/kernel/fs/sb.rs
+++ b/rust/kernel/fs/sb.rs
@@ -8,11 +8,22 @@
 
 use super::inode::{self, INode, Ino};
 use super::FileSystem;
-use crate::bindings;
 use crate::error::{code::*, Result};
 use crate::types::{ARef, Either, ForeignOwnable, Opaque};
+use crate::{bindings, block, build_error};
 use core::{marker::PhantomData, ptr};
 
+/// Type of superblock keying.
+///
+/// It determines how C's `fs_context_operations::get_tree` is implemented.
+pub enum Type {
+    /// Multiple independent superblocks may exist.
+    Independent,
+
+    /// Uses a block device.
+    BlockDev,
+}
+
 /// A typestate for [`SuperBlock`] that indicates that it's a new one, so not fully initialized
 /// yet.
 pub struct New;
@@ -75,6 +86,28 @@ pub fn rdonly(&self) -> bool {
         // SAFETY: `s_flags` only changes during init, so it is safe to read it.
         unsafe { (*self.0.get()).s_flags & bindings::SB_RDONLY != 0 }
     }
+
+    /// Returns the block device associated with the superblock.
+    pub fn bdev(&self) -> &block::Device {
+        if !matches!(T::SUPER_TYPE, Type::BlockDev) {
+            build_error!("bdev is only available in blockdev superblocks");
+        }
+
+        // SAFETY: The superblock is valid and given that it's a blockdev superblock it must have a
+        // valid `s_bdev` that remains valid while the superblock (`self`) is valid.
+        unsafe { block::Device::from_raw((*self.0.get()).s_bdev) }
+    }
+
+    /// Returns the number of sectors in the underlying block device.
+    pub fn sector_count(&self) -> block::Sector {
+        if !matches!(T::SUPER_TYPE, Type::BlockDev) {
+            build_error!("sector_count is only available in blockdev superblocks");
+        }
+
+        // SAFETY: The superblock is valid and given that it's a blockdev superblock it must have a
+        // valid `s_bdev`.
+        unsafe { bindings::bdev_nr_sectors((*self.0.get()).s_bdev) }
+    }
 }
 
 impl<T: FileSystem + ?Sized> SuperBlock<T, New> {
@@ -85,6 +118,20 @@ pub fn set_magic(&mut self, magic: usize) -> &mut Self {
         unsafe { (*self.0.get()).s_magic = magic as core::ffi::c_ulong };
         self
     }
+
+    /// Sets the device blocksize, subject to the minimum accepted by the device.
+    ///
+    /// Returns the actual value set.
+    pub fn min_blocksize(&mut self, size: i32) -> i32 {
+        if !matches!(T::SUPER_TYPE, Type::BlockDev) {
+            build_error!("min_blocksize is only available in blockdev superblocks");
+        }
+
+        // SAFETY: This is a new superblock that is being initialised, so it's ok to set the block
+        // size. Additionally, we've checked that the superblock is backed by a block device, so
+        // it is also valid.
+        unsafe { bindings::sb_min_blocksize(self.0.get(), size) }
+    }
 }
 
 impl<T: FileSystem + ?Sized, S: DataInited> SuperBlock<T, S> {
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 7027ca067f8f..fea3360b6e7a 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -101,7 +101,7 @@ impl fs::FileSystem for RoFs {
     type Data = ();
     const NAME: &'static CStr = c_str!("rust_rofs");
 
-    fn fill_super(sb: &mut sb::SuperBlock<Self, sb::New>) -> Result {
+    fn fill_super(sb: &mut sb::SuperBlock<Self, sb::New>, _: Option<inode::Mapper>) -> Result {
         sb.set_magic(0x52555354);
         Ok(())
     }
-- 
2.34.1
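
[Editorial sketch, not part of the patch] The clamping and early-exit behaviour of
`INode::for_each_page` above can be modelled in userspace. This is a minimal sketch under
stated assumptions: the inode contents are a plain byte slice, `PAGE_SIZE` is a made-up
value of 8, and `find_byte` is a hypothetical caller added only for illustration. None of
these names come from the kernel crate.

```rust
/// Hypothetical page size for this model; the kernel uses `PAGE_SIZE` and real folios.
const PAGE_SIZE: usize = 8;

/// Walks `[first, first + len)` of `contents` one page-sized chunk at a time.
///
/// The callback returns `Ok(Some(v))` to stop early or `Ok(None)` to continue.
fn for_each_page<U, E>(
    contents: &[u8],
    first: usize,
    len: usize,
    mut cb: impl FnMut(&[u8]) -> Result<Option<U>, E>,
) -> Result<Option<U>, E> {
    let size = contents.len();
    if first >= size {
        return Ok(None);
    }
    // Clamp the requested length to what the model "inode" actually holds.
    let mut remain = len.min(size - first);
    let mut next = first;
    while remain > 0 {
        // Bytes left in the page containing `next` (stands in for the mapped folio length).
        let in_page = PAGE_SIZE - (next % PAGE_SIZE);
        let avail = in_page.min(remain);
        if let Some(v) = cb(&contents[next..next + avail])? {
            return Ok(Some(v));
        }
        next += avail;
        remain -= avail;
    }
    Ok(None)
}

/// Hypothetical caller: absolute offset of the first `needle` in the range, if any.
fn find_byte(contents: &[u8], first: usize, len: usize, needle: u8) -> Option<usize> {
    let mut base = first;
    let res: Result<Option<usize>, ()> = for_each_page(contents, first, len, |chunk| {
        if let Some(i) = chunk.iter().position(|&b| b == needle) {
            return Ok(Some(base + i));
        }
        base += chunk.len();
        Ok(None)
    });
    res.unwrap()
}
```

As in the patch, a start offset at or past the end of the contents yields `Ok(None)`
without invoking the callback, and the callback never sees bytes beyond the clamped range.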


^ permalink raw reply related	[flat|nested] 35+ messages in thread
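
[Editorial sketch, not part of the patch] The three branches of `Mapper::split_at` in
patch 23 may be easier to follow on a stripped-down model. The `Range` type below is a
hypothetical stand-in that keeps only the `[begin, end)` bounds (no inode reference), and
`contains` mirrors the bounds check that `Mapper::mapped_folio` performs before mapping.

```rust
/// Userspace stand-in for `Mapper`: only the half-open `[begin, end)` range is modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Range {
    begin: i64,
    end: i64,
}

impl Range {
    /// Mirrors the three branches of `Mapper::split_at`.
    fn split_at(mut self, offset: i64) -> (Range, Range) {
        if offset <= self.begin {
            // Split point at or before the start: the first half is empty.
            (Range { begin: offset, end: offset }, self)
        } else if offset >= self.end {
            // Split point at or past the end: the second half is empty.
            (self, Range { begin: offset, end: offset })
        } else {
            // Split point strictly inside: shrink `self` and hand out the tail.
            let end = self.end;
            self.end = offset;
            (self, Range { begin: offset, end })
        }
    }

    /// Same bounds check as `Mapper::mapped_folio`: `begin <= offset < end`.
    fn contains(&self, offset: i64) -> bool {
        offset >= self.begin && offset < self.end
    }
}
```

In this model the two halves never overlap, which is the property the `Mapper` invariant
("unique per range per inode") relies on. An empty half rejects every offset, so the
corresponding `mapped_folio` call would return `ERANGE`.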

* [RFC PATCH v2 24/30] rust: fs: allow per-inode data
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (22 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 23/30] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 25/30] rust: fs: export file type from mode constants Wedson Almeida Filho
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to attach extra [typed] data to each inode. If
no data is needed (the type is zero-sized), the regular inode kmem_cache
is used; otherwise a dedicated one is created.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/helpers.c            |   7 +++
 rust/kernel/fs.rs         |  19 ++++--
 rust/kernel/fs/inode.rs   | 123 ++++++++++++++++++++++++++++++++++++--
 rust/kernel/mem_cache.rs  |   2 -
 samples/rust/rust_rofs.rs |  13 ++--
 5 files changed, 143 insertions(+), 21 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index 6c6d18df055f..edf12868962c 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -266,6 +266,13 @@ struct folio *rust_helper_read_mapping_folio(struct address_space *mapping,
 }
 EXPORT_SYMBOL_GPL(rust_helper_read_mapping_folio);
 
+void *rust_helper_alloc_inode_sb(struct super_block *sb,
+				 struct kmem_cache *cache, gfp_t gfp)
+{
+	return alloc_inode_sb(sb, cache, gfp);
+}
+EXPORT_SYMBOL_GPL(rust_helper_alloc_inode_sb);
+
 void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
 {
 	i_uid_write(inode, uid);
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 864aca24d12c..d64fe1a5812f 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -8,8 +8,8 @@
 
 use crate::error::{code::*, from_result, to_result, Error, Result};
 use crate::types::{ForeignOwnable, Opaque};
-use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
-use core::{ffi, marker::PhantomData, mem::ManuallyDrop, pin::Pin, ptr};
+use crate::{bindings, init::PinInit, mem_cache::MemCache, str::CStr, try_pin_init, ThisModule};
+use core::{ffi, marker::PhantomData, mem::size_of, mem::ManuallyDrop, pin::Pin, ptr};
 use dentry::DEntry;
 use inode::INode;
 use macros::{pin_data, pinned_drop};
@@ -39,6 +39,9 @@ pub trait FileSystem {
     /// Data associated with each file system instance (super-block).
     type Data: ForeignOwnable + Send + Sync;
 
+    /// Type of data associated with each inode.
+    type INodeData: Send + Sync;
+
     /// The name of the file system type.
     const NAME: &'static CStr;
 
@@ -109,6 +112,7 @@ pub struct Stat {
 
 impl FileSystem for UnspecifiedFS {
     type Data = ();
+    type INodeData = ();
     const NAME: &'static CStr = crate::c_str!("unspecified");
     const IS_UNSPECIFIED: bool = true;
     fn fill_super(_: &mut SuperBlock<Self, sb::New>, _: Option<inode::Mapper>) -> Result {
@@ -125,6 +129,7 @@ fn init_root(_: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
 pub struct Registration {
     #[pin]
     fs: Opaque<bindings::file_system_type>,
+    inode_cache: Option<MemCache>,
 }
 
 // SAFETY: `Registration` doesn't provide any `&self` methods, so it is safe to pass references
@@ -139,6 +144,7 @@ impl Registration {
     /// Creates the initialiser of a new file system registration.
     pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<Self, Error> {
         try_pin_init!(Self {
+            inode_cache: INode::<T>::new_cache()?,
             fs <- Opaque::try_ffi_init(|fs_ptr: *mut bindings::file_system_type| {
                 // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write.
                 unsafe { fs_ptr.write(bindings::file_system_type::default()) };
@@ -284,8 +290,12 @@ impl<T: FileSystem + ?Sized> Tables<T> {
     }
 
     const SUPER_BLOCK: bindings::super_operations = bindings::super_operations {
-        alloc_inode: None,
-        destroy_inode: None,
+        alloc_inode: if size_of::<T::INodeData>() != 0 {
+            Some(INode::<T>::alloc_inode_callback)
+        } else {
+            None
+        },
+        destroy_inode: Some(INode::<T>::destroy_inode_callback),
         free_inode: None,
         dirty_inode: None,
         write_inode: None,
@@ -419,6 +429,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 /// struct MyFs;
 /// impl fs::FileSystem for MyFs {
 ///     type Data = ();
+///     type INodeData = ();
 ///     const NAME: &'static CStr = kernel::c_str!("myfs");
 ///     fn fill_super(_: &mut SuperBlock<Self, sb::New>, _: Option<Mapper>) -> Result {
 ///         todo!()
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index 5b3602362521..5230ff2fe0dd 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -13,9 +13,10 @@
 use crate::error::{code::*, from_err_ptr, Result};
 use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Lockable, Locked, Opaque};
 use crate::{
-    bindings, block, build_error, folio, folio::Folio, str::CStr, str::CString, time::Timespec,
+    bindings, block, build_error, container_of, folio, folio::Folio, mem_cache::MemCache,
+    str::CStr, str::CString, time::Timespec,
 };
-use core::mem::ManuallyDrop;
+use core::mem::{size_of, ManuallyDrop, MaybeUninit};
 use core::{cmp, marker::PhantomData, ops::Deref, ptr};
 use macros::vtable;
 
@@ -91,6 +92,18 @@ pub fn super_block(&self) -> &SuperBlock<T> {
         unsafe { SuperBlock::from_raw((*self.0.get()).i_sb) }
     }
 
+    /// Returns the data associated with the inode.
+    pub fn data(&self) -> &T::INodeData {
+        if T::IS_UNSPECIFIED {
+            crate::build_error!("inode data type is unspecified");
+        }
+        let outerp = container_of!(self.0.get(), WithData<T::INodeData>, inode);
+        // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference
+        // (`&self`) to it. Additionally, we know `T::INodeData` is always initialised in an
+        // `INode`.
+        unsafe { &*(*outerp).data.as_ptr() }
+    }
+
     /// Returns the size of the inode contents.
     pub fn size(&self) -> Offset {
         // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference.
@@ -182,6 +195,87 @@ pub unsafe fn for_each_page<U>(
 
         Ok(None)
     }
+
+    pub(crate) fn new_cache() -> Result<Option<MemCache>> {
+        Ok(if size_of::<T::INodeData>() == 0 {
+            None
+        } else {
+            Some(MemCache::try_new::<WithData<T::INodeData>>(
+                T::NAME,
+                Some(Self::inode_init_once_callback),
+            )?)
+        })
+    }
+
+    unsafe extern "C" fn inode_init_once_callback(outer_inode: *mut core::ffi::c_void) {
+        let ptr = outer_inode.cast::<WithData<T::INodeData>>();
+
+        // SAFETY: This is only used in `new_cache`, so we know that we have a valid
+        // `inode::WithData` instance whose inode part can be initialised.
+        unsafe { bindings::inode_init_once(ptr::addr_of_mut!((*ptr).inode)) };
+    }
+
+    pub(crate) unsafe extern "C" fn alloc_inode_callback(
+        sb: *mut bindings::super_block,
+    ) -> *mut bindings::inode {
+        // SAFETY: The callback contract guarantees that `sb` is valid for read.
+        let super_type = unsafe { (*sb).s_type };
+
+        // SAFETY: This callback is only used in `Registration`, so `super_type` is necessarily
+        // embedded in a `Registration`, which is guaranteed to be valid because it has a
+        // superblock associated to it.
+        let reg = unsafe { &*container_of!(super_type, super::Registration, fs) };
+
+        // SAFETY: `sb` and `cache` are guaranteed to be valid by the callback contract and by
+        // the existence of a superblock respectively.
+        let ptr = unsafe {
+            bindings::alloc_inode_sb(sb, MemCache::ptr(&reg.inode_cache), bindings::GFP_KERNEL)
+        }
+        .cast::<WithData<T::INodeData>>();
+        if ptr.is_null() {
+            return ptr::null_mut();
+        }
+
+        // SAFETY: `ptr` was just allocated, so it is valid for dereferencing.
+        unsafe { ptr::addr_of_mut!((*ptr).inode) }
+    }
+
+    pub(crate) unsafe extern "C" fn destroy_inode_callback(inode: *mut bindings::inode) {
+        // SAFETY: By the C contract, `inode` is a valid pointer.
+        let is_bad = unsafe { bindings::is_bad_inode(inode) };
+
+        // SAFETY: The inode is guaranteed to be valid by the callback contract. Additionally, the
+        // superblock is also guaranteed to still be valid by the inode existence.
+        let super_type = unsafe { (*(*inode).i_sb).s_type };
+
+        // SAFETY: This callback is only used in `Registration`, so `super_type` is necessarily
+        // embedded in a `Registration`, which is guaranteed to be valid because it has a
+        // superblock associated to it.
+        let reg = unsafe { &*container_of!(super_type, super::Registration, fs) };
+        let ptr = container_of!(inode, WithData<T::INodeData>, inode).cast_mut();
+
+        if !is_bad {
+            // SAFETY: The code either initialises the data or marks the inode as bad. Since the
+            // inode is not bad, the data is initialised, and thus safe to drop.
+            unsafe { ptr::drop_in_place((*ptr).data.as_mut_ptr()) };
+        }
+
+        if size_of::<T::INodeData>() == 0 {
+            // SAFETY: When the size of `INodeData` is zero, we don't use a separate mem_cache, so
+            // it is allocated from the regular mem_cache, which is what `free_inode_nonrcu` uses
+            // to free the inode.
+            unsafe { bindings::free_inode_nonrcu(inode) };
+        } else {
+            // SAFETY: The callback contract guarantees that the inode was previously allocated
+            // via the `alloc_inode_callback` callback, so it is safe to free it back to the cache.
+            unsafe {
+                bindings::kmem_cache_free(
+                    MemCache::ptr(&reg.inode_cache),
+                    ptr.cast::<core::ffi::c_void>(),
+                )
+            };
+        }
+    }
 }
 
 impl<T: FileSystem + ?Sized, U: Deref<Target = INode<T>>> Locked<U, ReadSem> {
@@ -251,6 +345,11 @@ unsafe fn unlock(&self) {
     }
 }
 
+struct WithData<T> {
+    data: MaybeUninit<T>,
+    inode: bindings::inode,
+}
+
 /// An inode that is locked and hasn't been initialised yet.
 ///
 /// # Invariants
@@ -263,9 +362,18 @@ pub struct New<T: FileSystem + ?Sized>(
 
 impl<T: FileSystem + ?Sized> New<T> {
     /// Initialises the new inode with the given parameters.
-    pub fn init(mut self, params: Params) -> Result<ARef<INode<T>>> {
-        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
-        let inode = unsafe { self.0.as_mut() };
+    pub fn init(self, params: Params<T::INodeData>) -> Result<ARef<INode<T>>> {
+        let outerp = container_of!(self.0.as_ptr(), WithData<T::INodeData>, inode);
+
+        // SAFETY: This is a newly-created inode. No other references to it exist, so it is
+        // safe to mutably dereference it.
+        let outer = unsafe { &mut *outerp.cast_mut() };
+
+        // N.B. We must always write this to a newly allocated inode because the free callback
+        // expects the data to be initialised and drops it.
+        outer.data.write(params.value);
+
+        let inode = &mut outer.inode;
         let mode = match params.typ {
             Type::Dir => bindings::S_IFDIR,
             Type::Reg => {
@@ -404,7 +512,7 @@ pub enum Type {
 /// Required inode parameters.
 ///
 /// This is used when creating new inodes.
-pub struct Params {
+pub struct Params<T> {
     /// The access mode. It's a mask that grants execute (1), write (2) and read (4) access to
     /// everyone, the owner group, and the owner.
     pub mode: u16,
@@ -439,6 +547,9 @@ pub struct Params {
 
     /// Last access time.
     pub atime: Timespec,
+
+    /// Value to attach to this node.
+    pub value: T,
 }
 
 /// Represents inode operations.
diff --git a/rust/kernel/mem_cache.rs b/rust/kernel/mem_cache.rs
index e7e2720ff6cd..cbf1b7e75334 100644
--- a/rust/kernel/mem_cache.rs
+++ b/rust/kernel/mem_cache.rs
@@ -20,7 +20,6 @@ impl MemCache {
     /// Allocates a new `kmem_cache` for type `T`.
     ///
     /// `init` is called by the C code when entries are allocated.
-    #[allow(dead_code)]
     pub(crate) fn try_new<T>(
         name: &'static CStr,
         init: Option<unsafe extern "C" fn(*mut core::ffi::c_void)>,
@@ -43,7 +42,6 @@ pub(crate) fn try_new<T>(
     /// Returns the pointer to the `kmem_cache` instance, or null if it's `None`.
     ///
     /// This is a helper for functions like `alloc_inode_sb` where the cache is optional.
-    #[allow(dead_code)]
     pub(crate) fn ptr(c: &Option<Self>) -> *mut bindings::kmem_cache {
         match c {
             Some(m) => m.ptr.as_ptr(),
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index fea3360b6e7a..5b6c3f50adf4 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -93,12 +93,14 @@ fn iget(sb: &sb::SuperBlock<Self>, e: &'static Entry) -> Result<ARef<INode<Self>
             atime: UNIX_EPOCH,
             ctime: UNIX_EPOCH,
             mtime: UNIX_EPOCH,
+            value: e,
         })
     }
 }
 
 impl fs::FileSystem for RoFs {
     type Data = ();
+    type INodeData = &'static Entry;
     const NAME: &'static CStr = c_str!("rust_rofs");
 
     fn fill_super(sb: &mut sb::SuperBlock<Self, sb::New>, _: Option<inode::Mapper>) -> Result {
@@ -149,10 +151,7 @@ fn get_link<'a>(
             return Err(ECHILD);
         }
 
-        let name_buf = match inode.ino() {
-            3 => ENTRIES[3].contents,
-            _ => return Err(EINVAL),
-        };
+        let name_buf = inode.data().contents;
         let mut name = Box::new_slice(
             name_buf.len().checked_add(1).ok_or(ENOMEM)?,
             b'\0',
@@ -168,11 +167,7 @@ impl address_space::Operations for RoFs {
     type FileSystem = Self;
 
     fn read_folio(_: Option<&File<Self>>, mut folio: Locked<&Folio<PageCache<Self>>>) -> Result {
-        let data = match folio.inode().ino() {
-            2 => ENTRIES[2].contents,
-            _ => return Err(EINVAL),
-        };
-
+        let data = folio.inode().data().contents;
         let pos = usize::try_from(folio.pos()).unwrap_or(usize::MAX);
         let copied = if pos >= data.len() {
             0
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread
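
[Editorial sketch, not part of the patch] Patch 24 stores the per-inode payload next to
the C inode in a `WithData<T>` wrapper and recovers it with `container_of!`. The userspace
model below illustrates that layout trick under stated assumptions: `RawInode` is a
one-field stand-in for `bindings::inode`, a `Box` replaces the kmem_cache, and the helper
names (`outer_of`, `alloc_inode`, `data`, `destroy_inode`) are hypothetical.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Stand-in for `bindings::inode`; only one field is modelled.
#[repr(C)]
struct RawInode {
    ino: u64,
}

/// Same shape as the patch's `WithData<T>`: payload first, inode second.
#[repr(C)]
struct WithData<T> {
    data: MaybeUninit<T>,
    inode: RawInode,
}

/// Recovers the outer allocation from a pointer to its `inode` field.
///
/// # Safety
///
/// `inode` must point at the `inode` field of a live `WithData<T>`.
unsafe fn outer_of<T>(inode: *mut RawInode) -> *mut WithData<T> {
    // Compute the field offset on an uninitialised probe; nothing is read from it.
    let probe = MaybeUninit::<WithData<T>>::uninit();
    let base = probe.as_ptr();
    let field = unsafe { ptr::addr_of!((*base).inode) };
    let offset = field as usize - base as usize;
    // Step back from the field to the start of the containing struct.
    unsafe { inode.cast::<u8>().sub(offset).cast::<WithData<T>>() }
}

/// Allocates the wrapper, writes the payload and returns only the inode pointer
/// (roughly what `alloc_inode_callback` and `New::init` do together).
fn alloc_inode<T>(ino: u64, value: T) -> *mut RawInode {
    let outer = Box::into_raw(Box::new(WithData {
        data: MaybeUninit::new(value),
        inode: RawInode { ino },
    }));
    // `outer` was just allocated, so taking a field address is fine.
    unsafe { ptr::addr_of_mut!((*outer).inode) }
}

/// Reads the payload back through the inode pointer, like `INode::data`.
///
/// # Safety
///
/// `inode` must come from `alloc_inode::<T>` and must not have been destroyed.
unsafe fn data<'a, T>(inode: *mut RawInode) -> &'a T {
    unsafe { &*ptr::addr_of!((*outer_of::<T>(inode)).data).cast::<T>() }
}

/// Drops the payload and frees the allocation, like `destroy_inode_callback`.
///
/// # Safety
///
/// Same requirements as `data`; `inode` must not be used afterwards.
unsafe fn destroy_inode<T>(inode: *mut RawInode) {
    unsafe {
        let outer = outer_of::<T>(inode);
        ptr::drop_in_place(ptr::addr_of_mut!((*outer).data).cast::<T>());
        drop(Box::from_raw(outer));
    }
}
```

The payload is wrapped in `MaybeUninit` because the C side hands out the inode before
Rust has a value to store. That is why the patch drops the data explicitly in the destroy
callback, and only when the inode is not marked bad.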

* [RFC PATCH v2 25/30] rust: fs: export file type from mode constants
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (23 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 24/30] rust: fs: allow per-inode data Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 26/30] rust: fs: allow populating i_lnk Wedson Almeida Filho
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Export the `S_IFMT` mask and the `S_IF*` file type constants in a new
`mode` module so that Rust file system modules can use them if needed.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs.rs | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index d64fe1a5812f..4d90b23735bc 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -31,6 +31,33 @@
 /// This is C's `pgoff_t`.
 pub type PageOffset = usize;
 
+/// Contains constants related to Linux file modes.
+pub mod mode {
+    /// A bitmask used to extract the file type from a mode value.
+    pub const S_IFMT: u16 = bindings::S_IFMT as u16;
+
+    /// File type constant for block devices.
+    pub const S_IFBLK: u16 = bindings::S_IFBLK as u16;
+
+    /// File type constant for char devices.
+    pub const S_IFCHR: u16 = bindings::S_IFCHR as u16;
+
+    /// File type constant for directories.
+    pub const S_IFDIR: u16 = bindings::S_IFDIR as u16;
+
+    /// File type constant for pipes.
+    pub const S_IFIFO: u16 = bindings::S_IFIFO as u16;
+
+    /// File type constant for symbolic links.
+    pub const S_IFLNK: u16 = bindings::S_IFLNK as u16;
+
+    /// File type constant for regular files.
+    pub const S_IFREG: u16 = bindings::S_IFREG as u16;
+
+    /// File type constant for sockets.
+    pub const S_IFSOCK: u16 = bindings::S_IFSOCK as u16;
+}
+
 /// Maximum size of an inode.
 pub const MAX_LFS_FILESIZE: Offset = bindings::MAX_LFS_FILESIZE;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread
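
[Editorial sketch, not part of the patch] The constants exported by patch 25 are meant to
be combined as `mode & S_IFMT`, the same test patch 26 uses to detect symlink inodes. The
userspace sketch below spells out the octal values from the Linux UAPI `stat.h` header
instead of importing them from `bindings`. The `FileType` enum and the `file_type` helper
are hypothetical and exist only for illustration.

```rust
// Octal values as defined by the Linux UAPI `stat.h`; the kernel crate takes them from
// `bindings` instead of hard-coding them.
const S_IFMT: u16 = 0o170000;
const S_IFSOCK: u16 = 0o140000;
const S_IFLNK: u16 = 0o120000;
const S_IFREG: u16 = 0o100000;
const S_IFBLK: u16 = 0o060000;
const S_IFDIR: u16 = 0o040000;
const S_IFCHR: u16 = 0o020000;
const S_IFIFO: u16 = 0o010000;

/// Hypothetical enum used only in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum FileType {
    Fifo,
    Chr,
    Dir,
    Blk,
    Reg,
    Lnk,
    Sock,
}

/// Extracts the file type from a mode value by masking with `S_IFMT`.
fn file_type(mode: u16) -> Option<FileType> {
    match mode & S_IFMT {
        S_IFIFO => Some(FileType::Fifo),
        S_IFCHR => Some(FileType::Chr),
        S_IFDIR => Some(FileType::Dir),
        S_IFBLK => Some(FileType::Blk),
        S_IFREG => Some(FileType::Reg),
        S_IFLNK => Some(FileType::Lnk),
        S_IFSOCK => Some(FileType::Sock),
        // No recognised type bits set.
        _ => None,
    }
}
```

The permission bits in the low part of the mode do not affect the result, so `0o100644`
and `0o100755` both classify as regular files.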

* [RFC PATCH v2 26/30] rust: fs: allow populating i_lnk
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (24 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 25/30] rust: fs: export file type from mode constants Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 27/30] rust: fs: add `iomap` module Wedson Almeida Filho
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to store a string that represents the
destination of a symbolic link inode in the inode itself.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs/file.rs    |  6 ++---
 rust/kernel/fs/inode.rs   | 32 +++++++++++++++++++++----
 samples/rust/rust_rofs.rs | 50 ++++++++++++---------------------------
 3 files changed, 45 insertions(+), 43 deletions(-)

diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index a819724b75f8..9db70eff1169 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -518,15 +518,15 @@ pub enum DirEntryType {
     Wht = bindings::DT_WHT,
 }
 
-impl From<inode::Type> for DirEntryType {
-    fn from(value: inode::Type) -> Self {
+impl From<&inode::Type> for DirEntryType {
+    fn from(value: &inode::Type) -> Self {
         match value {
             inode::Type::Fifo => DirEntryType::Fifo,
             inode::Type::Chr(_, _) => DirEntryType::Chr,
             inode::Type::Dir => DirEntryType::Dir,
             inode::Type::Blk(_, _) => DirEntryType::Blk,
             inode::Type::Reg => DirEntryType::Reg,
-            inode::Type::Lnk => DirEntryType::Lnk,
+            inode::Type::Lnk(_) => DirEntryType::Lnk,
             inode::Type::Sock => DirEntryType::Sock,
         }
     }
diff --git a/rust/kernel/fs/inode.rs b/rust/kernel/fs/inode.rs
index 5230ff2fe0dd..b2b7d000080e 100644
--- a/rust/kernel/fs/inode.rs
+++ b/rust/kernel/fs/inode.rs
@@ -7,8 +7,8 @@
 //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
 
 use super::{
-    address_space, dentry, dentry::DEntry, file, sb::SuperBlock, FileSystem, Offset, PageOffset,
-    UnspecifiedFS,
+    address_space, dentry, dentry::DEntry, file, mode, sb::SuperBlock, FileSystem, Offset,
+    PageOffset, UnspecifiedFS,
 };
 use crate::error::{code::*, from_err_ptr, Result};
 use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Lockable, Locked, Opaque};
@@ -255,6 +255,17 @@ pub(crate) fn new_cache() -> Result<Option<MemCache>> {
         let ptr = container_of!(inode, WithData<T::INodeData>, inode).cast_mut();
 
         if !is_bad {
+            // SAFETY: The API contract guarantees that `inode` is valid.
+            if unsafe { (*inode).i_mode & mode::S_IFMT == mode::S_IFLNK } {
+                // SAFETY: We just checked that the inode is a link.
+                let lnk = unsafe { (*inode).__bindgen_anon_4.i_link };
+                if !lnk.is_null() {
+                    // SAFETY: `i_link` on link inodes is only ever populated with the result of
+                    // `CString::into_foreign`.
+                    unsafe { CString::from_foreign(lnk.cast::<core::ffi::c_void>()) };
+                }
+            }
+
             // SAFETY: The code either initialises the data or marks the inode as bad. Since the
             // inode is not bad, the data is initialised, and thus safe to drop.
             unsafe { ptr::drop_in_place((*ptr).data.as_mut_ptr()) };
@@ -381,7 +392,7 @@ pub fn init(self, params: Params<T::INodeData>) -> Result<ARef<INode<T>>> {
                 unsafe { bindings::mapping_set_large_folios(inode.i_mapping) };
                 bindings::S_IFREG
             }
-            Type::Lnk => {
+            Type::Lnk(str) => {
                 // If we are using `page_get_link`, we need to prevent the use of high mem.
                 if !inode.i_op.is_null() {
                     // SAFETY: We just checked that `i_op` is non-null, and we always just set it
@@ -393,6 +404,9 @@ pub fn init(self, params: Params<T::INodeData>) -> Result<ARef<INode<T>>> {
                         unsafe { bindings::inode_nohighmem(inode) };
                     }
                 }
+                if let Some(s) = str {
+                    inode.__bindgen_anon_4.i_link = s.into_foreign().cast::<i8>().cast_mut();
+                }
                 bindings::S_IFLNK
             }
             Type::Fifo => {
@@ -485,7 +499,6 @@ fn drop(&mut self) {
 }
 
 /// The type of an inode.
-#[derive(Copy, Clone)]
 pub enum Type {
     /// Named pipe (first-in, first-out) type.
     Fifo,
@@ -503,7 +516,7 @@ pub enum Type {
     Reg,
 
     /// Symbolic link type.
-    Lnk,
+    Lnk(Option<CString>),
 
     /// Named unix-domain socket type.
     Sock,
@@ -565,6 +578,15 @@ pub fn page_symlink_inode() -> Self {
         )
     }
 
+    /// Returns inode operations for symbolic links that are stored in the `i_link` field.
+    pub fn simple_symlink_inode() -> Self {
+        // SAFETY: This is a constant in C, it never changes.
+        Self(
+            unsafe { &bindings::simple_symlink_inode_operations },
+            PhantomData,
+        )
+    }
+
     /// Creates the inode operations from a type that implements the [`Operations`] trait.
     pub const fn new<U: Operations<FileSystem = T> + ?Sized>() -> Self {
         struct Table<T: Operations + ?Sized>(PhantomData<T>);
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 5b6c3f50adf4..5d0d1936459d 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -7,7 +7,7 @@
 };
 use kernel::prelude::*;
 use kernel::types::{ARef, Either, Locked};
-use kernel::{c_str, folio::Folio, folio::PageCache, fs, str::CString, time::UNIX_EPOCH, user};
+use kernel::{c_str, folio::Folio, folio::PageCache, fs, time::UNIX_EPOCH, user};
 
 kernel::module_fs! {
     type: RoFs,
@@ -46,7 +46,7 @@ struct Entry {
     Entry {
         name: b"link.txt",
         ino: 3,
-        etype: inode::Type::Lnk,
+        etype: inode::Type::Lnk(None),
         contents: b"./test.txt",
     },
 ];
@@ -54,7 +54,6 @@ struct Entry {
 const DIR_FOPS: file::Ops<RoFs> = file::Ops::new::<RoFs>();
 const DIR_IOPS: inode::Ops<RoFs> = inode::Ops::new::<RoFs>();
 const FILE_AOPS: address_space::Ops<RoFs> = address_space::Ops::new::<RoFs>();
-const LNK_IOPS: inode::Ops<RoFs> = inode::Ops::new::<Link>();
 
 struct RoFs;
 
@@ -65,25 +64,30 @@ fn iget(sb: &sb::SuperBlock<Self>, e: &'static Entry) -> Result<ARef<INode<Self>
             Either::Right(new) => new,
         };
 
-        let (mode, nlink, size) = match e.etype {
+        let (mode, nlink, size, typ) = match e.etype {
             inode::Type::Dir => {
                 new.set_iops(DIR_IOPS).set_fops(DIR_FOPS);
-                (0o555, 2, ENTRIES.len().try_into()?)
+                (0o555, 2, ENTRIES.len().try_into()?, inode::Type::Dir)
             }
             inode::Type::Reg => {
                 new.set_fops(file::Ops::generic_ro_file())
                     .set_aops(FILE_AOPS);
-                (0o444, 1, e.contents.len().try_into()?)
+                (0o444, 1, e.contents.len().try_into()?, inode::Type::Reg)
             }
-            inode::Type::Lnk => {
-                new.set_iops(LNK_IOPS);
-                (0o444, 1, e.contents.len().try_into()?)
+            inode::Type::Lnk(_) => {
+                new.set_iops(inode::Ops::simple_symlink_inode());
+                (
+                    0o444,
+                    1,
+                    e.contents.len().try_into()?,
+                    inode::Type::Lnk(Some(e.contents.try_into()?)),
+                )
             }
             _ => return Err(ENOENT),
         };
 
         new.init(inode::Params {
-            typ: e.etype,
+            typ,
             mode,
             size,
             blocks: (u64::try_from(size)? + 511) / 512,
@@ -138,30 +142,6 @@ fn lookup(
     }
 }
 
-struct Link;
-#[vtable]
-impl inode::Operations for Link {
-    type FileSystem = RoFs;
-
-    fn get_link<'a>(
-        dentry: Option<&DEntry<RoFs>>,
-        inode: &'a INode<RoFs>,
-    ) -> Result<Either<CString, &'a CStr>> {
-        if dentry.is_none() {
-            return Err(ECHILD);
-        }
-
-        let name_buf = inode.data().contents;
-        let mut name = Box::new_slice(
-            name_buf.len().checked_add(1).ok_or(ENOMEM)?,
-            b'\0',
-            GFP_NOFS,
-        )?;
-        name[..name_buf.len()].copy_from_slice(name_buf);
-        Ok(Either::Left(name.try_into()?))
-    }
-}
-
 #[vtable]
 impl address_space::Operations for RoFs {
     type FileSystem = Self;
@@ -212,7 +192,7 @@ fn read_dir(
         }
 
         for e in ENTRIES.iter().skip(pos.try_into()?) {
-            if !emitter.emit(1, e.name, e.ino, e.etype.into()) {
+            if !emitter.emit(1, e.name, e.ino, (&e.etype).into()) {
                 break;
             }
         }
-- 
2.34.1



* [RFC PATCH v2 27/30] rust: fs: add `iomap` module
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (25 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 26/30] rust: fs: allow populating i_lnk Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-20 19:32   ` Darrick J. Wong
  2024-05-14 13:17 ` [RFC PATCH v2 28/30] rust: fs: add memalloc_nofs support Wedson Almeida Filho
                   ` (3 subsequent siblings)
  30 siblings, 1 reply; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow file systems to implement their address space operations via
iomap, which delegates a lot of the complexity to common code.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
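Illustrative note, not part of the patch: a `begin` implementation for a
file whose bytes are stored contiguously on disk might look roughly like the
user-space sketch below. `Map`, `MapType`, and `begin_contiguous` are
hypothetical stand-ins that only mirror the setter names introduced in this
patch; they do not use the kernel bindings, and a real implementation may
need to respect block alignment and the `flags` argument.

```rust
/// Hypothetical subset of the mapping types; numeric values are deliberately omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MapType {
    Hole,
    Mapped,
}

/// Hypothetical stand-in for the `Map` wrapper around `struct iomap`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Map {
    typ: MapType,
    offset: i64,
    length: u64,
    addr: u64,
}

impl Map {
    fn new() -> Self {
        Map { typ: MapType::Hole, offset: 0, length: 0, addr: 0 }
    }
    fn set_type(&mut self, t: MapType) -> &mut Self {
        self.typ = t;
        self
    }
    fn set_offset(&mut self, v: i64) -> &mut Self {
        self.offset = v;
        self
    }
    fn set_length(&mut self, len: u64) -> &mut Self {
        self.length = len;
        self
    }
    fn set_addr(&mut self, addr: u64) -> &mut Self {
        self.addr = addr;
        self
    }
}

/// Maps `[pos, pos + length)` of a file whose `size` bytes start at disk byte `data_start`.
fn begin_contiguous(
    data_start: u64,
    size: u64,
    pos: i64,
    length: i64,
    map: &mut Map,
) -> Result<(), &'static str> {
    let upos = u64::try_from(pos).map_err(|_| "negative position")?;
    let ulen = u64::try_from(length).map_err(|_| "negative length")?;
    if upos >= size {
        // Nothing is stored past the end of the file, so report a hole.
        map.set_type(MapType::Hole).set_offset(pos).set_length(ulen);
        return Ok(());
    }
    let addr = data_start.checked_add(upos).ok_or("address overflow")?;
    // The mapping may be shorter than requested; the caller reads the actual length from `map`.
    map.set_type(MapType::Mapped)
        .set_offset(pos)
        .set_length(ulen.min(size - upos))
        .set_addr(addr);
    Ok(())
}
```

Returning `&mut Self` from each setter is what allows the chained calls; the
clamped length illustrates the "actual length is returned in `iomap`" part of
the `begin` contract documented below.
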
 rust/bindings/bindings_helper.h |   1 +
 rust/kernel/fs.rs               |   1 +
 rust/kernel/fs/iomap.rs         | 281 ++++++++++++++++++++++++++++++++
 3 files changed, 283 insertions(+)
 create mode 100644 rust/kernel/fs/iomap.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index f4c7c3951dbe..629fce394dbe 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -13,6 +13,7 @@
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/fs_context.h>
+#include <linux/iomap.h>
 #include <linux/jiffies.h>
 #include <linux/mdio.h>
 #include <linux/pagemap.h>
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 4d90b23735bc..7a1c4884c370 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -19,6 +19,7 @@
 pub mod dentry;
 pub mod file;
 pub mod inode;
+pub mod iomap;
 pub mod sb;
 
 /// The offset of a file in a file system.
diff --git a/rust/kernel/fs/iomap.rs b/rust/kernel/fs/iomap.rs
new file mode 100644
index 000000000000..e48e200e555e
--- /dev/null
+++ b/rust/kernel/fs/iomap.rs
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! File system io maps.
+//!
+//! This module allows Rust code to use iomaps to implement filesystems.
+//!
+//! C headers: [`include/linux/iomap.h`](srctree/include/linux/iomap.h)
+
+use super::{address_space, FileSystem, INode, Offset};
+use crate::error::{from_result, Result};
+use crate::{bindings, block};
+use core::marker::PhantomData;
+
+/// The type of mapping.
+///
+/// This is used in [`Map`].
+#[repr(u16)]
+pub enum Type {
+    /// No blocks allocated, need allocation.
+    Hole = bindings::IOMAP_HOLE as u16,
+
+    /// Delayed allocation blocks.
+    DelAlloc = bindings::IOMAP_DELALLOC as u16,
+
+    /// Blocks allocated at the given address.
+    Mapped = bindings::IOMAP_MAPPED as u16,
+
+    /// Blocks allocated at the given address in unwritten state.
+    Unwritten = bindings::IOMAP_UNWRITTEN as u16,
+
+    /// Data inline in the inode.
+    Inline = bindings::IOMAP_INLINE as u16,
+}
+
+/// Flags usable in [`Map`], in [`Map::set_flags`] in particular.
+pub mod map_flags {
+    /// Indicates that the blocks have been newly allocated and need zeroing for areas that no data
+    /// is copied to.
+    pub const NEW: u16 = bindings::IOMAP_F_NEW as u16;
+
+    /// Indicates that the inode has uncommitted metadata needed to access written data and
+    /// requires fdatasync to commit them to persistent storage. This needs to take into account
+    /// metadata changes that *may* be made at IO completion, such as file size updates from direct
+    /// IO.
+    pub const DIRTY: u16 = bindings::IOMAP_F_DIRTY as u16;
+
+    /// Indicates that the blocks are shared, and will need to be unshared as part of a write.
+    pub const SHARED: u16 = bindings::IOMAP_F_SHARED as u16;
+
+    /// Indicates that the iomap contains the merge of multiple block mappings.
+    pub const MERGED: u16 = bindings::IOMAP_F_MERGED as u16;
+
+    /// Indicates that the file system requires the use of buffer heads for this mapping.
+    pub const BUFFER_HEAD: u16 = bindings::IOMAP_F_BUFFER_HEAD as u16;
+
+    /// Indicates that the iomap is for an extended attribute extent rather than a file data
+    /// extent.
+    pub const XATTR: u16 = bindings::IOMAP_F_XATTR as u16;
+
+    /// Indicates to the iomap_end method that the file size has changed as the result of this
+    /// write operation.
+    pub const SIZE_CHANGED: u16 = bindings::IOMAP_F_SIZE_CHANGED as u16;
+
+    /// Indicates that the iomap is not valid any longer and the file range it covers needs to be
+    /// remapped by the high level before the operation can proceed.
+    pub const STALE: u16 = bindings::IOMAP_F_STALE as u16;
+
+    /// Flags from 0x1000 up are for file system specific usage.
+    pub const PRIVATE: u16 = bindings::IOMAP_F_PRIVATE as u16;
+}
+
+/// A map from address space to block device.
+#[repr(transparent)]
+pub struct Map<'a>(pub bindings::iomap, PhantomData<&'a ()>);
+
+impl<'a> Map<'a> {
+    /// Sets the map type.
+    pub fn set_type(&mut self, t: Type) -> &mut Self {
+        self.0.type_ = t as u16;
+        self
+    }
+
+    /// Sets the file offset, in bytes.
+    pub fn set_offset(&mut self, v: Offset) -> &mut Self {
+        self.0.offset = v;
+        self
+    }
+
+    /// Sets the length of the mapping, in bytes.
+    pub fn set_length(&mut self, len: u64) -> &mut Self {
+        self.0.length = len;
+        self
+    }
+
+    /// Sets the mapping flags.
+    ///
+    /// Values come from the [`map_flags`] module.
+    pub fn set_flags(&mut self, flags: u16) -> &mut Self {
+        self.0.flags = flags;
+        self
+    }
+
+    /// Sets the disk offset of the mapping, in bytes.
+    pub fn set_addr(&mut self, addr: u64) -> &mut Self {
+        self.0.addr = addr;
+        self
+    }
+
+    /// Sets the block device of the mapping.
+    pub fn set_bdev(&mut self, bdev: Option<&'a block::Device>) -> &mut Self {
+        self.0.bdev = if let Some(b) = bdev {
+            b.0.get()
+        } else {
+            core::ptr::null_mut()
+        };
+        self
+    }
+}
+
+/// Flags passed to [`Operations::begin`] and [`Operations::end`].
+pub mod flags {
+    /// Writing, must allocate block.
+    pub const WRITE: u32 = bindings::IOMAP_WRITE;
+
+    /// Zeroing operation, may skip holes.
+    pub const ZERO: u32 = bindings::IOMAP_ZERO;
+
+    /// Report extent status, e.g. FIEMAP.
+    pub const REPORT: u32 = bindings::IOMAP_REPORT;
+
+    /// Mapping for page fault.
+    pub const FAULT: u32 = bindings::IOMAP_FAULT;
+
+    /// Direct I/O.
+    pub const DIRECT: u32 = bindings::IOMAP_DIRECT;
+
+    /// Do not block.
+    pub const NOWAIT: u32 = bindings::IOMAP_NOWAIT;
+
+    /// Only pure overwrites allowed.
+    pub const OVERWRITE_ONLY: u32 = bindings::IOMAP_OVERWRITE_ONLY;
+
+    /// `unshare_file_range`.
+    pub const UNSHARE: u32 = bindings::IOMAP_UNSHARE;
+
+    /// DAX mapping.
+    pub const DAX: u32 = bindings::IOMAP_DAX;
+}
+
+/// Operations implemented by iomap users.
+pub trait Operations {
+    /// File system that these operations are compatible with.
+    type FileSystem: FileSystem + ?Sized;
+
+    /// Returns the existing mapping at `pos`, or reserves space starting at `pos` for up to
+    /// `length`, as long as it can be done as a single mapping. The actual length is returned in
+    /// `iomap`.
+    ///
+    /// The values of `flags` come from the [`flags`] module.
+    fn begin<'a>(
+        inode: &'a INode<Self::FileSystem>,
+        pos: Offset,
+        length: Offset,
+        flags: u32,
+        map: &mut Map<'a>,
+        srcmap: &mut Map<'a>,
+    ) -> Result;
+
+    /// Commits and/or unreserves space previously allocated using [`Operations::begin`]. `written`
+    /// indicates the length of the successful write operation which needs to be committed, while
+    /// the rest needs to be unreserved. `written` might be zero if no data was written.
+    ///
+    /// The values of `flags` come from the [`flags`] module.
+    fn end<'a>(
+        _inode: &'a INode<Self::FileSystem>,
+        _pos: Offset,
+        _length: Offset,
+        _written: isize,
+        _flags: u32,
+        _map: &Map<'a>,
+    ) -> Result {
+        Ok(())
+    }
+}
+
+/// Returns address space operations backed by iomaps.
+pub const fn ro_aops<T: Operations + ?Sized>() -> address_space::Ops<T::FileSystem> {
+    struct Table<T: Operations + ?Sized>(PhantomData<T>);
+    impl<T: Operations + ?Sized> Table<T> {
+        const MAP_TABLE: bindings::iomap_ops = bindings::iomap_ops {
+            iomap_begin: Some(Self::iomap_begin_callback),
+            iomap_end: Some(Self::iomap_end_callback),
+        };
+
+        extern "C" fn iomap_begin_callback(
+            inode_ptr: *mut bindings::inode,
+            pos: Offset,
+            length: Offset,
+            flags: u32,
+            map: *mut bindings::iomap,
+            srcmap: *mut bindings::iomap,
+        ) -> i32 {
+            from_result(|| {
+                // SAFETY: The C API guarantees that `inode_ptr` is a valid inode.
+                let inode = unsafe { INode::from_raw(inode_ptr) };
+                T::begin(
+                    inode,
+                    pos,
+                    length,
+                    flags,
+                    // SAFETY: The C API guarantees that `map` is valid for write.
+                    unsafe { &mut *map.cast::<Map<'_>>() },
+                    // SAFETY: The C API guarantees that `srcmap` is valid for write.
+                    unsafe { &mut *srcmap.cast::<Map<'_>>() },
+                )?;
+                Ok(0)
+            })
+        }
+
+        extern "C" fn iomap_end_callback(
+            inode_ptr: *mut bindings::inode,
+            pos: Offset,
+            length: Offset,
+            written: isize,
+            flags: u32,
+            map: *mut bindings::iomap,
+        ) -> i32 {
+            from_result(|| {
+                // SAFETY: The C API guarantees that `inode_ptr` is a valid inode.
+                let inode = unsafe { INode::from_raw(inode_ptr) };
+                // SAFETY: The C API guarantees that `map` is valid for read.
+                T::end(inode, pos, length, written, flags, unsafe {
+                    &*map.cast::<Map<'_>>()
+                })?;
+                Ok(0)
+            })
+        }
+
+        const TABLE: bindings::address_space_operations = bindings::address_space_operations {
+            writepage: None,
+            read_folio: Some(Self::read_folio_callback),
+            writepages: None,
+            dirty_folio: None,
+            readahead: Some(Self::readahead_callback),
+            write_begin: None,
+            write_end: None,
+            bmap: Some(Self::bmap_callback),
+            invalidate_folio: Some(bindings::iomap_invalidate_folio),
+            release_folio: Some(bindings::iomap_release_folio),
+            free_folio: None,
+            direct_IO: Some(bindings::noop_direct_IO),
+            migrate_folio: None,
+            launder_folio: None,
+            is_partially_uptodate: None,
+            is_dirty_writeback: None,
+            error_remove_folio: None,
+            swap_activate: None,
+            swap_deactivate: None,
+            swap_rw: None,
+        };
+
+        extern "C" fn read_folio_callback(
+            _file: *mut bindings::file,
+            folio: *mut bindings::folio,
+        ) -> i32 {
+            // SAFETY: `folio` is just forwarded from C and `Self::MAP_TABLE` is always valid.
+            unsafe { bindings::iomap_read_folio(folio, &Self::MAP_TABLE) }
+        }
+
+        extern "C" fn readahead_callback(rac: *mut bindings::readahead_control) {
+            // SAFETY: `rac` is just forwarded from C and `Self::MAP_TABLE` is always valid.
+            unsafe { bindings::iomap_readahead(rac, &Self::MAP_TABLE) }
+        }
+
+        extern "C" fn bmap_callback(mapping: *mut bindings::address_space, block: u64) -> u64 {
+            // SAFETY: `mapping` is just forwarded from C and `Self::MAP_TABLE` is always valid.
+            unsafe { bindings::iomap_bmap(mapping, block, &Self::MAP_TABLE) }
+        }
+    }
+    address_space::Ops(&Table::<T>::TABLE, PhantomData)
+}
-- 
2.34.1



* [RFC PATCH v2 28/30] rust: fs: add memalloc_nofs support
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (26 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 27/30] rust: fs: add `iomap` module Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 29/30] tarfs: introduce tar fs Wedson Almeida Filho
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho,
	Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Add `memalloc_nofs`, which runs a closure in a nofs allocation context.
When used in file systems, it obviates the need for GFP_NOFS.

Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
---
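Illustrative note, not part of the patch: the save/run/restore shape of
`memalloc_nofs` can be modelled in user space as below. The thread-local
`NOFS` flag and the `nofs_save`/`nofs_restore`/`in_nofs` helpers are
hypothetical stand-ins for the per-task state behind the C helpers. The
sketch restores through a drop guard, which is a variation on the patch (the
patch calls restore explicitly after the closure returns).

```rust
use std::cell::Cell;

thread_local! {
    /// Hypothetical stand-in for the per-task "nofs" allocation flag.
    static NOFS: Cell<bool> = Cell::new(false);
}

/// Sets the flag and returns its previous value.
fn nofs_save() -> bool {
    NOFS.with(|f| f.replace(true))
}

/// Puts back the value returned by a matching `nofs_save`.
fn nofs_restore(old: bool) {
    NOFS.with(|f| f.set(old))
}

/// Reports whether the current thread is inside a nofs scope.
fn in_nofs() -> bool {
    NOFS.with(|f| f.get())
}

/// Restores the saved flag when dropped, so every exit path from the scope restores it.
struct Restore(bool);

impl Drop for Restore {
    fn drop(&mut self) {
        nofs_restore(self.0);
    }
}

/// Calls `cb` with the flag set, then restores the previous value.
fn with_nofs<T>(cb: impl FnOnce() -> T) -> T {
    let _guard = Restore(nofs_save());
    cb()
}
```

Because the previous value is saved rather than unconditionally cleared,
nested scopes compose: leaving an inner scope keeps the outer one in effect.
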
 rust/helpers.c    | 12 ++++++++++++
 rust/kernel/fs.rs | 12 ++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/rust/helpers.c b/rust/helpers.c
index edf12868962c..c26aa07cb20f 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -273,6 +273,18 @@ void *rust_helper_alloc_inode_sb(struct super_block *sb,
 }
 EXPORT_SYMBOL_GPL(rust_helper_alloc_inode_sb);
 
+unsigned int rust_helper_memalloc_nofs_save(void)
+{
+	return memalloc_nofs_save();
+}
+EXPORT_SYMBOL_GPL(rust_helper_memalloc_nofs_save);
+
+void rust_helper_memalloc_nofs_restore(unsigned int flags)
+{
+	memalloc_nofs_restore(flags);
+}
+EXPORT_SYMBOL_GPL(rust_helper_memalloc_nofs_restore);
+
 void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
 {
 	i_uid_write(inode, uid);
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 7a1c4884c370..b7a654546d23 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -417,6 +417,18 @@ impl<T: FileSystem + ?Sized> Tables<T> {
     }
 }
 
+/// Calls `cb` in a nofs allocation context.
+///
+/// That is, if an allocation happens within `cb`, it will have the `__GFP_FS` bit cleared.
+pub fn memalloc_nofs<T>(cb: impl FnOnce() -> T) -> T {
+    // SAFETY: Function is safe to be called from any context.
+    let flags = unsafe { bindings::memalloc_nofs_save() };
+    let ret = cb();
+    // SAFETY: Function is safe to be called from any context.
+    unsafe { bindings::memalloc_nofs_restore(flags) };
+    ret
+}
+
 /// Kernel module that exposes a single file system implemented by `T`.
 #[pin_data]
 pub struct Module<T: FileSystem + ?Sized> {
-- 
2.34.1



* [RFC PATCH v2 29/30] tarfs: introduce tar fs
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (27 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 28/30] rust: fs: add memalloc_nofs support Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-14 13:17 ` [RFC PATCH v2 30/30] WIP: fs: ext2: add rust ro ext2 implementation Wedson Almeida Filho
  2024-05-31 14:34 ` [RFC PATCH v2 00/30] Rust abstractions for VFS Danilo Krummrich
  30 siblings, 0 replies; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

tarfs is a read-only file system based on tar files with an index
appended to them. The index makes it possible to find fs entries without
having to traverse the whole tar file.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
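Illustrative note, not part of the patch: the arithmetic that `iget` applies
to the on-disk inode fields can be sketched in plain Rust as below. The
function names are hypothetical, and `INODE_SIZE` assumes that `LE<T>` has
the same size as `T`, which makes the `#[repr(C)]` `Inode` 32 bytes. The
checked arithmetic and the overflow-free rounding are choices made for this
sketch; the patch itself uses plain `+` and `*`.

```rust
/// Block size used by tarfs (1 << TARFS_BSIZE_BITS, with 12 bits).
const TARFS_BSIZE: u64 = 1 << 12;

/// Assumed size of the on-disk `Inode`: 2 + 1 + 1 + 4 + 4 + 4 + 8 + 8 bytes.
const INODE_SIZE: u64 = 32;

/// Rebuilds the 36-bit mtime: `lmtime` holds bits 0..32, the low nibble of `hmtime` bits 32..36.
fn mtime_secs(hmtime: u8, lmtime: u32) -> u64 {
    u64::from(lmtime) | (u64::from(hmtime & 0xf) << 32)
}

/// Splits the `offset` field of a device inode into `(major, minor)`.
fn dev_numbers(offset: u64) -> (u32, u32) {
    ((offset >> 32) as u32, offset as u32)
}

/// Byte offset of inode `ino` (1-based) in the inode table, or `None` if it is out of range.
fn inode_offset(table_offset: u64, ino: u64, inode_count: u64) -> Option<u64> {
    if ino == 0 || ino > inode_count {
        return None;
    }
    (ino - 1).checked_mul(INODE_SIZE)?.checked_add(table_offset)
}

/// Number of `TARFS_BSIZE` blocks needed to hold `size` bytes, rounding up.
fn block_count(size: u64) -> u64 {
    size / TARFS_BSIZE + u64::from(size % TARFS_BSIZE != 0)
}
```

For example, inode 3 of a table that starts at byte 4096 lives at byte
4096 + 2 * 32 = 4160, and since 4096 is a multiple of 32 an inode never
straddles a block boundary when the table itself is block-aligned.
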
 fs/Kconfig                        |   1 +
 fs/Makefile                       |   1 +
 fs/tarfs/Kconfig                  |  15 ++
 fs/tarfs/Makefile                 |   8 +
 fs/tarfs/defs.rs                  |  80 ++++++
 fs/tarfs/tar.rs                   | 394 ++++++++++++++++++++++++++++++
 scripts/generate_rust_analyzer.py |   2 +-
 7 files changed, 500 insertions(+), 1 deletion(-)
 create mode 100644 fs/tarfs/Kconfig
 create mode 100644 fs/tarfs/Makefile
 create mode 100644 fs/tarfs/defs.rs
 create mode 100644 fs/tarfs/tar.rs

diff --git a/fs/Kconfig b/fs/Kconfig
index a46b0cbc4d8f..2cbd99d6784c 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -337,6 +337,7 @@ source "fs/sysv/Kconfig"
 source "fs/ufs/Kconfig"
 source "fs/erofs/Kconfig"
 source "fs/vboxsf/Kconfig"
+source "fs/tarfs/Kconfig"
 
 endif # MISC_FILESYSTEMS
 
diff --git a/fs/Makefile b/fs/Makefile
index 6ecc9b0a53f2..d8bbda73e3a9 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -129,3 +129,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
+obj-$(CONFIG_TARFS_FS)		+= tarfs/
diff --git a/fs/tarfs/Kconfig b/fs/tarfs/Kconfig
new file mode 100644
index 000000000000..fd4f1ae0f83d
--- /dev/null
+++ b/fs/tarfs/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+
+config TARFS_FS
+	tristate "TAR file system support"
+	depends on RUST && BLOCK
+	help
+	  This is a simple read-only file system intended for mounting
+	  tar files that have had an index appended to them.
+
+	  To compile this file system support as a module, choose M here: the
+	  module will be called tarfs.
+
+	  If you don't know whether you need it, then you don't need it:
+	  answer N.
diff --git a/fs/tarfs/Makefile b/fs/tarfs/Makefile
new file mode 100644
index 000000000000..011c5d64fbe3
--- /dev/null
+++ b/fs/tarfs/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the linux tarfs filesystem routines.
+#
+
+obj-$(CONFIG_TARFS_FS) += tarfs.o
+
+tarfs-y := tar.o
diff --git a/fs/tarfs/defs.rs b/fs/tarfs/defs.rs
new file mode 100644
index 000000000000..7481b75aaab2
--- /dev/null
+++ b/fs/tarfs/defs.rs
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Definitions of tarfs structures.
+
+use kernel::types::LE;
+
+/// Flags used in [`Inode::flags`].
+pub mod inode_flags {
+    /// Indicates that the inode is opaque.
+    ///
+    /// When set, the inode will have the "trusted.overlay.opaque" xattr set to "y" at runtime.
+    pub const OPAQUE: u8 = 0x1;
+}
+
+kernel::derive_readable_from_bytes! {
+    /// An inode in the tarfs inode table.
+    #[repr(C)]
+    pub struct Inode {
+        /// The mode of the inode.
+        ///
+        /// The bottom 9 bits are the rwx bits for owner, group, and others.
+        ///
+        /// The bits in the [`S_IFMT`] mask represent the file type.
+        pub mode: LE<u16>,
+
+        /// Tarfs flags for the inode.
+        ///
+        /// Values are drawn from the [`inode_flags`] module.
+        pub flags: u8,
+
+        /// The bottom 4 bits represent the top 4 bits of mtime.
+        pub hmtime: u8,
+
+        /// The owner of the inode.
+        pub owner: LE<u32>,
+
+        /// The group of the inode.
+        pub group: LE<u32>,
+
+        /// The bottom 32 bits of mtime.
+        pub lmtime: LE<u32>,
+
+        /// Size of the contents of the inode.
+        pub size: LE<u64>,
+
+        /// Either the offset to the data, or the major and minor numbers of a device.
+        ///
+        /// For the latter, the 32 LSB are the minor, and the 32 MSB are the major numbers.
+        pub offset: LE<u64>,
+    }
+
+    /// An entry in a tarfs directory entry table.
+    #[repr(C)]
+    pub struct DirEntry {
+        /// The inode number this entry refers to.
+        pub ino: LE<u64>,
+
+        /// The offset to the name of the entry.
+        pub name_offset: LE<u64>,
+
+        /// The length of the name of the entry.
+        pub name_len: LE<u64>,
+
+        /// The type of entry.
+        pub etype: u8,
+
+        /// Unused padding.
+        pub _padding: [u8; 7],
+    }
+
+    /// The super-block of a tarfs instance.
+    #[repr(C)]
+    pub struct Header {
+        /// The offset to the beginning of the inode-table.
+        pub inode_table_offset: LE<u64>,
+
+        /// The number of inodes in the file system.
+        pub inode_count: LE<u64>,
+    }
+}
diff --git a/fs/tarfs/tar.rs b/fs/tarfs/tar.rs
new file mode 100644
index 000000000000..a3f6e468e566
--- /dev/null
+++ b/fs/tarfs/tar.rs
@@ -0,0 +1,394 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! File system based on tar files and an index.
+
+use core::mem::size_of;
+use defs::*;
+use kernel::fs::{
+    self, address_space, dentry, dentry::DEntry, file, file::File, inode, inode::INode,
+    inode::Type, iomap, sb, sb::SuperBlock, Offset, Stat,
+};
+use kernel::types::{ARef, Either, FromBytes, Locked};
+use kernel::{c_str, prelude::*, str::CString, user};
+
+pub mod defs;
+
+kernel::module_fs! {
+    type: TarFs,
+    name: "tarfs",
+    author: "Wedson Almeida Filho <walmeida@microsoft.com>",
+    description: "File system for indexed tar files",
+    license: "GPL",
+}
+
+const SECTOR_SIZE: u64 = 512;
+const TARFS_BSIZE: u64 = 1 << TARFS_BSIZE_BITS;
+const TARFS_BSIZE_BITS: u8 = 12;
+const SECTORS_PER_BLOCK: u64 = TARFS_BSIZE / SECTOR_SIZE;
+const TARFS_MAGIC: usize = 0x54415246;
+
+static_assert!(SECTORS_PER_BLOCK > 0);
+
+struct INodeData {
+    offset: u64,
+    flags: u8,
+}
+
+struct TarFs {
+    data_size: u64,
+    inode_table_offset: u64,
+    inode_count: u64,
+    mapper: inode::Mapper,
+}
+
+impl TarFs {
+    fn iget(sb: &SuperBlock<Self>, ino: u64) -> Result<ARef<INode<Self>>> {
+        // Check that the inode number is valid.
+        let h = sb.data();
+        if ino == 0 || ino > h.inode_count {
+            return Err(ENOENT);
+        }
+
+        // Create an inode or find an existing (cached) one.
+        let mut inode = match sb.get_or_create_inode(ino)? {
+            Either::Left(existing) => return Ok(existing),
+            Either::Right(new) => new,
+        };
+
+        static_assert!((TARFS_BSIZE as usize) % size_of::<Inode>() == 0);
+
+        // Load inode details from storage.
+        let offset = h.inode_table_offset + (ino - 1) * u64::try_from(size_of::<Inode>())?;
+        let b = h.mapper.mapped_folio(offset.try_into()?)?;
+        let idata = Inode::from_bytes(&b, 0).ok_or(EIO)?;
+
+        let mode = idata.mode.value();
+
+        // Ignore inodes that have unknown mode bits.
+        if (mode & !(fs::mode::S_IFMT | 0o777)) != 0 {
+            return Err(ENOENT);
+        }
+
+        const DIR_FOPS: file::Ops<TarFs> = file::Ops::new::<TarFs>();
+        const DIR_IOPS: inode::Ops<TarFs> = inode::Ops::new::<TarFs>();
+        const FILE_AOPS: address_space::Ops<TarFs> = iomap::ro_aops::<TarFs>();
+
+        let size = idata.size.value();
+        let doffset = idata.offset.value();
+        let secs = u64::from(idata.lmtime.value()) | (u64::from(idata.hmtime & 0xf) << 32);
+        let ts = kernel::time::Timespec::new(secs, 0)?;
+        let typ = match mode & fs::mode::S_IFMT {
+            fs::mode::S_IFREG => {
+                inode
+                    .set_fops(file::Ops::generic_ro_file())
+                    .set_aops(FILE_AOPS);
+                Type::Reg
+            }
+            fs::mode::S_IFDIR => {
+                inode.set_iops(DIR_IOPS).set_fops(DIR_FOPS);
+                Type::Dir
+            }
+            fs::mode::S_IFLNK => {
+                inode.set_iops(inode::Ops::simple_symlink_inode());
+                Type::Lnk(Some(Self::get_link(sb, doffset, size)?))
+            }
+            fs::mode::S_IFSOCK => Type::Sock,
+            fs::mode::S_IFIFO => Type::Fifo,
+            fs::mode::S_IFCHR => Type::Chr((doffset >> 32) as u32, doffset as u32),
+            fs::mode::S_IFBLK => Type::Blk((doffset >> 32) as u32, doffset as u32),
+            _ => return Err(ENOENT),
+        };
+        inode.init(inode::Params {
+            typ,
+            mode: mode & 0o777,
+            size: size.try_into()?,
+            blocks: (idata.size.value() + TARFS_BSIZE - 1) / TARFS_BSIZE,
+            nlink: 1,
+            uid: idata.owner.value(),
+            gid: idata.group.value(),
+            ctime: ts,
+            mtime: ts,
+            atime: ts,
+            value: INodeData {
+                offset: doffset,
+                flags: idata.flags,
+            },
+        })
+    }
+
+    fn name_eq(sb: &SuperBlock<Self>, mut name: &[u8], offset: u64) -> Result<bool> {
+        let ret =
+            sb.data()
+                .mapper
+                .for_each_page(offset as Offset, name.len().try_into()?, |data| {
+                    if data != &name[..data.len()] {
+                        return Ok(Some(()));
+                    }
+                    name = &name[data.len()..];
+                    Ok(None)
+                })?;
+        Ok(ret.is_none())
+    }
+
+    fn read_name(sb: &SuperBlock<Self>, name: &mut [u8], offset: u64) -> Result {
+        let mut copy_to = 0;
+        sb.data()
+            .mapper
+            .for_each_page(offset as Offset, name.len().try_into()?, |data| {
+                name[copy_to..][..data.len()].copy_from_slice(data);
+                copy_to += data.len();
+                Ok(None::<()>)
+            })?;
+        Ok(())
+    }
+
+    fn get_link(sb: &SuperBlock<Self>, offset: u64, len: u64) -> Result<CString> {
+        let name_len: usize = len.try_into()?;
+        let alloc_len = name_len.checked_add(1).ok_or(ENOMEM)?;
+        let mut name = Box::new_slice(alloc_len, b'\0', GFP_NOFS)?;
+        Self::read_name(sb, &mut name[..name_len], offset)?;
+        Ok(name.try_into()?)
+    }
+}
+
+impl fs::FileSystem for TarFs {
+    type Data = Box<Self>;
+    type INodeData = INodeData;
+    const NAME: &'static CStr = c_str!("tar");
+    const SUPER_TYPE: sb::Type = sb::Type::BlockDev;
+
+    fn fill_super(
+        sb: &mut SuperBlock<Self, sb::New>,
+        mapper: Option<inode::Mapper>,
+    ) -> Result<Self::Data> {
+        let Some(mapper) = mapper else {
+            return Err(EINVAL);
+        };
+
+        let scount = sb.sector_count();
+        if scount < SECTORS_PER_BLOCK {
+            pr_err!("Block device is too small: sector count={scount}\n");
+            return Err(ENXIO);
+        }
+
+        if sb.min_blocksize(SECTOR_SIZE as i32) != SECTOR_SIZE as i32 {
+            pr_err!("Block size not supported\n");
+            return Err(EIO);
+        }
+
+        let tarfs = {
+            let offset = (scount - 1) * SECTOR_SIZE;
+            let mapped = mapper.mapped_folio(offset.try_into()?)?;
+            let hdr = Header::from_bytes(&mapped, 0).ok_or(EIO)?;
+            let inode_table_offset = hdr.inode_table_offset.value();
+            let inode_count = hdr.inode_count.value();
+            drop(mapped);
+            Box::new(
+                TarFs {
+                    inode_table_offset,
+                    inode_count,
+                    data_size: scount.checked_mul(SECTOR_SIZE).ok_or(ERANGE)?,
+                    mapper,
+                },
+                GFP_KERNEL,
+            )?
+        };
+
+        // Check that the inode table starts within the device data and is aligned to the sector
+        // size.
+        if tarfs.inode_table_offset >= tarfs.data_size {
+            pr_err!(
+                "inode table offset beyond data size: {} >= {}\n",
+                tarfs.inode_table_offset,
+                tarfs.data_size
+            );
+            return Err(E2BIG);
+        }
+
+        if tarfs.inode_table_offset % SECTOR_SIZE != 0 {
+            pr_err!(
+                "inode table offset not aligned to sector size: {}\n",
+                tarfs.inode_table_offset,
+            );
+            return Err(EDOM);
+        }
+
+        // Check that the last inode is within bounds (and that there is no overflow when
+        // calculating its offset).
+        let offset = tarfs
+            .inode_count
+            .checked_mul(u64::try_from(size_of::<Inode>())?)
+            .ok_or(ERANGE)?
+            .checked_add(tarfs.inode_table_offset)
+            .ok_or(ERANGE)?;
+        if offset > tarfs.data_size {
+            pr_err!(
+                "inode table extends beyond the data size: {} > {}\n",
+                tarfs.inode_table_offset + (tarfs.inode_count * size_of::<Inode>() as u64),
+                tarfs.data_size,
+            );
+            return Err(E2BIG);
+        }
+
+        sb.set_magic(TARFS_MAGIC);
+        Ok(tarfs)
+    }
+
+    fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
+        let inode = Self::iget(sb, 1)?;
+        dentry::Root::try_new(inode)
+    }
+
+    fn read_xattr(
+        _: &DEntry<Self>,
+        inode: &INode<Self>,
+        name: &CStr,
+        outbuf: &mut [u8],
+    ) -> Result<usize> {
+        if inode.data().flags & inode_flags::OPAQUE == 0
+            || name.as_bytes() != b"trusted.overlay.opaque"
+        {
+            return Err(ENODATA);
+        }
+
+        if !outbuf.is_empty() {
+            outbuf[0] = b'y';
+        }
+
+        Ok(1)
+    }
+
+    fn statfs(dentry: &DEntry<Self>) -> Result<Stat> {
+        let data = dentry.super_block().data();
+        Ok(Stat {
+            magic: TARFS_MAGIC,
+            namelen: isize::MAX,
+            bsize: TARFS_BSIZE as _,
+            blocks: data.inode_table_offset / TARFS_BSIZE,
+            files: data.inode_count,
+        })
+    }
+}
+
+impl iomap::Operations for TarFs {
+    type FileSystem = Self;
+
+    fn begin<'a>(
+        inode: &'a INode<Self>,
+        pos: Offset,
+        length: Offset,
+        _flags: u32,
+        map: &mut iomap::Map<'a>,
+        _srcmap: &mut iomap::Map<'a>,
+    ) -> Result {
+        let size = (inode.size() + 511) & !511;
+        if pos >= size {
+            map.set_offset(pos)
+                .set_length(length.try_into()?)
+                .set_flags(iomap::map_flags::MERGED)
+                .set_type(iomap::Type::Hole);
+            return Ok(());
+        }
+
+        map.set_offset(pos)
+            .set_length(core::cmp::min(length, size - pos) as u64)
+            .set_flags(iomap::map_flags::MERGED)
+            .set_type(iomap::Type::Mapped)
+            .set_bdev(Some(inode.super_block().bdev()))
+            .set_addr(u64::try_from(pos)? + inode.data().offset);
+
+        Ok(())
+    }
+}
+
+#[vtable]
+impl inode::Operations for TarFs {
+    type FileSystem = Self;
+
+    fn lookup(
+        parent: &Locked<&INode<Self>, inode::ReadSem>,
+        dentry: dentry::Unhashed<'_, Self>,
+    ) -> Result<Option<ARef<DEntry<Self>>>> {
+        let sb = parent.super_block();
+        let name = dentry.name();
+
+        let inode = sb.data().mapper.for_each_page(
+            parent.data().offset.try_into()?,
+            parent.size(),
+            |data| {
+                for e in DirEntry::from_bytes_to_slice(data).ok_or(EIO)? {
+                    if Self::name_eq(sb, name, e.name_offset.value())? {
+                        return Ok(Some(Self::iget(sb, e.ino.value())?));
+                    }
+                }
+                Ok(None)
+            },
+        )?;
+
+        dentry.splice_alias(inode)
+    }
+}
+
+#[vtable]
+impl file::Operations for TarFs {
+    type FileSystem = Self;
+
+    fn seek(file: &File<Self>, offset: Offset, whence: file::Whence) -> Result<Offset> {
+        file::generic_seek(file, offset, whence)
+    }
+
+    fn read(_: &File<Self>, _: &mut user::Writer, _: &mut Offset) -> Result<usize> {
+        Err(EISDIR)
+    }
+
+    fn read_dir(
+        _file: &File<Self>,
+        inode: &Locked<&INode<Self>, inode::ReadSem>,
+        emitter: &mut file::DirEmitter,
+    ) -> Result {
+        let sb = inode.super_block();
+        let mut name = Vec::<u8>::new();
+        let pos = emitter.pos();
+
+        if pos < 0 || pos % size_of::<DirEntry>() as i64 != 0 {
+            return Err(ENOENT);
+        }
+
+        if pos >= inode.size() {
+            return Ok(());
+        }
+
+        // Make sure the inode data doesn't overflow the data area.
+        let sizeu = u64::try_from(inode.size())?;
+        if inode.data().offset.checked_add(sizeu).ok_or(EIO)? > sb.data().data_size {
+            return Err(EIO);
+        }
+
+        sb.data().mapper.for_each_page(
+            inode.data().offset as i64 + pos,
+            inode.size() - pos,
+            |data| {
+                for e in DirEntry::from_bytes_to_slice(data).ok_or(EIO)? {
+                    let name_len = usize::try_from(e.name_len.value())?;
+                    if name_len > name.len() {
+                        name.resize(name_len, 0, GFP_NOFS)?;
+                    }
+
+                    Self::read_name(sb, &mut name[..name_len], e.name_offset.value())?;
+
+                    if !emitter.emit(
+                        size_of::<DirEntry>() as i64,
+                        &name[..name_len],
+                        e.ino.value(),
+                        file::DirEntryType::try_from(u32::from(e.etype))?,
+                    ) {
+                        return Ok(Some(()));
+                    }
+                }
+                Ok(None)
+            },
+        )?;
+
+        Ok(())
+    }
+}
diff --git a/scripts/generate_rust_analyzer.py b/scripts/generate_rust_analyzer.py
index f270c7b0cf34..6985b9e37429 100755
--- a/scripts/generate_rust_analyzer.py
+++ b/scripts/generate_rust_analyzer.py
@@ -116,7 +116,7 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
     # Then, the rest outside of `rust/`.
     #
     # We explicitly mention the top-level folders we want to cover.
-    extra_dirs = map(lambda dir: srctree / dir, ("samples", "drivers"))
+    extra_dirs = map(lambda dir: srctree / dir, ("samples", "drivers", "fs"))
     if external_src is not None:
         extra_dirs = [external_src]
     for folder in extra_dirs:
-- 
2.34.1
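
The tarfs `iget` and `iomap::Operations::begin` in the patch above rely on three
small pieces of bit arithmetic: a 36-bit modification time assembled from
`lmtime` and the low nibble of `hmtime`, a device number packed into the 64-bit
`offset` field, and file sizes rounded up to a 512-byte sector. The following is
a minimal userspace sketch of that arithmetic only. The function names are
hypothetical, the field widths (`u32` for `lmtime`, `u8` for `hmtime`) are
assumptions read off the conversions in the patch, and `u64` is used where the
patch uses the kernel `Offset` type.

```rust
/// Sector size assumed by this sketch; the patch hard-codes 511 as the mask.
const SECTOR_SIZE: u64 = 512;

/// Build a 36-bit seconds value: 32 low bits from `lmtime`, 4 high bits from
/// the low nibble of `hmtime`. The upper nibble of `hmtime` is ignored.
fn mtime_secs(lmtime: u32, hmtime: u8) -> u64 {
    u64::from(lmtime) | (u64::from(hmtime & 0xf) << 32)
}

/// Split a 64-bit `offset` field into (major, minor), as done for character
/// and block device inodes: major in the high 32 bits, minor in the low 32.
fn split_dev(offset: u64) -> (u32, u32) {
    ((offset >> 32) as u32, offset as u32)
}

/// Round `size` up to the next multiple of the sector size.
/// Assumes `size + SECTOR_SIZE - 1` does not overflow.
fn round_up_to_sector(size: u64) -> u64 {
    (size + SECTOR_SIZE - 1) & !(SECTOR_SIZE - 1)
}
```

With these definitions, `mtime_secs(0, 1)` yields `1 << 32`, and
`round_up_to_sector(513)` yields `1024`, which is why a read past the rounded
size is reported as a hole rather than mapped.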



* [RFC PATCH v2 30/30] WIP: fs: ext2: add rust ro ext2 implementation
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (28 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 29/30] tarfs: introduce tar fs Wedson Almeida Filho
@ 2024-05-14 13:17 ` Wedson Almeida Filho
  2024-05-20 20:01   ` Darrick J. Wong
  2024-05-31 14:34 ` [RFC PATCH v2 00/30] Rust abstractions for VFS Danilo Krummrich
  30 siblings, 1 reply; 35+ messages in thread
From: Wedson Almeida Filho @ 2024-05-14 13:17 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
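
`Ext2Fs::iget` in the patch below finds an on-disk inode in three steps: pick
the block group from the inode number, pick the block inside that group's inode
table, then pick the byte offset inside that block. The following is a minimal
userspace sketch of that index arithmetic, not the kernel code itself.
`InodeLocation` and `locate_inode` are hypothetical names, and the zero checks
are an assumption added so the sketch cannot divide by zero or underflow.

```rust
/// Position of an on-disk inode relative to its block group's inode table.
#[derive(Debug, PartialEq)]
struct InodeLocation {
    /// Index of the block group that holds the inode.
    group: usize,
    /// Block index inside that group's inode table.
    block_in_table: u32,
    /// Byte offset of the inode inside that block.
    byte_offset: usize,
}

/// Inode numbers start at 1, so all arithmetic is done on `ino - 1`.
fn locate_inode(
    ino: u32,
    inodes_per_group: u32,
    inodes_per_block: u32,
    inode_size: u16,
) -> Option<InodeLocation> {
    if ino == 0 || inodes_per_group == 0 || inodes_per_block == 0 {
        return None;
    }
    // Index of the inode within its own group.
    let index = (ino - 1) % inodes_per_group;
    Some(InodeLocation {
        group: ((ino - 1) / inodes_per_group) as usize,
        block_in_table: index / inodes_per_block,
        byte_offset: (index % inodes_per_block) as usize * usize::from(inode_size),
    })
}
```

For example, with 8192 inodes per group, 8 inodes per block and 128-byte
inodes, the root inode (number 2) is the second slot of the first inode-table
block of group 0, at byte offset 128.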
 fs/Kconfig            |   1 +
 fs/Makefile           |   1 +
 fs/rust-ext2/Kconfig  |  13 +
 fs/rust-ext2/Makefile |   8 +
 fs/rust-ext2/defs.rs  | 173 +++++++++++++
 fs/rust-ext2/ext2.rs  | 551 ++++++++++++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs    |   3 +
 7 files changed, 750 insertions(+)
 create mode 100644 fs/rust-ext2/Kconfig
 create mode 100644 fs/rust-ext2/Makefile
 create mode 100644 fs/rust-ext2/defs.rs
 create mode 100644 fs/rust-ext2/ext2.rs
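
The `offsets` helper in `ext2.rs` below turns a logical block number within a
file into a path through the inode's block array: one index for a direct block,
and up to four for a triple-indirect block. The following is a minimal userspace
sketch of the same mapping, assuming 4-byte block pointers and a power-of-two
block size between 1 KiB and 64 KiB. `block_path` is a hypothetical name, and it
returns a `Vec` instead of filling a caller-provided slice as the patch does.

```rust
/// Number of direct block pointers in an ext2 inode.
const NDIR_BLOCKS: u64 = 12;

/// Map a logical block number to the indices to follow, starting with the
/// slot in the inode's block array (0..=11 direct, 12 single-indirect,
/// 13 double-indirect, 14 triple-indirect).
fn block_path(block_size: u32, mut block: u64) -> Option<Vec<u32>> {
    // Reject block sizes outside the range ext2 allows; this also keeps
    // `ptrs * ptrs * ptrs` well within u64.
    if !(1024..=65536).contains(&block_size) || !block_size.is_power_of_two() {
        return None;
    }
    let ptrs = u64::from(block_size / 4); // pointers per indirect block
    let bits = ptrs.trailing_zeros();
    let mask = ptrs - 1;

    if block < NDIR_BLOCKS {
        return Some(vec![block as u32]);
    }
    block -= NDIR_BLOCKS;
    if block < ptrs {
        return Some(vec![12, block as u32]);
    }
    block -= ptrs;
    if block < ptrs * ptrs {
        return Some(vec![13, (block >> bits) as u32, (block & mask) as u32]);
    }
    block -= ptrs * ptrs;
    if block < ptrs * ptrs * ptrs {
        return Some(vec![
            14,
            (block >> (2 * bits)) as u32,
            ((block >> bits) & mask) as u32,
            (block & mask) as u32,
        ]);
    }
    // Beyond what triple indirection can address.
    None
}
```

With 1 KiB blocks each indirect block holds 256 pointers, so logical block 268
(12 direct plus 256 single-indirect) is the first block reached through the
double-indirect slot.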

diff --git a/fs/Kconfig b/fs/Kconfig
index 2cbd99d6784c..cf0cac5c5b1e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -338,6 +338,7 @@ source "fs/ufs/Kconfig"
 source "fs/erofs/Kconfig"
 source "fs/vboxsf/Kconfig"
 source "fs/tarfs/Kconfig"
+source "fs/rust-ext2/Kconfig"
 
 endif # MISC_FILESYSTEMS
 
diff --git a/fs/Makefile b/fs/Makefile
index d8bbda73e3a9..c1a3007efc7d 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -130,3 +130,4 @@ obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
 obj-$(CONFIG_TARFS_FS)		+= tarfs/
+obj-$(CONFIG_RUST_EXT2_FS)	+= rust-ext2/
diff --git a/fs/rust-ext2/Kconfig b/fs/rust-ext2/Kconfig
new file mode 100644
index 000000000000..976371655ca6
--- /dev/null
+++ b/fs/rust-ext2/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+
+config RUST_EXT2_FS
+	tristate "Rust second extended fs support"
+	depends on RUST && BLOCK
+	help
+	  Ext2 is a standard Linux file system for hard disks.
+
+	  To compile this file system support as a module, choose M here: the
+	  module will be called rust_ext2.
+
+	  If unsure, say Y.
diff --git a/fs/rust-ext2/Makefile b/fs/rust-ext2/Makefile
new file mode 100644
index 000000000000..ac960b5f89d7
--- /dev/null
+++ b/fs/rust-ext2/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Rust ext2 filesystem routines.
+#
+
+obj-$(CONFIG_RUST_EXT2_FS) += rust_ext2.o
+
+rust_ext2-y := ext2.o
diff --git a/fs/rust-ext2/defs.rs b/fs/rust-ext2/defs.rs
new file mode 100644
index 000000000000..5f84852b4961
--- /dev/null
+++ b/fs/rust-ext2/defs.rs
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Definitions of ext2 structures.
+
+use kernel::types::LE;
+
+pub(crate) const EXT2_SUPER_MAGIC: u16 = 0xEF53;
+
+pub(crate) const EXT2_MAX_BLOCK_LOG_SIZE: u32 = 16;
+
+pub(crate) const EXT2_GOOD_OLD_REV: u32 = 0; /* The good old (original) format */
+pub(crate) const EXT2_DYNAMIC_REV: u32 = 1; /* V2 format w/ dynamic inode sizes */
+
+pub(crate) const EXT2_GOOD_OLD_INODE_SIZE: u16 = 128;
+
+pub(crate) const EXT2_ROOT_INO: u32 = 2; /* Root inode */
+
+/* First non-reserved inode for old ext2 filesystems. */
+pub(crate) const EXT2_GOOD_OLD_FIRST_INO: u32 = 11;
+
+pub(crate) const EXT2_FEATURE_INCOMPAT_FILETYPE: u32 = 0x0002;
+
+/*
+ * Constants relative to the data blocks
+ */
+pub(crate) const EXT2_NDIR_BLOCKS: usize = 12;
+pub(crate) const EXT2_IND_BLOCK: usize = EXT2_NDIR_BLOCKS;
+pub(crate) const EXT2_DIND_BLOCK: usize = EXT2_IND_BLOCK + 1;
+pub(crate) const EXT2_TIND_BLOCK: usize = EXT2_DIND_BLOCK + 1;
+pub(crate) const EXT2_N_BLOCKS: usize = EXT2_TIND_BLOCK + 1;
+
+kernel::derive_readable_from_bytes! {
+    #[repr(C)]
+    pub(crate) struct Super {
+        pub(crate) inodes_count: LE<u32>,
+        pub(crate) blocks_count: LE<u32>,
+        pub(crate) r_blocks_count: LE<u32>,
+        pub(crate) free_blocks_count: LE<u32>, /* Free blocks count */
+        pub(crate) free_inodes_count: LE<u32>, /* Free inodes count */
+        pub(crate) first_data_block: LE<u32>,  /* First Data Block */
+        pub(crate) log_block_size: LE<u32>,    /* Block size */
+        pub(crate) log_frag_size: LE<u32>,     /* Fragment size */
+        pub(crate) blocks_per_group: LE<u32>,  /* # Blocks per group */
+        pub(crate) frags_per_group: LE<u32>,   /* # Fragments per group */
+        pub(crate) inodes_per_group: LE<u32>,  /* # Inodes per group */
+        pub(crate) mtime: LE<u32>,             /* Mount time */
+        pub(crate) wtime: LE<u32>,             /* Write time */
+        pub(crate) mnt_count: LE<u16>,         /* Mount count */
+        pub(crate) max_mnt_count: LE<u16>,     /* Maximal mount count */
+        pub(crate) magic: LE<u16>,             /* Magic signature */
+        pub(crate) state: LE<u16>,             /* File system state */
+        pub(crate) errors: LE<u16>,            /* Behaviour when detecting errors */
+        pub(crate) minor_rev_level: LE<u16>,   /* minor revision level */
+        pub(crate) lastcheck: LE<u32>,         /* time of last check */
+        pub(crate) checkinterval: LE<u32>,     /* max. time between checks */
+        pub(crate) creator_os: LE<u32>,        /* OS */
+        pub(crate) rev_level: LE<u32>,         /* Revision level */
+        pub(crate) def_resuid: LE<u16>,        /* Default uid for reserved blocks */
+        pub(crate) def_resgid: LE<u16>,        /* Default gid for reserved blocks */
+        /*
+         * These fields are for EXT2_DYNAMIC_REV superblocks only.
+         *
+         * Note: the difference between the compatible feature set and
+         * the incompatible feature set is that if there is a bit set
+         * in the incompatible feature set that the kernel doesn't
+         * know about, it should refuse to mount the filesystem.
+         *
+         * e2fsck's requirements are more strict; if it doesn't know
+         * about a feature in either the compatible or incompatible
+         * feature set, it must abort and not try to meddle with
+         * things it doesn't understand...
+         */
+        pub(crate) first_ino: LE<u32>,              /* First non-reserved inode */
+        pub(crate) inode_size: LE<u16>,             /* size of inode structure */
+        pub(crate) block_group_nr: LE<u16>,         /* block group # of this superblock */
+        pub(crate) feature_compat: LE<u32>,         /* compatible feature set */
+        pub(crate) feature_incompat: LE<u32>,       /* incompatible feature set */
+        pub(crate) feature_ro_compat: LE<u32>,      /* readonly-compatible feature set */
+        pub(crate) uuid: [u8; 16],                  /* 128-bit uuid for volume */
+        pub(crate) volume_name: [u8; 16],           /* volume name */
+        pub(crate) last_mounted: [u8; 64],          /* directory where last mounted */
+        pub(crate) algorithm_usage_bitmap: LE<u32>, /* For compression */
+        /*
+         * Performance hints.  Directory preallocation should only
+         * happen if the EXT2_COMPAT_PREALLOC flag is on.
+         */
+        pub(crate) prealloc_blocks: u8,    /* Nr of blocks to try to preallocate */
+        pub(crate) prealloc_dir_blocks: u8,        /* Nr to preallocate for dirs */
+        padding1: u16,
+        /*
+         * Journaling support valid if EXT3_FEATURE_COMPAT_HAS_JOURNAL set.
+         */
+        pub(crate) journal_uuid: [u8; 16],      /* uuid of journal superblock */
+        pub(crate) journal_inum: u32,           /* inode number of journal file */
+        pub(crate) journal_dev: u32,            /* device number of journal file */
+        pub(crate) last_orphan: u32,            /* start of list of inodes to delete */
+        pub(crate) hash_seed: [u32; 4],         /* HTREE hash seed */
+        pub(crate) def_hash_version: u8,        /* Default hash version to use */
+        pub(crate) reserved_char_pad: u8,
+        pub(crate) reserved_word_pad: u16,
+        pub(crate) default_mount_opts: LE<u32>,
+        pub(crate) first_meta_bg: LE<u32>,      /* First metablock block group */
+        reserved: [u32; 190],                   /* Padding to the end of the block */
+    }
+
+    #[repr(C)]
+    #[derive(Clone, Copy)]
+    pub(crate) struct Group {
+        /// Blocks bitmap block.
+        pub block_bitmap: LE<u32>,
+
+        /// Inodes bitmap block.
+        pub inode_bitmap: LE<u32>,
+
+        /// Inodes table block.
+        pub inode_table: LE<u32>,
+
+        /// Number of free blocks.
+        pub free_blocks_count: LE<u16>,
+
+        /// Number of free inodes.
+        pub free_inodes_count: LE<u16>,
+
+        /// Number of directories.
+        pub used_dirs_count: LE<u16>,
+
+        pad: LE<u16>,
+        reserved: [u32; 3],
+    }
+
+    #[repr(C)]
+    pub(crate) struct INode {
+        pub mode: LE<u16>,                  /* File mode */
+        pub uid: LE<u16>,                   /* Low 16 bits of Owner Uid */
+        pub size: LE<u32>,                  /* Size in bytes */
+        pub atime: LE<u32>,                 /* Access time */
+        pub ctime: LE<u32>,                 /* Creation time */
+        pub mtime: LE<u32>,                 /* Modification time */
+        pub dtime: LE<u32>,                 /* Deletion Time */
+        pub gid: LE<u16>,                   /* Low 16 bits of Group Id */
+        pub links_count: LE<u16>,           /* Links count */
+        pub blocks: LE<u32>,                /* Blocks count */
+        pub flags: LE<u32>,                 /* File flags */
+        pub reserved1: LE<u32>,
+        pub block: [LE<u32>; EXT2_N_BLOCKS],/* Pointers to blocks */
+        pub generation: LE<u32>,            /* File version (for NFS) */
+        pub file_acl: LE<u32>,              /* File ACL */
+        pub dir_acl: LE<u32>,               /* Directory ACL */
+        pub faddr: LE<u32>,                 /* Fragment address */
+        pub frag: u8,                       /* Fragment number */
+        pub fsize: u8,                      /* Fragment size */
+        pub pad1: LE<u16>,
+        pub uid_high: LE<u16>,
+        pub gid_high: LE<u16>,
+        pub reserved2: LE<u32>,
+    }
+
+    #[repr(C)]
+    pub(crate) struct DirEntry {
+        pub(crate) inode: LE<u32>,       /* Inode number */
+        pub(crate) rec_len: LE<u16>,     /* Directory entry length */
+        pub(crate) name_len: u8,         /* Name length */
+        pub(crate) file_type: u8,        /* Only if the "filetype" feature flag is set. */
+    }
+}
+
+pub(crate) const FT_REG_FILE: u8 = 1;
+pub(crate) const FT_DIR: u8 = 2;
+pub(crate) const FT_CHRDEV: u8 = 3;
+pub(crate) const FT_BLKDEV: u8 = 4;
+pub(crate) const FT_FIFO: u8 = 5;
+pub(crate) const FT_SOCK: u8 = 6;
+pub(crate) const FT_SYMLINK: u8 = 7;
diff --git a/fs/rust-ext2/ext2.rs b/fs/rust-ext2/ext2.rs
new file mode 100644
index 000000000000..2d6b1e7ca156
--- /dev/null
+++ b/fs/rust-ext2/ext2.rs
@@ -0,0 +1,551 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Ext2 file system.
+
+use alloc::vec::Vec;
+use core::mem::size_of;
+use defs::*;
+use kernel::fs::{
+    self, address_space, dentry, dentry::DEntry, file, file::File, inode, inode::INode, iomap, sb,
+    sb::SuperBlock, Offset,
+};
+use kernel::types::{ARef, Either, FromBytes, Locked, LE};
+use kernel::{block, c_str, prelude::*, str::CString, time::Timespec, user, PAGE_SIZE};
+
+pub mod defs;
+
+kernel::module_fs! {
+    type: Ext2Fs,
+    name: "ext2",
+    author: "Wedson Almeida Filho <walmeida@microsoft.com>",
+    description: "ext2 file system",
+    license: "GPL",
+}
+
+const SB_OFFSET: Offset = 1024;
+
+struct INodeData {
+    data_blocks: [u32; defs::EXT2_N_BLOCKS],
+}
+
+struct Ext2Fs {
+    mapper: inode::Mapper,
+    block_size: u32,
+    has_file_type: bool,
+    _block_size_bits: u32,
+    inodes_per_block: u32,
+    inodes_per_group: u32,
+    inode_count: u32,
+    inode_size: u16,
+    first_ino: u32,
+    group: Vec<defs::Group>,
+}
+
+impl Ext2Fs {
+    fn iget(sb: &SuperBlock<Self>, ino: u32) -> Result<ARef<INode<Self>>> {
+        let s = sb.data();
+        if (ino != EXT2_ROOT_INO && ino < s.first_ino) || ino > s.inode_count {
+            return Err(ENOENT);
+        }
+        let group = ((ino - 1) / s.inodes_per_group) as usize;
+        let offset = (ino - 1) % s.inodes_per_group;
+
+        if group >= s.group.len() {
+            return Err(ENOENT);
+        }
+
+        // Create an inode or find an existing (cached) one.
+        let mut inode = match sb.get_or_create_inode(ino.into())? {
+            Either::Left(existing) => return Ok(existing),
+            Either::Right(new) => new,
+        };
+
+        let inodes_block = Offset::from(s.group[group].inode_table.value());
+        let inode_block = inodes_block + Offset::from(offset / s.inodes_per_block);
+        let offset = (offset % s.inodes_per_block) as usize;
+        let b = sb
+            .data()
+            .mapper
+            .mapped_folio(inode_block * Offset::from(s.block_size))?;
+        let idata = defs::INode::from_bytes(&b, offset * s.inode_size as usize).ok_or(EIO)?;
+        let mode = idata.mode.value();
+
+        if idata.links_count.value() == 0 && (mode == 0 || idata.dtime.value() != 0) {
+            return Err(ESTALE);
+        }
+
+        const DIR_FOPS: file::Ops<Ext2Fs> = file::Ops::new::<Ext2Fs>();
+        const DIR_IOPS: inode::Ops<Ext2Fs> = inode::Ops::new::<Ext2Fs>();
+        const FILE_AOPS: address_space::Ops<Ext2Fs> = iomap::ro_aops::<Ext2Fs>();
+
+        let mut size = idata.size.value().into();
+        let typ = match mode & fs::mode::S_IFMT {
+            fs::mode::S_IFREG => {
+                size |= Offset::from(idata.dir_acl.value())
+                    .checked_shl(32)
+                    .ok_or(EUCLEAN)?;
+                inode
+                    .set_aops(FILE_AOPS)
+                    .set_fops(file::Ops::generic_ro_file());
+                inode::Type::Reg
+            }
+            fs::mode::S_IFDIR => {
+                inode
+                    .set_iops(DIR_IOPS)
+                    .set_fops(DIR_FOPS)
+                    .set_aops(FILE_AOPS);
+                inode::Type::Dir
+            }
+            fs::mode::S_IFLNK => {
+                if idata.blocks.value() == 0 {
+                    const OFFSET: usize = core::mem::offset_of!(defs::INode, block);
+                    let name = &b[offset * usize::from(s.inode_size) + OFFSET..];
+                    let name_len = size as usize;
+                    if name_len > name.len() || name_len == 0 {
+                        return Err(EIO);
+                    }
+                    inode.set_iops(inode::Ops::simple_symlink_inode());
+                    inode::Type::Lnk(Some(CString::try_from(&name[..name_len])?))
+                } else {
+                    inode
+                        .set_aops(FILE_AOPS)
+                        .set_iops(inode::Ops::page_symlink_inode());
+                    inode::Type::Lnk(None)
+                }
+            }
+            fs::mode::S_IFSOCK => inode::Type::Sock,
+            fs::mode::S_IFIFO => inode::Type::Fifo,
+            fs::mode::S_IFCHR => {
+                let (major, minor) = decode_dev(&idata.block);
+                inode::Type::Chr(major, minor)
+            }
+            fs::mode::S_IFBLK => {
+                let (major, minor) = decode_dev(&idata.block);
+                inode::Type::Blk(major, minor)
+            }
+            _ => return Err(ENOENT),
+        };
+        inode.init(inode::Params {
+            typ,
+            mode: mode & 0o777,
+            size,
+            blocks: idata.blocks.value().into(),
+            nlink: idata.links_count.value().into(),
+            uid: u32::from(idata.uid.value()) | u32::from(idata.uid_high.value()) << 16,
+            gid: u32::from(idata.gid.value()) | u32::from(idata.gid_high.value()) << 16,
+            ctime: Timespec::new(idata.ctime.value().into(), 0)?,
+            mtime: Timespec::new(idata.mtime.value().into(), 0)?,
+            atime: Timespec::new(idata.atime.value().into(), 0)?,
+            value: INodeData {
+                data_blocks: core::array::from_fn(|i| idata.block[i].value()),
+            },
+        })
+    }
+
+    fn offsets<'a>(&self, mut block: u64, out: &'a mut [u32]) -> Option<&'a [u32]> {
+        let ptrs = u64::from(self.block_size / size_of::<u32>() as u32);
+        let ptr_mask = ptrs - 1;
+        let ptr_bits = ptrs.trailing_zeros();
+
+        if block < EXT2_NDIR_BLOCKS as u64 {
+            out[0] = block as u32;
+            return Some(&out[..1]);
+        }
+
+        block -= EXT2_NDIR_BLOCKS as u64;
+        if block < ptrs {
+            out[0] = EXT2_IND_BLOCK as u32;
+            out[1] = block as u32;
+            return Some(&out[..2]);
+        }
+
+        block -= ptrs;
+        if block < (1 << (2 * ptr_bits)) {
+            out[0] = EXT2_DIND_BLOCK as u32;
+            out[1] = (block >> ptr_bits) as u32;
+            out[2] = (block & ptr_mask) as u32;
+            return Some(&out[..3]);
+        }
+
+        block -= ptrs * ptrs;
+        if block < ptrs * ptrs * ptrs {
+            out[0] = EXT2_TIND_BLOCK as u32;
+            out[1] = (block >> (2 * ptr_bits)) as u32;
+            out[2] = ((block >> ptr_bits) & ptr_mask) as u32;
+            out[3] = (block & ptr_mask) as u32;
+            return Some(&out[..4]);
+        }
+
+        None
+    }
+
+    fn offset_to_block(inode: &INode<Self>, block: Offset) -> Result<u64> {
+        let s = inode.super_block().data();
+        let mut indices = [0u32; 4];
+        let boffsets = s.offsets(block as u64, &mut indices).ok_or(EIO)?;
+        let mut boffset = inode.data().data_blocks[boffsets[0] as usize];
+        let mapper = &s.mapper;
+        for i in &boffsets[1..] {
+            let b = mapper.mapped_folio(Offset::from(boffset) * Offset::from(s.block_size))?;
+            let table = LE::<u32>::from_bytes_to_slice(&b).ok_or(EIO)?;
+            boffset = table[*i as usize].value();
+        }
+        Ok(boffset.into())
+    }
+
+    fn check_descriptors(s: &Super, groups: &[Group]) -> Result {
+        for (i, g) in groups.iter().enumerate() {
+            let first = i as u32 * s.blocks_per_group.value() + s.first_data_block.value();
+            let last = if i == groups.len() - 1 {
+                s.blocks_count.value() - 1
+            } else {
+                first + s.blocks_per_group.value() - 1
+            };
+
+            if g.block_bitmap.value() < first || g.block_bitmap.value() > last {
+                pr_err!(
+                    "Block bitmap for group {i} not in group (block {})\n",
+                    g.block_bitmap.value()
+                );
+                return Err(EINVAL);
+            }
+
+            if g.inode_bitmap.value() < first || g.inode_bitmap.value() > last {
+                pr_err!(
+                    "Inode bitmap for group {i} not in group (block {})\n",
+                    g.inode_bitmap.value()
+                );
+                return Err(EINVAL);
+            }
+
+            if g.inode_table.value() < first || g.inode_table.value() > last {
+                pr_err!(
+                    "Inode table for group {i} not in group (block {})\n",
+                    g.inode_table.value()
+                );
+                return Err(EINVAL);
+            }
+        }
+        Ok(())
+    }
+}
+
+impl fs::FileSystem for Ext2Fs {
+    type Data = Box<Self>;
+    type INodeData = INodeData;
+    const NAME: &'static CStr = c_str!("rust-ext2");
+    const SUPER_TYPE: sb::Type = sb::Type::BlockDev;
+
+    fn fill_super(
+        sb: &mut SuperBlock<Self, sb::New>,
+        mapper: Option<inode::Mapper>,
+    ) -> Result<Self::Data> {
+        let Some(mapper) = mapper else {
+            return Err(EINVAL);
+        };
+
+        if sb.min_blocksize(PAGE_SIZE as i32) == 0 {
+            pr_err!("Unable to set block size\n");
+            return Err(EINVAL);
+        }
+
+        // Map the super block and check the magic number.
+        let mapped = mapper.mapped_folio(SB_OFFSET)?;
+        let s = Super::from_bytes(&mapped, 0).ok_or(EIO)?;
+
+        if s.magic.value() != EXT2_SUPER_MAGIC {
+            return Err(EINVAL);
+        }
+
+        // Check for unsupported flags.
+        let mut has_file_type = false;
+        if s.rev_level.value() >= EXT2_DYNAMIC_REV {
+            let features = s.feature_incompat.value();
+            if features & !EXT2_FEATURE_INCOMPAT_FILETYPE != 0 {
+                pr_err!("Unsupported incompatible feature: {:x}\n", features);
+                return Err(EINVAL);
+            }
+
+            has_file_type = features & EXT2_FEATURE_INCOMPAT_FILETYPE != 0;
+
+            let features = s.feature_ro_compat.value();
+            if !sb.rdonly() && features != 0 {
+                pr_err!("Unsupported rw incompatible feature: {:x}\n", features);
+                return Err(EINVAL);
+            }
+        }
+
+        // Set the block size.
+        let block_size_bits = s.log_block_size.value();
+        if block_size_bits > EXT2_MAX_BLOCK_LOG_SIZE - 10 {
+            pr_err!("Invalid log block size: {}\n", block_size_bits);
+            return Err(EINVAL);
+        }
+
+        let block_size = 1024u32 << block_size_bits;
+        if sb.min_blocksize(block_size as i32) != block_size as i32 {
+            pr_err!("Bad block size: {}\n", block_size);
+            return Err(ENXIO);
+        }
+
+        // Get the first inode and the inode size.
+        let (inode_size, first_ino) = if s.rev_level.value() == EXT2_GOOD_OLD_REV {
+            (EXT2_GOOD_OLD_INODE_SIZE, EXT2_GOOD_OLD_FIRST_INO)
+        } else {
+            let size = s.inode_size.value();
+            if size < EXT2_GOOD_OLD_INODE_SIZE
+                || !size.is_power_of_two()
+                || u32::from(size) > block_size
+            {
+                pr_err!("Unsupported inode size: {}\n", size);
+                return Err(EINVAL);
+            }
+            (size, s.first_ino.value())
+        };
+
+        // Get the number of inodes per group and per block.
+        let inode_count = s.inodes_count.value();
+        let inodes_per_group = s.inodes_per_group.value();
+        let inodes_per_block = block_size / u32::from(inode_size);
+        if inodes_per_group == 0 || inodes_per_block == 0 {
+            return Err(EINVAL);
+        }
+
+        if inodes_per_group > block_size * 8 || inodes_per_group < inodes_per_block {
+            pr_err!("Bad inodes per group: {}\n", inodes_per_group);
+            return Err(EINVAL);
+        }
+
+        // Check the size of the groups.
+        let itb_per_group = inodes_per_group / inodes_per_block;
+        let blocks_per_group = s.blocks_per_group.value();
+        if blocks_per_group > block_size * 8 || blocks_per_group <= itb_per_group + 3 {
+            pr_err!("Bad blocks per group: {}\n", blocks_per_group);
+            return Err(EINVAL);
+        }
+
+        let blocks_count = s.blocks_count.value();
+        if block::Sector::from(blocks_count) > sb.sector_count() >> (1 + block_size_bits) {
+            pr_err!(
+                "Block count ({blocks_count}) exceeds size of device ({})\n",
+                sb.sector_count() >> (1 + block_size_bits)
+            );
+            return Err(EINVAL);
+        }
+
+        let group_count = (blocks_count - s.first_data_block.value() - 1) / blocks_per_group + 1;
+        if group_count * inodes_per_group != inode_count {
+            pr_err!(
+                "Unexpected inode count: {inode_count} vs {}\n",
+                group_count * inodes_per_group
+            );
+            return Err(EINVAL);
+        }
+
+        let mut groups = Vec::new();
+        groups.reserve(group_count as usize, GFP_NOFS)?;
+
+        let mut remain = group_count;
+        let mut offset = (SB_OFFSET / Offset::from(block_size) + 1) * Offset::from(block_size);
+        while remain > 0 {
+            let b = mapper.mapped_folio(offset)?;
+            for g in Group::from_bytes_to_slice(&b).ok_or(EIO)? {
+                groups.push(*g, GFP_NOFS)?;
+                remain -= 1;
+                if remain == 0 {
+                    break;
+                }
+            }
+            offset += Offset::try_from(b.len())?;
+        }
+
+        Self::check_descriptors(s, &groups)?;
+
+        sb.set_magic(s.magic.value().into());
+        drop(mapped);
+        Ok(Box::new(
+            Ext2Fs {
+                mapper,
+                block_size,
+                _block_size_bits: block_size_bits,
+                has_file_type,
+                inodes_per_group,
+                inodes_per_block,
+                inode_count,
+                inode_size,
+                first_ino,
+                group: groups,
+            },
+            GFP_KERNEL,
+        )?)
+    }
+
+    fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
+        let inode = Self::iget(sb, EXT2_ROOT_INO)?;
+        dentry::Root::try_new(inode)
+    }
+}
+
+fn rec_len(d: &DirEntry) -> u32 {
+    let len = d.rec_len.value();
+
+    if PAGE_SIZE >= 65536 && len == u16::MAX {
+        1u32 << 16
+    } else {
+        len.into()
+    }
+}
+
+#[vtable]
+impl file::Operations for Ext2Fs {
+    type FileSystem = Self;
+
+    fn seek(file: &File<Self>, offset: Offset, whence: file::Whence) -> Result<Offset> {
+        file::generic_seek(file, offset, whence)
+    }
+
+    fn read(_: &File<Self>, _: &mut user::Writer, _: &mut Offset) -> Result<usize> {
+        Err(EISDIR)
+    }
+
+    fn read_dir(
+        _file: &File<Self>,
+        inode: &Locked<&INode<Self>, inode::ReadSem>,
+        emitter: &mut file::DirEmitter,
+    ) -> Result {
+        let has_file_type = inode.super_block().data().has_file_type;
+
+        inode.for_each_page(emitter.pos(), Offset::MAX, |data| {
+            let mut offset = 0usize;
+            let mut acc: Offset = 0;
+            let limit = data.len().saturating_sub(size_of::<DirEntry>());
+            while offset < limit {
+                let dirent = DirEntry::from_bytes(data, offset).ok_or(EIO)?;
+                offset += size_of::<DirEntry>();
+
+                let name_len = usize::from(dirent.name_len);
+                if data.len() - offset < name_len {
+                    return Err(EIO);
+                }
+
+                let name = &data[offset..][..name_len];
+                let rec_len = rec_len(dirent);
+                offset = offset - size_of::<DirEntry>() + rec_len as usize;
+                if rec_len == 0 || offset > data.len() {
+                    return Err(EIO);
+                }
+
+                acc += Offset::from(rec_len);
+                let ino = dirent.inode.value();
+                if ino == 0 {
+                    continue;
+                }
+
+                let t = if !has_file_type {
+                    file::DirEntryType::Unknown
+                } else {
+                    match dirent.file_type {
+                        FT_REG_FILE => file::DirEntryType::Reg,
+                        FT_DIR => file::DirEntryType::Dir,
+                        FT_SYMLINK => file::DirEntryType::Lnk,
+                        FT_CHRDEV => file::DirEntryType::Chr,
+                        FT_BLKDEV => file::DirEntryType::Blk,
+                        FT_FIFO => file::DirEntryType::Fifo,
+                        FT_SOCK => file::DirEntryType::Sock,
+                        _ => continue,
+                    }
+                };
+
+                if !emitter.emit(acc, name, ino.into(), t) {
+                    return Ok(Some(()));
+                }
+                acc = 0;
+            }
+            Ok(None)
+        })?;
+        Ok(())
+    }
+}
+
+#[vtable]
+impl inode::Operations for Ext2Fs {
+    type FileSystem = Self;
+
+    fn lookup(
+        parent: &Locked<&INode<Self>, inode::ReadSem>,
+        dentry: dentry::Unhashed<'_, Self>,
+    ) -> Result<Option<ARef<DEntry<Self>>>> {
+        let inode = parent.for_each_page(0, Offset::MAX, |data| {
+            let mut offset = 0usize;
+            while data.len() - offset > size_of::<DirEntry>() {
+                let dirent = DirEntry::from_bytes(data, offset).ok_or(EIO)?;
+                offset += size_of::<DirEntry>();
+
+                let name_len = usize::from(dirent.name_len);
+                if data.len() - offset < name_len {
+                    return Err(EIO);
+                }
+
+                let name = &data[offset..][..name_len];
+
+                offset = offset - size_of::<DirEntry>() + usize::from(dirent.rec_len.value());
+                if offset > data.len() {
+                    return Err(EIO);
+                }
+
+                let ino = dirent.inode.value();
+                if ino != 0 && name == dentry.name() {
+                    return Ok(Some(Self::iget(parent.super_block(), ino)?));
+                }
+            }
+            Ok(None)
+        })?;
+
+        dentry.splice_alias(inode)
+    }
+}
+
+impl iomap::Operations for Ext2Fs {
+    type FileSystem = Self;
+
+    fn begin<'a>(
+        inode: &'a INode<Self>,
+        pos: Offset,
+        length: Offset,
+        _flags: u32,
+        map: &mut iomap::Map<'a>,
+        _srcmap: &mut iomap::Map<'a>,
+    ) -> Result {
+        let size = inode.size();
+        if pos >= size {
+            map.set_offset(pos)
+                .set_length(length.try_into()?)
+                .set_flags(iomap::map_flags::MERGED)
+                .set_type(iomap::Type::Hole);
+            return Ok(());
+        }
+
+        let block_size = inode.super_block().data().block_size as Offset;
+        let block = pos / block_size;
+
+        let boffset = Self::offset_to_block(inode, block)?;
+        map.set_offset(block * block_size)
+            .set_length(block_size as u64)
+            .set_flags(iomap::map_flags::MERGED)
+            .set_type(iomap::Type::Mapped)
+            .set_bdev(Some(inode.super_block().bdev()))
+            .set_addr(boffset * block_size as u64);
+
+        Ok(())
+    }
+}
+
+fn decode_dev(block: &[LE<u32>]) -> (u32, u32) {
+    let v = block[0].value();
+    if v != 0 {
+        ((v >> 8) & 255, v & 255)
+    } else {
+        let v = block[1].value();
+        ((v & 0xfff00) >> 8, (v & 0xff) | ((v >> 12) & 0xfff00))
+    }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 445599d4bff6..732bc9939f7f 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -165,3 +165,6 @@ macro_rules! container_of {
         ptr.wrapping_sub(offset) as *const $type
     }}
 }
+
+/// The size in bytes of a page of memory.
+pub const PAGE_SIZE: usize = bindings::PAGE_SIZE;
-- 
2.34.1



* Re: [RFC PATCH v2 27/30] rust: fs: add `iomap` module
  2024-05-14 13:17 ` [RFC PATCH v2 27/30] rust: fs: add `iomap` module Wedson Almeida Filho
@ 2024-05-20 19:32   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2024-05-20 19:32 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

On Tue, May 14, 2024 at 10:17:08AM -0300, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow file systems to implement their address space operations via
> iomap, which delegates a lot of the complexity to common code.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  rust/bindings/bindings_helper.h |   1 +
>  rust/kernel/fs.rs               |   1 +
>  rust/kernel/fs/iomap.rs         | 281 ++++++++++++++++++++++++++++++++
>  3 files changed, 283 insertions(+)
>  create mode 100644 rust/kernel/fs/iomap.rs
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index f4c7c3951dbe..629fce394dbe 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -13,6 +13,7 @@
>  #include <linux/file.h>
>  #include <linux/fs.h>
>  #include <linux/fs_context.h>
> +#include <linux/iomap.h>
>  #include <linux/jiffies.h>
>  #include <linux/mdio.h>
>  #include <linux/pagemap.h>
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index 4d90b23735bc..7a1c4884c370 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -19,6 +19,7 @@
>  pub mod dentry;
>  pub mod file;
>  pub mod inode;
> +pub mod iomap;
>  pub mod sb;
>  
>  /// The offset of a file in a file system.
> diff --git a/rust/kernel/fs/iomap.rs b/rust/kernel/fs/iomap.rs
> new file mode 100644
> index 000000000000..e48e200e555e
> --- /dev/null
> +++ b/rust/kernel/fs/iomap.rs
> @@ -0,0 +1,281 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! File system io maps.

iomap is really more of a space mapping library for filesystems at this
point.  Most of the code supports IO to pmem or block devices, but some
of the pieces (e.g. swapfile, FIEMAP, llseek) aren't in the IO path.

> +//!
> +//! This module allows Rust code to use iomaps to implement filesystems.
> +//!
> +//! C headers: [`include/linux/iomap.h`](srctree/include/linux/iomap.h)
> +
> +use super::{address_space, FileSystem, INode, Offset};
> +use crate::error::{from_result, Result};
> +use crate::{bindings, block};
> +use core::marker::PhantomData;
> +
> +/// The type of mapping.
> +///
> +/// This is used in [`Map`].
> +#[repr(u16)]
> +pub enum Type {
> +    /// No blocks allocated, need allocation.
> +    Hole = bindings::IOMAP_HOLE as u16,
> +
> +    /// Delayed allocation blocks.
> +    DelAlloc = bindings::IOMAP_DELALLOC as u16,
> +
> +    /// Blocks allocated at the given address.
> +    Mapped = bindings::IOMAP_MAPPED as u16,
> +
> +    /// Blocks allocated at the given address in unwritten state.
> +    Unwritten = bindings::IOMAP_UNWRITTEN as u16,
> +
> +    /// Data inline in the inode.
> +    Inline = bindings::IOMAP_INLINE as u16,
> +}
> +
> +/// Flags usable in [`Map`], in [`Map::set_flags`] in particular.
> +pub mod map_flags {
> +    /// Indicates that the blocks have been newly allocated and need zeroing for areas that no data
> +    /// is copied to.
> +    pub const NEW: u16 = bindings::IOMAP_F_NEW as u16;
> +
> +    /// Indicates that the inode has uncommitted metadata needed to access written data and
> +    /// requires fdatasync to commit them to persistent storage. This needs to take into account
> +    /// metadata changes that *may* be made at IO completion, such as file size updates from direct
> +    /// IO.
> +    pub const DIRTY: u16 = bindings::IOMAP_F_DIRTY as u16;
> +
> +    /// Indicates that the blocks are shared, and will need to be unshared as part of a write.
> +    pub const SHARED: u16 = bindings::IOMAP_F_SHARED as u16;
> +
> +    /// Indicates that the iomap contains the merge of multiple block mappings.
> +    pub const MERGED: u16 = bindings::IOMAP_F_MERGED as u16;
> +
> +    /// Indicates that the file system requires the use of buffer heads for this mapping.
> +    pub const BUFFER_HEAD: u16 = bindings::IOMAP_F_BUFFER_HEAD as u16;

Maybe leave this one commented out; we don't really want new filesystems
to use bufferhead support in iomap.

> +
> +    /// Indicates that the iomap is for an extended attribute extent rather than a file data
> +    /// extent.
> +    pub const XATTR: u16 = bindings::IOMAP_F_XATTR as u16;
> +
> +    /// Indicates to the iomap_end method that the file size has changed as the result of this
> +    /// write operation.
> +    pub const SIZE_CHANGED: u16 = bindings::IOMAP_F_SIZE_CHANGED as u16;
> +
> +    /// Indicates that the iomap is not valid any longer and the file range it covers needs to be
> +    /// remapped by the high level before the operation can proceed.
> +    pub const STALE: u16 = bindings::IOMAP_F_STALE as u16;
> +
> +    /// Flags from 0x1000 up are for file system specific usage.
> +    pub const PRIVATE: u16 = bindings::IOMAP_F_PRIVATE as u16;
> +}
> +
> +/// A map from address space to block device.

It's a mapping from a range of a file to space on a storage device.
Storage devices can include pmem and whatever inlinedata does.  Though I
guess you don't support fsdax.
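
To make that description concrete, here is a minimal userspace sketch
(plain Rust, not kernel code) of how such a mapping translates a file
position into a device address.  `Mapping`, `device_addr` and `sector`
are hypothetical names used only for illustration; the arithmetic follows
what the C helper iomap_sector() does, assuming 512-byte sectors.

```rust
/// Simplified stand-in for the fields of `struct iomap` that matter here.
/// Illustration only; this is not the kernel's `iomap::Map` type.
pub struct Mapping {
    /// File offset of the start of the mapping, in bytes.
    pub offset: u64,
    /// Length of the mapping, in bytes.
    pub length: u64,
    /// Device address of the start of the mapping, in bytes.
    pub addr: u64,
}

impl Mapping {
    /// Translates a file position into a device byte address.
    /// Returns `None` when `pos` falls outside the mapped range.
    pub fn device_addr(&self, pos: u64) -> Option<u64> {
        if pos < self.offset || pos - self.offset >= self.length {
            return None;
        }
        Some(self.addr + (pos - self.offset))
    }

    /// Same translation expressed in 512-byte sectors (assumed sector size).
    pub fn sector(&self, pos: u64) -> Option<u64> {
        self.device_addr(pos).map(|a| a >> 9)
    }
}
```

The point is that `offset` and `length` describe a range of the file,
while `addr` describes where that range lives on the storage device.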

> +#[repr(transparent)]
> +pub struct Map<'a>(pub bindings::iomap, PhantomData<&'a ()>);
> +
> +impl<'a> Map<'a> {
> +    /// Sets the map type.
> +    pub fn set_type(&mut self, t: Type) -> &mut Self {
> +        self.0.type_ = t as u16;
> +        self
> +    }
> +
> +    /// Sets the file offset, in bytes.
> +    pub fn set_offset(&mut self, v: Offset) -> &mut Self {
> +        self.0.offset = v;
> +        self
> +    }
> +
> +    /// Sets the length of the mapping, in bytes.
> +    pub fn set_length(&mut self, len: u64) -> &mut Self {
> +        self.0.length = len;
> +        self
> +    }
> +
> +    /// Sets the mapping flags.
> +    ///
> +    /// Values come from the [`map_flags`] module.
> +    pub fn set_flags(&mut self, flags: u16) -> &mut Self {
> +        self.0.flags = flags;
> +        self
> +    }
> +
> +    /// Sets the disk offset of the mapping, in bytes.
> +    pub fn set_addr(&mut self, addr: u64) -> &mut Self {
> +        self.0.addr = addr;
> +        self
> +    }
> +
> +    /// Sets the block device of the mapping.
> +    pub fn set_bdev(&mut self, bdev: Option<&'a block::Device>) -> &mut Self {
> +        self.0.bdev = if let Some(b) = bdev {
> +            b.0.get()
> +        } else {
> +            core::ptr::null_mut()
> +        };
> +        self
> +    }
> +}
> +
> +/// Flags passed to [`Operations::begin`] and [`Operations::end`].
> +pub mod flags {
> +    /// Writing, must allocate block.
> +    pub const WRITE: u32 = bindings::IOMAP_WRITE;
> +
> +    /// Zeroing operation, may skip holes.
> +    pub const ZERO: u32 = bindings::IOMAP_ZERO;
> +
> +    /// Report extent status, e.g. FIEMAP.
> +    pub const REPORT: u32 = bindings::IOMAP_REPORT;
> +
> +    /// Mapping for page fault.
> +    pub const FAULT: u32 = bindings::IOMAP_FAULT;
> +
> +    /// Direct I/O.
> +    pub const DIRECT: u32 = bindings::IOMAP_DIRECT;
> +
> +    /// Do not block.
> +    pub const NOWAIT: u32 = bindings::IOMAP_NOWAIT;
> +
> +    /// Only pure overwrites allowed.
> +    pub const OVERWRITE_ONLY: u32 = bindings::IOMAP_OVERWRITE_ONLY;
> +
> +    /// `unshare_file_range`.
> +    pub const UNSHARE: u32 = bindings::IOMAP_UNSHARE;
> +
> +    /// DAX mapping.
> +    pub const DAX: u32 = bindings::IOMAP_DAX;
> +}

I wonder, how hard will it be to update/regenerate these bindings when
someone wants to add new features to the C iomap implementation?  IIRC
Ritesh's port of C ext2 to iomap adds a boundary flag somewhere.

> +
> +/// Operations implemented by iomap users.
> +pub trait Operations {
> +    /// File system that these operations are compatible with.
> +    type FileSystem: FileSystem + ?Sized;
> +
> +    /// Returns the existing mapping at `pos`, or reserves space starting at `pos` for up to
> +    /// `length`, as long as it can be done as a single mapping. The actual length is returned in
> +    /// `iomap`.
> +    ///
> +    /// The values of `flags` come from the [`flags`] module.
> +    fn begin<'a>(
> +        inode: &'a INode<Self::FileSystem>,
> +        pos: Offset,
> +        length: Offset,
> +        flags: u32,
> +        map: &mut Map<'a>,
> +        srcmap: &mut Map<'a>,
> +    ) -> Result;
> +
> +    /// Commits and/or unreserves space previously allocated using [`Operations::begin`]. `writte`n

`written`

> +    /// indicates the length of the successful write operation which needs to be committed, while
> +    /// the rest needs to be unreserved. `written` might be zero if no data was written.
> +    ///
> +    /// The values of `flags` come from the [`flags`] module.
> +    fn end<'a>(
> +        _inode: &'a INode<Self::FileSystem>,
> +        _pos: Offset,
> +        _length: Offset,
> +        _written: isize,
> +        _flags: u32,
> +        _map: &Map<'a>,
> +    ) -> Result {
> +        Ok(())
> +    }
> +}
> +
> +/// Returns address space operations backed by iomaps.
> +pub const fn ro_aops<T: Operations + ?Sized>() -> address_space::Ops<T::FileSystem> {
> +    struct Table<T: Operations + ?Sized>(PhantomData<T>);
> +    impl<T: Operations + ?Sized> Table<T> {
> +        const MAP_TABLE: bindings::iomap_ops = bindings::iomap_ops {
> +            iomap_begin: Some(Self::iomap_begin_callback),
> +            iomap_end: Some(Self::iomap_end_callback),
> +        };

Hmmm.  Is the model here that you can call ro_aops() with a pair of
iomap functions, and then it returns an address_space::Ops object (aka
address_space_operations) that is ready to go with iomap functions?

  const FILE_AOPS: address_space::Ops<Ext2Fs> = iomap::ro_aops::<Ext2Fs>();

is a neat trick, but consider that XFS implements a bunch of different
iomap ops structures.  I suppose it could be interesting to have a bunch
of different XfsInode subtypes (e.g. XfsDaxInode) and you'd always know
which file is in which mode, etc.

On the other hand, coupling together two things that are /not/ coupled
in the C API is awkward.  XFS implements separate iomap ops for buffered
reads, buffered writes, direct io, fsdax io, FIEMAP, and llseek.  I've
been pushing porters to make separate iomap ops so that they don't end
up with a single huge foofs_iomap_begin function that tries to dispatch
based on iomap flags.

That's a bit different from the C model where the fs implementation has
to assemble all the pieces on its own.  But then, perhaps the strength
of organizing it this way is that you don't end up with a bunch of:

STATIC int
xfs_vm_read_folio(
	struct file		*unused,
	struct folio		*folio)
{
	return iomap_read_folio(folio, &xfs_read_iomap_ops);
}

wrappers polluting the file namespace?

But then, what happens for some future rustfs that wants to implement
read-write support?  Does that imply the creation of an iomap::rw_ops?
What if they also want to support swapfiles?  Is there an elegant way to
tamp down the combinatoric rise?  Or would we be better off leaving it
decoupled the same way the C iomap API does?
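
As a rough illustration of the decoupled alternative, the sketch below
models it in plain userspace Rust.  None of these names (`IomapOps`,
`ReadOps`, `ReportOps`, `read_folio`, `AddressSpaceOps`) exist in the
kernel crate, and the 4096/8192 byte figures are made up.  The idea is
that each I/O path gets its own ops type, generic helpers are
parameterized on that type, and the file system assembles its own table
from them, so no ro_aops/rw_aops variants are needed.

```rust
/// Simplified stand-in for `struct iomap` (illustration only).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Map {
    pub offset: i64,
    pub length: u64,
    pub mapped: bool,
}

/// One set of iomap callbacks. A file system may implement this on several
/// marker types, one per I/O path.
pub trait IomapOps {
    fn begin(pos: i64, length: i64, map: &mut Map) -> Result<(), i32>;
}

/// Hypothetical ops for buffered reads: bytes below 8192 are mapped.
pub struct ReadOps;
impl IomapOps for ReadOps {
    fn begin(pos: i64, length: i64, map: &mut Map) -> Result<(), i32> {
        map.offset = pos;
        map.length = length as u64;
        map.mapped = pos < 8192;
        Ok(())
    }
}

/// Hypothetical ops for FIEMAP-style reporting: always reports the whole
/// 4096-byte extent containing `pos`.
pub struct ReportOps;
impl IomapOps for ReportOps {
    fn begin(pos: i64, _length: i64, map: &mut Map) -> Result<(), i32> {
        map.offset = pos - pos % 4096;
        map.length = 4096;
        map.mapped = pos < 8192;
        Ok(())
    }
}

/// Generic helper standing in for `iomap_read_folio(folio, &ops)`.
pub fn read_folio<O: IomapOps>(pos: i64, len: i64) -> Result<Map, i32> {
    let mut map = Map::default();
    O::begin(pos, len, &mut map)?;
    Ok(map)
}

/// Table of entry points, assembled by the file system itself.
pub struct AddressSpaceOps {
    pub read_folio: fn(i64, i64) -> Result<Map, i32>,
}

/// The file system picks which ops type backs each entry point.
pub const FILE_AOPS: AddressSpaceOps = AddressSpaceOps {
    read_folio: read_folio::<ReadOps>,
};
```

The monomorphized `read_folio::<ReadOps>` plays the role of the
xfs_vm_read_folio-style wrapper, without the file system having to write
it by hand.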

> +
> +        extern "C" fn iomap_begin_callback(
> +            inode_ptr: *mut bindings::inode,
> +            pos: Offset,
> +            length: Offset,
> +            flags: u32,
> +            map: *mut bindings::iomap,
> +            srcmap: *mut bindings::iomap,
> +        ) -> i32 {
> +            from_result(|| {
> +                // SAFETY: The C API guarantees that `inode_ptr` is a valid inode.
> +                let inode = unsafe { INode::from_raw(inode_ptr) };
> +                T::begin(
> +                    inode,
> +                    pos,
> +                    length,
> +                    flags,
> +                    // SAFETY: The C API guarantees that `map` is valid for write.
> +                    unsafe { &mut *map.cast::<Map<'_>>() },
> +                    // SAFETY: The C API guarantees that `srcmap` is valid for write.
> +                    unsafe { &mut *srcmap.cast::<Map<'_>>() },
> +                )?;
> +                Ok(0)
> +            })
> +        }
> +
> +        extern "C" fn iomap_end_callback(
> +            inode_ptr: *mut bindings::inode,
> +            pos: Offset,
> +            length: Offset,
> +            written: isize,
> +            flags: u32,
> +            map: *mut bindings::iomap,
> +        ) -> i32 {
> +            from_result(|| {
> +                // SAFETY: The C API guarantees that `inode_ptr` is a valid inode.
> +                let inode = unsafe { INode::from_raw(inode_ptr) };
> +                // SAFETY: The C API guarantees that `map` is valid for read.
> +                T::end(inode, pos, length, written, flags, unsafe {
> +                    &*map.cast::<Map<'_>>()
> +                })?;
> +                Ok(0)
> +            })
> +        }
> +
> +        const TABLE: bindings::address_space_operations = bindings::address_space_operations {
> +            writepage: None,
> +            read_folio: Some(Self::read_folio_callback),
> +            writepages: None,
> +            dirty_folio: None,
> +            readahead: Some(Self::readahead_callback),
> +            write_begin: None,
> +            write_end: None,
> +            bmap: Some(Self::bmap_callback),
> +            invalidate_folio: Some(bindings::iomap_invalidate_folio),
> +            release_folio: Some(bindings::iomap_release_folio),
> +            free_folio: None,
> +            direct_IO: Some(bindings::noop_direct_IO),
> +            migrate_folio: None,
> +            launder_folio: None,
> +            is_partially_uptodate: None,

Hm, isn't this needed for blocksize < pagesize?
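
For context on why that matters: when the block size is smaller than the
folio size, a folio covers several blocks and each block can become
uptodate independently.  The following is a simplified userspace model
in plain Rust, not the kernel implementation; `FolioState` and its
methods are hypothetical names.  It shows the kind of per-block check an
is_partially_uptodate hook performs.

```rust
/// Model of per-block uptodate tracking for a single folio (illustration only).
pub struct FolioState {
    block_size: usize,
    uptodate: Vec<bool>,
}

impl FolioState {
    /// Creates a folio of `folio_size` bytes split into `block_size` blocks,
    /// none of which are uptodate yet.
    pub fn new(folio_size: usize, block_size: usize) -> Self {
        assert!(block_size > 0 && folio_size % block_size == 0);
        Self {
            block_size,
            uptodate: vec![false; folio_size / block_size],
        }
    }

    /// Marks one block as containing valid data.
    pub fn mark_block_uptodate(&mut self, block: usize) {
        self.uptodate[block] = true;
    }

    /// Returns true if every block overlapping `[from, from + count)` is
    /// uptodate. Empty or out-of-range requests return false in this model.
    pub fn is_partially_uptodate(&self, from: usize, count: usize) -> bool {
        let size = self.uptodate.len() * self.block_size;
        let end = match from.checked_add(count) {
            Some(e) if count > 0 && e <= size => e,
            _ => return false,
        };
        let first = from / self.block_size;
        let last = (end - 1) / self.block_size;
        self.uptodate[first..=last].iter().all(|&b| b)
    }
}
```

With a 4096-byte folio and 1024-byte blocks, a read that only touches
uptodate blocks can be served from the page cache even though the folio
as a whole is not uptodate.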

> +            is_dirty_writeback: None,
> +            error_remove_folio: None,
> +            swap_activate: None,
> +            swap_deactivate: None,
> +            swap_rw: None,

Would be kinda nice to sort these by name order.

--D

> +        };
> +
> +        extern "C" fn read_folio_callback(
> +            _file: *mut bindings::file,
> +            folio: *mut bindings::folio,
> +        ) -> i32 {
> +            // SAFETY: `folio` is just forwarded from C and `Self::MAP_TABLE` is always valid.
> +            unsafe { bindings::iomap_read_folio(folio, &Self::MAP_TABLE) }
> +        }
> +
> +        extern "C" fn readahead_callback(rac: *mut bindings::readahead_control) {
> +            // SAFETY: `rac` is just forwarded from C and `Self::MAP_TABLE` is always valid.
> +            unsafe { bindings::iomap_readahead(rac, &Self::MAP_TABLE) }
> +        }
> +
> +        extern "C" fn bmap_callback(mapping: *mut bindings::address_space, block: u64) -> u64 {
> +            // SAFETY: `mapping` is just forwarded from C and `Self::MAP_TABLE` is always valid.
> +            unsafe { bindings::iomap_bmap(mapping, block, &Self::MAP_TABLE) }
> +        }
> +    }
> +    address_space::Ops(&Table::<T>::TABLE, PhantomData)
> +}
> -- 
> 2.34.1
> 
> 


* Re: [RFC PATCH v2 04/30] rust: fs: introduce `FileSystem::fill_super`
  2024-05-14 13:16 ` [RFC PATCH v2 04/30] rust: fs: introduce `FileSystem::fill_super` Wedson Almeida Filho
@ 2024-05-20 19:38   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2024-05-20 19:38 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

On Tue, May 14, 2024 at 10:16:45AM -0300, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to initialise superblocks, which allows them
> to be mounted (though they are still empty).
> 
> Some scaffolding code is added to create an empty directory as the root.
> It is replaced by proper inode creation in a subsequent patch in this
> series.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  rust/bindings/bindings_helper.h |   5 ++
>  rust/kernel/fs.rs               | 147 ++++++++++++++++++++++++++++++--
>  rust/kernel/fs/sb.rs            |  50 +++++++++++
>  samples/rust/rust_rofs.rs       |   6 ++
>  4 files changed, 202 insertions(+), 6 deletions(-)
>  create mode 100644 rust/kernel/fs/sb.rs
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 1bef4dff3019..dabb5a787e0d 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -12,6 +12,7 @@
>  #include <linux/ethtool.h>
>  #include <linux/file.h>
>  #include <linux/fs.h>
> +#include <linux/fs_context.h>
>  #include <linux/jiffies.h>
>  #include <linux/mdio.h>
>  #include <linux/phy.h>
> @@ -32,3 +33,7 @@ const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO;
>  
>  const slab_flags_t RUST_CONST_HELPER_SLAB_RECLAIM_ACCOUNT = SLAB_RECLAIM_ACCOUNT;
>  const slab_flags_t RUST_CONST_HELPER_SLAB_ACCOUNT = SLAB_ACCOUNT;
> +
> +const unsigned long RUST_CONST_HELPER_SB_RDONLY = SB_RDONLY;
> +
> +const loff_t RUST_CONST_HELPER_MAX_LFS_FILESIZE = MAX_LFS_FILESIZE;
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index fb7a9b200b85..263b4b6186ae 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -6,16 +6,30 @@
>  //!
>  //! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
>  
> -use crate::error::{code::*, from_result, to_result, Error};
> +use crate::error::{code::*, from_result, to_result, Error, Result};
>  use crate::types::Opaque;
>  use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
>  use core::{ffi, marker::PhantomData, pin::Pin};
>  use macros::{pin_data, pinned_drop};
> +use sb::SuperBlock;
> +
> +pub mod sb;
> +
> +/// The offset of a file in a file system.

This is really the position of some data within a file, in bytes.

> +///
> +/// This is C's `loff_t`.
> +pub type Offset = i64;

Ergh, I really wish this was loff (or LOff if we're really doing
camelcase for rust code) for somewhat better greppability.

> +
> +/// Maximum size of an inode.
> +pub const MAX_LFS_FILESIZE: Offset = bindings::MAX_LFS_FILESIZE;
>  
>  /// A file system type.
>  pub trait FileSystem {
>      /// The name of the file system type.
>      const NAME: &'static CStr;
> +
> +    /// Initialises the new superblock.
> +    fn fill_super(sb: &mut SuperBlock<Self>) -> Result;
>  }
>  
>  /// A registration of a file system.
> @@ -46,7 +60,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>                  let fs = unsafe { &mut *fs_ptr };
>                  fs.owner = module.0;
>                  fs.name = T::NAME.as_char_ptr();
> -                fs.init_fs_context = Some(Self::init_fs_context_callback);
> +                fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
>                  fs.kill_sb = Some(Self::kill_sb_callback);
>                  fs.fs_flags = 0;
>  
> @@ -57,11 +71,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>          })
>      }
>  
> -    unsafe extern "C" fn init_fs_context_callback(_fc: *mut bindings::fs_context) -> ffi::c_int {
> -        from_result(|| Err(ENOTSUPP))
> +    unsafe extern "C" fn init_fs_context_callback<T: FileSystem + ?Sized>(
> +        fc_ptr: *mut bindings::fs_context,
> +    ) -> ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The C callback API guarantees that `fc_ptr` is valid.
> +            let fc = unsafe { &mut *fc_ptr };
> +            fc.ops = &Tables::<T>::CONTEXT;
> +            Ok(0)
> +        })
>      }
>  
> -    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
> +    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
> +        // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
> +        // the appropriate function to call for cleanup.
> +        unsafe { bindings::kill_anon_super(sb_ptr) };
> +    }
>  }
>  
>  #[pinned_drop]
> @@ -74,6 +99,113 @@ fn drop(self: Pin<&mut Self>) {
>      }
>  }
>  
> +struct Tables<T: FileSystem + ?Sized>(T);
> +impl<T: FileSystem + ?Sized> Tables<T> {
> +    const CONTEXT: bindings::fs_context_operations = bindings::fs_context_operations {
> +        free: None,
> +        parse_param: None,
> +        get_tree: Some(Self::get_tree_callback),
> +        reconfigure: None,
> +        parse_monolithic: None,
> +        dup: None,
> +    };
> +
> +    unsafe extern "C" fn get_tree_callback(fc: *mut bindings::fs_context) -> ffi::c_int {
> +        // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
> +        // the right type and is a valid callback.
> +        unsafe { bindings::get_tree_nodev(fc, Some(Self::fill_super_callback)) }
> +    }
> +
> +    unsafe extern "C" fn fill_super_callback(
> +        sb_ptr: *mut bindings::super_block,
> +        _fc: *mut bindings::fs_context,
> +    ) -> ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
> +            // newly-created superblock.
> +            let new_sb = unsafe { SuperBlock::from_raw_mut(sb_ptr) };
> +
> +            // SAFETY: The callback contract guarantees that `sb_ptr`, from which `new_sb` is
> +            // derived, is valid for write.
> +            let sb = unsafe { &mut *new_sb.0.get() };
> +            sb.s_op = &Tables::<T>::SUPER_BLOCK;
> +            sb.s_flags |= bindings::SB_RDONLY;
> +
> +            T::fill_super(new_sb)?;
> +
> +            // The following is scaffolding code that will be removed in a subsequent patch. It is
> +            // needed to build a root dentry, otherwise core code will BUG().
> +            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
> +            let inode = unsafe { bindings::new_inode(sb) };
> +            if inode.is_null() {
> +                return Err(ENOMEM);
> +            }
> +
> +            // SAFETY: `inode` is valid for write.
> +            unsafe { bindings::set_nlink(inode, 2) };
> +
> +            {
> +                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
> +                // safe to mutably dereference it.
> +                let inode = unsafe { &mut *inode };
> +                inode.i_ino = 1;
> +                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
> +
> +                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> +                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };

                         ^^^^^^^^^^^^^^^^
This is a gross way to handle anonymous struct fields.  What happens
when struct inode changes and we have to do a giant treewide sed?

(and yes, I understand that's likely going to be a rustc change...)
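
One way to limit the blast radius in the meantime, sketched with stand-in
types rather than the real bindings: keep the bindgen-generated name inside a
single accessor, so a layout change in struct inode touches one line instead
of every user. `RawInode`, `AnonOps`, `NewInode`, `set_fop` and `fop` are all
hypothetical names; `anon_ops` plays the role of `__bindgen_anon_3`.

```rust
// Stand-ins for bindgen output; none of these are the real kernel bindings.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FileOperations {
    pub id: u32,
}

/// Models an anonymous union that bindgen would name `__bindgen_anon_N`.
#[repr(C)]
pub union AnonOps {
    pub i_fop: *const FileOperations,
    pub free_inode: Option<fn()>,
}

#[repr(C)]
pub struct RawInode {
    pub i_ino: u64,
    pub anon_ops: AnonOps,
}

/// Hypothetical wrapper: the only place that spells out the generated name.
pub struct NewInode(RawInode);

impl NewInode {
    pub fn new(ino: u64) -> Self {
        NewInode(RawInode {
            i_ino: ino,
            anon_ops: AnonOps { i_fop: std::ptr::null() },
        })
    }

    /// Sets the file operations without exposing the union field name.
    pub fn set_fop(&mut self, fops: &'static FileOperations) -> &mut Self {
        // Writing a `Copy` union field is safe; only reads need `unsafe`.
        self.0.anon_ops.i_fop = fops as *const FileOperations;
        self
    }

    /// Returns the file operations, if any were set.
    pub fn fop(&self) -> Option<&'static FileOperations> {
        // SAFETY: in this sketch only `i_fop` is ever written, and it is
        // either null or derived from a `&'static FileOperations`.
        unsafe { self.0.anon_ops.i_fop.as_ref() }
    }
}

pub static SIMPLE_DIR_OPS: FileOperations = FileOperations { id: 1 };
```

Callers would then write `inode.set_fop(&SIMPLE_DIR_OPS)` and never see the
generated field name.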

--D

> +
> +                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
> +                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
> +            }
> +
> +            // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
> +            // case for this call.
> +            //
> +            // It takes over the inode, even on failure, so we don't need to clean it up.
> +            let dentry = unsafe { bindings::d_make_root(inode) };
> +            if dentry.is_null() {
> +                return Err(ENOMEM);
> +            }
> +
> +            sb.s_root = dentry;
> +
> +            Ok(0)
> +        })
> +    }
> +
> +    const SUPER_BLOCK: bindings::super_operations = bindings::super_operations {
> +        alloc_inode: None,
> +        destroy_inode: None,
> +        free_inode: None,
> +        dirty_inode: None,
> +        write_inode: None,
> +        drop_inode: None,
> +        evict_inode: None,
> +        put_super: None,
> +        sync_fs: None,
> +        freeze_super: None,
> +        freeze_fs: None,
> +        thaw_super: None,
> +        unfreeze_fs: None,
> +        statfs: None,
> +        remount_fs: None,
> +        umount_begin: None,
> +        show_options: None,
> +        show_devname: None,
> +        show_path: None,
> +        show_stats: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        quota_read: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        quota_write: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        get_dquots: None,
> +        nr_cached_objects: None,
> +        free_cached_objects: None,
> +        shutdown: None,
> +    };
> +}
> +
>  /// Kernel module that exposes a single file system implemented by `T`.
>  #[pin_data]
>  pub struct Module<T: FileSystem + ?Sized> {
> @@ -100,7 +232,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>  ///
>  /// ```
>  /// # mod module_fs_sample {
> -/// use kernel::fs;
> +/// use kernel::fs::{sb::SuperBlock, self};
>  /// use kernel::prelude::*;
>  ///
>  /// kernel::module_fs! {
> @@ -114,6 +246,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>  /// struct MyFs;
>  /// impl fs::FileSystem for MyFs {
>  ///     const NAME: &'static CStr = kernel::c_str!("myfs");
> +///     fn fill_super(_: &mut SuperBlock<Self>) -> Result {
> +///         todo!()
> +///     }
>  /// }
>  /// # }
>  /// ```
> diff --git a/rust/kernel/fs/sb.rs b/rust/kernel/fs/sb.rs
> new file mode 100644
> index 000000000000..113d3c0d8148
> --- /dev/null
> +++ b/rust/kernel/fs/sb.rs
> @@ -0,0 +1,50 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! File system super blocks.
> +//!
> +//! This module allows Rust code to use superblocks.
> +//!
> +//! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
> +
> +use super::FileSystem;
> +use crate::{bindings, types::Opaque};
> +use core::marker::PhantomData;
> +
> +/// A file system super block.
> +///
> +/// Wraps the kernel's `struct super_block`.
> +#[repr(transparent)]
> +pub struct SuperBlock<T: FileSystem + ?Sized>(
> +    pub(crate) Opaque<bindings::super_block>,
> +    PhantomData<T>,
> +);
> +
> +impl<T: FileSystem + ?Sized> SuperBlock<T> {
> +    /// Creates a new superblock mutable reference from the given raw pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that:
> +    ///
> +    /// * `ptr` is valid and remains so for the lifetime of the returned object.
> +    /// * `ptr` has the correct file system type.
> +    /// * `ptr` is the only active pointer to the superblock.
> +    pub(crate) unsafe fn from_raw_mut<'a>(ptr: *mut bindings::super_block) -> &'a mut Self {
> +        // SAFETY: The safety requirements guarantee that the cast below is ok.
> +        unsafe { &mut *ptr.cast::<Self>() }
> +    }
> +
> +    /// Returns whether the superblock is mounted in read-only mode.
> +    pub fn rdonly(&self) -> bool {
> +        // SAFETY: `s_flags` only changes during init, so it is safe to read it.
> +        unsafe { (*self.0.get()).s_flags & bindings::SB_RDONLY != 0 }
> +    }
> +
> +    /// Sets the magic number of the superblock.
> +    pub fn set_magic(&mut self, magic: usize) -> &mut Self {
> +        // SAFETY: This is a new superblock that is being initialised, so it's ok to write to its
> +        // fields.
> +        unsafe { (*self.0.get()).s_magic = magic as core::ffi::c_ulong };
> +        self
> +    }
> +}
> diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
> index d465b107a07d..022addf68891 100644
> --- a/samples/rust/rust_rofs.rs
> +++ b/samples/rust/rust_rofs.rs
> @@ -2,6 +2,7 @@
>  
>  //! Rust read-only file system sample.
>  
> +use kernel::fs::sb;
>  use kernel::prelude::*;
>  use kernel::{c_str, fs};
>  
> @@ -16,4 +17,9 @@
>  struct RoFs;
>  impl fs::FileSystem for RoFs {
>      const NAME: &'static CStr = c_str!("rust_rofs");
> +
> +    fn fill_super(sb: &mut sb::SuperBlock<Self>) -> Result {
> +        sb.set_magic(0x52555354);
> +        Ok(())
> +    }
>  }
> -- 
> 2.34.1
> 
> 

* Re: [RFC PATCH v2 30/30] WIP: fs: ext2: add rust ro ext2 implementation
  2024-05-14 13:17 ` [RFC PATCH v2 30/30] WIP: fs: ext2: add rust ro ext2 implementation Wedson Almeida Filho
@ 2024-05-20 20:01   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2024-05-20 20:01 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox, Dave Chinner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel, Wedson Almeida Filho

On Tue, May 14, 2024 at 10:17:11AM -0300, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  fs/Kconfig            |   1 +
>  fs/Makefile           |   1 +
>  fs/rust-ext2/Kconfig  |  13 +
>  fs/rust-ext2/Makefile |   8 +
>  fs/rust-ext2/defs.rs  | 173 +++++++++++++
>  fs/rust-ext2/ext2.rs  | 551 ++++++++++++++++++++++++++++++++++++++++++
>  rust/kernel/lib.rs    |   3 +
>  7 files changed, 750 insertions(+)
>  create mode 100644 fs/rust-ext2/Kconfig
>  create mode 100644 fs/rust-ext2/Makefile
>  create mode 100644 fs/rust-ext2/defs.rs
>  create mode 100644 fs/rust-ext2/ext2.rs
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 2cbd99d6784c..cf0cac5c5b1e 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -338,6 +338,7 @@ source "fs/ufs/Kconfig"
>  source "fs/erofs/Kconfig"
>  source "fs/vboxsf/Kconfig"
>  source "fs/tarfs/Kconfig"
> +source "fs/rust-ext2/Kconfig"
>  
>  endif # MISC_FILESYSTEMS
>  
> diff --git a/fs/Makefile b/fs/Makefile
> index d8bbda73e3a9..c1a3007efc7d 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -130,3 +130,4 @@ obj-$(CONFIG_EROFS_FS)		+= erofs/
>  obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
>  obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
>  obj-$(CONFIG_TARFS_FS)		+= tarfs/
> +obj-$(CONFIG_RUST_EXT2_FS)	+= rust-ext2/
> diff --git a/fs/rust-ext2/Kconfig b/fs/rust-ext2/Kconfig
> new file mode 100644
> index 000000000000..976371655ca6
> --- /dev/null
> +++ b/fs/rust-ext2/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +
> +config RUST_EXT2_FS
> +	tristate "Rust second extended fs support"
> +	depends on RUST && BLOCK
> +	help
> +	  Ext2 is a standard Linux file system for hard disks.
> +
> +	  To compile this file system support as a module, choose M here: the
> +	  module will be called rust_ext2.
> +
> +	  If unsure, say Y.
> diff --git a/fs/rust-ext2/Makefile b/fs/rust-ext2/Makefile
> new file mode 100644
> index 000000000000..ac960b5f89d7
> --- /dev/null
> +++ b/fs/rust-ext2/Makefile
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Makefile for the Linux Rust ext2 filesystem routines.
> +#
> +
> +obj-$(CONFIG_RUST_EXT2_FS) += rust_ext2.o
> +
> +rust_ext2-y := ext2.o
> diff --git a/fs/rust-ext2/defs.rs b/fs/rust-ext2/defs.rs
> new file mode 100644
> index 000000000000..5f84852b4961
> --- /dev/null
> +++ b/fs/rust-ext2/defs.rs
> @@ -0,0 +1,173 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Definitions of ext2 structures.
> +
> +use kernel::types::LE;
> +
> +pub(crate) const EXT2_SUPER_MAGIC: u16 = 0xEF53;
> +
> +pub(crate) const EXT2_MAX_BLOCK_LOG_SIZE: u32 = 16;
> +
> +pub(crate) const EXT2_GOOD_OLD_REV: u32 = 0; /* The good old (original) format */
> +pub(crate) const EXT2_DYNAMIC_REV: u32 = 1; /* V2 format w/ dynamic inode sizes */
> +
> +pub(crate) const EXT2_GOOD_OLD_INODE_SIZE: u16 = 128;
> +
> +pub(crate) const EXT2_ROOT_INO: u32 = 2; /* Root inode */
> +
> +/* First non-reserved inode for old ext2 filesystems. */
> +pub(crate) const EXT2_GOOD_OLD_FIRST_INO: u32 = 11;
> +
> +pub(crate) const EXT2_FEATURE_INCOMPAT_FILETYPE: u32 = 0x0002;
> +
> +/*
> + * Constants relative to the data blocks
> + */
> +pub(crate) const EXT2_NDIR_BLOCKS: usize = 12;
> +pub(crate) const EXT2_IND_BLOCK: usize = EXT2_NDIR_BLOCKS;
> +pub(crate) const EXT2_DIND_BLOCK: usize = EXT2_IND_BLOCK + 1;
> +pub(crate) const EXT2_TIND_BLOCK: usize = EXT2_DIND_BLOCK + 1;
> +pub(crate) const EXT2_N_BLOCKS: usize = EXT2_TIND_BLOCK + 1;
> +
> +kernel::derive_readable_from_bytes! {
> +    #[repr(C)]
> +    pub(crate) struct Super {
> +        pub(crate) inodes_count: LE<u32>,
> +        pub(crate) blocks_count: LE<u32>,
> +        pub(crate) r_blocks_count: LE<u32>,
> +        pub(crate) free_blocks_count: LE<u32>, /* Free blocks count */
> +        pub(crate) free_inodes_count: LE<u32>, /* Free inodes count */
> +        pub(crate) first_data_block: LE<u32>,  /* First Data Block */
> +        pub(crate) log_block_size: LE<u32>,    /* Block size */
> +        pub(crate) log_frag_size: LE<u32>,     /* Fragment size */
> +        pub(crate) blocks_per_group: LE<u32>,  /* # Blocks per group */
> +        pub(crate) frags_per_group: LE<u32>,   /* # Fragments per group */
> +        pub(crate) inodes_per_group: LE<u32>,  /* # Inodes per group */
> +        pub(crate) mtime: LE<u32>,             /* Mount time */
> +        pub(crate) wtime: LE<u32>,             /* Write time */
> +        pub(crate) mnt_count: LE<u16>,         /* Mount count */
> +        pub(crate) max_mnt_count: LE<u16>,     /* Maximal mount count */
> +        pub(crate) magic: LE<u16>,             /* Magic signature */
> +        pub(crate) state: LE<u16>,             /* File system state */
> +        pub(crate) errors: LE<u16>,            /* Behaviour when detecting errors */
> +        pub(crate) minor_rev_level: LE<u16>,   /* minor revision level */
> +        pub(crate) lastcheck: LE<u32>,         /* time of last check */
> +        pub(crate) checkinterval: LE<u32>,     /* max. time between checks */
> +        pub(crate) creator_os: LE<u32>,        /* OS */
> +        pub(crate) rev_level: LE<u32>,         /* Revision level */
> +        pub(crate) def_resuid: LE<u16>,        /* Default uid for reserved blocks */
> +        pub(crate) def_resgid: LE<u16>,        /* Default gid for reserved blocks */
> +        /*
> +         * These fields are for EXT2_DYNAMIC_REV superblocks only.
> +         *
> +         * Note: the difference between the compatible feature set and
> +         * the incompatible feature set is that if there is a bit set
> +         * in the incompatible feature set that the kernel doesn't
> +         * know about, it should refuse to mount the filesystem.
> +         *
> +         * e2fsck's requirements are more strict; if it doesn't know
> +         * about a feature in either the compatible or incompatible
> +         * feature set, it must abort and not try to meddle with
> +         * things it doesn't understand...
> +         */
> +        pub(crate) first_ino: LE<u32>,              /* First non-reserved inode */
> +        pub(crate) inode_size: LE<u16>,             /* size of inode structure */
> +        pub(crate) block_group_nr: LE<u16>,         /* block group # of this superblock */
> +        pub(crate) feature_compat: LE<u32>,         /* compatible feature set */
> +        pub(crate) feature_incompat: LE<u32>,       /* incompatible feature set */
> +        pub(crate) feature_ro_compat: LE<u32>,      /* readonly-compatible feature set */
> +        pub(crate) uuid: [u8; 16],                  /* 128-bit uuid for volume */
> +        pub(crate) volume_name: [u8; 16],           /* volume name */
> +        pub(crate) last_mounted: [u8; 64],          /* directory where last mounted */
> +        pub(crate) algorithm_usage_bitmap: LE<u32>, /* For compression */
> +        /*
> +         * Performance hints.  Directory preallocation should only
> +         * happen if the EXT2_COMPAT_PREALLOC flag is on.
> +         */
> +        pub(crate) prealloc_blocks: u8,    /* Nr of blocks to try to preallocate*/
> +        pub(crate) prealloc_dir_blocks: u8,        /* Nr to preallocate for dirs */
> +        padding1: u16,
> +        /*
> +         * Journaling support valid if EXT3_FEATURE_COMPAT_HAS_JOURNAL set.
> +         */
> +        pub(crate) journal_uuid: [u8; 16],      /* uuid of journal superblock */
> +        pub(crate) journal_inum: u32,           /* inode number of journal file */
> +        pub(crate) journal_dev: u32,            /* device number of journal file */
> +        pub(crate) last_orphan: u32,            /* start of list of inodes to delete */
> +        pub(crate) hash_seed: [u32; 4],         /* HTREE hash seed */
> +        pub(crate) def_hash_version: u8,        /* Default hash version to use */
> +        pub(crate) reserved_char_pad: u8,
> +        pub(crate) reserved_word_pad: u16,
> +        pub(crate) default_mount_opts: LE<u32>,
> +        pub(crate) first_meta_bg: LE<u32>,      /* First metablock block group */
> +        reserved: [u32; 190],                   /* Padding to the end of the block */
> +    }
> +
> +    #[repr(C)]
> +    #[derive(Clone, Copy)]
> +    pub(crate) struct Group {

Might want to call these GroupDescriptor to match(ish) the ext2
structure?  I dunno, it's going to be hard to remember to change
"struct ext2_group_desc" in my head to "struct ext2::GroupDescriptor" or
even "struct ext2::group_desc".
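
A rough sketch of what the rename could look like, with a compile-time size
check that ties the Rust type to the 32-byte on-disk `struct ext2_group_desc`.
`Le32` and `Le16` are simplified stand-ins for `kernel::types::LE`, and
`from_disk_bytes` is a hypothetical helper used only to exercise the layout.

```rust
use std::mem::size_of;

/// Stand-in for `LE<u32>`: holds the raw on-disk (little-endian) bits.
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Le32(u32);

impl Le32 {
    pub fn value(self) -> u32 {
        u32::from_le(self.0)
    }
}

/// Stand-in for `LE<u16>`.
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Le16(u16);

impl Le16 {
    pub fn value(self) -> u16 {
        u16::from_le(self.0)
    }
}

/// Mirrors the field order of `struct ext2_group_desc`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct GroupDescriptor {
    pub block_bitmap: Le32,
    pub inode_bitmap: Le32,
    pub inode_table: Le32,
    pub free_blocks_count: Le16,
    pub free_inodes_count: Le16,
    pub used_dirs_count: Le16,
    pad: Le16,
    reserved: [u32; 3],
}

// Fails the build if the layout drifts from the 32-byte on-disk descriptor.
const _: () = assert!(size_of::<GroupDescriptor>() == 32);

fn le32(b: &[u8; 32], i: usize) -> Le32 {
    // Keep the bytes exactly as they sit on disk; `value()` does the swap.
    Le32(u32::from_ne_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]]))
}

fn le16(b: &[u8; 32], i: usize) -> Le16 {
    Le16(u16::from_ne_bytes([b[i], b[i + 1]]))
}

impl GroupDescriptor {
    /// Hypothetical helper: builds a descriptor from one on-disk record.
    pub fn from_disk_bytes(b: &[u8; 32]) -> Self {
        GroupDescriptor {
            block_bitmap: le32(b, 0),
            inode_bitmap: le32(b, 4),
            inode_table: le32(b, 8),
            free_blocks_count: le16(b, 12),
            free_inodes_count: le16(b, 14),
            used_dirs_count: le16(b, 16),
            pad: Le16(0),
            reserved: [0; 3],
        }
    }
}
```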

> +        /// Blocks bitmap block.
> +        pub block_bitmap: LE<u32>,
> +
> +        /// Inodes bitmap block.
> +        pub inode_bitmap: LE<u32>,
> +
> +        /// Inodes table block.
> +        pub inode_table: LE<u32>,
> +
> +        /// Number of free blocks.
> +        pub free_blocks_count: LE<u16>,
> +
> +        /// Number of free inodes.
> +        pub free_inodes_count: LE<u16>,
> +
> +        /// Number of directories.
> +        pub used_dirs_count: LE<u16>,
> +
> +        pad: LE<u16>,
> +        reserved: [u32; 3],
> +    }
> +
> +    #[repr(C)]
> +    pub(crate) struct INode {
> +        pub mode: LE<u16>,                  /* File mode */
> +        pub uid: LE<u16>,                   /* Low 16 bits of Owner Uid */
> +        pub size: LE<u32>,                  /* Size in bytes */
> +        pub atime: LE<u32>,                 /* Access time */
> +        pub ctime: LE<u32>,                 /* Creation time */
> +        pub mtime: LE<u32>,                 /* Modification time */
> +        pub dtime: LE<u32>,                 /* Deletion Time */
> +        pub gid: LE<u16>,                   /* Low 16 bits of Group Id */
> +        pub links_count: LE<u16>,           /* Links count */
> +        pub blocks: LE<u32>,                /* Blocks count */
> +        pub flags: LE<u32>,                 /* File flags */
> +        pub reserved1: LE<u32>,
> +        pub block: [LE<u32>; EXT2_N_BLOCKS],/* Pointers to blocks */
> +        pub generation: LE<u32>,            /* File version (for NFS) */
> +        pub file_acl: LE<u32>,              /* File ACL */
> +        pub dir_acl: LE<u32>,               /* Directory ACL */
> +        pub faddr: LE<u32>,                 /* Fragment address */
> +        pub frag: u8,	                    /* Fragment number */
> +        pub fsize: u8,	                    /* Fragment size */
> +        pub pad1: LE<u16>,
> +        pub uid_high: LE<u16>,
> +        pub gid_high: LE<u16>,
> +        pub reserved2: LE<u32>,
> +    }
> +
> +    #[repr(C)]
> +    pub(crate) struct DirEntry {
> +        pub(crate) inode: LE<u32>,       /* Inode number */
> +        pub(crate) rec_len: LE<u16>,     /* Directory entry length */
> +        pub(crate) name_len: u8,         /* Name length */
> +        pub(crate) file_type: u8,        /* Only if the "filetype" feature flag is set. */
> +    }
> +}
> +
> +pub(crate) const FT_REG_FILE: u8 = 1;
> +pub(crate) const FT_DIR: u8 = 2;
> +pub(crate) const FT_CHRDEV: u8 = 3;
> +pub(crate) const FT_BLKDEV: u8 = 4;
> +pub(crate) const FT_FIFO: u8 = 5;
> +pub(crate) const FT_SOCK: u8 = 6;
> +pub(crate) const FT_SYMLINK: u8 = 7;
> diff --git a/fs/rust-ext2/ext2.rs b/fs/rust-ext2/ext2.rs
> new file mode 100644
> index 000000000000..2d6b1e7ca156
> --- /dev/null
> +++ b/fs/rust-ext2/ext2.rs
> @@ -0,0 +1,551 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Ext2 file system.
> +
> +use alloc::vec::Vec;
> +use core::mem::size_of;
> +use defs::*;
> +use kernel::fs::{
> +    self, address_space, dentry, dentry::DEntry, file, file::File, inode, inode::INode, iomap, sb,
> +    sb::SuperBlock, Offset,
> +};
> +use kernel::types::{ARef, Either, FromBytes, Locked, LE};
> +use kernel::{block, c_str, prelude::*, str::CString, time::Timespec, user, PAGE_SIZE};
> +
> +pub mod defs;
> +
> +kernel::module_fs! {
> +    type: Ext2Fs,
> +    name: "ext2",
> +    author: "Wedson Almeida Filho <walmeida@microsoft.com>",
> +    description: "ext2 file system",
> +    license: "GPL",
> +}
> +
> +const SB_OFFSET: Offset = 1024;
> +
> +struct INodeData {
> +    data_blocks: [u32; defs::EXT2_N_BLOCKS],
> +}
> +
> +struct Ext2Fs {
> +    mapper: inode::Mapper,
> +    block_size: u32,
> +    has_file_type: bool,
> +    _block_size_bits: u32,
> +    inodes_per_block: u32,
> +    inodes_per_group: u32,
> +    inode_count: u32,
> +    inode_size: u16,
> +    first_ino: u32,
> +    group: Vec<defs::Group>,
> +}
> +
> +impl Ext2Fs {
> +    fn iget(sb: &SuperBlock<Self>, ino: u32) -> Result<ARef<INode<Self>>> {
> +        let s = sb.data();
> +        if (ino != EXT2_ROOT_INO && ino < s.first_ino) || ino > s.inode_count {
> +            return Err(ENOENT);
> +        }
> +        let group = ((ino - 1) / s.inodes_per_group) as usize;
> +        let offset = (ino - 1) % s.inodes_per_group;
> +
> +        if group >= s.group.len() {
> +            return Err(ENOENT);
> +        }
> +
> +        // Create an inode or find an existing (cached) one.
> +        let mut inode = match sb.get_or_create_inode(ino.into())? {
> +            Either::Left(existing) => return Ok(existing),
> +            Either::Right(new) => new,
> +        };
> +
> +        let inodes_block = Offset::from(s.group[group].inode_table.value());
> +        let inode_block = inodes_block + Offset::from(offset / s.inodes_per_block);
> +        let offset = (offset % s.inodes_per_block) as usize;
> +        let b = sb
> +            .data()
> +            .mapper
> +            .mapped_folio(inode_block * Offset::from(s.block_size))?;

It almost feels like you need a buffer cache here for fs metadata... ;)

> +        let idata = defs::INode::from_bytes(&b, offset * s.inode_size as usize).ok_or(EIO)?;
> +        let mode = idata.mode.value();
> +
> +        if idata.links_count.value() == 0 && (mode == 0 || idata.dtime.value() != 0) {
> +            return Err(ESTALE);
> +        }
> +
> +        const DIR_FOPS: file::Ops<Ext2Fs> = file::Ops::new::<Ext2Fs>();
> +        const DIR_IOPS: inode::Ops<Ext2Fs> = inode::Ops::new::<Ext2Fs>();
> +        const FILE_AOPS: address_space::Ops<Ext2Fs> = iomap::ro_aops::<Ext2Fs>();
> +
> +        let mut size = idata.size.value().into();
> +        let typ = match mode & fs::mode::S_IFMT {
> +            fs::mode::S_IFREG => {
> +                size |= Offset::from(idata.dir_acl.value())
> +                    .checked_shl(32)
> +                    .ok_or(EUCLEAN)?;

I wonder, is there a clean way to log these kinds of corruption errors?
ext4 (and soon xfs) have the ability to log health problems and pass
those kinds of errors to a monitoring daemon via fanotify.
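
A minimal sketch of one shape this could take: funnel every corruption error
through a single reporting hook, so that a pr_err today and a health
notification later live in one place. Everything here is hypothetical
(`FsError`, `regular_file_size`, the `report` callback), and it assumes
`Offset` is a signed 64-bit integer like `loff_t`. Under that assumption,
`checked_shl(32)` only rejects shift counts of 64 or more, so the quoted check
would never fire; the sketch tests the combined value instead.

```rust
/// Stand-in for the kernel's EUCLEAN ("structure needs cleaning").
#[derive(Debug, PartialEq)]
pub enum FsError {
    Corrupted,
}

/// Combines the low and high halves of an ext2 regular-file size.
///
/// `report` is the single place where logging or a notification to a
/// monitoring daemon would be wired in; here it is just a callback.
pub fn regular_file_size(
    ino: u32,
    size_lo: u32,
    size_hi: u32,
    report: &mut dyn FnMut(u32, &str),
) -> Result<i64, FsError> {
    let raw = u64::from(size_lo) | (u64::from(size_hi) << 32);
    if raw > i64::MAX as u64 {
        // The on-disk size cannot be represented as a signed offset.
        report(ino, "file size does not fit in a signed 64-bit offset");
        return Err(FsError::Corrupted);
    }
    Ok(raw as i64)
}
```

The caller decides what `report` does, so the parsing code stays the same
whether the error ends up in the kernel log or in some richer channel.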

> +                inode
> +                    .set_aops(FILE_AOPS)
> +                    .set_fops(file::Ops::generic_ro_file());
> +                inode::Type::Reg
> +            }
> +            fs::mode::S_IFDIR => {
> +                inode
> +                    .set_iops(DIR_IOPS)
> +                    .set_fops(DIR_FOPS)
> +                    .set_aops(FILE_AOPS);
> +                inode::Type::Dir
> +            }
> +            fs::mode::S_IFLNK => {
> +                if idata.blocks.value() == 0 {
> +                    const OFFSET: usize = core::mem::offset_of!(defs::INode, block);
> +                    let name = &b[offset * usize::from(s.inode_size) + OFFSET..];
> +                    let name_len = size as usize;
> +                    if name_len > name.len() || name_len == 0 {
> +                        return Err(EIO);
> +                    }
> +                    inode.set_iops(inode::Ops::simple_symlink_inode());
> +                    inode::Type::Lnk(Some(CString::try_from(&name[..name_len])?))
> +                } else {
> +                    inode
> +                        .set_aops(FILE_AOPS)
> +                        .set_iops(inode::Ops::page_symlink_inode());
> +                    inode::Type::Lnk(None)
> +                }
> +            }
> +            fs::mode::S_IFSOCK => inode::Type::Sock,
> +            fs::mode::S_IFIFO => inode::Type::Fifo,
> +            fs::mode::S_IFCHR => {
> +                let (major, minor) = decode_dev(&idata.block);
> +                inode::Type::Chr(major, minor)
> +            }
> +            fs::mode::S_IFBLK => {
> +                let (major, minor) = decode_dev(&idata.block);
> +                inode::Type::Blk(major, minor)
> +            }
> +            _ => return Err(ENOENT),
> +        };
> +        inode.init(inode::Params {
> +            typ,
> +            mode: mode & 0o777,
> +            size,
> +            blocks: idata.blocks.value().into(),
> +            nlink: idata.links_count.value().into(),
> +            uid: u32::from(idata.uid.value()) | u32::from(idata.uid_high.value()) << 16,
> +            gid: u32::from(idata.gid.value()) | u32::from(idata.gid_high.value()) << 16,
> +            ctime: Timespec::new(idata.ctime.value().into(), 0)?,
> +            mtime: Timespec::new(idata.mtime.value().into(), 0)?,
> +            atime: Timespec::new(idata.atime.value().into(), 0)?,
> +            value: INodeData {
> +                data_blocks: core::array::from_fn(|i| idata.block[i].value()),
> +            },
> +        })
> +    }
> +
> +    fn offsets<'a>(&self, mut block: u64, out: &'a mut [u32]) -> Option<&'a [u32]> {
> +        let ptrs = u64::from(self.block_size / size_of::<u32>() as u32);
> +        let ptr_mask = ptrs - 1;
> +        let ptr_bits = ptrs.trailing_zeros();
> +
> +        if block < EXT2_NDIR_BLOCKS as u64 {
> +            out[0] = block as u32;
> +            return Some(&out[..1]);
> +        }
> +
> +        block -= EXT2_NDIR_BLOCKS as u64;
> +        if block < ptrs {
> +            out[0] = EXT2_IND_BLOCK as u32;
> +            out[1] = block as u32;
> +            return Some(&out[..2]);
> +        }
> +
> +        block -= ptrs;
> +        if block < (1 << (2 * ptr_bits)) {
> +            out[0] = EXT2_DIND_BLOCK as u32;
> +            out[1] = (block >> ptr_bits) as u32;
> +            out[2] = (block & ptr_mask) as u32;
> +            return Some(&out[..3]);
> +        }
> +
> +        block -= ptrs * ptrs;
> +        if block < ptrs * ptrs * ptrs {
> +            out[0] = EXT2_TIND_BLOCK as u32;
> +            out[1] = (block >> (2 * ptr_bits)) as u32;
> +            out[2] = ((block >> ptr_bits) & ptr_mask) as u32;
> +            out[3] = (block & ptr_mask) as u32;
> +            return Some(&out[..4]);
> +        }
> +
> +        None
> +    }
> +
> +    fn offset_to_block(inode: &INode<Self>, block: Offset) -> Result<u64> {
> +        let s = inode.super_block().data();
> +        let mut indices = [0u32; 4];
> +        let boffsets = s.offsets(block as u64, &mut indices).ok_or(EIO)?;
> +        let mut boffset = inode.data().data_blocks[boffsets[0] as usize];
> +        let mapper = &s.mapper;
> +        for i in &boffsets[1..] {
> +            let b = mapper.mapped_folio(Offset::from(boffset) * Offset::from(s.block_size))?;
> +            let table = LE::<u32>::from_bytes_to_slice(&b).ok_or(EIO)?;
> +            boffset = table[*i as usize].value();
> +        }
> +        Ok(boffset.into())
> +    }
> +
> +    fn check_descriptors(s: &Super, groups: &[Group]) -> Result {

It's ... very odd to mix file space mapping functions and group
descriptors into the same structure.  Does offset_to_block belong in a
Ext2Inode structure?
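
For what it's worth, the index arithmetic in offsets() only needs the block
size, so it could live outside the superblock data entirely. A standalone
sketch of the same calculation as a free function (the name and signature are
mine, not from the patch), which also makes the direct, single-, double- and
triple-indirect boundaries easy to check:

```rust
/// Number of direct block pointers in an ext2 inode.
pub const NDIR_BLOCKS: u64 = 12;
/// Indices of the single-, double- and triple-indirect pointers.
pub const IND_BLOCK: u32 = 12;
pub const DIND_BLOCK: u32 = 13;
pub const TIND_BLOCK: u32 = 14;

/// Translates a logical file block into the chain of indices to follow.
///
/// `block_size` must be a power of two and at least 4 bytes.
pub fn offsets(block_size: u32, mut block: u64, out: &mut [u32; 4]) -> Option<&[u32]> {
    // Each indirect block holds `ptrs` 32-bit block numbers.
    let ptrs = u64::from(block_size / 4);
    let ptr_mask = ptrs - 1;
    let ptr_bits = ptrs.trailing_zeros();

    if block < NDIR_BLOCKS {
        out[0] = block as u32;
        return Some(&out[..1]);
    }

    block -= NDIR_BLOCKS;
    if block < ptrs {
        out[0] = IND_BLOCK;
        out[1] = block as u32;
        return Some(&out[..2]);
    }

    block -= ptrs;
    if block < ptrs * ptrs {
        out[0] = DIND_BLOCK;
        out[1] = (block >> ptr_bits) as u32;
        out[2] = (block & ptr_mask) as u32;
        return Some(&out[..3]);
    }

    block -= ptrs * ptrs;
    if block < ptrs * ptrs * ptrs {
        out[0] = TIND_BLOCK;
        out[1] = (block >> (2 * ptr_bits)) as u32;
        out[2] = ((block >> ptr_bits) & ptr_mask) as u32;
        out[3] = (block & ptr_mask) as u32;
        return Some(&out[..4]);
    }

    // Beyond what three levels of indirection can address.
    None
}
```

With 1024-byte blocks there are 256 pointers per indirect block, so logical
block 12 is the first single-indirect entry and block 268 the first
double-indirect one.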

I was also wondering, is there a convenient way to make it so that the
compiler can enforce that a directory inode can only be passed an
Operations that actually has all the directory operations initialized?
Or that a regular file can't have a lookup function in its iops?
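
To make the question concrete, here is a small typestate-style sketch of the
kind of enforcement I have in mind. None of these names come from the series:
`DirOps`, `RegOps`, `NewInode`, `into_dir` and `into_reg` are hypothetical.
The idea is that an inode can only become a directory through a constructor
that demands a type implementing the directory operations, and the resulting
table has no slot filled in for operations of the other kind.

```rust
/// Operations every directory must provide.
pub trait DirOps {
    fn lookup(name: &str) -> Option<u32>;
}

/// Operations every regular file must provide.
pub trait RegOps {
    fn read(offset: u64, buf: &mut [u8]) -> usize;
}

#[derive(Debug, PartialEq)]
pub enum Kind {
    Dir,
    Reg,
}

/// An inode that has been allocated but not yet given a type.
pub struct NewInode {
    ino: u32,
}

/// A fully initialised inode with its operation table.
pub struct Inode {
    pub ino: u32,
    pub kind: Kind,
    pub lookup: Option<fn(&str) -> Option<u32>>,
    pub read: Option<fn(u64, &mut [u8]) -> usize>,
}

impl NewInode {
    pub fn new(ino: u32) -> Self {
        NewInode { ino }
    }

    /// Only types implementing `DirOps` are accepted; `read` stays empty.
    pub fn into_dir<O: DirOps>(self) -> Inode {
        Inode { ino: self.ino, kind: Kind::Dir, lookup: Some(O::lookup), read: None }
    }

    /// Only types implementing `RegOps` are accepted; `lookup` stays empty.
    pub fn into_reg<O: RegOps>(self) -> Inode {
        Inode { ino: self.ino, kind: Kind::Reg, lookup: None, read: Some(O::read) }
    }
}

pub struct MyDir;
impl DirOps for MyDir {
    fn lookup(name: &str) -> Option<u32> {
        if name == "lost+found" { Some(11) } else { None }
    }
}

pub struct MyFile;
impl RegOps for MyFile {
    fn read(_offset: u64, buf: &mut [u8]) -> usize {
        buf.fill(0);
        buf.len()
    }
}
```

With this shape, `NewInode::new(12).into_dir::<MyFile>()` is rejected at
compile time because `MyFile` does not implement `DirOps`.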

> +        for (i, g) in groups.iter().enumerate() {
> +            let first = i as u32 * s.blocks_per_group.value() + s.first_data_block.value();
> +            let last = if i == groups.len() - 1 {
> +                s.blocks_count.value()
> +            } else {
> +                first + s.blocks_per_group.value() - 1
> +            };
> +
> +            if g.block_bitmap.value() < first || g.block_bitmap.value() > last {
> +                pr_err!(
> +                    "Block bitmap for group {i} not in group (block {})\n",
> +                    g.block_bitmap.value()
> +                );
> +                return Err(EINVAL);
> +            }
> +
> +            if g.inode_bitmap.value() < first || g.inode_bitmap.value() > last {
> +                pr_err!(
> +                    "Inode bitmap for group {i} not in group (block {})\n",
> +                    g.inode_bitmap.value()
> +                );
> +                return Err(EINVAL);
> +            }
> +
> +            if g.inode_table.value() < first || g.inode_table.value() > last {
> +                pr_err!(
> +                    "Inode table for group {i} not in group (block {})\n",
> +                    g.inode_table.value()
> +                );
> +                return Err(EINVAL);
> +            }
> +        }
> +        Ok(())
> +    }
> +}
> +
> +impl fs::FileSystem for Ext2Fs {
> +    type Data = Box<Self>;
> +    type INodeData = INodeData;
> +    const NAME: &'static CStr = c_str!("rust-ext2");
> +    const SUPER_TYPE: sb::Type = sb::Type::BlockDev;
> +
> +    fn fill_super(
> +        sb: &mut SuperBlock<Self, sb::New>,
> +        mapper: Option<inode::Mapper>,
> +    ) -> Result<Self::Data> {
> +        let Some(mapper) = mapper else {
> +            return Err(EINVAL);
> +        };
> +
> +        if sb.min_blocksize(PAGE_SIZE as i32) == 0 {
> +            pr_err!("Unable to set block size\n");
> +            return Err(EINVAL);
> +        }
> +
> +        // Map the super block and check the magic number.
> +        let mapped = mapper.mapped_folio(SB_OFFSET)?;
> +        let s = Super::from_bytes(&mapped, 0).ok_or(EIO)?;
> +
> +        if s.magic.value() != EXT2_SUPER_MAGIC {
> +            return Err(EINVAL);
> +        }
> +
> +        // Check for unsupported flags.
> +        let mut has_file_type = false;
> +        if s.rev_level.value() >= EXT2_DYNAMIC_REV {
> +            let features = s.feature_incompat.value();
> +            if features & !EXT2_FEATURE_INCOMPAT_FILETYPE != 0 {
> +                pr_err!("Unsupported incompatible feature: {:x}\n", features);
> +                return Err(EINVAL);
> +            }
> +
> +            has_file_type = features & EXT2_FEATURE_INCOMPAT_FILETYPE != 0;
> +
> +            let features = s.feature_ro_compat.value();
> +            if !sb.rdonly() && features != 0 {
> +                pr_err!("Unsupported rw incompatible feature: {:x}\n", features);
> +                return Err(EINVAL);
> +            }
> +        }
> +
> +        // Set the block size.
> +        let block_size_bits = s.log_block_size.value();
> +        if block_size_bits > EXT2_MAX_BLOCK_LOG_SIZE - 10 {
> +            pr_err!("Invalid log block size: {}\n", block_size_bits);
> +            return Err(EINVAL);
> +        }
> +
> +        let block_size = 1024u32 << block_size_bits;
> +        if sb.min_blocksize(block_size as i32) != block_size as i32 {
> +            pr_err!("Bad block size: {}\n", block_size);
> +            return Err(ENXIO);
> +        }
> +
> +        // Get the first inode and the inode size.
> +        let (inode_size, first_ino) = if s.rev_level.value() == EXT2_GOOD_OLD_REV {
> +            (EXT2_GOOD_OLD_INODE_SIZE, EXT2_GOOD_OLD_FIRST_INO)
> +        } else {
> +            let size = s.inode_size.value();
> +            if size < EXT2_GOOD_OLD_INODE_SIZE
> +                || !size.is_power_of_two()
> +                || u32::from(size) > block_size
> +            {
> +                pr_err!("Unsupported inode size: {}\n", size);
> +                return Err(EINVAL);
> +            }
> +            (size, s.first_ino.value())
> +        };
> +
> +        // Get the number of inodes per group and per block.
> +        let inode_count = s.inodes_count.value();
> +        let inodes_per_group = s.inodes_per_group.value();
> +        let inodes_per_block = block_size / u32::from(inode_size);
> +        if inodes_per_group == 0 || inodes_per_block == 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        if inodes_per_group > block_size * 8 || inodes_per_group < inodes_per_block {
> +            pr_err!("Bad inodes per group: {}\n", inodes_per_group);
> +            return Err(EINVAL);
> +        }
> +
> +        // Check the size of the groups.
> +        let itb_per_group = inodes_per_group / inodes_per_block;
> +        let blocks_per_group = s.blocks_per_group.value();
> +        if blocks_per_group > block_size * 8 || blocks_per_group <= itb_per_group + 3 {
> +            pr_err!("Bad blocks per group: {}\n", blocks_per_group);
> +            return Err(EINVAL);
> +        }
> +
> +        let blocks_count = s.blocks_count.value();
> +        if block::Sector::from(blocks_count) > sb.sector_count() >> (1 + block_size_bits) {
> +            pr_err!(
> +                "Block count ({blocks_count}) exceeds size of device ({})\n",
> +                sb.sector_count() >> (1 + block_size_bits)
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        let group_count = (blocks_count - s.first_data_block.value() - 1) / blocks_per_group + 1;
> +        if group_count * inodes_per_group != inode_count {
> +            pr_err!(
> +                "Unexpected inode count: {inode_count} vs {}\n",
> +                group_count * inodes_per_group
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        let mut groups = Vec::new();
> +        groups.reserve(group_count as usize, GFP_NOFS)?;

Why not GFP_KERNEL here?

Oh wow the C ext2 driver pins a bunch of buffer heads doesn't it...
/me runs

> +
> +        let mut remain = group_count;
> +        let mut offset = (SB_OFFSET / Offset::from(block_size) + 1) * Offset::from(block_size);
> +        while remain > 0 {
> +            let b = mapper.mapped_folio(offset)?;
> +            for g in Group::from_bytes_to_slice(&b).ok_or(EIO)? {
> +                groups.push(*g, GFP_NOFS)?;
> +                remain -= 1;
> +                if remain == 0 {
> +                    break;
> +                }
> +            }
> +            offset += Offset::try_from(b.len())?;
> +        }
> +
> +        Self::check_descriptors(s, &groups)?;
> +
> +        sb.set_magic(s.magic.value().into());
> +        drop(mapped);
> +        Ok(Box::new(
> +            Ext2Fs {
> +                mapper,
> +                block_size,
> +                _block_size_bits: block_size_bits,
> +                has_file_type,
> +                inodes_per_group,
> +                inodes_per_block,
> +                inode_count,
> +                inode_size,
> +                first_ino,
> +                group: groups,
> +            },
> +            GFP_KERNEL,
> +        )?)
> +    }
> +
> +    fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>> {
> +        let inode = Self::iget(sb, EXT2_ROOT_INO)?;
> +        dentry::Root::try_new(inode)
> +    }
> +}
> +
> +fn rec_len(d: &DirEntry) -> u32 {
> +    let len = d.rec_len.value();
> +
> +    if PAGE_SIZE >= 65536 && len == u16::MAX {
> +        1u32 << 16
> +    } else {
> +        len.into()
> +    }
> +}
> +
> +#[vtable]
> +impl file::Operations for Ext2Fs {
> +    type FileSystem = Self;
> +
> +    fn seek(file: &File<Self>, offset: Offset, whence: file::Whence) -> Result<Offset> {
> +        file::generic_seek(file, offset, whence)
> +    }
> +
> +    fn read(_: &File<Self>, _: &mut user::Writer, _: &mut Offset) -> Result<usize> {
> +        Err(EISDIR)
> +    }
> +
> +    fn read_dir(

Wait, does this imply that regular files also have read_dir that you can
call?

> +        _file: &File<Self>,
> +        inode: &Locked<&INode<Self>, inode::ReadSem>,
> +        emitter: &mut file::DirEmitter,
> +    ) -> Result {
> +        let has_file_type = inode.super_block().data().has_file_type;
> +
> +        inode.for_each_page(emitter.pos(), Offset::MAX, |data| {

Neat that Rust can turn an indirect call to a lambda function into a
direct call.  I hear C can do that now too?
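
For context, a minimal userspace sketch of why this can happen, assuming
for_each_page takes its callback as a generic parameter (its signature isn't
shown in this excerpt). for_each_chunk and first_chunk_with are hypothetical
names, not part of the series. A generic `impl FnMut` parameter is
monomorphized per closure type, so the call is statically dispatched and the
compiler is free to inline it; a `&mut dyn FnMut` parameter would go through a
vtable instead.

```rust
/// Hypothetical stand-in for `for_each_page`: calls `visit` once per chunk and
/// stops early when the closure returns `Some`.
/// `chunk_size` must be non-zero (`slice::chunks` panics on zero).
fn for_each_chunk<T>(
    data: &[u8],
    chunk_size: usize,
    // Generic parameter: one copy of this function is generated per closure
    // type, so `visit(chunk)` below is a statically dispatched call.
    mut visit: impl FnMut(&[u8]) -> Option<T>,
) -> Option<T> {
    for chunk in data.chunks(chunk_size) {
        if let Some(found) = visit(chunk) {
            return Some(found);
        }
    }
    None
}

/// Returns the index of the first chunk that contains `needle`.
fn first_chunk_with(data: &[u8], chunk_size: usize, needle: u8) -> Option<usize> {
    let mut index = 0usize;
    for_each_chunk(data, chunk_size, |chunk| {
        let hit = chunk.contains(&needle).then_some(index);
        index += 1;
        hit
    })
}

fn main() {
    let data = [1u8, 2, 3, 4, 5, 6];
    println!("{:?}", first_chunk_with(&data, 2, 5));
}
```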

> +            let mut offset = 0usize;
> +            let mut acc: Offset = 0;
> +            let limit = data.len().saturating_sub(size_of::<DirEntry>());
> +            while offset < limit {
> +                let dirent = DirEntry::from_bytes(data, offset).ok_or(EIO)?;
> +                offset += size_of::<DirEntry>();
> +
> +                let name_len = usize::from(dirent.name_len);
> +                if data.len() - offset < name_len {
> +                    return Err(EIO);
> +                }
> +
> +                let name = &data[offset..][..name_len];
> +                let rec_len = rec_len(dirent);
> +                offset = offset - size_of::<DirEntry>() + rec_len as usize;
> +                if rec_len == 0 || offset > data.len() {
> +                    return Err(EIO);
> +                }
> +
> +                acc += Offset::from(rec_len);
> +                let ino = dirent.inode.value();
> +                if ino == 0 {
> +                    continue;
> +                }
> +
> +                let t = if !has_file_type {
> +                    file::DirEntryType::Unknown
> +                } else {
> +                    match dirent.file_type {
> +                        FT_REG_FILE => file::DirEntryType::Reg,
> +                        FT_DIR => file::DirEntryType::Dir,
> +                        FT_SYMLINK => file::DirEntryType::Lnk,
> +                        FT_CHRDEV => file::DirEntryType::Chr,
> +                        FT_BLKDEV => file::DirEntryType::Blk,
> +                        FT_FIFO => file::DirEntryType::Fifo,
> +                        FT_SOCK => file::DirEntryType::Sock,
> +                        _ => continue,

Isn't this a directory corruption?  return Err(EFSCORRUPTED) ?
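
Roughly what I have in mind, as a standalone userspace sketch rather than a
drop-in change. DirError::Corrupted stands in for whatever error code the Rust
side would use (I haven't checked whether EFSCORRUPTED is exported), and
dirent_type is a hypothetical helper. The FT_* values are the usual ext2
on-disk codes, assumed here because defs.rs isn't quoted in this mail.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum DirEntryType {
    Unknown,
    Reg,
    Dir,
    Chr,
    Blk,
    Fifo,
    Sock,
    Lnk,
}

/// Hypothetical error type standing in for a kernel errno.
#[derive(Debug, PartialEq, Eq)]
enum DirError {
    Corrupted,
}

// Assumed ext2 on-disk file type codes.
const FT_REG_FILE: u8 = 1;
const FT_DIR: u8 = 2;
const FT_CHRDEV: u8 = 3;
const FT_BLKDEV: u8 = 4;
const FT_FIFO: u8 = 5;
const FT_SOCK: u8 = 6;
const FT_SYMLINK: u8 = 7;

/// Maps the on-disk type byte to a directory entry type, reporting an
/// unrecognised value as corruption instead of silently skipping the entry.
fn dirent_type(has_file_type: bool, raw: u8) -> Result<DirEntryType, DirError> {
    if !has_file_type {
        // Without the filetype feature the byte carries no type information.
        return Ok(DirEntryType::Unknown);
    }
    Ok(match raw {
        FT_REG_FILE => DirEntryType::Reg,
        FT_DIR => DirEntryType::Dir,
        FT_SYMLINK => DirEntryType::Lnk,
        FT_CHRDEV => DirEntryType::Chr,
        FT_BLKDEV => DirEntryType::Blk,
        FT_FIFO => DirEntryType::Fifo,
        FT_SOCK => DirEntryType::Sock,
        _ => return Err(DirError::Corrupted),
    })
}

fn main() {
    println!("{:?}", dirent_type(true, FT_DIR));
    println!("{:?}", dirent_type(true, 200));
}
```

Whether a type byte of 0 should be an error or map to Unknown is a separate
question that this sketch does not try to answer.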

> +                    }
> +                };
> +
> +                if !emitter.emit(acc, name, ino.into(), t) {
> +                    return Ok(Some(()));
> +                }
> +                acc = 0;
> +            }
> +            Ok(None)
> +        })?;
> +        Ok(())
> +    }
> +}
> +
> +#[vtable]
> +impl inode::Operations for Ext2Fs {
> +    type FileSystem = Self;
> +
> +    fn lookup(
> +        parent: &Locked<&INode<Self>, inode::ReadSem>,
> +        dentry: dentry::Unhashed<'_, Self>,
> +    ) -> Result<Option<ARef<DEntry<Self>>>> {
> +        let inode = parent.for_each_page(0, Offset::MAX, |data| {
> +            let mut offset = 0usize;
> +            while data.len() - offset > size_of::<DirEntry>() {
> +                let dirent = DirEntry::from_bytes(data, offset).ok_or(EIO)?;
> +                offset += size_of::<DirEntry>();
> +
> +                let name_len = usize::from(dirent.name_len);
> +                if data.len() - offset < name_len {
> +                    return Err(EIO);
> +                }
> +
> +                let name = &data[offset..][..name_len];
> +
> +                offset = offset - size_of::<DirEntry>() + usize::from(dirent.rec_len.value());
> +                if offset > data.len() {
> +                    return Err(EIO);
> +                }
> +
> +                let ino = dirent.inode.value();
> +                if ino != 0 && name == dentry.name() {
> +                    return Ok(Some(Self::iget(parent.super_block(), ino)?));
> +                }
> +            }
> +            Ok(None)
> +        })?;
> +
> +        dentry.splice_alias(inode)
> +    }
> +}
> +
> +impl iomap::Operations for Ext2Fs {
> +    type FileSystem = Self;
> +
> +    fn begin<'a>(
> +        inode: &'a INode<Self>,
> +        pos: Offset,
> +        length: Offset,
> +        _flags: u32,
> +        map: &mut iomap::Map<'a>,
> +        _srcmap: &mut iomap::Map<'a>,
> +    ) -> Result {
> +        let size = inode.size();
> +        if pos >= size {
> +            map.set_offset(pos)
> +                .set_length(length.try_into()?)
> +                .set_flags(iomap::map_flags::MERGED)
> +                .set_type(iomap::Type::Hole);
> +            return Ok(());
> +        }
> +
> +        let block_size = inode.super_block().data().block_size as Offset;
> +        let block = pos / block_size;
> +
> +        let boffset = Self::offset_to_block(inode, block)?;
> +        map.set_offset(block * block_size)
> +            .set_length(block_size as u64)
> +            .set_flags(iomap::map_flags::MERGED)
> +            .set_type(iomap::Type::Mapped)
> +            .set_bdev(Some(inode.super_block().bdev()))
> +            .set_addr(boffset * block_size as u64);

Neat use of chaining here.
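
For anyone unfamiliar with the pattern, a small userspace sketch of how such
chaining is typically built: each setter takes `&mut self` and returns
`&mut Self`. The Map type and fill_block below are hypothetical and much
simpler than iomap::Map; I'm only assuming the real setters have this general
shape.

```rust
/// Hypothetical, simplified mapping record (not the real `iomap::Map`).
#[derive(Debug, Default, PartialEq, Eq)]
struct Map {
    offset: u64,
    length: u64,
    addr: u64,
    mapped: bool,
}

impl Map {
    // Each setter hands back `&mut Self` so calls can be chained.
    fn set_offset(&mut self, v: u64) -> &mut Self {
        self.offset = v;
        self
    }

    fn set_length(&mut self, v: u64) -> &mut Self {
        self.length = v;
        self
    }

    fn set_addr(&mut self, v: u64) -> &mut Self {
        self.addr = v;
        self
    }

    fn set_mapped(&mut self, v: bool) -> &mut Self {
        self.mapped = v;
        self
    }
}

/// Describes one file block that lives at `disk_block` on the device.
fn fill_block(map: &mut Map, block: u64, disk_block: u64, block_size: u64) {
    map.set_offset(block * block_size)
        .set_length(block_size)
        .set_mapped(true)
        .set_addr(disk_block * block_size);
}

fn main() {
    let mut map = Map::default();
    fill_block(&mut map, 3, 10, 1024);
    println!("{:?}", map);
}
```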

> +
> +        Ok(())
> +    }
> +}
> +
> +fn decode_dev(block: &[LE<u32>]) -> (u32, u32) {
> +    let v = block[0].value();
> +    if v != 0 {
> +        ((v >> 8) & 255, v & 255)
> +    } else {
> +        let v = block[1].value();
> +        ((v & 0xfff00) >> 8, (v & 0xff) | ((v >> 12) & 0xfff00))

Nice not to have leXX_to_cpu calls everywhere here.
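
A minimal sketch of the idea, assuming LE<T> is a thin wrapper whose value()
does the byte swap. Le32, from_disk_bytes and decode_old_dev are hypothetical
names for this userspace illustration, not the types in the series.

```rust
/// Hypothetical little-endian wrapper: holds the value exactly as it sits on
/// disk and only converts to CPU order when `value()` is called.
#[derive(Clone, Copy)]
#[repr(transparent)]
struct Le32(u32);

impl Le32 {
    /// Reinterprets four on-disk bytes without changing their order.
    fn from_disk_bytes(bytes: [u8; 4]) -> Self {
        Le32(u32::from_ne_bytes(bytes))
    }

    /// Converts to native byte order (a no-op on little-endian CPUs).
    fn value(self) -> u32 {
        u32::from_le(self.0)
    }
}

/// Decodes an old-style device number into (major, minor), mirroring the
/// first branch of `decode_dev` in the quoted patch.
fn decode_old_dev(raw: Le32) -> (u32, u32) {
    let v = raw.value();
    ((v >> 8) & 255, v & 255)
}

fn main() {
    // On-disk bytes 0x01 0x08 0x00 0x00 encode major 8, minor 1.
    let raw = Le32::from_disk_bytes([0x01, 0x08, 0x00, 0x00]);
    println!("{:?}", decode_old_dev(raw));
}
```

Because the raw field can't be read directly, forgetting the conversion
becomes a type error instead of a silent bug on big-endian machines.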

--D

> +    }
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 445599d4bff6..732bc9939f7f 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -165,3 +165,6 @@ macro_rules! container_of {
>          ptr.wrapping_sub(offset) as *const $type
>      }}
>  }
> +
> +/// The size in bytes of a page of memory.
> +pub const PAGE_SIZE: usize = bindings::PAGE_SIZE;
> -- 
> 2.34.1
> 
> 


* Re: [RFC PATCH v2 00/30] Rust abstractions for VFS
  2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
                   ` (29 preceding siblings ...)
  2024-05-14 13:17 ` [RFC PATCH v2 30/30] WIP: fs: ext2: add rust ro ext2 implementation Wedson Almeida Filho
@ 2024-05-31 14:34 ` Danilo Krummrich
  30 siblings, 0 replies; 35+ messages in thread
From: Danilo Krummrich @ 2024-05-31 14:34 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Dave Chinner, Alexander Viro, Matthew Wilcox, Kent Overstreet,
	Christian Brauner, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, linux-kernel

Hi Wedson,

On 5/14/24 15:16, Wedson Almeida Filho wrote:
> This series introduces Rust abstractions that allow read-only file systems to
> be written in Rust.
> 
> There are three file system implementations using these abstractions:
> ext2, tarfs, and puzzlefs. The first two are part of this series.
> 
> Rust file system modules can be declared with the `module_fs` macro and are
> required to implement the following functions (which are part of the
> `FileSystem` trait):
> 
>      fn fill_super(
>          sb: &mut SuperBlock<Self, sb::New>,
>          mapper: Option<inode::Mapper>,
>      ) -> Result<Self::Data>;
> 
>      fn init_root(sb: &SuperBlock<Self>) -> Result<dentry::Root<Self>>;
> 
> They can optionally implement the following:
> 
>      fn read_xattr(
>          _dentry: &DEntry<Self>,
>          _inode: &INode<Self>,
>          _name: &CStr,
>          _outbuf: &mut [u8],
>      ) -> Result<usize>;
> 
>      fn statfs(_dentry: &DEntry<Self>) -> Result<Stat>;
> 
> They may also choose the type of the data they can attach to superblocks and/or
> inodes.
> 
> Lastly, file systems can implement inode, file, and address space operations
> and attach them to inodes when they're created, similar to how C does it. They
> can get a ro address space operations table from an implementation of iomap
> operations, to be used with generic ro file operations.
> 
> A git tree is available here:
>      git://github.com/wedsonaf/linux.git vfs-v2
> 
> Web:
>      https://github.com/wedsonaf/linux/commits/vfs-v2

This branch indicates that this patch series might have a few more dependencies
that are not upstream yet, e.g. [1].

Do you intend to send them in a separate series (soon)? In case they were already
submitted somewhere and I just failed to find them, please be so kind and provide
me with a pointer.

[1] https://github.com/wedsonaf/linux/commit/96ef0376887f4194ebad608f9943eb41108cf255

- Danilo

> 
> ---
> 
> Changes in v2:
> 
> - Rebased to latest rust-next tree
> - Removed buffer heads
> - Added iomap support
> - Removed `_pin` field from `Registration` as it's not needed anymore
> - Renamed sample filesystem to match the module's name
> - Using typestate instead of a separate type for superblock/new-superblock
> - Created separate submodules for superblocks, inodes, dentries, and files
> - Split out operations from FileSystem to inode/file/address_space ops, similar to how C does it
> - Removed usages of folio_set_error
> - Removed UniqueFolio, for now reading blocks from devices via the pagecache
> - Changed map() to return the entire folio if not in highmem
> - Added support for unlocking the folio asynchronously
> - Added `from_raw` to all new ref-counted types
> - Added explicit types in calls to cast()
> - Added typestate to folio
> - Added support for implementing get_link
> - Fixed data race when reading inode->i_state
> - Added nofs scope support during allocation
> - Link to v1: https://lore.kernel.org/rust-for-linux/20231018122518.128049-1-wedsonaf@gmail.com/
> 
> ---
> 
> Wedson Almeida Filho (30):
>    rust: fs: add registration/unregistration of file systems
>    rust: fs: introduce the `module_fs` macro
>    samples: rust: add initial ro file system sample
>    rust: fs: introduce `FileSystem::fill_super`
>    rust: fs: introduce `INode<T>`
>    rust: fs: introduce `DEntry<T>`
>    rust: fs: introduce `FileSystem::init_root`
>    rust: file: move `kernel::file` to `kernel::fs::file`
>    rust: fs: generalise `File` for different file systems
>    rust: fs: add empty file operations
>    rust: fs: introduce `file::Operations::read_dir`
>    rust: fs: introduce `file::Operations::seek`
>    rust: fs: introduce `file::Operations::read`
>    rust: fs: add empty inode operations
>    rust: fs: introduce `inode::Operations::lookup`
>    rust: folio: introduce basic support for folios
>    rust: fs: add empty address space operations
>    rust: fs: introduce `address_space::Operations::read_folio`
>    rust: fs: introduce `FileSystem::read_xattr`
>    rust: fs: introduce `FileSystem::statfs`
>    rust: fs: introduce more inode types
>    rust: fs: add per-superblock data
>    rust: fs: allow file systems backed by a block device
>    rust: fs: allow per-inode data
>    rust: fs: export file type from mode constants
>    rust: fs: allow populating i_lnk
>    rust: fs: add `iomap` module
>    rust: fs: add memalloc_nofs support
>    tarfs: introduce tar fs
>    WIP: fs: ext2: add rust ro ext2 implementation
> 
>   fs/Kconfig                        |   2 +
>   fs/Makefile                       |   2 +
>   fs/rust-ext2/Kconfig              |  13 +
>   fs/rust-ext2/Makefile             |   8 +
>   fs/rust-ext2/defs.rs              | 173 +++++++
>   fs/rust-ext2/ext2.rs              | 551 +++++++++++++++++++++
>   fs/tarfs/Kconfig                  |  15 +
>   fs/tarfs/Makefile                 |   8 +
>   fs/tarfs/defs.rs                  |  80 +++
>   fs/tarfs/tar.rs                   | 394 +++++++++++++++
>   rust/bindings/bindings_helper.h   |  11 +
>   rust/helpers.c                    | 182 +++++++
>   rust/kernel/block.rs              |  10 +-
>   rust/kernel/error.rs              |   8 +-
>   rust/kernel/file.rs               | 251 ----------
>   rust/kernel/folio.rs              | 305 ++++++++++++
>   rust/kernel/fs.rs                 | 492 +++++++++++++++++++
>   rust/kernel/fs/address_space.rs   |  90 ++++
>   rust/kernel/fs/dentry.rs          | 136 ++++++
>   rust/kernel/fs/file.rs            | 607 +++++++++++++++++++++++
>   rust/kernel/fs/inode.rs           | 780 ++++++++++++++++++++++++++++++
>   rust/kernel/fs/iomap.rs           | 281 +++++++++++
>   rust/kernel/fs/sb.rs              | 194 ++++++++
>   rust/kernel/lib.rs                |   6 +-
>   rust/kernel/mem_cache.rs          |   2 -
>   rust/kernel/user.rs               |   1 -
>   samples/rust/Kconfig              |  10 +
>   samples/rust/Makefile             |   1 +
>   samples/rust/rust_rofs.rs         | 202 ++++++++
>   scripts/generate_rust_analyzer.py |   2 +-
>   30 files changed, 4555 insertions(+), 262 deletions(-)
>   create mode 100644 fs/rust-ext2/Kconfig
>   create mode 100644 fs/rust-ext2/Makefile
>   create mode 100644 fs/rust-ext2/defs.rs
>   create mode 100644 fs/rust-ext2/ext2.rs
>   create mode 100644 fs/tarfs/Kconfig
>   create mode 100644 fs/tarfs/Makefile
>   create mode 100644 fs/tarfs/defs.rs
>   create mode 100644 fs/tarfs/tar.rs
>   delete mode 100644 rust/kernel/file.rs
>   create mode 100644 rust/kernel/folio.rs
>   create mode 100644 rust/kernel/fs.rs
>   create mode 100644 rust/kernel/fs/address_space.rs
>   create mode 100644 rust/kernel/fs/dentry.rs
>   create mode 100644 rust/kernel/fs/file.rs
>   create mode 100644 rust/kernel/fs/inode.rs
>   create mode 100644 rust/kernel/fs/iomap.rs
>   create mode 100644 rust/kernel/fs/sb.rs
>   create mode 100644 samples/rust/rust_rofs.rs
> 
> 
> base-commit: 183ea65d1fcd71039cf4d111a22d69c337bfd344


end of thread, other threads:[~2024-05-31 14:40 UTC | newest]

Thread overview: 35+ messages
2024-05-14 13:16 [RFC PATCH v2 00/30] Rust abstractions for VFS Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 01/30] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 02/30] rust: fs: introduce the `module_fs` macro Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 03/30] samples: rust: add initial ro file system sample Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 04/30] rust: fs: introduce `FileSystem::fill_super` Wedson Almeida Filho
2024-05-20 19:38   ` Darrick J. Wong
2024-05-14 13:16 ` [RFC PATCH v2 05/30] rust: fs: introduce `INode<T>` Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 06/30] rust: fs: introduce `DEntry<T>` Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 07/30] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 08/30] rust: file: move `kernel::file` to `kernel::fs::file` Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 09/30] rust: fs: generalise `File` for different file systems Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 10/30] rust: fs: add empty file operations Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 11/30] rust: fs: introduce `file::Operations::read_dir` Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 12/30] rust: fs: introduce `file::Operations::seek` Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 13/30] rust: fs: introduce `file::Operations::read` Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 14/30] rust: fs: add empty inode operations Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 15/30] rust: fs: introduce `inode::Operations::lookup` Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 16/30] rust: folio: introduce basic support for folios Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 17/30] rust: fs: add empty address space operations Wedson Almeida Filho
2024-05-14 13:16 ` [RFC PATCH v2 18/30] rust: fs: introduce `address_space::Operations::read_folio` Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 19/30] rust: fs: introduce `FileSystem::read_xattr` Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 20/30] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 21/30] rust: fs: introduce more inode types Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 22/30] rust: fs: add per-superblock data Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 23/30] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 24/30] rust: fs: allow per-inode data Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 25/30] rust: fs: export file type from mode constants Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 26/30] rust: fs: allow populating i_lnk Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 27/30] rust: fs: add `iomap` module Wedson Almeida Filho
2024-05-20 19:32   ` Darrick J. Wong
2024-05-14 13:17 ` [RFC PATCH v2 28/30] rust: fs: add memalloc_nofs support Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 29/30] tarfs: introduce tar fs Wedson Almeida Filho
2024-05-14 13:17 ` [RFC PATCH v2 30/30] WIP: fs: ext2: add rust ro ext2 implementation Wedson Almeida Filho
2024-05-20 20:01   ` Darrick J. Wong
2024-05-31 14:34 ` [RFC PATCH v2 00/30] Rust abstractions for VFS Danilo Krummrich
