From: John Groves <John@Groves.net>
To: John Groves <jgroves@micron.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	nvdimm@lists.linux.dev
Cc: John@Groves.net, john@jagalactic.com,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>,
	dave.hansen@linux.intel.com, gregory.price@memverge.com,
	John Groves <john@groves.net>
Subject: [RFC PATCH 13/20] famfs: Add iomap_ops
Date: Fri, 23 Feb 2024 11:41:57 -0600
Message-ID: <2996a7e757c3762a9a28c789645acd289f5f7bc0.1708709155.git.john@groves.net>
In-Reply-To: <cover.1708709155.git.john@groves.net>

This commit introduces the famfs iomap_ops. When either
dax_iomap_fault() or dax_iomap_rw() is called, dax calls back into
our iomap_begin() handler. The question being asked is "please
resolve (file, offset) to (daxdev, offset)". The function
famfs_meta_to_dax_offset() does this.

The per-file metadata is just a list of extents on the backing dax
device. Resolution walks that list, so it is O(N) for N extents.
Note that with the current user space, files usually have only one
extent.
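
For illustration only, here is a minimal user-space sketch of that
extent walk. It is not part of the patch: sketch_extent, sketch_mapping
and sketch_resolve() are hypothetical stand-ins for the famfs metadata,
struct iomap and famfs_meta_to_dax_offset(), and it assumes extents are
listed in file order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; these are not the famfs kernel types. */
struct sketch_extent {
	int64_t offset;	/* start of the extent within the dax device */
	int64_t len;	/* length of the extent in bytes */
};

struct sketch_mapping {
	int64_t addr;	/* resolved offset within the dax device */
	int64_t length;	/* contiguous bytes mapped at addr; 0 if past EOF */
};

/*
 * Resolve (file offset, len) against an ordered extent list.
 * The list is walked once, so the cost is O(N) for N extents.
 */
static struct sketch_mapping
sketch_resolve(const struct sketch_extent *ext, size_t n,
	       int64_t offset, int64_t len)
{
	/* Default result: zero length, meaning the offset is past EOF */
	struct sketch_mapping map = { .addr = offset, .length = 0 };
	int64_t local_offset = offset;
	size_t i;

	assert(ext != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* local_offset is the offset minus the extents skipped so far */
		if (local_offset < ext[i].len) {
			int64_t remainder = ext[i].len - local_offset;

			map.addr = ext[i].offset + local_offset;
			/* Trim the length at the end of this extent */
			map.length = len < remainder ? len : remainder;
			return map;
		}
		local_offset -= ext[i].len;
	}
	return map;
}
```

With two extents, (0x200000, 0x1000) and (0x800000, 0x2000), file
offset 0x1800 falls 0x800 bytes into the second extent and resolves to
dax offset 0x800800. A request that crosses an extent boundary is
trimmed, so the caller is expected to ask again for the rest.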

Signed-off-by: John Groves <john@groves.net>
---
 fs/famfs/famfs_file.c | 245 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 245 insertions(+)
 create mode 100644 fs/famfs/famfs_file.c

diff --git a/fs/famfs/famfs_file.c b/fs/famfs/famfs_file.c
new file mode 100644
index 000000000000..fc667d5f7be8
--- /dev/null
+++ b/fs/famfs/famfs_file.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * famfs - dax file system for shared fabric-attached memory
+ *
+ * Copyright 2023-2024 Micron Technology, Inc.
+ *
+ * This file system, originally based on ramfs and the dax support from xfs,
+ * is intended to allow multiple host systems to mount a common file system
+ * view of dax files that map to shared memory.
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/dax.h>
+#include <linux/uio.h>
+#include <linux/iomap.h>
+#include <uapi/linux/famfs_ioctl.h>
+#include "famfs_internal.h"
+
+/*********************************************************************
+ * iomap_operations
+ *
+ * This stuff uses the iomap (dax-related) helpers to resolve file offsets to
+ * offsets within a dax device.
+ */
+
+/**
+ * famfs_meta_to_dax_offset()
+ *
+ * This function is called by famfs_iomap_begin() to resolve an offset in a file to
+ * an offset in a dax device. Dax makes this upcall from both dax_iomap_fault() and
+ * dax_iomap_rw(). Dax then finishes the job by resolving the fault to a specific
+ * physical page (the fault case) or by doing a memcpy variant (the rw case).
+ *
+ * Pages can be PTE (4KiB), PMD (2MiB) or (theoretically) PUD (1GiB)
+ * (these sizes are for x86; they may vary on other CPU architectures).
+ *
+ * @inode  - the file where the fault occurred
+ * @iomap  - struct iomap to be filled in to indicate where to find the right memory, relative
+ *           to a dax device.
+ * @offset - the offset within the file where the fault occurred (will be on a page boundary)
+ * @len    - the length of the faulted mapping (will be a page multiple)
+ *           (will be trimmed in *iomap if it extends past the end of the matching extent)
+ * @flags  - the flags passed to famfs_iomap_begin() by its caller
+ */
+static int
+famfs_meta_to_dax_offset(
+	struct inode *inode,
+	struct iomap *iomap,
+	loff_t        offset,
+	loff_t        len,
+	unsigned int  flags)
+{
+	struct famfs_file_meta *meta = (struct famfs_file_meta *)inode->i_private;
+	int i;
+	loff_t local_offset = offset;
+	struct famfs_fs_info  *fsi = inode->i_sb->s_fs_info;
+
+	iomap->offset = offset; /* file offset */
+
+	for (i = 0; i < meta->tfs_extent_ct; i++) {
+		loff_t dax_ext_offset = meta->tfs_extents[i].offset;
+		loff_t dax_ext_len    = meta->tfs_extents[i].len;
+
+		if ((dax_ext_offset == 0) && (meta->file_type != FAMFS_SUPERBLOCK))
+			pr_err("%s: zero offset on non-superblock file!!\n", __func__);
+
+		/* local_offset is the offset minus the size of extents skipped so far;
+		 * If local_offset < dax_ext_len, the data of interest starts in this extent
+		 */
+		if (local_offset < dax_ext_len) {
+			loff_t ext_len_remainder = dax_ext_len - local_offset;
+
+			/*
+			 * OK, we found the file metadata extent where this data begins
+			 * @local_offset      - The offset within the current extent
+			 * @ext_len_remainder - Remaining length of ext after skipping local_offset
+			 *
+			 * iomap->addr is the offset within the dax device where that data
+			 * starts
+			 */
+			iomap->addr    = dax_ext_offset + local_offset; /* dax dev offset */
+			iomap->offset  = offset; /* file offset */
+			iomap->length  = min_t(loff_t, len, ext_len_remainder);
+			iomap->dax_dev = fsi->dax_devp;
+			iomap->type    = IOMAP_MAPPED;
+			iomap->flags   = flags;
+
+			return 0;
+		}
+		local_offset -= dax_ext_len; /* Get ready for the next extent */
+	}
+
+	/* The offset is beyond the last extent: set iomap to zero length and
+	 * return 0. This just means that the r/w is past EOF.
+	 */
+	iomap->addr    = offset;
+	iomap->offset  = offset; /* file offset */
+	iomap->length  = 0; /* this had better result in no access to dax mem */
+	iomap->dax_dev = fsi->dax_devp;
+	iomap->type    = IOMAP_MAPPED;
+	iomap->flags   = flags;
+
+	return 0;
+}
+
+/**
+ * famfs_iomap_begin()
+ *
+ * This function is pretty simple because famfs files
+ * * are never partially allocated
+ * * never have holes (are never sparse)
+ * * are never "allocate on write"
+ */
+static int
+famfs_iomap_begin(
+	struct inode	       *inode,
+	loff_t			offset,
+	loff_t			length,
+	unsigned int		flags,
+	struct iomap	       *iomap,
+	struct iomap	       *srcmap)
+{
+	struct famfs_file_meta *meta = inode->i_private;
+	size_t size;
+	int rc;
+
+	size = i_size_read(inode);
+
+	WARN_ON(size != meta->file_size);
+
+	rc = famfs_meta_to_dax_offset(inode, iomap, offset, length, flags);
+
+	return rc;
+}
+
+/* Note: We never need a special set of write_iomap_ops because famfs never
+ * performs allocation on write.
+ */
+const struct iomap_ops famfs_iomap_ops = {
+	.iomap_begin		= famfs_iomap_begin,
+};
+
+/*********************************************************************
+ * vm_operations
+ */
+static vm_fault_t
+__famfs_filemap_fault(
+	struct vm_fault		*vmf,
+	unsigned int		pe_size,
+	bool			write_fault)
+{
+	struct inode		*inode = file_inode(vmf->vma->vm_file);
+	vm_fault_t		ret;
+
+	if (write_fault) {
+		sb_start_pagefault(inode->i_sb);
+		file_update_time(vmf->vma->vm_file);
+	}
+
+	if (IS_DAX(inode)) {
+		pfn_t pfn;
+
+		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &famfs_iomap_ops);
+		if (ret & VM_FAULT_NEEDDSYNC)
+			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
+	} else {
+		/* All famfs faults will be dax... */
+		pr_err("%s: oops, non-dax fault\n", __func__);
+		ret = VM_FAULT_SIGBUS;
+	}
+
+	if (write_fault)
+		sb_end_pagefault(inode->i_sb);
+
+	return ret;
+}
+
+static inline bool
+famfs_is_write_fault(
+	struct vm_fault		*vmf)
+{
+	return (vmf->flags & FAULT_FLAG_WRITE) &&
+	       (vmf->vma->vm_flags & VM_SHARED);
+}
+
+static vm_fault_t
+famfs_filemap_fault(
+	struct vm_fault		*vmf)
+{
+	/* DAX can shortcut the normal fault path on write faults! */
+	return __famfs_filemap_fault(vmf, 0,
+			IS_DAX(file_inode(vmf->vma->vm_file)) && famfs_is_write_fault(vmf));
+}
+
+static vm_fault_t
+famfs_filemap_huge_fault(
+	struct vm_fault	*vmf,
+	unsigned int	 pe_size)
+{
+	if (!IS_DAX(file_inode(vmf->vma->vm_file))) {
+		pr_err("%s: file not marked IS_DAX!!\n", __func__);
+		return VM_FAULT_SIGBUS;
+	}
+
+	/* DAX can shortcut the normal fault path on write faults! */
+	return __famfs_filemap_fault(vmf, pe_size, famfs_is_write_fault(vmf));
+}
+
+static vm_fault_t
+famfs_filemap_page_mkwrite(
+	struct vm_fault		*vmf)
+{
+	return __famfs_filemap_fault(vmf, 0, true);
+}
+
+static vm_fault_t
+famfs_filemap_pfn_mkwrite(
+	struct vm_fault		*vmf)
+{
+	return __famfs_filemap_fault(vmf, 0, true);
+}
+
+static vm_fault_t
+famfs_filemap_map_pages(
+	struct vm_fault	       *vmf,
+	pgoff_t			start_pgoff,
+	pgoff_t			end_pgoff)
+{
+	vm_fault_t ret;
+
+	ret = filemap_map_pages(vmf, start_pgoff, end_pgoff);
+	return ret;
+}
+
+const struct vm_operations_struct famfs_file_vm_ops = {
+	.fault		= famfs_filemap_fault,
+	.huge_fault	= famfs_filemap_huge_fault,
+	.map_pages	= famfs_filemap_map_pages,
+	.page_mkwrite	= famfs_filemap_page_mkwrite,
+	.pfn_mkwrite	= famfs_filemap_pfn_mkwrite,
+};
+
-- 
2.43.0

