From: Luis Chamberlain <mcgrof@kernel.org>
To: hch@infradead.org, djwong@kernel.org, dchinner@redhat.com,
	kbusch@kernel.org, sagi@grimberg.me, axboe@fb.com
Cc: willy@infradead.org, brauner@kernel.org, hare@suse.de,
	ritesh.list@gmail.com, rgoldwyn@suse.com, jack@suse.cz,
	ziy@nvidia.com, ryan.roberts@arm.com, patches@lists.linux.dev,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, p.raghav@samsung.com,
	da.gomez@samsung.com, dan.helmick@samsung.com, mcgrof@kernel.org
Subject: [RFC v2 05/10] bdev: allow to switch between bdev aops
Date: Fri, 15 Sep 2023 14:32:49 -0700
Message-ID: <20230915213254.2724586-6-mcgrof@kernel.org>
In-Reply-To: <20230915213254.2724586-1-mcgrof@kernel.org>

Now that we have annotations for filesystems which require buffer-heads, we
can use that flag to verify whether a filesystem can be used on a target
block device which requires higher order folios. A filesystem which requires
buffer-heads cannot be used on a block device which has a logical block size
greater than PAGE_SIZE. We also want to allow using a buffer-head filesystem
on a block device and, at a later time, unmounting it and switching to a
filesystem which supports bs > PAGE_SIZE, even if the logical block size of
the block device is PAGE_SIZE; this requires iomap. Provide helpers to do all
these checks and reset the aops to iomap when needed.

Leaving iomap in place after an unmount would make such block devices unusable
for buffer-head filesystems, so we must also reset the aops to buffer-heads
on unmount.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
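
The mount and unmount rules described in the commit message can be sketched
as a small userspace model. This is only an illustration and not kernel code:
the model_* names, the enum and the struct are hypothetical stand-ins for
struct block_device, the two aops tables and the FS_BUFFER_HEADS flag, and
the sketch ignores the cache sync and invalidation that the real helpers do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel types. */
enum model_aops { MODEL_AOPS_BUFFER_HEAD, MODEL_AOPS_IOMAP };

struct model_bdev {
	unsigned int min_folio_order;	/* 0: logical block size <= PAGE_SIZE */
	enum model_aops aops;
};

/* Models the decision made in sb_bdev_aops_set(): 0 or -EINVAL. */
static int model_mount(struct model_bdev *bdev, bool fs_needs_buffer_heads)
{
	/* A buffer-head filesystem cannot sit on an LBS device. */
	if (bdev->min_folio_order != 0 && fs_needs_buffer_heads)
		return -EINVAL;

	/* Start from buffer-heads, switch only for iomap-only filesystems. */
	bdev->aops = MODEL_AOPS_BUFFER_HEAD;
	if (!fs_needs_buffer_heads)
		bdev->aops = MODEL_AOPS_IOMAP;
	return 0;
}

/* Models bdev_aops_reset(): always go back to buffer-heads on unmount. */
static void model_umount(struct model_bdev *bdev)
{
	bdev->aops = MODEL_AOPS_BUFFER_HEAD;
}

/* Walks through the cases the commit message lists. */
static void model_selftest(void)
{
	struct model_bdev dev = {
		.min_folio_order = 0,
		.aops = MODEL_AOPS_BUFFER_HEAD,
	};

	/* Buffer-head filesystem on a PAGE_SIZE device keeps buffer-heads. */
	assert(model_mount(&dev, true) == 0);
	assert(dev.aops == MODEL_AOPS_BUFFER_HEAD);
	model_umount(&dev);

	/* Later, an iomap-only filesystem on the same device uses iomap. */
	assert(model_mount(&dev, false) == 0);
	assert(dev.aops == MODEL_AOPS_IOMAP);
	model_umount(&dev);
	assert(dev.aops == MODEL_AOPS_BUFFER_HEAD);

	/* On an LBS device only the iomap-only filesystem may mount. */
	dev.min_folio_order = 2;
	assert(model_mount(&dev, true) == -EINVAL);
	assert(model_mount(&dev, false) == 0);
}
```

Calling model_selftest() from a test program exercises each case. The reset
in model_umount() is what lets a buffer-head filesystem mount again after an
iomap-only one has used the device.
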
 block/bdev.c           | 55 ++++++++++++++++++++++++++++++++++++++++++
 fs/super.c             |  3 ++-
 include/linux/blkdev.h |  7 ++++++
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/block/bdev.c b/block/bdev.c
index 0d685270cd34..bf3cfc02aaf9 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -150,6 +150,59 @@ static int bdev_bsize_limit(struct block_device *bdev)
 	return PAGE_SIZE;
 }
 
+#ifdef CONFIG_BUFFER_HEAD
+static void bdev_aops_set(struct block_device *bdev,
+			  const struct address_space_operations *aops)
+{
+	kill_bdev(bdev);
+	bdev->bd_inode->i_data.a_ops = aops;
+}
+
+static void bdev_aops_sync(struct super_block *sb, struct block_device *bdev,
+			   const struct address_space_operations *aops)
+{
+	sync_blockdev(bdev);
+	bdev_aops_set(bdev, aops);
+	kill_bdev(bdev);
+	bdev->bd_inode->i_data.a_ops = aops;
+}
+
+void bdev_aops_reset(struct block_device *bdev)
+{
+	bdev_aops_set(bdev, &def_blk_aops);
+}
+
+static int sb_bdev_aops_set(struct super_block *sb)
+{
+	struct block_device *bdev = sb->s_bdev;
+
+	if (mapping_min_folio_order(bdev->bd_inode->i_mapping) != 0 &&
+	    sb->s_type->fs_flags & FS_BUFFER_HEADS) {
+		pr_warn_ratelimited(
+"block device logical block size > PAGE_SIZE, buffer-head filesystem cannot be used.\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * We can switch back and forth, but we need to use buffer-heads
+	 * first, otherwise a filesystem created which only uses iomap
+	 * will have it sticky and we can't detect buffer-head filesystems
+	 * on mount.
+	 */
+	bdev_aops_sync(sb, bdev, &def_blk_aops);
+	if (sb->s_type->fs_flags & FS_BUFFER_HEADS)
+		return 0;
+
+	bdev_aops_sync(sb, bdev, &def_blk_aops_iomap);
+	return 0;
+}
+#else
+static int sb_bdev_aops_set(struct super_block *sb)
+{
+	return 0;
+}
+#endif
+
 int set_blocksize(struct block_device *bdev, int size)
 {
 	/* Size must be a power of two, and between 512 and supported order */
@@ -173,6 +226,8 @@ EXPORT_SYMBOL(set_blocksize);
 
 int sb_set_blocksize(struct super_block *sb, int size)
 {
+	if (sb_bdev_aops_set(sb))
+		return 0;
 	if (set_blocksize(sb->s_bdev, size))
 		return 0;
 	/* If we get here, we know size is power of two
diff --git a/fs/super.c b/fs/super.c
index 816a22a5cad1..eb269c9489cb 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1649,12 +1649,13 @@ void kill_block_super(struct super_block *sb)
 	generic_shutdown_super(sb);
 	if (bdev) {
 		sync_blockdev(bdev);
+		bdev_aops_reset(bdev);
 		blkdev_put(bdev, sb);
 	}
 }
 
 EXPORT_SYMBOL(kill_block_super);
-#endif
+#endif /* CONFIG_BLOCK */
 
 struct dentry *mount_nodev(struct file_system_type *fs_type,
 	int flags, void *data,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index eef450f25982..738a879a0786 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1503,6 +1503,13 @@ void sync_bdevs(bool wait);
 void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
 void printk_all_partitions(void);
 int __init early_lookup_bdev(const char *pathname, dev_t *dev);
+#ifdef CONFIG_BUFFER_HEAD
+void bdev_aops_reset(struct block_device *bdev);
+#else
+static inline void bdev_aops_reset(struct block_device *bdev)
+{
+}
+#endif
 #else
 static inline void invalidate_bdev(struct block_device *bdev)
 {
-- 
2.39.2

