linux-fsdevel.vger.kernel.org archive mirror
* [RFC 0/4] bdev: allow buffer-head & iomap aops to co-exist
@ 2023-06-08  3:24 Luis Chamberlain
  2023-06-08  3:24 ` [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC Luis Chamberlain
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Luis Chamberlain @ 2023-06-08  3:24 UTC (permalink / raw)
  To: hch, djwong, dchinner, kbusch, willy
  Cc: hare, ritesh.list, rgoldwyn, jack, patches, linux-xfs,
	linux-fsdevel, linux-block, p.raghav, da.gomez, rohan.puri,
	rpuri.linux, mcgrof, corbet, jake

At LSFMM it was clear that, for some at least, the way we want to
support large order folios is to use iomap. So the filesystems staying
on and requiring buffer-heads cannot make use of high order folios.
This simplifies support and reduces the scope of what we need to do in
order to support high order folios for buffered-io.

As Christoph's patches show though, we can only end up without buffer-heads
if we build and boot a system without support for any filesystem that
requires buffer-heads. That's a tall order today.

We however want to be able to support block devices which may want to
completely opt in to using only iomap and iomap based filesystems. We
cannot do that today. To help with this, extend the block device cache
so that each block device gets its own super block, which later lets us
pick and choose the aops we use for each block device.

The first patch seems applicable upstream already. The second just
makes future changes easier to read. The third patch probably just needs
to be squashed into Christoph's work.

The last patch is the meat of this.

This has been boot tested, and applies on top of Christoph's patches which
enable you to build a system without buffer-heads. For convenience I've
stashed what this looks like on my large-block-20230607-dev-cache
branch [0].

Hot swapping the aops is what would come next, and the open question is
when we do that. One option is to pursue things as-is now and then only
flip to iomap if we need high order folios. I'm sure Christoph will have
better ideas on how to do that cleanly, but this is what I have so far.
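
To be concrete, by hot swapping I mean something along these lines.
This is a rough, untested sketch, not part of this series; the helper
name is made up, it reuses the def_blk_aops_iomap name from patch 3,
and a real version would also need to write back dirty data and deal
with concurrent users of the mapping:

/*
 * Hypothetical sketch only: flip a bdev mapping from the buffer-head
 * aops over to the iomap aops after dropping its page cache.
 */
static void bdev_switch_to_iomap(struct block_device *bdev)
{
	struct address_space *mapping = bdev->bd_inode->i_mapping;

	filemap_invalidate_lock(mapping);
	truncate_inode_pages(mapping, 0);
	mapping->a_ops = &def_blk_aops_iomap;
	filemap_invalidate_unlock(mapping);
}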

Lemme know how crappy this looks or pitfalls I'm sure I missed.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=large-block-20230607-dev-cache

Luis Chamberlain (4):
  bdev: replace export of blockdev_superblock with BDEVFS_MAGIC
  bdev: abstract inode lookup on blkdev_get_no_open()
  bdev: rename iomap aops
  bdev: extend bdev inode with its own super_block

 block/bdev.c       | 106 +++++++++++++++++++++++++++++++++++++--------
 block/blk.h        |   1 +
 block/fops.c       |  14 +++---
 include/linux/fs.h |   4 +-
 4 files changed, 98 insertions(+), 27 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC
  2023-06-08  3:24 [RFC 0/4] bdev: allow buffer-head & iomap aops to co-exist Luis Chamberlain
@ 2023-06-08  3:24 ` Luis Chamberlain
  2023-06-08 10:22   ` Jan Kara
  2023-06-08 13:53   ` Christoph Hellwig
  2023-06-08  3:24 ` [RFC 2/4] bdev: abstract inode lookup on blkdev_get_no_open() Luis Chamberlain
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Luis Chamberlain @ 2023-06-08  3:24 UTC (permalink / raw)
  To: hch, djwong, dchinner, kbusch, willy
  Cc: hare, ritesh.list, rgoldwyn, jack, patches, linux-xfs,
	linux-fsdevel, linux-block, p.raghav, da.gomez, rohan.puri,
	rpuri.linux, mcgrof, corbet, jake

There is no need to export blockdev_superblock because we can just
use the magic value of the block device cache super block, BDEVFS_MAGIC,
which is already in place. So just check for that.

This lets us remove the export of blockdev_superblock and also
lets the block dev cache scale as it wishes internally. For
instance in the future we may have a different super block for each
block device. Right now it is all shared on one super block.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c       | 1 -
 include/linux/fs.h | 4 ++--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 21c63bfef323..91477c3849d2 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -379,7 +379,6 @@ static struct file_system_type bd_type = {
 };
 
 struct super_block *blockdev_superblock __read_mostly;
-EXPORT_SYMBOL_GPL(blockdev_superblock);
 
 void __init bdev_cache_init(void)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0b54ac1d331b..948a384af8a3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -43,6 +43,7 @@
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
 #include <linux/slab.h>
+#include <uapi/linux/magic.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -2388,10 +2389,9 @@ extern struct kmem_cache *names_cachep;
 #define __getname()		kmem_cache_alloc(names_cachep, GFP_KERNEL)
 #define __putname(name)		kmem_cache_free(names_cachep, (void *)(name))
 
-extern struct super_block *blockdev_superblock;
 static inline bool sb_is_blkdev_sb(struct super_block *sb)
 {
-	return IS_ENABLED(CONFIG_BLOCK) && sb == blockdev_superblock;
+	return IS_ENABLED(CONFIG_BLOCK) && sb->s_magic == BDEVFS_MAGIC;
 }
 
 void emergency_thaw_all(void);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 2/4] bdev: abstract inode lookup on blkdev_get_no_open()
  2023-06-08  3:24 [RFC 0/4] bdev: allow buffer-head & iomap aops to co-exist Luis Chamberlain
  2023-06-08  3:24 ` [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC Luis Chamberlain
@ 2023-06-08  3:24 ` Luis Chamberlain
  2023-06-08  3:24 ` [RFC 3/4] bdev: rename iomap aops Luis Chamberlain
  2023-06-08  3:24 ` [RFC 4/4] bdev: extend bdev inode with its own super_block Luis Chamberlain
  3 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2023-06-08  3:24 UTC (permalink / raw)
  To: hch, djwong, dchinner, kbusch, willy
  Cc: hare, ritesh.list, rgoldwyn, jack, patches, linux-xfs,
	linux-fsdevel, linux-block, p.raghav, da.gomez, rohan.puri,
	rpuri.linux, mcgrof, corbet, jake

Provide an abstraction for how we look up an inode
in blkdev_get_no_open() so we can later expand the
implementation beyond relying on just one super block.

This will make subsequent changes easier to review.

This introduces no functional changes.

Although we all probably want to just remove BLOCK_LEGACY_AUTOLOAD,
removing it has previously proven to cause issues with loopback [0] and
is expected to break mdraid [1], so this takes the more careful
approach of just keeping it.

[0] https://lore.kernel.org/all/20220222085354.GA6423@lst.de/T/#u
[1] https://lore.kernel.org/all/20220503212848.5853-1-dmoulding@me.com/

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 91477c3849d2..61d8d2722cda 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -666,15 +666,20 @@ static void blkdev_put_part(struct block_device *part, fmode_t mode)
 	blkdev_put_whole(whole, mode);
 }
 
+static struct inode *blkdev_inode_lookup(dev_t dev)
+{
+	return ilookup(blockdev_superblock, dev);
+}
+
 struct block_device *blkdev_get_no_open(dev_t dev)
 {
 	struct block_device *bdev;
 	struct inode *inode;
 
-	inode = ilookup(blockdev_superblock, dev);
+	inode = blkdev_inode_lookup(dev);
 	if (!inode && IS_ENABLED(CONFIG_BLOCK_LEGACY_AUTOLOAD)) {
 		blk_request_module(dev);
-		inode = ilookup(blockdev_superblock, dev);
+		inode = blkdev_inode_lookup(dev);
 		if (inode)
 			pr_warn_ratelimited(
 "block device autoloading is deprecated and will be removed.\n");
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 3/4] bdev: rename iomap aops
  2023-06-08  3:24 [RFC 0/4] bdev: allow buffer-head & iomap aops to co-exist Luis Chamberlain
  2023-06-08  3:24 ` [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC Luis Chamberlain
  2023-06-08  3:24 ` [RFC 2/4] bdev: abstract inode lookup on blkdev_get_no_open() Luis Chamberlain
@ 2023-06-08  3:24 ` Luis Chamberlain
  2023-06-08  3:24 ` [RFC 4/4] bdev: extend bdev inode with its own super_block Luis Chamberlain
  3 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2023-06-08  3:24 UTC (permalink / raw)
  To: hch, djwong, dchinner, kbusch, willy
  Cc: hare, ritesh.list, rgoldwyn, jack, patches, linux-xfs,
	linux-fsdevel, linux-block, p.raghav, da.gomez, rohan.puri,
	rpuri.linux, mcgrof, corbet, jake

Allow the buffer-head and iomap aops to co-exist in the same build.
Right now the iomap aops can only be used if you disable buffer-heads.
In the near future we should be able to dynamically select at runtime
the intended aops based on the nature of the filesystem and device
requirements.

So rename the iomap aops, and use the new name if buffer-heads
are disabled. This introduces no functional changes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c |  4 ++++
 block/blk.h  |  1 +
 block/fops.c | 14 +++++++-------
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 61d8d2722cda..2b16afc2bd2a 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -408,7 +408,11 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 		return NULL;
 	inode->i_mode = S_IFBLK;
 	inode->i_rdev = 0;
+#ifdef CONFIG_BUFFER_HEAD
 	inode->i_data.a_ops = &def_blk_aops;
+#else
+	inode->i_data.a_ops = &def_blk_aops_iomap;
+#endif
 	mapping_set_gfp_mask(&inode->i_data, GFP_USER);
 
 	bdev = I_BDEV(inode);
diff --git a/block/blk.h b/block/blk.h
index 7ad7cb6ffa01..67bf2fa99fe9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -453,6 +453,7 @@ long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 
 extern const struct address_space_operations def_blk_aops;
+extern const struct address_space_operations def_blk_aops_iomap;
 
 int disk_register_independent_access_ranges(struct gendisk *disk);
 void disk_unregister_independent_access_ranges(struct gendisk *disk);
diff --git a/block/fops.c b/block/fops.c
index 24037b493f5f..51f7241ab389 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -455,13 +455,14 @@ const struct address_space_operations def_blk_aops = {
 	.migrate_folio	= buffer_migrate_folio_norefs,
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
-#else /* CONFIG_BUFFER_HEAD */
-static int blkdev_read_folio(struct file *file, struct folio *folio)
+
+#endif /* CONFIG_BUFFER_HEAD */
+static int blkdev_read_folio_iomap(struct file *file, struct folio *folio)
 {
 	return iomap_read_folio(folio, &blkdev_iomap_ops);
 }
 
-static void blkdev_readahead(struct readahead_control *rac)
+static void blkdev_readahead_iomap(struct readahead_control *rac)
 {
 	iomap_readahead(rac, &blkdev_iomap_ops);
 }
@@ -492,18 +493,17 @@ static int blkdev_writepages(struct address_space *mapping,
 	return iomap_writepages(mapping, wbc, &wpc, &blkdev_writeback_ops);
 }
 
-const struct address_space_operations def_blk_aops = {
+const struct address_space_operations def_blk_aops_iomap = {
 	.dirty_folio	= filemap_dirty_folio,
 	.release_folio		= iomap_release_folio,
 	.invalidate_folio	= iomap_invalidate_folio,
-	.read_folio		= blkdev_read_folio,
-	.readahead		= blkdev_readahead,
+	.read_folio		= blkdev_read_folio_iomap,
+	.readahead		= blkdev_readahead_iomap,
 	.writepages		= blkdev_writepages,
 	.is_partially_uptodate  = iomap_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 	.migrate_folio		= filemap_migrate_folio,
 };
-#endif /* CONFIG_BUFFER_HEAD */
 
 /*
  * for a block special file file_inode(file)->i_size is zero
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 4/4] bdev: extend bdev inode with its own super_block
  2023-06-08  3:24 [RFC 0/4] bdev: allow buffer-head & iomap aops to co-exist Luis Chamberlain
                   ` (2 preceding siblings ...)
  2023-06-08  3:24 ` [RFC 3/4] bdev: rename iomap aops Luis Chamberlain
@ 2023-06-08  3:24 ` Luis Chamberlain
  2023-06-08 13:37   ` Matthew Wilcox
  2023-06-08 13:50   ` Christoph Hellwig
  3 siblings, 2 replies; 13+ messages in thread
From: Luis Chamberlain @ 2023-06-08  3:24 UTC (permalink / raw)
  To: hch, djwong, dchinner, kbusch, willy
  Cc: hare, ritesh.list, rgoldwyn, jack, patches, linux-xfs,
	linux-fsdevel, linux-block, p.raghav, da.gomez, rohan.puri,
	rpuri.linux, mcgrof, corbet, jake

We currently share a single super_block for the block device cache;
each block device corresponds to one inode on that super_block. This
implies sharing one set of aops though, and in the near future we want
to be able to instead support using iomap on the super_block for
different block devices.

To allow more flexibility, use a super_block per block device, so
that we can eventually allow block devices with pure-iomap requirements
to co-exist with block devices which require buffer-heads.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c | 94 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 78 insertions(+), 16 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 2b16afc2bd2a..3ab952a77a11 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -30,9 +30,14 @@
 #include "../fs/internal.h"
 #include "blk.h"
 
+static LIST_HEAD(bdev_inode_list);
+static DEFINE_MUTEX(bdev_inode_mutex);
+
 struct bdev_inode {
 	struct block_device bdev;
 	struct inode vfs_inode;
+	struct vfsmount *bd_mnt;
+	struct list_head list;
 };
 
 static inline struct bdev_inode *BDEV_I(struct inode *inode)
@@ -321,10 +326,28 @@ static struct inode *bdev_alloc_inode(struct super_block *sb)
 	return &ei->vfs_inode;
 }
 
+static void bdev_remove_inode(struct bdev_inode *binode)
+{
+	struct bdev_inode *bdev_inode, *tmp;
+
+	kern_unmount(binode->bd_mnt);
+
+	mutex_lock(&bdev_inode_mutex);
+	list_for_each_entry_safe(bdev_inode, tmp, &bdev_inode_list, list) {
+		if (bdev_inode == binode) {
+			list_del_init(&bdev_inode->list);
+			break;
+		}
+	}
+	mutex_unlock(&bdev_inode_mutex);
+}
+
 static void bdev_free_inode(struct inode *inode)
 {
 	struct block_device *bdev = I_BDEV(inode);
 
+	bdev_remove_inode(BDEV_I(inode));
+
 	free_percpu(bdev->bd_stats);
 	kfree(bdev->bd_meta_info);
 
@@ -378,12 +401,9 @@ static struct file_system_type bd_type = {
 	.kill_sb	= kill_anon_super,
 };
 
-struct super_block *blockdev_superblock __read_mostly;
-
 void __init bdev_cache_init(void)
 {
 	int err;
-	static struct vfsmount *bd_mnt;
 
 	bdev_cachep = kmem_cache_create("bdev_cache", sizeof(struct bdev_inode),
 			0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
@@ -392,20 +412,23 @@ void __init bdev_cache_init(void)
 	err = register_filesystem(&bd_type);
 	if (err)
 		panic("Cannot register bdev pseudo-fs");
-	bd_mnt = kern_mount(&bd_type);
-	if (IS_ERR(bd_mnt))
-		panic("Cannot create bdev pseudo-fs");
-	blockdev_superblock = bd_mnt->mnt_sb;   /* For writeback */
 }
 
 struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 {
+	struct vfsmount *bd_mnt;
 	struct block_device *bdev;
 	struct inode *inode;
 
-	inode = new_inode(blockdev_superblock);
-	if (!inode)
+	bd_mnt = vfs_kern_mount(&bd_type, SB_KERNMOUNT, bd_type.name, NULL);
+	if (IS_ERR(bd_mnt))
 		return NULL;
+
+	inode = new_inode(bd_mnt->mnt_sb);
+	if (!inode) {
+		kern_unmount(bd_mnt);
+		goto err_out;
+	}
 	inode->i_mode = S_IFBLK;
 	inode->i_rdev = 0;
 #ifdef CONFIG_BUFFER_HEAD
@@ -426,12 +449,14 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	else
 		bdev->bd_has_submit_bio = false;
 	bdev->bd_stats = alloc_percpu(struct disk_stats);
-	if (!bdev->bd_stats) {
-		iput(inode);
-		return NULL;
-	}
+	if (!bdev->bd_stats)
+		goto err_out;
 	bdev->bd_disk = disk;
+	BDEV_I(inode)->bd_mnt = bd_mnt; /* For writeback */
 	return bdev;
+err_out:
+	iput(inode);
+	return NULL;
 }
 
 void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors)
@@ -444,13 +469,16 @@ void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors)
 
 void bdev_add(struct block_device *bdev, dev_t dev)
 {
+	struct inode *inode = bdev->bd_inode;
+
 	bdev->bd_dev = dev;
 	bdev->bd_inode->i_rdev = dev;
 	bdev->bd_inode->i_ino = dev;
 	insert_inode_hash(bdev->bd_inode);
+	list_add_tail(&BDEV_I(inode)->list, &bdev_inode_list);
 }
 
-long nr_blockdev_pages(void)
+static long nr_blockdev_pages_sb(struct super_block *blockdev_superblock)
 {
 	struct inode *inode;
 	long ret = 0;
@@ -463,6 +491,19 @@ long nr_blockdev_pages(void)
 	return ret;
 }
 
+long nr_blockdev_pages(void)
+{
+	struct bdev_inode *bdev_inode;
+	long ret = 0;
+
+	mutex_lock(&bdev_inode_mutex);
+	list_for_each_entry(bdev_inode, &bdev_inode_list, list)
+		ret += nr_blockdev_pages_sb(bdev_inode->bd_mnt->mnt_sb);
+	mutex_unlock(&bdev_inode_mutex);
+
+	return ret;
+}
+
 /**
  * bd_may_claim - test whether a block device can be claimed
  * @bdev: block device of interest
@@ -672,7 +713,18 @@ static void blkdev_put_part(struct block_device *part, fmode_t mode)
 
 static struct inode *blkdev_inode_lookup(dev_t dev)
 {
-	return ilookup(blockdev_superblock, dev);
+	struct bdev_inode *bdev_inode;
+	struct inode *inode = NULL;
+
+	mutex_lock(&bdev_inode_mutex);
+	list_for_each_entry(bdev_inode, &bdev_inode_list, list) {
+		inode = ilookup(bdev_inode->bd_mnt->mnt_sb, dev);
+		if (inode)
+			break;
+	}
+	mutex_unlock(&bdev_inode_mutex);
+
+	return inode;
 }
 
 struct block_device *blkdev_get_no_open(dev_t dev)
@@ -961,7 +1013,7 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
 }
 EXPORT_SYMBOL(__invalidate_device);
 
-void sync_bdevs(bool wait)
+static void sync_bdev_sb(struct super_block *blockdev_superblock, bool wait)
 {
 	struct inode *inode, *old_inode = NULL;
 
@@ -1013,6 +1065,16 @@ void sync_bdevs(bool wait)
 	iput(old_inode);
 }
 
+void sync_bdevs(bool wait)
+{
+	struct bdev_inode *bdev_inode;
+
+	mutex_lock(&bdev_inode_mutex);
+	list_for_each_entry(bdev_inode, &bdev_inode_list, list)
+		sync_bdev_sb(bdev_inode->bd_mnt->mnt_sb, wait);
+	mutex_unlock(&bdev_inode_mutex);
+}
+
 /*
  * Handle STATX_DIOALIGN for block devices.
  *
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC
  2023-06-08  3:24 ` [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC Luis Chamberlain
@ 2023-06-08 10:22   ` Jan Kara
  2023-06-08 13:53   ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Jan Kara @ 2023-06-08 10:22 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, dchinner, kbusch, willy, hare, ritesh.list,
	rgoldwyn, jack, patches, linux-xfs, linux-fsdevel, linux-block,
	p.raghav, da.gomez, rohan.puri, rpuri.linux, corbet, jake

On Wed 07-06-23 20:24:01, Luis Chamberlain wrote:
> There is no need to export blockdev_superblock because we can just
> use the magic value of the block device cache super block, BDEVFS_MAGIC,
> which is already in place. So just check for that.
> 
> This lets us remove the export of blockdev_superblock and also
> lets the block dev cache scale as it wishes internally. For
> instance in the future we may have a different super block for each
> block device. Right now it is all shared on one super block.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  block/bdev.c       | 1 -
>  include/linux/fs.h | 4 ++--
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 21c63bfef323..91477c3849d2 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -379,7 +379,6 @@ static struct file_system_type bd_type = {
>  };
>  
>  struct super_block *blockdev_superblock __read_mostly;
> -EXPORT_SYMBOL_GPL(blockdev_superblock);

You can even make blockdev_superblock static. I like this! Otherwise the
patch looks good.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 4/4] bdev: extend bdev inode with its own super_block
  2023-06-08  3:24 ` [RFC 4/4] bdev: extend bdev inode with its own super_block Luis Chamberlain
@ 2023-06-08 13:37   ` Matthew Wilcox
  2023-06-08 13:50     ` Christoph Hellwig
  2023-06-08 13:50   ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2023-06-08 13:37 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, dchinner, kbusch, hare, ritesh.list, rgoldwyn, jack,
	patches, linux-xfs, linux-fsdevel, linux-block, p.raghav,
	da.gomez, rohan.puri, rpuri.linux, corbet, jake

On Wed, Jun 07, 2023 at 08:24:04PM -0700, Luis Chamberlain wrote:
> We currently share a single super_block for the block device cache;
> each block device corresponds to one inode on that super_block. This
> implies sharing one set of aops though, and in the near future we want
> to be able to instead support using iomap on the super_block for
> different block devices.

> -struct super_block *blockdev_superblock __read_mostly;

Did we consider adding

+struct super_block *blockdev_sb_iomap __read_mostly;

and then considering only two superblocks instead of having a list of
all bdevs?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 4/4] bdev: extend bdev inode with its own super_block
  2023-06-08 13:37   ` Matthew Wilcox
@ 2023-06-08 13:50     ` Christoph Hellwig
  2023-06-08 17:45       ` Luis Chamberlain
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-06-08 13:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luis Chamberlain, hch, djwong, dchinner, kbusch, hare,
	ritesh.list, rgoldwyn, jack, patches, linux-xfs, linux-fsdevel,
	linux-block, p.raghav, da.gomez, rohan.puri, rpuri.linux, corbet,
	jake

On Thu, Jun 08, 2023 at 02:37:34PM +0100, Matthew Wilcox wrote:
> On Wed, Jun 07, 2023 at 08:24:04PM -0700, Luis Chamberlain wrote:
> > We currently share a single super_block for the block device cache;
> > each block device corresponds to one inode on that super_block. This
> > implies sharing one set of aops though, and in the near future we want
> > to be able to instead support using iomap on the super_block for
> > different block devices.
> 
> > -struct super_block *blockdev_superblock __read_mostly;
> 
> Did we consider adding
> 
> +struct super_block *blockdev_sb_iomap __read_mostly;
> 
> and then considering only two superblocks instead of having a list of
> all bdevs?

Or why the heck we would even do this to start with?  iomap has
absolutely nothing to do with superblocks.

Now maybe it might make sense to have a superblock per gendisk just
to remove all the weird special casing for the bdev inode in the
writeback code.  But that's something entirely different than this
patch.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 4/4] bdev: extend bdev inode with its own super_block
  2023-06-08  3:24 ` [RFC 4/4] bdev: extend bdev inode with its own super_block Luis Chamberlain
  2023-06-08 13:37   ` Matthew Wilcox
@ 2023-06-08 13:50   ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-06-08 13:50 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, dchinner, kbusch, willy, hare, ritesh.list,
	rgoldwyn, jack, patches, linux-xfs, linux-fsdevel, linux-block,
	p.raghav, da.gomez, rohan.puri, rpuri.linux, corbet, jake

On Wed, Jun 07, 2023 at 08:24:04PM -0700, Luis Chamberlain wrote:
> each block device corresponds to one inode on that super_block. This
> implies sharing one set of aops though,

No, it does not.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC
  2023-06-08  3:24 ` [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC Luis Chamberlain
  2023-06-08 10:22   ` Jan Kara
@ 2023-06-08 13:53   ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-06-08 13:53 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, dchinner, kbusch, willy, hare, ritesh.list,
	rgoldwyn, jack, patches, linux-xfs, linux-fsdevel, linux-block,
	p.raghav, da.gomez, rohan.puri, rpuri.linux, corbet, jake

On Wed, Jun 07, 2023 at 08:24:01PM -0700, Luis Chamberlain wrote:
> -extern struct super_block *blockdev_superblock;
>  static inline bool sb_is_blkdev_sb(struct super_block *sb)
>  {
> -	return IS_ENABLED(CONFIG_BLOCK) && sb == blockdev_superblock;
> +	return IS_ENABLED(CONFIG_BLOCK) && sb->s_magic == BDEVFS_MAGIC;
>  }

So while I'd love to be able to make blockdev_superblock private, I think
the existing code is the better idea here.  BDEVFS_MAGIC can easily
end up in any other s_magic as it's a free-form field.
blockdev_superblock can't accidentally have the same address as a
random non-bdev sb.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 4/4] bdev: extend bdev inode with its own super_block
  2023-06-08 13:50     ` Christoph Hellwig
@ 2023-06-08 17:45       ` Luis Chamberlain
  2023-06-09  4:20         ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Luis Chamberlain @ 2023-06-08 17:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matthew Wilcox, djwong, dchinner, kbusch, hare, ritesh.list,
	rgoldwyn, jack, patches, linux-xfs, linux-fsdevel, linux-block,
	p.raghav, da.gomez, rohan.puri, rpuri.linux, corbet, jake

On Thu, Jun 08, 2023 at 06:50:15AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 08, 2023 at 02:37:34PM +0100, Matthew Wilcox wrote:
> > On Wed, Jun 07, 2023 at 08:24:04PM -0700, Luis Chamberlain wrote:
> > > We currently share a single super_block for the block device cache;
> > > each block device corresponds to one inode on that super_block. This
> > > implies sharing one set of aops though, and in the near future we want
> > > to be able to instead support using iomap on the super_block for
> > > different block devices.
> > 
> > > -struct super_block *blockdev_superblock __read_mostly;
> > 
> > Did we consider adding
> > 
> > +struct super_block *blockdev_sb_iomap __read_mostly;
> > 
> > and then considering only two superblocks instead of having a list of
> > all bdevs?
> 
> Or why the heck we would even do this to start with?

That's what I gathered you suggested at LSFMM in hallway talk.

> iomap has absolutely nothing to do with superblocks.
> 
> Now maybe it might make sense to have a superblock per gendisk just
> to remove all the weird special casing for the bdev inode in the
> writeback code.  But that's something entirely different than this
> patch.

The goal behind this is to allow a block device to have its bdev cache
use iomap; right now we shoe-horn in the buffer-head aops if we
have to build buffer-heads.

If this sort of approach is not desirable, let me know what alternative
you would prefer to see, because clearly, I must not have understood
your suggestion.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 4/4] bdev: extend bdev inode with its own super_block
  2023-06-08 17:45       ` Luis Chamberlain
@ 2023-06-09  4:20         ` Christoph Hellwig
  2023-06-09  9:17           ` Luis Chamberlain
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-06-09  4:20 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, Matthew Wilcox, djwong, dchinner, kbusch,
	hare, ritesh.list, rgoldwyn, jack, patches, linux-xfs,
	linux-fsdevel, linux-block, p.raghav, da.gomez, rohan.puri,
	rpuri.linux, corbet, jake

On Thu, Jun 08, 2023 at 10:45:10AM -0700, Luis Chamberlain wrote:
> > > and then considering only two superblocks instead of having a list of
> > > all bdevs?
> > 
> > Or why the heck we would even do this to start with?
> 
> That's what I gathered you suggested at LSFMM in hallway talk.

No.  I explained to you that sharing the superblock has absolutely no
effect on the aops when you brought it up.  I said it might be nice for
other reasons to have a sb per gendisk.

> > iomap has absolutely nothing to do with superblocks.
> > 
> > Now maybe it might make sense to have a superblock per gendisk just
> > to remove all the weird special casing for the bdev inode in the
> > writeback code.  But that's something entirely different than this
> > patch.
> 
> The goal behind this is to allow a block device to have its bdev cache
> use iomap; right now we shoe-horn in the buffer-head aops if we
> have to build buffer-heads.
> 
> If this sort of approach is not desirable, let me know what alternative
> you would prefer to see, because clearly, I must not have understood
> your suggestion.

Again, every non-trivial file system right now has more than one set
of aops per superblock.  I'm not sure what problem you are trying to
solve here.
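
To spell that out: aops hang off the inode's mapping, not the
superblock, so a single superblock can serve inodes with different
aops just fine.  ext4 for example already switches between its
regular, journalled and DAX aops per inode (see ext4_set_aops()).
A made-up illustration, not from any real file system:

/* illustration only: two sets of aops served by one superblock */
static const struct address_space_operations foo_bh_aops = {
	/* buffer-head based ops would go here */
};
static const struct address_space_operations foo_iomap_aops = {
	/* iomap based ops would go here */
};

static void foo_set_aops(struct inode *inode, bool use_iomap)
{
	inode->i_mapping->a_ops = use_iomap ? &foo_iomap_aops
					    : &foo_bh_aops;
}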

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 4/4] bdev: extend bdev inode with its own super_block
  2023-06-09  4:20         ` Christoph Hellwig
@ 2023-06-09  9:17           ` Luis Chamberlain
  0 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2023-06-09  9:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matthew Wilcox, djwong, dchinner, kbusch, hare, ritesh.list,
	rgoldwyn, jack, patches, linux-xfs, linux-fsdevel, linux-block,
	p.raghav, da.gomez, rohan.puri, rpuri.linux, corbet, jake

On Thu, Jun 8, 2023 at 9:20 PM Christoph Hellwig <hch@infradead.org> wrote:
> Again, every non-trivial file system right now has more than one set
> of aops per superblock.  I'm not sure what problem you are trying to
> solve here.

Alright, a 2 liner does indeed let it co-exist and replace this mess, thanks.
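
In case it helps anyone following along, my guess at what that ends up
looking like in bdev_alloc(), untested, with both aops always built and
the IS_ENABLED() check standing in for whatever policy we settle on:

	if (IS_ENABLED(CONFIG_BUFFER_HEAD))
		inode->i_data.a_ops = &def_blk_aops;
	else
		inode->i_data.a_ops = &def_blk_aops_iomap;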

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-06-09  9:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-08  3:24 [RFC 0/4] bdev: allow buffer-head & iomap aops to co-exist Luis Chamberlain
2023-06-08  3:24 ` [RFC 1/4] bdev: replace export of blockdev_superblock with BDEVFS_MAGIC Luis Chamberlain
2023-06-08 10:22   ` Jan Kara
2023-06-08 13:53   ` Christoph Hellwig
2023-06-08  3:24 ` [RFC 2/4] bdev: abstract inode lookup on blkdev_get_no_open() Luis Chamberlain
2023-06-08  3:24 ` [RFC 3/4] bdev: rename iomap aops Luis Chamberlain
2023-06-08  3:24 ` [RFC 4/4] bdev: extend bdev inode with its own super_block Luis Chamberlain
2023-06-08 13:37   ` Matthew Wilcox
2023-06-08 13:50     ` Christoph Hellwig
2023-06-08 17:45       ` Luis Chamberlain
2023-06-09  4:20         ` Christoph Hellwig
2023-06-09  9:17           ` Luis Chamberlain
2023-06-08 13:50   ` Christoph Hellwig
