All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHES][RFC] set_blocksize() rework
@ 2024-04-27 21:09 Al Viro
  2024-04-27 21:10 ` [PATCH 1/7] bcache_register(): don't bother with set_blocksize() Al Viro
                   ` (7 more replies)
  0 siblings, 8 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 21:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

	Any buffer-cache based filesystem is going to be FUBAR
if somebody manages to change block size of device under it,
since primitives (sb_bread(), sb_getblk(), etc.) operate in
terms of block numbers.  If block size suddenly doubles, so
will the offsets from the beginning of device.  Results are
not pretty, obviously.

	The thing that (mostly) prevents that kind of mess
is that most of the mechanisms that lead to block size
change require the device being opened exclusive.  However,
there are several exceptions that allow to do that without
an exclusive open.  Fortunately, all of them require
CAP_SYS_ADMIN, so it's not a security problem - anyone
who already has that level of access can screw the system
into the ground in any number of ways.  However, security
problems or not, that crap should be fixed.

	The series below eliminates these calls of set_blocksize()
and changes calling conventsion of set_blocksize() so that it
uses struct file * instead of struct block_device * to tell
which device to act upon.  Unlike struct block_device, struct
file has enough information to tell an exclusive open from
non-exclusive one, so we can reject the operation in non-exclusive
case.

	The branch is available at
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.set_blocksize
Individual patches in followups.

Review (and testing, obviously) would be very welcome.

Shortlog:
Al Viro (7):
      bcache_register(): don't bother with set_blocksize()
      pktcdvd: sort set_blocksize() calls out
      swapon(2)/swapoff(2): don't bother with block size
      swapon(2): open swap with O_EXCL
      swsusp: don't bother with setting block size
      btrfs_get_dev_args_from_path(): don't call set_blocksize()
      set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive

Diffstat:
 block/bdev.c              | 14 ++++++++++----
 block/ioctl.c             | 21 ++++++++++++---------
 drivers/block/pktcdvd.c   |  7 +------
 drivers/md/bcache/super.c |  4 ----
 fs/btrfs/dev-replace.c    |  2 +-
 fs/btrfs/volumes.c        | 13 ++++++++-----
 fs/ext4/super.c           |  2 +-
 fs/reiserfs/journal.c     |  5 ++---
 fs/xfs/xfs_buf.c          |  2 +-
 include/linux/blkdev.h    |  2 +-
 include/linux/swap.h      |  2 --
 kernel/power/swap.c       |  7 +------
 mm/swapfile.c             | 29 ++---------------------------
 13 files changed, 40 insertions(+), 70 deletions(-)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/7] bcache_register(): don't bother with set_blocksize()
  2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
@ 2024-04-27 21:10 ` Al Viro
  2024-04-29  5:06   ` Christoph Hellwig
  2024-04-29  8:37   ` Christian Brauner
  2024-04-27 21:10 ` [PATCH 2/7] pktcdvd: sort set_blocksize() calls out Al Viro
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 21:10 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

We are not using __bread() anymore and read_cache_page_gfp() doesn't
care about block size.  Moreover, we should *not* change block
size on a device that is currently held exclusive - filesystems
that use buffer cache expect the block numbers to be interpreted
in units set by filesystem.

ACKed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/md/bcache/super.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 330bcd9ea4a9..0ee5e17ae2dd 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2554,10 +2554,6 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	if (IS_ERR(bdev_file))
 		goto out_free_sb;
 
-	err = "failed to set blocksize";
-	if (set_blocksize(file_bdev(bdev_file), 4096))
-		goto out_blkdev_put;
-
 	err = read_super(sb, file_bdev(bdev_file), &sb_disk);
 	if (err)
 		goto out_blkdev_put;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 2/7] pktcdvd: sort set_blocksize() calls out
  2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
  2024-04-27 21:10 ` [PATCH 1/7] bcache_register(): don't bother with set_blocksize() Al Viro
@ 2024-04-27 21:10 ` Al Viro
  2024-04-29  5:07   ` Christoph Hellwig
  2024-04-29  8:38   ` Christian Brauner
  2024-04-27 21:10 ` [PATCH 3/7] swapon(2)/swapoff(2): don't bother with block size Al Viro
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 21:10 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

1) it doesn't make any sense to have ->open() call set_blocksize() on the
device being opened - the caller will override that anyway.

2) setting block size on underlying device, OTOH, ought to be done when
we are opening it exclusive - i.e. as part of pkt_open_dev().  Having
it done at setup time doesn't guarantee us anything about the state
at the time we start talking to it.  Worse, if you happen to have
the underlying device containing e.g. ext2 with 4Kb blocks that
is currently mounted r/o, that set_blocksize() will confuse the hell
out of filesystem.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/block/pktcdvd.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 21728e9ea5c3..05933f25b397 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2215,6 +2215,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
 		}
 		dev_info(ddev, "%lukB available on disc\n", lba << 1);
 	}
+	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
 	return 0;
 
@@ -2278,11 +2279,6 @@ static int pkt_open(struct gendisk *disk, blk_mode_t mode)
 		ret = pkt_open_dev(pd, mode & BLK_OPEN_WRITE);
 		if (ret)
 			goto out_dec;
-		/*
-		 * needed here as well, since ext2 (among others) may change
-		 * the blocksize at mount time
-		 */
-		set_blocksize(disk->part0, CD_FRAMESIZE);
 	}
 	mutex_unlock(&ctl_mutex);
 	mutex_unlock(&pktcdvd_mutex);
@@ -2526,7 +2522,6 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
 	__module_get(THIS_MODULE);
 
 	pd->bdev_file = bdev_file;
-	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
 	atomic_set(&pd->cdrw.pending_bios, 0);
 	pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->disk->disk_name);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 3/7] swapon(2)/swapoff(2): don't bother with block size
  2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
  2024-04-27 21:10 ` [PATCH 1/7] bcache_register(): don't bother with set_blocksize() Al Viro
  2024-04-27 21:10 ` [PATCH 2/7] pktcdvd: sort set_blocksize() calls out Al Viro
@ 2024-04-27 21:10 ` Al Viro
  2024-04-29  5:08   ` Christoph Hellwig
  2024-04-29  8:38   ` Christian Brauner
  2024-04-27 21:11 ` [PATCH 4/7] swapon(2): open swap with O_EXCL Al Viro
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 21:10 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

once upon a time that used to matter; these days we do swap IO for
swap devices at the level that doesn't give a damn about block size,
buffer_head or anything of that sort - just attach the page to
bio, set the location and size (the latter to PAGE_SIZE) and feed
into queue.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/swap.h |  1 -
 mm/swapfile.c        | 12 +-----------
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f53d608daa01..a5b640cca459 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -301,7 +301,6 @@ struct swap_info_struct {
 	struct file *bdev_file;		/* open handle of the bdev */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
-	unsigned int old_block_size;	/* seldom referenced */
 	struct completion comp;		/* seldom referenced */
 	spinlock_t lock;		/*
 					 * protect map scan related fields like
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4919423cce76..304f74d039f3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2417,7 +2417,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct inode *inode;
 	struct filename *pathname;
 	int err, found = 0;
-	unsigned int old_block_size;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -2529,7 +2528,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	}
 
 	swap_file = p->swap_file;
-	old_block_size = p->old_block_size;
 	p->swap_file = NULL;
 	p->max = 0;
 	swap_map = p->swap_map;
@@ -2553,7 +2551,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 
 	inode = mapping->host;
 	if (p->bdev_file) {
-		set_blocksize(p->bdev, old_block_size);
 		fput(p->bdev_file);
 		p->bdev_file = NULL;
 	}
@@ -2782,21 +2779,15 @@ static struct swap_info_struct *alloc_swap_info(void)
 
 static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 {
-	int error;
-
 	if (S_ISBLK(inode->i_mode)) {
 		p->bdev_file = bdev_file_open_by_dev(inode->i_rdev,
 				BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
 		if (IS_ERR(p->bdev_file)) {
-			error = PTR_ERR(p->bdev_file);
+			int error = PTR_ERR(p->bdev_file);
 			p->bdev_file = NULL;
 			return error;
 		}
 		p->bdev = file_bdev(p->bdev_file);
-		p->old_block_size = block_size(p->bdev);
-		error = set_blocksize(p->bdev, PAGE_SIZE);
-		if (error < 0)
-			return error;
 		/*
 		 * Zoned block devices contain zones that have a sequential
 		 * write only restriction.  Hence zoned block devices are not
@@ -3235,7 +3226,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	free_percpu(p->cluster_next_cpu);
 	p->cluster_next_cpu = NULL;
 	if (p->bdev_file) {
-		set_blocksize(p->bdev, p->old_block_size);
 		fput(p->bdev_file);
 		p->bdev_file = NULL;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
                   ` (2 preceding siblings ...)
  2024-04-27 21:10 ` [PATCH 3/7] swapon(2)/swapoff(2): don't bother with block size Al Viro
@ 2024-04-27 21:11 ` Al Viro
  2024-04-27 21:40   ` Linus Torvalds
                     ` (2 more replies)
  2024-04-27 21:11 ` [PATCH 5/7] swsusp: don't bother with setting block size Al Viro
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 21:11 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

... eliminating the need to reopen block devices so they could be
exclusively held.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/swap.h |  1 -
 mm/swapfile.c        | 19 ++-----------------
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a5b640cca459..7e61a4aef2fc 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -298,7 +298,6 @@ struct swap_info_struct {
 	unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
-	struct file *bdev_file;		/* open handle of the bdev */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
 	struct completion comp;		/* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 304f74d039f3..71cb76a2d0ce 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2550,10 +2550,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	exit_swap_address_space(p->type);
 
 	inode = mapping->host;
-	if (p->bdev_file) {
-		fput(p->bdev_file);
-		p->bdev_file = NULL;
-	}
 
 	inode_lock(inode);
 	inode->i_flags &= ~S_SWAPFILE;
@@ -2780,14 +2776,7 @@ static struct swap_info_struct *alloc_swap_info(void)
 static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 {
 	if (S_ISBLK(inode->i_mode)) {
-		p->bdev_file = bdev_file_open_by_dev(inode->i_rdev,
-				BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
-		if (IS_ERR(p->bdev_file)) {
-			int error = PTR_ERR(p->bdev_file);
-			p->bdev_file = NULL;
-			return error;
-		}
-		p->bdev = file_bdev(p->bdev_file);
+		p->bdev = I_BDEV(inode);
 		/*
 		 * Zoned block devices contain zones that have a sequential
 		 * write only restriction.  Hence zoned block devices are not
@@ -3028,7 +3017,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		name = NULL;
 		goto bad_swap;
 	}
-	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE|O_EXCL, 0);
 	if (IS_ERR(swap_file)) {
 		error = PTR_ERR(swap_file);
 		swap_file = NULL;
@@ -3225,10 +3214,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	p->percpu_cluster = NULL;
 	free_percpu(p->cluster_next_cpu);
 	p->cluster_next_cpu = NULL;
-	if (p->bdev_file) {
-		fput(p->bdev_file);
-		p->bdev_file = NULL;
-	}
 	inode = NULL;
 	destroy_swap_extents(p);
 	swap_cgroup_swapoff(p->type);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 5/7] swsusp: don't bother with setting block size
  2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
                   ` (3 preceding siblings ...)
  2024-04-27 21:11 ` [PATCH 4/7] swapon(2): open swap with O_EXCL Al Viro
@ 2024-04-27 21:11 ` Al Viro
  2024-04-29  5:10   ` Christoph Hellwig
  2024-04-29  8:40   ` Christian Brauner
  2024-04-27 21:12 ` [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize() Al Viro
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 21:11 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

same as with the swap...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/power/swap.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 5bc04bfe2db1..d9abb7ab031d 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -368,11 +368,7 @@ static int swsusp_swap_check(void)
 	if (IS_ERR(hib_resume_bdev_file))
 		return PTR_ERR(hib_resume_bdev_file);
 
-	res = set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
-	if (res < 0)
-		fput(hib_resume_bdev_file);
-
-	return res;
+	return 0;
 }
 
 /**
@@ -1574,7 +1570,6 @@ int swsusp_check(bool exclusive)
 	hib_resume_bdev_file = bdev_file_open_by_dev(swsusp_resume_device,
 				BLK_OPEN_READ, holder, NULL);
 	if (!IS_ERR(hib_resume_bdev_file)) {
-		set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
 		clear_page(swsusp_header);
 		error = hib_submit_io(REQ_OP_READ, swsusp_resume_block,
 					swsusp_header, NULL);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize()
  2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
                   ` (4 preceding siblings ...)
  2024-04-27 21:11 ` [PATCH 5/7] swsusp: don't bother with setting block size Al Viro
@ 2024-04-27 21:12 ` Al Viro
  2024-04-29  5:11   ` Christoph Hellwig
                     ` (2 more replies)
  2024-04-27 21:13 ` [PATCH 7/7] set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive Al Viro
  2024-05-03  3:18 ` [PATCHES v2][RFC] set_blocksize() rework Al Viro
  7 siblings, 3 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 21:12 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

We don't have bdev opened exclusive there.  And I'm rather dubious
about the need to do set_blocksize() anywhere in btrfs, to be
honest - there's some access to page cache of underlying block
devices in there, but it's nowhere near the hot paths, AFAICT.

In any case, btrfs_get_dev_args_from_path() only needs to read
the on-disk superblock and copy several fields out of it; all
callers are only interested in devices that are already opened
and brought into per-filesystem set, so setting the block size
is redundant for those and actively harmful if we are given
a pathname of unrelated device.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/btrfs/volumes.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f15591f3e54f..43af5a9fb547 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -482,10 +482,12 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 
 	if (flush)
 		sync_blockdev(bdev);
-	ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
-	if (ret) {
-		fput(*bdev_file);
-		goto error;
+	if (holder) {
+		ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
+		if (ret) {
+			fput(*bdev_file);
+			goto error;
+		}
 	}
 	invalidate_bdev(bdev);
 	*disk_super = btrfs_read_dev_super(bdev);
@@ -498,6 +500,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 	return 0;
 
 error:
+	*disk_super = NULL;
 	*bdev_file = NULL;
 	return ret;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 7/7] set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive
  2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
                   ` (5 preceding siblings ...)
  2024-04-27 21:12 ` [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize() Al Viro
@ 2024-04-27 21:13 ` Al Viro
  2024-04-29  5:12   ` Christoph Hellwig
  2024-04-29  8:42   ` Christian Brauner
  2024-05-03  3:18 ` [PATCHES v2][RFC] set_blocksize() rework Al Viro
  7 siblings, 2 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 21:13 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/bdev.c            | 14 ++++++++++----
 block/ioctl.c           | 21 ++++++++++++---------
 drivers/block/pktcdvd.c |  2 +-
 fs/btrfs/dev-replace.c  |  2 +-
 fs/btrfs/volumes.c      |  4 ++--
 fs/ext4/super.c         |  2 +-
 fs/reiserfs/journal.c   |  5 ++---
 fs/xfs/xfs_buf.c        |  2 +-
 include/linux/blkdev.h  |  2 +-
 9 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index b8e32d933a63..a89bce368b64 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -144,8 +144,11 @@ static void set_init_blocksize(struct block_device *bdev)
 	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
 }
 
-int set_blocksize(struct block_device *bdev, int size)
+int set_blocksize(struct file *file, int size)
 {
+	struct inode *inode = file->f_mapping->host;
+	struct block_device *bdev = I_BDEV(inode);
+
 	/* Size must be a power of two, and between 512 and PAGE_SIZE */
 	if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
 		return -EINVAL;
@@ -154,10 +157,13 @@ int set_blocksize(struct block_device *bdev, int size)
 	if (size < bdev_logical_block_size(bdev))
 		return -EINVAL;
 
+	if (!file->private_data)
+		return -EINVAL;
+
 	/* Don't change the size if it is same as current */
-	if (bdev->bd_inode->i_blkbits != blksize_bits(size)) {
+	if (inode->i_blkbits != blksize_bits(size)) {
 		sync_blockdev(bdev);
-		bdev->bd_inode->i_blkbits = blksize_bits(size);
+		inode->i_blkbits = blksize_bits(size);
 		kill_bdev(bdev);
 	}
 	return 0;
@@ -167,7 +173,7 @@ EXPORT_SYMBOL(set_blocksize);
 
 int sb_set_blocksize(struct super_block *sb, int size)
 {
-	if (set_blocksize(sb->s_bdev, size))
+	if (set_blocksize(sb->s_bdev_file, size))
 		return 0;
 	/* If we get here, we know size is power of two
 	 * and it's value is between 512 and PAGE_SIZE */
diff --git a/block/ioctl.c b/block/ioctl.c
index a9028a2c2db5..1c800364bc70 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -473,11 +473,14 @@ static int compat_hdio_getgeo(struct block_device *bdev,
 #endif
 
 /* set the logical block size */
-static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
+static int blkdev_bszset(struct file *file, blk_mode_t mode,
 		int __user *argp)
 {
+	// this one might be file_inode(file)->i_rdev - a rare valid
+	// use of file_inode() for those.
+	dev_t dev = I_BDEV(file->f_mapping->host)->bd_dev;
+	struct file *excl_file;
 	int ret, n;
-	struct file *file;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
@@ -487,13 +490,13 @@ static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
 		return -EFAULT;
 
 	if (mode & BLK_OPEN_EXCL)
-		return set_blocksize(bdev, n);
+		return set_blocksize(file, n);
 
-	file = bdev_file_open_by_dev(bdev->bd_dev, mode, &bdev, NULL);
-	if (IS_ERR(file))
+	excl_file = bdev_file_open_by_dev(dev, mode, &dev, NULL);
+	if (IS_ERR(excl_file))
 		return -EBUSY;
-	ret = set_blocksize(bdev, n);
-	fput(file);
+	ret = set_blocksize(excl_file, n);
+	fput(excl_file);
 	return ret;
 }
 
@@ -621,7 +624,7 @@ long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	case BLKBSZGET: /* get block device soft block size (cf. BLKSSZGET) */
 		return put_int(argp, block_size(bdev));
 	case BLKBSZSET:
-		return blkdev_bszset(bdev, mode, argp);
+		return blkdev_bszset(file, mode, argp);
 	case BLKGETSIZE64:
 		return put_u64(argp, bdev_nr_bytes(bdev));
 
@@ -681,7 +684,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	case BLKBSZGET_32: /* get the logical block size (cf. BLKSSZGET) */
 		return put_int(argp, bdev_logical_block_size(bdev));
 	case BLKBSZSET_32:
-		return blkdev_bszset(bdev, mode, argp);
+		return blkdev_bszset(file, mode, argp);
 	case BLKGETSIZE64_32:
 		return put_u64(argp, bdev_nr_bytes(bdev));
 
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 05933f25b397..8a2ce8070010 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2215,7 +2215,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
 		}
 		dev_info(ddev, "%lukB available on disc\n", lba << 1);
 	}
-	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
+	set_blocksize(bdev_file, CD_FRAMESIZE);
 
 	return 0;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7696beec4c21..7130040d92ab 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -316,7 +316,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
 	device->dev_stats_valid = 1;
-	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+	set_blocksize(bdev_file, BTRFS_BDEV_BLOCKSIZE);
 	device->fs_devices = fs_devices;
 
 	ret = btrfs_get_dev_zone_info(device, false);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43af5a9fb547..65c03ddecc59 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -483,7 +483,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 	if (flush)
 		sync_blockdev(bdev);
 	if (holder) {
-		ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
+		ret = set_blocksize(*bdev_file, BTRFS_BDEV_BLOCKSIZE);
 		if (ret) {
 			fput(*bdev_file);
 			goto error;
@@ -2717,7 +2717,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
 	device->dev_stats_valid = 1;
-	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+	set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
 
 	if (seeding_dev) {
 		btrfs_clear_sb_rdonly(sb);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 044135796f2b..9988b3a40b42 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5873,7 +5873,7 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
 
 	sb_block = EXT4_MIN_BLOCK_SIZE / blocksize;
 	offset = EXT4_MIN_BLOCK_SIZE % blocksize;
-	set_blocksize(bdev, blocksize);
+	set_blocksize(bdev_file, blocksize);
 	bh = __bread(bdev, sb_block, blocksize);
 	if (!bh) {
 		ext4_msg(sb, KERN_ERR, "couldn't read superblock of "
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index e539ccd39e1e..e477ee0ff35d 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2626,8 +2626,7 @@ static int journal_init_dev(struct super_block *super,
 					 MAJOR(jdev), MINOR(jdev), result);
 			return result;
 		} else if (jdev != super->s_dev)
-			set_blocksize(file_bdev(journal->j_bdev_file),
-				      super->s_blocksize);
+			set_blocksize(journal->j_bdev_file, super->s_blocksize);
 
 		return 0;
 	}
@@ -2643,7 +2642,7 @@ static int journal_init_dev(struct super_block *super,
 		return result;
 	}
 
-	set_blocksize(file_bdev(journal->j_bdev_file), super->s_blocksize);
+	set_blocksize(journal->j_bdev_file, super->s_blocksize);
 	reiserfs_info(super,
 		      "journal_init_dev: journal device: %pg\n",
 		      file_bdev(journal->j_bdev_file));
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f0fa02264eda..2dc0eacb0999 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2043,7 +2043,7 @@ xfs_setsize_buftarg(
 	btp->bt_meta_sectorsize = sectorsize;
 	btp->bt_meta_sectormask = sectorsize - 1;
 
-	if (set_blocksize(btp->bt_bdev, sectorsize)) {
+	if (set_blocksize(btp->bt_bdev_file, sectorsize)) {
 		xfs_warn(btp->bt_mount,
 			"Cannot set_blocksize to %u on device %pg",
 			sectorsize, btp->bt_bdev);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 172c91879999..20c749b2ebc2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1474,7 +1474,7 @@ static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time)
 }
 
 int bdev_read_only(struct block_device *bdev);
-int set_blocksize(struct block_device *bdev, int size);
+int set_blocksize(struct file *file, int size);
 
 int lookup_bdev(const char *pathname, dev_t *dev);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-27 21:11 ` [PATCH 4/7] swapon(2): open swap with O_EXCL Al Viro
@ 2024-04-27 21:40   ` Linus Torvalds
  2024-04-27 23:46     ` Al Viro
  2024-04-29  5:09   ` Christoph Hellwig
  2024-04-29  8:39   ` Christian Brauner
  2 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2024-04-27 21:40 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, 27 Apr 2024 at 14:11, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> ... eliminating the need to reopen block devices so they could be
> exclusively held.

This looks like a good change, but it raises the question of why we
did it this odd way to begin with?

Is it just because O_EXCL without O_CREAT is kind of odd, and only has
meaning for block devices?

Or is it just that before we used fiel pointers for block devices, the
old model made more sense?

Anyway, I like it, it just makes me go "why didn't we do it that way
originally?"

                Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-27 21:40   ` Linus Torvalds
@ 2024-04-27 23:46     ` Al Viro
  2024-04-28  1:25       ` Al Viro
  2024-04-28 18:19       ` Al Viro
  0 siblings, 2 replies; 51+ messages in thread
From: Al Viro @ 2024-04-27 23:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Christian Brauner, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 02:40:22PM -0700, Linus Torvalds wrote:
> On Sat, 27 Apr 2024 at 14:11, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > ... eliminating the need to reopen block devices so they could be
> > exclusively held.
> 
> This looks like a good change, but it raises the question of why we
> did it this odd way to begin with?
> 
> Is it just because O_EXCL without O_CREAT is kind of odd, and only has
> meaning for block devices?
> 
> Or is it just that before we used fiel pointers for block devices, the
> old model made more sense?
> 
> Anyway, I like it, it just makes me go "why didn't we do it that way
> originally?"

Exclusion for swap partitions:

commit 75e9c9e1bffbe4a1767172855296b94ccba28f71
Author: Alexander Viro <viro@math.psu.edu>
Date:   Mon Mar 4 22:56:47 2002 -0800

    [PATCH] death of is_mounted() and aother fixes


O_EXCL for block devices:

commit c366082d9ed0a0d3c46441d1b3fdf895d8e55ca9
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Aug 20 10:26:57 2003 -0700

    [PATCH] Allow O_EXCL on a block device to claim exclusive use.

IOW, O_EXCL hadn't been available at the time - it had been implemented
on top of bd_claim()/bd_release() introduced in the same earlier commit.

Switching swap exclusion to O_EXCL could've been done back in 2003 or
at any later point; it's just that swapon(2)/swapoff(2) is something that
rarely gets a look...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-27 23:46     ` Al Viro
@ 2024-04-28  1:25       ` Al Viro
  2024-04-28 18:19       ` Al Viro
  1 sibling, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-04-28  1:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Christian Brauner, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sun, Apr 28, 2024 at 12:46:23AM +0100, Al Viro wrote:

> Switching swap exclusion to O_EXCL could've been done back in 2003 or
> at any later point; it's just that swapon(2)/swapoff(2) is something that
> rarely gets a look...

BTW, a fun archaeological question: at which point has this
                /*
                 * Retrying may succeed; for example the folio may finish   
                 * writeback, or buffers may be cleaned.  This should not  
                 * happen very often; maybe we have old buffers attached to
                 * this blockdev's page cache and we're trying to change
                 * the block size?
                 */
                if (!try_to_free_buffers(folio)) {
                        end_block = ~0ULL;
                        goto unlock;
                }

in grow_dev_folio() (grow_dev_page() in earlier kernels) become unreachable?
I _think_ it was
commit fbc139f54fdb7edfec470421c2cc885d3796dfcd
Author: Linus Torvalds <torvalds@athlon.transmeta.com>
Date:   Mon Feb 4 20:19:55 2002 -0800

    v2.4.10.0.2 -> v2.4.10.0.3

      - more buffers-in-pagecache coherency

when set_blocksize() started to do
	sync_buffers(dev, 2);
	...
	invalidate_bdev(bdev, 1);
	truncate_inode_pages(bdev->bd_inode->i_mapping, 0);

at which point the "what if we'd found a page with attached buffers of the
wrong size?" should've become impossible.

Am I misreading that?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-27 23:46     ` Al Viro
  2024-04-28  1:25       ` Al Viro
@ 2024-04-28 18:19       ` Al Viro
  2024-04-28 18:46         ` Linus Torvalds
  1 sibling, 1 reply; 51+ messages in thread
From: Al Viro @ 2024-04-28 18:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Christian Brauner, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sun, Apr 28, 2024 at 12:46:23AM +0100, Al Viro wrote:
> On Sat, Apr 27, 2024 at 02:40:22PM -0700, Linus Torvalds wrote:
> > On Sat, 27 Apr 2024 at 14:11, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > ... eliminating the need to reopen block devices so they could be
> > > exclusively held.
> > 
> > This looks like a good change, but it raises the question of why we
> > did it this odd way to begin with?
> > 
> > Is it just because O_EXCL without O_CREAT is kind of odd, and only has
> > meaning for block devices?
> > 
> > Or is it just that before we used fiel pointers for block devices, the
> > old model made more sense?
> > 
> > Anyway, I like it, it just makes me go "why didn't we do it that way
> > originally?"
> 
> Exclusion for swap partitions:
> 
> commit 75e9c9e1bffbe4a1767172855296b94ccba28f71
> Author: Alexander Viro <viro@math.psu.edu>
> Date:   Mon Mar 4 22:56:47 2002 -0800
> 
>     [PATCH] death of is_mounted() and aother fixes
> 
> 
> O_EXCL for block devices:
> 
> commit c366082d9ed0a0d3c46441d1b3fdf895d8e55ca9
> Author: Andrew Morton <akpm@osdl.org>
> Date:   Wed Aug 20 10:26:57 2003 -0700
> 
>     [PATCH] Allow O_EXCL on a block device to claim exclusive use.
> 
> IOW, O_EXCL hadn't been available at the time - it had been implemented
> on top of bd_claim()/bd_release() introduced in the same earlier commit.
> 
> Switching swap exclusion to O_EXCL could've been done back in 2003 or
> at any later point; it's just that swapon(2)/swapoff(2) is something that
> rarely gets a look...

FWIW, pretty much the same can be done with zram - open with O_EXCL and to
hell with reopening.  Guys, are there any objections to that?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f0639df6cd18..d882a0c7b522 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -426,11 +426,10 @@ static void reset_bdev(struct zram *zram)
 	if (!zram->backing_dev)
 		return;
 
-	fput(zram->bdev_file);
 	/* hope filp_close flush all of IO */
 	filp_close(zram->backing_dev, NULL);
 	zram->backing_dev = NULL;
-	zram->bdev_file = NULL;
+	zram->bdev = NULL;
 	zram->disk->fops = &zram_devops;
 	kvfree(zram->bitmap);
 	zram->bitmap = NULL;
@@ -473,10 +472,8 @@ static ssize_t backing_dev_store(struct device *dev,
 	size_t sz;
 	struct file *backing_dev = NULL;
 	struct inode *inode;
-	struct address_space *mapping;
 	unsigned int bitmap_sz;
 	unsigned long nr_pages, *bitmap = NULL;
-	struct file *bdev_file = NULL;
 	int err;
 	struct zram *zram = dev_to_zram(dev);
 
@@ -497,15 +494,14 @@ static ssize_t backing_dev_store(struct device *dev,
 	if (sz > 0 && file_name[sz - 1] == '\n')
 		file_name[sz - 1] = 0x00;
 
-	backing_dev = filp_open(file_name, O_RDWR|O_LARGEFILE, 0);
+	backing_dev = filp_open(file_name, O_RDWR|O_LARGEFILE|O_EXCL, 0);
 	if (IS_ERR(backing_dev)) {
 		err = PTR_ERR(backing_dev);
 		backing_dev = NULL;
 		goto out;
 	}
 
-	mapping = backing_dev->f_mapping;
-	inode = mapping->host;
+	inode = backing_dev->f_mapping->host;
 
 	/* Support only block device in this moment */
 	if (!S_ISBLK(inode->i_mode)) {
@@ -513,14 +509,6 @@ static ssize_t backing_dev_store(struct device *dev,
 		goto out;
 	}
 
-	bdev_file = bdev_file_open_by_dev(inode->i_rdev,
-				BLK_OPEN_READ | BLK_OPEN_WRITE, zram, NULL);
-	if (IS_ERR(bdev_file)) {
-		err = PTR_ERR(bdev_file);
-		bdev_file = NULL;
-		goto out;
-	}
-
 	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
 	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
 	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
@@ -531,7 +519,7 @@ static ssize_t backing_dev_store(struct device *dev,
 
 	reset_bdev(zram);
 
-	zram->bdev_file = bdev_file;
+	zram->bdev = I_BDEV(inode);
 	zram->backing_dev = backing_dev;
 	zram->bitmap = bitmap;
 	zram->nr_pages = nr_pages;
@@ -544,9 +532,6 @@ static ssize_t backing_dev_store(struct device *dev,
 out:
 	kvfree(bitmap);
 
-	if (bdev_file)
-		fput(bdev_file);
-
 	if (backing_dev)
 		filp_close(backing_dev, NULL);
 
@@ -587,7 +572,7 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
 {
 	struct bio *bio;
 
-	bio = bio_alloc(file_bdev(zram->bdev_file), 1, parent->bi_opf, GFP_NOIO);
+	bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
 	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
 	__bio_add_page(bio, page, PAGE_SIZE, 0);
 	bio_chain(bio, parent);
@@ -703,7 +688,7 @@ static ssize_t writeback_store(struct device *dev,
 			continue;
 		}
 
-		bio_init(&bio, file_bdev(zram->bdev_file), &bio_vec, 1,
+		bio_init(&bio, zram->bdev, &bio_vec, 1,
 			 REQ_OP_WRITE | REQ_SYNC);
 		bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
 		__bio_add_page(&bio, page, PAGE_SIZE, 0);
@@ -785,7 +770,7 @@ static void zram_sync_read(struct work_struct *work)
 	struct bio_vec bv;
 	struct bio bio;
 
-	bio_init(&bio, file_bdev(zw->zram->bdev_file), &bv, 1, REQ_OP_READ);
+	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
 	bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
 	__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
 	zw->error = submit_bio_wait(&bio);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 37bf29f34d26..35e322144629 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -132,7 +132,7 @@ struct zram {
 	spinlock_t wb_limit_lock;
 	bool wb_limit_enable;
 	u64 bd_wb_limit;
-	struct file *bdev_file;
+	struct block_device *bdev;
 	unsigned long *bitmap;
 	unsigned long nr_pages;
 #endif

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-28 18:19       ` Al Viro
@ 2024-04-28 18:46         ` Linus Torvalds
  2024-04-28 19:07           ` Al Viro
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2024-04-28 18:46 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sun, 28 Apr 2024 at 11:19, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, pretty much the same can be done with zram - open with O_EXCL and to
> hell with reopening.  Guys, are there any objections to that?

Please do. The fewer of these strange "re-open block device" things we
have, the better.

I particularly dislike our "holder" logic, and this re-opening is one
source of nasty confusion, and if we could replace them all with just
the "O_EXCL uses the file itself as the holder", that would be
absolutely _lovely_.

                Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-28 18:46         ` Linus Torvalds
@ 2024-04-28 19:07           ` Al Viro
  2024-04-29  5:10             ` Christoph Hellwig
  0 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2024-04-28 19:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Christian Brauner, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sun, Apr 28, 2024 at 11:46:05AM -0700, Linus Torvalds wrote:
> On Sun, 28 Apr 2024 at 11:19, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > FWIW, pretty much the same can be done with zram - open with O_EXCL and to
> > hell with reopening.  Guys, are there any objections to that?
> 
> Please do. The fewer of these strange "re-open block device" things we
> have, the better.
> 
> I particularly dislike our "holder" logic, and this re-opening is one
> source of nasty confusion, and if we could replace them all with just
> the "O_EXCL uses the file itself as the holder", that would be
> absolutely _lovely_.

The tricky part is blk_holder_ops, and I'm no fonder of it than you are.

Christoph, do you have any plans along those lines for swap devices?
If they are not going to grow holder_ops, I'd say we should switch
to O_EXCL open and be done with that; zram is in the same situation,
AFAICS.

Might be worth a topic at LSF, actually...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/7] bcache_register(): don't bother with set_blocksize()
  2024-04-27 21:10 ` [PATCH 1/7] bcache_register(): don't bother with set_blocksize() Al Viro
@ 2024-04-29  5:06   ` Christoph Hellwig
  2024-04-29  8:37   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:06 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/7] pktcdvd: sort set_blocksize() calls out
  2024-04-27 21:10 ` [PATCH 2/7] pktcdvd: sort set_blocksize() calls out Al Viro
@ 2024-04-29  5:07   ` Christoph Hellwig
  2024-04-29  8:38   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:07 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:10:32PM +0100, Al Viro wrote:
> 1) it doesn't make any sense to have ->open() call set_blocksize() on the
> device being opened - the caller will override that anyway.
> 
> 2) setting block size on underlying device, OTOH, ought to be done when
> we are opening it exclusive - i.e. as part of pkt_open_dev().  Having
> it done at setup time doesn't guarantee us anything about the state
> at the time we start talking to it.  Worse, if you happen to have
> the underlying device containing e.g. ext2 with 4Kb blocks that
> is currently mounted r/o, that set_blocksize() will confuse the hell
> out of filesystem.

I brought some of this before and didn't dare to touch it because I
have no way of testing this code.  The changes looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

But I really wish we could find a dedicated tester for pktcdvd or
just drop this code..

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] swapon(2)/swapoff(2): don't bother with block size
  2024-04-27 21:10 ` [PATCH 3/7] swapon(2)/swapoff(2): don't bother with block size Al Viro
@ 2024-04-29  5:08   ` Christoph Hellwig
  2024-04-29  8:38   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:08 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-27 21:11 ` [PATCH 4/7] swapon(2): open swap with O_EXCL Al Viro
  2024-04-27 21:40   ` Linus Torvalds
@ 2024-04-29  5:09   ` Christoph Hellwig
  2024-04-29  8:39   ` Christian Brauner
  2 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:09 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

> -	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
> +	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE|O_EXCL, 0);

Can you add the proper whitespaces here while you touch it?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-28 19:07           ` Al Viro
@ 2024-04-29  5:10             ` Christoph Hellwig
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:10 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, linux-fsdevel, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

On Sun, Apr 28, 2024 at 08:07:09PM +0100, Al Viro wrote:
> Christoph, do you have any plans along those lines for swap devices?
> If they are not going to grow holder_ops, I'd say we should switch
> to O_EXCL open and be done with that; zram is in the same situation,
> AFAICS.

holder_ops right now are used for fs freezing, fs syncing and dead
device notification.  None of this is useful for swap, as is the
resize notification I plan to add eventually.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] swsusp: don't bother with setting block size
  2024-04-27 21:11 ` [PATCH 5/7] swsusp: don't bother with setting block size Al Viro
@ 2024-04-29  5:10   ` Christoph Hellwig
  2024-04-29  8:40   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:10 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize()
  2024-04-27 21:12 ` [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize() Al Viro
@ 2024-04-29  5:11   ` Christoph Hellwig
  2024-04-29  8:40   ` Christian Brauner
  2024-04-29 15:11   ` David Sterba
  2 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:11 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive
  2024-04-27 21:13 ` [PATCH 7/7] set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive Al Viro
@ 2024-04-29  5:12   ` Christoph Hellwig
  2024-04-29  8:42   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

Please turn the second half of your subject and move it to the actual
commit message..


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/7] bcache_register(): don't bother with set_blocksize()
  2024-04-27 21:10 ` [PATCH 1/7] bcache_register(): don't bother with set_blocksize() Al Viro
  2024-04-29  5:06   ` Christoph Hellwig
@ 2024-04-29  8:37   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christian Brauner @ 2024-04-29  8:37 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:10:07PM +0100, Al Viro wrote:
> We are not using __bread() anymore and read_cache_page_gfp() doesn't
> care about block size.  Moreover, we should *not* change block
> size on a device that is currently held exclusive - filesystems
> that use buffer cache expect the block numbers to be interpreted
> in units set by filesystem.
> 
> ACKed-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/7] pktcdvd: sort set_blocksize() calls out
  2024-04-27 21:10 ` [PATCH 2/7] pktcdvd: sort set_blocksize() calls out Al Viro
  2024-04-29  5:07   ` Christoph Hellwig
@ 2024-04-29  8:38   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christian Brauner @ 2024-04-29  8:38 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:10:32PM +0100, Al Viro wrote:
> 1) it doesn't make any sense to have ->open() call set_blocksize() on the
> device being opened - the caller will override that anyway.
> 
> 2) setting block size on underlying device, OTOH, ought to be done when
> we are opening it exclusive - i.e. as part of pkt_open_dev().  Having
> it done at setup time doesn't guarantee us anything about the state
> at the time we start talking to it.  Worse, if you happen to have
> the underlying device containing e.g. ext2 with 4Kb blocks that
> is currently mounted r/o, that set_blocksize() will confuse the hell
> out of filesystem.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/7] swapon(2)/swapoff(2): don't bother with block size
  2024-04-27 21:10 ` [PATCH 3/7] swapon(2)/swapoff(2): don't bother with block size Al Viro
  2024-04-29  5:08   ` Christoph Hellwig
@ 2024-04-29  8:38   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christian Brauner @ 2024-04-29  8:38 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:10:59PM +0100, Al Viro wrote:
> once upon a time that used to matter; these days we do swap IO for
> swap devices at the level that doesn't give a damn about block size,
> buffer_head or anything of that sort - just attach the page to
> bio, set the location and size (the latter to PAGE_SIZE) and feed
> into queue.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/7] swapon(2): open swap with O_EXCL
  2024-04-27 21:11 ` [PATCH 4/7] swapon(2): open swap with O_EXCL Al Viro
  2024-04-27 21:40   ` Linus Torvalds
  2024-04-29  5:09   ` Christoph Hellwig
@ 2024-04-29  8:39   ` Christian Brauner
  2 siblings, 0 replies; 51+ messages in thread
From: Christian Brauner @ 2024-04-29  8:39 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:11:28PM +0100, Al Viro wrote:
> ... eliminating the need to reopen block devices so they could be
> exclusively held.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] swsusp: don't bother with setting block size
  2024-04-27 21:11 ` [PATCH 5/7] swsusp: don't bother with setting block size Al Viro
  2024-04-29  5:10   ` Christoph Hellwig
@ 2024-04-29  8:40   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christian Brauner @ 2024-04-29  8:40 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:11:56PM +0100, Al Viro wrote:
> same as with the swap...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize()
  2024-04-27 21:12 ` [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize() Al Viro
  2024-04-29  5:11   ` Christoph Hellwig
@ 2024-04-29  8:40   ` Christian Brauner
  2024-04-29 15:11   ` David Sterba
  2 siblings, 0 replies; 51+ messages in thread
From: Christian Brauner @ 2024-04-29  8:40 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:12:30PM +0100, Al Viro wrote:
> We don't have bdev opened exclusive there.  And I'm rather dubious
> about the need to do set_blocksize() anywhere in btrfs, to be
> honest - there's some access to page cache of underlying block
> devices in there, but it's nowhere near the hot paths, AFAICT.
> 
> In any case, btrfs_get_dev_args_from_path() only needs to read
> the on-disk superblock and copy several fields out of it; all
> callers are only interested in devices that are already opened
> and brought into per-filesystem set, so setting the block size
> is redundant for those and actively harmful if we are given
> a pathname of unrelated device.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive
  2024-04-27 21:13 ` [PATCH 7/7] set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive Al Viro
  2024-04-29  5:12   ` Christoph Hellwig
@ 2024-04-29  8:42   ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christian Brauner @ 2024-04-29  8:42 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig, linux-block,
	Jens Axboe, linux-btrfs, Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:13:05PM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize()
  2024-04-27 21:12 ` [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize() Al Viro
  2024-04-29  5:11   ` Christoph Hellwig
  2024-04-29  8:40   ` Christian Brauner
@ 2024-04-29 15:11   ` David Sterba
  2024-04-30  2:05     ` Al Viro
  2 siblings, 1 reply; 51+ messages in thread
From: David Sterba @ 2024-04-29 15:11 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

On Sat, Apr 27, 2024 at 10:12:30PM +0100, Al Viro wrote:
> We don't have bdev opened exclusive there.  And I'm rather dubious
> about the need to do set_blocksize() anywhere in btrfs, to be
> honest - there's some access to page cache of underlying block
> devices in there, but it's nowhere near the hot paths, AFAICT.

Long time ago we fixed a bug that involved set_blocksize(), 6f60cbd3ae44
("btrfs: access superblock via pagecache in scan_one_device").
Concurrent access from scan, mount and mkfs could interact and some
writes would be dropped, but the argument was rather not to use
set_blocksize.

I do not recall all the details but I think that the problem was when it
was called in the middle of the other operation in progress. The only
reason it's ever called is for the super block read and to call it
explicitly from our code rather than rely on some eventual call from
block layer.  But it's more than 10 years ago and things have changed,
we don't use buffer_head for superblock anymore.

> In any case, btrfs_get_dev_args_from_path() only needs to read
> the on-disk superblock and copy several fields out of it; all
> callers are only interested in devices that are already opened
> and brought into per-filesystem set, so setting the block size
> is redundant for those and actively harmful if we are given
> a pathname of unrelated device.

Calling set_blocksize on already opened devices will avoid the
scan/mount/mkfs interactions so this seems safe.

> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/btrfs/volumes.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f15591f3e54f..43af5a9fb547 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -482,10 +482,12 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
>  
>  	if (flush)
>  		sync_blockdev(bdev);
> -	ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
> -	if (ret) {
> -		fput(*bdev_file);
> -		goto error;
> +	if (holder) {
> +		ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);

The subject mentions a different function, you're removing it from
btrfs_get_bdev_and_sb() not btrfs_get_dev_args_from_path().

> +		if (ret) {
> +			fput(*bdev_file);
> +			goto error;
> +		}
>  	}
>  	invalidate_bdev(bdev);
>  	*disk_super = btrfs_read_dev_super(bdev);
> @@ -498,6 +500,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
>  	return 0;
>  
>  error:
> +	*disk_super = NULL;
>  	*bdev_file = NULL;
>  	return ret;
>  }
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize()
  2024-04-29 15:11   ` David Sterba
@ 2024-04-30  2:05     ` Al Viro
  0 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-04-30  2:05 UTC (permalink / raw)
  To: David Sterba
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

On Mon, Apr 29, 2024 at 05:11:25PM +0200, David Sterba wrote:
> > -	if (ret) {
> > -		fput(*bdev_file);
> > -		goto error;
> > +	if (holder) {
> > +		ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
> 
> The subject mentions a different function, you're removing it from
> btrfs_get_bdev_and_sb() not btrfs_get_dev_args_from_path().

... conditional upon holder being NULL, which happens only when
called by btrfs_get_dev_args_from_path().

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCHES v2][RFC] set_blocksize() rework
  2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
                   ` (6 preceding siblings ...)
  2024-04-27 21:13 ` [PATCH 7/7] set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive Al Viro
@ 2024-05-03  3:18 ` Al Viro
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
  7 siblings, 2 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:18 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

On Sat, Apr 27, 2024 at 10:09:20PM +0100, Al Viro wrote:
> 	Any buffer-cache based filesystem is going to be FUBAR
> if somebody manages to change block size of device under it,
> since primitives (sb_bread(), sb_getblk(), etc.) operate in
> terms of block numbers.  If block size suddenly doubles, so
> will the offsets from the beginning of device.  Results are
> not pretty, obviously.
> 
> 	The thing that (mostly) prevents that kind of mess
> is that most of the mechanisms that lead to block size
> change require the device being opened exclusive.  However,
> there are several exceptions that allow to do that without
> an exclusive open.  Fortunately, all of them require
> CAP_SYS_ADMIN, so it's not a security problem - anyone
> who already has that level of access can screw the system
> into the ground in any number of ways.  However, security
> problems or not, that crap should be fixed.
> 
> 	The series below eliminates these calls of set_blocksize()
> and changes calling conventsion of set_blocksize() so that it
> uses struct file * instead of struct block_device * to tell
> which device to act upon.  Unlike struct block_device, struct
> file has enough information to tell an exclusive open from
> non-exclusive one, so we can reject the operation in non-exclusive
> case.
> 
> 	The branch is available at
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.set_blocksize
> Individual patches in followups.
> 
> Review (and testing, obviously) would be very welcome.

Branch updated and force-pushed (same place).  Individual patches in
followups.

Changes:
* zram elimination of double-open added.
* hopefully better description of btrfs side of things.
* final commit split into switch of set_blocksize() to struct file
  and adding a check for exclusive open.
* chunk in Documentation/filesystems/porting.rst added.

Shortlog:
Al Viro (9):
      bcache_register(): don't bother with set_blocksize()
      pktcdvd: sort set_blocksize() calls out
      swapon(2)/swapoff(2): don't bother with block size
      swapon(2): open swap with O_EXCL
      zram: don't bother with reopening - just use O_EXCL for open
      swsusp: don't bother with setting block size
      btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens
      set_blocksize(): switch to passing struct file *
      make set_blocksize() fail unless block device is opened exclusive

Diffstat:
 Documentation/filesystems/porting.rst |  7 +++++++
 block/bdev.c                          | 14 ++++++++++----
 block/ioctl.c                         | 21 ++++++++++++---------
 drivers/block/pktcdvd.c               |  7 +------
 drivers/block/zram/zram_drv.c         | 29 +++++++----------------------
 drivers/block/zram/zram_drv.h         |  2 +-
 drivers/md/bcache/super.c             |  4 ----
 fs/btrfs/dev-replace.c                |  2 +-
 fs/btrfs/volumes.c                    | 13 ++++++++-----
 fs/ext4/super.c                       |  2 +-
 fs/reiserfs/journal.c                 |  5 ++---
 fs/xfs/xfs_buf.c                      |  2 +-
 include/linux/blkdev.h                |  2 +-
 include/linux/swap.h                  |  2 --
 kernel/power/swap.c                   |  7 +------
 mm/swapfile.c                         | 29 ++---------------------------
 16 files changed, 55 insertions(+), 93 deletions(-)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize()
  2024-05-03  3:18 ` [PATCHES v2][RFC] set_blocksize() rework Al Viro
@ 2024-05-03  3:23   ` Al Viro
  2024-05-03  3:23     ` [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out Al Viro
                       ` (7 more replies)
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
  1 sibling, 8 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

We are not using __bread() anymore and read_cache_page_gfp() doesn't
care about block size.  Moreover, we should *not* change block
size on a device that is currently held exclusive - filesystems
that use buffer cache expect the block numbers to be interpreted
in units set by filesystem.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
ACKed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/md/bcache/super.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 330bcd9ea4a9..0ee5e17ae2dd 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2554,10 +2554,6 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	if (IS_ERR(bdev_file))
 		goto out_free_sb;
 
-	err = "failed to set blocksize";
-	if (set_blocksize(file_bdev(bdev_file), 4096))
-		goto out_blkdev_put;
-
 	err = read_super(sb, file_bdev(bdev_file), &sb_disk);
 	if (err)
 		goto out_blkdev_put;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
@ 2024-05-03  3:23     ` Al Viro
  2024-05-03  3:23     ` [PATCH v2 3/9] swapon(2)/swapoff(2): don't bother with block size Al Viro
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

1) it doesn't make any sense to have ->open() call set_blocksize() on the
device being opened - the caller will override that anyway.

2) setting block size on underlying device, OTOH, ought to be done when
we are opening it exclusive - i.e. as part of pkt_open_dev().  Having
it done at setup time doesn't guarantee us anything about the state
at the time we start talking to it.  Worse, if you happen to have
the underlying device containing e.g. ext2 with 4Kb blocks that
is currently mounted r/o, that set_blocksize() will confuse the hell
out of filesystem.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/block/pktcdvd.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 21728e9ea5c3..05933f25b397 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2215,6 +2215,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
 		}
 		dev_info(ddev, "%lukB available on disc\n", lba << 1);
 	}
+	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
 	return 0;
 
@@ -2278,11 +2279,6 @@ static int pkt_open(struct gendisk *disk, blk_mode_t mode)
 		ret = pkt_open_dev(pd, mode & BLK_OPEN_WRITE);
 		if (ret)
 			goto out_dec;
-		/*
-		 * needed here as well, since ext2 (among others) may change
-		 * the blocksize at mount time
-		 */
-		set_blocksize(disk->part0, CD_FRAMESIZE);
 	}
 	mutex_unlock(&ctl_mutex);
 	mutex_unlock(&pktcdvd_mutex);
@@ -2526,7 +2522,6 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
 	__module_get(THIS_MODULE);
 
 	pd->bdev_file = bdev_file;
-	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
 	atomic_set(&pd->cdrw.pending_bios, 0);
 	pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->disk->disk_name);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 3/9] swapon(2)/swapoff(2): don't bother with block size
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
  2024-05-03  3:23     ` [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out Al Viro
@ 2024-05-03  3:23     ` Al Viro
  2024-05-03  3:23     ` [PATCH v2 4/9] swapon(2): open swap with O_EXCL Al Viro
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

once upon a time that used to matter; these days we do swap IO for
swap devices at the level that doesn't give a damn about block size,
buffer_head or anything of that sort - just attach the page to
bio, set the location and size (the latter to PAGE_SIZE) and feed
into queue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/swap.h |  1 -
 mm/swapfile.c        | 12 +-----------
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f53d608daa01..a5b640cca459 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -301,7 +301,6 @@ struct swap_info_struct {
 	struct file *bdev_file;		/* open handle of the bdev */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
-	unsigned int old_block_size;	/* seldom referenced */
 	struct completion comp;		/* seldom referenced */
 	spinlock_t lock;		/*
 					 * protect map scan related fields like
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4919423cce76..304f74d039f3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2417,7 +2417,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct inode *inode;
 	struct filename *pathname;
 	int err, found = 0;
-	unsigned int old_block_size;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -2529,7 +2528,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	}
 
 	swap_file = p->swap_file;
-	old_block_size = p->old_block_size;
 	p->swap_file = NULL;
 	p->max = 0;
 	swap_map = p->swap_map;
@@ -2553,7 +2551,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 
 	inode = mapping->host;
 	if (p->bdev_file) {
-		set_blocksize(p->bdev, old_block_size);
 		fput(p->bdev_file);
 		p->bdev_file = NULL;
 	}
@@ -2782,21 +2779,15 @@ static struct swap_info_struct *alloc_swap_info(void)
 
 static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 {
-	int error;
-
 	if (S_ISBLK(inode->i_mode)) {
 		p->bdev_file = bdev_file_open_by_dev(inode->i_rdev,
 				BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
 		if (IS_ERR(p->bdev_file)) {
-			error = PTR_ERR(p->bdev_file);
+			int error = PTR_ERR(p->bdev_file);
 			p->bdev_file = NULL;
 			return error;
 		}
 		p->bdev = file_bdev(p->bdev_file);
-		p->old_block_size = block_size(p->bdev);
-		error = set_blocksize(p->bdev, PAGE_SIZE);
-		if (error < 0)
-			return error;
 		/*
 		 * Zoned block devices contain zones that have a sequential
 		 * write only restriction.  Hence zoned block devices are not
@@ -3235,7 +3226,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	free_percpu(p->cluster_next_cpu);
 	p->cluster_next_cpu = NULL;
 	if (p->bdev_file) {
-		set_blocksize(p->bdev, p->old_block_size);
 		fput(p->bdev_file);
 		p->bdev_file = NULL;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 4/9] swapon(2): open swap with O_EXCL
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
  2024-05-03  3:23     ` [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out Al Viro
  2024-05-03  3:23     ` [PATCH v2 3/9] swapon(2)/swapoff(2): don't bother with block size Al Viro
@ 2024-05-03  3:23     ` Al Viro
  2024-05-03  3:23     ` [PATCH v2 5/9] zram: don't bother with reopening - just use O_EXCL for open Al Viro
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

... eliminating the need to reopen block devices so they could be
exclusively held.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/swap.h |  1 -
 mm/swapfile.c        | 19 ++-----------------
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a5b640cca459..7e61a4aef2fc 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -298,7 +298,6 @@ struct swap_info_struct {
 	unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
-	struct file *bdev_file;		/* open handle of the bdev */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
 	struct completion comp;		/* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 304f74d039f3..7bba0b0d4924 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2550,10 +2550,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	exit_swap_address_space(p->type);
 
 	inode = mapping->host;
-	if (p->bdev_file) {
-		fput(p->bdev_file);
-		p->bdev_file = NULL;
-	}
 
 	inode_lock(inode);
 	inode->i_flags &= ~S_SWAPFILE;
@@ -2780,14 +2776,7 @@ static struct swap_info_struct *alloc_swap_info(void)
 static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 {
 	if (S_ISBLK(inode->i_mode)) {
-		p->bdev_file = bdev_file_open_by_dev(inode->i_rdev,
-				BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
-		if (IS_ERR(p->bdev_file)) {
-			int error = PTR_ERR(p->bdev_file);
-			p->bdev_file = NULL;
-			return error;
-		}
-		p->bdev = file_bdev(p->bdev_file);
+		p->bdev = I_BDEV(inode);
 		/*
 		 * Zoned block devices contain zones that have a sequential
 		 * write only restriction.  Hence zoned block devices are not
@@ -3028,7 +3017,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		name = NULL;
 		goto bad_swap;
 	}
-	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+	swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_EXCL, 0);
 	if (IS_ERR(swap_file)) {
 		error = PTR_ERR(swap_file);
 		swap_file = NULL;
@@ -3225,10 +3214,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	p->percpu_cluster = NULL;
 	free_percpu(p->cluster_next_cpu);
 	p->cluster_next_cpu = NULL;
-	if (p->bdev_file) {
-		fput(p->bdev_file);
-		p->bdev_file = NULL;
-	}
 	inode = NULL;
 	destroy_swap_extents(p);
 	swap_cgroup_swapoff(p->type);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 5/9] zram: don't bother with reopening - just use O_EXCL for open
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (2 preceding siblings ...)
  2024-05-03  3:23     ` [PATCH v2 4/9] swapon(2): open swap with O_EXCL Al Viro
@ 2024-05-03  3:23     ` Al Viro
  2024-05-03  3:23     ` [PATCH v2 6/9] swsusp: don't bother with setting block size Al Viro
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/block/zram/zram_drv.c | 29 +++++++----------------------
 drivers/block/zram/zram_drv.h |  2 +-
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f0639df6cd18..58c3466dcb17 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -426,11 +426,10 @@ static void reset_bdev(struct zram *zram)
 	if (!zram->backing_dev)
 		return;
 
-	fput(zram->bdev_file);
 	/* hope filp_close flush all of IO */
 	filp_close(zram->backing_dev, NULL);
 	zram->backing_dev = NULL;
-	zram->bdev_file = NULL;
+	zram->bdev = NULL;
 	zram->disk->fops = &zram_devops;
 	kvfree(zram->bitmap);
 	zram->bitmap = NULL;
@@ -473,10 +472,8 @@ static ssize_t backing_dev_store(struct device *dev,
 	size_t sz;
 	struct file *backing_dev = NULL;
 	struct inode *inode;
-	struct address_space *mapping;
 	unsigned int bitmap_sz;
 	unsigned long nr_pages, *bitmap = NULL;
-	struct file *bdev_file = NULL;
 	int err;
 	struct zram *zram = dev_to_zram(dev);
 
@@ -497,15 +494,14 @@ static ssize_t backing_dev_store(struct device *dev,
 	if (sz > 0 && file_name[sz - 1] == '\n')
 		file_name[sz - 1] = 0x00;
 
-	backing_dev = filp_open(file_name, O_RDWR|O_LARGEFILE, 0);
+	backing_dev = filp_open(file_name, O_RDWR | O_LARGEFILE | O_EXCL, 0);
 	if (IS_ERR(backing_dev)) {
 		err = PTR_ERR(backing_dev);
 		backing_dev = NULL;
 		goto out;
 	}
 
-	mapping = backing_dev->f_mapping;
-	inode = mapping->host;
+	inode = backing_dev->f_mapping->host;
 
 	/* Support only block device in this moment */
 	if (!S_ISBLK(inode->i_mode)) {
@@ -513,14 +509,6 @@ static ssize_t backing_dev_store(struct device *dev,
 		goto out;
 	}
 
-	bdev_file = bdev_file_open_by_dev(inode->i_rdev,
-				BLK_OPEN_READ | BLK_OPEN_WRITE, zram, NULL);
-	if (IS_ERR(bdev_file)) {
-		err = PTR_ERR(bdev_file);
-		bdev_file = NULL;
-		goto out;
-	}
-
 	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
 	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
 	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
@@ -531,7 +519,7 @@ static ssize_t backing_dev_store(struct device *dev,
 
 	reset_bdev(zram);
 
-	zram->bdev_file = bdev_file;
+	zram->bdev = I_BDEV(inode);
 	zram->backing_dev = backing_dev;
 	zram->bitmap = bitmap;
 	zram->nr_pages = nr_pages;
@@ -544,9 +532,6 @@ static ssize_t backing_dev_store(struct device *dev,
 out:
 	kvfree(bitmap);
 
-	if (bdev_file)
-		fput(bdev_file);
-
 	if (backing_dev)
 		filp_close(backing_dev, NULL);
 
@@ -587,7 +572,7 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
 {
 	struct bio *bio;
 
-	bio = bio_alloc(file_bdev(zram->bdev_file), 1, parent->bi_opf, GFP_NOIO);
+	bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
 	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
 	__bio_add_page(bio, page, PAGE_SIZE, 0);
 	bio_chain(bio, parent);
@@ -703,7 +688,7 @@ static ssize_t writeback_store(struct device *dev,
 			continue;
 		}
 
-		bio_init(&bio, file_bdev(zram->bdev_file), &bio_vec, 1,
+		bio_init(&bio, zram->bdev, &bio_vec, 1,
 			 REQ_OP_WRITE | REQ_SYNC);
 		bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
 		__bio_add_page(&bio, page, PAGE_SIZE, 0);
@@ -785,7 +770,7 @@ static void zram_sync_read(struct work_struct *work)
 	struct bio_vec bv;
 	struct bio bio;
 
-	bio_init(&bio, file_bdev(zw->zram->bdev_file), &bv, 1, REQ_OP_READ);
+	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
 	bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
 	__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
 	zw->error = submit_bio_wait(&bio);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 37bf29f34d26..35e322144629 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -132,7 +132,7 @@ struct zram {
 	spinlock_t wb_limit_lock;
 	bool wb_limit_enable;
 	u64 bd_wb_limit;
-	struct file *bdev_file;
+	struct block_device *bdev;
 	unsigned long *bitmap;
 	unsigned long nr_pages;
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 6/9] swsusp: don't bother with setting block size
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (3 preceding siblings ...)
  2024-05-03  3:23     ` [PATCH v2 5/9] zram: don't bother with reopening - just use O_EXCL for open Al Viro
@ 2024-05-03  3:23     ` Al Viro
  2024-05-03  3:23     ` [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens Al Viro
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

same as with the swap...

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/power/swap.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 5bc04bfe2db1..d9abb7ab031d 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -368,11 +368,7 @@ static int swsusp_swap_check(void)
 	if (IS_ERR(hib_resume_bdev_file))
 		return PTR_ERR(hib_resume_bdev_file);
 
-	res = set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
-	if (res < 0)
-		fput(hib_resume_bdev_file);
-
-	return res;
+	return 0;
 }
 
 /**
@@ -1574,7 +1570,6 @@ int swsusp_check(bool exclusive)
 	hib_resume_bdev_file = bdev_file_open_by_dev(swsusp_resume_device,
 				BLK_OPEN_READ, holder, NULL);
 	if (!IS_ERR(hib_resume_bdev_file)) {
-		set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
 		clear_page(swsusp_header);
 		error = hib_submit_io(REQ_OP_READ, swsusp_resume_block,
 					swsusp_header, NULL);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (4 preceding siblings ...)
  2024-05-03  3:23     ` [PATCH v2 6/9] swsusp: don't bother with setting block size Al Viro
@ 2024-05-03  3:23     ` Al Viro
  2024-05-10 13:41       ` David Sterba
  2024-05-03  3:23     ` [PATCH v2 8/9] set_blocksize(): switch to passing struct file * Al Viro
  2024-05-03  3:23     ` [PATCH v2 9/9] make set_blocksize() fail unless block device is opened exclusive Al Viro
  7 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

btrfs_get_bdev_and_sb() has two callers - btrfs_open_one_device(),
which asks for open to be exclusive and btrfs_get_dev_args_from_path(),
which doesn't.  Currently it does set_blocksize() in all cases.

I'm rather dubious about the need to do set_blocksize() anywhere in btrfs,
to be honest - there's some access to page cache of underlying block
devices in there, but it's nowhere near the hot paths, AFAICT.

In any case, btrfs_get_dev_args_from_path() only needs to read
the on-disk superblock and copy several fields out of it; all
callers are only interested in devices that are already opened
and brought into per-filesystem set, so setting the block size
is redundant for those and actively harmful if we are given
a pathname of unrelated device.

So we only need btrfs_get_bdev_and_sb() to call set_blocksize()
when it's asked to open exclusive.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/btrfs/volumes.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f15591f3e54f..43af5a9fb547 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -482,10 +482,12 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 
 	if (flush)
 		sync_blockdev(bdev);
-	ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
-	if (ret) {
-		fput(*bdev_file);
-		goto error;
+	if (holder) {
+		ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
+		if (ret) {
+			fput(*bdev_file);
+			goto error;
+		}
 	}
 	invalidate_bdev(bdev);
 	*disk_super = btrfs_read_dev_super(bdev);
@@ -498,6 +500,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 	return 0;
 
 error:
+	*disk_super = NULL;
 	*bdev_file = NULL;
 	return ret;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 8/9] set_blocksize(): switch to passing struct file *
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (5 preceding siblings ...)
  2024-05-03  3:23     ` [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens Al Viro
@ 2024-05-03  3:23     ` Al Viro
  2024-05-03  3:23     ` [PATCH v2 9/9] make set_blocksize() fail unless block device is opened exclusive Al Viro
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/bdev.c            | 11 +++++++----
 block/ioctl.c           | 21 ++++++++++++---------
 drivers/block/pktcdvd.c |  2 +-
 fs/btrfs/dev-replace.c  |  2 +-
 fs/btrfs/volumes.c      |  4 ++--
 fs/ext4/super.c         |  2 +-
 fs/reiserfs/journal.c   |  5 ++---
 fs/xfs/xfs_buf.c        |  2 +-
 include/linux/blkdev.h  |  2 +-
 9 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index b8e32d933a63..a329ff9be11d 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -144,8 +144,11 @@ static void set_init_blocksize(struct block_device *bdev)
 	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
 }
 
-int set_blocksize(struct block_device *bdev, int size)
+int set_blocksize(struct file *file, int size)
 {
+	struct inode *inode = file->f_mapping->host;
+	struct block_device *bdev = I_BDEV(inode);
+
 	/* Size must be a power of two, and between 512 and PAGE_SIZE */
 	if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
 		return -EINVAL;
@@ -155,9 +158,9 @@ int set_blocksize(struct block_device *bdev, int size)
 		return -EINVAL;
 
 	/* Don't change the size if it is same as current */
-	if (bdev->bd_inode->i_blkbits != blksize_bits(size)) {
+	if (inode->i_blkbits != blksize_bits(size)) {
 		sync_blockdev(bdev);
-		bdev->bd_inode->i_blkbits = blksize_bits(size);
+		inode->i_blkbits = blksize_bits(size);
 		kill_bdev(bdev);
 	}
 	return 0;
@@ -167,7 +170,7 @@ EXPORT_SYMBOL(set_blocksize);
 
 int sb_set_blocksize(struct super_block *sb, int size)
 {
-	if (set_blocksize(sb->s_bdev, size))
+	if (set_blocksize(sb->s_bdev_file, size))
 		return 0;
 	/* If we get here, we know size is power of two
 	 * and it's value is between 512 and PAGE_SIZE */
diff --git a/block/ioctl.c b/block/ioctl.c
index a9028a2c2db5..1c800364bc70 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -473,11 +473,14 @@ static int compat_hdio_getgeo(struct block_device *bdev,
 #endif
 
 /* set the logical block size */
-static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
+static int blkdev_bszset(struct file *file, blk_mode_t mode,
 		int __user *argp)
 {
+	// this one might be file_inode(file)->i_rdev - a rare valid
+	// use of file_inode() for those.
+	dev_t dev = I_BDEV(file->f_mapping->host)->bd_dev;
+	struct file *excl_file;
 	int ret, n;
-	struct file *file;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
@@ -487,13 +490,13 @@ static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
 		return -EFAULT;
 
 	if (mode & BLK_OPEN_EXCL)
-		return set_blocksize(bdev, n);
+		return set_blocksize(file, n);
 
-	file = bdev_file_open_by_dev(bdev->bd_dev, mode, &bdev, NULL);
-	if (IS_ERR(file))
+	excl_file = bdev_file_open_by_dev(dev, mode, &dev, NULL);
+	if (IS_ERR(excl_file))
 		return -EBUSY;
-	ret = set_blocksize(bdev, n);
-	fput(file);
+	ret = set_blocksize(excl_file, n);
+	fput(excl_file);
 	return ret;
 }
 
@@ -621,7 +624,7 @@ long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	case BLKBSZGET: /* get block device soft block size (cf. BLKSSZGET) */
 		return put_int(argp, block_size(bdev));
 	case BLKBSZSET:
-		return blkdev_bszset(bdev, mode, argp);
+		return blkdev_bszset(file, mode, argp);
 	case BLKGETSIZE64:
 		return put_u64(argp, bdev_nr_bytes(bdev));
 
@@ -681,7 +684,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	case BLKBSZGET_32: /* get the logical block size (cf. BLKSSZGET) */
 		return put_int(argp, bdev_logical_block_size(bdev));
 	case BLKBSZSET_32:
-		return blkdev_bszset(bdev, mode, argp);
+		return blkdev_bszset(file, mode, argp);
 	case BLKGETSIZE64_32:
 		return put_u64(argp, bdev_nr_bytes(bdev));
 
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 05933f25b397..8a2ce8070010 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2215,7 +2215,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
 		}
 		dev_info(ddev, "%lukB available on disc\n", lba << 1);
 	}
-	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
+	set_blocksize(bdev_file, CD_FRAMESIZE);
 
 	return 0;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7696beec4c21..7130040d92ab 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -316,7 +316,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
 	device->dev_stats_valid = 1;
-	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+	set_blocksize(bdev_file, BTRFS_BDEV_BLOCKSIZE);
 	device->fs_devices = fs_devices;
 
 	ret = btrfs_get_dev_zone_info(device, false);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43af5a9fb547..65c03ddecc59 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -483,7 +483,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 	if (flush)
 		sync_blockdev(bdev);
 	if (holder) {
-		ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
+		ret = set_blocksize(*bdev_file, BTRFS_BDEV_BLOCKSIZE);
 		if (ret) {
 			fput(*bdev_file);
 			goto error;
@@ -2717,7 +2717,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
 	device->dev_stats_valid = 1;
-	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+	set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
 
 	if (seeding_dev) {
 		btrfs_clear_sb_rdonly(sb);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 044135796f2b..9988b3a40b42 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5873,7 +5873,7 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
 
 	sb_block = EXT4_MIN_BLOCK_SIZE / blocksize;
 	offset = EXT4_MIN_BLOCK_SIZE % blocksize;
-	set_blocksize(bdev, blocksize);
+	set_blocksize(bdev_file, blocksize);
 	bh = __bread(bdev, sb_block, blocksize);
 	if (!bh) {
 		ext4_msg(sb, KERN_ERR, "couldn't read superblock of "
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index e539ccd39e1e..e477ee0ff35d 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2626,8 +2626,7 @@ static int journal_init_dev(struct super_block *super,
 					 MAJOR(jdev), MINOR(jdev), result);
 			return result;
 		} else if (jdev != super->s_dev)
-			set_blocksize(file_bdev(journal->j_bdev_file),
-				      super->s_blocksize);
+			set_blocksize(journal->j_bdev_file, super->s_blocksize);
 
 		return 0;
 	}
@@ -2643,7 +2642,7 @@ static int journal_init_dev(struct super_block *super,
 		return result;
 	}
 
-	set_blocksize(file_bdev(journal->j_bdev_file), super->s_blocksize);
+	set_blocksize(journal->j_bdev_file, super->s_blocksize);
 	reiserfs_info(super,
 		      "journal_init_dev: journal device: %pg\n",
 		      file_bdev(journal->j_bdev_file));
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f0fa02264eda..2dc0eacb0999 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2043,7 +2043,7 @@ xfs_setsize_buftarg(
 	btp->bt_meta_sectorsize = sectorsize;
 	btp->bt_meta_sectormask = sectorsize - 1;
 
-	if (set_blocksize(btp->bt_bdev, sectorsize)) {
+	if (set_blocksize(btp->bt_bdev_file, sectorsize)) {
 		xfs_warn(btp->bt_mount,
 			"Cannot set_blocksize to %u on device %pg",
 			sectorsize, btp->bt_bdev);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 172c91879999..20c749b2ebc2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1474,7 +1474,7 @@ static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time)
 }
 
 int bdev_read_only(struct block_device *bdev);
-int set_blocksize(struct block_device *bdev, int size);
+int set_blocksize(struct file *file, int size);
 
 int lookup_bdev(const char *pathname, dev_t *dev);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 9/9] make set_blocksize() fail unless block device is opened exclusive
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (6 preceding siblings ...)
  2024-05-03  3:23     ` [PATCH v2 8/9] set_blocksize(): switch to passing struct file * Al Viro
@ 2024-05-03  3:23     ` Al Viro
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  3:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Christoph Hellwig,
	linux-block, Jens Axboe, linux-btrfs, Rafael J. Wysocki,
	Andrew Morton

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/porting.rst | 7 +++++++
 block/bdev.c                          | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 1be76ef117b3..5503d5c614a7 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1134,3 +1134,10 @@ superblock of the main block device, i.e., the one stored in sb->s_bdev. Block
 device freezing now works for any block device owned by a given superblock, not
 just the main block device. The get_active_super() helper and bd_fsfreeze_sb
 pointer are gone.
+
+---
+
+**mandatory**
+
+set_blocksize() takes opened struct file instead of struct block_device now
+and it *must* be opened exclusive.
diff --git a/block/bdev.c b/block/bdev.c
index a329ff9be11d..a89bce368b64 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -157,6 +157,9 @@ int set_blocksize(struct file *file, int size)
 	if (size < bdev_logical_block_size(bdev))
 		return -EINVAL;
 
+	if (!file->private_data)
+		return -EINVAL;
+
 	/* Don't change the size if it is same as current */
 	if (inode->i_blkbits != blksize_bits(size)) {
 		sync_blockdev(bdev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize()
  2024-05-03  3:18 ` [PATCHES v2][RFC] set_blocksize() rework Al Viro
  2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
@ 2024-05-03  4:17   ` Al Viro
  2024-05-03  4:17     ` [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out Al Viro
                       ` (7 more replies)
  1 sibling, 8 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

We are not using __bread() anymore and read_cache_page_gfp() doesn't
care about block size.  Moreover, we should *not* change block
size on a device that is currently held exclusive - filesystems
that use buffer cache expect the block numbers to be interpreted
in units set by filesystem.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
ACKed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/md/bcache/super.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 330bcd9ea4a9..0ee5e17ae2dd 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2554,10 +2554,6 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	if (IS_ERR(bdev_file))
 		goto out_free_sb;
 
-	err = "failed to set blocksize";
-	if (set_blocksize(file_bdev(bdev_file), 4096))
-		goto out_blkdev_put;
-
 	err = read_super(sb, file_bdev(bdev_file), &sb_disk);
 	if (err)
 		goto out_blkdev_put;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
@ 2024-05-03  4:17     ` Al Viro
  2024-05-03  4:17     ` [PATCH v2 3/9] swapon(2)/swapoff(2): don't bother with block size Al Viro
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

1) it doesn't make any sense to have ->open() call set_blocksize() on the
device being opened - the caller will override that anyway.

2) setting block size on underlying device, OTOH, ought to be done when
we are opening it exclusive - i.e. as part of pkt_open_dev().  Having
it done at setup time doesn't guarantee us anything about the state
at the time we start talking to it.  Worse, if you happen to have
the underlying device containing e.g. ext2 with 4Kb blocks that
is currently mounted r/o, that set_blocksize() will confuse the hell
out of filesystem.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/block/pktcdvd.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 21728e9ea5c3..05933f25b397 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2215,6 +2215,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
 		}
 		dev_info(ddev, "%lukB available on disc\n", lba << 1);
 	}
+	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
 	return 0;
 
@@ -2278,11 +2279,6 @@ static int pkt_open(struct gendisk *disk, blk_mode_t mode)
 		ret = pkt_open_dev(pd, mode & BLK_OPEN_WRITE);
 		if (ret)
 			goto out_dec;
-		/*
-		 * needed here as well, since ext2 (among others) may change
-		 * the blocksize at mount time
-		 */
-		set_blocksize(disk->part0, CD_FRAMESIZE);
 	}
 	mutex_unlock(&ctl_mutex);
 	mutex_unlock(&pktcdvd_mutex);
@@ -2526,7 +2522,6 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
 	__module_get(THIS_MODULE);
 
 	pd->bdev_file = bdev_file;
-	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
 	atomic_set(&pd->cdrw.pending_bios, 0);
 	pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->disk->disk_name);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 3/9] swapon(2)/swapoff(2): don't bother with block size
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
  2024-05-03  4:17     ` [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out Al Viro
@ 2024-05-03  4:17     ` Al Viro
  2024-05-03  4:17     ` [PATCH v2 4/9] swapon(2): open swap with O_EXCL Al Viro
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

once upon a time that used to matter; these days we do swap IO for
swap devices at the level that doesn't give a damn about block size,
buffer_head or anything of that sort - just attach the page to
bio, set the location and size (the latter to PAGE_SIZE) and feed
into queue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/swap.h |  1 -
 mm/swapfile.c        | 12 +-----------
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f53d608daa01..a5b640cca459 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -301,7 +301,6 @@ struct swap_info_struct {
 	struct file *bdev_file;		/* open handle of the bdev */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
-	unsigned int old_block_size;	/* seldom referenced */
 	struct completion comp;		/* seldom referenced */
 	spinlock_t lock;		/*
 					 * protect map scan related fields like
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4919423cce76..304f74d039f3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2417,7 +2417,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct inode *inode;
 	struct filename *pathname;
 	int err, found = 0;
-	unsigned int old_block_size;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -2529,7 +2528,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	}
 
 	swap_file = p->swap_file;
-	old_block_size = p->old_block_size;
 	p->swap_file = NULL;
 	p->max = 0;
 	swap_map = p->swap_map;
@@ -2553,7 +2551,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 
 	inode = mapping->host;
 	if (p->bdev_file) {
-		set_blocksize(p->bdev, old_block_size);
 		fput(p->bdev_file);
 		p->bdev_file = NULL;
 	}
@@ -2782,21 +2779,15 @@ static struct swap_info_struct *alloc_swap_info(void)
 
 static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 {
-	int error;
-
 	if (S_ISBLK(inode->i_mode)) {
 		p->bdev_file = bdev_file_open_by_dev(inode->i_rdev,
 				BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
 		if (IS_ERR(p->bdev_file)) {
-			error = PTR_ERR(p->bdev_file);
+			int error = PTR_ERR(p->bdev_file);
 			p->bdev_file = NULL;
 			return error;
 		}
 		p->bdev = file_bdev(p->bdev_file);
-		p->old_block_size = block_size(p->bdev);
-		error = set_blocksize(p->bdev, PAGE_SIZE);
-		if (error < 0)
-			return error;
 		/*
 		 * Zoned block devices contain zones that have a sequential
 		 * write only restriction.  Hence zoned block devices are not
@@ -3235,7 +3226,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	free_percpu(p->cluster_next_cpu);
 	p->cluster_next_cpu = NULL;
 	if (p->bdev_file) {
-		set_blocksize(p->bdev, p->old_block_size);
 		fput(p->bdev_file);
 		p->bdev_file = NULL;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 4/9] swapon(2): open swap with O_EXCL
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
  2024-05-03  4:17     ` [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out Al Viro
  2024-05-03  4:17     ` [PATCH v2 3/9] swapon(2)/swapoff(2): don't bother with block size Al Viro
@ 2024-05-03  4:17     ` Al Viro
  2024-05-03  4:17     ` [PATCH v2 5/9] zram: don't bother with reopening - just use O_EXCL for open Al Viro
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

... eliminating the need to reopen block devices so they could be
exclusively held.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/swap.h |  1 -
 mm/swapfile.c        | 19 ++-----------------
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a5b640cca459..7e61a4aef2fc 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -298,7 +298,6 @@ struct swap_info_struct {
 	unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
-	struct file *bdev_file;		/* open handle of the bdev */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
 	struct completion comp;		/* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 304f74d039f3..7bba0b0d4924 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2550,10 +2550,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	exit_swap_address_space(p->type);
 
 	inode = mapping->host;
-	if (p->bdev_file) {
-		fput(p->bdev_file);
-		p->bdev_file = NULL;
-	}
 
 	inode_lock(inode);
 	inode->i_flags &= ~S_SWAPFILE;
@@ -2780,14 +2776,7 @@ static struct swap_info_struct *alloc_swap_info(void)
 static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 {
 	if (S_ISBLK(inode->i_mode)) {
-		p->bdev_file = bdev_file_open_by_dev(inode->i_rdev,
-				BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
-		if (IS_ERR(p->bdev_file)) {
-			int error = PTR_ERR(p->bdev_file);
-			p->bdev_file = NULL;
-			return error;
-		}
-		p->bdev = file_bdev(p->bdev_file);
+		p->bdev = I_BDEV(inode);
 		/*
 		 * Zoned block devices contain zones that have a sequential
 		 * write only restriction.  Hence zoned block devices are not
@@ -3028,7 +3017,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		name = NULL;
 		goto bad_swap;
 	}
-	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+	swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_EXCL, 0);
 	if (IS_ERR(swap_file)) {
 		error = PTR_ERR(swap_file);
 		swap_file = NULL;
@@ -3225,10 +3214,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	p->percpu_cluster = NULL;
 	free_percpu(p->cluster_next_cpu);
 	p->cluster_next_cpu = NULL;
-	if (p->bdev_file) {
-		fput(p->bdev_file);
-		p->bdev_file = NULL;
-	}
 	inode = NULL;
 	destroy_swap_extents(p);
 	swap_cgroup_swapoff(p->type);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 5/9] zram: don't bother with reopening - just use O_EXCL for open
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (2 preceding siblings ...)
  2024-05-03  4:17     ` [PATCH v2 4/9] swapon(2): open swap with O_EXCL Al Viro
@ 2024-05-03  4:17     ` Al Viro
  2024-05-03  4:17     ` [PATCH v2 6/9] swsusp: don't bother with setting block size Al Viro
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/block/zram/zram_drv.c | 29 +++++++----------------------
 drivers/block/zram/zram_drv.h |  2 +-
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f0639df6cd18..58c3466dcb17 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -426,11 +426,10 @@ static void reset_bdev(struct zram *zram)
 	if (!zram->backing_dev)
 		return;
 
-	fput(zram->bdev_file);
 	/* hope filp_close flush all of IO */
 	filp_close(zram->backing_dev, NULL);
 	zram->backing_dev = NULL;
-	zram->bdev_file = NULL;
+	zram->bdev = NULL;
 	zram->disk->fops = &zram_devops;
 	kvfree(zram->bitmap);
 	zram->bitmap = NULL;
@@ -473,10 +472,8 @@ static ssize_t backing_dev_store(struct device *dev,
 	size_t sz;
 	struct file *backing_dev = NULL;
 	struct inode *inode;
-	struct address_space *mapping;
 	unsigned int bitmap_sz;
 	unsigned long nr_pages, *bitmap = NULL;
-	struct file *bdev_file = NULL;
 	int err;
 	struct zram *zram = dev_to_zram(dev);
 
@@ -497,15 +494,14 @@ static ssize_t backing_dev_store(struct device *dev,
 	if (sz > 0 && file_name[sz - 1] == '\n')
 		file_name[sz - 1] = 0x00;
 
-	backing_dev = filp_open(file_name, O_RDWR|O_LARGEFILE, 0);
+	backing_dev = filp_open(file_name, O_RDWR | O_LARGEFILE | O_EXCL, 0);
 	if (IS_ERR(backing_dev)) {
 		err = PTR_ERR(backing_dev);
 		backing_dev = NULL;
 		goto out;
 	}
 
-	mapping = backing_dev->f_mapping;
-	inode = mapping->host;
+	inode = backing_dev->f_mapping->host;
 
 	/* Support only block device in this moment */
 	if (!S_ISBLK(inode->i_mode)) {
@@ -513,14 +509,6 @@ static ssize_t backing_dev_store(struct device *dev,
 		goto out;
 	}
 
-	bdev_file = bdev_file_open_by_dev(inode->i_rdev,
-				BLK_OPEN_READ | BLK_OPEN_WRITE, zram, NULL);
-	if (IS_ERR(bdev_file)) {
-		err = PTR_ERR(bdev_file);
-		bdev_file = NULL;
-		goto out;
-	}
-
 	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
 	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
 	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
@@ -531,7 +519,7 @@ static ssize_t backing_dev_store(struct device *dev,
 
 	reset_bdev(zram);
 
-	zram->bdev_file = bdev_file;
+	zram->bdev = I_BDEV(inode);
 	zram->backing_dev = backing_dev;
 	zram->bitmap = bitmap;
 	zram->nr_pages = nr_pages;
@@ -544,9 +532,6 @@ static ssize_t backing_dev_store(struct device *dev,
 out:
 	kvfree(bitmap);
 
-	if (bdev_file)
-		fput(bdev_file);
-
 	if (backing_dev)
 		filp_close(backing_dev, NULL);
 
@@ -587,7 +572,7 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
 {
 	struct bio *bio;
 
-	bio = bio_alloc(file_bdev(zram->bdev_file), 1, parent->bi_opf, GFP_NOIO);
+	bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
 	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
 	__bio_add_page(bio, page, PAGE_SIZE, 0);
 	bio_chain(bio, parent);
@@ -703,7 +688,7 @@ static ssize_t writeback_store(struct device *dev,
 			continue;
 		}
 
-		bio_init(&bio, file_bdev(zram->bdev_file), &bio_vec, 1,
+		bio_init(&bio, zram->bdev, &bio_vec, 1,
 			 REQ_OP_WRITE | REQ_SYNC);
 		bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
 		__bio_add_page(&bio, page, PAGE_SIZE, 0);
@@ -785,7 +770,7 @@ static void zram_sync_read(struct work_struct *work)
 	struct bio_vec bv;
 	struct bio bio;
 
-	bio_init(&bio, file_bdev(zw->zram->bdev_file), &bv, 1, REQ_OP_READ);
+	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
 	bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
 	__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
 	zw->error = submit_bio_wait(&bio);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 37bf29f34d26..35e322144629 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -132,7 +132,7 @@ struct zram {
 	spinlock_t wb_limit_lock;
 	bool wb_limit_enable;
 	u64 bd_wb_limit;
-	struct file *bdev_file;
+	struct block_device *bdev;
 	unsigned long *bitmap;
 	unsigned long nr_pages;
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 6/9] swsusp: don't bother with setting block size
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (3 preceding siblings ...)
  2024-05-03  4:17     ` [PATCH v2 5/9] zram: don't bother with reopening - just use O_EXCL for open Al Viro
@ 2024-05-03  4:17     ` Al Viro
  2024-05-03  4:17     ` [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens Al Viro
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

same as with the swap...

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/power/swap.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 5bc04bfe2db1..d9abb7ab031d 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -368,11 +368,7 @@ static int swsusp_swap_check(void)
 	if (IS_ERR(hib_resume_bdev_file))
 		return PTR_ERR(hib_resume_bdev_file);
 
-	res = set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
-	if (res < 0)
-		fput(hib_resume_bdev_file);
-
-	return res;
+	return 0;
 }
 
 /**
@@ -1574,7 +1570,6 @@ int swsusp_check(bool exclusive)
 	hib_resume_bdev_file = bdev_file_open_by_dev(swsusp_resume_device,
 				BLK_OPEN_READ, holder, NULL);
 	if (!IS_ERR(hib_resume_bdev_file)) {
-		set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
 		clear_page(swsusp_header);
 		error = hib_submit_io(REQ_OP_READ, swsusp_resume_block,
 					swsusp_header, NULL);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (4 preceding siblings ...)
  2024-05-03  4:17     ` [PATCH v2 6/9] swsusp: don't bother with setting block size Al Viro
@ 2024-05-03  4:17     ` Al Viro
  2024-05-03  4:17     ` [PATCH v2 8/9] set_blocksize(): switch to passing struct file * Al Viro
  2024-05-03  4:17     ` [PATCH v2 9/9] make set_blocksize() fail unless block device is opened exclusive Al Viro
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

btrfs_get_bdev_and_sb() has two callers - btrfs_open_one_device(),
which asks for open to be exclusive and btrfs_get_dev_args_from_path(),
which doesn't.  Currently it does set_blocksize() in all cases.

I'm rather dubious about the need to do set_blocksize() anywhere in btrfs,
to be honest - there's some access to page cache of underlying block
devices in there, but it's nowhere near the hot paths, AFAICT.

In any case, btrfs_get_dev_args_from_path() only needs to read
the on-disk superblock and copy several fields out of it; all
callers are only interested in devices that are already opened
and brought into per-filesystem set, so setting the block size
is redundant for those and actively harmful if we are given
a pathname of unrelated device.

So we only need btrfs_get_bdev_and_sb() to call set_blocksize()
when it's asked to open exclusive.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/btrfs/volumes.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f15591f3e54f..43af5a9fb547 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -482,10 +482,12 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 
 	if (flush)
 		sync_blockdev(bdev);
-	ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
-	if (ret) {
-		fput(*bdev_file);
-		goto error;
+	if (holder) {
+		ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
+		if (ret) {
+			fput(*bdev_file);
+			goto error;
+		}
 	}
 	invalidate_bdev(bdev);
 	*disk_super = btrfs_read_dev_super(bdev);
@@ -498,6 +500,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 	return 0;
 
 error:
+	*disk_super = NULL;
 	*bdev_file = NULL;
 	return ret;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 8/9] set_blocksize(): switch to passing struct file *
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (5 preceding siblings ...)
  2024-05-03  4:17     ` [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens Al Viro
@ 2024-05-03  4:17     ` Al Viro
  2024-05-03  4:17     ` [PATCH v2 9/9] make set_blocksize() fail unless block device is opened exclusive Al Viro
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/bdev.c            | 11 +++++++----
 block/ioctl.c           | 21 ++++++++++++---------
 drivers/block/pktcdvd.c |  2 +-
 fs/btrfs/dev-replace.c  |  2 +-
 fs/btrfs/volumes.c      |  4 ++--
 fs/ext4/super.c         |  2 +-
 fs/reiserfs/journal.c   |  5 ++---
 fs/xfs/xfs_buf.c        |  2 +-
 include/linux/blkdev.h  |  2 +-
 9 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index b8e32d933a63..a329ff9be11d 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -144,8 +144,11 @@ static void set_init_blocksize(struct block_device *bdev)
 	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
 }
 
-int set_blocksize(struct block_device *bdev, int size)
+int set_blocksize(struct file *file, int size)
 {
+	struct inode *inode = file->f_mapping->host;
+	struct block_device *bdev = I_BDEV(inode);
+
 	/* Size must be a power of two, and between 512 and PAGE_SIZE */
 	if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
 		return -EINVAL;
@@ -155,9 +158,9 @@ int set_blocksize(struct block_device *bdev, int size)
 		return -EINVAL;
 
 	/* Don't change the size if it is same as current */
-	if (bdev->bd_inode->i_blkbits != blksize_bits(size)) {
+	if (inode->i_blkbits != blksize_bits(size)) {
 		sync_blockdev(bdev);
-		bdev->bd_inode->i_blkbits = blksize_bits(size);
+		inode->i_blkbits = blksize_bits(size);
 		kill_bdev(bdev);
 	}
 	return 0;
@@ -167,7 +170,7 @@ EXPORT_SYMBOL(set_blocksize);
 
 int sb_set_blocksize(struct super_block *sb, int size)
 {
-	if (set_blocksize(sb->s_bdev, size))
+	if (set_blocksize(sb->s_bdev_file, size))
 		return 0;
 	/* If we get here, we know size is power of two
 	 * and it's value is between 512 and PAGE_SIZE */
diff --git a/block/ioctl.c b/block/ioctl.c
index a9028a2c2db5..1c800364bc70 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -473,11 +473,14 @@ static int compat_hdio_getgeo(struct block_device *bdev,
 #endif
 
 /* set the logical block size */
-static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
+static int blkdev_bszset(struct file *file, blk_mode_t mode,
 		int __user *argp)
 {
+	// this one might be file_inode(file)->i_rdev - a rare valid
+	// use of file_inode() for those.
+	dev_t dev = I_BDEV(file->f_mapping->host)->bd_dev;
+	struct file *excl_file;
 	int ret, n;
-	struct file *file;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
@@ -487,13 +490,13 @@ static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
 		return -EFAULT;
 
 	if (mode & BLK_OPEN_EXCL)
-		return set_blocksize(bdev, n);
+		return set_blocksize(file, n);
 
-	file = bdev_file_open_by_dev(bdev->bd_dev, mode, &bdev, NULL);
-	if (IS_ERR(file))
+	excl_file = bdev_file_open_by_dev(dev, mode, &dev, NULL);
+	if (IS_ERR(excl_file))
 		return -EBUSY;
-	ret = set_blocksize(bdev, n);
-	fput(file);
+	ret = set_blocksize(excl_file, n);
+	fput(excl_file);
 	return ret;
 }
 
@@ -621,7 +624,7 @@ long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	case BLKBSZGET: /* get block device soft block size (cf. BLKSSZGET) */
 		return put_int(argp, block_size(bdev));
 	case BLKBSZSET:
-		return blkdev_bszset(bdev, mode, argp);
+		return blkdev_bszset(file, mode, argp);
 	case BLKGETSIZE64:
 		return put_u64(argp, bdev_nr_bytes(bdev));
 
@@ -681,7 +684,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	case BLKBSZGET_32: /* get the logical block size (cf. BLKSSZGET) */
 		return put_int(argp, bdev_logical_block_size(bdev));
 	case BLKBSZSET_32:
-		return blkdev_bszset(bdev, mode, argp);
+		return blkdev_bszset(file, mode, argp);
 	case BLKGETSIZE64_32:
 		return put_u64(argp, bdev_nr_bytes(bdev));
 
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 05933f25b397..8a2ce8070010 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2215,7 +2215,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
 		}
 		dev_info(ddev, "%lukB available on disc\n", lba << 1);
 	}
-	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
+	set_blocksize(bdev_file, CD_FRAMESIZE);
 
 	return 0;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7696beec4c21..7130040d92ab 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -316,7 +316,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
 	device->dev_stats_valid = 1;
-	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+	set_blocksize(bdev_file, BTRFS_BDEV_BLOCKSIZE);
 	device->fs_devices = fs_devices;
 
 	ret = btrfs_get_dev_zone_info(device, false);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43af5a9fb547..65c03ddecc59 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -483,7 +483,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 	if (flush)
 		sync_blockdev(bdev);
 	if (holder) {
-		ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
+		ret = set_blocksize(*bdev_file, BTRFS_BDEV_BLOCKSIZE);
 		if (ret) {
 			fput(*bdev_file);
 			goto error;
@@ -2717,7 +2717,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
 	device->dev_stats_valid = 1;
-	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+	set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
 
 	if (seeding_dev) {
 		btrfs_clear_sb_rdonly(sb);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 044135796f2b..9988b3a40b42 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5873,7 +5873,7 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
 
 	sb_block = EXT4_MIN_BLOCK_SIZE / blocksize;
 	offset = EXT4_MIN_BLOCK_SIZE % blocksize;
-	set_blocksize(bdev, blocksize);
+	set_blocksize(bdev_file, blocksize);
 	bh = __bread(bdev, sb_block, blocksize);
 	if (!bh) {
 		ext4_msg(sb, KERN_ERR, "couldn't read superblock of "
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index e539ccd39e1e..e477ee0ff35d 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2626,8 +2626,7 @@ static int journal_init_dev(struct super_block *super,
 					 MAJOR(jdev), MINOR(jdev), result);
 			return result;
 		} else if (jdev != super->s_dev)
-			set_blocksize(file_bdev(journal->j_bdev_file),
-				      super->s_blocksize);
+			set_blocksize(journal->j_bdev_file, super->s_blocksize);
 
 		return 0;
 	}
@@ -2643,7 +2642,7 @@ static int journal_init_dev(struct super_block *super,
 		return result;
 	}
 
-	set_blocksize(file_bdev(journal->j_bdev_file), super->s_blocksize);
+	set_blocksize(journal->j_bdev_file, super->s_blocksize);
 	reiserfs_info(super,
 		      "journal_init_dev: journal device: %pg\n",
 		      file_bdev(journal->j_bdev_file));
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f0fa02264eda..2dc0eacb0999 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2043,7 +2043,7 @@ xfs_setsize_buftarg(
 	btp->bt_meta_sectorsize = sectorsize;
 	btp->bt_meta_sectormask = sectorsize - 1;
 
-	if (set_blocksize(btp->bt_bdev, sectorsize)) {
+	if (set_blocksize(btp->bt_bdev_file, sectorsize)) {
 		xfs_warn(btp->bt_mount,
 			"Cannot set_blocksize to %u on device %pg",
 			sectorsize, btp->bt_bdev);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 172c91879999..20c749b2ebc2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1474,7 +1474,7 @@ static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time)
 }
 
 int bdev_read_only(struct block_device *bdev);
-int set_blocksize(struct block_device *bdev, int size);
+int set_blocksize(struct file *file, int size);
 
 int lookup_bdev(const char *pathname, dev_t *dev);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 9/9] make set_blocksize() fail unless block device is opened exclusive
  2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
                       ` (6 preceding siblings ...)
  2024-05-03  4:17     ` [PATCH v2 8/9] set_blocksize(): switch to passing struct file * Al Viro
@ 2024-05-03  4:17     ` Al Viro
  7 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2024-05-03  4:17 UTC (permalink / raw)
  To: linux-fsdevel

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/porting.rst | 7 +++++++
 block/bdev.c                          | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 1be76ef117b3..5503d5c614a7 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1134,3 +1134,10 @@ superblock of the main block device, i.e., the one stored in sb->s_bdev. Block
 device freezing now works for any block device owned by a given superblock, not
 just the main block device. The get_active_super() helper and bd_fsfreeze_sb
 pointer are gone.
+
+---
+
+**mandatory**
+
+set_blocksize() takes opened struct file instead of struct block_device now
+and it *must* be opened exclusive.
diff --git a/block/bdev.c b/block/bdev.c
index a329ff9be11d..a89bce368b64 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -157,6 +157,9 @@ int set_blocksize(struct file *file, int size)
 	if (size < bdev_logical_block_size(bdev))
 		return -EINVAL;
 
+	if (!file->private_data)
+		return -EINVAL;
+
 	/* Don't change the size if it is same as current */
 	if (inode->i_blkbits != blksize_bits(size)) {
 		sync_blockdev(bdev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens
  2024-05-03  3:23     ` [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens Al Viro
@ 2024-05-10 13:41       ` David Sterba
  0 siblings, 0 replies; 51+ messages in thread
From: David Sterba @ 2024-05-10 13:41 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner,
	Christoph Hellwig, linux-block, Jens Axboe, linux-btrfs,
	Rafael J. Wysocki, Andrew Morton

On Fri, May 03, 2024 at 04:23:27AM +0100, Al Viro wrote:
> btrfs_get_bdev_and_sb() has two callers - btrfs_open_one_device(),
> which asks for open to be exclusive and btrfs_get_dev_args_from_path(),
> which doesn't.  Currently it does set_blocksize() in all cases.
> 
> I'm rather dubious about the need to do set_blocksize() anywhere in btrfs,
> to be honest - there's some access to page cache of underlying block
> devices in there, but it's nowhere near the hot paths, AFAICT.
> 
> In any case, btrfs_get_dev_args_from_path() only needs to read
> the on-disk superblock and copy several fields out of it; all
> callers are only interested in devices that are already opened
> and brought into per-filesystem set, so setting the block size
> is redundant for those and actively harmful if we are given
> a pathname of unrelated device.
> 
> So we only need btrfs_get_bdev_and_sb() to call set_blocksize()
> when it's asked to open exclusive.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Acked-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2024-05-10 13:41 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-27 21:09 [PATCHES][RFC] set_blocksize() rework Al Viro
2024-04-27 21:10 ` [PATCH 1/7] bcache_register(): don't bother with set_blocksize() Al Viro
2024-04-29  5:06   ` Christoph Hellwig
2024-04-29  8:37   ` Christian Brauner
2024-04-27 21:10 ` [PATCH 2/7] pktcdvd: sort set_blocksize() calls out Al Viro
2024-04-29  5:07   ` Christoph Hellwig
2024-04-29  8:38   ` Christian Brauner
2024-04-27 21:10 ` [PATCH 3/7] swapon(2)/swapoff(2): don't bother with block size Al Viro
2024-04-29  5:08   ` Christoph Hellwig
2024-04-29  8:38   ` Christian Brauner
2024-04-27 21:11 ` [PATCH 4/7] swapon(2): open swap with O_EXCL Al Viro
2024-04-27 21:40   ` Linus Torvalds
2024-04-27 23:46     ` Al Viro
2024-04-28  1:25       ` Al Viro
2024-04-28 18:19       ` Al Viro
2024-04-28 18:46         ` Linus Torvalds
2024-04-28 19:07           ` Al Viro
2024-04-29  5:10             ` Christoph Hellwig
2024-04-29  5:09   ` Christoph Hellwig
2024-04-29  8:39   ` Christian Brauner
2024-04-27 21:11 ` [PATCH 5/7] swsusp: don't bother with setting block size Al Viro
2024-04-29  5:10   ` Christoph Hellwig
2024-04-29  8:40   ` Christian Brauner
2024-04-27 21:12 ` [PATCH 6/7] btrfs_get_dev_args_from_path(): don't call set_blocksize() Al Viro
2024-04-29  5:11   ` Christoph Hellwig
2024-04-29  8:40   ` Christian Brauner
2024-04-29 15:11   ` David Sterba
2024-04-30  2:05     ` Al Viro
2024-04-27 21:13 ` [PATCH 7/7] set_blocksize(): switch to passing struct file *, fail if it's not opened exclusive Al Viro
2024-04-29  5:12   ` Christoph Hellwig
2024-04-29  8:42   ` Christian Brauner
2024-05-03  3:18 ` [PATCHES v2][RFC] set_blocksize() rework Al Viro
2024-05-03  3:23   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
2024-05-03  3:23     ` [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out Al Viro
2024-05-03  3:23     ` [PATCH v2 3/9] swapon(2)/swapoff(2): don't bother with block size Al Viro
2024-05-03  3:23     ` [PATCH v2 4/9] swapon(2): open swap with O_EXCL Al Viro
2024-05-03  3:23     ` [PATCH v2 5/9] zram: don't bother with reopening - just use O_EXCL for open Al Viro
2024-05-03  3:23     ` [PATCH v2 6/9] swsusp: don't bother with setting block size Al Viro
2024-05-03  3:23     ` [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens Al Viro
2024-05-10 13:41       ` David Sterba
2024-05-03  3:23     ` [PATCH v2 8/9] set_blocksize(): switch to passing struct file * Al Viro
2024-05-03  3:23     ` [PATCH v2 9/9] make set_blocksize() fail unless block device is opened exclusive Al Viro
2024-05-03  4:17   ` [PATCH v2 1/9] bcache_register(): don't bother with set_blocksize() Al Viro
2024-05-03  4:17     ` [PATCH v2 2/9] pktcdvd: sort set_blocksize() calls out Al Viro
2024-05-03  4:17     ` [PATCH v2 3/9] swapon(2)/swapoff(2): don't bother with block size Al Viro
2024-05-03  4:17     ` [PATCH v2 4/9] swapon(2): open swap with O_EXCL Al Viro
2024-05-03  4:17     ` [PATCH v2 5/9] zram: don't bother with reopening - just use O_EXCL for open Al Viro
2024-05-03  4:17     ` [PATCH v2 6/9] swsusp: don't bother with setting block size Al Viro
2024-05-03  4:17     ` [PATCH v2 7/9] btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens Al Viro
2024-05-03  4:17     ` [PATCH v2 8/9] set_blocksize(): switch to passing struct file * Al Viro
2024-05-03  4:17     ` [PATCH v2 9/9] make set_blocksize() fail unless block device is opened exclusive Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.