All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	Pankaj Raghav <p.raghav@samsung.com>,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 17/18] xfs: remove check for block sizes smaller than PAGE_SIZE
Date: Wed, 20 Sep 2023 12:13:26 +1000	[thread overview]
Message-ID: <ZQpVRnh9QgdoQZso@dread.disaster.area> (raw)
In-Reply-To: <20230918110510.66470-18-hare@suse.de>

On Mon, Sep 18, 2023 at 01:05:09PM +0200, Hannes Reinecke wrote:
> We now support block sizes larger than PAGE_SIZE, so this
> check is pointless.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  fs/xfs/xfs_super.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 1f77014c6e1a..67dcdd4dcf2d 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1651,18 +1651,6 @@ xfs_fs_fill_super(
>  		goto out_free_sb;
>  	}
>  
> -	/*
> -	 * Until this is fixed only page-sized or smaller data blocks work.
> -	 */
> -	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
> -		xfs_warn(mp,
> -		"File system with blocksize %d bytes. "
> -		"Only pagesize (%ld) or less will currently work.",
> -				mp->m_sb.sb_blocksize, PAGE_SIZE);
> -		error = -ENOSYS;
> -		goto out_free_sb;
> -	}

This really needs to be replaced with an EXPERIMENTAL warning -
we're not going to support these LBS configurations until we are
sure it doesn't eat data.

Anyway, smoke tests....

# mkfs.xfs -f -b size=64k /dev/pmem0
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=65536  blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1
log      =internal log           bsize=65536  blocks=1024, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
# mount /dev/pmem0 /mnt/test

Message from syslogd@test3 at Sep 20 11:23:32 ...
 kernel:[   73.521819] XFS: Assertion failed: PAGE_SHIFT >= sbp->sb_blocklog, file: fs/xfs/xfs_mount.c, line: 134

Message from syslogd@test3 at Sep 20 11:23:32 ...
 kernel:[   73.521819] XFS: Assertion failed: PAGE_SHIFT >= sbp->sb_blocklog, file: fs/xfs/xfs_mount.c, line: 134
Segmentation fault
#

Looks like this hasn't been tested with CONFIG_XFS_DEBUG=y. If
that's the case, I expect that none of this actually works... :/

I've attached a patch at the end of the email that allows XFS
filesystems to mount with debug enabled.

Next problem, filesystem created with a 32kB sector size:

# mkfs.xfs -f -b size=64k -s size=32k /dev/pmem0
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=32768 attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=65536  blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1
log      =internal log           bsize=65536  blocks=2709, version=2
         =                       sectsz=32768 sunit=1 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
#

and then running xfs_db on it to change the UUID:

# xfs_admin -U generate /dev/pmem0

Results in a kernel panic:

[  132.151886] XFS (pmem0): Mounting V5 Filesystem 3d96f860-2aa2-4e50-970c-134508b7954a
[  132.161673] XFS (pmem0): Ending clean mount
[  175.824015] XFS (pmem0): Unmounting Filesystem 3d96f860-2aa2-4e50-970c-134508b7954a
[  185.759251] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_mpage_readpage+0x7e5/0x7f0
[  185.766632] CPU: 1 PID: 4383 Comm: xfs_db Not tainted 6.6.0-rc2-dgc+ #1903
[  185.771882] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  185.778827] Call Trace:
[  185.780706]  <TASK>
[  185.782318]  dump_stack_lvl+0x37/0x50
[  185.785069]  dump_stack+0x10/0x20
[  185.787557]  panic+0x15b/0x300
[  185.789763]  ? do_mpage_readpage+0x7e5/0x7f0
[  185.792705]  __stack_chk_fail+0x14/0x20
[  185.795183]  do_mpage_readpage+0x7e5/0x7f0
[  185.797826]  ? blkdev_write_begin+0x30/0x30
[  185.800500]  ? blkdev_readahead+0x15/0x20
[  185.802894]  ? read_pages+0x5c/0x230
[  185.805023]  ? page_cache_ra_order+0x2ae/0x310
[  185.807538]  ? ondemand_readahead+0x1f1/0x3a0
[  185.809899]  ? page_cache_async_ra+0x26/0x30
[  185.812175]  ? filemap_get_pages+0x540/0x6d0
[  185.814327]  ? _copy_to_iter+0x65/0x4c0
[  185.816283]  ? filemap_read+0xfc/0x3a0
[  185.818086]  ? __fsnotify_parent+0x107/0x340
[  185.820142]  ? __might_sleep+0x42/0x70
[  185.821854]  ? blkdev_read_iter+0x6d/0x150
[  185.823697]  ? vfs_read+0x1b1/0x300
[  185.825307]  ? __x64_sys_pread64+0x8f/0xc0
[  185.827159]  ? irqentry_exit+0x33/0x40
[  185.828858]  ? do_syscall_64+0x35/0x80
[  185.830485]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  185.832677]  </TASK>
[  185.834068] Kernel Offset: disabled
[  185.835510] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_mpage_readpage+0x7e5/0x7f0 ]---

Probably related to the block device having a 32kB block size set on
the pmem device by xfs_db when it only really has a 4kB sector size....

Anyway, back to just using 64k fsb, 4k sector.  generic/001 ASSERT
fails immediately with:

[  111.785796] run fstests generic/001 at 2023-09-20 11:50:19
[  113.346797] XFS: Assertion failed: imap.br_startblock != DELAYSTARTBLOCK, file: fs/xfs/xfs_reflink.c, line: 1392
[  113.352512] ------------[ cut here ]------------
[  113.354793] kernel BUG at fs/xfs/xfs_message.c:102!
[  113.358444] invalid opcode: 0000 [#1] PREEMPT SMP
[  113.360769] CPU: 8 PID: 7581 Comm: cp Not tainted 6.6.0-rc2-dgc+ #1903
[  113.364183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  113.369784] RIP: 0010:assfail+0x35/0x40
[  113.372038] Code: c9 48 c7 c2 00 d8 6c 82 48 89 e5 48 89 f1 48 89 fe 48 c7 c7 d8 d3 60 82 e8 a8 fd ff ff 80 3d d1 36 ec 02 00 75 04 0f 0b 5d c3 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 63 f6 49 89
[  113.384178] RSP: 0018:ffffc9000962bca0 EFLAGS: 00010202
[  113.387467] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000007fffffff
[  113.392326] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8260d3d8
[  113.397235] RBP: ffffc9000962bca0 R08: 0000000000000000 R09: 000000000000000a
[  113.401449] R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000000
[  113.406312] R13: 00000000ffffff8b R14: 0000000000000000 R15: ffff88810de00f00
[  113.410562] FS:  00007f4f6219b500(0000) GS:ffff8885fec00000(0000) knlGS:0000000000000000
[  113.415506] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  113.418688] CR2: 0000564951bfcf50 CR3: 00000005cd6b8005 CR4: 0000000000060ee0
[  113.422825] Call Trace:
[  113.424302]  <TASK>
[  113.425566]  ? show_regs+0x61/0x70
[  113.427444]  ? die+0x37/0x90
[  113.429175]  ? do_trap+0xec/0x100
[  113.431016]  ? do_error_trap+0x6c/0x90
[  113.432951]  ? assfail+0x35/0x40
[  113.434677]  ? exc_invalid_op+0x52/0x70
[  113.436755]  ? assfail+0x35/0x40
[  113.438659]  ? asm_exc_invalid_op+0x1b/0x20
[  113.440864]  ? assfail+0x35/0x40
[  113.442671]  xfs_reflink_remap_blocks+0x197/0x350
[  113.445278]  xfs_file_remap_range+0xf3/0x320
[  113.447504]  do_clone_file_range+0xfe/0x2b0
[  113.449689]  vfs_clone_file_range+0x3f/0x150
[  113.452080]  ioctl_file_clone+0x52/0xa0
[  113.453600]  do_vfs_ioctl+0x485/0x8d0
[  113.455054]  ? selinux_file_ioctl+0x96/0x120
[  113.456637]  ? selinux_file_ioctl+0x96/0x120
[  113.458213]  __x64_sys_ioctl+0x73/0xd0
[  113.459598]  do_syscall_64+0x35/0x80
[  113.460840]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Problems with unexpected extent types in the reflink remap code.

This is due to the reflink operation finding a delalloc extent where
it should be finding a real extent. this implies the
xfs_flush_unmap_range() call in xfs_reflink_remap_prep() didn't
flush the full data range it was supposed to.
xfs_flush_unmap_range() is supposed to round the range out to:

	rounding = max_t(xfs_off_t, mp->m_sb.sb_blocksize, PAGE_SIZE);

the block size for these cases, so it is LBS aware. So maybe there's
a problem with filemap_write_and_wait_range() and/or
truncate_pagecache_range() when dealing with LBS enabled page cache?

So, yeah, debug XFS builds tell us straight away that important stuff
does not appear to be not working correctly. Debug really needs to
be enabled, otherwise silent data corruption situations like this
can go unnoticed by fstests tests...

-Dave
-- 
Dave Chinner
david@fromorbit.com


xfs: fix page cache size validation for LBS configuations

From: Dave Chinner <dchinner@redhat.com>

The page cache can index larger filesystem sizes on 32 bit systems
if large block sizes are enabled. Fix this code to take these
configurations into account.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_mount.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 0a0fd19573d8..35dfcbc1e576 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -131,11 +131,15 @@ xfs_sb_validate_fsb_count(
 	xfs_sb_t	*sbp,
 	uint64_t	nblocks)
 {
-	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
-	ASSERT(sbp->sb_blocklog >= BBSHIFT);
+	ASSERT(sbp->sb_blocklog <= XFS_MAX_BLOCKSIZE_LOG);
+	ASSERT(sbp->sb_blocklog >= XFS_MIN_BLOCKSIZE_LOG);
 
 	/* Limited by ULONG_MAX of page cache index */
-	if (nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
+	if (PAGE_SHIFT >= sbp->sb_blocklog)
+		nblocks >>= (PAGE_SHIFT - sbp->sb_blocklog);
+	else
+		nblocks <<= (sbp->sb_blocklog - PAGE_SHIFT);
+	if (nblocks > ULONG_MAX)
 		return -EFBIG;
 	return 0;
 }

  reply	other threads:[~2023-09-20  2:13 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-18 11:04 [RFC PATCH 00/18] block: update buffer_head for Large-block I/O Hannes Reinecke
2023-09-18 11:04 ` [PATCH 01/18] mm/readahead: rework loop in page_cache_ra_unbounded() Hannes Reinecke
     [not found]   ` <CGME20230920115645eucas1p1c8ed9bf515c4532b3e6995f8078a863b@eucas1p1.samsung.com>
2023-09-20 11:56     ` Pankaj Raghav
2023-09-20 14:13       ` Hannes Reinecke
2023-09-21  9:06         ` Pankaj Raghav
2023-09-20 14:18       ` Matthew Wilcox
2023-09-18 11:04 ` [PATCH 02/18] fs/mpage: use blocks_per_folio instead of blocks_per_page Hannes Reinecke
2023-09-18 13:15   ` Matthew Wilcox
2023-09-18 17:45     ` Hannes Reinecke
2023-09-18 11:04 ` [PATCH 03/18] block/buffer_head: introduce block_{index_to_sector,sector_to_index} Hannes Reinecke
2023-09-18 16:36   ` Matthew Wilcox
2023-09-18 17:42     ` Hannes Reinecke
2023-09-18 21:01       ` Matthew Wilcox
2023-09-18 11:04 ` [PATCH 04/18] fs/buffer.c: use accessor function to translate page index to sectors Hannes Reinecke
2023-10-20 19:37   ` Matthew Wilcox
2023-10-21  5:08     ` Matthew Wilcox
2023-10-23  5:03     ` Hannes Reinecke
2023-09-18 11:04 ` [PATCH 05/18] fs/mpage: " Hannes Reinecke
2023-09-18 11:04 ` [PATCH 06/18] fs: Allow fine-grained control of folio sizes Hannes Reinecke
2023-09-18 12:29   ` [lkp] [+550 bytes kernel size regression] [i386-tinyconfig] [8558b2228d] " kernel test robot
2023-09-18 11:04 ` [PATCH 07/18] mm/filemap: allocate folios with mapping order preference Hannes Reinecke
2023-09-18 13:41   ` Matthew Wilcox
2023-09-18 17:34     ` Hannes Reinecke
2023-09-18 11:05 ` [PATCH 08/18] mm/readahead: " Hannes Reinecke
2023-09-18 13:11   ` kernel test robot
2023-09-18 20:46   ` kernel test robot
2023-09-18 11:05 ` [PATCH 09/18] fs/buffer: use mapping order in grow_dev_page() Hannes Reinecke
2023-09-18 14:00   ` Matthew Wilcox
2023-09-18 17:38     ` Hannes Reinecke
2023-09-18 11:05 ` [PATCH 10/18] block/bdev: lift restrictions on supported blocksize Hannes Reinecke
2023-09-18 11:05 ` [PATCH 11/18] block/bdev: enable large folio support for large logical block sizes Hannes Reinecke
2023-09-18 11:05 ` [PATCH 12/18] brd: convert to folios Hannes Reinecke
2023-09-18 11:05 ` [PATCH 13/18] brd: abstract page_size conventions Hannes Reinecke
2023-09-18 11:05 ` [PATCH 14/18] brd: use memcpy_{to,from}_folio() Hannes Reinecke
2023-09-18 11:05 ` [PATCH 15/18] brd: make sector size configurable Hannes Reinecke
2023-09-18 11:05 ` [PATCH 16/18] brd: make logical " Hannes Reinecke
2023-09-18 11:05 ` [PATCH 17/18] xfs: remove check for block sizes smaller than PAGE_SIZE Hannes Reinecke
2023-09-20  2:13   ` Dave Chinner [this message]
2023-09-18 11:05 ` [PATCH 18/18] nvme: enable logical block size > PAGE_SIZE Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQpVRnh9QgdoQZso@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=axboe@kernel.dk \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.