All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: linux-xfs <linux-xfs@vger.kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	Brian Foster <bfoster@redhat.com>,
	Amir Goldstein <amir73il@gmail.com>
Subject: Re: [PATCH] xfs: don't reuse busy extents on extent trim
Date: Wed, 11 Jan 2023 22:41:43 +0800	[thread overview]
Message-ID: <5287e7e6-adea-e865-5818-9cc34400cd0b@linux.alibaba.com> (raw)
In-Reply-To: <CAOQ4uxiWajRgGG2V=dYhBmVJYiRmdD+7YgkH2DMWGz6BAOXjvg@mail.gmail.com>

Hi,

On 2022/5/26 19:34, Amir Goldstein wrote:
> On Tue, Feb 23, 2021 at 2:35 PM Brian Foster <bfoster@redhat.com> wrote:
>>
>> On Mon, Feb 22, 2021 at 10:27:45AM -0800, Darrick J. Wong wrote:
>>> On Mon, Feb 22, 2021 at 10:34:42AM -0500, Brian Foster wrote:
>>>> Freed extents are marked busy from the point the freeing transaction
>>>> commits until the associated CIL context is checkpointed to the log.
>>>> This prevents reuse and overwrite of recently freed blocks before
>>>> the changes are committed to disk, which can lead to corruption
>>>> after a crash. The exception to this rule is that metadata
>>>> allocation is allowed to reuse busy extents because metadata changes
>>>> are also logged.
>>>>
>>>> As of commit 97d3ac75e5e0 ("xfs: exact busy extent tracking"), XFS
>>>> has allowed modification or complete invalidation of outstanding
>>>> busy extents for metadata allocations. This implementation assumes
>>>> that use of the associated extent is imminent, which is not always
>>>> the case. For example, the trimmed extent might not satisfy the
>>>> minimum length of the allocation request, or the allocation
>>>> algorithm might be involved in a search for the optimal result based
>>>> on locality.
>>>>
>>>> generic/019 reproduces a corruption caused by this scenario. First,
>>>> a metadata block (usually a bmbt or symlink block) is freed from an
>>>> inode. A subsequent bmbt split on an unrelated inode attempts a near
>>>> mode allocation request that invalidates the busy block during the
>>>> search, but does not ultimately allocate it. Due to the busy state
>>>> invalidation, the block is no longer considered busy to subsequent
>>>> allocation. A direct I/O write request immediately allocates the
>>>> block and writes to it.
>>>

...

>>
> 
> Hi Brian,
> 
> This patch was one of my selected fixes to backport for v5.10.y.
> It has a very scary looking commit message and the change seems
> to be independent of any infrastructure changes(?).
> 
> The problem is that applying this patch to v5.10.y reliably reproduces
> this buffer corruption assertion [*] with test xfs/076.
> 
> This happens on the kdevops system that is using loop devices over
> sparse files inside qemu images. It does not reproduce on my small
> VM at home.
> 
> Normally, I would just drop this patch from the stable candidates queue
> and move on, but I thought you might be interested to investigate this
> reliable reproducer, because maybe this system exercises an error
> that is otherwise rare to hit.
> 
> It seemed weird to me that NOT reusing the extent would result in
> data corruption, but it could indicate that reusing the extent was masking
> the assertion and hiding another bug(?).
> 
> Can you think of another reason to explain the regression this fix
> introduces to 5.10.y?
> 
> Do you care to investigate this failure or shall I just move on?
> 
> Thanks,
> Amir.
> 
> [*]
> : XFS (loop5): Internal error xfs_trans_cancel at line 954 of file
> fs/xfs/xfs_trans.c.  Caller xfs_create+0x22f/0x590 [xfs]
> : CPU: 3 PID: 25481 Comm: touch Kdump: loaded Tainted: G            E
>     5.10.109-xfs-2 #8
> : Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
> 04/01/2014
> : Call Trace:
> :  dump_stack+0x6d/0x88
> :  xfs_trans_cancel+0x17b/0x1a0 [xfs]
> :  xfs_create+0x22f/0x590 [xfs]
> :  xfs_generic_create+0x245/0x310 [xfs]
> :  ? d_splice_alias+0x13a/0x3c0
> :  path_openat+0xe3f/0x1080
> :  do_filp_open+0x93/0x100
> :  ? handle_mm_fault+0x148e/0x1690
> :  ? __check_object_size+0x162/0x180
> :  do_sys_openat2+0x228/0x2d0
> :  do_sys_open+0x4b/0x80
> :  do_syscall_64+0x33/0x80
> :  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> : RIP: 0033:0x7f36b02eff1e
> : Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 e9 57 0d 00 8b 00
> 85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d
> 00 f0 ff ff 0
> : RSP: 002b:00007ffe7ef6ca10 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> : RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f36b02eff1e
> : RDX: 0000000000000941 RSI: 00007ffe7ef6ebfa RDI: 00000000ffffff9c
> : RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> : R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000002
> : R13: 00007ffe7ef6ebfa R14: 0000000000000001 R15: 0000000000000001
> : XFS (loop5): xfs_do_force_shutdown(0x8) called from line 955 of file
> fs/xfs/xfs_trans.c. Return address = ffffffffc08f5764
> : XFS (loop5): Corruption of in-memory data detected.  Shutting down filesystem
> : XFS (loop5): Please unmount the filesystem and rectify the problem(s)
> 

(...just for the record) We also encountered this issue but without commit
  xfs: don't reuse busy extents on extent trim
applied in our 5.10 codebase...

Need to find some time to look into that...

[  413.283300] XFS (loop1): Internal error xfs_trans_cancel at line 950 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x219/0x590 [xfs]
[  413.284295] CPU: 0 PID: 27484 Comm: touch Tainted: G            E     5.10.134-13.an8.x86_64 #1
[  413.284296] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
[  413.284297] Call Trace:
[  413.284314]  dump_stack+0x57/0x6e
[  413.284373]  xfs_trans_cancel+0xa3/0x110 [xfs]
[  413.284412]  xfs_create+0x219/0x590 [xfs]
[  413.284458]  xfs_generic_create+0x21f/0x2d0 [xfs]
[  413.284462]  path_openat+0xdee/0x1020
[  413.284464]  do_filp_open+0x80/0xd0
[  413.284467]  ? __check_object_size+0x16a/0x180
[  413.284469]  do_sys_openat2+0x207/0x2c0
[  413.284471]  do_sys_open+0x3b/0x60
[  413.284475]  do_syscall_64+0x33/0x40
[  413.284478]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[  413.284481] RIP: 0033:0x7fe623920252
[  413.284482] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 55 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[  413.284483] RSP: 002b:00007ffd7a38ca70 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  413.284485] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fe623920252
[  413.284486] RDX: 0000000000000941 RSI: 00007ffd7a38d79d RDI: 00000000ffffff9c
[  413.284487] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[  413.284488] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000001
[  413.284488] R13: 0000000000000001 R14: 00007ffd7a38d79d R15: 00007fe623bbf374
[  413.289856] XFS (loop1): xfs_do_force_shutdown(0x8) called from line 951 of file fs/xfs/xfs_trans.c. Return address = 000000003fe0b8ba
[  413.289858] XFS (loop1): Corruption of in-memory data detected.  Shutting down filesystem
[  413.290573] XFS (loop1): Please unmount the filesystem and rectify the problem(s)

Thanks,
Gao Xiang

  parent reply	other threads:[~2023-01-11 14:42 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-22 15:34 [PATCH] xfs: don't reuse busy extents on extent trim Brian Foster
2021-02-22 18:27 ` Darrick J. Wong
2021-02-23 12:31   ` Brian Foster
2021-02-23 18:22     ` Darrick J. Wong
2022-05-26 11:34     ` Amir Goldstein
2022-05-26 14:21       ` Brian Foster
2022-05-26 15:28         ` Amir Goldstein
2022-05-26 19:33           ` Amir Goldstein
2022-05-26 19:47           ` Brian Foster
2022-05-26 20:56             ` Amir Goldstein
2022-05-27  7:06               ` Amir Goldstein
2022-05-27 12:50                 ` Brian Foster
2022-05-27 14:16                   ` Amir Goldstein
2022-05-28 14:23             ` Amir Goldstein
2022-05-31 15:55               ` Brian Foster
2023-01-11 14:41       ` Gao Xiang [this message]
2023-01-11 21:06         ` Dave Chinner
2021-02-23  6:24 ` Chandan Babu R
2021-02-25  7:51 ` Christoph Hellwig
2021-02-25 18:05   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5287e7e6-adea-e865-5818-9cc34400cd0b@linux.alibaba.com \
    --to=hsiangkao@linux.alibaba.com \
    --cc=amir73il@gmail.com \
    --cc=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.