* 1 lock held by xfs_repair/276634
@ 2021-07-01 10:44 Bruno Goncalves
2021-07-02 0:05 ` Dave Chinner
0 siblings, 1 reply; 2+ messages in thread
From: Bruno Goncalves @ 2021-07-01 10:44 UTC (permalink / raw)
To: linux-fsdevel; +Cc: fstests, CKI Project
Hello,
We have hit this lock problem during xfstest [1] on aarch64. The whole
console.log is available on [2].
10847.013727] run fstests generic/023 at 2021-05-15 17:21:46
[10863.635560] XFS (sda4): Unmounting Filesystem
[10865.095328] BUG: sleeping function called from invalid context at (null):3550
[10865.102695] in_atomic(): 0, irqs_disabled(): 128, non_block: 0,
pid: 276634, name: xfs_repair
[10865.111223] 1 lock held by xfs_repair/276634:
[10865.115579] #0: ffff000168f654d0
(&tsk->futex_exit_mutex){+.+.}-{3:3}, at: futex_exit_release+0x40/0xe4
[10865.125091] irq event stamp: 150
[10865.128314] hardirqs last enabled at (149): [<ffff8000101a2778>]
uaccess_ttbr0_enable+0xa8/0xc0
[10865.137096] hardirqs last disabled at (150): [<ffff8000101a2838>]
uaccess_ttbr0_disable+0xa8/0xb4
[10865.145964] softirqs last enabled at (132): [<ffff800010016490>]
put_cpu_fpsimd_context+0x30/0x70
[10865.154921] softirqs last disabled at (130): [<ffff800010016408>]
get_cpu_fpsimd_context+0x8/0x60
[10865.163792] CPU: 31 PID: 276634 Comm: xfs_repair Not tainted 5.13.0-rc1 #1
[10865.170663] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS
F02 08/06/2019
[10865.178054] Call trace:
[10865.180496] dump_backtrace+0x0/0x1c0
[10865.184156] show_stack+0x24/0x30
[10865.187467] dump_stack+0xf8/0x164
[10865.190867] ___might_sleep+0x174/0x250
[10865.194700] __might_sleep+0x60/0xa0
[10865.198272] __might_fault+0x3c/0x90
[10865.201847] exit_robust_list+0xac/0x36c
[10865.205767] exit_robust_list+0x9c/0x36c
[10865.209686] futex_exit_release+0xa8/0xe4
[10865.213692] exit_mm_release+0x28/0x44
[10865.217438] exit_mm+0x2c/0x27c
[10865.220579] do_exit+0x1f0/0x454
[10865.223804] __arm64_sys_exit+0x24/0x2c
[10865.227638] invoke_syscall+0x50/0x120
[10865.231384] el0_svc_common.constprop.0+0x68/0x104
[10865.236172] do_el0_svc+0x30/0x9c
[10865.239483] el0_svc+0x2c/0x54
[10865.242538] el0_sync_handler+0x1a4/0x1b0
[10865.246544] el0_sync+0x19c/0x1c0
We don't reproduce this often, but the first time I've seen it was
with 'Commit: f36edc5533b2 - Merge tag 'arc-5.13-rc2' of
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc'
[1] https://gitlab.com/cki-project/kernel-tests/-/tree/main/filesystems/xfs/xfstests
[2] https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/05/15/303402899/build_aarch64_redhat%3A1264727321/tests/9991652_aarch64_2_console.log
Thank you,
Bruno Goncalves
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: 1 lock held by xfs_repair/276634
2021-07-01 10:44 1 lock held by xfs_repair/276634 Bruno Goncalves
@ 2021-07-02 0:05 ` Dave Chinner
0 siblings, 0 replies; 2+ messages in thread
From: Dave Chinner @ 2021-07-02 0:05 UTC (permalink / raw)
To: Bruno Goncalves; +Cc: linux-fsdevel, fstests, CKI Project
On Thu, Jul 01, 2021 at 12:44:30PM +0200, Bruno Goncalves wrote:
> Hello,
>
> We have hit this lock problem during xfstest [1] on aarch64. The whole
> console.log is available on [2].
fstests is not the place to report test failures. They should be
directed to the list for the subsystem that failed. In this case,
probably linux-xfs@vger.kernel.org. I haven't cc'd that list
because....
>
> 10847.013727] run fstests generic/023 at 2021-05-15 17:21:46
> [10863.635560] XFS (sda4): Unmounting Filesystem
> [10865.095328] BUG: sleeping function called from invalid context at (null):3550
> [10865.102695] in_atomic(): 0, irqs_disabled(): 128, non_block: 0,
> pid: 276634, name: xfs_repair
> [10865.111223] 1 lock held by xfs_repair/276634:
> [10865.115579] #0: ffff000168f654d0
> (&tsk->futex_exit_mutex){+.+.}-{3:3}, at: futex_exit_release+0x40/0xe4
> [10865.125091] irq event stamp: 150
> [10865.128314] hardirqs last enabled at (149): [<ffff8000101a2778>]
> uaccess_ttbr0_enable+0xa8/0xc0
> [10865.137096] hardirqs last disabled at (150): [<ffff8000101a2838>]
> uaccess_ttbr0_disable+0xa8/0xb4
> [10865.145964] softirqs last enabled at (132): [<ffff800010016490>]
> put_cpu_fpsimd_context+0x30/0x70
> [10865.154921] softirqs last disabled at (130): [<ffff800010016408>]
> get_cpu_fpsimd_context+0x8/0x60
> [10865.163792] CPU: 31 PID: 276634 Comm: xfs_repair Not tainted 5.13.0-rc1 #1
> [10865.170663] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS
> F02 08/06/2019
> [10865.178054] Call trace:
> [10865.180496] dump_backtrace+0x0/0x1c0
> [10865.184156] show_stack+0x24/0x30
> [10865.187467] dump_stack+0xf8/0x164
> [10865.190867] ___might_sleep+0x174/0x250
> [10865.194700] __might_sleep+0x60/0xa0
> [10865.198272] __might_fault+0x3c/0x90
> [10865.201847] exit_robust_list+0xac/0x36c
> [10865.205767] exit_robust_list+0x9c/0x36c
> [10865.209686] futex_exit_release+0xa8/0xe4
> [10865.213692] exit_mm_release+0x28/0x44
> [10865.217438] exit_mm+0x2c/0x27c
> [10865.220579] do_exit+0x1f0/0x454
> [10865.223804] __arm64_sys_exit+0x24/0x2c
> [10865.227638] invoke_syscall+0x50/0x120
> [10865.231384] el0_svc_common.constprop.0+0x68/0x104
> [10865.236172] do_el0_svc+0x30/0x9c
> [10865.239483] el0_svc+0x2c/0x54
> [10865.242538] el0_sync_handler+0x1a4/0x1b0
> [10865.246544] el0_sync+0x19c/0x1c0
... this is likely a futex bug or some other platform kernel
bug. xfs_repair is just the userspace application that is tripping
over it.
> We don't reproduce this often, but the first time I've seen it was
> with 'Commit: f36edc5533b2 - Merge tag 'arc-5.13-rc2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc'
>
> [1] https://gitlab.com/cki-project/kernel-tests/-/tree/main/filesystems/xfs/xfstests
> [2] https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/05/15/303402899/build_aarch64_redhat%3A1264727321/tests/9991652_aarch64_2_console.log
Yup, there's a second occurrence of this same "sleeping in
invalid context" bug from something called "stress-ng" on a rwsem:
[ 2277.799926] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1352
[ 2277.808464] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 125191, name: stress-ng
[ 2277.816908] no locks held by stress-ng/125191.
[ 2277.821356] irq event stamp: 2482
[ 2277.824682] hardirqs last enabled at (2481): [<ffff800010341e0c>] __uaccess_ttbr0_enable+0x7c/0x90
[ 2277.833742] hardirqs last disabled at (2482): [<ffff800010342130>] __do_sys_mincore+0x310/0x354
[ 2277.842448] softirqs last enabled at (30): [<ffff800010016490>] put_cpu_fpsimd_context+0x30/0x70
[ 2277.851329] softirqs last disabled at (28): [<ffff800010016408>] get_cpu_fpsimd_context+0x8/0x60
[ 2277.860125] CPU: 11 PID: 125191 Comm: stress-ng Not tainted 5.13.0-rc1 #1
[ 2277.866919] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS F02 08/06/2019
[ 2277.874319] Call trace:
[ 2277.876772] dump_backtrace+0x0/0x1c0
[ 2277.880443] show_stack+0x24/0x30
[ 2277.883765] dump_stack+0xf8/0x164
[ 2277.887170] ___might_sleep+0x174/0x250
[ 2277.891003] __might_sleep+0x60/0xa0
[ 2277.894575] down_read+0x38/0xa0
[ 2277.897802] __do_sys_mincore+0xe0/0x354
[ 2277.901723] __arm64_sys_mincore+0x28/0x8c
[ 2277.905816] invoke_syscall+0x50/0x120
[ 2277.909563] el0_svc_common.constprop.0+0x68/0x104
[ 2277.914350] do_el0_svc+0x30/0x9c
[ 2277.917661] el0_svc+0x2c/0x54
[ 2277.920716] el0_sync_handler+0x1a4/0x1b0
[ 2277.924722] el0_sync+0x19c/0x1c0
There are also RCU lock warnings immediately after this:
"kernel/sched/core.c:8304 Illegal context switch in RCU-sched read-side critical section!"
occuring in core memory allocation code, followed by other
interleaved warning mess.
So, really, this looks like a platform bug or unbalanced irq
enable/disable somewhere in the kernel and has nothing to do with
the xfs_repair process that triggered it...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2021-07-02 0:06 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-01 10:44 1 lock held by xfs_repair/276634 Bruno Goncalves
2021-07-02 0:05 ` Dave Chinner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.