linux-kernel.vger.kernel.org archive mirror
* Lockups due to "locking/rwsem: Make handoff bit handling more consistent"
@ 2022-06-17 13:43 Mel Gorman
  2022-06-17 14:29 ` Waiman Long
From: Mel Gorman @ 2022-06-17 13:43 UTC
  To: Waiman Long
  Cc: Zhenhua Ma, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	LKML, Michal Hocko

[-- Attachment #1: Type: text/plain, Size: 1825 bytes --]

Hi Waiman,

I've received reports of lockups happening in kernels that include
commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent"). The exact symptoms vary, but usually it's a soft lockup
(older kernel with a backport), a task that hangs and never exits, or a
machine that becomes generally unresponsive with ssh broken. The problem
started in 5.16 and was reliably bisected to commit d257cc8cb8d5. With
the patch reverted, 5.16, 5.17 and 5.18 finish the test successfully, but
I didn't test a revert on 5.19-rc2 because of other changes layered on top.

The reproducer is simple -- start pairs of CPU hogs pinned to a CPU with
different SCHED_RR priorities that run for a few seconds. It does not
trigger every time but usually happens within 10 attempts. On 5.16 at least,
the tasks failed to exit and kept retrying the exit through the following path:

[<0>] rwsem_down_write_slowpath+0x2ad/0x580
[<0>] unlink_file_vma+0x2c/0x50
[<0>] free_pgtables+0xbe/0x110
[<0>] exit_mmap+0xc1/0x220
[<0>] mmput+0x52/0x110
[<0>] do_exit+0x2ec/0xb00
[<0>] do_group_exit+0x2d/0x90
[<0>] get_signal+0xb6/0x920
[<0>] arch_do_signal_or_restart+0xba/0x700
[<0>] exit_to_user_mode_prepare+0xb7/0x230
[<0>] irqentry_exit_to_user_mode+0x5/0x20
[<0>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[<0>] preempt_schedule_thunk+0x16/0x18
[<0>] rwsem_down_write_slowpath+0x2ad/0x580
[<0>] unlink_file_vma+0x2c/0x50
[<0>] free_pgtables+0xbe/0x110
[<0>] exit_mmap+0xc1/0x220
[<0>] mmput+0x52/0x110
[<0>] do_exit+0x2ec/0xb00
[<0>] do_group_exit+0x2d/0x90
[<0>] get_signal+0xb6/0x920
[<0>] arch_do_signal_or_restart+0xba/0x700
[<0>] exit_to_user_mode_prepare+0xb7/0x230
[<0>] irqentry_exit_to_user_mode+0x5/0x20
[<0>] asm_sysvec_apic_timer_interrupt+0x12/0x20

The C file and shell script to run it are attached.

-- 
Mel Gorman
SUSE Labs

[-- Attachment #2: fsim.c --]
[-- Type: text/x-c, Size: 191 bytes --]

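#include <unistd.h>
#include <stdlib.h>
#include <signal.h>

/* SIGALRM handler: end the CPU hog when the alarm fires. */
void sig_handle(int sig) { (void)sig; exit(0); }

int main(void)
{
	/* volatile and initialised so the busy loop is well defined. */
	volatile unsigned long c = 0;

	signal(SIGALRM, sig_handle);
	alarm(10);	/* run for 10 seconds */
	while (1)
		c++;
}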

[-- Attachment #3: run-fsim.sh --]
[-- Type: application/x-sh, Size: 459 bytes --]
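
The script body is not reproduced in this view. As a rough sketch only, and
not the original run-fsim.sh, the setup described in the report (pairs of
CPU hogs pinned to a CPU with different SCHED_RR priorities) might look
something like the following. It assumes the util-linux tools taskset and
chrt are installed and that ./fsim was built from the attachment above. The
variable names, CPU list and priority values are all hypothetical and chosen
purely for illustration.

```shell
#!/bin/sh
# Hypothetical launcher sketch; NOT the original run-fsim.sh.
# Assumptions: ./fsim is built from fsim.c, taskset and chrt (util-linux)
# are installed, and the CPU list and priorities below are illustrative.

FSIM=${FSIM:-./fsim}        # path to the CPU hog binary (assumed)
CPUS=${CPUS:-"0 1"}         # CPUs to load, one pair of hogs per CPU (assumed)
PRIO_LOW=${PRIO_LOW:-1}     # SCHED_RR priority of the first hog (assumed)
PRIO_HIGH=${PRIO_HIGH:-2}   # SCHED_RR priority of the second hog (assumed)

# Print the command line that pins one hog to a CPU under SCHED_RR.
hog_cmd() {
    echo "taskset -c $1 chrt -r $2 $FSIM"
}

# For each CPU, start a pair of hogs with different SCHED_RR priorities.
# By default the commands are only printed; set RUN=1 to execute them.
run_pairs() {
    for cpu in $CPUS; do
        for prio in "$PRIO_LOW" "$PRIO_HIGH"; do
            if [ "${RUN:-0}" = 1 ]; then
                # Setting a realtime policy needs root or CAP_SYS_NICE.
                taskset -c "$cpu" chrt -r "$prio" "$FSIM" &
            else
                hog_cmd "$cpu" "$prio"
            fi
        done
    done
    # When actually running, wait for all hogs to exit.
    if [ "${RUN:-0}" = 1 ]; then
        wait
    fi
    return 0
}

run_pairs
```

Run as-is, the sketch only prints the commands it would start. Running it
with RUN=1 as root launches the hogs, and since the report says the problem
does not trigger every time, it would need to be repeated several times.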


Thread overview: 5+ messages
2022-06-17 13:43 Lockups due to "locking/rwsem: Make handoff bit handling more consistent" Mel Gorman
2022-06-17 14:29 ` Waiman Long
2022-06-20 14:09   ` Mel Gorman
2022-06-22  1:32     ` Waiman Long
2022-06-22 15:09       ` Mel Gorman
