* Question about kill a process group
@ 2022-03-29  8:07 Zhang Qiao
  2022-04-02  2:22 ` Zhang Qiao
  0 siblings, 1 reply; 12+ messages in thread

From: Zhang Qiao @ 2022-03-29 8:07 UTC (permalink / raw)
To: lkml; +Cc: ebiederm

[-- Attachment #1: Type: text/plain, Size: 6533 bytes --]

Hello everyone,

I got a hard-lockup panic when running the LTP syscall testcases:

[348439.713178] NMI watchdog: Watchdog detected hard LOCKUP on cpu 32
[348439.713236] irq event stamp: 0
[348439.713237] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[348439.713238] hardirqs last disabled at (0): [<ffffffff87cd1ea5>] copy_process+0x7f5/0x2160
[348439.713239] softirqs last enabled at (0): [<ffffffff87cd1ea5>] copy_process+0x7f5/0x2160
[348439.713240] softirqs last disabled at (0): [<0000000000000000>] 0x0
[348439.713241] CPU: 32 PID: 1151212 Comm: fork12 Kdump: loaded Tainted: G S 5.10.0+ #1
[348439.713242] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.35 10/20/2016
[348439.713243] RIP: 0010:queued_write_lock_slowpath+0x4d/0x80
[348439.713245] RSP: 0018:ffffa3a6bed4fe60 EFLAGS: 00000006
[348439.713246] RAX: 0000000000000500 RBX: ffffffff892060c0 RCX: 00000000000000ff
[348439.713247] RDX: 0000000000000500 RSI: 0000000000000100 RDI: ffffffff892060c0
[348439.713248] RBP: ffffffff892060c4 R08: 0000000000000001 R09: 0000000000000000
[348439.713249] R10: ffffa3a6bed4fde8 R11: 0000000000000000 R12: ffff96dfd3b68001
[348439.713250] R13: ffff96dfd3b68000 R14: ffff96dfd3b68c38 R15: ffff96e2cf1f51c0
[348439.713251] FS: 0000000000000000(0000) GS:ffff96edbc200000(0000) knlGS:0000000000000000
[348439.713252] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[348439.713253] CR2: 0000000000416ea0 CR3: 0000002d91812004 CR4: 00000000001706e0
[348439.713254] Call Trace:
[348439.713255]  do_raw_write_lock+0xa9/0xb0
[348439.713256]  _raw_write_lock_irq+0x5a/0x70
[348439.713256]  do_exit+0x429/0xd00
[348439.713257]  do_group_exit+0x39/0xb0
[348439.713258]  __x64_sys_exit_group+0x14/0x20
[348439.713259]  do_syscall_64+0x33/0x40
[348439.713260]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[348439.713260] RIP: 0033:0x7f59295a7066
[348439.713261] Code: Unable to access opcode bytes at RIP 0x7f59295a703c.
[348439.713262] RSP: 002b:00007fff0afeac38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[348439.713264] RAX: ffffffffffffffda RBX: 00007f5929694530 RCX: 00007f59295a7066
[348439.713265] RDX: 0000000000000002 RSI: 000000000000003c RDI: 0000000000000002
[348439.713266] RBP: 0000000000000002 R08: 00000000000000e7 R09: ffffffffffffff80
[348439.713267] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f5929694530
[348439.713268] R13: 0000000000000001 R14: 00007f5929697f68 R15: 0000000000000000
[348439.713269] Kernel panic - not syncing: Hard LOCKUP
[348439.713270] CPU: 32 PID: 1151212 Comm: fork12 Kdump: loaded Tainted: G S 5.10.0+ #1
[348439.713272] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.35 10/20/2016
[348439.713272] Call Trace:
[348439.713273]  <NMI>
[348439.713274]  dump_stack+0x77/0x97
[348439.713275]  panic+0x10c/0x2fb
[348439.713275]  nmi_panic+0x35/0x40
[348439.713276]  watchdog_hardlockup_check+0xeb/0x110
[348439.713277]  __perf_event_overflow+0x52/0xf0
[348439.713278]  handle_pmi_common+0x21a/0x320
[348439.713286]  intel_pmu_handle_irq+0xc9/0x1b0
[348439.713287]  perf_event_nmi_handler+0x24/0x40
[348439.713288]  nmi_handle+0xc3/0x2a0
[348439.713289]  default_do_nmi+0x49/0xf0
[348439.713289]  exc_nmi+0x146/0x160
[348439.713290]  end_repeat_nmi+0x16/0x55
[348439.713291] RIP: 0010:queued_write_lock_slowpath+0x4d/0x80
[348439.713293] RSP: 0018:ffffa3a6bed4fe60 EFLAGS: 00000006
[348439.713295] RAX: 0000000000000500 RBX: ffffffff892060c0 RCX: 00000000000000ff
[348439.713296] RDX: 0000000000000500 RSI: 0000000000000100 RDI: ffffffff892060c0
[348439.713296] RBP: ffffffff892060c4 R08: 0000000000000001 R09: 0000000000000000
[348439.713297] R10: ffffa3a6bed4fde8 R11: 0000000000000000 R12: ffff96dfd3b68001
[348439.713298] R13: ffff96dfd3b68000 R14: ffff96dfd3b68c38 R15: ffff96e2cf1f51c0
[348439.713300]  </NMI>
[348439.713301]  do_raw_write_lock+0xa9/0xb0
[348439.713302]  _raw_write_lock_irq+0x5a/0x70
[348439.713303]  do_exit+0x429/0xd00
[348439.713303]  do_group_exit+0x39/0xb0
[348439.713304]  __x64_sys_exit_group+0x14/0x20
[348439.713305]  do_syscall_64+0x33/0x40
[348439.713306]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[348439.713307] RIP: 0033:0x7f59295a7066
[348439.713308] Code: Unable to access opcode bytes at RIP 0x7f59295a703c.

While analyzing the vmcore, I noticed that many fork12 processes are waiting for the tasklist lock, either for reading or for writing (see the attached file all_cpu_stacks.log). Every fork12 process (all belonging to the same process group) calls kill(0, SIGQUIT) in its signal handler [1]. Each such call traverses all the processes in the process group and sends the signal to them one by one, which is very time-consuming and holds the tasklist read lock for a long time. At the same time, other processes exit after receiving the signal and try to take the tasklist write lock in exit_notify().
[1] fork12 testcase: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/fork/fork12.c

Some processes call kill(0, SIGQUIT) and wait for the tasklist read lock:

 #5 [ffff972a9b16fd78] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff972a9b16fd78] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a9b16fd90] do_wait at ffffffff992bc17d
 #8 [ffff972a9b16fdd0] kernel_wait4 at ffffffff992bd88d
 #9 [ffff972a9b16fe58] __do_sys_wait4 at ffffffff992bd9e5
#10 [ffff972a9b16ff30] do_syscall_64 at ffffffff9920432d
#11 [ffff972a9b16ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad

At the same time, some processes are exiting and waiting for the tasklist write lock:

 #5 [ffff972aa49a7e60] native_queued_spin_lock_slowpath at ffffffff9931ecb0
 #6 [ffff972aa49a7e60] queued_write_lock_slowpath at ffffffff993209e4
 #7 [ffff972aa49a7e78] do_raw_write_lock at ffffffff99320834
 #8 [ffff972aa49a7e88] do_exit at ffffffff992bcd78
 #9 [ffff972aa49a7f00] do_group_exit at ffffffff992bd719
#10 [ffff972aa49a7f28] __x64_sys_exit_group at ffffffff992bd7a4
#11 [ffff972aa49a7f30] do_syscall_64 at ffffffff9920432d
#12 [ffff972aa49a7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad

In this scenario, many processes are queued waiting for the tasklist read lock or write lock. If the wait queue gets long enough, a process waiting to take the write lock at exit_notify() can stall long enough to trigger a hard lockup.

I tried to solve this problem by avoiding traversing the process group multiple times when kill(0, xxxx) is called multiple times from the same process group, but it doesn't look like a good solution.

Is there any good idea for fixing this problem?

Thanks!
Qiao
[-- Attachment #2: all_cpu_stack.log --] [-- Type: text/plain, Size: 113592 bytes --] PID: 1533401 TASK: ffff8c27ae12a080 CPU: 0 COMMAND: "fork12" #0 [fffffe0000007e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000007e50] nmi_handle at ffffffff9922458e #2 [fffffe0000007eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000007ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000007ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff9729d17bfea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000040000 RDX: ffff8c50ff835300 RSI: 0000000000000007 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c27ae12a080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff9729d17bfea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff9729d17bfea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff9729d17bfec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff9729d17bff30] do_syscall_64 at ffffffff9920432d #9 [ffff9729d17bff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1640079 TASK: ffff8c294d8cc100 CPU: 1 COMMAND: "fork12" #0 [fffffe0000032e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000032e50] nmi_handle at ffffffff9922458e #2 [fffffe0000032eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000032ed0] do_nmi at ffffffff99224c96 #4 
[fffffe0000032ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a956a7ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000080000 RDX: ffff8c50ff875300 RSI: 000000000000002e RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c294d8cc100 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a956a7ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a956a7ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a956a7ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a956a7f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a956a7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1485110 TASK: ffff8c556902c100 CPU: 2 COMMAND: "fork12" #0 [fffffe000005de48] crash_nmi_callback at ffffffff99250113 #1 [fffffe000005de50] nmi_handle at ffffffff9922458e #2 [fffffe000005deb0] default_do_nmi at ffffffff99224aae #3 [fffffe000005ded0] do_nmi at ffffffff99224c96 #4 [fffffe000005def0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff972977e67ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 00000000000c0000 RDX: ffff8c50ff8b5300 RSI: 0000000000000005 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 
0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c556902c100 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972977e67ea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff972977e67ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972977e67ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972977e67f30] do_syscall_64 at ffffffff9920432d #9 [ffff972977e67f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1337040 TASK: ffff8c24389e0000 CPU: 3 COMMAND: "fork12" #0 [fffffe0000088e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000088e50] nmi_handle at ffffffff9922458e #2 [fffffe0000088eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000088ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000088ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff97286a5cfea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000100000 RDX: ffff8c50ff8f5300 RSI: 000000000000003e RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c24389e0000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff97286a5cfea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff97286a5cfea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff97286a5cfec0] 
__x64_sys_kill at ffffffff992c9f2d #8 [ffff97286a5cff30] do_syscall_64 at ffffffff9920432d #9 [ffff97286a5cff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebbcb8 RFLAGS: 00000206 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000206 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1648304 TASK: ffff8c2a8f132080 CPU: 4 COMMAND: "fork12" #0 [fffffe00000b3e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00000b3e50] nmi_handle at ffffffff9922458e #2 [fffffe00000b3eb0] default_do_nmi at ffffffff99224aae #3 [fffffe00000b3ed0] do_nmi at ffffffff99224c96 #4 [fffffe00000b3ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+272] RIP: ffffffff9931ecb0 RSP: ffff972aa49a7e60 RFLAGS: 00000002 RAX: 0000000000900001 RBX: ffffffff9a4050c0 RCX: 0000000000140000 RDX: ffff8c50ff935300 RSI: ffff8c50ffab5300 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a8f132c01 R13: ffff8c2a8f132080 R14: ffff8c2a7f6b8d28 R15: ffff8c2a8f132c88 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972aa49a7e60] native_queued_spin_lock_slowpath at ffffffff9931ecb0 #6 [ffff972aa49a7e60] queued_write_lock_slowpath at ffffffff993209e4 #7 [ffff972aa49a7e78] do_raw_write_lock at ffffffff99320834 #8 [ffff972aa49a7e88] do_exit at ffffffff992bcd78 #9 [ffff972aa49a7f00] do_group_exit at ffffffff992bd719 #10 [ffff972aa49a7f28] __x64_sys_exit_group at ffffffff992bd7a4 #11 [ffff972aa49a7f30] do_syscall_64 at ffffffff9920432d #12 [ffff972aa49a7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f41e5451586 RSP: 
00007ffd186fd3f8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f41e553e530 RCX: 00007f41e5451586 RDX: 0000000000000002 RSI: 000000000000003c RDI: 0000000000000002 RBP: 0000000000000002 R8: 00000000000000e7 R9: ffffffffffffff80 R10: 0000000000000002 R11: 0000000000000246 R12: 00007f41e553e530 R13: 0000000000000001 R14: 00007f41e5541f88 R15: 00007f41e553e8c0 ORIG_RAX: 00000000000000e7 CS: 0033 SS: 002b PID: 1541624 TASK: ffff8c26b035a080 CPU: 5 COMMAND: "fork12" #0 [fffffe00000dee48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00000dee50] nmi_handle at ffffffff9922458e #2 [fffffe00000deeb0] default_do_nmi at ffffffff99224aae #3 [fffffe00000deed0] do_nmi at ffffffff99224c96 #4 [fffffe00000deef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff9729e0a97ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000180000 RDX: ffff8c50ff975300 RSI: 0000000000000009 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c26b035a080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff9729e0a97ea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff9729e0a97ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff9729e0a97ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff9729e0a97f30] do_syscall_64 at ffffffff9920432d #9 [ffff9729e0a97f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 
000000000000003e CS: 0033 SS: 002b PID: 1638333 TASK: ffff8c56ff250000 CPU: 6 COMMAND: "fork12" #0 [fffffe0000109e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000109e50] nmi_handle at ffffffff9922458e #2 [fffffe0000109eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000109ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000109ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff972a9239fea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 00000000001c0000 RDX: ffff8c50ff9b5300 RSI: 0000000000000008 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c56ff250000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a9239fea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff972a9239fea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a9239fec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a9239ff30] do_syscall_64 at ffffffff9920432d #9 [ffff972a9239ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1638264 TASK: ffff8c56fe828000 CPU: 7 COMMAND: "fork12" #0 [fffffe0000134e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000134e50] nmi_handle at ffffffff9922458e #2 [fffffe0000134eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000134ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000134ef0] end_repeat_nmi at ffffffff99c016d4 
[exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a92177ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000200000 RDX: ffff8c50ff9f5300 RSI: 000000000000003f RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c56fe828000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a92177ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a92177ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a92177ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a92177f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a92177f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1640135 TASK: ffff8c5704872080 CPU: 8 COMMAND: "fork12" #0 [fffffe000015fe48] crash_nmi_callback at ffffffff99250113 #1 [fffffe000015fe50] nmi_handle at ffffffff9922458e #2 [fffffe000015feb0] default_do_nmi at ffffffff99224aae #3 [fffffe000015fed0] do_nmi at ffffffff99224c96 #4 [fffffe000015fef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff972a95867ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000240000 RDX: ffff8c50ffa35300 RSI: 000000000000002d RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 
R13: ffff8c5704872080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a95867ea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff972a95867ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a95867ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a95867f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a95867f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1640684 TASK: ffff8c294eec2080 CPU: 9 COMMAND: "fork12" #0 [fffffe000018ae48] crash_nmi_callback at ffffffff99250113 #1 [fffffe000018ae50] nmi_handle at ffffffff9922458e #2 [fffffe000018aeb0] default_do_nmi at ffffffff99224aae #3 [fffffe000018aed0] do_nmi at ffffffff99224c96 #4 [fffffe000018aef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff972a9687fea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000280000 RDX: ffff8c50ffa75300 RSI: 0000000000000033 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c294eec2080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a9687fea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff972a9687fea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a9687fec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a9687ff30] do_syscall_64 at 
ffffffff9920432d #9 [ffff972a9687ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1647196 TASK: ffff8c2a7eba8000 CPU: 10 COMMAND: "fork12" #0 [fffffe00001b5e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00001b5e50] nmi_handle at ffffffff9922458e #2 [fffffe00001b5eb0] default_do_nmi at ffffffff99224aae #3 [fffffe00001b5ed0] do_nmi at ffffffff99224c96 #4 [fffffe00001b5ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972aa2797d78 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 00000000002c0000 RDX: ffff8c50ffab5300 RSI: 0000000000000004 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff992bc149 R13: ffff8c2a7eba8000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972aa2797d78] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972aa2797d78] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972aa2797d90] do_wait at ffffffff992bc17d #8 [ffff972aa2797dd0] kernel_wait4 at ffffffff992bd88d #9 [ffff972aa2797e58] __do_sys_wait4 at ffffffff992bd9e5 #10 [ffff972aa2797f30] do_syscall_64 at ffffffff9920432d #11 [ffff972aa2797f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f41e5450f21 RSP: 00007ffd186fd438 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f41e5450f21 RDX: 0000000000000000 RSI: 00007ffd186fd44c RDI: 
00000000ffffffff RBP: 0000000000000002 R8: 00007f41e55445c0 R9: 00000000ffffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003d CS: 0033 SS: 002b PID: 1397892 TASK: ffff8c25555f8000 CPU: 11 COMMAND: "fork12" #0 [fffffe00001e0e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00001e0e50] nmi_handle at ffffffff9922458e #2 [fffffe00001e0eb0] default_do_nmi at ffffffff99224aae #3 [fffffe00001e0ed0] do_nmi at ffffffff99224c96 #4 [fffffe00001e0ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff9728dd5afea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000300000 RDX: ffff8c50ffaf5300 RSI: 0000000000000028 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c25555f8000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff9728dd5afea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff9728dd5afea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff9728dd5afec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff9728dd5aff30] do_syscall_64 at ffffffff9920432d #9 [ffff9728dd5aff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebbcb8 RFLAGS: 00000206 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000206 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1628949 TASK: ffff8c29289f4100 CPU: 12 COMMAND: "fork12" #0 [fffffe000020be48] crash_nmi_callback at 
ffffffff99250113 #1 [fffffe000020be50] nmi_handle at ffffffff9922458e #2 [fffffe000020beb0] default_do_nmi at ffffffff99224aae #3 [fffffe000020bed0] do_nmi at ffffffff99224c96 #4 [fffffe000020bef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff972a8125fea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000340000 RDX: ffff8c50ffb35300 RSI: 0000000000000029 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c29289f4100 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a8125fea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff972a8125fea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a8125fec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a8125ff30] do_syscall_64 at ffffffff9920432d #9 [ffff972a8125ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1288763 TASK: ffff8c22f73c4100 CPU: 13 COMMAND: "fork12" #0 [fffffe0000236e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000236e50] nmi_handle at ffffffff9922458e #2 [fffffe0000236eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000236ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000236ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff97280dd4fea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: 
ffffffff9a4050c0  RCX: 0000000000380000
    RDX: ffff8c50ffb75300  RSI: 000000000000001c  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c22f73c4100  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff97280dd4fea8] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff97280dd4fea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff97280dd4fec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff97280dd4ff30] do_syscall_64 at ffffffff9920432d
 #9 [ffff97280dd4ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebbcb8  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000206  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1641118  TASK: ffff8c28e4b72080  CPU: 14  COMMAND: "fork12"
 #0 [fffffe0000261e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000261e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000261eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000261ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000261ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+426]
    RIP: ffffffff9931ed4a  RSP: ffff972a97547ea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 00000000003c0000
    RDX: ffff8c50ffbb5300  RSI: 0000000000000011  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c28e4b72080  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a97547ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a
 #6 [ffff972a97547ea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a97547ec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff972a97547f30] do_syscall_64 at ffffffff9920432d
 #9 [ffff972a97547f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1635745  TASK: ffff8c56f6054100  CPU: 15  COMMAND: "fork12"
PID: 1593648  TASK: ffff8c5681332080  CPU: 16  COMMAND: "fork12"
PID: 1638805  TASK: ffff8c5701278000  CPU: 17  COMMAND: "fork12"
PID: 1640639  TASK: ffff8c294ea60000  CPU: 18  COMMAND: "fork12"
    [backtraces trimmed -- same pattern as CPU 14: spinning in
     native_queued_spin_lock_slowpath via queued_read_lock_slowpath
     from __x64_sys_kill]

PID: 1633393  TASK: ffff8c56f0fc2080  CPU: 19  COMMAND: "fork12"
 #0 [fffffe0000338e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000338e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000338eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000338ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000338ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: __lock_acquire+778]
    RIP: ffffffff9931c38a  RSP: ffff972a893bfd30  RFLAGS: 00000097
    RAX: 0000000000000000  RBX: 0000000000000046  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff8c23c095b3e0
    RBP: ffffffff9acf3050   R8: ffff8c23c095b3e0   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c56f0fc2080  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a893bfd30] __lock_acquire at ffffffff9931c38a
 #6 [ffff972a893bfd70] lock_acquire at ffffffff9931d133
 #7 [ffff972a893bfdd8] _raw_spin_lock_irqsave at ffffffff99ae873e
 #8 [ffff972a893bfdf8] __lock_task_sighand at ffffffff992c907c
 #9 [ffff972a893bfe30] do_send_sig_info at ffffffff992c913a
#10 [ffff972a893bfe88] __kill_pgrp_info at ffffffff992c9920
#11 [ffff972a893bfec0] __x64_sys_kill at ffffffff992c9f58
#12 [ffff972a893bff30] do_syscall_64 at ffffffff9920432d
#13 [ffff972a893bff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    [user-space registers trimmed -- fork12 in kill(2),
     RIP 00007f85c30e1a77]

PID: 1639858  TASK: ffff8c5704fd2080  CPU: 20  COMMAND: "fork12"
    [backtrace trimmed -- same reader pattern as CPU 14]

PID: 1650946  TASK: ffff8c2aa1b08000  CPU: 21  COMMAND: "(kill)"
 #0 [fffffe000038ee48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe000038ee50] nmi_handle at ffffffff9922458e
 #2 [fffffe000038eeb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe000038eed0] do_nmi at ffffffff99224c96
 #4 [fffffe000038eef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+423]
    RIP: ffffffff9931ed47  RSP: ffff972a8be2fed8  RFLAGS: 00000046
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000580000
    RDX: ffff8c80ff575300  RSI: 000000000000003c  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8c2aa1b08000
    R13: 0000000000193102  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a8be2fed8] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff972a8be2fed8] queued_write_lock_slowpath at ffffffff993209e4
 #7 [ffff972a8be2fef0] do_raw_write_lock at ffffffff99320834
 #8 [ffff972a8be2ff00] ksys_setsid at ffffffff992d1d3a
 #9 [ffff972a8be2ff28] __x64_sys_setsid at ffffffff992d1e1a
#10 [ffff972a8be2ff30] do_syscall_64 at ffffffff9920432d
#11 [ffff972a8be2ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007ff7748244c7  RSP: 00007fff24e31768  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007fff24e31770  RCX: 00007ff7748244c7
    RDX: 0000000000200560  RSI: 0000000000000000  RDI: 000000000000003b
    RBP: 00007fff24e31990   R8: 000055ca12c6e09d   R9: 0000000000000000
    R10: 00007ff7748c7aa0  R11: 0000000000000246  R12: 00007fff24e31730
    R13: 000055ca12c46c40  R14: 000055ca12b6f0c8  R15: 000055ca12b6eb00
    ORIG_RAX: 0000000000000070  CS: 0033  SS: 002b

PID: 1641913  TASK: ffff8c29525e0000  CPU: 22  COMMAND: "fork12"
    [backtrace trimmed -- same reader pattern as CPU 14]

PID: 1650811  TASK: ffff8c5655532080  CPU: 23  COMMAND: "fork12"
 #0 [fffffe00003e4990] machine_kexec at ffffffff9925dc1e
 #1 [fffffe00003e49e8] __crash_kexec at ffffffff99370081
 #2 [fffffe00003e4aa8] panic at ffffffff992b78eb
 #3 [fffffe00003e4b30] nmi_panic at ffffffff992b7475
 #4 [fffffe00003e4b38] watchdog_overflow_callback at ffffffff993aa66e
 #5 [fffffe00003e4b48] __perf_event_overflow at ffffffff99419362
 #6 [fffffe00003e4b78] handle_pmi_common at ffffffff9920e94d
 #7 [fffffe00003e4df8] intel_pmu_handle_irq at ffffffff9920ebdf
 #8 [fffffe00003e4e38] perf_event_nmi_handler at ffffffff992065dd
 #9 [fffffe00003e4e50] nmi_handle at ffffffff9922458e
#10 [fffffe00003e4eb0] default_do_nmi at ffffffff99224aae
#11 [fffffe00003e4ed0] do_nmi at ffffffff99224c96
#12 [fffffe00003e4ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+423]
    RIP: ffffffff9931ed47  RSP: ffff972917cf7e60  RFLAGS: 00000046
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000600000
    RDX: ffff8c80ff5f5300  RSI: 000000000000000a  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8c5655532c01
    R13: ffff8c5655532080  R14: ffff8c554af69368  R15: ffff8c5655532c88
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#13 [ffff972917cf7e60] native_queued_spin_lock_slowpath at ffffffff9931ed47
#14 [ffff972917cf7e60] queued_write_lock_slowpath at ffffffff993209e4
#15 [ffff972917cf7e78] do_raw_write_lock at ffffffff99320834
#16 [ffff972917cf7e88] do_exit at ffffffff992bcd78
#17 [ffff972917cf7f00] do_group_exit at ffffffff992bd719
#18 [ffff972917cf7f28] __x64_sys_exit_group at ffffffff992bd7a4
#19 [ffff972917cf7f30] do_syscall_64 at ffffffff9920432d
#20 [ffff972917cf7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f41e5451586  RSP: 00007ffd186fd3f8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f41e553e530  RCX: 00007f41e5451586
    RDX: 0000000000000002  RSI: 000000000000003c  RDI: 0000000000000002
    RBP: 0000000000000002   R8: 00000000000000e7   R9: ffffffffffffff80
    R10: 0000000000000002  R11: 0000000000000246  R12: 00007f41e553e530
    R13: 0000000000000001  R14: 00007f41e5541f88  R15: 00007f41e553e8c0
    ORIG_RAX: 00000000000000e7  CS: 0033  SS: 002b

PID: 1614551  TASK: ffff8c28f1dca080  CPU: 24  COMMAND: "fork12"
PID: 1633628  TASK: ffff8c56f0e22080  CPU: 25  COMMAND: "fork12"
PID: 1641865  TASK: ffff8c29522ca080  CPU: 26  COMMAND: "fork12"
PID: 1634511  TASK: ffff8c293a224100  CPU: 27  COMMAND: "fork12"
PID: 1634517  TASK: ffff8c2939b54100  CPU: 28  COMMAND: "fork12"
PID: 1640836  TASK: ffff8c5707c1a080  CPU: 29  COMMAND: "fork12"
    [backtraces trimmed -- same reader pattern as CPU 14]

PID: 1908  TASK: ffff8c51da6e2080  CPU: 30  COMMAND: "monitor-inode"
 #0 [fffffe0000511e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000511e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000511eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000511ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000511ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+426]
    RIP: ffffffff9931ed4a  RSP: ffff9727e0babd78  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 00000000007c0000
    RDX: ffff8c80ff7b5300  RSI: 000000000000000e  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff992bc149
    R13: ffff8c51da6e2080  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff9727e0babd78] native_queued_spin_lock_slowpath at ffffffff9931ed4a
 #6 [ffff9727e0babd78] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff9727e0babd90] do_wait at ffffffff992bc17d
 #8 [ffff9727e0babdd0] kernel_wait4 at ffffffff992bd88d
 #9 [ffff9727e0babe58] __do_sys_wait4 at ffffffff992bd9e5
#10 [ffff9727e0babf30] do_syscall_64 at ffffffff9920432d
#11 [ffff9727e0babf50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007fd7f09bdcba  RSP: 00007fd7ed7e4640  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000001930f5  RCX: 00007fd7f09bdcba
    RDX: 0000000000000001  RSI: 00007fd7ed7e46a8  RDI: 00000000001930f5
    RBP: 00007fd7ed7e46a8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000001
    R13: 00000000000003ff  R14: 00007fd7ed7e4aa0  R15: 00007fd7ed7e49cc
    ORIG_RAX: 000000000000003d  CS: 0033  SS: 002b

PID: 1636883  TASK: ffff8c2942450000  CPU: 31  COMMAND: "fork12"
PID: 1314596  TASK: ffff8c23a435a080  CPU: 32  COMMAND: "fork12"
    [backtraces trimmed -- same reader pattern as CPU 14]

PID: 1312414  TASK: ffff8c544ec44100  CPU: 33  COMMAND: "fork12"
 #0 [fffffe0000592e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000592e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000592eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000592ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000592ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: auditd_test_task+88]
    RIP: ffffffff9938e168  RSP: ffff97283b577db0  RFLAGS: 00000206
    RAX: ffff8c51daa30200  RBX: 0000000000000000  RCX: 3b30226f00000000
    RDX: ffff8c54260f8580  RSI: 000000000d9f6cdb  RDI: 0000000000000246
    RBP: ffff97283b577dc0   R8: 000000002aa0eea9   R9: 0000000000000000
    R10: ffff8c544ec44de0  R11: 0000000000000000  R12: ffff8c53eadfa400
    R13: 0000000000000000  R14: ffff8c5445eac100  R15: ffff8c544ec44100
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff97283b577db0] auditd_test_task at ffffffff9938e168
 #6 [ffff97283b577dc8] audit_signal_info at ffffffff99397752
 #7 [ffff97283b577e20] check_kill_permission at ffffffff992c669d
 #8 [ffff97283b577e48] group_send_sig_info at ffffffff992c9811
 #9 [ffff97283b577e88] __kill_pgrp_info at ffffffff992c9920
#10 [ffff97283b577ec0] __x64_sys_kill at ffffffff992c9f58
#11 [ffff97283b577f30] do_syscall_64 at ffffffff9920432d
#12 [ffff97283b577f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    [user-space registers trimmed -- fork12 in kill(2),
     RIP 00007f85c30e1a77]

PID: 1391758  TASK: ffff8c547c7e4100  CPU: 34  COMMAND: "fork12"
PID: 1641220  TASK: ffff8c2950664100  CPU: 35  COMMAND: "fork12"
PID: 1642039  TASK: ffff8c2924edc100  CPU: 36  COMMAND: "fork12"
PID: 1573939  TASK: ffff8c563da5c100  CPU: 37  COMMAND: "fork12"
    [backtraces trimmed -- same reader pattern as CPU 14]

PID: 1640366  TASK: ffff8c56fa618000  CPU: 38  COMMAND: "fork12"
 #0 [fffffe0000669e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000669e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000669eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000669ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000669ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+426]
    RIP: ffffffff9931ed4a  RSP: ffff972a95f5fea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 00000000009c0000
    RDX: ffff8c50ffdb5300  RSI:
000000000000002b RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c56fa618000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a95f5fea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a95f5fea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a95f5fec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a95f5ff30] do_syscall_64 at ffffffff9920432d #9 [ffff972a95f5ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1648408 TASK: ffff8c2a96e68000 CPU: 39 COMMAND: "fork12" #0 [fffffe0000694e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000694e50] nmi_handle at ffffffff9922458e #2 [fffffe0000694eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000694ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000694ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972aa4cafd78 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000a00000 RDX: ffff8c50ffdf5300 RSI: 0000000000000017 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff992bc149 R13: ffff8c2a96e68000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972aa4cafd78] native_queued_spin_lock_slowpath at 
ffffffff9931ed4a #6 [ffff972aa4cafd78] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972aa4cafd90] do_wait at ffffffff992bc17d #8 [ffff972aa4cafdd0] kernel_wait4 at ffffffff992bd88d #9 [ffff972aa4cafe58] __do_sys_wait4 at ffffffff992bd9e5 #10 [ffff972aa4caff30] do_syscall_64 at ffffffff9920432d #11 [ffff972aa4caff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f41e5450f21 RSP: 00007ffd186fd438 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f41e5450f21 RDX: 0000000000000000 RSI: 00007ffd186fd44c RDI: 00000000ffffffff RBP: 0000000000000002 R8: 00007f41e55445c0 R9: 00000000ffffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003d CS: 0033 SS: 002b PID: 1640361 TASK: ffff8c570674a080 CPU: 40 COMMAND: "fork12" #0 [fffffe00006bfe48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00006bfe50] nmi_handle at ffffffff9922458e #2 [fffffe00006bfeb0] default_do_nmi at ffffffff99224aae #3 [fffffe00006bfed0] do_nmi at ffffffff99224c96 #4 [fffffe00006bfef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff972a95f37ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000a40000 RDX: ffff8c50ffe35300 RSI: 0000000000000003 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c570674a080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a95f37ea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff972a95f37ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a95f37ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a95f37f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a95f37f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad 
RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1640945 TASK: ffff8c5708590000 CPU: 41 COMMAND: "fork12" #0 [fffffe00006eae48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00006eae50] nmi_handle at ffffffff9922458e #2 [fffffe00006eaeb0] default_do_nmi at ffffffff99224aae #3 [fffffe00006eaed0] do_nmi at ffffffff99224c96 #4 [fffffe00006eaef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a97037ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000a80000 RDX: ffff8c50ffe75300 RSI: 0000000000000039 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c5708590000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a97037ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a97037ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a97037ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a97037f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a97037f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 
000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1648338 TASK: ffff8c2a8a930000 CPU: 42 COMMAND: "fork12" #0 [fffffe0000715e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000715e50] nmi_handle at ffffffff9922458e #2 [fffffe0000715eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000715ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000715ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: queued_write_lock_slowpath+70] RIP: ffffffff993209b6 RSP: ffff972aa4a8fe68 RFLAGS: 00000006 RAX: 0000000000000400 RBX: ffffffff9a4050c0 RCX: 00000000000000ff RDX: 0000000000000500 RSI: 0000000000000100 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000007c0000 R9: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a8a930c01 R13: ffff8c2a8a930000 R14: ffff8c2a90679368 R15: ffff8c2a8a930c08 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972aa4a8fe68] queued_write_lock_slowpath at ffffffff993209b6 #6 [ffff972aa4a8fe78] do_raw_write_lock at ffffffff99320834 #7 [ffff972aa4a8fe88] do_exit at ffffffff992bcd78 #8 [ffff972aa4a8ff00] do_group_exit at ffffffff992bd719 #9 [ffff972aa4a8ff28] __x64_sys_exit_group at ffffffff992bd7a4 #10 [ffff972aa4a8ff30] do_syscall_64 at ffffffff9920432d #11 [ffff972aa4a8ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f41e5451586 RSP: 00007ffd186fd3f8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f41e553e530 RCX: 00007f41e5451586 RDX: 0000000000000002 RSI: 000000000000003c RDI: 0000000000000002 RBP: 0000000000000002 R8: 00000000000000e7 R9: ffffffffffffff80 R10: 0000000000000002 R11: 0000000000000246 R12: 00007f41e553e530 R13: 0000000000000001 R14: 00007f41e5541f88 R15: 00007f41e553e8c0 ORIG_RAX: 00000000000000e7 CS: 0033 SS: 002b PID: 1432746 TASK: ffff8c260b188000 CPU: 43 COMMAND: "fork12" #0 [fffffe0000740e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000740e50] nmi_handle at ffffffff9922458e #2 [fffffe0000740eb0] default_do_nmi at 
ffffffff99224aae #3 [fffffe0000740ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000740ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972916257ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000b00000 RDX: ffff8c50ffef5300 RSI: 0000000000000002 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c260b188000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972916257ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972916257ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972916257ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972916257f30] do_syscall_64 at ffffffff9920432d #9 [ffff972916257f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebbcb8 RFLAGS: 00000206 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000206 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1646272 TASK: ffff8c2a7fc72080 CPU: 44 COMMAND: "fork12" #0 [fffffe000076be48] crash_nmi_callback at ffffffff99250113 #1 [fffffe000076be50] nmi_handle at ffffffff9922458e #2 [fffffe000076beb0] default_do_nmi at ffffffff99224aae #3 [fffffe000076bed0] do_nmi at ffffffff99224c96 #4 [fffffe000076bef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff972aa0b1fd78 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000b40000 RDX: ffff8c50fff35300 RSI: 0000000000000027 RDI: ffffffff9a4050c4 RBP: 
ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff992bc149 R13: ffff8c2a7fc72080 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972aa0b1fd78] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff972aa0b1fd78] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972aa0b1fd90] do_wait at ffffffff992bc17d #8 [ffff972aa0b1fdd0] kernel_wait4 at ffffffff992bd88d #9 [ffff972aa0b1fe58] __do_sys_wait4 at ffffffff992bd9e5 #10 [ffff972aa0b1ff30] do_syscall_64 at ffffffff9920432d #11 [ffff972aa0b1ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f41e5450f21 RSP: 00007ffd186fd438 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f41e5450f21 RDX: 0000000000000000 RSI: 00007ffd186fd44c RDI: 00000000ffffffff RBP: 0000000000000002 R8: 00007f41e55445c0 R9: 00000000ffffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003d CS: 0033 SS: 002b PID: 1272894 TASK: ffff8c2291218000 CPU: 45 COMMAND: "fork12" #0 [fffffe0000796e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000796e50] nmi_handle at ffffffff9922458e #2 [fffffe0000796eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000796ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000796ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff9727eeadfea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000b80000 RDX: ffff8c50fff75300 RSI: 0000000000000037 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c2291218000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 
[ffff9727eeadfea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff9727eeadfea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff9727eeadfec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff9727eeadff30] do_syscall_64 at ffffffff9920432d #9 [ffff9727eeadff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebbcb8 RFLAGS: 00000206 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000206 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1419497 TASK: ffff8c25c292a080 CPU: 46 COMMAND: "fork12" #0 [fffffe00007c1e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00007c1e50] nmi_handle at ffffffff9922458e #2 [fffffe00007c1eb0] default_do_nmi at ffffffff99224aae #3 [fffffe00007c1ed0] do_nmi at ffffffff99224c96 #4 [fffffe00007c1ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff9728fd457ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000bc0000 RDX: ffff8c50fffb5300 RSI: 0000000000000000 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c25c292a080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff9728fd457ea8] native_queued_spin_lock_slowpath at ffffffff9931ed47 #6 [ffff9728fd457ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff9728fd457ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff9728fd457f30] do_syscall_64 at ffffffff9920432d #9 [ffff9728fd457f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebbcb8 RFLAGS: 
00000206 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000206 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1641649 TASK: ffff8c570a02a080 CPU: 47 COMMAND: "fork12" #0 [fffffe00007ece48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00007ece50] nmi_handle at ffffffff9922458e #2 [fffffe00007eceb0] default_do_nmi at ffffffff99224aae #3 [fffffe00007eced0] do_nmi at ffffffff99224c96 #4 [fffffe00007ecef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a9848fea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000c00000 RDX: ffff8c50ffff5300 RSI: 0000000000000034 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c570a02a080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a9848fea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a9848fea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a9848fec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a9848ff30] do_syscall_64 at ffffffff9920432d #9 [ffff972a9848ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 
002b PID: 1631262 TASK: ffff8c56eac10000 CPU: 48 COMMAND: "fork12" #0 [fffffe0000817e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000817e50] nmi_handle at ffffffff9922458e #2 [fffffe0000817eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000817ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000817ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: _raw_spin_unlock+11] RIP: ffffffff99ae81db RSP: ffff8c80ff803e78 RFLAGS: 00000096 RAX: ffff8c56eac10000 RBX: ffff8c80ff834440 RCX: 0000000000000000 RDX: ffffffff992ee527 RSI: ffff8c80ff834458 RDI: ffff8c80ff834440 RBP: ffff8c80ff803eb8 R8: 0000000000000005 R9: 0000000000000020 R10: 0000000000000004 R11: 0000000000000416 R12: 0000000000034440 R13: 0000000000000030 R14: 00000000000037a8 R15: ffff8c80ff834458 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff8c80ff803e78] _raw_spin_unlock at ffffffff99ae81db #6 [ffff8c80ff803e80] scheduler_tick at ffffffff992ee527 #7 [ffff8c80ff803ec0] update_process_times at ffffffff9934d4a0 #8 [ffff8c80ff803ed8] tick_sched_handle at ffffffff9935f501 #9 [ffff8c80ff803ef0] tick_sched_timer at ffffffff9935f797 #10 [ffff8c80ff803f10] __hrtimer_run_queues at ffffffff9934e1f9 #11 [ffff8c80ff803f80] hrtimer_interrupt at ffffffff9934eab5 #12 [ffff8c80ff803fd8] smp_apic_timer_interrupt at ffffffff99c02b52 #13 [ffff8c80ff803ff0] apic_timer_interrupt at ffffffff99c01d0f --- <IRQ stack> --- #14 [ffff972a8555fdf8] apic_timer_interrupt at ffffffff99c01d0f [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a8555fea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000c40000 RDX: ffff8c80ff835300 RSI: 000000000000003b RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c56eac10000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffff13 CS: 0010 SS: 0018 #15 [ffff972a8555fea8] 
queued_read_lock_slowpath at ffffffff99320a58 #16 [ffff972a8555fec0] __x64_sys_kill at ffffffff992c9f2d #17 [ffff972a8555ff30] do_syscall_64 at ffffffff9920432d #18 [ffff972a8555ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1639368 TASK: ffff8c5702fd0000 CPU: 49 COMMAND: "fork12" #0 [fffffe0000842e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000842e50] nmi_handle at ffffffff9922458e #2 [fffffe0000842eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000842ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000842ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a941c7ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000c80000 RDX: ffff8c80ff875300 RSI: 000000000000003d RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c5702fd0000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a941c7ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a941c7ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a941c7ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a941c7f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a941c7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 
00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1638378 TASK: ffff8c56feec2080 CPU: 50 COMMAND: "fork12" #0 [fffffe000086de48] crash_nmi_callback at ffffffff99250113 #1 [fffffe000086de50] nmi_handle at ffffffff9922458e #2 [fffffe000086deb0] default_do_nmi at ffffffff99224aae #3 [fffffe000086ded0] do_nmi at ffffffff99224c96 #4 [fffffe000086def0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a92507ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000cc0000 RDX: ffff8c80ff8b5300 RSI: 0000000000000018 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c56feec2080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a92507ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a92507ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a92507ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a92507f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a92507f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1637496 TASK: ffff8c56fb04a080 CPU: 51 COMMAND: "fork12" #0 
[fffffe0000898e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe0000898e50] nmi_handle at ffffffff9922458e #2 [fffffe0000898eb0] default_do_nmi at ffffffff99224aae #3 [fffffe0000898ed0] do_nmi at ffffffff99224c96 #4 [fffffe0000898ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a90b47ea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000d00000 RDX: ffff8c80ff8f5300 RSI: 000000000000000c RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c56fb04a080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a90b47ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a90b47ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a90b47ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a90b47f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a90b47f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1587344 TASK: ffff8c566a6fa080 CPU: 52 COMMAND: "fork12" #0 [fffffe00008c3e48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00008c3e50] nmi_handle at ffffffff9922458e #2 [fffffe00008c3eb0] default_do_nmi at ffffffff99224aae #3 [fffffe00008c3ed0] do_nmi at ffffffff99224c96 #4 [fffffe00008c3ef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+426] RIP: ffffffff9931ed4a RSP: ffff972a34ee7ea8 RFLAGS: 
00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000d40000 RDX: ffff8c80ff935300 RSI: 0000000000000038 RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c566a6fa080 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffff972a34ee7ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a #6 [ffff972a34ee7ea8] queued_read_lock_slowpath at ffffffff99320a58 #7 [ffff972a34ee7ec0] __x64_sys_kill at ffffffff992c9f2d #8 [ffff972a34ee7f30] do_syscall_64 at ffffffff9920432d #9 [ffff972a34ee7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad RIP: 00007f85c30e1a77 RSP: 00007fffeaebe6f8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f85c30e1a77 RDX: 00007f85c325c400 RSI: 0000000000000003 RDI: 0000000000000000 RBP: 0000000000000002 R8: 00007f85c32615c0 R9: 00000000ffffffff R10: 0000000000000248 R11: 0000000000000202 R12: 0000000000403e30 R13: 00000000004180cd R14: 0000000000000087 R15: 000000000041a3da ORIG_RAX: 000000000000003e CS: 0033 SS: 002b PID: 1610607 TASK: ffff8c56b56d8000 CPU: 53 COMMAND: "fork12" #0 [fffffe00008eee48] crash_nmi_callback at ffffffff99250113 #1 [fffffe00008eee50] nmi_handle at ffffffff9922458e #2 [fffffe00008eeeb0] default_do_nmi at ffffffff99224aae #3 [fffffe00008eeed0] do_nmi at ffffffff99224c96 #4 [fffffe00008eeef0] end_repeat_nmi at ffffffff99c016d4 [exception RIP: native_queued_spin_lock_slowpath+423] RIP: ffffffff9931ed47 RSP: ffff972a5fb8fea8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff9a4050c0 RCX: 0000000000d80000 RDX: ffff8c80ff975300 RSI: 000000000000001a RDI: ffffffff9a4050c4 RBP: ffffffff9a4050c4 R8: 00000000ecfc3c81 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c56b56d8000 R14: 0000000000000003 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 
    0018
--- <NMI exception stack> ---
 #5 [ffff972a5fb8fea8] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff972a5fb8fea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a5fb8fec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff972a5fb8ff30] do_syscall_64 at ffffffff9920432d
 #9 [ffff972a5fb8ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1639756  TASK: ffff8c294aa8a080  CPU: 54  COMMAND: "fork12"
 #0 [fffffe0000919e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000919e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000919eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000919ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000919ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+426]
    RIP: ffffffff9931ed4a  RSP: ffff972a94cf7ea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000dc0000
    RDX: ffff8c80ff9b5300  RSI: 0000000000000026  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c294aa8a080  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a94cf7ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a
 #6 [ffff972a94cf7ea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a94cf7ec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff972a94cf7f30] do_syscall_64 at ffffffff9920432d
 #9 [ffff972a94cf7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1554758  TASK: ffff8c281321a080  CPU: 55  COMMAND: "fork12"
 #0 [fffffe0000944e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000944e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000944eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000944ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000944ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+423]
    RIP: ffffffff9931ed47  RSP: ffff9729f8e9fea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000e00000
    RDX: ffff8c80ff9f5300  RSI: 0000000000000035  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c281321a080  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff9729f8e9fea8] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff9729f8e9fea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff9729f8e9fec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff9729f8e9ff30] do_syscall_64 at ffffffff9920432d
 #9 [ffff9729f8e9ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1639294  TASK: ffff8c5701784100  CPU: 56  COMMAND: "fork12"
 #0 [fffffe000096fe48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe000096fe50] nmi_handle at ffffffff9922458e
 #2 [fffffe000096feb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe000096fed0] do_nmi at ffffffff99224c96
 #4 [fffffe000096fef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+426]
    RIP: ffffffff9931ed4a  RSP: ffff972a93f77ea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000e40000
    RDX: ffff8c80ffa35300  RSI: 000000000000000b  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c5701784100  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a93f77ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a
 #6 [ffff972a93f77ea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a93f77ec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff972a93f77f30] do_syscall_64 at ffffffff9920432d
 #9 [ffff972a93f77f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1588653  TASK: ffff8c566f5a2080  CPU: 57  COMMAND: "fork12"
 #0 [fffffe000099ae48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe000099ae50] nmi_handle at ffffffff9922458e
 #2 [fffffe000099aeb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe000099aed0] do_nmi at ffffffff99224c96
 #4 [fffffe000099aef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+423]
    RIP: ffffffff9931ed47  RSP: ffff972a375cfea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000e80000
    RDX: ffff8c80ffa75300  RSI: 0000000000000030  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c566f5a2080  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a375cfea8] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff972a375cfea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a375cfec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff972a375cff30] do_syscall_64 at ffffffff9920432d
 #9 [ffff972a375cff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1615175  TASK: ffff8c56c3bf2080  CPU: 58  COMMAND: "fork12"
 #0 [fffffe00009c5e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe00009c5e50] nmi_handle at ffffffff9922458e
 #2 [fffffe00009c5eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe00009c5ed0] do_nmi at ffffffff99224c96
 #4 [fffffe00009c5ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+426]
    RIP: ffffffff9931ed4a  RSP: ffff972a6802fea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000ec0000
    RDX: ffff8c80ffab5300  RSI: 000000000000001e  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c56c3bf2080  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a6802fea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a
 #6 [ffff972a6802fea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a6802fec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff972a6802ff30] do_syscall_64 at ffffffff9920432d
 #9 [ffff972a6802ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1638586  TASK: ffff8c56ffea8000  CPU: 59  COMMAND: "fork12"
 #0 [fffffe00009f0e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe00009f0e50] nmi_handle at ffffffff9922458e
 #2 [fffffe00009f0eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe00009f0ed0] do_nmi at ffffffff99224c96
 #4 [fffffe00009f0ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+423]
    RIP: ffffffff9931ed47  RSP: ffff972a92b2fea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000f00000
    RDX: ffff8c80ffaf5300  RSI: 0000000000000014  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c56ffea8000  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a92b2fea8] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff972a92b2fea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a92b2fec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff972a92b2ff30] do_syscall_64 at ffffffff9920432d
 #9 [ffff972a92b2ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1642095  TASK: ffff8c295407c100  CPU: 60  COMMAND: "fork12"
 #0 [fffffe0000a1be48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000a1be50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000a1beb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000a1bed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000a1bef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+426]
    RIP: ffffffff9931ed4a  RSP: ffff972a991a7ea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000f40000
    RDX: ffff8c80ffb35300  RSI: 000000000000002c  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c295407c100  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a991a7ea8] native_queued_spin_lock_slowpath at ffffffff9931ed4a
 #6 [ffff972a991a7ea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a991a7ec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff972a991a7f30] do_syscall_64 at ffffffff9920432d
 #9 [ffff972a991a7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebe6f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000202  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1449369  TASK: ffff8c26564a8000  CPU: 61  COMMAND: "fork12"
 #0 [fffffe0000a46e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000a46e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000a46eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000a46ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000a46ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+423]
    RIP: ffffffff9931ed47  RSP: ffff97293542fea8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000f80000
    RDX: ffff8c80ffb75300  RSI: 000000000000002f  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff8c26564a8000  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff97293542fea8] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff97293542fea8] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff97293542fec0] __x64_sys_kill at ffffffff992c9f2d
 #8 [ffff97293542ff30] do_syscall_64 at ffffffff9920432d
 #9 [ffff97293542ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f85c30e1a77  RSP: 00007fffeaebbcb8  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f85c30e1a77
    RDX: 00007f85c325c400  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000000000002   R8: 00007f85c32615c0   R9: 00000000ffffffff
    R10: 0000000000000248  R11: 0000000000000206  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003e  CS: 0033  SS: 002b

PID: 1907  TASK: ffff8c51da6e4100  CPU: 62  COMMAND: "monitor-disk"
 #0 [fffffe0000a71e48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000a71e50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000a71eb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000a71ed0] do_nmi at ffffffff99224c96
 #4 [fffffe0000a71ef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+426]
    RIP: ffffffff9931ed4a  RSP: ffff9727e0b8fd78  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000000fc0000
    RDX: ffff8c80ffbb5300  RSI: 0000000000000036  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff992bc149
    R13: ffff8c51da6e4100  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff9727e0b8fd78] native_queued_spin_lock_slowpath at ffffffff9931ed4a
 #6 [ffff9727e0b8fd78] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff9727e0b8fd90] do_wait at ffffffff992bc17d
 #8 [ffff9727e0b8fdd0] kernel_wait4 at ffffffff992bd88d
 #9 [ffff9727e0b8fe58] __do_sys_wait4 at ffffffff992bd9e5
#10 [ffff9727e0b8ff30] do_syscall_64 at ffffffff9920432d
#11 [ffff9727e0b8ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007fd7f09bdcba  RSP: 00007fd7edfe5640  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000001930fc  RCX: 00007fd7f09bdcba
    RDX: 0000000000000001  RSI: 00007fd7edfe56a8  RDI: 00000000001930fc
    RBP: 00007fd7edfe56a8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000001
    R13: 00000000000003ff  R14: 00007fd7edfe5aa0  R15: 00007fd7edfe59cc
    ORIG_RAX: 000000000000003d  CS: 0033  SS: 002b

PID: 1643282  TASK: ffff8c2a2fbf8000  CPU: 63  COMMAND: "fork12"
 #0 [fffffe0000a9ce48] crash_nmi_callback at ffffffff99250113
 #1 [fffffe0000a9ce50] nmi_handle at ffffffff9922458e
 #2 [fffffe0000a9ceb0] default_do_nmi at ffffffff99224aae
 #3 [fffffe0000a9ced0] do_nmi at ffffffff99224c96
 #4 [fffffe0000a9cef0] end_repeat_nmi at ffffffff99c016d4
    [exception RIP: native_queued_spin_lock_slowpath+423]
    RIP: ffffffff9931ed47  RSP: ffff972a9b16fd78  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff9a4050c0  RCX: 0000000001000000
    RDX: ffff8c80ffbf5300  RSI: 0000000000000024  RDI: ffffffff9a4050c4
    RBP: ffffffff9a4050c4   R8: 00000000ecfc3c81   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff992bc149
    R13: ffff8c2a2fbf8000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff972a9b16fd78] native_queued_spin_lock_slowpath at ffffffff9931ed47
 #6 [ffff972a9b16fd78] queued_read_lock_slowpath at ffffffff99320a58
 #7 [ffff972a9b16fd90] do_wait at ffffffff992bc17d
 #8 [ffff972a9b16fdd0] kernel_wait4 at ffffffff992bd88d
 #9 [ffff972a9b16fe58] __do_sys_wait4 at ffffffff992bd9e5
#10 [ffff972a9b16ff30] do_syscall_64 at ffffffff9920432d
#11 [ffff972a9b16ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
    RIP: 00007f41e5450f21  RSP: 00007ffd186fd438  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f41e5450f21
    RDX: 0000000000000000  RSI: 00007ffd186fd44c  RDI: 00000000ffffffff
    RBP: 0000000000000002   R8: 00007f41e55445c0   R9: 00000000ffffffff
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000403e30
    R13: 00000000004180cd  R14: 0000000000000087  R15: 000000000041a3da
    ORIG_RAX: 000000000000003d  CS: 0033  SS: 002b

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Question about kill a process group
  2022-03-29  8:07 Question about kill a process group Zhang Qiao
@ 2022-04-02  2:22 ` Zhang Qiao
  2022-04-13  1:56   ` Zhang Qiao
  0 siblings, 1 reply; 12+ messages in thread

From: Zhang Qiao @ 2022-04-02 2:22 UTC (permalink / raw)
To: lkml; +Cc: ebiederm

ping...

Any suggestions for this problem?

Thanks!
Qiao

On 2022/3/29 16:07, Zhang Qiao wrote:
> hello everyone,
>
> I got a hard lockup panic when running the LTP syscall testcases.
>
> [348439.713178] NMI watchdog: Watchdog detected hard LOCKUP on cpu 32
> [348439.713236] irq event stamp: 0
> [348439.713237] hardirqs last enabled at (0): [<0000000000000000>] 0x0
> [348439.713238] hardirqs last disabled at (0): [<ffffffff87cd1ea5>] copy_process+0x7f5/0x2160
> [348439.713239] softirqs last enabled at (0): [<ffffffff87cd1ea5>] copy_process+0x7f5/0x2160
> [348439.713240] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [348439.713241] CPU: 32 PID: 1151212 Comm: fork12 Kdump: loaded Tainted: G S 5.10.0+ #1
> [348439.713242] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.35 10/20/2016
> [348439.713243] RIP: 0010:queued_write_lock_slowpath+0x4d/0x80
> [348439.713245] RSP: 0018:ffffa3a6bed4fe60 EFLAGS: 00000006
> [348439.713246] RAX: 0000000000000500 RBX: ffffffff892060c0 RCX: 00000000000000ff
> [348439.713247] RDX: 0000000000000500 RSI: 0000000000000100 RDI: ffffffff892060c0
> [348439.713248] RBP: ffffffff892060c4 R08: 0000000000000001 R09: 0000000000000000
> [348439.713249] R10: ffffa3a6bed4fde8 R11: 0000000000000000 R12: ffff96dfd3b68001
> [348439.713250] R13: ffff96dfd3b68000 R14: ffff96dfd3b68c38 R15: ffff96e2cf1f51c0
> [348439.713251] FS: 0000000000000000(0000) GS:ffff96edbc200000(0000) knlGS:0000000000000000
> [348439.713252] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [348439.713253] CR2: 0000000000416ea0 CR3: 0000002d91812004 CR4: 00000000001706e0
> [348439.713254] Call Trace:
> [348439.713255]  do_raw_write_lock+0xa9/0xb0
> [348439.713256]  _raw_write_lock_irq+0x5a/0x70
> [348439.713256]  do_exit+0x429/0xd00
> [348439.713257]  do_group_exit+0x39/0xb0
> [348439.713258]  __x64_sys_exit_group+0x14/0x20
> [348439.713259]  do_syscall_64+0x33/0x40
> [348439.713260]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [348439.713260] RIP: 0033:0x7f59295a7066
> [348439.713261] Code: Unable to access opcode bytes at RIP 0x7f59295a703c.
> [348439.713262] RSP: 002b:00007fff0afeac38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> [348439.713264] RAX: ffffffffffffffda RBX: 00007f5929694530 RCX: 00007f59295a7066
> [348439.713265] RDX: 0000000000000002 RSI: 000000000000003c RDI: 0000000000000002
> [348439.713266] RBP: 0000000000000002 R08: 00000000000000e7 R09: ffffffffffffff80
> [348439.713267] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f5929694530
> [348439.713268] R13: 0000000000000001 R14: 00007f5929697f68 R15: 0000000000000000
> [348439.713269] Kernel panic - not syncing: Hard LOCKUP
> [348439.713270] CPU: 32 PID: 1151212 Comm: fork12 Kdump: loaded Tainted: G S 5.10.0+ #1
> [348439.713272] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.35 10/20/2016
> [348439.713272] Call Trace:
> [348439.713273]  <NMI>
> [348439.713274]  dump_stack+0x77/0x97
> [348439.713275]  panic+0x10c/0x2fb
> [348439.713275]  nmi_panic+0x35/0x40
> [348439.713276]  watchdog_hardlockup_check+0xeb/0x110
> [348439.713277]  __perf_event_overflow+0x52/0xf0
> [348439.713278]  handle_pmi_common+0x21a/0x320
> [348439.713286]  intel_pmu_handle_irq+0xc9/0x1b0
> [348439.713287]  perf_event_nmi_handler+0x24/0x40
> [348439.713288]  nmi_handle+0xc3/0x2a0
> [348439.713289]  default_do_nmi+0x49/0xf0
> [348439.713289]  exc_nmi+0x146/0x160
> [348439.713290]  end_repeat_nmi+0x16/0x55
> [348439.713291] RIP: 0010:queued_write_lock_slowpath+0x4d/0x80
> [348439.713293] RSP: 0018:ffffa3a6bed4fe60 EFLAGS: 00000006
> [348439.713295] RAX: 0000000000000500 RBX: ffffffff892060c0 RCX: 00000000000000ff
> [348439.713296] RDX: 0000000000000500 RSI: 0000000000000100 RDI: ffffffff892060c0
> [348439.713296] RBP: ffffffff892060c4 R08: 0000000000000001 R09: 0000000000000000
> [348439.713297] R10: ffffa3a6bed4fde8 R11: 0000000000000000 R12: ffff96dfd3b68001
> [348439.713298] R13: ffff96dfd3b68000 R14: ffff96dfd3b68c38 R15: ffff96e2cf1f51c0
> [348439.713300]  </NMI>
> [348439.713301]  do_raw_write_lock+0xa9/0xb0
> [348439.713302]  _raw_write_lock_irq+0x5a/0x70
> [348439.713303]  do_exit+0x429/0xd00
> [348439.713303]  do_group_exit+0x39/0xb0
> [348439.713304]  __x64_sys_exit_group+0x14/0x20
> [348439.713305]  do_syscall_64+0x33/0x40
> [348439.713306]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [348439.713307] RIP: 0033:0x7f59295a7066
> [348439.713308] Code: Unable to access opcode bytes at RIP 0x7f59295a703c.
>
>
> When analyzing the vmcore, I noticed that lots of fork12 processes are waiting for the
> tasklist read lock or write lock (see the attached file all_cpu_stacks.log), and every
> fork12 process (all in the same process group) calls kill(0, SIGQUIT) in its signal
> handler [1]. Each such call traverses all the processes in the process group and sends
> the signal to them one by one, which is very time-costly work and holds the tasklist
> read lock for a long time. At the same time, other processes exit after receiving the
> signal, and they try to take the tasklist write lock in exit_notify().
>
> [1] fork12 testcase: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/fork/fork12.c
>
> Some processes call kill(0, SIGQUIT) and wait for the tasklist read lock:
>
> #5 [ffff972a9b16fd78] native_queued_spin_lock_slowpath at ffffffff9931ed47
> #6 [ffff972a9b16fd78] queued_read_lock_slowpath at ffffffff99320a58
> #7 [ffff972a9b16fd90] do_wait at ffffffff992bc17d
> #8 [ffff972a9b16fdd0] kernel_wait4 at ffffffff992bd88d
> #9 [ffff972a9b16fe58] __do_sys_wait4 at ffffffff992bd9e5
> #10 [ffff972a9b16ff30] do_syscall_64 at ffffffff9920432d
> #11 [ffff972a9b16ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
>
> At the same time, some processes are exiting and wait for the tasklist write lock:
>
> #5 [ffff972aa49a7e60] native_queued_spin_lock_slowpath at ffffffff9931ecb0
> #6 [ffff972aa49a7e60] queued_write_lock_slowpath at ffffffff993209e4
> #7 [ffff972aa49a7e78] do_raw_write_lock at ffffffff99320834
> #8 [ffff972aa49a7e88] do_exit at ffffffff992bcd78
> #9 [ffff972aa49a7f00] do_group_exit at ffffffff992bd719
> #10 [ffff972aa49a7f28] __x64_sys_exit_group at ffffffff992bd7a4
> #11 [ffff972aa49a7f30] do_syscall_64 at ffffffff9920432d
> #12 [ffff972aa49a7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad
>
> In this scenario, lots of processes are waiting for the tasklist read lock or write
> lock, so they queue up. If the wait queue is long enough, it can cause a hard lockup
> when a process waits to take the write lock in exit_notify().
>
> I tried to solve this problem by avoiding traversing the process group multiple times
> when kill(0, xxxx) is called multiple times from the same process group, but it
> doesn't look like a good solution.
>
> Is there any good idea for fixing this problem?
>
> Thanks!
>
> Qiao

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Question about kill a process group
  2022-04-02  2:22 ` Zhang Qiao
@ 2022-04-13  1:56   ` Zhang Qiao
  2022-04-13 15:47     ` Eric W. Biederman
  0 siblings, 1 reply; 12+ messages in thread

From: Zhang Qiao @ 2022-04-13 1:56 UTC (permalink / raw)
To: lkml
Cc: ebiederm, keescook, tglx, Peter Zijlstra, elver, legion, oleg, brauner

Gentle ping. Any comments on this problem?

On 2022/4/2 10:22, Zhang Qiao wrote:
> ping...
>
> Any suggestions for this problem?
>
> Thanks!
> Qiao
>
> On 2022/3/29 16:07, Zhang Qiao wrote:
>> hello everyone,
>>
>> I got a hard lockup panic when running the LTP syscall testcases.
>>
>> [...]
>>
>> Is there any good idea for fixing this problem?
>>
>> Thanks!
>>
>> Qiao

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Question about kill a process group 2022-04-13 1:56 ` Zhang Qiao @ 2022-04-13 15:47 ` Eric W. Biederman 2022-04-14 11:40 ` Zhang Qiao 0 siblings, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2022-04-13 15:47 UTC (permalink / raw) To: Zhang Qiao Cc: lkml, keescook, tglx, Peter Zijlstra, elver, legion, oleg, brauner Zhang Qiao <zhangqiao22@huawei.com> writes: > Gentle ping. Any comments on this problem? Is fork12 a new test? Is there a real world use case that connects to this? How many children are being created in this test? Several million? I would like to blame this on the old issue that tasklist_lock being a global lock. Given the number of child processes (as many as can be created) I don't think we are hurt much by using a global lock. The problem for solubility is that we have a lock. Fundamentally there must be a lock taken to maintain the parent's list of children. I only see SIGQUIT being called once in the parent process so that should not be an issue. There is a minor issue in fork12 that it calls exit(0) instead of _exit(0) in the children. Not the problem you are dealing with but it does look like it can be a distraction. I suspect the issue really is the thundering hurd of a million+ processes synchronizing on a single lock. I don't think this is a hard lockup, just a global slow down. I expect everything will eventually exit. To do something about this is going to take a deep and fundamental redesign of how we maintain process lists to handle a parent with millions of children well. Is there any real world reason to care about this case? Without real world motivation I am inclined to just note that this is something that is handled poorly, and leave it as is. Eric > > 在 2022/4/2 10:22, Zhang Qiao 写道: >> ping... >> >> Any suggestions for this problem? >> >> thank! >> Qiao >> >> >> 在 2022/3/29 16:07, Zhang Qiao 写道: >>> hello everyone, >>> >>> I got a hradlockup panic when run the ltp syscall testcases. 
>>> >>> 348439.713178] NMI watchdog: Watchdog detected hard LOCKUP on cpu 32 >>> [348439.713236] irq event stamp: 0 >>> [348439.713237] hardirqs last enabled at (0): [<0000000000000000>] 0x0 >>> [348439.713238] hardirqs last disabled at (0): [<ffffffff87cd1ea5>] copy_process+0x7f5/0x2160 >>> [348439.713239] softirqs last enabled at (0): [<ffffffff87cd1ea5>] copy_process+0x7f5/0x2160 >>> [348439.713240] softirqs last disabled at (0): [<0000000000000000>] 0x0 >>> [348439.713241] CPU: 32 PID: 1151212 Comm: fork12 Kdump: loaded Tainted: G S 5.10.0+ #1 >>> [348439.713242] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.35 10/20/2016 >>> [348439.713243] RIP: 0010:queued_write_lock_slowpath+0x4d/0x80 >>> [348439.713245] RSP: 0018:ffffa3a6bed4fe60 EFLAGS: 00000006 >>> [348439.713246] RAX: 0000000000000500 RBX: ffffffff892060c0 RCX: 00000000000000ff >>> [348439.713247] RDX: 0000000000000500 RSI: 0000000000000100 RDI: ffffffff892060c0 >>> [348439.713248] RBP: ffffffff892060c4 R08: 0000000000000001 R09: 0000000000000000 >>> [348439.713249] R10: ffffa3a6bed4fde8 R11: 0000000000000000 R12: ffff96dfd3b68001 >>> [348439.713250] R13: ffff96dfd3b68000 R14: ffff96dfd3b68c38 R15: ffff96e2cf1f51c0 >>> [348439.713251] FS: 0000000000000000(0000) GS:ffff96edbc200000(0000) knlGS:0000000000000000 >>> [348439.713252] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [348439.713253] CR2: 0000000000416ea0 CR3: 0000002d91812004 CR4: 00000000001706e0 >>> [348439.713254] Call Trace: >>> [348439.713255] do_raw_write_lock+0xa9/0xb0 >>> [348439.713256] _raw_write_lock_irq+0x5a/0x70 >>> [348439.713256] do_exit+0x429/0xd00 >>> [348439.713257] do_group_exit+0x39/0xb0 >>> [348439.713258] __x64_sys_exit_group+0x14/0x20 >>> [348439.713259] do_syscall_64+0x33/0x40 >>> [348439.713260] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [348439.713260] RIP: 0033:0x7f59295a7066 >>> [348439.713261] Code: Unable to access opcode bytes at RIP 0x7f59295a703c. 
>>> [348439.713262] RSP: 002b:00007fff0afeac38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 >>> [348439.713264] RAX: ffffffffffffffda RBX: 00007f5929694530 RCX: 00007f59295a7066 >>> [348439.713265] RDX: 0000000000000002 RSI: 000000000000003c RDI: 0000000000000002 >>> [348439.713266] RBP: 0000000000000002 R08: 00000000000000e7 R09: ffffffffffffff80 >>> [348439.713267] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f5929694530 >>> [348439.713268] R13: 0000000000000001 R14: 00007f5929697f68 R15: 0000000000000000 >>> [348439.713269] Kernel panic - not syncing: Hard LOCKUP >>> [348439.713270] CPU: 32 PID: 1151212 Comm: fork12 Kdump: loaded Tainted: G S 5.10.0+ #1 >>> [348439.713272] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.35 10/20/2016 >>> [348439.713272] Call Trace: >>> [348439.713273] <NMI> >>> [348439.713274] dump_stack+0x77/0x97 >>> [348439.713275] panic+0x10c/0x2fb >>> [348439.713275] nmi_panic+0x35/0x40 >>> [348439.713276] watchdog_hardlockup_check+0xeb/0x110 >>> [348439.713277] __perf_event_overflow+0x52/0xf0 >>> [348439.713278] handle_pmi_common+0x21a/0x320 >>> [348439.713286] intel_pmu_handle_irq+0xc9/0x1b0 >>> [348439.713287] perf_event_nmi_handler+0x24/0x40 >>> [348439.713288] nmi_handle+0xc3/0x2a0 >>> [348439.713289] default_do_nmi+0x49/0xf0 >>> [348439.713289] exc_nmi+0x146/0x160 >>> [348439.713290] end_repeat_nmi+0x16/0x55 >>> [348439.713291] RIP: 0010:queued_write_lock_slowpath+0x4d/0x80 >>> [348439.713293] RSP: 0018:ffffa3a6bed4fe60 EFLAGS: 00000006 >>> [348439.713295] RAX: 0000000000000500 RBX: ffffffff892060c0 RCX: 00000000000000ff >>> [348439.713296] RDX: 0000000000000500 RSI: 0000000000000100 RDI: ffffffff892060c0 >>> [348439.713296] RBP: ffffffff892060c4 R08: 0000000000000001 R09: 0000000000000000 >>> [348439.713297] R10: ffffa3a6bed4fde8 R11: 0000000000000000 R12: ffff96dfd3b68001 >>> [348439.713298] R13: ffff96dfd3b68000 R14: ffff96dfd3b68c38 R15: ffff96e2cf1f51c0 >>> [348439.713300] </NMI> >>> [348439.713301] 
do_raw_write_lock+0xa9/0xb0 >>> [348439.713302] _raw_write_lock_irq+0x5a/0x70 >>> [348439.713303] do_exit+0x429/0xd00 >>> [348439.713303] do_group_exit+0x39/0xb0 >>> [348439.713304] __x64_sys_exit_group+0x14/0x20 >>> [348439.713305] do_syscall_64+0x33/0x40 >>> [348439.713306] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [348439.713307] RIP: 0033:0x7f59295a7066 >>> [348439.713308] Code: Unable to access opcode bytes at RIP 0x7f59295a703c. >>> >>> >>> When analyzing the vmcore, I noticed lots of fork12 processes waiting for the tasklist read lock or write lock (see the attached file all_cpu_stacks.log), and every fork12 process (all belonging to the same process group) calls kill(0, SIGQUIT) in its signal handler [1]. This traverses all the processes in the same process group and sends the signal to them one by one, which is very time-costly work and holds the tasklist read lock for a long time. At the same time, other processes exit after receiving the signal, and they try to take the tasklist write lock in exit_notify().
>>> >>> [1] fork12 testcase: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/fork/fork12.c >>> >>> Some processes call kill(0, SIGQUIT) and wait for the tasklist read lock: >>> >>> #5 [ffff972a9b16fd78] native_queued_spin_lock_slowpath at ffffffff9931ed47 >>> #6 [ffff972a9b16fd78] queued_read_lock_slowpath at ffffffff99320a58 >>> #7 [ffff972a9b16fd90] do_wait at ffffffff992bc17d >>> #8 [ffff972a9b16fdd0] kernel_wait4 at ffffffff992bd88d >>> #9 [ffff972a9b16fe58] __do_sys_wait4 at ffffffff992bd9e5 >>> #10 [ffff972a9b16ff30] do_syscall_64 at ffffffff9920432d >>> #11 [ffff972a9b16ff50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad >>> >>> At the same time, some processes are exiting and wait for the tasklist write lock: >>> >>> #5 [ffff972aa49a7e60] native_queued_spin_lock_slowpath at ffffffff9931ecb0 >>> #6 [ffff972aa49a7e60] queued_write_lock_slowpath at ffffffff993209e4 >>> #7 [ffff972aa49a7e78] do_raw_write_lock at ffffffff99320834 >>> #8 [ffff972aa49a7e88] do_exit at ffffffff992bcd78 >>> #9 [ffff972aa49a7f00] do_group_exit at ffffffff992bd719 >>> #10 [ffff972aa49a7f28] __x64_sys_exit_group at ffffffff992bd7a4 >>> #11 [ffff972aa49a7f30] do_syscall_64 at ffffffff9920432d >>> #12 [ffff972aa49a7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c000ad >>> >>> In this scenario, there are lots of processes waiting for the tasklist read lock or the tasklist write lock, so they queue up. If the wait queue is long enough, it can cause a hardlockup when a process waits to take the write lock in exit_notify(). >>> >>> I tried to solve this problem by avoiding traversing the process group multiple times when kill(0, xxxx) is called multiple times from the same process group, but it doesn't look like a good solution. >>> >>> Is there any good idea for fixing this problem? >>> >>> Thanks! >>> >>> Qiao >>> . >>> >> . >> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Question about kill a process group 2022-04-13 15:47 ` Eric W. Biederman @ 2022-04-14 11:40 ` Zhang Qiao 2022-04-21 16:12 ` Eric W. Biederman 0 siblings, 1 reply; 12+ messages in thread From: Zhang Qiao @ 2022-04-14 11:40 UTC (permalink / raw) To: Eric W. Biederman Cc: lkml, keescook, tglx, Peter Zijlstra, elver, legion, oleg, brauner On 2022/4/13 23:47, Eric W. Biederman wrote: > Zhang Qiao <zhangqiao22@huawei.com> writes: > >> Gentle ping. Any comments on this problem? > > Is fork12 a new test? fork12 is an LTP testcase. (https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/fork/fork12.c) > > Is there a real world use case that connects to this? > > How many children are being created in this test? Several million? There are about 300,000+ processes. > > I would like to blame this on the old issue that tasklist_lock being > a global lock. Given the number of child processes (as many as can be > created) I don't think we are hurt much by using a global lock. The > problem for solubility is that we have a lock. > > Fundamentally there must be a lock taken to maintain the parent's > list of children. > > I only see SIGQUIT being called once in the parent process so that > should not be an issue. In fork12, every child will call kill(0, SIGQUIT) at cleanup(). There are a lot of kill(0, SIGQUIT) calls. > > There is a minor issue in fork12 that it calls exit(0) instead of > _exit(0) in the children. Not the problem you are dealing with > but it does look like it can be a distraction. > > I suspect the issue really is the thundering herd of a million+ > processes synchronizing on a single lock. > > I don't think this is a hard lockup, just a global slow down. > I expect everything will eventually exit.
>
But according to the vmcore, this is a hardlockup issue, and I think the following scenario is possible
(rl = read_lock(tasklist_lock); ru = read_unlock(tasklist_lock); wl = write_lock_irq(tasklist_lock); wu = write_unlock_irq(tasklist_lock)):

            t0  t1  t2  t3  t4  t5  t6  t7  t8 ......
    cpu0:   rl<------------- about 1s ------------>ru                 // a fork12 calls kill(0, SIGQUIT) at t0 on cpu0, taking the tasklist read lock in __kill_pgrp_info()
    cpu1:     wl<------wait lock----------------->|<--got lock-->wu   // a fork12 exits; it disables irqs and spins for the tasklist write lock in exit_notify() until cpu0 unlocks
    cpu2:       rl<------wait read lock------------------.....--->ru  // a fork12 calls kill(0, SIGQUIT), spins waiting for cpu1 to unlock
    cpu3:         wl<----------------------------......---------->wu  // a fork12 exits, spins waiting for cpu2 to unlock
    .....
    cpux:     rl<-------------------......------------------->ru     // a fork12 calls kill(0, SIGQUIT), spins waiting for another cpu to unlock
    cpux+1:     wl<-------------------......----------------->wu     // a fork12 exits, spins waiting for cpux to unlock

A cpu may trigger a hardlockup if too many fork12 processes are spinning to acquire the tasklist read/write lock. As above, fork12 takes a long time to send the signal to each child process in __kill_pgrp_info(); the whole traversal takes more than a second (300,000+ children). While one fork12 holds the tasklist read lock for over a second in __kill_pgrp_info(), a large number of children may be exiting or calling kill(0, SIGQUIT); they alternately acquire the tasklist lock (a queued spinlock) and spin on its wait queue. Because the processes calling __kill_pgrp_info() ahead in the queue take so long, an exiting process at the tail of the wait queue can wait a very long time in exit_notify(), with irqs disabled, which triggers the hardlockup detector.

> > To do something about this is going to take a deep and fundamental > redesign of how we maintain process lists to handle a parent > with millions of children well.
> > Is there any real world reason to care about this case? Without > real world motivation I am inclined to just note that this is I just found it while I ran the LTP tests. thanks! qiao. > something that is handled poorly, and leave it as is. > > Eric > >> >> On 2022/4/2 10:22, Zhang Qiao wrote: >>> ping... >>> >>> Any suggestions for this problem? >>> >>> thank! >>> Qiao >>> >>> >>> On 2022/3/29 16:07, Zhang Qiao wrote: >>>> hello everyone, >>>> >>>> I got a hardlockup panic when running the ltp syscall testcases.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Question about kill a process group 2022-04-14 11:40 ` Zhang Qiao @ 2022-04-21 16:12 ` Eric W. Biederman 2022-04-28 2:05 ` Zhang Qiao 2022-04-28 12:33 ` Thomas Gleixner 0 siblings, 2 replies; 12+ messages in thread From: Eric W. Biederman @ 2022-04-21 16:12 UTC (permalink / raw) To: Zhang Qiao Cc: lkml, keescook, tglx, Peter Zijlstra, elver, legion, oleg, brauner Zhang Qiao <zhangqiao22@huawei.com> writes: > On 2022/4/13 23:47, Eric W. Biederman wrote: >> To do something about this is going to take a deep and fundamental >> redesign of how we maintain process lists to handle a parent >> with millions of children well. >> >> Is there any real world reason to care about this case? Without >> real world motivation I am inclined to just note that this is > > I just found it while I ran the LTP tests. So I looked and fork12 has been around since 2002 in largely its current form. So I am puzzled why you have run into problems and other people have not. Did you perhaps have lock debugging enabled? Did you run on a very large machine where a ridiculous number of processes could be created? Did you happen to run fork12 on a machine where locks are much more expensive than on most machines? >> Is there a real world use case that connects to this? >> >> How many children are being created in this test? Several million? > > There are about 300,000+ processes. Not as many as I was guessing, but still enough to cause a huge wait on locks. >> I would like to blame this on the old issue that tasklist_lock being >> a global lock. Given the number of child processes (as many as can be >> created) I don't think we are hurt much by using a global lock.
> There are a lot of kill(0, SIGQUIT) calls. I had missed that. I can see that stressing out a lot. At the same time as I read fork12.c that is very much a bug. The children in fork12.c should call _exit() instead of exit(). Which would suppress calling the atexit() handlers and let fork12.c test what it is trying to test. That doesn't mean there isn't a mystery here, but more that if we really want to test lots of processes calling the same signal at the same time it should be a test that means to do that. >> There is a minor issue in fork12 that it calls exit(0) instead of >> _exit(0) in the children. Not the problem you are dealing with >> but it does look like it can be a distraction. >> >> I suspect the issue really is the thundering herd of a million+ >> processes synchronizing on a single lock. >> >> I don't think this is a hard lockup, just a global slow down. >> I expect everything will eventually exit. >> > > But according to the vmcore, this is a hardlockup issue, and I think > there may be the following scenarios: Let me rewind a second. I just realized that I don't have a clue what a hard lockup is (outside of the linux hard lockup detector). The two kinds of lockups that I understand with a technical meaning are deadlock (such as taking two locks in opposite orders which can never be escaped), and livelock (where things are so busy no progress is made for an extended period of time). I meant to say this is not a deadlock situation. This looks like a livelock, but I think given enough time the code would make progress and get out of it.
The closest I can imagine to a real world scenario is that this situation can be used as a denial of service attack. The hardest part of the problem is that signals sent to a group need to be sent to the group atomically. That is the signals need to be sent to every member of the group. Anyway I am very curious why you are the only one seeing a problem with fork12. That we can definitely investigate as tracking down what is different about your setup versus other people who have run ltp seems much easier than redesigning all of the signal processing data structures from scratch. Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Question about kill a process group 2022-04-21 16:12 ` Eric W. Biederman @ 2022-04-28 2:05 ` Zhang Qiao 2022-04-28 12:33 ` Thomas Gleixner 1 sibling, 0 replies; 12+ messages in thread From: Zhang Qiao @ 2022-04-28 2:05 UTC (permalink / raw) To: Eric W. Biederman Cc: lkml, keescook, tglx, Peter Zijlstra, elver, legion, oleg, brauner hi, On 2022/4/22 0:12, Eric W. Biederman wrote: > Zhang Qiao <zhangqiao22@huawei.com> writes: > >> On 2022/4/13 23:47, Eric W. Biederman wrote: >>> To do something about this is going to take a deep and fundamental >>> redesign of how we maintain process lists to handle a parent >>> with millions of children well. >>> >>> Is there any real world reason to care about this case? Without >>> real world motivation I am inclined to just note that this is >> >> I just found it while I ran the LTP tests. > > So I looked and fork12 has been around since 2002 in largely its > current form. So I am puzzled why you have run into problems > and other people have not. > > Did you perhaps have lock debugging enabled? > > Did you run on a very large machine where a ridiculous number of processes > could be created? > > Did you happen to run fork12 on a machine where locks are much more > expensive than on most machines? I don't think so. I reproduced this problem on two servers with different configurations. Info for one of the servers is as follows: cpu: Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz, 64 cpus, RAM: 377G Do you need any other information? > > >>> Is there a real world use case that connects to this? >>> >>> How many children are being created in this test? Several million? >> >> There are about 300,000+ processes. > > Not as many as I was guessing, but still enough to cause a huge > wait on locks. > >>> I would like to blame this on the old issue that tasklist_lock being >>> a global lock. Given the number of child processes (as many as can be >>> created) I don't think we are hurt much by using a global lock.
The >>> problem for solubility is that we have a lock. >>> >>> Fundamentally there must be a lock taken to maintain the parent's >>> list of children. >>> >>> I only see SIGQUIT being called once in the parent process so that >>> should not be an issue. >> >> >> In fork12, every child will call kill(0, SIGQUIT) at cleanup(). >> There are a lot of kill(0, SIGQUIT) calls. > > I had missed that. I can see that stressing out a lot. > > At the same time as I read fork12.c that is very much a bug. The > children in fork12.c should call _exit() instead of exit(). Which > would suppress calling the atexit() handlers and let fork12.c > test what it is trying to test. > > That doesn't mean there isn't a mystery here, but more that if > we really want to test lots of processes calling the same > signal at the same time it should be a test that means to do that. > > >>> There is a minor issue in fork12 that it calls exit(0) instead of >>> _exit(0) in the children. Not the problem you are dealing with >>> but it does look like it can be a distraction. >>> >>> I suspect the issue really is the thundering hurd of a million+ >>> processes synchronizing on a single lock. >>> >>> I don't think this is a hard lockup, just a global slow down. >>> I expect everything will eventually exit. >>> >> >> But according to the vmcore, this is a hardlockup issue, and i think >> there may be the following scenarios: > > Let me rewind a second. I just realized that I don't have a clue what > a hard lockup is (outside of the linux hard lockup detector). > > The two kinds of lockups that I understand with a technical meaning are > deadlock (such taking two locks in opposite orders which can never be > escaped), and livelock (where things are so busy no progress is made for > an extended period of time). > > I meant to say this is not a deadlock situation. This looks like a > livelock, but I think given enough time the code would make progress and > get out of it. 
> > I do agree over 1 second for holding a spin lock is ridiculous and a > denial of service attack. > > > > What I unfortunately do not see is a real world scenario where this will > happen. Without a real world scenario it is hard to find motivation to > spend the year or so it would take to rework all of the data structures. > The closest I can imagine to a real world scenario is that this > situation can be used as a denial of service attack. > > The hardest part of the problem is that signals sent to a group need to > be sent to the group atomically. That is the signals need to be sent to > every member of the group. > > Anyway I am very curious why you are the only one seeing a problem with > fork12. That we can definitely investigate as tracking down what is > different about your setup versus other people who have run ltp seems > much easier than redesigning all of the signal processing data > structures from scratch.

The test steps are as follows:

1. git clone https://github.com/linux-test-project/ltp.git --depth=1
2. cd ltp/
3. make autotools
4. ./configure
5. cd testcases/kernel/syscalls/
6. make -j64
7. find ./ -type f -executable > newlist
8. while read line; do ./$line -I 30; done < newlist
9. After ten hours, I trigger Ctrl+C repeatedly.

> > Eric > . > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Question about kill a process group 2022-04-21 16:12 ` Eric W. Biederman 2022-04-28 2:05 ` Zhang Qiao @ 2022-04-28 12:33 ` Thomas Gleixner 2022-05-11 18:33 ` Eric W. Biederman 1 sibling, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2022-04-28 12:33 UTC (permalink / raw) To: Eric W. Biederman, Zhang Qiao Cc: lkml, keescook, Peter Zijlstra, elver, legion, oleg, brauner On Thu, Apr 21 2022 at 11:12, Eric W. Biederman wrote: > Zhang Qiao <zhangqiao22@huawei.com> writes: >>> How many children are being created in this test? Several million? >> >> There are about 300,000+ processes. > > Not as many as I was guessing, but still enough to cause a huge > wait on locks. Indeed. It's about 4-5us per process to send the signal on a 2GHz SKL-X. So with 200k processes the tasklist lock is read-held for 1 second. > I do agree over 1 second for holding a spin lock is ridiculous and a > denial of service attack. Exactly. Even holding it for 100ms (20k forks) is daft. So unless the number of PIDs for a user is limited this _is_ an unprivileged DoS vector. > Anyway I am very curious why you are the only one seeing a problem with > fork12. It's fully reproducible. It's just a question how big the machine is and what the PID limits are on the box you are testing on. >>> I suspect the issue really is the thundering herd of a million+ >>> processes synchronizing on a single lock.

There are several issues:

1) The parent sending the signal is holding the lock for an obscenely long time.

2) Every signaled child runs into tasklist lock contention as all of them need to acquire it for write in do_exit().

That means within (NR_CPUS - 1) * 5usec all CPUs are spinning on the tasklist lock with interrupts disabled up to the point where #1 has finished. So depending on the number of children and the configured limits of a lockup detector this is sufficient to trigger a warning. Thanks, tglx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Question about kill a process group 2022-04-28 12:33 ` Thomas Gleixner @ 2022-05-11 18:33 ` Eric W. Biederman 2022-05-11 22:53 ` Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2022-05-11 18:33 UTC (permalink / raw) To: Thomas Gleixner Cc: Zhang Qiao, lkml, keescook, Peter Zijlstra, elver, legion, oleg, brauner Thomas Gleixner <tglx@linutronix.de> writes: > On Thu, Apr 21 2022 at 11:12, Eric W. Biederman wrote: >> Zhang Qiao <zhangqiao22@huawei.com> writes: >>>> How many children are being created in this test? Several million? >>> >>> There are about 300,000+ processes. >> >> Not as many as I was guessing, but still enough to cause a huge >> wait on locks. > > Indeed. It's about 4-5us per process to send the signal on a 2GHz > SKL-X. So with 200k processes the tasklist lock is read-held for 1 second. > >> I do agree over 1 second for holding a spin lock is ridiculous and a >> denial of service attack. > > Exactly. Even holding it for 100ms (20k forks) is daft. > > So unless the number of PIDs for a user is limited this _is_ an > unprivileged DoS vector. After having slept on this a bit it finally occurred to me the semi-obvious solution to this issue is to convert tasklist_lock from a rw-spinlock to rw-semaphore. The challenge is finding the users (tty layer?) that generate signals from interrupt context and redirect that signal generation. Once signals holding tasklist_lock are no longer generated from interrupt context irqs no longer need to be disabled and after verifying tasklist_lock isn't held under any other spinlocks it can be converted to a semaphore. It won't help the signal delivery times, but it should reduce the effect on the rest of the system, and prevent watchdogs from firing. I don't know if I have time to do any of that now, but it does seem a reasonable direction to move the code in. Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Question about kill a process group 2022-05-11 18:33 ` Eric W. Biederman @ 2022-05-11 22:53 ` Thomas Gleixner 2022-05-12 18:23 ` Eric W. Biederman 0 siblings, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2022-05-11 22:53 UTC (permalink / raw) To: Eric W. Biederman Cc: Zhang Qiao, lkml, keescook, Peter Zijlstra, elver, legion, oleg, brauner On Wed, May 11 2022 at 13:33, Eric W. Biederman wrote: > Thomas Gleixner <tglx@linutronix.de> writes: >> So unless the number of PIDs for a user is limited this _is_ an >> unprivileged DoS vector. > > After having slept on this a bit it finally occurred to me the > semi-obvious solution to this issue is to convert tasklist_lock > from a rw-spinlock to rw-semaphore. The challenge is finding > the users (tty layer?) that generate signals from interrupt > context and redirect that signal generation.

From my outdated notes where I looked at this before:

  [soft]interrupt context which acquires tasklist_lock:
    sysrq-e   send_sig_all()
    sysrq-i   send_sig_all()
    sysrq-n   normalize_rt_tasks()

  tasklist_lock nesting into other locks:
    fs/fcntl.c: send_sigio(), send_sigurg()

    send_sigurg() is called from the network stack ...

Some very obscure stuff in arch/ia64/kernel/mca.c which is called from a DIE notifier. Plus quite a bunch of read_lock() instances which nest inside rcu_read_lock() held sections. This is probably incomplete, but the scope of the problem has been greatly reduced vs. the point where I looked at it last time a couple of years ago. But that's still a herculean task. > Once signals holding tasklist_lock are no longer generated from > interrupt context irqs no longer need to be disabled and > after verifying tasklist_lock isn't held under any other spinlocks > it can be converted to a semaphore. Going to take a while. :) > It won't help the signal delivery times, but it should reduce > the effect on the rest of the system, and prevent watchdogs from > firing.
The signal delivery time itself is the least of the worries, but this still prevents any other operations which require tasklist_lock from making progress for quite some time, i.e. fork/exec for unrelated processes/users will have to wait too. So you converted the 'visible' DoS to an 'invisible' one. The real problem is that the scope of tasklist_lock is too broad for most use cases. That does not change when you actually can convert it to a rwsem. The underlying problem still persists. Let's take a step back and look what most sane use cases (sysrq-* is not in that category) require: Preventing that tasks are added or removed. Do they require that any task is added or removed? No. They require to prevent add/remove for the intended scope. That's the thing we need to focus on: reducing the protection scope. If we can segment the protection for the required scope of e.g. kill(2) then we still can let unrelated processes/tasks make progress and just inflict the damage on the affected portion of processes/tasks. For example:

    read_lock(&tasklist_lock);
    for_each_process(p) {
            if (task_pid_vnr(p) > 1 &&
                !same_thread_group(p, current)) {
                    group_send_sig_info(...., p);
            }
    }
    read_unlock(&tasklist_lock);

same_thread_group() does:

    return p->signal == current->signal;

Ideally we can do:

    read_lock(&tasklist_lock);
    prevent_add_remove(current->signal);
    read_unlock(&tasklist_lock);

    rcu_read_lock();
    for_each_process(p) {
            if (task_pid_vnr(p) > 1 &&
                !same_thread_group(p, current)) {
                    group_send_sig_info(...., p);
            }
    }
    rcu_read_unlock();

    allow_add_remove(current->signal);

Where prevent_add_remove() sets a state which has to be waited for to be cleared by anything which wants to add/remove a task in that scope or change $relatedtask->signal until allow_add_remove() removes that blocker. I'm sure it's way more complicated, but you get the idea. If we find a solution to this scope reduction problem, then it will not only squash the issue which started this discussion.
This will have a benefit in general. Thanks, tglx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Question about kill a process group 2022-05-11 22:53 ` Thomas Gleixner @ 2022-05-12 18:23 ` Eric W. Biederman 2022-09-26 7:32 ` Zhang Qiao 0 siblings, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2022-05-12 18:23 UTC (permalink / raw) To: Thomas Gleixner Cc: Zhang Qiao, lkml, keescook, Peter Zijlstra, elver, legion, oleg, brauner Thomas Gleixner <tglx@linutronix.de> writes: > On Wed, May 11 2022 at 13:33, Eric W. Biederman wrote: >> Thomas Gleixner <tglx@linutronix.de> writes: >>> So unless the number of PIDs for a user is limited this _is_ an >>> unpriviledged DoS vector. >> >> After having slept on this a bit it finally occurred to me the >> semi-obvious solution to this issue is to convert tasklist_lock >> from a rw-spinlock to rw-semaphore. The challenge is finding >> the users (tty layer?) that generate signals from interrupt >> context and redirect that signal generation. > > From my outdated notes where I looked at this before: > > [soft]interrupt context which acquires tasklist_lock: > sysrq-e send_sig_all() > sysrq-i send_sig_all() > sysrq-n normalize_rt_tasks() > > tasklist_lock nesting into other locks: > fs/fcntl.c: send_sigio(), send_sigurg() > > send_sigurg() is called from the network stack ... > > Some very obscure stuff in arch/ia64/kernel/mca.c which is called > from a DIE notifier. I think we are very close to the point that if ia64 is the only user problem case we can just do "git rm arch/ia64". I am not certain there is even anyone left that cares enough to report breakage on ia64. > Plus quite a bunch of read_lock() instances which nest inside > rcu_read_lock() held sections. > > This is probably incomplete, but the scope of the problem has been > greatly reduced vs. the point where I looked at it last time a couple of > years ago. But that's still a herculean task. I won't argue. 
>> Once signals holding tasklist_lock are no longer generated from
>> interrupt context, irqs no longer need to be disabled, and after
>> verifying that tasklist_lock isn't held under any other spinlocks it
>> can be converted to a semaphore.
>
> Going to take a while. :)

It is a very tractable problem that people can work on incrementally.

>> It won't help the signal delivery times, but it should reduce the
>> effect on the rest of the system, and prevent watchdogs from firing.
>
> The signal delivery time itself is the least of the worries, but this
> still prevents any other operations which require tasklist_lock from
> making progress for quite some time, i.e. fork/exec for unrelated
> processes/users will have to wait too. So you converted the 'visible'
> DoS to an 'invisible' one.
>
> The real problem is that the scope of tasklist_lock is too broad for
> most use cases. That does not change when you actually can convert it
> to a rwsem. The underlying problem still persists.
>
> Let's take a step back and look at what most sane use cases (sysrq-*
> is not in that category) require:
>
>         Preventing that tasks are added or removed.
>
> Do they require that any task is added or removed? No.
>
> They require to prevent add/remove for the intended scope.
>
> That's the thing we need to focus on: reducing the protection scope.
>
> If we can segment the protection for the required scope of e.g.
> kill(2) then we still can let unrelated processes/tasks make progress
> and just inflict the damage on the affected portion of
> processes/tasks.
>
> For example:
>
>         read_lock(&tasklist_lock);
>         for_each_process(p) {
>                 if (task_pid_vnr(p) > 1 &&
>                     !same_thread_group(p, current)) {
>
>                         group_send_sig_info(...., p);
>                 }
>         }
>         read_unlock(&tasklist_lock);
>
> same_thread_group() does:
>
>         return p->signal == current->signal;

Yes. So the sender can not send a signal to itself. Basically it is a
test to see whether a thread is a member of a process.
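The membership test quoted above can be illustrated with a tiny
userspace model (the struct layouts are stand-ins, far smaller than the
real kernel definitions): all threads of one process share a single
signal_struct, so pointer equality identifies thread-group membership.

```c
#include <stddef.h>

/* Stand-in structures -- the real kernel definitions are far larger. */
struct signal_struct { int placeholder; };
struct task_struct   { struct signal_struct *signal; };

/* Model of same_thread_group(): two tasks belong to the same thread
 * group (process) exactly when they point at the same signal_struct. */
int same_thread_group(const struct task_struct *p1,
                      const struct task_struct *p2)
{
    return p1->signal == p2->signal;
}
```

This is why the check also excludes the sender itself: the sender's own
threads share its signal_struct and therefore compare equal.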
> Ideally we can do:
>
>         read_lock(&tasklist_lock);
>         prevent_add_remove(current->signal);
>         read_unlock(&tasklist_lock);
>
>         rcu_read_lock();
>         for_each_process(p) {
>                 if (task_pid_vnr(p) > 1 &&
>                     !same_thread_group(p, current)) {
>
>                         group_send_sig_info(...., p);
>                 }
>         }
>         rcu_read_unlock();
>
>         allow_add_remove(current->signal);
>
> Where prevent_add_remove() sets a state which has to be waited for to
> be cleared by anything which wants to add/remove a task in that scope
> or change $relatedtask->signal until allow_add_remove() removes that
> blocker. I'm sure it's way more complicated, but you get the idea.

Hmm.

For sending signals, what is needed is the guarantee that the signal is
sent to an atomic snapshot of the appropriate group of processes, so
that a SIGKILL sent to the group will reliably kill all of the
processes. It should be ok for a process to exit on its own from the
group, as long as it logically looks like the process exited before the
signal was sent.

There is also ptrace_attach/__ptrace_unlink, reparenting,
kill_orphaned_pgrp, zap_pid_ns_processes, and pid hash table
maintenance in release_task.

I have a patch I am playing with that protects task->parent and
task->real_parent with siglock, and with a little luck that can be
generalized so that sending signals to parents and ptrace don't need
tasklist_lock at all.

For reparenting of children, the new parent's list of children needs
protection, but that should not need tasklist_lock.

For kill_orphaned_pgrp, with some additional per-process-group
maintenance state so that will_become_orphaned_pgrp and
has_stopped_jobs don't need to traverse the process group, it should be
possible to just have it be a sender of a process group signal.

zap_pid_ns_processes is already called without the tasklist_lock.
Maintenance of the pid hash table certainly needs a write lock in
__exit_signal, but it doesn't need to be tasklist_lock.

Which is a long way of saying that semantically all we need is to
prevent addition to the group of processes a signal will be sent to. We
have one version of that prevention today in fork, where it tests
fatal_signal_pending after taking tasklist_lock and siglock. For the
case you are describing, the code would just need to check each group
of processes the new process is put into.

Hmm.

When I boil it all down in my head I wind up with something like:

        rwlock_t *lock = signal_lock(enum pid_type type);
        read_lock(lock);
        /* Do the work of sending the signal */
        read_unlock(lock);

With fork needing to grab all of those possible locks for write as it
adds the process to the group.

Maybe it could be something like:

        struct group_signal {
                struct hlist_node node;
                struct kernel_siginfo *info;
        };

        void start_group_signal(struct group_signal *group,
                                struct kernel_siginfo *info,
                                enum pid_type type);
        void end_group_signal(struct group_signal *group);

        struct group_signal group_sig;
        start_group_signal(&group_sig, info, PIDTYPE_PGID);

        /* Walk through the list and deliver the signal */

        end_group_signal(&group_sig);

That would allow fork to see all signals that are being delivered to a
group even if the signal has not been delivered to the parent process
yet. At which point the signal could be delivered to the parent before
the fork. I just need something to ensure that the signal delivery loop
between start_group_signal and end_group_signal skips processes that
hurried up and delivered the signal to themselves, and does not deliver
to newly added processes. A generation counter, perhaps.

There is a lot to flesh out, and I am buried alive in other cleanups,
but I think that could work, and it would remove the need to hold
tasklist_lock during signal delivery.
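The generation-counter idea at the end can be sketched in userspace.
Everything here is hypothetical and much simplified from anything the
kernel would need: each group carries a signal generation, each task
records the last generation delivered to it, and fork hands the child
the parent's generation so an in-flight group signal skips the child
(after being delivered to the parent first).

```c
/* Hypothetical model of the generation-counter scheme: each group of
 * processes carries a signal generation, bumped once per group-wide
 * signal; each task remembers the last generation delivered to it. */
struct group {
    unsigned long sig_gen;
};

struct task {
    struct group *grp;
    unsigned long seen_gen;   /* last group-signal generation taken */
    int           pending;    /* models a pending-signal bit */
};

/* Sender side: bump the generation; the return value is the snapshot
 * this delivery loop works against. */
unsigned long start_group_signal(struct group *g)
{
    return ++g->sig_gen;
}

/* Deliver to one task.  Returns 1 if delivered here, 0 if skipped
 * because the task already took this generation itself ("hurried up"),
 * or because it was added after the snapshot. */
int deliver_one(struct task *t, unsigned long gen)
{
    if (t->seen_gen >= gen)
        return 0;
    t->seen_gen = gen;
    t->pending = 1;
    return 1;
}

/* Fork side: an in-flight group signal is delivered to the parent
 * first, and the child starts past that generation, so the delivery
 * loop never targets the newly added task. */
struct task fork_task(struct task *parent)
{
    struct task child;

    deliver_one(parent, parent->grp->sig_gen);
    child = *parent;
    child.pending = 0;
    return child;
}
```

The model makes the snapshot property concrete: a task either receives
the signal or demonstrably came into existence after it, which is the
"atomic snapshot" guarantee discussed above.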
We need to go farther than simple scope reduction to benefit the
original problem, as all of the processes in that problem were sending
a signal to the same process group, so they all needed to wait for each
other.

If we need to block adds, then the adds need to effectively take a
write_lock paired with the read_lock taken during signal delivery.
Because all of the blocking is the same, we won't see an improvement in
the original problem.

If, in addition to scope reduction, a barrier is implemented so that it
is guaranteed that past a certain point processes will see the signal
before they fork (or do anything else by which userspace could tell the
signal was not delivered atomically), then I think we can eliminate
blocking in the same places, and an improvement in the issue that
started this discussion can be seen.

I will queue it up on my list of things I would like to do. I am buried
in other signal-related cleanups at the moment, so I don't know when I
will be able to get to anything like that. But I really appreciate the
idea.

Eric
* Re: Question about kill a process group
  2022-05-12 18:23                 ` Eric W. Biederman
@ 2022-09-26  7:32                   ` Zhang Qiao
  0 siblings, 0 replies; 12+ messages in thread
From: Zhang Qiao @ 2022-09-26 7:32 UTC (permalink / raw)
To: Eric W. Biederman, Thomas Gleixner
Cc: lkml, keescook, Peter Zijlstra, elver, legion, oleg, brauner

On 2022/5/13 2:23, Eric W. Biederman wrote:

[...]

> I will queue it up on my list of things I would like to do. I am
> buried in other signal related cleanups at the moment so I don't know
> when I will be able to get to anything like that. But I really
> appreciate the idea.
>
> Eric

hi, Eric:

Do you have any plans to fix it? I look forward to your patches.

thanks!

-Zhang Qiao
end of thread, other threads: [~2022-09-26 7:33 UTC | newest]

Thread overview: 12+ messages -- links below jump to the message on this page --
2022-03-29  8:07 Question about kill a process group Zhang Qiao
2022-04-02  2:22 ` Zhang Qiao
2022-04-13  1:56   ` Zhang Qiao
2022-04-13 15:47     ` Eric W. Biederman
2022-04-14 11:40       ` Zhang Qiao
2022-04-21 16:12         ` Eric W. Biederman
2022-04-28  2:05           ` Zhang Qiao
2022-04-28 12:33             ` Thomas Gleixner
2022-05-11 18:33               ` Eric W. Biederman
2022-05-11 22:53                 ` Thomas Gleixner
2022-05-12 18:23                   ` Eric W. Biederman
2022-09-26  7:32                     ` Zhang Qiao