linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* possible deadlock in __perf_event_task_sched_in
@ 2024-04-22 18:44 Marius Fleischer
  2024-04-24  9:43 ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Marius Fleischer @ 2024-04-22 18:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-perf-users, linux-kernel
  Cc: syzkaller, harrisonmichaelgreen

Hi,

We would like to report the following bug which has been found by our
modified version of syzkaller.

We found this report (https://lkml.org/lkml/2021/9/12/333) that seems
to have a similar but different stack trace. We are unable to tell,
though, whether it is the same cause. We’d be grateful for your
advice.

======================================================
description: possible deadlock in __perf_event_task_sched_in
affected file: kernel/events/core.c
kernel version: v5.15.156
kernel commit: c52b9710c83d3b8ab63bb217cc7c8b61e13f12cd
git tree: upstream
kernel config: attached
crash reproducer: attached
======================================================
Crash log:
WARNING: possible circular locking dependency detected
5.15.156 #1 Not tainted
------------------------------------------------------
syz-executor.2/150802 is trying to acquire lock:
ffffffff8b976cd8 ((console_sem).lock){-.-.}-{2:2}, at:
down_trylock+0xe/0x60 kernel/locking/semaphore.c:138

but task is already holding lock:
ffff88809b9f8c20 (&ctx->lock){....}-{2:2}, at: perf_ctx_lock
kernel/events/core.c:169 [inline]
ffff88809b9f8c20 (&ctx->lock){....}-{2:2}, at:
perf_event_context_sched_in kernel/events/core.c:4009 [inline]
ffff88809b9f8c20 (&ctx->lock){....}-{2:2}, at:
__perf_event_task_sched_in+0x4bc/0xb90 kernel/events/core.c:4071

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&ctx->lock){....}-{2:2}:
    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
    _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
    perf_event_context_sched_out kernel/events/core.c:3623 [inline]
    __perf_event_task_sched_out+0x6e0/0x2110 kernel/events/core.c:3732
    perf_event_task_sched_out include/linux/perf_event.h:1252 [inline]
    prepare_task_switch kernel/sched/core.c:4834 [inline]
    context_switch kernel/sched/core.c:4982 [inline]
    __schedule+0x16d1/0x5560 kernel/sched/core.c:6376
    preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6552
    preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:34
    try_to_wake_up+0xa11/0x1580 kernel/sched/core.c:4152
    wake_up_process kernel/sched/core.c:4215 [inline]
    wake_up_q+0x7e/0xf0 kernel/sched/core.c:950
    futex_wake+0x3e9/0x490 kernel/futex/core.c:1702
    do_futex+0x2cc/0x1960 kernel/futex/core.c:3987
    __do_sys_futex+0xe7/0x3d0 kernel/futex/core.c:4059
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x66/0xd0

-> #2 (&rq->__lock){-.-.}-{2:2}:
    _raw_spin_lock_nested+0x30/0x40 kernel/locking/spinlock.c:368
    raw_spin_rq_lock_nested+0x2b/0x120 kernel/sched/core.c:475
    raw_spin_rq_lock kernel/sched/sched.h:1326 [inline]
    rq_lock kernel/sched/sched.h:1621 [inline]
    task_fork_fair+0x70/0x500 kernel/sched/fair.c:11480
    sched_cgroup_fork+0x343/0x480 kernel/sched/core.c:4466
    copy_process+0x39c8/0x8730 kernel/fork.c:2320
    kernel_clone+0xe7/0x9f0 kernel/fork.c:2604
    kernel_thread+0xb8/0xf0 kernel/fork.c:2656
    rest_init+0x23/0x460 init/main.c:704
    start_kernel+0x465/0x486 init/main.c:1138
    secondary_startup_64_no_verify+0xb0/0xbb

-> #1 (&p->pi_lock){-.-.}-{2:2}:
    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
    _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
    try_to_wake_up+0xac/0x1580 kernel/sched/core.c:4030
    up+0x75/0xb0 kernel/locking/semaphore.c:190
    __up_console_sem+0xa4/0xc0 kernel/printk/printk.c:256
    console_unlock+0x564/0xb70 kernel/printk/printk.c:2760
    vga_remove_vgacon drivers/gpu/vga/vgaarb.c:211 [inline]
    vga_remove_vgacon.cold+0x99/0x9e drivers/gpu/vga/vgaarb.c:192
    drm_aperture_remove_conflicting_pci_framebuffers+0x7f/0x230
drivers/gpu/drm/drm_aperture.c:343
    bochs_pci_probe+0x101/0x870 drivers/gpu/drm/tiny/bochs.c:645
    local_pci_probe+0xe1/0x1a0 drivers/pci/pci-driver.c:323
    pci_call_probe drivers/pci/pci-driver.c:380 [inline]
    __pci_device_probe drivers/pci/pci-driver.c:405 [inline]
    pci_device_probe+0x3dd/0x6f0 drivers/pci/pci-driver.c:448
    call_driver_probe drivers/base/dd.c:516 [inline]
    really_probe+0x245/0xcd0 drivers/base/dd.c:595
    __driver_probe_device+0x2f0/0x460 drivers/base/dd.c:755
    driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:785
    __driver_attach+0x29a/0x500 drivers/base/dd.c:1164
    bus_for_each_dev+0x147/0x1d0 drivers/base/bus.c:301
    bus_add_driver+0x3ae/0x640 drivers/base/bus.c:618
    driver_register+0x23c/0x3d0 drivers/base/driver.c:240
    bochs_init+0x78/0x86 drivers/gpu/drm/tiny/bochs.c:722
    do_one_initcall+0x103/0x540 init/main.c:1300
    do_initcall_level init/main.c:1373 [inline]
    do_initcalls init/main.c:1389 [inline]
    do_basic_setup init/main.c:1408 [inline]
    kernel_init_freeable+0x6b4/0x73d init/main.c:1613
    kernel_init+0x1a/0x1d0 init/main.c:1504
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:300

-> #0 ((console_sem).lock){-.-.}-{2:2}:
    check_prev_add kernel/locking/lockdep.c:3053 [inline]
    check_prevs_add kernel/locking/lockdep.c:3172 [inline]
    validate_chain kernel/locking/lockdep.c:3788 [inline]
    __lock_acquire+0x2a41/0x5340 kernel/locking/lockdep.c:5012
    lock_acquire kernel/locking/lockdep.c:5623 [inline]
    lock_acquire+0x1ab/0x4e0 kernel/locking/lockdep.c:5588
    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
    _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
    down_trylock+0xe/0x60 kernel/locking/semaphore.c:138
    __down_trylock_console_sem+0x40/0x120 kernel/printk/printk.c:239
    console_trylock kernel/printk/printk.c:2575 [inline]
    console_trylock_spinning kernel/printk/printk.c:1867 [inline]
    vprintk_emit+0x14b/0x520 kernel/printk/printk.c:2273
    vprintk+0x72/0x90 kernel/printk/printk_safe.c:45
    _printk+0xba/0xed kernel/printk/printk.c:2299
    ex_handler_msr.cold+0xb7/0x147 arch/x86/mm/extable.c:90
    fixup_exception+0x973/0xbb0 arch/x86/mm/extable.c:187
    __exc_general_protection arch/x86/kernel/traps.c:601 [inline]
    exc_general_protection+0xed/0x2f0 arch/x86/kernel/traps.c:562
    asm_exc_general_protection+0x22/0x30 arch/x86/include/asm/idtentry.h:562
    __wrmsr arch/x86/include/asm/msr.h:103 [inline]
    native_write_msr arch/x86/include/asm/msr.h:154 [inline]
    wrmsrl arch/x86/include/asm/msr.h:271 [inline]
    __x86_pmu_enable_event arch/x86/events/intel/../perf_event.h:1120 [inline]
    intel_pmu_enable_event+0x2d9/0xff0 arch/x86/events/intel/core.c:2694
    x86_pmu_start+0x1cc/0x270 arch/x86/events/core.c:1520
    x86_pmu_enable+0x481/0xdf0 arch/x86/events/core.c:1337
    perf_pmu_enable kernel/events/core.c:1243 [inline]
    perf_pmu_enable kernel/events/core.c:1239 [inline]
    perf_event_context_sched_in kernel/events/core.c:4033 [inline]
    __perf_event_task_sched_in+0x79c/0xb90 kernel/events/core.c:4071
    perf_event_task_sched_in include/linux/perf_event.h:1229 [inline]
    finish_task_switch.isra.0+0x554/0xc80 kernel/sched/core.c:4899
    context_switch kernel/sched/core.c:5033 [inline]
    __schedule+0xb5c/0x5560 kernel/sched/core.c:6376
    preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6552
    preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:34
    __raw_spin_unlock include/linux/spinlock_api_smp.h:152 [inline]
    _raw_spin_unlock+0x36/0x40 kernel/locking/spinlock.c:186
    spin_unlock include/linux/spinlock.h:403 [inline]
    zap_pte_range mm/memory.c:1435 [inline]
    zap_pmd_range mm/memory.c:1494 [inline]
    zap_pud_range mm/memory.c:1523 [inline]
    zap_p4d_range mm/memory.c:1544 [inline]
    unmap_page_range+0x1090/0x2a40 mm/memory.c:1565
    unmap_single_vma+0x198/0x310 mm/memory.c:1610
    unmap_vmas+0x16d/0x2f0 mm/memory.c:1642
    exit_mmap+0x1d0/0x620 mm/mmap.c:3186
    __mmput+0x122/0x4b0 kernel/fork.c:1126
    mmput+0x58/0x60 kernel/fork.c:1147
    exit_mm kernel/exit.c:548 [inline]
    do_exit+0xa8f/0x2930 kernel/exit.c:859
    do_group_exit+0x125/0x310 kernel/exit.c:994
    get_signal+0x337/0x26b0 kernel/signal.c:2889
    arch_do_signal_or_restart+0x2a6/0x1c10 arch/x86/kernel/signal.c:867
    handle_signal_work kernel/entry/common.c:154 [inline]
    exit_to_user_mode_loop kernel/entry/common.c:178 [inline]
    exit_to_user_mode_prepare+0x13d/0x280 kernel/entry/common.c:214
    irqentry_exit_to_user_mode+0x5/0x40 kernel/entry/common.c:320
    exc_page_fault+0xc6/0x170 arch/x86/mm/fault.c:1535
    asm_exc_page_fault+0x22/0x30 arch/x86/include/asm/idtentry.h:568

other info that might help us debug this:

Chain exists of:
  (console_sem).lock --> &rq->__lock --> &ctx->lock

 Possible unsafe locking scenario:

    CPU0                CPU1
    ----                ----
  lock(&ctx->lock);
                            lock(&rq->__lock);
                            lock(&ctx->lock);
  lock((console_sem).lock);

 *** DEADLOCK ***

2 locks held by syz-executor.2/150802:
 #0: ffff888063e42e20 (&cpuctx_lock){-...}-{2:2}, at: perf_ctx_lock
kernel/events/core.c:167 [inline]
 #0: ffff888063e42e20 (&cpuctx_lock){-...}-{2:2}, at:
perf_event_context_sched_in kernel/events/core.c:4009 [inline]
 #0: ffff888063e42e20 (&cpuctx_lock){-...}-{2:2}, at:
__perf_event_task_sched_in+0x4ab/0xb90 kernel/events/core.c:4071
 #1: ffff88809b9f8c20 (&ctx->lock){....}-{2:2}, at: perf_ctx_lock
kernel/events/core.c:169 [inline]
 #1: ffff88809b9f8c20 (&ctx->lock){....}-{2:2}, at:
perf_event_context_sched_in kernel/events/core.c:4009 [inline]
 #1: ffff88809b9f8c20 (&ctx->lock){....}-{2:2}, at:
__perf_event_task_sched_in+0x4bc/0xb90 kernel/events/core.c:4071

stack backtrace:
CPU: 0 PID: 150802 Comm: syz-executor.2 Not tainted 5.15.156 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 check_noncircular+0x268/0x310 kernel/locking/lockdep.c:2133
 check_prev_add kernel/locking/lockdep.c:3053 [inline]
 check_prevs_add kernel/locking/lockdep.c:3172 [inline]
 validate_chain kernel/locking/lockdep.c:3788 [inline]
 __lock_acquire+0x2a41/0x5340 kernel/locking/lockdep.c:5012
 lock_acquire kernel/locking/lockdep.c:5623 [inline]
 lock_acquire+0x1ab/0x4e0 kernel/locking/lockdep.c:5588
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
 down_trylock+0xe/0x60 kernel/locking/semaphore.c:138
 __down_trylock_console_sem+0x40/0x120 kernel/printk/printk.c:239
 console_trylock kernel/printk/printk.c:2575 [inline]
 console_trylock_spinning kernel/printk/printk.c:1867 [inline]
 vprintk_emit+0x14b/0x520 kernel/printk/printk.c:2273
 vprintk+0x72/0x90 kernel/printk/printk_safe.c:45
 _printk+0xba/0xed kernel/printk/printk.c:2299
 ex_handler_msr.cold+0xb7/0x147 arch/x86/mm/extable.c:90
 fixup_exception+0x973/0xbb0 arch/x86/mm/extable.c:187
 __exc_general_protection arch/x86/kernel/traps.c:601 [inline]
 exc_general_protection+0xed/0x2f0 arch/x86/kernel/traps.c:562
 asm_exc_general_protection+0x22/0x30 arch/x86/include/asm/idtentry.h:562
RIP: 0010:__wrmsr arch/x86/include/asm/msr.h:103 [inline]
RIP: 0010:native_write_msr arch/x86/include/asm/msr.h:154 [inline]
RIP: 0010:wrmsrl arch/x86/include/asm/msr.h:271 [inline]
RIP: 0010:__x86_pmu_enable_event
arch/x86/events/intel/../perf_event.h:1120 [inline]
RIP: 0010:intel_pmu_enable_event+0x2d9/0xff0 arch/x86/events/intel/core.c:2694
Code: ea 03 49 81 cc 00 00 40 00 4d 21 f4 80 3c 02 00 0f 85 5b 0c 00
00 44 8b ab 70 01 00 00 4c 89 e2 44 89 e0 48 c1 ea 20 44 89 e9 <0f> 30
0f 1f 44 00 00 e8 1b 32 75 00 48 83 c4 20 5b 5d 41 5c 41 5d
RSP: 0018:ffffc900115af348 EFLAGS: 00010002
RAX: 0000000000530000 RBX: ffff888019dd6a50 RCX: 0000000000000188
RDX: 0000000000000002 RSI: ffffffff81029464 RDI: ffff888019dd6bc0
RBP: 0000000000000000 R08: 0000000000000001 R09: ffff888063e22ab7
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000200530000
R13: 0000000000000188 R14: ffffffffffffffff R15: ffff888019dd6bb0
 x86_pmu_start+0x1cc/0x270 arch/x86/events/core.c:1520
 x86_pmu_enable+0x481/0xdf0 arch/x86/events/core.c:1337
 perf_pmu_enable kernel/events/core.c:1243 [inline]
 perf_pmu_enable kernel/events/core.c:1239 [inline]
 perf_event_context_sched_in kernel/events/core.c:4033 [inline]
 __perf_event_task_sched_in+0x79c/0xb90 kernel/events/core.c:4071
 perf_event_task_sched_in include/linux/perf_event.h:1229 [inline]
 finish_task_switch.isra.0+0x554/0xc80 kernel/sched/core.c:4899
 context_switch kernel/sched/core.c:5033 [inline]
 __schedule+0xb5c/0x5560 kernel/sched/core.c:6376
 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6552
 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:34
 __raw_spin_unlock include/linux/spinlock_api_smp.h:152 [inline]
 _raw_spin_unlock+0x36/0x40 kernel/locking/spinlock.c:186
 spin_unlock include/linux/spinlock.h:403 [inline]
 zap_pte_range mm/memory.c:1435 [inline]
 zap_pmd_range mm/memory.c:1494 [inline]
 zap_pud_range mm/memory.c:1523 [inline]
 zap_p4d_range mm/memory.c:1544 [inline]
 unmap_page_range+0x1090/0x2a40 mm/memory.c:1565
 unmap_single_vma+0x198/0x310 mm/memory.c:1610
 unmap_vmas+0x16d/0x2f0 mm/memory.c:1642
 exit_mmap+0x1d0/0x620 mm/mmap.c:3186
 __mmput+0x122/0x4b0 kernel/fork.c:1126
 mmput+0x58/0x60 kernel/fork.c:1147
 exit_mm kernel/exit.c:548 [inline]
 do_exit+0xa8f/0x2930 kernel/exit.c:859
 do_group_exit+0x125/0x310 kernel/exit.c:994
 get_signal+0x337/0x26b0 kernel/signal.c:2889
 arch_do_signal_or_restart+0x2a6/0x1c10 arch/x86/kernel/signal.c:867
 handle_signal_work kernel/entry/common.c:154 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:178 [inline]
 exit_to_user_mode_prepare+0x13d/0x280 kernel/entry/common.c:214
 irqentry_exit_to_user_mode+0x5/0x40 kernel/entry/common.c:320
 exc_page_fault+0xc6/0x170 arch/x86/mm/fault.c:1535
 asm_exc_page_fault+0x22/0x30 arch/x86/include/asm/idtentry.h:568
RIP: 0033:0x7f7cb7755e07
Code: Unable to access opcode bytes at RIP 0x7f7cb7755ddd.
RSP: 002b:00007f7cb5d14df0 EFLAGS: 00010246
RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00007f7cb7831920 RSI: 0000000000000000 RDI: 00007f7cb840e000
RBP: 00007f7cb5d14ef0 R08: 00007f7cb840e000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f7cb783ffdc R14: 00007f7cb7840512 R15: 00007f7cb5cf6000
 </TASK>
Call Trace:
 <TASK>
 x86_pmu_start+0x1cc/0x270 arch/x86/events/core.c:1520
 x86_pmu_enable+0x481/0xdf0 arch/x86/events/core.c:1337
 perf_pmu_enable kernel/events/core.c:1243 [inline]
 perf_pmu_enable kernel/events/core.c:1239 [inline]
 perf_event_context_sched_in kernel/events/core.c:4033 [inline]
 __perf_event_task_sched_in+0x79c/0xb90 kernel/events/core.c:4071
 perf_event_task_sched_in include/linux/perf_event.h:1229 [inline]
 finish_task_switch.isra.0+0x554/0xc80 kernel/sched/core.c:4899
 context_switch kernel/sched/core.c:5033 [inline]
 __schedule+0xb5c/0x5560 kernel/sched/core.c:6376
 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6552
 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:34
 __raw_spin_unlock include/linux/spinlock_api_smp.h:152 [inline]
 _raw_spin_unlock+0x36/0x40 kernel/locking/spinlock.c:186
 spin_unlock include/linux/spinlock.h:403 [inline]
 zap_pte_range mm/memory.c:1435 [inline]
 zap_pmd_range mm/memory.c:1494 [inline]
 zap_pud_range mm/memory.c:1523 [inline]
 zap_p4d_range mm/memory.c:1544 [inline]
 unmap_page_range+0x1090/0x2a40 mm/memory.c:1565
 unmap_single_vma+0x198/0x310 mm/memory.c:1610
 unmap_vmas+0x16d/0x2f0 mm/memory.c:1642
 exit_mmap+0x1d0/0x620 mm/mmap.c:3186
 __mmput+0x122/0x4b0 kernel/fork.c:1126
 mmput+0x58/0x60 kernel/fork.c:1147
 exit_mm kernel/exit.c:548 [inline]
 do_exit+0xa8f/0x2930 kernel/exit.c:859
 do_group_exit+0x125/0x310 kernel/exit.c:994
 get_signal+0x337/0x26b0 kernel/signal.c:2889
 arch_do_signal_or_restart+0x2a6/0x1c10 arch/x86/kernel/signal.c:867
 handle_signal_work kernel/entry/common.c:154 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:178 [inline]
 exit_to_user_mode_prepare+0x13d/0x280 kernel/entry/common.c:214
 irqentry_exit_to_user_mode+0x5/0x40 kernel/entry/common.c:320
 exc_page_fault+0xc6/0x170 arch/x86/mm/fault.c:1535
 asm_exc_page_fault+0x22/0x30 arch/x86/include/asm/idtentry.h:568
RIP: 0033:0x7f7cb7755e07
Code: Unable to access opcode bytes at RIP 0x7f7cb7755ddd.
RSP: 002b:00007f7cb5d14df0 EFLAGS: 00010246
RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00007f7cb7831920 RSI: 0000000000000000 RDI: 00007f7cb840e000
RBP: 00007f7cb5d14ef0 R08: 00007f7cb840e000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f7cb783ffdc R14: 00007f7cb7840512 R15: 00007f7cb5cf6000
 </TASK>
----------------
Code disassembly (best guess), 1 bytes skipped:
   0:    03 49 81        add -0x7f(%rcx),%ecx
   3:    cc              int3
   4:    00 00           add %al,(%rax)
   6:    40 00 4d 21     rex add %cl,0x21(%rbp)
   a:    f4              hlt
   b:    80 3c 02 00     cmpb   $0x0,(%rdx,%rax,1)
   f:    0f 85 5b 0c 00 00   jne 0xc70
  15:    44 8b ab 70 01 00 00     mov 0x170(%rbx),%r13d
  1c:    4c 89 e2        mov %r12,%rdx
  1f:    44 89 e0        mov %r12d,%eax
  22:    48 c1 ea 20     shr $0x20,%rdx
  26:    44 89 e9        mov %r13d,%ecx
* 29:    0f 30           wrmsr <-- trapping instruction
  2b:    0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)
  30:    e8 1b 32 75 00  call   0x753250
  35:    48 83 c4 20     add $0x20,%rsp
  39:    5b              pop %rbx
  3a:    5d              pop %rbp
  3b:    41 5c           pop %r12
  3d:    41 5d           pop %r13
======================================================

Wishing you a nice day!

Best,
Marius

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: possible deadlock in __perf_event_task_sched_in
  2024-04-22 18:44 possible deadlock in __perf_event_task_sched_in Marius Fleischer
@ 2024-04-24  9:43 ` Peter Zijlstra
  2024-04-29 16:38   ` Marius Fleischer
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2024-04-24  9:43 UTC (permalink / raw)
  To: Marius Fleischer
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, linux-perf-users, linux-kernel,
	syzkaller, harrisonmichaelgreen

On Mon, Apr 22, 2024 at 11:44:27AM -0700, Marius Fleischer wrote:
> Hi,
> 
> We would like to report the following bug which has been found by our
> modified version of syzkaller.
> 
> We found this report (https://lkml.org/lkml/2021/9/12/333) that seems
> to have a similar but different stack trace. We are unable to tell,
> though, whether it is the same cause. We’d be grateful for your
> advice.

This is just the printk thing sucks again. Some WARN/printk got tripped
in a non-suitable context.


>  _printk+0xba/0xed kernel/printk/printk.c:2299
>  ex_handler_msr.cold+0xb7/0x147 arch/x86/mm/extable.c:90
>  fixup_exception+0x973/0xbb0 arch/x86/mm/extable.c:187
>  __exc_general_protection arch/x86/kernel/traps.c:601 [inline]
>  exc_general_protection+0xed/0x2f0 arch/x86/kernel/traps.c:562
>  asm_exc_general_protection+0x22/0x30 arch/x86/include/asm/idtentry.h:562
> RIP: 0010:__wrmsr arch/x86/include/asm/msr.h:103 [inline]
> RIP: 0010:native_write_msr arch/x86/include/asm/msr.h:154 [inline]
> RIP: 0010:wrmsrl arch/x86/include/asm/msr.h:271 [inline]
> RIP: 0010:__x86_pmu_enable_event
> arch/x86/events/intel/../perf_event.h:1120 [inline]
> RIP: 0010:intel_pmu_enable_event+0x2d9/0xff0 arch/x86/events/intel/core.c:2694
> Code: ea 03 49 81 cc 00 00 40 00 4d 21 f4 80 3c 02 00 0f 85 5b 0c 00
> 00 44 8b ab 70 01 00 00 4c 89 e2 44 89 e0 48 c1 ea 20 44 89 e9 <0f> 30
> 0f 1f 44 00 00 e8 1b 32 75 00 48 83 c4 20 5b 5d 41 5c 41 5d
> RSP: 0018:ffffc900115af348 EFLAGS: 00010002
> RAX: 0000000000530000 RBX: ffff888019dd6a50 RCX: 0000000000000188
> RDX: 0000000000000002 RSI: ffffffff81029464 RDI: ffff888019dd6bc0
> RBP: 0000000000000000 R08: 0000000000000001 R09: ffff888063e22ab7
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000200530000
> R13: 0000000000000188 R14: ffffffffffffffff R15: ffff888019dd6bb0
>  x86_pmu_start+0x1cc/0x270 arch/x86/events/core.c:1520
>  x86_pmu_enable+0x481/0xdf0 arch/x86/events/core.c:1337
>  perf_pmu_enable kernel/events/core.c:1243 [inline]
>  perf_pmu_enable kernel/events/core.c:1239 [inline]

Most likely your VM is wonky and perf tries to poke an MSR that either
doesn't exist or isn't emulated properly, who knows.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: possible deadlock in __perf_event_task_sched_in
  2024-04-24  9:43 ` Peter Zijlstra
@ 2024-04-29 16:38   ` Marius Fleischer
  0 siblings, 0 replies; 3+ messages in thread
From: Marius Fleischer @ 2024-04-29 16:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, linux-perf-users, linux-kernel,
	syzkaller, harrisonmichaelgreen

Hi Peter,

Thanks for taking the time to explain this issue!

Wishing you a nice day!

Best,
Marius

On Wed, 24 Apr 2024 at 02:43, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Apr 22, 2024 at 11:44:27AM -0700, Marius Fleischer wrote:
> > Hi,
> >
> > We would like to report the following bug which has been found by our
> > modified version of syzkaller.
> >
> > We found this report (https://lkml.org/lkml/2021/9/12/333) that seems
> > to have a similar but different stack trace. We are unable to tell,
> > though, whether it is the same cause. We’d be grateful for your
> > advice.
>
> This is just the printk thing sucks again. Some WARN/printk got tripped
> in a non-suitable context.
>
>
> >  _printk+0xba/0xed kernel/printk/printk.c:2299
> >  ex_handler_msr.cold+0xb7/0x147 arch/x86/mm/extable.c:90
> >  fixup_exception+0x973/0xbb0 arch/x86/mm/extable.c:187
> >  __exc_general_protection arch/x86/kernel/traps.c:601 [inline]
> >  exc_general_protection+0xed/0x2f0 arch/x86/kernel/traps.c:562
> >  asm_exc_general_protection+0x22/0x30 arch/x86/include/asm/idtentry.h:562
> > RIP: 0010:__wrmsr arch/x86/include/asm/msr.h:103 [inline]
> > RIP: 0010:native_write_msr arch/x86/include/asm/msr.h:154 [inline]
> > RIP: 0010:wrmsrl arch/x86/include/asm/msr.h:271 [inline]
> > RIP: 0010:__x86_pmu_enable_event
> > arch/x86/events/intel/../perf_event.h:1120 [inline]
> > RIP: 0010:intel_pmu_enable_event+0x2d9/0xff0 arch/x86/events/intel/core.c:2694
> > Code: ea 03 49 81 cc 00 00 40 00 4d 21 f4 80 3c 02 00 0f 85 5b 0c 00
> > 00 44 8b ab 70 01 00 00 4c 89 e2 44 89 e0 48 c1 ea 20 44 89 e9 <0f> 30
> > 0f 1f 44 00 00 e8 1b 32 75 00 48 83 c4 20 5b 5d 41 5c 41 5d
> > RSP: 0018:ffffc900115af348 EFLAGS: 00010002
> > RAX: 0000000000530000 RBX: ffff888019dd6a50 RCX: 0000000000000188
> > RDX: 0000000000000002 RSI: ffffffff81029464 RDI: ffff888019dd6bc0
> > RBP: 0000000000000000 R08: 0000000000000001 R09: ffff888063e22ab7
> > R10: 0000000000000000 R11: 0000000000000001 R12: 0000000200530000
> > R13: 0000000000000188 R14: ffffffffffffffff R15: ffff888019dd6bb0
> >  x86_pmu_start+0x1cc/0x270 arch/x86/events/core.c:1520
> >  x86_pmu_enable+0x481/0xdf0 arch/x86/events/core.c:1337
> >  perf_pmu_enable kernel/events/core.c:1243 [inline]
> >  perf_pmu_enable kernel/events/core.c:1239 [inline]
>
> Most likely your VM is wonky and perf tries to poke an MSR that either
> doesn't exist or isn't emulated properly, who knows.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-04-29 16:39 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-22 18:44 possible deadlock in __perf_event_task_sched_in Marius Fleischer
2024-04-24  9:43 ` Peter Zijlstra
2024-04-29 16:38   ` Marius Fleischer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).