* possible deadlock in __ata_sff_interrupt
@ 2022-12-13 15:09 Wei Chen
  2022-12-15  9:48 ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Wei Chen @ 2022-12-13 15:09 UTC (permalink / raw)
To: damien.lemoal, linux-ide, linux-kernel, syzkaller-bugs, syzbot

Dear Linux Developer,

Recently, when using our tool to fuzz the kernel, the following crash was triggered.

HEAD commit:    094226ad94f4 Linux v6.1-rc5
git tree:       upstream
compiler:       clang 12.0.1
console output: https://drive.google.com/file/d/1QZttkbuLed4wp6U32UR6TpxfY_HHCIqQ/view?usp=share_link
kernel config:  https://drive.google.com/file/d/1TdPsg_5Zon8S2hEFpLBWjb8Tnd2KA5WJ/view?usp=share_link

Unfortunately, I don't have a reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: Wei Chen <harperchen1110@gmail.com>

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.1.0-rc5 #40 Not tainted
-----------------------------------------------------
syz-executor.0/27911 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at: kill_fasync_rcu fs/fcntl.c:996 [inline]
ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at: kill_fasync+0x13b/0x430 fs/fcntl.c:1017

and this task is already holding:
ffff8880144dec18 (&host->lock){-.-.}-{2:2}, at: ata_scsi_queuecmd+0x7a/0x130 drivers/ata/libata-scsi.c:4048

which would create a new lock dependency:
 (&host->lock){-.-.}-{2:2} -> (&new->fa_lock){....}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&host->lock){-.-.}-{2:2}
...
which became HARDIRQ-irq-safe at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xb3/0x100 kernel/locking/spinlock.c:162 __ata_sff_interrupt+0x23/0x710 drivers/ata/libata-sff.c:1540 __handle_irq_event_percpu+0x1f7/0x620 kernel/irq/handle.c:158 handle_irq_event_percpu kernel/irq/handle.c:193 [inline] handle_irq_event+0x83/0x1e0 kernel/irq/handle.c:210 handle_edge_irq+0x245/0xbe0 kernel/irq/chip.c:819 generic_handle_irq_desc include/linux/irqdesc.h:158 [inline] handle_irq arch/x86/kernel/irq.c:231 [inline] __common_interrupt+0xce/0x1e0 arch/x86/kernel/irq.c:250 common_interrupt+0x9f/0xc0 arch/x86/kernel/irq.c:240 asm_common_interrupt+0x22/0x40 arch/x86/include/asm/idtentry.h:640 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:160 [inline] _raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202 process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289 worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 to a HARDIRQ-irq-unsafe lock: (tasklist_lock){.+.+}-{2:2} ... which became HARDIRQ-irq-unsafe at: ... 
lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock include/linux/rwlock_api_smp.h:150 [inline] _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228 do_wait+0x224/0x9d0 kernel/exit.c:1533 kernel_wait+0xe4/0x230 kernel/exit.c:1723 call_usermodehelper_exec_sync kernel/umh.c:140 [inline] call_usermodehelper_exec_work+0xb4/0x220 kernel/umh.c:167 process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289 worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 other info that might help us debug this: Chain exists of: &host->lock --> &new->fa_lock --> tasklist_lock Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(tasklist_lock); local_irq_disable(); lock(&host->lock); lock(&new->fa_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** 3 locks held by syz-executor.0/27911: #0: ffffffff8cd20b60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x9/0x30 include/linux/rcupdate.h:304 #1: ffff8880144dec18 (&host->lock){-.-.}-{2:2}, at: ata_scsi_queuecmd+0x7a/0x130 drivers/ata/libata-scsi.c:4048 #2: ffffffff8cd20b60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x5/0x30 include/linux/rcupdate.h:303 the dependencies between HARDIRQ-irq-safe lock and the holding lock: -> (&host->lock){-.-.}-{2:2} { IN-HARDIRQ-W at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xb3/0x100 kernel/locking/spinlock.c:162 __ata_sff_interrupt+0x23/0x710 drivers/ata/libata-sff.c:1540 __handle_irq_event_percpu+0x1f7/0x620 kernel/irq/handle.c:158 handle_irq_event_percpu kernel/irq/handle.c:193 [inline] handle_irq_event+0x83/0x1e0 kernel/irq/handle.c:210 handle_edge_irq+0x245/0xbe0 kernel/irq/chip.c:819 generic_handle_irq_desc include/linux/irqdesc.h:158 [inline] handle_irq arch/x86/kernel/irq.c:231 [inline] __common_interrupt+0xce/0x1e0 arch/x86/kernel/irq.c:250 
common_interrupt+0x9f/0xc0 arch/x86/kernel/irq.c:240 asm_common_interrupt+0x22/0x40 arch/x86/include/asm/idtentry.h:640 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:160 [inline] _raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202 process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289 worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 IN-SOFTIRQ-W at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xb3/0x100 kernel/locking/spinlock.c:162 __ata_sff_interrupt+0x23/0x710 drivers/ata/libata-sff.c:1540 __handle_irq_event_percpu+0x1f7/0x620 kernel/irq/handle.c:158 handle_irq_event_percpu kernel/irq/handle.c:193 [inline] handle_irq_event+0x83/0x1e0 kernel/irq/handle.c:210 handle_edge_irq+0x245/0xbe0 kernel/irq/chip.c:819 generic_handle_irq_desc include/linux/irqdesc.h:158 [inline] handle_irq arch/x86/kernel/irq.c:231 [inline] __common_interrupt+0xce/0x1e0 arch/x86/kernel/irq.c:250 common_interrupt+0x4a/0xc0 arch/x86/kernel/irq.c:240 asm_common_interrupt+0x22/0x40 arch/x86/include/asm/idtentry.h:640 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:160 [inline] _raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202 __run_timers+0x922/0x970 kernel/time/timer.c:1792 run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1803 __do_softirq+0x277/0x73a kernel/softirq.c:571 __irq_exit_rcu+0xcf/0x150 kernel/softirq.c:650 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662 sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1107 asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline] _raw_spin_unlock_irqrestore+0xbc/0x120 kernel/locking/spinlock.c:194 spin_unlock_irqrestore include/linux/spinlock.h:405 [inline] ata_scsi_queuecmd+0xc6/0x130 
drivers/ata/libata-scsi.c:4058 scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline] scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760 blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992 __blk_mq_do_dispatch_sched block/blk-mq-sched.c:173 [inline] blk_mq_do_dispatch_sched+0x820/0xe60 block/blk-mq-sched.c:187 __blk_mq_sched_dispatch_requests+0x39b/0x490 blk_mq_sched_dispatch_requests+0xef/0x160 block/blk-mq-sched.c:339 __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110 process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289 worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 INITIAL USE at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xb3/0x100 kernel/locking/spinlock.c:162 ata_dev_init drivers/ata/libata-core.c:5202 [inline] ata_link_init+0x236/0x890 drivers/ata/libata-core.c:5247 ata_port_alloc+0x3ce/0x470 drivers/ata/libata-core.c:5322 ata_host_alloc+0x198/0x2c0 drivers/ata/libata-core.c:5433 ata_host_alloc_pinfo+0x27/0x430 drivers/ata/libata-core.c:5476 ata_pci_sff_prepare_host+0x40/0xe0 drivers/ata/libata-sff.c:2305 ata_pci_bmdma_prepare_host+0x20/0x70 drivers/ata/libata-sff.c:3210 piix_init_one+0x628/0x1ed0 drivers/ata/ata_piix.c:1704 local_pci_probe drivers/pci/pci-driver.c:324 [inline] pci_call_probe drivers/pci/pci-driver.c:392 [inline] __pci_device_probe drivers/pci/pci-driver.c:417 [inline] pci_device_probe+0x4fe/0xa60 drivers/pci/pci-driver.c:460 call_driver_probe+0x96/0x250 really_probe+0x237/0xaf0 drivers/base/dd.c:639 __driver_probe_device+0x1f8/0x3e0 drivers/base/dd.c:778 driver_probe_device+0x50/0x240 drivers/base/dd.c:808 __driver_attach+0x2b6/0x5b0 drivers/base/dd.c:1190 bus_for_each_dev+0x168/0x1d0 drivers/base/bus.c:301 bus_add_driver+0x32f/0x600 drivers/base/bus.c:618 driver_register+0x2e9/0x3e0 drivers/base/driver.c:246 
piix_init+0x1b/0x41 drivers/ata/ata_piix.c:1774 do_one_initcall+0x1a7/0x400 init/main.c:1303 do_initcall_level+0x168/0x218 init/main.c:1376 do_initcalls+0x4b/0x8c init/main.c:1392 kernel_init_freeable+0x428/0x5d8 init/main.c:1631 kernel_init+0x19/0x2b0 init/main.c:1519 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 } ... key at: [<ffffffff9114fc40>] ata_host_alloc.__key+0x0/0x40 the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock: -> (tasklist_lock){.+.+}-{2:2} { HARDIRQ-ON-R at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock include/linux/rwlock_api_smp.h:150 [inline] _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228 do_wait+0x224/0x9d0 kernel/exit.c:1533 kernel_wait+0xe4/0x230 kernel/exit.c:1723 call_usermodehelper_exec_sync kernel/umh.c:140 [inline] call_usermodehelper_exec_work+0xb4/0x220 kernel/umh.c:167 process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289 worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 SOFTIRQ-ON-R at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock include/linux/rwlock_api_smp.h:150 [inline] _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228 do_wait+0x224/0x9d0 kernel/exit.c:1533 kernel_wait+0xe4/0x230 kernel/exit.c:1723 call_usermodehelper_exec_sync kernel/umh.c:140 [inline] call_usermodehelper_exec_work+0xb4/0x220 kernel/umh.c:167 process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289 worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 INITIAL USE at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_write_lock_irq include/linux/rwlock_api_smp.h:195 [inline] _raw_write_lock_irq+0xae/0xf0 kernel/locking/spinlock.c:326 copy_process+0x37cf/0x6200 kernel/fork.c:2387 kernel_clone+0x212/0x610 kernel/fork.c:2671 user_mode_thread+0x12d/0x190 kernel/fork.c:2747 
rest_init+0x21/0x270 init/main.c:694 start_kernel+0x0/0x540 init/main.c:890 start_kernel+0x49a/0x540 init/main.c:1145 secondary_startup_64_no_verify+0xcf/0xdb INITIAL READ USE at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock include/linux/rwlock_api_smp.h:150 [inline] _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228 do_wait+0x224/0x9d0 kernel/exit.c:1533 kernel_wait+0xe4/0x230 kernel/exit.c:1723 call_usermodehelper_exec_sync kernel/umh.c:140 [inline] call_usermodehelper_exec_work+0xb4/0x220 kernel/umh.c:167 process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289 worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 } ... key at: [<ffffffff8ca0a058>] tasklist_lock+0x18/0x40 ... acquired at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock include/linux/rwlock_api_smp.h:150 [inline] _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228 send_sigio+0xbe/0x300 fs/fcntl.c:792 kill_fasync_rcu fs/fcntl.c:1003 [inline] kill_fasync+0x1e4/0x430 fs/fcntl.c:1017 __receive_buf drivers/tty/n_tty.c:1629 [inline] n_tty_receive_buf_common+0xaac/0x1370 drivers/tty/n_tty.c:1711 tiocsti drivers/tty/tty_io.c:2286 [inline] tty_ioctl+0xda7/0x1710 drivers/tty/tty_io.c:2685 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> (&f->f_owner.lock){....}-{2:2} { INITIAL USE at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_write_lock_irq include/linux/rwlock_api_smp.h:195 [inline] _raw_write_lock_irq+0xae/0xf0 kernel/locking/spinlock.c:326 f_modown+0x38/0x340 fs/fcntl.c:91 __tty_fasync drivers/tty/tty_io.c:2237 [inline] tty_fasync+0x24a/0x340 drivers/tty/tty_io.c:2252 setfl fs/fcntl.c:73 [inline] do_fcntl+0xe6c/0x1350 fs/fcntl.c:340 
__do_sys_fcntl fs/fcntl.c:454 [inline] __se_sys_fcntl+0xd5/0x1b0 fs/fcntl.c:439 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd INITIAL READ USE at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock_irq include/linux/rwlock_api_smp.h:169 [inline] _raw_read_lock_irq+0xb6/0x100 kernel/locking/spinlock.c:244 f_getown_ex fs/fcntl.c:212 [inline] do_fcntl+0x1a5/0x1350 fs/fcntl.c:380 __do_sys_fcntl fs/fcntl.c:454 [inline] __se_sys_fcntl+0xd5/0x1b0 fs/fcntl.c:439 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd } ... key at: [<ffffffff90e590c0>] __alloc_file.__key+0x0/0x10 ... acquired at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline] _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236 send_sigio+0x2f/0x300 fs/fcntl.c:778 kill_fasync_rcu fs/fcntl.c:1003 [inline] kill_fasync+0x1e4/0x430 fs/fcntl.c:1017 sock_wake_async+0x133/0x150 rcu_read_unlock include/linux/rcupdate.h:767 [inline] sk_wake_async+0x12e/0x200 include/net/sock.h:2525 sock_def_error_report+0x154/0x200 net/core/sock.c:3264 smc_fback_forward_wakeup+0x1b6/0x500 net/smc/af_smc.c:786 smc_fback_error_report+0x90/0xb0 net/smc/af_smc.c:838 sk_error_report+0x3b/0xb0 net/core/sock.c:345 tcp_validate_incoming+0x1509/0x1fc0 net/ipv4/tcp_input.c:5805 tcp_rcv_state_process+0x513/0x2610 net/ipv4/tcp_input.c:6520 tcp_v4_do_rcv+0x691/0xa10 net/ipv4/tcp_ipv4.c:1704 sk_backlog_rcv include/net/sock.h:1109 [inline] __release_sock+0x106/0x3a0 net/core/sock.c:2906 release_sock+0x5d/0x1c0 net/core/sock.c:3462 sk_stream_wait_memory+0x6d9/0xe20 net/core/stream.c:145 tcp_sendmsg_locked+0x1888/0x4540 net/ipv4/tcp.c:1445 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1483 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] 
____sys_sendmsg+0x558/0x8a0 net/socket.c:2482 ___sys_sendmsg net/socket.c:2536 [inline] __sys_sendmmsg+0x360/0x6c0 net/socket.c:2622 __do_sys_sendmmsg net/socket.c:2651 [inline] __se_sys_sendmmsg net/socket.c:2648 [inline] __x64_sys_sendmmsg+0x9c/0xb0 net/socket.c:2648 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> (&new->fa_lock){....}-{2:2} { INITIAL USE at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_write_lock_irq include/linux/rwlock_api_smp.h:195 [inline] _raw_write_lock_irq+0xae/0xf0 kernel/locking/spinlock.c:326 fasync_remove_entry+0xff/0x1d0 fs/fcntl.c:873 sock_fasync+0x86/0xf0 net/socket.c:1390 __fput+0x751/0x8c0 fs/file_table.c:317 task_work_run+0x243/0x300 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x1f2/0x210 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x26/0x60 kernel/entry/common.c:296 do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd INITIAL READ USE at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline] _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236 kill_fasync_rcu fs/fcntl.c:996 [inline] kill_fasync+0x13b/0x430 fs/fcntl.c:1017 sock_wake_async+0x133/0x150 rcu_read_unlock include/linux/rcupdate.h:767 [inline] sk_wake_async+0x12e/0x200 include/net/sock.h:2525 sock_def_error_report+0x154/0x200 net/core/sock.c:3264 smc_fback_forward_wakeup+0x1b6/0x500 net/smc/af_smc.c:786 smc_fback_error_report+0x90/0xb0 net/smc/af_smc.c:838 sk_error_report+0x3b/0xb0 net/core/sock.c:345 tcp_validate_incoming+0x1509/0x1fc0 net/ipv4/tcp_input.c:5805 tcp_rcv_state_process+0x513/0x2610 net/ipv4/tcp_input.c:6520 
tcp_v4_do_rcv+0x691/0xa10 net/ipv4/tcp_ipv4.c:1704 sk_backlog_rcv include/net/sock.h:1109 [inline] __release_sock+0x106/0x3a0 net/core/sock.c:2906 release_sock+0x5d/0x1c0 net/core/sock.c:3462 sk_stream_wait_memory+0x6d9/0xe20 net/core/stream.c:145 tcp_sendmsg_locked+0x1888/0x4540 net/ipv4/tcp.c:1445 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1483 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x558/0x8a0 net/socket.c:2482 ___sys_sendmsg net/socket.c:2536 [inline] __sys_sendmmsg+0x360/0x6c0 net/socket.c:2622 __do_sys_sendmmsg net/socket.c:2651 [inline] __se_sys_sendmmsg net/socket.c:2648 [inline] __x64_sys_sendmmsg+0x9c/0xb0 net/socket.c:2648 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd } ... key at: [<ffffffff90e59f80>] fasync_insert_entry.__key+0x0/0x40 ... acquired at: lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline] _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236 kill_fasync_rcu fs/fcntl.c:996 [inline] kill_fasync+0x13b/0x430 fs/fcntl.c:1017 sg_rq_end_io+0x604/0xf50 drivers/scsi/sg.c:1403 __blk_mq_end_request+0x2c7/0x380 block/blk-mq.c:1011 scsi_end_request+0x4ed/0x9c0 drivers/scsi/scsi_lib.c:576 scsi_io_completion+0xc25/0x27a0 drivers/scsi/scsi_lib.c:985 ata_scsi_simulate+0x336e/0x3dd0 drivers/ata/libata-scsi.c:4190 __ata_scsi_queuecmd+0x20b/0x1020 drivers/ata/libata-scsi.c:4009 ata_scsi_queuecmd+0xa0/0x130 drivers/ata/libata-scsi.c:4052 scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline] scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760 blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992 __blk_mq_sched_dispatch_requests+0x382/0x490 block/blk-mq-sched.c:306 blk_mq_sched_dispatch_requests+0xef/0x160 block/blk-mq-sched.c:339 __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110 
blk_mq_sched_insert_request+0x1e2/0x430 block/blk-mq-sched.c:458 blk_execute_rq_nowait+0x2e8/0x3b0 block/blk-mq.c:1305 sg_common_write+0x8c0/0x1970 drivers/scsi/sg.c:832 sg_new_write+0x61f/0x860 drivers/scsi/sg.c:770 sg_ioctl_common drivers/scsi/sg.c:935 [inline] sg_ioctl+0x1c51/0x2be0 drivers/scsi/sg.c:1159 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd stack backtrace: CPU: 0 PID: 27911 Comm: syz-executor.0 Not tainted 6.1.0-rc5 #40 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_bad_irq_dependency kernel/locking/lockdep.c:2611 [inline] check_irq_usage kernel/locking/lockdep.c:2850 [inline] check_prev_add kernel/locking/lockdep.c:3101 [inline] check_prevs_add+0x4e5f/0x5b70 kernel/locking/lockdep.c:3216 validate_chain kernel/locking/lockdep.c:3831 [inline] __lock_acquire+0x4411/0x6070 kernel/locking/lockdep.c:5055 lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668 __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline] _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236 kill_fasync_rcu fs/fcntl.c:996 [inline] kill_fasync+0x13b/0x430 fs/fcntl.c:1017 sg_rq_end_io+0x604/0xf50 drivers/scsi/sg.c:1403 __blk_mq_end_request+0x2c7/0x380 block/blk-mq.c:1011 scsi_end_request+0x4ed/0x9c0 drivers/scsi/scsi_lib.c:576 scsi_io_completion+0xc25/0x27a0 drivers/scsi/scsi_lib.c:985 ata_scsi_simulate+0x336e/0x3dd0 drivers/ata/libata-scsi.c:4190 __ata_scsi_queuecmd+0x20b/0x1020 drivers/ata/libata-scsi.c:4009 ata_scsi_queuecmd+0xa0/0x130 drivers/ata/libata-scsi.c:4052 scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline] scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760 
blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992 __blk_mq_sched_dispatch_requests+0x382/0x490 block/blk-mq-sched.c:306 blk_mq_sched_dispatch_requests+0xef/0x160 block/blk-mq-sched.c:339 __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110 blk_mq_sched_insert_request+0x1e2/0x430 block/blk-mq-sched.c:458 blk_execute_rq_nowait+0x2e8/0x3b0 block/blk-mq.c:1305 sg_common_write+0x8c0/0x1970 drivers/scsi/sg.c:832 sg_new_write+0x61f/0x860 drivers/scsi/sg.c:770 sg_ioctl_common drivers/scsi/sg.c:935 [inline] sg_ioctl+0x1c51/0x2be0 drivers/scsi/sg.c:1159 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f153dc8bded Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f153ede2c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f153ddabf80 RCX: 00007f153dc8bded RDX: 0000000020000440 RSI: 0000000000002285 RDI: 0000000000000006 RBP: 00007f153dcf8ce0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f153ddabf80 R13: 00007ffc72e5108f R14: 00007ffc72e51230 R15: 00007f153ede2dc0 </TASK> Best, Wei ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: possible deadlock in __ata_sff_interrupt
  2022-12-13 15:09 possible deadlock in __ata_sff_interrupt Wei Chen
@ 2022-12-15  9:48 ` Damien Le Moal
  2022-12-15 15:19   ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2022-12-15 9:48 UTC (permalink / raw)
To: Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot,
    Alexander Viro, linux-fsdevel, Chuck Lever, Jeff Layton

On 12/14/22 00:09, Wei Chen wrote:
> Dear Linux Developer,
>
> Recently, when using our tool to fuzz the kernel, the following crash was triggered.
>
> HEAD commit: 094226ad94f4 Linux v6.1-rc5
> git tree: upstream
> compiler: clang 12.0.1
> console output:
> https://drive.google.com/file/d/1QZttkbuLed4wp6U32UR6TpxfY_HHCIqQ/view?usp=share_link
> kernel config: https://drive.google.com/file/d/1TdPsg_5Zon8S2hEFpLBWjb8Tnd2KA5WJ/view?usp=share_link
>
> Unfortunately, I don't have a reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: Wei Chen <harperchen1110@gmail.com>
>
> =====================================================
> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> 6.1.0-rc5 #40 Not tainted
> -----------------------------------------------------
> syz-executor.0/27911 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at: kill_fasync_rcu
> fs/fcntl.c:996 [inline]
> ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at:
> kill_fasync+0x13b/0x430 fs/fcntl.c:1017
[...]
> stack backtrace:
> CPU: 0 PID: 27911 Comm: syz-executor.0 Not tainted 6.1.0-rc5 #40
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
> print_bad_irq_dependency kernel/locking/lockdep.c:2611 [inline]
> check_irq_usage kernel/locking/lockdep.c:2850 [inline]
> check_prev_add kernel/locking/lockdep.c:3101 [inline]
> check_prevs_add+0x4e5f/0x5b70 kernel/locking/lockdep.c:3216
> validate_chain kernel/locking/lockdep.c:3831 [inline]
> __lock_acquire+0x4411/0x6070 kernel/locking/lockdep.c:5055
> lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
> __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline]
> _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236
> kill_fasync_rcu fs/fcntl.c:996 [inline]
> kill_fasync+0x13b/0x430 fs/fcntl.c:1017
> sg_rq_end_io+0x604/0xf50 drivers/scsi/sg.c:1403

The problem is here: sg_rq_end_io() calling kill_fasync(). But at a quick
glance, this is not the only driver calling kill_fasync() with a spinlock
held with irq disabled... So there may be a fundamental problem with
kill_fasync() function if drivers are allowed to do that, or the reverse,
all drivers calling that function with a lock held with irq disabled need
to be fixed.

Al, Chuck, Jeff,

Any thought ?
> __blk_mq_end_request+0x2c7/0x380 block/blk-mq.c:1011
> scsi_end_request+0x4ed/0x9c0 drivers/scsi/scsi_lib.c:576
> scsi_io_completion+0xc25/0x27a0 drivers/scsi/scsi_lib.c:985
> ata_scsi_simulate+0x336e/0x3dd0 drivers/ata/libata-scsi.c:4190
> __ata_scsi_queuecmd+0x20b/0x1020 drivers/ata/libata-scsi.c:4009
> ata_scsi_queuecmd+0xa0/0x130 drivers/ata/libata-scsi.c:4052
> scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline]
> scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760
> blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992
> __blk_mq_sched_dispatch_requests+0x382/0x490 block/blk-mq-sched.c:306
> blk_mq_sched_dispatch_requests+0xef/0x160 block/blk-mq-sched.c:339
> __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110
> blk_mq_sched_insert_request+0x1e2/0x430 block/blk-mq-sched.c:458
> blk_execute_rq_nowait+0x2e8/0x3b0 block/blk-mq.c:1305
> sg_common_write+0x8c0/0x1970 drivers/scsi/sg.c:832
> sg_new_write+0x61f/0x860 drivers/scsi/sg.c:770
> sg_ioctl_common drivers/scsi/sg.c:935 [inline]
> sg_ioctl+0x1c51/0x2be0 drivers/scsi/sg.c:1159
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:870 [inline]
> __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f153dc8bded
> Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f153ede2c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007f153ddabf80 RCX: 00007f153dc8bded
> RDX: 0000000020000440 RSI: 0000000000002285 RDI: 0000000000000006
> RBP: 00007f153dcf8ce0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f153ddabf80
> R13: 00007ffc72e5108f R14: 00007ffc72e51230 R15: 00007f153ede2dc0
> </TASK>
>
> Best,
> Wei

-- 
Damien Le Moal
Western Digital Research
* Re: possible deadlock in __ata_sff_interrupt
  2022-12-15  9:48 ` Damien Le Moal
@ 2022-12-15 15:19   ` Al Viro
  2022-12-16  1:44     ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2022-12-15 15:19 UTC (permalink / raw)
To: Damien Le Moal
Cc: Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot,
    linux-fsdevel, Chuck Lever, Jeff Layton

On Thu, Dec 15, 2022 at 06:48:20PM +0900, Damien Le Moal wrote:
> The problem is here: sg_rq_end_io() calling kill_fasync(). But at a quick
> glance, this is not the only driver calling kill_fasync() with a spinlock
> held with irq disabled... So there may be a fundamental problem with
> kill_fasync() function if drivers are allowed to do that, or the reverse,
> all drivers calling that function with a lock held with irq disabled need
> to be fixed.
>
> Al, Chuck, Jeff,
>
> Any thought ?

What is the problem with read_lock_irqsave() called with irqs disabled?
read_lock_irq() would have been a bug in such conditions, of course, but
that's not what we use...
* Re: possible deadlock in __ata_sff_interrupt
  2022-12-15 15:19   ` Al Viro
@ 2022-12-16  1:44     ` Damien Le Moal
  2022-12-16  3:41       ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2022-12-16 1:44 UTC (permalink / raw)
To: Al Viro
Cc: Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot,
    linux-fsdevel, Chuck Lever, Jeff Layton

On 12/16/22 00:19, Al Viro wrote:
> On Thu, Dec 15, 2022 at 06:48:20PM +0900, Damien Le Moal wrote:
>
>> The problem is here: sg_rq_end_io() calling kill_fasync(). But at a quick
>> glance, this is not the only driver calling kill_fasync() with a spinlock
>> held with irq disabled... So there may be a fundamental problem with
>> kill_fasync() function if drivers are allowed to do that, or the reverse,
>> all drivers calling that function with a lock held with irq disabled need
>> to be fixed.
>>
>> Al, Chuck, Jeff,
>>
>> Any thought ?
>
> What is the problem with read_lock_irqsave() called with irqs disabled?
> read_lock_irq() would have been a bug in such conditions, of course, but
> that's not what we use...

The original & complete lockdep splat is in the report email here:

https://marc.info/?l=linux-ide&m=167094379710177&w=2

It looks like a spinlock is taken for the fasync stuff without irq
disabled, and that same spinlock is needed in kill_fasync(), which is
itself called (potentially) with IRQ disabled. Hence the splat. In any
case, that is how I understand the issue. But as mentioned above, given
that I can see many drivers calling kill_fasync() with irq disabled, I
wonder if this is a genuine potential problem or a false positive.

-- 
Damien Le Moal
Western Digital Research
* Re: possible deadlock in __ata_sff_interrupt
  2022-12-16  1:44     ` Damien Le Moal
@ 2022-12-16  3:41       ` Al Viro
  2022-12-16 11:26         ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2022-12-16 3:41 UTC (permalink / raw)
To: Damien Le Moal
Cc: Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot,
    linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra,
    Linus Torvalds

On Fri, Dec 16, 2022 at 10:44:06AM +0900, Damien Le Moal wrote:
> The original & complete lockdep splat is in the report email here:
>
> https://marc.info/?l=linux-ide&m=167094379710177&w=2
>
> It looks like a spinlock is taken for the fasync stuff without irq
> disabled, and that same spinlock is needed in kill_fasync(), which is
> itself called (potentially) with IRQ disabled. Hence the splat. In any
> case, that is how I understand the issue. But as mentioned above, given
> that I can see many drivers calling kill_fasync() with irq disabled, I
> wonder if this is a genuine potential problem or a false positive.

OK, I'm about to fall asleep, so I might very well be missing something
obvious, but...

CPU1: ptrace(2)
	ptrace_check_attach()
		read_lock(&tasklist_lock);

CPU2: setpgid(2)
	write_lock_irq(&tasklist_lock);
	spins

CPU1: takes an interrupt that would call kill_fasync(). grep and the
first instance of kill_fasync() is in hpet_interrupt() - it's not
something exotic. IRQs disabled on CPU2 won't stop it.
	kill_fasync(..., SIGIO, ...)
		kill_fasync_rcu()
			read_lock_irqsave(&fa->fa_lock, flags);
			send_sigio()
				read_lock_irqsave(&fown->lock, flags);
				read_lock(&tasklist_lock);

... and CPU1 spins as well.

It's not a matter of kill_fasync() called with IRQs disabled; the
problem is kill_fasync() called from an interrupt taken while holding
tasklist_lock at least shared. Somebody trying to grab it on another
CPU exclusive before we get to send_sigio() from kill_fasync() will
end up spinning and will make us spin as well.
I really hope that's just me not seeing something obvious - we had
kill_fasync() called in IRQ handlers since way back and we had
tasklist_lock taken shared without disabling IRQs for just as long.

<goes to sleep, hoping to find "Al, you are a moron, it's obviously OK
for such and such reasons" in the mailbox tomorrow morning>
* Re: possible deadlock in __ata_sff_interrupt
  From: Linus Torvalds @ 2022-12-16 11:26 UTC
  To: Al Viro
  Cc: Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra

On Thu, Dec 15, 2022 at 7:41 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> CPU1: ptrace(2)
>         ptrace_check_attach()
>                 read_lock(&tasklist_lock);
>
> CPU2: setpgid(2)
>         write_lock_irq(&tasklist_lock);
>         spins
>
> CPU1: takes an interrupt that would call kill_fasync(). grep and the
> first instance of kill_fasync() is in hpet_interrupt() - it's not
> something exotic. IRQs disabled on CPU2 won't stop it.
>         kill_fasync(..., SIGIO, ...)
>                 kill_fasync_rcu()
>                         read_lock_irqsave(&fa->fa_lock, flags);
>                         send_sigio()
>                                 read_lock_irqsave(&fown->lock, flags);
>                                 read_lock(&tasklist_lock);
>
> ... and CPU1 spins as well.

Nope. See kernel/locking/qrwlock.c:

        /*
         * Readers come here when they cannot get the lock without waiting
         */
        if (unlikely(in_interrupt())) {
                /*
                 * Readers in interrupt context will get the lock immediately
                 * if the writer is just waiting (not holding the lock yet),
                 * so spin with ACQUIRE semantics until the lock is available
                 * without waiting in the queue.
                 */
                atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
                return;
        }

and that's the new "civilized" reader unfairness. The traditional
rwlock was unconditionally unfair to writers, to the point that there
were starvation issues, because new readers would always get the lock.

              Linus
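[Editorial note: the in_interrupt() carve-out quoted above is the crux of the whole thread, so a toy model may help. This is a hedged sketch, in Python, of the acquisition rule only; the state names and the helper function are invented for illustration, and the kernel's qrwlock is of course C and considerably more subtle.]

```python
# Toy model of the queued-rwlock reader slowpath quoted above.
# Lock state: `readers` = number of active readers; `writer` is one of
# None (no writer), "waiting" (queued, not yet holding), or "locked".

def reader_acquires_immediately(readers, writer, in_interrupt):
    """Return True if a new reader gets the lock without queueing."""
    if writer is None:
        return True            # no writer anywhere: reader proceeds
    if writer == "locked":
        return False           # a writer holds the lock: everyone waits
    # writer == "waiting": a pending writer, lock itself only read-held
    if in_interrupt:
        # interrupt-context readers jump the queue ("reader unfairness")
        return True
    return False               # process-context readers queue fairly

# A pending writer blocks a process-context reader even though the lock
# is only read-held - the fairness that bites later in this thread...
assert not reader_acquires_immediately(1, "waiting", in_interrupt=False)
# ...while an interrupt-context reader still gets straight in.
assert reader_acquires_immediately(1, "waiting", in_interrupt=True)
```

The asymmetry between the last two assertions is exactly why the three-CPU scenario discussed below needs one reader in process context and one in interrupt context.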
* Re: possible deadlock in __ata_sff_interrupt
  From: Al Viro @ 2022-12-16 23:39 UTC
  To: Linus Torvalds
  Cc: Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra, Boqun Feng

[Boqun Feng Cc'd]

On Fri, Dec 16, 2022 at 03:26:21AM -0800, Linus Torvalds wrote:
> On Thu, Dec 15, 2022 at 7:41 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > CPU1: ptrace(2)
> >         ptrace_check_attach()
> >                 read_lock(&tasklist_lock);
> >
> > CPU2: setpgid(2)
> >         write_lock_irq(&tasklist_lock);
> >         spins
> >
> > CPU1: takes an interrupt that would call kill_fasync(). grep and the
> > first instance of kill_fasync() is in hpet_interrupt() - it's not
> > something exotic. IRQs disabled on CPU2 won't stop it.
> >         kill_fasync(..., SIGIO, ...)
> >                 kill_fasync_rcu()
> >                         read_lock_irqsave(&fa->fa_lock, flags);
> >                         send_sigio()
> >                                 read_lock_irqsave(&fown->lock, flags);
> >                                 read_lock(&tasklist_lock);
> >
> > ... and CPU1 spins as well.
>
> Nope. See kernel/locking/qrwlock.c:

[snip: rwlocks are inherently unfair, queued ones are somewhat milder,
but all implementations have writer-starving behaviour for read_lock()
at least when in_interrupt()]

D'oh... Consider the requested "Al, you are a moron" duly delivered...
I plead having been on way too low caffeine and too little sleep ;-/

Looking at the original report, it looks like the scenario there is
meant to be the following:

CPU1: read_lock(&tasklist_lock)
        tasklist_lock grabbed

CPU2: gets an sg write(2) feeding a request to libata; host->lock is
taken, the request is immediately completed and scsi_done() is about to
be called.
        host->lock grabbed

CPU3: write_lock_irq(&tasklist_lock)
        spins on tasklist_lock until CPU1 gets through.

CPU2: gets around to the kill_fasync() called by sg_rq_end_io(), and to
grabbing tasklist_lock inside send_sigio()
        spins, since it's not in an interrupt and there's a pending
        writer; host->lock is held, spins until CPU3 gets through.

CPU1: takes an interrupt, which on libata will try to grab host->lock
        tasklist_lock is held, spins on host->lock until CPU2 gets
        through.

Am I reading it correctly?
* Re: possible deadlock in __ata_sff_interrupt
  From: Boqun Feng @ 2022-12-16 23:54 UTC
  To: Al Viro
  Cc: Linus Torvalds, Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra

On Fri, Dec 16, 2022 at 11:39:21PM +0000, Al Viro wrote:
> [Boqun Feng Cc'd]
>
> [snip: the ptrace/setpgid scenario and Linus's qrwlock.c reply, quoted
> in full upthread]
>
> D'oh... Consider the requested "Al, you are a moron" duly delivered...
> I plead having been on way too low caffeine and too little sleep ;-/
>
> Looking at the original report, it looks like the scenario there is
> meant to be the following:
>
> CPU1: read_lock(&tasklist_lock)
>         tasklist_lock grabbed
>
> CPU2: gets an sg write(2) feeding a request to libata; host->lock is
> taken, the request is immediately completed and scsi_done() is about
> to be called.
>         host->lock grabbed
>
> CPU3: write_lock_irq(&tasklist_lock)
>         spins on tasklist_lock until CPU1 gets through.
>
> CPU2: gets around to the kill_fasync() called by sg_rq_end_io(), and
> to grabbing tasklist_lock inside send_sigio()
>         spins, since it's not in an interrupt and there's a pending
>         writer; host->lock is held, spins until CPU3 gets through.

Right. For a reader not in_interrupt(), it may be blocked by a random
waiting writer because of the fairness, even if the lock is currently
held by another reader:

        CPU 1                       CPU 2                            CPU 3
        read_lock(&tasklist_lock);  // gets the lock
                                    write_lock_irq(&tasklist_lock);  // waits for the lock
                                                                     read_lock(&tasklist_lock);
                                                                     // cannot get the lock,
                                                                     // because of the fairness

Regards,
Boqun

> CPU1: takes an interrupt, which on libata will try to grab host->lock
>         tasklist_lock is held, spins on host->lock until CPU2 gets
>         through.
>
> Am I reading it correctly?
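[Editorial note: the three-CPU reading above amounts to a cycle in the wait-for graph, which can be checked mechanically. A small illustrative sketch follows; the graph and the helper function are hypothetical, written only to encode the scenario just described.]

```python
# Wait-for graph for the scenario above:
#   CPU1 holds tasklist_lock (read); its interrupt spins on host->lock,
#   so it waits for CPU2.
#   CPU2 holds host->lock and spins in read_lock(&tasklist_lock) behind
#   a pending writer, so it waits for CPU3.
#   CPU3 spins in write_lock_irq(&tasklist_lock), so it waits for CPU1.
waits_for = {
    "CPU1": "CPU2",
    "CPU2": "CPU3",
    "CPU3": "CPU1",
}

def find_cycle(graph, start):
    """Follow wait-for edges from `start`; return the cycle we run into,
    or None if the chain terminates (i.e. someone can make progress)."""
    seen, node = [], start
    while node is not None:
        if node in seen:
            return seen[seen.index(node):]   # the deadlock cycle
        seen.append(node)
        node = graph.get(node)
    return None

# All three CPUs are on a cycle: nobody can make progress.
assert find_cycle(waits_for, "CPU1") == ["CPU1", "CPU2", "CPU3"]
```

Remove any one edge (for instance, make CPU2's reader unfair so it does not wait for CPU3) and `find_cycle` returns None, which is the intuition behind the mitigation proposals later in the thread.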
* Re: possible deadlock in __ata_sff_interrupt
  From: Al Viro @ 2022-12-17  1:59 UTC
  To: Boqun Feng
  Cc: Linus Torvalds, Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra

On Fri, Dec 16, 2022 at 03:54:09PM -0800, Boqun Feng wrote:
> On Fri, Dec 16, 2022 at 11:39:21PM +0000, Al Viro wrote:
> > [snip: the ptrace/setpgid scenario, Linus's qrwlock reply, and the
> > three-CPU libata scenario, all quoted in full upthread]
>
> Right. For a reader not in_interrupt(), it may be blocked by a random
> waiting writer because of the fairness, even if the lock is currently
> held by another reader:
>
>         CPU 1                       CPU 2                            CPU 3
>         read_lock(&tasklist_lock);  // gets the lock
>                                     write_lock_irq(&tasklist_lock);  // waits for the lock
>                                                                      read_lock(&tasklist_lock);
>                                                                      // cannot get the lock,
>                                                                      // because of the fairness

IOW, any caller of scsi_done() from non-interrupt context while holding
a spinlock that is also taken in an interrupt...

And we have drivers/scsi/scsi_error.c:scsi_send_eh_cmnd(), which calls
->queuecommand() under a mutex, with

        #define DEF_SCSI_QCMD(func_name) \
        int func_name(struct Scsi_Host *shost, struct scsi_cmnd *cmd) \
        { \
                unsigned long irq_flags; \
                int rc; \
                spin_lock_irqsave(shost->host_lock, irq_flags); \
                rc = func_name##_lck(cmd); \
                spin_unlock_irqrestore(shost->host_lock, irq_flags); \
                return rc; \
        }

being commonly used for ->queuecommand() instances. So any scsi_done()
call in a foo_lck() (there are quite a few of those), plus use of
->host_lock in an interrupt for the same driver (also common)...

I wonder why that hadn't triggered the same warning a long time ago -
these warnings have been around for at least two years.

Am I missing something here?
* Re: possible deadlock in __ata_sff_interrupt
  From: Boqun Feng @ 2022-12-17  3:25 UTC
  To: Al Viro
  Cc: Linus Torvalds, Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra

On Sat, Dec 17, 2022 at 01:59:32AM +0000, Al Viro wrote:
> [snip: the three-CPU scenario and fairness diagram, quoted in full
> upthread]
>
> IOW, any caller of scsi_done() from non-interrupt context while
> holding a spinlock that is also taken in an interrupt...
>
> And we have drivers/scsi/scsi_error.c:scsi_send_eh_cmnd(), which calls
> ->queuecommand() under a mutex, with DEF_SCSI_QCMD (quoted above)
> being commonly used for ->queuecommand() instances. So any scsi_done()
> call in a foo_lck() (there are quite a few of those), plus use of
> ->host_lock in an interrupt for the same driver (also common)...
>
> I wonder why that hadn't triggered the same warning a long time ago -
> these warnings have been around for at least two years.

FWIW, the complete dependency chain is:

        &host->lock --> &new->fa_lock --> &f->f_owner.lock --> tasklist_lock

For the "&f->f_owner.lock" part to get onto lockdep's radar, the
following call trace needs to appear at least once:

        kill_fasync():
          kill_fasync_rcu():
            send_sigio()

Not sure whether that's rare or not, though. And ->fa_lock also had its
own issue:

        https://lore.kernel.org/lkml/20210702091831.615042-1-desmondcheongzx@gmail.com/

which may have covered &host->lock for a while ;-)

Regards,
Boqun

> Am I missing something here?
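[Editorial note: the dependency chain above is exactly what the "HARDIRQ-safe -> HARDIRQ-unsafe lock order detected" splat at the top of the thread complains about. The sketch below is a toy reimplementation of that one lockdep rule for illustration, not lockdep's actual code; the lock names are taken from the chain Boqun lists.]

```python
# Dependency edges as lockdep would record them (A -> B means B was
# taken while holding A), per the chain in the message above.
deps = {
    "host->lock": ["fa_lock"],
    "fa_lock": ["f_owner.lock"],
    "f_owner.lock": ["tasklist_lock"],
    "tasklist_lock": [],
}
hardirq_safe = {"host->lock"}       # observed taken inside a hardirq handler
hardirq_unsafe = {"tasklist_lock"}  # observed held with interrupts enabled

def reachable(deps, src):
    """All locks transitively taken while holding `src`."""
    out, stack = set(), [src]
    while stack:
        for nxt in deps.get(stack.pop(), []):
            if nxt not in out:
                out.add(nxt)
                stack.append(nxt)
    return out

def irq_inversions(deps):
    """Lockdep's rule: a HARDIRQ-safe lock must never come to depend,
    even transitively, on a HARDIRQ-unsafe one."""
    return {(safe, unsafe) for safe in hardirq_safe
            for unsafe in hardirq_unsafe & reachable(deps, safe)}

# The chain closes the forbidden safe -> unsafe dependency, hence the splat.
assert irq_inversions(deps) == {("host->lock", "tasklist_lock")}
```

Note that the chain only becomes visible once the kill_fasync() -> kill_fasync_rcu() -> send_sigio() trace has run at least once, which may be part of the answer to Al's "why not earlier" question.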
* Re: possible deadlock in __ata_sff_interrupt
  From: Linus Torvalds @ 2022-12-17  2:31 UTC
  To: Boqun Feng, Waiman Long
  Cc: Al Viro, Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra

Ok, let's bring in Waiman for the rwlock side.

On Fri, Dec 16, 2022 at 5:54 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Right. For a reader not in_interrupt(), it may be blocked by a random
> waiting writer because of the fairness, even if the lock is currently
> held by another reader:
>
>         CPU 1                       CPU 2                            CPU 3
>         read_lock(&tasklist_lock);  // gets the lock
>                                     write_lock_irq(&tasklist_lock);  // waits for the lock
>                                                                      read_lock(&tasklist_lock);
>                                                                      // cannot get the lock,
>                                                                      // because of the fairness

But this on its own should be ok - because CPU1 can make progress and
eventually release the lock.

So the tasklist_lock use is fine on its own - the reason interrupts are
special is that an interrupt on CPU 1 taking the lock for reading would
deadlock otherwise. As long as the interrupt happens on another CPU, the
original CPU should still be able to make progress.

But the problem here seems to be that *another* lock is also involved
(in this case apparently "host->lock"), and now if CPU1 and CPU2 take
these two locks in a different order, you can get an ABBA deadlock.

And apparently our lockdep machinery doesn't catch that issue, so it
doesn't get flagged.

I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
treats rwlocks as being _always_ unfair, not knowing about that "it's
only unfair when it's in interrupt context".

Maybe we need to always make rwlock unfair? Possibly only for
tasklist_lock?

Oh, how I hate tasklist_lock. It's pretty much our one remaining "one
big lock". It's been a pain for a long, long time.

              Linus
* Re: possible deadlock in __ata_sff_interrupt
  From: Boqun Feng @ 2022-12-17  2:59 UTC
  To: Linus Torvalds
  Cc: Waiman Long, Al Viro, Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra

On Fri, Dec 16, 2022 at 08:31:54PM -0600, Linus Torvalds wrote:
> Ok, let's bring in Waiman for the rwlock side.
>
> [snip: the three-CPU fairness diagram, quoted in full upthread]
>
> But this on its own should be ok - because CPU1 can make progress and
> eventually release the lock.

Yes.

> So the tasklist_lock use is fine on its own - the reason interrupts
> are special is that an interrupt on CPU 1 taking the lock for reading
> would deadlock otherwise. As long as the interrupt happens on another
> CPU, the original CPU should still be able to make progress.
>
> But the problem here seems to be that *another* lock is also involved
> (in this case apparently "host->lock"), and now if CPU1 and CPU2 take
> these two locks in a different order, you can get an ABBA deadlock.

Right.

> And apparently our lockdep machinery doesn't catch that issue, so it
> doesn't get flagged.

I'm confused. Isn't the original problem showing that lockdep catches
this?

> I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
> treats rwlocks as being _always_ unfair, not knowing about that "it's
> only unfair when it's in interrupt context".

The rules nowadays are:

* If the reader is in_interrupt(), or the queued-rwlock implementation
  is not used, it's an unfair reader, i.e. it won't wait for any
  existing writer.

* Otherwise, it's a fair reader.

> Maybe we need to always make rwlock unfair? Possibly only for
> tasklist_lock?

That's possible, but first I need to make sure I understand the issue
for lockdep: is it that lockdep misses catching something, or that it
reports a false positive?

Regards,
Boqun

> Oh, how I hate tasklist_lock. It's pretty much our one remaining "one
> big lock". It's been a pain for a long, long time.
>
>               Linus
* Re: possible deadlock in __ata_sff_interrupt
  From: Al Viro @ 2022-12-17  3:05 UTC
  To: Linus Torvalds
  Cc: Boqun Feng, Waiman Long, Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra

On Fri, Dec 16, 2022 at 08:31:54PM -0600, Linus Torvalds wrote:
> Ok, let's bring in Waiman for the rwlock side.
>
> [snip: the three-CPU fairness diagram, quoted in full upthread]
>
> But this on its own should be ok - because CPU1 can make progress and
> eventually release the lock.
>
> So the tasklist_lock use is fine on its own - the reason interrupts
> are special is that an interrupt on CPU 1 taking the lock for reading
> would deadlock otherwise. As long as the interrupt happens on another
> CPU, the original CPU should still be able to make progress.
>
> But the problem here seems to be that *another* lock is also involved
> (in this case apparently "host->lock"), and now if CPU1 and CPU2 take
> these two locks in a different order, you can get an ABBA deadlock.
>
> And apparently our lockdep machinery doesn't catch that issue, so it
> doesn't get flagged.

Lockdep has actually caught that; the locks involved are mentioned in
the report (https://marc.info/?l=linux-ide&m=167094379710177&w=2). The
form of the report might have been better, but if anything, it doesn't
mention the potential involvement of a tasklist_lock writer, which is
what turns this into a deadlock.

OTOH, that's more or less implicit for the entire class:

        CPU1                    CPU2                    CPU3
        read_lock(A)            [non-interrupt]
                                local_irq_disable()     local_irq_disable()
                                spin_lock(B)            write_lock(A)
                                read_lock(A)
        [in interrupt]
        spin_lock(B)

is what that sort of report is about. In this case A is tasklist_lock,
B is host->lock. Possible call chains for CPU1 and CPU2 are reported...

I wonder why analogues of that hadn't been reported for other SCSI
hosts - it's a really common pattern there...

> I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
> treats rwlocks as being _always_ unfair, not knowing about that "it's
> only unfair when it's in interrupt context".
>
> Maybe we need to always make rwlock unfair? Possibly only for
> tasklist_lock?

ISTR threads about the possibility of an explicit read_lock_unfair()...
* Re: possible deadlock in __ata_sff_interrupt
  From: Waiman Long @ 2022-12-17  4:41 UTC
  To: Al Viro, Linus Torvalds
  Cc: Boqun Feng, Damien Le Moal, Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra

On 12/16/22 22:05, Al Viro wrote:
> On Fri, Dec 16, 2022 at 08:31:54PM -0600, Linus Torvalds wrote:
>> Ok, let's bring in Waiman for the rwlock side.
>>
>> [snip: the three-CPU fairness diagram and the discussion of the ABBA
>> pattern, quoted in full upthread]
>>
>> I'm not sure what the lockdep rules for rwlocks are, but maybe
>> lockdep treats rwlocks as being _always_ unfair, not knowing about
>> that "it's only unfair when it's in interrupt context".
>>
>> Maybe we need to always make rwlock unfair? Possibly only for
>> tasklist_lock?

That may not be a good idea, as the cacheline bouncing problem would
come back, with reduced performance.

> ISTR threads about the possibility of an explicit read_lock_unfair()...

Another possible alternative is to treat a read_lock as unfair if
interrupts have been disabled, since I think we should keep
interrupts-disabled intervals as short as possible. Thoughts?

Cheers,
Longman
End of thread, newest message: ~2022-12-17  4:42 UTC

Thread overview: 14+ messages
2022-12-13 15:09 possible deadlock in __ata_sff_interrupt  Wei Chen
2022-12-15  9:48 ` Damien Le Moal
2022-12-15 15:19   ` Al Viro
2022-12-16  1:44     ` Damien Le Moal
2022-12-16  3:41       ` Al Viro
2022-12-16 11:26         ` Linus Torvalds
2022-12-16 23:39           ` Al Viro
2022-12-16 23:54             ` Boqun Feng
2022-12-17  1:59               ` Al Viro
2022-12-17  3:25                 ` Boqun Feng
2022-12-17  2:31               ` Linus Torvalds
2022-12-17  2:59                 ` Boqun Feng
2022-12-17  3:05                 ` Al Viro
2022-12-17  4:41                   ` Waiman Long