* possible deadlock in __ata_sff_interrupt
@ 2022-12-13 15:09 Wei Chen
  2022-12-15  9:48 ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Wei Chen @ 2022-12-13 15:09 UTC (permalink / raw)
  To: damien.lemoal, linux-ide, linux-kernel, syzkaller-bugs, syzbot

Dear Linux Developer,

Recently, while using our tool to fuzz the kernel, we triggered the following crash.

HEAD commit: 094226ad94f4 Linux v6.1-rc5
git tree: upstream
compiler: clang 12.0.1
console output:
https://drive.google.com/file/d/1QZttkbuLed4wp6U32UR6TpxfY_HHCIqQ/view?usp=share_link
kernel config: https://drive.google.com/file/d/1TdPsg_5Zon8S2hEFpLBWjb8Tnd2KA5WJ/view?usp=share_link

Unfortunately, I don't have a reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: Wei Chen <harperchen1110@gmail.com>

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.1.0-rc5 #40 Not tainted
-----------------------------------------------------
syz-executor.0/27911 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at: kill_fasync_rcu fs/fcntl.c:996 [inline]
ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at: kill_fasync+0x13b/0x430 fs/fcntl.c:1017

and this task is already holding:
ffff8880144dec18 (&host->lock){-.-.}-{2:2}, at: ata_scsi_queuecmd+0x7a/0x130 drivers/ata/libata-scsi.c:4048
which would create a new lock dependency:
 (&host->lock){-.-.}-{2:2} -> (&new->fa_lock){....}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&host->lock){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
  lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0xb3/0x100 kernel/locking/spinlock.c:162
  __ata_sff_interrupt+0x23/0x710 drivers/ata/libata-sff.c:1540
  __handle_irq_event_percpu+0x1f7/0x620 kernel/irq/handle.c:158
  handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
  handle_irq_event+0x83/0x1e0 kernel/irq/handle.c:210
  handle_edge_irq+0x245/0xbe0 kernel/irq/chip.c:819
  generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
  handle_irq arch/x86/kernel/irq.c:231 [inline]
  __common_interrupt+0xce/0x1e0 arch/x86/kernel/irq.c:250
  common_interrupt+0x9f/0xc0 arch/x86/kernel/irq.c:240
  asm_common_interrupt+0x22/0x40 arch/x86/include/asm/idtentry.h:640
  __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:160 [inline]
  _raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202
  process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289
  worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436
  kthread+0x266/0x300 kernel/kthread.c:376
  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

to a HARDIRQ-irq-unsafe lock:
 (tasklist_lock){.+.+}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
  lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
  __raw_read_lock include/linux/rwlock_api_smp.h:150 [inline]
  _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228
  do_wait+0x224/0x9d0 kernel/exit.c:1533
  kernel_wait+0xe4/0x230 kernel/exit.c:1723
  call_usermodehelper_exec_sync kernel/umh.c:140 [inline]
  call_usermodehelper_exec_work+0xb4/0x220 kernel/umh.c:167
  process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289
  worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436
  kthread+0x266/0x300 kernel/kthread.c:376
  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

other info that might help us debug this:

Chain exists of:
  &host->lock --> &new->fa_lock --> tasklist_lock

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(tasklist_lock);
                               local_irq_disable();
                               lock(&host->lock);
                               lock(&new->fa_lock);
  <Interrupt>
    lock(&host->lock);

 *** DEADLOCK ***

3 locks held by syz-executor.0/27911:
 #0: ffffffff8cd20b60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x9/0x30 include/linux/rcupdate.h:304
 #1: ffff8880144dec18 (&host->lock){-.-.}-{2:2}, at: ata_scsi_queuecmd+0x7a/0x130 drivers/ata/libata-scsi.c:4048
 #2: ffffffff8cd20b60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x5/0x30 include/linux/rcupdate.h:303

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&host->lock){-.-.}-{2:2} {
   IN-HARDIRQ-W at:
                    lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                    __raw_spin_lock_irqsave
include/linux/spinlock_api_smp.h:110 [inline]
                    _raw_spin_lock_irqsave+0xb3/0x100
kernel/locking/spinlock.c:162
                    __ata_sff_interrupt+0x23/0x710 drivers/ata/libata-sff.c:1540
                    __handle_irq_event_percpu+0x1f7/0x620
kernel/irq/handle.c:158
                    handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
                    handle_irq_event+0x83/0x1e0 kernel/irq/handle.c:210
                    handle_edge_irq+0x245/0xbe0 kernel/irq/chip.c:819
                    generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
                    handle_irq arch/x86/kernel/irq.c:231 [inline]
                    __common_interrupt+0xce/0x1e0 arch/x86/kernel/irq.c:250
                    common_interrupt+0x9f/0xc0 arch/x86/kernel/irq.c:240
                    asm_common_interrupt+0x22/0x40
arch/x86/include/asm/idtentry.h:640
                    __raw_spin_unlock_irq
include/linux/spinlock_api_smp.h:160 [inline]
                    _raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202
                    process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289
                    worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436
                    kthread+0x266/0x300 kernel/kthread.c:376
                    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
   IN-SOFTIRQ-W at:
                    lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                    __raw_spin_lock_irqsave
include/linux/spinlock_api_smp.h:110 [inline]
                    _raw_spin_lock_irqsave+0xb3/0x100
kernel/locking/spinlock.c:162
                    __ata_sff_interrupt+0x23/0x710 drivers/ata/libata-sff.c:1540
                    __handle_irq_event_percpu+0x1f7/0x620
kernel/irq/handle.c:158
                    handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
                    handle_irq_event+0x83/0x1e0 kernel/irq/handle.c:210
                    handle_edge_irq+0x245/0xbe0 kernel/irq/chip.c:819
                    generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
                    handle_irq arch/x86/kernel/irq.c:231 [inline]
                    __common_interrupt+0xce/0x1e0 arch/x86/kernel/irq.c:250
                    common_interrupt+0x4a/0xc0 arch/x86/kernel/irq.c:240
                    asm_common_interrupt+0x22/0x40
arch/x86/include/asm/idtentry.h:640
                    __raw_spin_unlock_irq
include/linux/spinlock_api_smp.h:160 [inline]
                    _raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202
                    __run_timers+0x922/0x970 kernel/time/timer.c:1792
                    run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1803
                    __do_softirq+0x277/0x73a kernel/softirq.c:571
                    __irq_exit_rcu+0xcf/0x150 kernel/softirq.c:650
                    irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
                    sysvec_apic_timer_interrupt+0x91/0xb0
arch/x86/kernel/apic/apic.c:1107
                    asm_sysvec_apic_timer_interrupt+0x16/0x20
arch/x86/include/asm/idtentry.h:649
                    __raw_spin_unlock_irqrestore
include/linux/spinlock_api_smp.h:152 [inline]
                    _raw_spin_unlock_irqrestore+0xbc/0x120
kernel/locking/spinlock.c:194
                    spin_unlock_irqrestore include/linux/spinlock.h:405 [inline]
                    ata_scsi_queuecmd+0xc6/0x130 drivers/ata/libata-scsi.c:4058
                    scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline]
                    scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760
                    blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992
                    __blk_mq_do_dispatch_sched block/blk-mq-sched.c:173 [inline]
                    blk_mq_do_dispatch_sched+0x820/0xe60
block/blk-mq-sched.c:187
                    __blk_mq_sched_dispatch_requests+0x39b/0x490
                    blk_mq_sched_dispatch_requests+0xef/0x160
block/blk-mq-sched.c:339
                    __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110
                    process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289
                    worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436
                    kthread+0x266/0x300 kernel/kthread.c:376
                    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
   INITIAL USE at:
                   lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                   __raw_spin_lock_irqsave
include/linux/spinlock_api_smp.h:110 [inline]
                   _raw_spin_lock_irqsave+0xb3/0x100
kernel/locking/spinlock.c:162
                   ata_dev_init drivers/ata/libata-core.c:5202 [inline]
                   ata_link_init+0x236/0x890 drivers/ata/libata-core.c:5247
                   ata_port_alloc+0x3ce/0x470 drivers/ata/libata-core.c:5322
                   ata_host_alloc+0x198/0x2c0 drivers/ata/libata-core.c:5433
                   ata_host_alloc_pinfo+0x27/0x430
drivers/ata/libata-core.c:5476
                   ata_pci_sff_prepare_host+0x40/0xe0
drivers/ata/libata-sff.c:2305
                   ata_pci_bmdma_prepare_host+0x20/0x70
drivers/ata/libata-sff.c:3210
                   piix_init_one+0x628/0x1ed0 drivers/ata/ata_piix.c:1704
                   local_pci_probe drivers/pci/pci-driver.c:324 [inline]
                   pci_call_probe drivers/pci/pci-driver.c:392 [inline]
                   __pci_device_probe drivers/pci/pci-driver.c:417 [inline]
                   pci_device_probe+0x4fe/0xa60 drivers/pci/pci-driver.c:460
                   call_driver_probe+0x96/0x250
                   really_probe+0x237/0xaf0 drivers/base/dd.c:639
                   __driver_probe_device+0x1f8/0x3e0 drivers/base/dd.c:778
                   driver_probe_device+0x50/0x240 drivers/base/dd.c:808
                   __driver_attach+0x2b6/0x5b0 drivers/base/dd.c:1190
                   bus_for_each_dev+0x168/0x1d0 drivers/base/bus.c:301
                   bus_add_driver+0x32f/0x600 drivers/base/bus.c:618
                   driver_register+0x2e9/0x3e0 drivers/base/driver.c:246
                   piix_init+0x1b/0x41 drivers/ata/ata_piix.c:1774
                   do_one_initcall+0x1a7/0x400 init/main.c:1303
                   do_initcall_level+0x168/0x218 init/main.c:1376
                   do_initcalls+0x4b/0x8c init/main.c:1392
                   kernel_init_freeable+0x428/0x5d8 init/main.c:1631
                   kernel_init+0x19/0x2b0 init/main.c:1519
                   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 }
 ... key      at: [<ffffffff9114fc40>] ata_host_alloc.__key+0x0/0x40

the dependencies between the lock to be acquired
 and HARDIRQ-irq-unsafe lock:
  -> (tasklist_lock){.+.+}-{2:2} {
     HARDIRQ-ON-R at:
                        lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                        __raw_read_lock
include/linux/rwlock_api_smp.h:150 [inline]
                        _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228
                        do_wait+0x224/0x9d0 kernel/exit.c:1533
                        kernel_wait+0xe4/0x230 kernel/exit.c:1723
                        call_usermodehelper_exec_sync kernel/umh.c:140 [inline]
                        call_usermodehelper_exec_work+0xb4/0x220
kernel/umh.c:167
                        process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289
                        worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436
                        kthread+0x266/0x300 kernel/kthread.c:376
                        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
     SOFTIRQ-ON-R at:
                        lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                        __raw_read_lock
include/linux/rwlock_api_smp.h:150 [inline]
                        _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228
                        do_wait+0x224/0x9d0 kernel/exit.c:1533
                        kernel_wait+0xe4/0x230 kernel/exit.c:1723
                        call_usermodehelper_exec_sync kernel/umh.c:140 [inline]
                        call_usermodehelper_exec_work+0xb4/0x220
kernel/umh.c:167
                        process_one_work+0x83c/0x11a0 kernel/workqueue.c:2289
                        worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436
                        kthread+0x266/0x300 kernel/kthread.c:376
                        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
     INITIAL USE at:
                       lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                       __raw_write_lock_irq
include/linux/rwlock_api_smp.h:195 [inline]
                       _raw_write_lock_irq+0xae/0xf0
kernel/locking/spinlock.c:326
                       copy_process+0x37cf/0x6200 kernel/fork.c:2387
                       kernel_clone+0x212/0x610 kernel/fork.c:2671
                       user_mode_thread+0x12d/0x190 kernel/fork.c:2747
                       rest_init+0x21/0x270 init/main.c:694
                       start_kernel+0x0/0x540 init/main.c:890
                       start_kernel+0x49a/0x540 init/main.c:1145
                       secondary_startup_64_no_verify+0xcf/0xdb
     INITIAL READ USE at:
                            lock_acquire+0x17f/0x430
kernel/locking/lockdep.c:5668
                            __raw_read_lock
include/linux/rwlock_api_smp.h:150 [inline]
                            _raw_read_lock+0x32/0x40
kernel/locking/spinlock.c:228
                            do_wait+0x224/0x9d0 kernel/exit.c:1533
                            kernel_wait+0xe4/0x230 kernel/exit.c:1723
                            call_usermodehelper_exec_sync
kernel/umh.c:140 [inline]
                            call_usermodehelper_exec_work+0xb4/0x220
kernel/umh.c:167
                            process_one_work+0x83c/0x11a0
kernel/workqueue.c:2289
                            worker_thread+0xa6c/0x1290 kernel/workqueue.c:2436
                            kthread+0x266/0x300 kernel/kthread.c:376
                            ret_from_fork+0x1f/0x30
arch/x86/entry/entry_64.S:306
   }
   ... key      at: [<ffffffff8ca0a058>] tasklist_lock+0x18/0x40
   ... acquired at:
   lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
   __raw_read_lock include/linux/rwlock_api_smp.h:150 [inline]
   _raw_read_lock+0x32/0x40 kernel/locking/spinlock.c:228
   send_sigio+0xbe/0x300 fs/fcntl.c:792
   kill_fasync_rcu fs/fcntl.c:1003 [inline]
   kill_fasync+0x1e4/0x430 fs/fcntl.c:1017
   __receive_buf drivers/tty/n_tty.c:1629 [inline]
   n_tty_receive_buf_common+0xaac/0x1370 drivers/tty/n_tty.c:1711
   tiocsti drivers/tty/tty_io.c:2286 [inline]
   tty_ioctl+0xda7/0x1710 drivers/tty/tty_io.c:2685
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:870 [inline]
   __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

 -> (&f->f_owner.lock){....}-{2:2} {
    INITIAL USE at:
                     lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                     __raw_write_lock_irq
include/linux/rwlock_api_smp.h:195 [inline]
                     _raw_write_lock_irq+0xae/0xf0 kernel/locking/spinlock.c:326
                     f_modown+0x38/0x340 fs/fcntl.c:91
                     __tty_fasync drivers/tty/tty_io.c:2237 [inline]
                     tty_fasync+0x24a/0x340 drivers/tty/tty_io.c:2252
                     setfl fs/fcntl.c:73 [inline]
                     do_fcntl+0xe6c/0x1350 fs/fcntl.c:340
                     __do_sys_fcntl fs/fcntl.c:454 [inline]
                     __se_sys_fcntl+0xd5/0x1b0 fs/fcntl.c:439
                     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
                     do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
                     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    INITIAL READ USE at:
                          lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                          __raw_read_lock_irq
include/linux/rwlock_api_smp.h:169 [inline]
                          _raw_read_lock_irq+0xb6/0x100
kernel/locking/spinlock.c:244
                          f_getown_ex fs/fcntl.c:212 [inline]
                          do_fcntl+0x1a5/0x1350 fs/fcntl.c:380
                          __do_sys_fcntl fs/fcntl.c:454 [inline]
                          __se_sys_fcntl+0xd5/0x1b0 fs/fcntl.c:439
                          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
                          do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
                          entry_SYSCALL_64_after_hwframe+0x63/0xcd
  }
  ... key      at: [<ffffffff90e590c0>] __alloc_file.__key+0x0/0x10
  ... acquired at:
   lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
   __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline]
   _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236
   send_sigio+0x2f/0x300 fs/fcntl.c:778
   kill_fasync_rcu fs/fcntl.c:1003 [inline]
   kill_fasync+0x1e4/0x430 fs/fcntl.c:1017
   sock_wake_async+0x133/0x150
   rcu_read_unlock include/linux/rcupdate.h:767 [inline]
   sk_wake_async+0x12e/0x200 include/net/sock.h:2525
   sock_def_error_report+0x154/0x200 net/core/sock.c:3264
   smc_fback_forward_wakeup+0x1b6/0x500 net/smc/af_smc.c:786
   smc_fback_error_report+0x90/0xb0 net/smc/af_smc.c:838
   sk_error_report+0x3b/0xb0 net/core/sock.c:345
   tcp_validate_incoming+0x1509/0x1fc0 net/ipv4/tcp_input.c:5805
   tcp_rcv_state_process+0x513/0x2610 net/ipv4/tcp_input.c:6520
   tcp_v4_do_rcv+0x691/0xa10 net/ipv4/tcp_ipv4.c:1704
   sk_backlog_rcv include/net/sock.h:1109 [inline]
   __release_sock+0x106/0x3a0 net/core/sock.c:2906
   release_sock+0x5d/0x1c0 net/core/sock.c:3462
   sk_stream_wait_memory+0x6d9/0xe20 net/core/stream.c:145
   tcp_sendmsg_locked+0x1888/0x4540 net/ipv4/tcp.c:1445
   tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1483
   sock_sendmsg_nosec net/socket.c:714 [inline]
   sock_sendmsg net/socket.c:734 [inline]
   ____sys_sendmsg+0x558/0x8a0 net/socket.c:2482
   ___sys_sendmsg net/socket.c:2536 [inline]
   __sys_sendmmsg+0x360/0x6c0 net/socket.c:2622
   __do_sys_sendmmsg net/socket.c:2651 [inline]
   __se_sys_sendmmsg net/socket.c:2648 [inline]
   __x64_sys_sendmmsg+0x9c/0xb0 net/socket.c:2648
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> (&new->fa_lock){....}-{2:2} {
   INITIAL USE at:
                   lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                   __raw_write_lock_irq
include/linux/rwlock_api_smp.h:195 [inline]
                   _raw_write_lock_irq+0xae/0xf0 kernel/locking/spinlock.c:326
                   fasync_remove_entry+0xff/0x1d0 fs/fcntl.c:873
                   sock_fasync+0x86/0xf0 net/socket.c:1390
                   __fput+0x751/0x8c0 fs/file_table.c:317
                   task_work_run+0x243/0x300 kernel/task_work.c:179
                   resume_user_mode_work
include/linux/resume_user_mode.h:49 [inline]
                   exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
                   exit_to_user_mode_prepare+0x1f2/0x210
kernel/entry/common.c:203
                   __syscall_exit_to_user_mode_work
kernel/entry/common.c:285 [inline]
                   syscall_exit_to_user_mode+0x26/0x60 kernel/entry/common.c:296
                   do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
                   entry_SYSCALL_64_after_hwframe+0x63/0xcd
   INITIAL READ USE at:
                        lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
                        __raw_read_lock_irqsave
include/linux/rwlock_api_smp.h:160 [inline]
                        _raw_read_lock_irqsave+0xbb/0x100
kernel/locking/spinlock.c:236
                        kill_fasync_rcu fs/fcntl.c:996 [inline]
                        kill_fasync+0x13b/0x430 fs/fcntl.c:1017
                        sock_wake_async+0x133/0x150
                        rcu_read_unlock include/linux/rcupdate.h:767 [inline]
                        sk_wake_async+0x12e/0x200 include/net/sock.h:2525
                        sock_def_error_report+0x154/0x200 net/core/sock.c:3264
                        smc_fback_forward_wakeup+0x1b6/0x500
net/smc/af_smc.c:786
                        smc_fback_error_report+0x90/0xb0 net/smc/af_smc.c:838
                        sk_error_report+0x3b/0xb0 net/core/sock.c:345
                        tcp_validate_incoming+0x1509/0x1fc0
net/ipv4/tcp_input.c:5805
                        tcp_rcv_state_process+0x513/0x2610
net/ipv4/tcp_input.c:6520
                        tcp_v4_do_rcv+0x691/0xa10 net/ipv4/tcp_ipv4.c:1704
                        sk_backlog_rcv include/net/sock.h:1109 [inline]
                        __release_sock+0x106/0x3a0 net/core/sock.c:2906
                        release_sock+0x5d/0x1c0 net/core/sock.c:3462
                        sk_stream_wait_memory+0x6d9/0xe20 net/core/stream.c:145
                        tcp_sendmsg_locked+0x1888/0x4540 net/ipv4/tcp.c:1445
                        tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1483
                        sock_sendmsg_nosec net/socket.c:714 [inline]
                        sock_sendmsg net/socket.c:734 [inline]
                        ____sys_sendmsg+0x558/0x8a0 net/socket.c:2482
                        ___sys_sendmsg net/socket.c:2536 [inline]
                        __sys_sendmmsg+0x360/0x6c0 net/socket.c:2622
                        __do_sys_sendmmsg net/socket.c:2651 [inline]
                        __se_sys_sendmmsg net/socket.c:2648 [inline]
                        __x64_sys_sendmmsg+0x9c/0xb0 net/socket.c:2648
                        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
                        do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
                        entry_SYSCALL_64_after_hwframe+0x63/0xcd
 }
 ... key      at: [<ffffffff90e59f80>] fasync_insert_entry.__key+0x0/0x40
 ... acquired at:
   lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
   __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline]
   _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236
   kill_fasync_rcu fs/fcntl.c:996 [inline]
   kill_fasync+0x13b/0x430 fs/fcntl.c:1017
   sg_rq_end_io+0x604/0xf50 drivers/scsi/sg.c:1403
   __blk_mq_end_request+0x2c7/0x380 block/blk-mq.c:1011
   scsi_end_request+0x4ed/0x9c0 drivers/scsi/scsi_lib.c:576
   scsi_io_completion+0xc25/0x27a0 drivers/scsi/scsi_lib.c:985
   ata_scsi_simulate+0x336e/0x3dd0 drivers/ata/libata-scsi.c:4190
   __ata_scsi_queuecmd+0x20b/0x1020 drivers/ata/libata-scsi.c:4009
   ata_scsi_queuecmd+0xa0/0x130 drivers/ata/libata-scsi.c:4052
   scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline]
   scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760
   blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992
   __blk_mq_sched_dispatch_requests+0x382/0x490 block/blk-mq-sched.c:306
   blk_mq_sched_dispatch_requests+0xef/0x160 block/blk-mq-sched.c:339
   __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110
   blk_mq_sched_insert_request+0x1e2/0x430 block/blk-mq-sched.c:458
   blk_execute_rq_nowait+0x2e8/0x3b0 block/blk-mq.c:1305
   sg_common_write+0x8c0/0x1970 drivers/scsi/sg.c:832
   sg_new_write+0x61f/0x860 drivers/scsi/sg.c:770
   sg_ioctl_common drivers/scsi/sg.c:935 [inline]
   sg_ioctl+0x1c51/0x2be0 drivers/scsi/sg.c:1159
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:870 [inline]
   __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd


stack backtrace:
CPU: 0 PID: 27911 Comm: syz-executor.0 Not tainted 6.1.0-rc5 #40
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
 print_bad_irq_dependency kernel/locking/lockdep.c:2611 [inline]
 check_irq_usage kernel/locking/lockdep.c:2850 [inline]
 check_prev_add kernel/locking/lockdep.c:3101 [inline]
 check_prevs_add+0x4e5f/0x5b70 kernel/locking/lockdep.c:3216
 validate_chain kernel/locking/lockdep.c:3831 [inline]
 __lock_acquire+0x4411/0x6070 kernel/locking/lockdep.c:5055
 lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
 __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline]
 _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236
 kill_fasync_rcu fs/fcntl.c:996 [inline]
 kill_fasync+0x13b/0x430 fs/fcntl.c:1017
 sg_rq_end_io+0x604/0xf50 drivers/scsi/sg.c:1403
 __blk_mq_end_request+0x2c7/0x380 block/blk-mq.c:1011
 scsi_end_request+0x4ed/0x9c0 drivers/scsi/scsi_lib.c:576
 scsi_io_completion+0xc25/0x27a0 drivers/scsi/scsi_lib.c:985
 ata_scsi_simulate+0x336e/0x3dd0 drivers/ata/libata-scsi.c:4190
 __ata_scsi_queuecmd+0x20b/0x1020 drivers/ata/libata-scsi.c:4009
 ata_scsi_queuecmd+0xa0/0x130 drivers/ata/libata-scsi.c:4052
 scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline]
 scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760
 blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992
 __blk_mq_sched_dispatch_requests+0x382/0x490 block/blk-mq-sched.c:306
 blk_mq_sched_dispatch_requests+0xef/0x160 block/blk-mq-sched.c:339
 __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110
 blk_mq_sched_insert_request+0x1e2/0x430 block/blk-mq-sched.c:458
 blk_execute_rq_nowait+0x2e8/0x3b0 block/blk-mq.c:1305
 sg_common_write+0x8c0/0x1970 drivers/scsi/sg.c:832
 sg_new_write+0x61f/0x860 drivers/scsi/sg.c:770
 sg_ioctl_common drivers/scsi/sg.c:935 [inline]
 sg_ioctl+0x1c51/0x2be0 drivers/scsi/sg.c:1159
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f153dc8bded
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f153ede2c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f153ddabf80 RCX: 00007f153dc8bded
RDX: 0000000020000440 RSI: 0000000000002285 RDI: 0000000000000006
RBP: 00007f153dcf8ce0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f153ddabf80
R13: 00007ffc72e5108f R14: 00007ffc72e51230 R15: 00007f153ede2dc0
 </TASK>

Best,
Wei


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-13 15:09 possible deadlock in __ata_sff_interrupt Wei Chen
@ 2022-12-15  9:48 ` Damien Le Moal
  2022-12-15 15:19   ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2022-12-15  9:48 UTC (permalink / raw)
  To: Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot,
	Alexander Viro, linux-fsdevel, Chuck Lever, Jeff Layton

On 12/14/22 00:09, Wei Chen wrote:
> Dear Linux Developer,
> 
> Recently, while using our tool to fuzz the kernel, we triggered the following crash.
> 
> HEAD commit: 094226ad94f4 Linux v6.1-rc5
> git tree: upstream
> compiler: clang 12.0.1
> console output:
> https://drive.google.com/file/d/1QZttkbuLed4wp6U32UR6TpxfY_HHCIqQ/view?usp=share_link
> kernel config: https://drive.google.com/file/d/1TdPsg_5Zon8S2hEFpLBWjb8Tnd2KA5WJ/view?usp=share_link
> 
> Unfortunately, I don't have a reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: Wei Chen <harperchen1110@gmail.com>
> 
> =====================================================
> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> 6.1.0-rc5 #40 Not tainted
> -----------------------------------------------------
> syz-executor.0/27911 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at: kill_fasync_rcu fs/fcntl.c:996 [inline]
> ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at: kill_fasync+0x13b/0x430 fs/fcntl.c:1017

[...]

> stack backtrace:
> CPU: 0 PID: 27911 Comm: syz-executor.0 Not tainted 6.1.0-rc5 #40
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
>  print_bad_irq_dependency kernel/locking/lockdep.c:2611 [inline]
>  check_irq_usage kernel/locking/lockdep.c:2850 [inline]
>  check_prev_add kernel/locking/lockdep.c:3101 [inline]
>  check_prevs_add+0x4e5f/0x5b70 kernel/locking/lockdep.c:3216
>  validate_chain kernel/locking/lockdep.c:3831 [inline]
>  __lock_acquire+0x4411/0x6070 kernel/locking/lockdep.c:5055
>  lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
>  __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline]
>  _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236
>  kill_fasync_rcu fs/fcntl.c:996 [inline]
>  kill_fasync+0x13b/0x430 fs/fcntl.c:1017
>  sg_rq_end_io+0x604/0xf50 drivers/scsi/sg.c:1403

The problem is here: sg_rq_end_io() calling kill_fasync(). At a quick
glance, though, this is not the only driver that calls kill_fasync() with
a spinlock held and interrupts disabled... So either there is a
fundamental problem with the kill_fasync() function if drivers are allowed
to do that, or, conversely, all drivers that call it with a lock held and
interrupts disabled need to be fixed.

Al, Chuck, Jeff,

Any thoughts?

>  __blk_mq_end_request+0x2c7/0x380 block/blk-mq.c:1011
>  scsi_end_request+0x4ed/0x9c0 drivers/scsi/scsi_lib.c:576
>  scsi_io_completion+0xc25/0x27a0 drivers/scsi/scsi_lib.c:985
>  ata_scsi_simulate+0x336e/0x3dd0 drivers/ata/libata-scsi.c:4190
>  __ata_scsi_queuecmd+0x20b/0x1020 drivers/ata/libata-scsi.c:4009
>  ata_scsi_queuecmd+0xa0/0x130 drivers/ata/libata-scsi.c:4052
>  scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline]
>  scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760
>  blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992
>  __blk_mq_sched_dispatch_requests+0x382/0x490 block/blk-mq-sched.c:306
>  blk_mq_sched_dispatch_requests+0xef/0x160 block/blk-mq-sched.c:339
>  __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110
>  blk_mq_sched_insert_request+0x1e2/0x430 block/blk-mq-sched.c:458
>  blk_execute_rq_nowait+0x2e8/0x3b0 block/blk-mq.c:1305
>  sg_common_write+0x8c0/0x1970 drivers/scsi/sg.c:832
>  sg_new_write+0x61f/0x860 drivers/scsi/sg.c:770
>  sg_ioctl_common drivers/scsi/sg.c:935 [inline]
>  sg_ioctl+0x1c51/0x2be0 drivers/scsi/sg.c:1159
>  vfs_ioctl fs/ioctl.c:51 [inline]
>  __do_sys_ioctl fs/ioctl.c:870 [inline]
>  __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f153dc8bded
> Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f153ede2c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007f153ddabf80 RCX: 00007f153dc8bded
> RDX: 0000000020000440 RSI: 0000000000002285 RDI: 0000000000000006
> RBP: 00007f153dcf8ce0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f153ddabf80
> R13: 00007ffc72e5108f R14: 00007ffc72e51230 R15: 00007f153ede2dc0
>  </TASK>
> 
> Best,
> Wei

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: possible deadlock in __ata_sff_interrupt
  2022-12-15  9:48 ` Damien Le Moal
@ 2022-12-15 15:19   ` Al Viro
  2022-12-16  1:44     ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2022-12-15 15:19 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot,
	linux-fsdevel, Chuck Lever, Jeff Layton

On Thu, Dec 15, 2022 at 06:48:20PM +0900, Damien Le Moal wrote:

> The problem is here: sg_rq_end_io() calling kill_fasync(). But at a quick
> glance, this is not the only driver calling kill_fasync() with a spinlock
> held with irq disabled... So there may be a fundamental problem with
> kill_fasync() function if drivers are allowed to do that, or the reverse,
> all drivers calling that function with a lock held with irq disabled need
> to be fixed.
> 
> Al, Chuck, Jeff,
> 
> Any thoughts?

What is the problem with read_lock_irqsave() called with irqs disabled?
read_lock_irq() would have been a bug in such conditions, of course, but
that's not what we use...


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-15 15:19   ` Al Viro
@ 2022-12-16  1:44     ` Damien Le Moal
  2022-12-16  3:41       ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2022-12-16  1:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot,
	linux-fsdevel, Chuck Lever, Jeff Layton

On 12/16/22 00:19, Al Viro wrote:
> On Thu, Dec 15, 2022 at 06:48:20PM +0900, Damien Le Moal wrote:
> 
>> The problem is here: sg_rq_end_io() calling kill_fasync(). But at a quick
>> glance, this is not the only driver calling kill_fasync() with a spinlock
>> held with irq disabled... So there may be a fundamental problem with
>> kill_fasync() function if drivers are allowed to do that, or the reverse,
>> all drivers calling that function with a lock held with irq disabled need
>> to be fixed.
>>
>> Al, Chuck, Jeff,
>>
>> Any thoughts?
> 
> What is the problem with read_lock_irqsave() called with irqs disabled?
> read_lock_irq() would have been a bug in such conditions, of course, but
> that's not what we use...

The original & complete lockdep splat is in the report email here:

https://marc.info/?l=linux-ide&m=167094379710177&w=2

It looks like a spinlock is taken for the fasync stuff without irq
disabled and that same spinlock is needed in kill_fasync() which is
itself called (potentially) with IRQ disabled. Hence the splat. In any
case, that is how I understand the issue. But as mentioned above, given
that I can see many drivers calling kill_fasync() with irq disabled, I
wonder if this is a genuine potential problem or a false positive.

-- 
Damien Le Moal
Western Digital Research



* Re: possible deadlock in __ata_sff_interrupt
  2022-12-16  1:44     ` Damien Le Moal
@ 2022-12-16  3:41       ` Al Viro
  2022-12-16 11:26         ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2022-12-16  3:41 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Wei Chen, linux-ide, linux-kernel, syzkaller-bugs, syzbot,
	linux-fsdevel, Chuck Lever, Jeff Layton, Peter Zijlstra,
	Linus Torvalds

On Fri, Dec 16, 2022 at 10:44:06AM +0900, Damien Le Moal wrote:

> The original & complete lockdep splat is in the report email here:
> 
> https://marc.info/?l=linux-ide&m=167094379710177&w=2
> 
> It looks like a spinlock is taken for the fasync stuff without irq
> disabled and that same spinlock is needed in kill_fasync() which is
> itself called (potentially) with IRQ disabled. Hence the splat. In any
> case, that is how I understand the issue. But as mentioned above, given
> that I can see many drivers calling kill_fasync() with irq disabled, I
> wonder if this is a genuine potential problem or a false positive.

OK, I'm about to fall asleep, so I might very well be missing something
obvious, but...

CPU1: ptrace(2)
	ptrace_check_attach()
		read_lock(&tasklist_lock);

CPU2: setpgid(2)
        write_lock_irq(&tasklist_lock);
	spins

CPU1: takes an interrupt that would call kill_fasync().  grep and the
first instance of kill_fasync() is in hpet_interrupt() - it's not
something exotic.  IRQs disabled on CPU2 won't stop it.
	kill_fasync(..., SIGIO, ...)
		kill_fasync_rcu()
			read_lock_irqsave(&fa->fa_lock, flags);
			send_sigio()
			        read_lock_irqsave(&fown->lock, flags);
		                read_lock(&tasklist_lock);

... and CPU1 spins as well.

It's not a matter of kill_fasync() called with IRQs disabled; the
problem is kill_fasync() called from interrupt taken while holding
tasklist_lock at least shared.  Somebody trying to grab it on another
CPU exclusive before we get to send_sigio() from kill_fasync() will
end up spinning and will make us spin as well.

I really hope that's just me not seeing something obvious - we had
kill_fasync() called in IRQ handlers since way back and we had
tasklist_lock taken shared without disabling IRQs for just as long.

<goes to sleep, hoping to find "Al, you are a moron, it's obviously OK
for such and such reasons" in the mailbox tomorrow morning>


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-16  3:41       ` Al Viro
@ 2022-12-16 11:26         ` Linus Torvalds
  2022-12-16 23:39           ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2022-12-16 11:26 UTC (permalink / raw)
  To: Al Viro
  Cc: Damien Le Moal, Wei Chen, linux-ide, linux-kernel,
	syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton,
	Peter Zijlstra

On Thu, Dec 15, 2022 at 7:41 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> CPU1: ptrace(2)
>         ptrace_check_attach()
>                 read_lock(&tasklist_lock);
>
> CPU2: setpgid(2)
>         write_lock_irq(&tasklist_lock);
>         spins
>
> CPU1: takes an interrupt that would call kill_fasync().  grep and the
> first instance of kill_fasync() is in hpet_interrupt() - it's not
> something exotic.  IRQs disabled on CPU2 won't stop it.
>         kill_fasync(..., SIGIO, ...)
>                 kill_fasync_rcu()
>                         read_lock_irqsave(&fa->fa_lock, flags);
>                         send_sigio()
>                                 read_lock_irqsave(&fown->lock, flags);
>                                 read_lock(&tasklist_lock);
>
> ... and CPU1 spins as well.

Nope. See kernel/locking/qrwlock.c:

        /*
         * Readers come here when they cannot get the lock without waiting
         */
        if (unlikely(in_interrupt())) {
                /*
                 * Readers in interrupt context will get the lock immediately
                 * if the writer is just waiting (not holding the lock yet),
                 * so spin with ACQUIRE semantics until the lock is available
                 * without waiting in the queue.
                 */
                atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
                return;
        }

and that's the new "civilized" reader unfairness.

The traditional rwlock was unconditionally unfair to writers, to the
point that there were starvation issues because new readers would
always get the lock.

         Linus


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-16 11:26         ` Linus Torvalds
@ 2022-12-16 23:39           ` Al Viro
  2022-12-16 23:54             ` Boqun Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2022-12-16 23:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Damien Le Moal, Wei Chen, linux-ide, linux-kernel,
	syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton,
	Peter Zijlstra, Boqun Feng

[Boqun Feng Cc'd]

On Fri, Dec 16, 2022 at 03:26:21AM -0800, Linus Torvalds wrote:
> On Thu, Dec 15, 2022 at 7:41 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > CPU1: ptrace(2)
> >         ptrace_check_attach()
> >                 read_lock(&tasklist_lock);
> >
> > CPU2: setpgid(2)
> >         write_lock_irq(&tasklist_lock);
> >         spins
> >
> > CPU1: takes an interrupt that would call kill_fasync().  grep and the
> > first instance of kill_fasync() is in hpet_interrupt() - it's not
> > something exotic.  IRQs disabled on CPU2 won't stop it.
> >         kill_fasync(..., SIGIO, ...)
> >                 kill_fasync_rcu()
> >                         read_lock_irqsave(&fa->fa_lock, flags);
> >                         send_sigio()
> >                                 read_lock_irqsave(&fown->lock, flags);
> >                                 read_lock(&tasklist_lock);
> >
> > ... and CPU1 spins as well.
> 
> Nope. See kernel/locking/qrwlock.c:

[snip rwlocks are inherently unfair, queued ones are somewhat milder, but
all implementations have writers-starving behaviour for read_lock() at least
when in_interrupt()]

D'oh...  Consider requested "Al, you are a moron" duly delivered...  I plead
having been on way too low caffeine and too little sleep ;-/

Looking at the original report, looks like the scenario there is meant to be
the following:

CPU1: read_lock(&tasklist_lock)
	tasklist_lock grabbed

CPU2: get an sg write(2) feeding request to libata; host->lock is taken,
	request is immediately completed and scsi_done() is about to be called.
	host->lock grabbed

CPU3: write_lock_irq(&tasklist_lock)
	spins on tasklist_lock until CPU1 gets through.

CPU2: get around to kill_fasync() called by sg_rq_end_io() and to grabbing
	tasklist_lock inside send_sigio()
	spins, since it's not in an interrupt and there's a pending writer
	host->lock is held, spin until CPU3 gets through.

CPU1: take an interrupt, which on libata will try to grab host->lock
	tasklist_lock is held, spins on host->lock until CPU2 gets through

Am I reading it correctly?


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-16 23:39           ` Al Viro
@ 2022-12-16 23:54             ` Boqun Feng
  2022-12-17  1:59               ` Al Viro
  2022-12-17  2:31               ` Linus Torvalds
  0 siblings, 2 replies; 14+ messages in thread
From: Boqun Feng @ 2022-12-16 23:54 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Damien Le Moal, Wei Chen, linux-ide,
	linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever,
	Jeff Layton, Peter Zijlstra

On Fri, Dec 16, 2022 at 11:39:21PM +0000, Al Viro wrote:
> [Boqun Feng Cc'd]
> 
> On Fri, Dec 16, 2022 at 03:26:21AM -0800, Linus Torvalds wrote:
> > On Thu, Dec 15, 2022 at 7:41 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > CPU1: ptrace(2)
> > >         ptrace_check_attach()
> > >                 read_lock(&tasklist_lock);
> > >
> > > CPU2: setpgid(2)
> > >         write_lock_irq(&tasklist_lock);
> > >         spins
> > >
> > > CPU1: takes an interrupt that would call kill_fasync().  grep and the
> > > first instance of kill_fasync() is in hpet_interrupt() - it's not
> > > something exotic.  IRQs disabled on CPU2 won't stop it.
> > >         kill_fasync(..., SIGIO, ...)
> > >                 kill_fasync_rcu()
> > >                         read_lock_irqsave(&fa->fa_lock, flags);
> > >                         send_sigio()
> > >                                 read_lock_irqsave(&fown->lock, flags);
> > >                                 read_lock(&tasklist_lock);
> > >
> > > ... and CPU1 spins as well.
> > 
> > Nope. See kernel/locking/qrwlock.c:
> 
> [snip rwlocks are inherently unfair, queued ones are somewhat milder, but
> all implementations have writers-starving behaviour for read_lock() at least
> when in_interrupt()]
> 
> D'oh...  Consider requested "Al, you are a moron" duly delivered...  I plead
> having been on way too low caffeine and too little sleep ;-/
> 
> Looking at the original report, looks like the scenario there is meant to be
> the following:
> 
> CPU1: read_lock(&tasklist_lock)
> 	tasklist_lock grabbed
> 
> CPU2: get an sg write(2) feeding request to libata; host->lock is taken,
> 	request is immediately completed and scsi_done() is about to be called.
> 	host->lock grabbed
> 
> CPU3: write_lock_irq(&tasklist_lock)
> 	spins on tasklist_lock until CPU1 gets through.
> 
> CPU2: get around to kill_fasync() called by sg_rq_end_io() and to grabbing
> 	tasklist_lock inside send_sigio()
> 	spins, since it's not in an interrupt and there's a pending writer
> 	host->lock is held, spin until CPU3 gets through.

Right, for a reader not in_interrupt(), it may be blocked by a random
waiting writer because of the fairness, even if the lock is currently held
by a reader:

	CPU 1			CPU 2		CPU 3
	read_lock(&tasklist_lock); // get the lock

						write_lock_irq(&tasklist_lock); // wait for the lock

				read_lock(&tasklist_lock); // cannot get the lock because of the fairness

Regards,
Boqun

> 
> CPU1: take an interrupt, which on libata will try to grab host->lock
> 	tasklist_lock is held, spins on host->lock until CPU2 gets through
> 
> Am I reading it correctly?



* Re: possible deadlock in __ata_sff_interrupt
  2022-12-16 23:54             ` Boqun Feng
@ 2022-12-17  1:59               ` Al Viro
  2022-12-17  3:25                 ` Boqun Feng
  2022-12-17  2:31               ` Linus Torvalds
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2022-12-17  1:59 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Linus Torvalds, Damien Le Moal, Wei Chen, linux-ide,
	linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever,
	Jeff Layton, Peter Zijlstra

On Fri, Dec 16, 2022 at 03:54:09PM -0800, Boqun Feng wrote:
> On Fri, Dec 16, 2022 at 11:39:21PM +0000, Al Viro wrote:
> > [Boqun Feng Cc'd]
> > 
> > On Fri, Dec 16, 2022 at 03:26:21AM -0800, Linus Torvalds wrote:
> > > On Thu, Dec 15, 2022 at 7:41 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > CPU1: ptrace(2)
> > > >         ptrace_check_attach()
> > > >                 read_lock(&tasklist_lock);
> > > >
> > > > CPU2: setpgid(2)
> > > >         write_lock_irq(&tasklist_lock);
> > > >         spins
> > > >
> > > > CPU1: takes an interrupt that would call kill_fasync().  grep and the
> > > > first instance of kill_fasync() is in hpet_interrupt() - it's not
> > > > something exotic.  IRQs disabled on CPU2 won't stop it.
> > > >         kill_fasync(..., SIGIO, ...)
> > > >                 kill_fasync_rcu()
> > > >                         read_lock_irqsave(&fa->fa_lock, flags);
> > > >                         send_sigio()
> > > >                                 read_lock_irqsave(&fown->lock, flags);
> > > >                                 read_lock(&tasklist_lock);
> > > >
> > > > ... and CPU1 spins as well.
> > > 
> > > Nope. See kernel/locking/qrwlock.c:
> > 
> > [snip rwlocks are inherently unfair, queued ones are somewhat milder, but
> > all implementations have writers-starving behaviour for read_lock() at least
> > when in_interrupt()]
> > 
> > D'oh...  Consider requested "Al, you are a moron" duly delivered...  I plead
> > having been on way too low caffeine and too little sleep ;-/
> > 
> > Looking at the original report, looks like the scenario there is meant to be
> > the following:
> > 
> > CPU1: read_lock(&tasklist_lock)
> > 	tasklist_lock grabbed
> > 
> > CPU2: get an sg write(2) feeding request to libata; host->lock is taken,
> > 	request is immediately completed and scsi_done() is about to be called.
> > 	host->lock grabbed
> > 
> > CPU3: write_lock_irq(&tasklist_lock)
> > 	spins on tasklist_lock until CPU1 gets through.
> > 
> > CPU2: get around to kill_fasync() called by sg_rq_end_io() and to grabbing
> > 	tasklist_lock inside send_sigio()
> > 	spins, since it's not in an interrupt and there's a pending writer
> > 	host->lock is held, spin until CPU3 gets through.
> 
> Right, for a reader not in_interrupt(), it may be blocked by a random
> waiting writer because of the fairness, even if the lock is currently held
> by a reader:
> 
> 	CPU 1			CPU 2		CPU 3
> 	read_lock(&tasklist_lock); // get the lock
> 
> 						write_lock_irq(&tasklist_lock); // wait for the lock
> 
> 				read_lock(&tasklist_lock); // cannot get the lock because of the fairness

IOW, any caller of scsi_done() from non-interrupt context while
holding a spinlock that is also taken in an interrupt...

And we have drivers/scsi/scsi_error.c:scsi_send_eh_cmnd(), which calls
->queuecommand() under a mutex, with
#define DEF_SCSI_QCMD(func_name) \
        int func_name(struct Scsi_Host *shost, struct scsi_cmnd *cmd)   \
        {                                                               \
                unsigned long irq_flags;                                \
                int rc;                                                 \
                spin_lock_irqsave(shost->host_lock, irq_flags);         \
                rc = func_name##_lck(cmd);                              \
                spin_unlock_irqrestore(shost->host_lock, irq_flags);    \
                return rc;                                              \
        }

being commonly used for ->queuecommand() instances.  So any scsi_done()
in foo_lck() (quite a few of such) + use of ->host_lock in interrupt
for the same driver (also common)...

I wonder why that hadn't triggered the same warning a long time
ago - these warnings had been around for at least two years.

Am I missing something here?


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-16 23:54             ` Boqun Feng
  2022-12-17  1:59               ` Al Viro
@ 2022-12-17  2:31               ` Linus Torvalds
  2022-12-17  2:59                 ` Boqun Feng
  2022-12-17  3:05                 ` Al Viro
  1 sibling, 2 replies; 14+ messages in thread
From: Linus Torvalds @ 2022-12-17  2:31 UTC (permalink / raw)
  To: Boqun Feng, Waiman Long
  Cc: Al Viro, Damien Le Moal, Wei Chen, linux-ide, linux-kernel,
	syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton,
	Peter Zijlstra

Ok, let's bring in Waiman for the rwlock side.

On Fri, Dec 16, 2022 at 5:54 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Right, for a reader not in_interrupt(), it may be blocked by a random
> waiting writer because of the fairness, even if the lock is currently held
> by a reader:
>
>         CPU 1                   CPU 2           CPU 3
>         read_lock(&tasklist_lock); // get the lock
>
>                                                 write_lock_irq(&tasklist_lock); // wait for the lock
>
>                                 read_lock(&tasklist_lock); // cannot get the lock because of the fairness

But this should be ok - because CPU1 can make progress and eventually
release the lock.

So the tasklist_lock use is fine on its own - the reason interrupts
are special is because an interrupt on CPU 1 taking the lock for
reading would deadlock otherwise. As long as it happens on another
CPU, the original CPU should then be able to make progress.

But the problem here seems to be that *another* lock is also involved
(in this case apparently "host->lock"), and now if CPU1 and CPU2 get
these two locks in a different order, you can get an ABBA deadlock.

And apparently our lockdep machinery doesn't catch that issue, so it
doesn't get flagged.

I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
treats rwlocks as being _always_ unfair, not knowing about that "it's
only unfair when it's in interrupt context".

Maybe we need to always make rwlock unfair? Possibly only for tasklist_lock?

Oh, how I hate tasklist_lock. It's pretty much our one remaining "one
big lock". It's been a pain for a long long time.

            Linus


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-17  2:31               ` Linus Torvalds
@ 2022-12-17  2:59                 ` Boqun Feng
  2022-12-17  3:05                 ` Al Viro
  1 sibling, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2022-12-17  2:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, Al Viro, Damien Le Moal, Wei Chen, linux-ide,
	linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever,
	Jeff Layton, Peter Zijlstra

On Fri, Dec 16, 2022 at 08:31:54PM -0600, Linus Torvalds wrote:
> Ok, let's bring in Waiman for the rwlock side.
> 
> On Fri, Dec 16, 2022 at 5:54 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > Right, for a reader not in_interrupt(), it may be blocked by a random
> > waiting writer because of the fairness, even if the lock is currently held
> > by a reader:
> >
> >         CPU 1                   CPU 2           CPU 3
> >         read_lock(&tasklist_lock); // get the lock
> >
> >                                                 write_lock_irq(&tasklist_lock); // wait for the lock
> >
> >                                 read_lock(&tasklist_lock); // cannot get the lock because of the fairness
> 
> But this should be ok - because CPU1 can make progress and eventually
> release the lock.
> 

Yes.

> So the tasklist_lock use is fine on its own - the reason interrupts
> are special is because an interrupt on CPU 1 taking the lock for
> reading would deadlock otherwise. As long as it happens on another
> CPU, the original CPU should then be able to make progress.
> 
> But the problem here seems to be that *another* lock is also involved
> (in this case apparently "host->lock"), and now if CPU1 and CPU2 get
> these two locks in a different order, you can get an ABBA deadlock.
> 

Right.

> And apparently our lockdep machinery doesn't catch that issue, so it
> doesn't get flagged.
> 

I'm confused. Isn't the original problem showing that lockdep catches
this?

> I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
> treats rwlocks as being _always_ unfair, not knowing about that "it's
> only unfair when it's in interrupt context".
> 

The rules nowadays are:

*	If the reader is in_interrupt() or the queued-spinlock implementation
	is not used, it's an unfair reader, i.e. it won't wait for
	any existing writer.

*	Otherwise, it's a fair reader.

> Maybe we need to always make rwlock unfair? Possibly only for tasklist_lock?
> 

That's possible, but I need to make sure I understand the issue for
lockdep. Is it that lockdep misses catching something, or that it has a
false positive?

Regards,
Boqun

> Oh, how I hate tasklist_lock. It's pretty much our one remaining "one
> big lock". It's been a pain for a long long time.
> 
>             Linus


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-17  2:31               ` Linus Torvalds
  2022-12-17  2:59                 ` Boqun Feng
@ 2022-12-17  3:05                 ` Al Viro
  2022-12-17  4:41                   ` Waiman Long
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2022-12-17  3:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Boqun Feng, Waiman Long, Damien Le Moal, Wei Chen, linux-ide,
	linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever,
	Jeff Layton, Peter Zijlstra

On Fri, Dec 16, 2022 at 08:31:54PM -0600, Linus Torvalds wrote:
> Ok, let's bring in Waiman for the rwlock side.
> 
> On Fri, Dec 16, 2022 at 5:54 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > Right, for a reader not in_interrupt(), it may be blocked by a random
> > waiting writer because of the fairness, even if the lock is currently held
> > by a reader:
> >
> >         CPU 1                   CPU 2           CPU 3
> >         read_lock(&tasklist_lock); // get the lock
> >
> >                                                 write_lock_irq(&tasklist_lock); // wait for the lock
> >
> >                                 read_lock(&tasklist_lock); // cannot get the lock because of the fairness
> 
> But this should be ok - because CPU1 can make progress and eventually
> release the lock.
> 
> So the tasklist_lock use is fine on its own - the reason interrupts
> are special is because an interrupt on CPU 1 taking the lock for
> reading would deadlock otherwise. As long as it happens on another
> CPU, the original CPU should then be able to make progress.
> 
> But the problem here seems to be that *another* lock is also involved
> (in this case apparently "host->lock"), and now if CPU1 and CPU2 get
> these two locks in a different order, you can get an ABBA deadlock.
> 
> And apparently our lockdep machinery doesn't catch that issue, so it
> doesn't get flagged.

Lockdep has actually caught that; the locks involved are mentioned in the
report (https://marc.info/?l=linux-ide&m=167094379710177&w=2).  The form
of the report might have been better, but if anything, it doesn't mention
the potential involvement of a tasklist_lock writer, which is what turns
this into a deadlock.

OTOH, that's more or less implicit for the entire class:

read_lock(A)		[non-interrupt]
			local_irq_disable()	local_irq_disable()
			spin_lock(B)		write_lock(A)
			read_lock(A)
[in interrupt]
spin_lock(B)

is what that sort of report is about.  In this case A is tasklist_lock,
B is host->lock.  Possible call chains for CPU1 and CPU2 are reported...

I wonder why analogues of that hadn't been reported for other SCSI hosts -
it's a really common pattern there...

> I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
> treats rwlocks as being _always_ unfair, not knowing about that "it's
> only unfair when it's in interrupt context".
> 
> Maybe we need to always make rwlock unfair? Possibly only for tasklist_lock?

ISTR threads about the possibility of explicit read_lock_unfair()...


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-17  1:59               ` Al Viro
@ 2022-12-17  3:25                 ` Boqun Feng
  0 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2022-12-17  3:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Damien Le Moal, Wei Chen, linux-ide,
	linux-kernel, syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever,
	Jeff Layton, Peter Zijlstra

On Sat, Dec 17, 2022 at 01:59:32AM +0000, Al Viro wrote:
> On Fri, Dec 16, 2022 at 03:54:09PM -0800, Boqun Feng wrote:
> > On Fri, Dec 16, 2022 at 11:39:21PM +0000, Al Viro wrote:
> > > [Boqun Feng Cc'd]
> > > 
> > > On Fri, Dec 16, 2022 at 03:26:21AM -0800, Linus Torvalds wrote:
> > > > On Thu, Dec 15, 2022 at 7:41 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > >
> > > > > CPU1: ptrace(2)
> > > > >         ptrace_check_attach()
> > > > >                 read_lock(&tasklist_lock);
> > > > >
> > > > > CPU2: setpgid(2)
> > > > >         write_lock_irq(&tasklist_lock);
> > > > >         spins
> > > > >
> > > > > CPU1: takes an interrupt that would call kill_fasync().  grep and the
> > > > > first instance of kill_fasync() is in hpet_interrupt() - it's not
> > > > > something exotic.  IRQs disabled on CPU2 won't stop it.
> > > > >         kill_fasync(..., SIGIO, ...)
> > > > >                 kill_fasync_rcu()
> > > > >                         read_lock_irqsave(&fa->fa_lock, flags);
> > > > >                         send_sigio()
> > > > >                                 read_lock_irqsave(&fown->lock, flags);
> > > > >                                 read_lock(&tasklist_lock);
> > > > >
> > > > > ... and CPU1 spins as well.
> > > > 
> > > > Nope. See kernel/locking/qrwlock.c:
> > > 
> > > [snip rwlocks are inherently unfair, queued ones are somewhat milder, but
> > > all implementations have writers-starving behaviour for read_lock() at least
> > > when in_interrupt()]
> > > 
> > > D'oh...  Consider requested "Al, you are a moron" duly delivered...  I plead
> > > having been on way too low caffeine and too little sleep ;-/
> > > 
> > > Looking at the original report, looks like the scenario there is meant to be
> > > the following:
> > > 
> > > CPU1: read_lock(&tasklist_lock)
> > > 	tasklist_lock grabbed
> > > 
> > > CPU2: get an sg write(2) feeding request to libata; host->lock is taken,
> > > 	request is immediately completed and scsi_done() is about to be called.
> > > 	host->lock grabbed
> > > 
> > > CPU3: write_lock_irq(&tasklist_lock)
> > > 	spins on tasklist_lock until CPU1 gets through.
> > > 
> > > CPU2: get around to kill_fasync() called by sg_rq_end_io() and to grabbing
> > > 	tasklist_lock inside send_sigio()
> > > 	spins, since it's not in an interrupt and there's a pending writer
> > > 	host->lock is held, spin until CPU3 gets through.
> > 
> > Right, for a reader not in_interrupt(), it may be blocked by a random
> > waiting writer because of the fairness, even the lock is currently held
> > by a reader:
> > 
> > 	CPU 1			CPU 2		CPU 3
> > 	read_lock(&tasklist_lock); // get the lock
> > 
> > 						write_lock_irq(&tasklist_lock); // wait for the lock
> > 
> > 				read_lock(&tasklist_lock); // cannot get the lock because of the fairness
> 
> IOW, any caller of scsi_done() from non-interrupt context while
> holding a spinlock that is also taken in an interrupt...
> 
> And we have drivers/scsi/scsi_error.c:scsi_send_eh_cmnd(), which calls
> ->queuecommand() under a mutex, with
> #define DEF_SCSI_QCMD(func_name) \
>         int func_name(struct Scsi_Host *shost, struct scsi_cmnd *cmd)   \
>         {                                                               \
>                 unsigned long irq_flags;                                \
>                 int rc;                                                 \
>                 spin_lock_irqsave(shost->host_lock, irq_flags);         \
>                 rc = func_name##_lck(cmd);                              \
>                 spin_unlock_irqrestore(shost->host_lock, irq_flags);    \
>                 return rc;                                              \
>         }
> 
> being commonly used for ->queuecommand() instances.  So any scsi_done()
> in foo_lck() (quite a few of such) + use of ->host_lock in interrupt
> for the same driver (also common)...
> 
> I wonder why that hadn't triggered the same warning a long time
> ago - these warnings had been around for at least two years.
> 

FWIW, the complete dependency chain is:

	&host->lock --> &new->fa_lock --> &f->f_owner.lock --> tasklist_lock

for the "&f->f_owner.lock" part to show up on lockdep's radar, the
following call trace needs to appear at least once:

	kill_fasync():
	  kill_fasync_rcu():
	    send_sigio()

not sure whether it's rare or not though. And ->fa_lock also had its own
issue:

	https://lore.kernel.org/lkml/20210702091831.615042-1-desmondcheongzx@gmail.com/

which may have covered &host->lock for a while ;-)

Regards,
Boqun

> Am I missing something here?


* Re: possible deadlock in __ata_sff_interrupt
  2022-12-17  3:05                 ` Al Viro
@ 2022-12-17  4:41                   ` Waiman Long
  0 siblings, 0 replies; 14+ messages in thread
From: Waiman Long @ 2022-12-17  4:41 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Boqun Feng, Damien Le Moal, Wei Chen, linux-ide, linux-kernel,
	syzkaller-bugs, syzbot, linux-fsdevel, Chuck Lever, Jeff Layton,
	Peter Zijlstra

On 12/16/22 22:05, Al Viro wrote:
> On Fri, Dec 16, 2022 at 08:31:54PM -0600, Linus Torvalds wrote:
>> Ok, let's bring in Waiman for the rwlock side.
>>
>> On Fri, Dec 16, 2022 at 5:54 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>>> Right, for a reader not in_interrupt(), it may be blocked by a random
>>> waiting writer because of the fairness, even if the lock is currently held
>>> by a reader:
>>>
>>>          CPU 1                   CPU 2           CPU 3
>>>          read_lock(&tasklist_lock); // get the lock
>>>
>>>                                                  write_lock_irq(&tasklist_lock); // wait for the lock
>>>
>>>                                  read_lock(&tasklist_lock); // cannot get the lock because of the fairness
>> But this should be ok - because CPU1 can make progress and eventually
>> release the lock.
>>
>> So the tasklist_lock use is fine on its own - the reason interrupts
>> are special is because an interrupt on CPU 1 taking the lock for
>> reading would deadlock otherwise. As long as it happens on another
>> CPU, the original CPU should then be able to make progress.
>>
>> But the problem here seems to be thst *another* lock is also involved
>> (in this case apparently "host->lock", and now if CPU1 and CPU2 get
>> these two locks in a different order, you can get an ABBA deadlock.
>>
>> And apparently our lockdep machinery doesn't catch that issue, so it
>> doesn't get flagged.
> Lockdep has actually caught that; the locks involved are mentioned in the
> report (https://marc.info/?l=linux-ide&m=167094379710177&w=2).  The form
> of the report might have been better, but if anything, it doesn't mention
> the potential involvement of a tasklist_lock writer, which is what turns
> this into a deadlock.
>
> OTOH, that's more or less implicit for the entire class:
>
> read_lock(A)		[non-interrupt]
> 			local_irq_disable()	local_irq_disable()
> 			spin_lock(B)		write_lock(A)
> 			read_lock(A)
> [in interrupt]
> spin_lock(B)
>
> is what that sort of report is about.  In this case A is tasklist_lock,
> B is host->lock.  Possible call chains for CPU1 and CPU2 are reported...
>
> I wonder why analogues of that hadn't been reported for other SCSI hosts -
> it's a really common pattern there...
>
>> I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
>> treats rwlocks as being _always_ unfair, not knowing about that "it's
>> only unfair when it's in interrupt context".
>>
>> Maybe we need to always make rwlock unfair? Possibly only for tasklist_lock?
That may not be a good idea, as the cacheline bouncing problem will come 
back and reduce performance.
> ISTR threads about the possibility of explicit read_lock_unfair()...

Another possible alternative is to treat the read_lock as unfair if 
interrupts have been disabled, since I think we should keep the 
interrupt-disabled interval as short as possible.

Thoughts?

Cheers,
Longman



end of thread, other threads:[~2022-12-17  4:42 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-13 15:09 possible deadlock in __ata_sff_interrupt Wei Chen
2022-12-15  9:48 ` Damien Le Moal
2022-12-15 15:19   ` Al Viro
2022-12-16  1:44     ` Damien Le Moal
2022-12-16  3:41       ` Al Viro
2022-12-16 11:26         ` Linus Torvalds
2022-12-16 23:39           ` Al Viro
2022-12-16 23:54             ` Boqun Feng
2022-12-17  1:59               ` Al Viro
2022-12-17  3:25                 ` Boqun Feng
2022-12-17  2:31               ` Linus Torvalds
2022-12-17  2:59                 ` Boqun Feng
2022-12-17  3:05                 ` Al Viro
2022-12-17  4:41                   ` Waiman Long
