* Re: lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock)
  2020-11-01  9:26 lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock) Andrea Righi
@ 2020-10-31 10:17 ` Boqun Feng
  2020-11-02  7:33   ` Andrea Righi
  2020-11-01 16:28 ` Pavel Machek
  1 sibling, 1 reply; 8+ messages in thread
From: Boqun Feng @ 2020-10-31 10:17 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Pavel Machek, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

Hi Andrea,

On Sun, Nov 01, 2020 at 10:26:14AM +0100, Andrea Righi wrote:
> I'm getting the following lockdep splat (see below).
> 
> Apparently this warning started being reported after applying:
> 
>  e918188611f0 ("locking: More accurate annotations for read_lock()")
> 
> It looks like a false positive to me, but it made me think a bit and
> IIUC there can still be a potential deadlock, even if the deadlock
> scenario is a bit different from what lockdep is showing.
> 
> Under the assumption that read-locks are recursive only in
> in_interrupt() context (as stated in e918188611f0), the following
> scenario can still happen:
> 
>  CPU0                                     CPU1
>  ----                                     ----
>  read_lock(&trig->leddev_list_lock);
>                                           write_lock(&trig->leddev_list_lock);
>  <soft-irq>
>  kbd_bh()
>    -> read_lock(&trig->leddev_list_lock);
> 
>  *** DEADLOCK ***
> 
> The write-lock is waiting on CPU1 and the second read_lock() on CPU0
> would be blocked by the write-lock *waiter* on CPU1 => deadlock.
> 

No, this is not a deadlock: a write-lock waiter only blocks
*non-recursive* readers. Since the read_lock() in kbd_bh() is called
in soft-irq context (where in_interrupt() returns true), it is a
recursive reader and won't be blocked by the write-lock waiter.
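
To make the distinction concrete, here is a minimal sketch (hypothetical
code, only to illustrate the two reader flavours; "trig" stands in for
any structure embedding the rwlock):

	/* Process-context reader: non-recursive.  A write-lock waiter
	 * queued on another CPU blocks this read_lock() (qrwlock queues
	 * it behind the writer to keep writers from starving). */
	static void leddev_list_walk_task(struct led_trigger *trig)
	{
		read_lock(&trig->leddev_list_lock);
		/* ... walk the trigger's device list ... */
		read_unlock(&trig->leddev_list_lock);
	}

	/* Interrupt-context reader: recursive.  in_interrupt() is true
	 * here, so this read_lock() is granted even while a writer is
	 * waiting, which is why the kbd_bh() scenario above cannot
	 * deadlock. */
	static void leddev_list_walk_softirq(struct led_trigger *trig)
	{
		read_lock(&trig->leddev_list_lock);
		/* ... */
		read_unlock(&trig->leddev_list_lock);
	}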

> In that case we could prevent this deadlock condition by using a
> workqueue to call kbd_propagate_led_state() instead of calling it
> directly from kbd_bh() (even if lockdep would still report the false
> positive).
> 

The deadlock scenario reported by the splat is:

	CPU 0:				CPU 1:					CPU 2:
	-----				-----					-----
	led_trigger_event():
	  read_lock(&trig->leddev_list_lock);
					<work queue processing>
	  				ata_hsm_qc_complete():
					  spin_lock_irqsave(&host->lock);
					  					write_lock(&trig->leddev_list_lock);
					  ata_port_freeze():
					    ata_do_link_abort():
					      ata_qc_complete():
					        ledtrig_disk_activity():
						  led_trigger_blink_oneshot():
						    read_lock(&trig->leddev_list_lock);
						    // ^ not in in_interrupt() context, so could get blocked by CPU 2
	<interrupt>
	  ata_bmdma_interrupt():
	    spin_lock_irqsave(&host->lock);
	  
Here CPU 0 is blocked by CPU 1 because of the spin_lock_irqsave() in
ata_bmdma_interrupt(), CPU 1 is blocked by CPU 2 because of the
read_lock() in led_trigger_blink_oneshot(), and CPU 2 is blocked by
CPU 0 because of an arbitrary writer on &trig->leddev_list_lock.

So I don't think it's a false positive, but I might be missing something
obvious, because I don't know what the code here actually does ;-)

Regards,
Boqun

> Can you help me to understand if this assumption is correct or if I'm
> missing something?
> 
> Thanks,
> -Andrea
> 
> Lockdep trace: [...]


* lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock)
@ 2020-11-01  9:26 Andrea Righi
  2020-10-31 10:17 ` Boqun Feng
  2020-11-01 16:28 ` Pavel Machek
  0 siblings, 2 replies; 8+ messages in thread
From: Andrea Righi @ 2020-11-01  9:26 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Pavel Machek, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

I'm getting the following lockdep splat (see below).

Apparently this warning started being reported after applying:

 e918188611f0 ("locking: More accurate annotations for read_lock()")

It looks like a false positive to me, but it made me think a bit and
IIUC there can still be a potential deadlock, even if the deadlock
scenario is a bit different from what lockdep is showing.

Under the assumption that read-locks are recursive only in
in_interrupt() context (as stated in e918188611f0), the following
scenario can still happen:

 CPU0                                     CPU1
 ----                                     ----
 read_lock(&trig->leddev_list_lock);
                                          write_lock(&trig->leddev_list_lock);
 <soft-irq>
 kbd_bh()
   -> read_lock(&trig->leddev_list_lock);

 *** DEADLOCK ***

The write-lock is waiting on CPU1 and the second read_lock() on CPU0
would be blocked by the write-lock *waiter* on CPU1 => deadlock.

In that case we could prevent this deadlock condition by using a
workqueue to call kbd_propagate_led_state() instead of calling it
directly from kbd_bh() (even if lockdep would still report the false
positive).
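
Something along these lines (just a sketch; the helper names are made
up and the real kbd_propagate_led_state() arguments are elided):

	/* Hypothetical deferral: the LED update runs in process
	 * context instead of the keyboard tasklet, so the read_lock()
	 * it takes is never the nested in-softirq acquisition from
	 * the scenario above. */
	static void kbd_led_update_fn(struct work_struct *work)
	{
		kbd_propagate_led_state(/* ... */);
	}
	static DECLARE_WORK(kbd_led_work, kbd_led_update_fn);

	/* kbd_bh() would then merely do: */
	schedule_work(&kbd_led_work);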

Can you help me to understand if this assumption is correct or if I'm
missing something?

Thanks,
-Andrea

Lockdep trace:

[    1.087260] WARNING: possible irq lock inversion dependency detected
[    1.087267] 5.10.0-rc1+ #18 Not tainted
[    1.088829] softirqs last  enabled at (0): [<ffffffff8108ea17>] copy_process+0x6c7/0x1c70
[    1.089662] --------------------------------------------------------
[    1.090284] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    1.092766] swapper/3/0 just changed the state of lock:
[    1.093325] ffff888006394c18 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200
[    1.094190] but this lock took another, HARDIRQ-READ-unsafe lock in the past:
[    1.094944]  (&trig->leddev_list_lock){.+.?}-{2:2}
[    1.094946] 
[    1.094946] 
[    1.094946] and interrupts could create inverse lock ordering between them.
[    1.094946] 
[    1.096600] 
[    1.096600] other info that might help us debug this:
[    1.097250]  Possible interrupt unsafe locking scenario:
[    1.097250] 
[    1.097940]        CPU0                    CPU1
[    1.098401]        ----                    ----
[    1.098873]   lock(&trig->leddev_list_lock);
[    1.099315]                                local_irq_disable();
[    1.099932]                                lock(&host->lock);
[    1.100527]                                lock(&trig->leddev_list_lock);
[    1.101219]   <Interrupt>
[    1.101490]     lock(&host->lock);
[    1.101844] 
[    1.101844]  *** DEADLOCK ***
[    1.101844] 
[    1.102447] no locks held by swapper/3/0.
[    1.102858] 
[    1.102858] the shortest dependencies between 2nd lock and 1st lock:
[    1.103646]  -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 {
[    1.104248]     HARDIRQ-ON-R at:
[    1.104600]                       lock_acquire+0xec/0x430
[    1.105120]                       _raw_read_lock+0x42/0x90
[    1.105839]                       led_trigger_event+0x2b/0x70
[    1.106348]                       rfkill_global_led_trigger_worker+0x94/0xb0
[    1.106970]                       process_one_work+0x240/0x560
[    1.107498]                       worker_thread+0x58/0x3d0
[    1.107984]                       kthread+0x151/0x170
[    1.108447]                       ret_from_fork+0x1f/0x30
[    1.108924]     IN-SOFTIRQ-R at:
[    1.109227]                       lock_acquire+0xec/0x430
[    1.109820]                       _raw_read_lock+0x42/0x90
[    1.110404]                       led_trigger_event+0x2b/0x70
[    1.111051]                       kbd_bh+0x9e/0xc0
[    1.111558]                       tasklet_action_common.constprop.0+0xe9/0x100
[    1.112265]                       tasklet_action+0x22/0x30
[    1.112917]                       __do_softirq+0xcc/0x46d
[    1.113474]                       run_ksoftirqd+0x3f/0x70
[    1.114033]                       smpboot_thread_fn+0x116/0x1f0
[    1.114597]                       kthread+0x151/0x170
[    1.115118]                       ret_from_fork+0x1f/0x30
[    1.115674]     SOFTIRQ-ON-R at:
[    1.115987]                       lock_acquire+0xec/0x430
[    1.116468]                       _raw_read_lock+0x42/0x90
[    1.116949]                       led_trigger_event+0x2b/0x70
[    1.117454]                       rfkill_global_led_trigger_worker+0x94/0xb0
[    1.118070]                       process_one_work+0x240/0x560
[    1.118659]                       worker_thread+0x58/0x3d0
[    1.119225]                       kthread+0x151/0x170
[    1.119740]                       ret_from_fork+0x1f/0x30
[    1.120294]     INITIAL READ USE at:
[    1.120639]                           lock_acquire+0xec/0x430
[    1.121141]                           _raw_read_lock+0x42/0x90
[    1.121649]                           led_trigger_event+0x2b/0x70
[    1.122177]                           rfkill_global_led_trigger_worker+0x94/0xb0
[    1.122841]                           process_one_work+0x240/0x560
[    1.123375]                           worker_thread+0x58/0x3d0
[    1.123880]                           kthread+0x151/0x170
[    1.124354]                           ret_from_fork+0x1f/0x30
[    1.124853]   }
[    1.125016]   ... key      at: [<ffffffff83da4c00>] __key.0+0x0/0x10
[    1.125564]   ... acquired at:
[    1.125836]    _raw_read_lock+0x42/0x90
[    1.126176]    led_trigger_blink_oneshot+0x3b/0x90
[    1.126596]    ledtrig_disk_activity+0x3c/0xa0
[    1.126985]    ata_qc_complete+0x26/0x450
[    1.127347]    ata_do_link_abort+0xa3/0xe0
[    1.127705]    ata_port_freeze+0x2e/0x40
[    1.128050]    ata_hsm_qc_complete+0x94/0xa0
[    1.128431]    ata_sff_hsm_move+0x177/0x7a0
[    1.128796]    ata_sff_pio_task+0xc7/0x1b0
[    1.129154]    process_one_work+0x240/0x560
[    1.129519]    worker_thread+0x58/0x3d0
[    1.129857]    kthread+0x151/0x170
[    1.130156]    ret_from_fork+0x1f/0x30
[    1.130484] 
[    1.130622] -> (&host->lock){-...}-{2:2} ops: 69 {
[    1.131043]    IN-HARDIRQ-W at:
[    1.131322]                     lock_acquire+0xec/0x430
[    1.131777]                     _raw_spin_lock_irqsave+0x52/0xa0
[    1.132303]                     ata_bmdma_interrupt+0x27/0x200
[    1.132807]                     __handle_irq_event_percpu+0xd5/0x2b0
[    1.133359]                     handle_irq_event+0x57/0xb0
[    1.133837]                     handle_edge_irq+0x8c/0x230
[    1.134314]                     asm_call_irq_on_stack+0xf/0x20
[    1.134819]                     common_interrupt+0x100/0x1c0
[    1.135328]                     asm_common_interrupt+0x1e/0x40
[    1.135879]                     native_safe_halt+0xe/0x10
[    1.136539]                     arch_cpu_idle+0x15/0x20
[    1.137090]                     default_idle_call+0x59/0x1c0
[    1.137584]                     do_idle+0x22c/0x2c0
[    1.138013]                     cpu_startup_entry+0x20/0x30
[    1.138507]                     start_secondary+0x11d/0x150
[    1.138992]                     secondary_startup_64_no_verify+0xa6/0xab
[    1.139572]    INITIAL USE at:
[    1.139843]                    lock_acquire+0xec/0x430
[    1.140303]                    _raw_spin_lock_irqsave+0x52/0xa0
[    1.140865]                    ata_dev_init+0x54/0xe0
[    1.141376]                    ata_link_init+0x8b/0xd0
[    1.141857]                    ata_port_alloc+0x1f1/0x210
[    1.142417]                    ata_host_alloc+0xf1/0x130
[    1.142932]                    ata_host_alloc_pinfo+0x14/0xb0
[    1.143453]                    ata_pci_sff_prepare_host+0x41/0xa0
[    1.144031]                    ata_pci_bmdma_prepare_host+0x14/0x30
[    1.144591]                    piix_init_one+0x21f/0x600
[    1.145063]                    local_pci_probe+0x48/0x80
[    1.145524]                    pci_device_probe+0x105/0x1c0
[    1.146006]                    really_probe+0x221/0x490
[    1.146459]                    driver_probe_device+0xe9/0x160
[    1.146953]                    device_driver_attach+0xb2/0xc0
[    1.147449]                    __driver_attach+0x91/0x150
[    1.147916]                    bus_for_each_dev+0x81/0xc0
[    1.148392]                    driver_attach+0x1e/0x20
[    1.148893]                    bus_add_driver+0x138/0x1f0
[    1.149447]                    driver_register+0x91/0xf0
[    1.149964]                    __pci_register_driver+0x73/0x80
[    1.150522]                    piix_init+0x1e/0x2e
[    1.150999]                    do_one_initcall+0x5f/0x2d0
[    1.151549]                    kernel_init_freeable+0x26f/0x2cf
[    1.152127]                    kernel_init+0xe/0x113
[    1.152638]                    ret_from_fork+0x1f/0x30
[    1.153097]  }
[    1.153262]  ... key      at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10
[    1.153890]  ... acquired at:
[    1.154204]    __lock_acquire+0x9da/0x2370
[    1.154588]    lock_acquire+0xec/0x430
[    1.154973]    _raw_spin_lock_irqsave+0x52/0xa0
[    1.155418]    ata_bmdma_interrupt+0x27/0x200
[    1.155857]    __handle_irq_event_percpu+0xd5/0x2b0
[    1.156312]    handle_irq_event+0x57/0xb0
[    1.156701]    handle_edge_irq+0x8c/0x230
[    1.157087]    asm_call_irq_on_stack+0xf/0x20
[    1.157524]    common_interrupt+0x100/0x1c0
[    1.157914]    asm_common_interrupt+0x1e/0x40
[    1.158298]    native_safe_halt+0xe/0x10
[    1.158679]    arch_cpu_idle+0x15/0x20
[    1.159051]    default_idle_call+0x59/0x1c0
[    1.159427]    do_idle+0x22c/0x2c0
[    1.159768]    cpu_startup_entry+0x20/0x30
[    1.160154]    start_secondary+0x11d/0x150
[    1.160551]    secondary_startup_64_no_verify+0xa6/0xab



* Re: lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock)
  2020-11-01  9:26 lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock) Andrea Righi
  2020-10-31 10:17 ` Boqun Feng
@ 2020-11-01 16:28 ` Pavel Machek
  2020-11-02  7:39   ` Andrea Righi
  1 sibling, 1 reply; 8+ messages in thread
From: Pavel Machek @ 2020-11-01 16:28 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Boqun Feng, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

Hi!

> I'm getting the following lockdep splat (see below).
> 
> Apparently this warning started being reported after applying:
> 
>  e918188611f0 ("locking: More accurate annotations for read_lock()")
> 
> It looks like a false positive to me, but it made me think a bit and
> IIUC there can still be a potential deadlock, even if the deadlock
> scenario is a bit different from what lockdep is showing.
> 
> Under the assumption that read-locks are recursive only in
> in_interrupt() context (as stated in e918188611f0), the following
> scenario can still happen:
> 
>  CPU0                                     CPU1
>  ----                                     ----
>  read_lock(&trig->leddev_list_lock);
>                                           write_lock(&trig->leddev_list_lock);
>  <soft-irq>
>  kbd_bh()
>    -> read_lock(&trig->leddev_list_lock);
> 
>  *** DEADLOCK ***
> 
> The write-lock is waiting on CPU1 and the second read_lock() on CPU0
> would be blocked by the write-lock *waiter* on CPU1 => deadlock.
> 
> In that case we could prevent this deadlock condition by using a
> workqueue to call kbd_propagate_led_state() instead of calling it
> directly from kbd_bh() (even if lockdep would still report the false
> positive).

console.c is already using a bh to delay work from interrupt
context. But... that should not be necessary: led_trigger_event()
should already be callable from interrupt context, AFAICT.

Could this be resolved by doing the operations directly from the
keyboard interrupt?

Best regards,
								Pavel
-- 
HTTP://www.livejournal.com/~pavelmachek


* Re: lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock)
  2020-10-31 10:17 ` Boqun Feng
@ 2020-11-02  7:33   ` Andrea Righi
  2020-11-02  8:56     ` Pavel Machek
  0 siblings, 1 reply; 8+ messages in thread
From: Andrea Righi @ 2020-11-02  7:33 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Pavel Machek, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On Sat, Oct 31, 2020 at 06:17:40PM +0800, Boqun Feng wrote:
> Hi Andrea,
> 
> On Sun, Nov 01, 2020 at 10:26:14AM +0100, Andrea Righi wrote:
> > [...]
> 
> No, this is not a deadlock: a write-lock waiter only blocks
> *non-recursive* readers. Since the read_lock() in kbd_bh() is called
> in soft-irq context (where in_interrupt() returns true), it is a
> recursive reader and won't be blocked by the write-lock waiter.

That's right, I was missing that in_interrupt() also returns true in
soft-irq context.
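
For reference, in_interrupt() is built from the preempt count and covers
hard-irq, soft-irq and NMI contexts; roughly (quoting
include/linux/preempt.h from memory, details may vary by version):

	#define irq_count()	(preempt_count() & \
				 (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK))
	#define in_interrupt()	(irq_count())

so a tasklet like kbd_bh() does count as interrupt context.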

> 
> > In that case we could prevent this deadlock condition by using a
> > workqueue to call kbd_propagate_led_state() instead of calling it
> > directly from kbd_bh() (even if lockdep would still report the false
> > positive).
> > 
> 
> The deadlock scenario reported by the splat is:
> 
> 	CPU 0:				CPU 1:					CPU 2:
> 	-----				-----					-----
> 	led_trigger_event():
> 	  read_lock(&trig->leddev_list_lock);
> 					<work queue processing>
> 	  				ata_hsm_qc_complete():
> 					  spin_lock_irqsave(&host->lock);
> 					  					write_lock(&trig->leddev_list_lock);
> 					  ata_port_freeze():
> 					    ata_do_link_abort():
> 					      ata_qc_complete():
> 					        ledtrig_disk_activity():
> 						  led_trigger_blink_oneshot():
> 						    read_lock(&trig->leddev_list_lock);
> 						    // ^ not in in_interrupt() context, so could get blocked by CPU 2
> 	<interrupt>
> 	  ata_bmdma_interrupt():
> 	    spin_lock_irqsave(&host->lock);
> 	  
> Here CPU 0 is blocked by CPU 1 because of the spin_lock_irqsave() in
> ata_bmdma_interrupt(), CPU 1 is blocked by CPU 2 because of the
> read_lock() in led_trigger_blink_oneshot(), and CPU 2 is blocked by
> CPU 0 because of an arbitrary writer on &trig->leddev_list_lock.
> 
> So I don't think it's a false positive, but I might be missing something
> obvious, because I don't know what the code here actually does ;-)

With the CPU2 part it all makes sense now and lockdep was right. :)

At this point I think we could just schedule a separate work to do the
led trigger and avoid calling it with host->lock held and that should
prevent the deadlock. I'll send a patch to do that.

Thanks tons for your detailed explanation!

-Andrea


* Re: lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock)
  2020-11-01 16:28 ` Pavel Machek
@ 2020-11-02  7:39   ` Andrea Righi
  0 siblings, 0 replies; 8+ messages in thread
From: Andrea Righi @ 2020-11-02  7:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Boqun Feng, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On Sun, Nov 01, 2020 at 05:28:38PM +0100, Pavel Machek wrote:
> Hi!
> 
> > I'm getting the following lockdep splat (see below).
> > 
> > Apparently this warning starts to be reported after applying:
> > 
> >  e918188611f0 ("locking: More accurate annotations for read_lock()")
> > 
> > It looks like a false positive to me, but it made me think a bit and
> > IIUC there can be still a potential deadlock, even if the deadlock
> > scenario is a bit different than what lockdep is showing.
> > 
> > In the assumption that read-locks are recursive only in_interrupt()
> > context (as stated in e918188611f0), the following scenario can still
> > happen:
> > 
> >  CPU0                                     CPU1
> >  ----                                     ----
> >  read_lock(&trig->leddev_list_lock);
> >                                           write_lock(&trig->leddev_list_lock);
> >  <soft-irq>
> >  kbd_bh()
> >    -> read_lock(&trig->leddev_list_lock);
> > 
> >  *** DEADLOCK ***
> > 
> > The write-lock is waiting on CPU1 and the second read_lock() on CPU0
> > would be blocked by the write-lock *waiter* on CPU1 => deadlock.
> > 
> > In that case we could prevent this deadlock condition using a workqueue
> > to call kbd_propagate_led_state() instead of calling it directly from
> > kbd_bh() (even if lockdep would still report the false positive).
> 
> console.c is already using bh to delay work from
> interrupt. But... that should not be neccessary. led_trigger_event
> should already be callable from interrupt context, AFAICT.
> 
> Could this be resolved by doing the operations directly from keyboard
> interrupt?

As pointed out by Boqun, this is not a deadlock condition, because the
read_lock() called from soft-irq context is recursive (I was missing
that in_interrupt() also returns true in soft-irq context).

But the initial lockdep warning was correct, so there is still a
potential deadlock condition between trig->leddev_list_lock and
host->lock. And I think this can be prevented simply by scheduling the
led triggering part in a separate work item from ata_hsm_qc_complete(), so
that led_trigger_event() won't be called with host->lock held. I'll send
a patch soon to do that.

-Andrea


* Re: lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock)
  2020-11-02  7:33   ` Andrea Righi
@ 2020-11-02  8:56     ` Pavel Machek
  2020-11-02  9:09       ` Andrea Righi
  0 siblings, 1 reply; 8+ messages in thread
From: Pavel Machek @ 2020-11-02  8:56 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Boqun Feng, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

Hi!

> > [...]
> 
> With the CPU2 part it all makes sense now and lockdep was right. :)
> 
> At this point I think we could just schedule a separate work to do the
> led trigger and avoid calling it with host->lock held and that should
> prevent the deadlock. I'll send a patch to do that.

Let's... not do that, unless we have no choice.

Would it help if leddev_list_lock used _irqsave() locking?

Best regards,
								Pavel
-- 
http://www.livejournal.com/~pavelmachek


* Re: lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock)
  2020-11-02  8:56     ` Pavel Machek
@ 2020-11-02  9:09       ` Andrea Righi
  2020-11-06  7:40         ` Andrea Righi
  0 siblings, 1 reply; 8+ messages in thread
From: Andrea Righi @ 2020-11-02  9:09 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Boqun Feng, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On Mon, Nov 02, 2020 at 09:56:58AM +0100, Pavel Machek wrote:
> Hi!
> 
> > [...]
> > 
> > With the CPU2 part it all makes sense now and lockdep was right. :)
> > 
> > At this point I think we could just schedule a separate work to do the
> > led trigger and avoid calling it with host->lock held and that should
> > prevent the deadlock. I'll send a patch to do that.
> 
> Let's... not do that, unless we have no choice.
> 
> Would it help if leddev_list_lock used _irqsave() locking?

Using read_lock_irqsave/irqrestore() in led_trigger_event() would be
enough to prevent the deadlock. If it's an acceptable solution I can
send a patch (already tested it and lockdep doesn't complain :)).
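
Concretely, the change would look roughly like this (sketched from
memory of led_trigger_event() in drivers/leds/led-triggers.c; the list
and field names may differ slightly from the real code):

	void led_trigger_event(struct led_trigger *trig,
			       enum led_brightness brightness)
	{
		struct led_classdev *led_cdev;
		unsigned long flags;

		if (!trig)
			return;

		/* Taking the reader side with interrupts disabled means
		 * no IRQ can take host->lock on this CPU while
		 * leddev_list_lock is held, which breaks the inversion. */
		read_lock_irqsave(&trig->leddev_list_lock, flags);
		list_for_each_entry(led_cdev, &trig->led_cdevs, trig_list)
			led_set_brightness(led_cdev, brightness);
		read_unlock_irqrestore(&trig->leddev_list_lock, flags);
	}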

Thanks,
-Andrea


* Re: lockdep: possible irq lock inversion dependency detected (trig->leddev_list_lock)
  2020-11-02  9:09       ` Andrea Righi
@ 2020-11-06  7:40         ` Andrea Righi
  0 siblings, 0 replies; 8+ messages in thread
From: Andrea Righi @ 2020-11-06  7:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Boqun Feng, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On Mon, Nov 02, 2020 at 10:09:28AM +0100, Andrea Righi wrote:
> On Mon, Nov 02, 2020 at 09:56:58AM +0100, Pavel Machek wrote:
> > Hi!
> > 
> > > [...]
> > 
> > Let's... not do that, unless we have no choice.
> > 
> > Would it help if leddev_list_lock used _irqsave() locking?
> 
> Using read_lock_irqsave/irqrestore() in led_trigger_event() would be
> enough to prevent the deadlock. If it's an acceptable solution I can
> send a patch (already tested it and lockdep doesn't complain :)).

Any comment on
https://lore.kernel.org/lkml/20201102104152.GG9930@xps-13-7390/?

Thanks,
-Andrea

