linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
@ 2010-08-16 18:42 Stefan Richter
  2010-08-16 21:42 ` Peter Zijlstra
  2010-08-17 14:35 ` Yong Zhang
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Richter @ 2010-08-16 18:42 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel

Hi, I just saw the following (twice, with a reboot in between) with
kernel 2.6.35 plus firewire driver patched to their state in 2.6.36-rc1,
i.e. basically with a kernel according to commit e78483c5.

I suspect it might also happen without firewire patches, as 2.6.35
already contains the change where firewire-core's per-transaction timer
was introduced which shows up in the following log.

It happened while I was testing hotplugging of FireWire disks with
several ones on a bus together with several FireWire 800 repeaters,
which sometimes act up so that reconnect does not succeed immediately.
Perhaps such reconnection troubles are necessary to provoke this lockdep
report.

[I only accidentally discovered this on a test PC which do not use very
frequently.  Unfortunately, I had lockdep disabled on my main PC on
which I test most of my FireWire hardware for quite a while now.  I
guess I should enable lockdep there again, but I have a suspicion that I
won't leave it on for too long.]

Note how the IN-HARDIRQ-W talks about a SCSI host lock of a libata host,
whereas the locking in the firewire stack surely does only interact
(hopefully correctly) with SCSI host locks of firewire-sbp2 host
instances.

What does this all mean?  How to act on it?

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.35 #1
---------------------------------------------------------
swapper/0 just changed the state of lock:
 (&t->split_timeout_timer){+.-...}, at: [<c1032228>] run_timer_softirq+0x112/0x21c
but this lock was taken by another, HARDIRQ-safe lock in the past:
 (&(shost->host_lock)->rlock){-.-...}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
no locks held by swapper/0.

the shortest dependencies between 2nd lock and 1st lock:
 -> (&(shost->host_lock)->rlock){-.-...} ops: 123308 {
    IN-HARDIRQ-W at:
                          [<c104a080>] __lock_acquire+0x612/0x153e
                          [<c104b008>] lock_acquire+0x5c/0x73
                          [<c1268b57>] _raw_spin_lock_irqsave+0x2e/0x3e
                          [<c11ad508>] scsi_eh_scmd_add+0x25/0x8a
                          [<c11ad5b7>] scsi_times_out+0x4a/0x64
                          [<c10e9fca>] blk_rq_timed_out+0xf/0x4b
                          [<c10ea044>] blk_abort_request+0x3e/0x41
                          [<c11c43c0>] ata_qc_schedule_eh+0x55/0x67
                          [<c11b9c4e>] ata_qc_complete+0x59/0x1b6
                          [<c11c6751>] ata_hsm_qc_complete+0xb5/0xcf
                          [<c11c6e18>] ata_sff_hsm_move+0x6ad/0x6fa
                          [<c11c7002>] __ata_sff_port_intr+0x9f/0xad
                          [<c11c70c9>] ata_bmdma_port_intr+0xb9/0x127
                          [<c11c71c3>] ata_bmdma_interrupt+0x8c/0x170
                          [<c1059b58>] handle_IRQ_event+0x1d/0xa2
                          [<c105b45f>] handle_fasteoi_irq+0x79/0xb2
                          [<c10045bb>] handle_irq+0x3b/0x48
                          [<c1003df9>] do_IRQ+0x45/0x9f
                          [<c1002d2e>] common_interrupt+0x2e/0x34
                          [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
                          [<c1001a72>] cpu_idle+0x49/0x77
                          [<c12594f7>] rest_init+0xbf/0xc4
                          [<c13c98c9>] start_kernel+0x306/0x30b
                          [<c13c90b7>] i386_start_kernel+0xb7/0xbf
    IN-SOFTIRQ-W at:
                          [<c104a0a1>] __lock_acquire+0x633/0x153e
                          [<c104b008>] lock_acquire+0x5c/0x73
                          [<c1268b57>] _raw_spin_lock_irqsave+0x2e/0x3e
                          [<c11aec72>] scsi_device_unbusy+0x1e/0x8d
                          [<c11a92fa>] scsi_finish_command+0x22/0xdf
                          [<c11af291>] scsi_softirq_done+0xf4/0xfc
                          [<c10e9d9c>] blk_done_softirq+0x5c/0x69
                          [<c102defb>] __do_softirq+0x8b/0x10a
                          [<c102dfa5>] do_softirq+0x2b/0x43
                          [<c102e031>] run_ksoftirqd+0x74/0x15a
                          [<c103ae19>] kthread+0x61/0x66
                          [<c1002d3a>] kernel_thread_helper+0x6/0x1a
    INITIAL USE at:
                         [<c104a17b>] __lock_acquire+0x70d/0x153e
                         [<c104b008>] lock_acquire+0x5c/0x73
                         [<c1268b57>] _raw_spin_lock_irqsave+0x2e/0x3e
                         [<c11ad5e0>] scsi_schedule_eh+0xf/0x4f
                         [<c11c4125>] ata_port_schedule_eh+0x37/0x3a
                         [<c11bd629>] async_port_probe+0x65/0x9c
                         [<c1040051>] async_thread+0xe7/0x1c8
                         [<c103ae19>] kthread+0x61/0x66
                         [<c1002d3a>] kernel_thread_helper+0x6/0x1a
  }
  ... key      at: [<c19b0ab4>] __key.30680+0x0/0x8
  ... acquired at:
   [<c104b008>] lock_acquire+0x5c/0x73
   [<c1032a3f>] del_timer_sync+0x29/0x6c
   [<f84a6ea0>] close_transaction+0x93/0xb4 [firewire_core]
   [<f84a6f00>] transmit_complete_callback+0x3f/0x43 [firewire_core]
   [<f8da60ff>] at_context_transmit+0x59b/0x5a3 [firewire_ohci]
   [<f8da6123>] ohci_send_request+0xd/0xf [firewire_ohci]
   [<f84a7ce4>] fw_send_request+0x21c/0x224 [firewire_core]
   [<f8dab307>] 0xf8dab307
   [<f8dac075>] 0xf8dac075
   [<c11a962a>] scsi_dispatch_cmd+0x182/0x1f2
   [<c11ae4ed>] scsi_request_fn+0x359/0x492
   [<c10e644f>] __generic_unplug_device+0x26/0x29
   [<c10e9306>] blk_execute_rq_nowait+0x56/0x75
   [<c10e93cd>] blk_execute_rq+0xa8/0xc7
   [<c11af359>] scsi_execute+0xc0/0x105
   [<c11af3f3>] scsi_execute_req+0x55/0x7d
   [<c11b3d9a>] sd_start_stop_device+0x7b/0x117
   [<c11b41a7>] sd_shutdown+0x108/0x117
   [<c11b42d0>] sd_remove+0x42/0x6f
   [<c11975af>] __device_release_driver+0x5e/0x96
   [<c119768d>] device_release_driver+0x1a/0x25
   [<c1196dcc>] bus_remove_device+0x7b/0x92
   [<c119592a>] device_del+0xfa/0x154
   [<c11b21ae>] __scsi_remove_device+0x3d/0x7b
   [<c11b2208>] scsi_remove_device+0x1c/0x27
   [<f8dab83e>] 0xf8dab83e
   [<c10f2b89>] kref_put+0x39/0x44
   [<f8dab0ff>] 0xf8dab0ff
   [<c11975af>] __device_release_driver+0x5e/0x96
   [<c119768d>] device_release_driver+0x1a/0x25
   [<c1196dcc>] bus_remove_device+0x7b/0x92
   [<c119592a>] device_del+0xfa/0x154
   [<c119598f>] device_unregister+0xb/0x15
   [<f84a4eb8>] shutdown_unit+0x8/0xc [firewire_core]
   [<c1195270>] device_for_each_child+0x2a/0x54
   [<f84a4f2e>] fw_device_shutdown+0x72/0xa4 [firewire_core]
   [<c1037f34>] worker_thread+0x16a/0x242
   [<c103ae19>] kthread+0x61/0x66
   [<c1002d3a>] kernel_thread_helper+0x6/0x1a

-> (&t->split_timeout_timer){+.-...} ops: 14281 {
   HARDIRQ-ON-W at:
                        [<c104a100>] __lock_acquire+0x692/0x153e
                        [<c104b008>] lock_acquire+0x5c/0x73
                        [<c1032291>] run_timer_softirq+0x17b/0x21c
                        [<c102defb>] __do_softirq+0x8b/0x10a
                        [<c102dfa5>] do_softirq+0x2b/0x43
                        [<c102e14f>] irq_exit+0x38/0x74
                        [<c1003e3d>] do_IRQ+0x89/0x9f
                        [<c1002d2e>] common_interrupt+0x2e/0x34
                        [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
                        [<c1001a72>] cpu_idle+0x49/0x77
                        [<c12594f7>] rest_init+0xbf/0xc4
                        [<c13c98c9>] start_kernel+0x306/0x30b
                        [<c13c90b7>] i386_start_kernel+0xb7/0xbf
   IN-SOFTIRQ-W at:
                        [<c104a0a1>] __lock_acquire+0x633/0x153e
                        [<c104b008>] lock_acquire+0x5c/0x73
                        [<c1032a3f>] del_timer_sync+0x29/0x6c
                        [<f84a6ddc>] fw_core_handle_response+0x108/0x139 [firewire_core]
                        [<f8da5b58>] handle_ar_packet+0x108/0x114 [firewire_ohci]
                        [<f8da6236>] ar_context_tasklet+0x111/0x11f [firewire_ohci]
                        [<c102da6d>] tasklet_action+0x8d/0xe0
                        [<c102defb>] __do_softirq+0x8b/0x10a
                        [<c102dfa5>] do_softirq+0x2b/0x43
                        [<c102e14f>] irq_exit+0x38/0x74
                        [<c1003e3d>] do_IRQ+0x89/0x9f
                        [<c1002d2e>] common_interrupt+0x2e/0x34
                        [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
                        [<c1001a72>] cpu_idle+0x49/0x77
                        [<c12594f7>] rest_init+0xbf/0xc4
                        [<c13c98c9>] start_kernel+0x306/0x30b
                        [<c13c90b7>] i386_start_kernel+0xb7/0xbf
   INITIAL USE at:
                       [<c104a17b>] __lock_acquire+0x70d/0x153e
                       [<c104b008>] lock_acquire+0x5c/0x73
                       [<c1032a3f>] del_timer_sync+0x29/0x6c
                       [<f84a6ddc>] fw_core_handle_response+0x108/0x139 [firewire_core]
                       [<f8da5de2>] at_context_transmit+0x27e/0x5a3 [firewire_ohci]
                       [<f8da6123>] ohci_send_request+0xd/0xf [firewire_ohci]
                       [<f84a7ce4>] fw_send_request+0x21c/0x224 [firewire_core]
                       [<f84a7d71>] fw_run_transaction+0x85/0xc5 [firewire_core]
                       [<f84a2485>] bm_work+0x16c/0x34b [firewire_core]
                       [<c1037f34>] worker_thread+0x16a/0x242
                       [<c103ae19>] kthread+0x61/0x66
                       [<c1002d3a>] kernel_thread_helper+0x6/0x1a
 }
 ... key      at: [<f84a97d0>] __key.22471+0x0/0xffffe611 [firewire_core]
 ... acquired at:
   [<c10494ba>] check_usage_backwards+0x94/0x9f
   [<c1048b60>] mark_lock+0x2c0/0x4cd
   [<c104a100>] __lock_acquire+0x692/0x153e
   [<c104b008>] lock_acquire+0x5c/0x73
   [<c1032291>] run_timer_softirq+0x17b/0x21c
   [<c102defb>] __do_softirq+0x8b/0x10a
   [<c102dfa5>] do_softirq+0x2b/0x43
   [<c102e14f>] irq_exit+0x38/0x74
   [<c1003e3d>] do_IRQ+0x89/0x9f
   [<c1002d2e>] common_interrupt+0x2e/0x34
   [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
   [<c1001a72>] cpu_idle+0x49/0x77
   [<c12594f7>] rest_init+0xbf/0xc4
   [<c13c98c9>] start_kernel+0x306/0x30b
   [<c13c90b7>] i386_start_kernel+0xb7/0xbf


stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.35 #1
Call Trace:
 [<c1266701>] ? printk+0xf/0x16
 [<c104937c>] print_irq_inversion_bug+0xeb/0xf6
 [<c10494ba>] check_usage_backwards+0x94/0x9f
 [<c1048b60>] mark_lock+0x2c0/0x4cd
 [<c1049426>] ? check_usage_backwards+0x0/0x9f
 [<c104a100>] __lock_acquire+0x692/0x153e
 [<c126910a>] ? _raw_spin_unlock_irqrestore+0x54/0x58
 [<c10fbc1f>] ? debug_object_deactivate+0xa1/0xcc
 [<c104b008>] lock_acquire+0x5c/0x73
 [<c1032228>] ? run_timer_softirq+0x112/0x21c
 [<c1032291>] run_timer_softirq+0x17b/0x21c
 [<c1032228>] ? run_timer_softirq+0x112/0x21c
 [<f84a77ce>] ? split_transaction_timeout_callback+0x0/0x8b [firewire_core]
 [<c102defb>] __do_softirq+0x8b/0x10a
 [<c102dfa5>] do_softirq+0x2b/0x43
 [<c102e14f>] irq_exit+0x38/0x74
 [<c1003e3d>] do_IRQ+0x89/0x9f
 [<c1002d2e>] common_interrupt+0x2e/0x34
 [<c1140f6d>] ? acpi_idle_enter_simple+0x122/0x14e
 [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
 [<c1001a72>] cpu_idle+0x49/0x77
 [<c12594f7>] rest_init+0xbf/0xc4
 [<c13c98c9>] start_kernel+0x306/0x30b
 [<c13c90b7>] i386_start_kernel+0xb7/0xbf


-- 
Stefan Richter
-=====-==-=- =--- =----
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-16 18:42 lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock Stefan Richter
@ 2010-08-16 21:42 ` Peter Zijlstra
  2010-08-16 22:09   ` Johannes Berg
  2010-08-16 22:27   ` Johannes Berg
  2010-08-17 14:35 ` Yong Zhang
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2010-08-16 21:42 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel, Johannes Berg

On Mon, 2010-08-16 at 20:42 +0200, Stefan Richter wrote:
> What does this all mean?  How to act on it?
> 
> =========================================================
> [ INFO: possible irq lock inversion dependency detected ]
> 2.6.35 #1
> ---------------------------------------------------------
> swapper/0 just changed the state of lock:
>  (&t->split_timeout_timer){+.-...}, at: [<c1032228>] run_timer_softirq+0x112/0x21c
> but this lock was taken by another, HARDIRQ-safe lock in the past:
>  (&(shost->host_lock)->rlock){-.-...}
> 
> and interrupts could create inverse lock ordering between them.


It says the locks nest like:

 (&(shost->host_lock)->rlock){-.-...}
   (&t->split_timeout_timer){+.-...}

But:


>  -> (&(shost->host_lock)->rlock){-.-...} ops: 123308 {
>     IN-HARDIRQ-W at:
>                           [<c104a080>] __lock_acquire+0x612/0x153e
>                           [<c104b008>] lock_acquire+0x5c/0x73
>                           [<c1268b57>] _raw_spin_lock_irqsave+0x2e/0x3e
>                           [<c11ad508>] scsi_eh_scmd_add+0x25/0x8a

...

vs

> -> (&t->split_timeout_timer){+.-...} ops: 14281 {
>    HARDIRQ-ON-W at:
>                         [<c104a100>] __lock_acquire+0x692/0x153e
>                         [<c104b008>] lock_acquire+0x5c/0x73
>                         [<c1032291>] run_timer_softirq+0x17b/0x21c
>                         [<c102defb>] __do_softirq+0x8b/0x10a
>                         [<c102dfa5>] do_softirq+0x2b/0x43

...


So shost->host_lock->rlock is a lock used in hardirq context but
split_timeout_timer is a !irq-safe lock.

Which means that it now worries the following can happen:

softirq:
  spin_lock(&t->split_timeout_timer);

IRQ:
  spin_lock(&(shost->host_lock)->rlock);
   spin_lock(&t->split_timeout_timer);

Now, the thing is that split_timeout_timer is a fake lock used to
annotate timers, its use is to connect lock chains from within the timer
callback to del_timer_sync() callers, to detect deadlocks.

Now, I can't seem to remember why del_timer_sync() explicitly disables
IRQs but call_timer_fn() does not, Johill, happen to remember?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-16 21:42 ` Peter Zijlstra
@ 2010-08-16 22:09   ` Johannes Berg
  2010-08-16 22:27   ` Johannes Berg
  1 sibling, 0 replies; 17+ messages in thread
From: Johannes Berg @ 2010-08-16 22:09 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Stefan Richter, linux1394-devel, linux-kernel

On Mon, 2010-08-16 at 23:42 +0200, Peter Zijlstra wrote:

> Which means that it now worries the following can happen:
> 
> softirq:
>   spin_lock(&t->split_timeout_timer);
> 
> IRQ:
>   spin_lock(&(shost->host_lock)->rlock);
>    spin_lock(&t->split_timeout_timer);
> 
> Now, the thing is that split_timeout_timer is a fake lock used to
> annotate timers, its use is to connect lock chains from within the timer
> callback to del_timer_sync() callers, to detect deadlocks.
> 
> Now, I can't seem to remember why del_timer_sync() explicitly disables
> IRQs but call_timer_fn() does not, Johill, happen to remember?

Err, sorry, no. I do remember that we had long discussions and initially
I had wanted to disable the context checking for these fake locks
completely. I think it was just the minimal thing needed not to make it
warn always. Also, we can't do the same thing in call_timer_fn() since
we need to hold it across fn()? Wouldn't it complain if we held that
lock then while enabling IRQs again? Not sure.

I don't fully understand the scenario above yet though. Why does it
think we could ever take the fake lock in an IRQ to start with? That
must not happen. Or is that just preemptive worrying?

johannes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-16 21:42 ` Peter Zijlstra
  2010-08-16 22:09   ` Johannes Berg
@ 2010-08-16 22:27   ` Johannes Berg
  2010-08-17 16:22     ` Stefan Richter
  1 sibling, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2010-08-16 22:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Stefan Richter, linux1394-devel, linux-kernel

On Mon, 2010-08-16 at 23:42 +0200, Peter Zijlstra wrote:

> softirq:
>   spin_lock(&t->split_timeout_timer);
> 
> IRQ:
>   spin_lock(&(shost->host_lock)->rlock);
>    spin_lock(&t->split_timeout_timer);

Actually, I think it's worried that you could have


CPU 0                         CPU 1
 softirq:                      softirq:
   spin_lock(timer)              spin_lock(rlock)
 irq:
   spin_lock(rlock)
                                 spin_lock(timer) [still in softirq]

because it has previously seen the nesting that I put on CPU 1, and now
found that the timer "lock" is used with interrupts enabled.

At least that's a scenario I could understand?

I'm convinced it's a false positive though, question is how to shut it
up :-)

johannes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-16 18:42 lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock Stefan Richter
  2010-08-16 21:42 ` Peter Zijlstra
@ 2010-08-17 14:35 ` Yong Zhang
  2010-08-17 16:23   ` Stefan Richter
  1 sibling, 1 reply; 17+ messages in thread
From: Yong Zhang @ 2010-08-17 14:35 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel, Thomas Gleixner

On Mon, Aug 16, 2010 at 08:42:34PM +0200, Stefan Richter wrote:
> Hi, I just saw the following (twice, with a reboot in between) with
> kernel 2.6.35 plus firewire driver patched to their state in 2.6.36-rc1,
> i.e. basically with a kernel according to commit e78483c5.
> 
> I suspect it might also happen without firewire patches, as 2.6.35
> already contains the change where firewire-core's per-transaction timer
> was introduced which shows up in the following log.

Cc'ing Thomas.

I suspect it's introduced by commit 5c40cbfefa828208c671e2f58789e4dd04f79563
which call del_timer_sync() in softirq.

comments on del_timer_sync() say "It must not be called from interrupt contexts."

Thanks,
Yong
> 
> It happened while I was testing hotplugging of FireWire disks with
> several ones on a bus together with several FireWire 800 repeaters,
> which sometimes act up so that reconnect does not succeed immediately.
> Perhaps such reconnection troubles are necessary to provoke this lockdep
> report.
> 
> [I only accidentally discovered this on a test PC which do not use very
> frequently.  Unfortunately, I had lockdep disabled on my main PC on
> which I test most of my FireWire hardware for quite a while now.  I
> guess I should enable lockdep there again, but I have a suspicion that I
> won't leave it on for too long.]
> 
> Note how the IN-HARDIRQ-W talks about a SCSI host lock of a libata host,
> whereas the locking in the firewire stack surely does only interact
> (hopefully correctly) with SCSI host locks of firewire-sbp2 host
> instances.
> 
> What does this all mean?  How to act on it?
> 
> =========================================================
> [ INFO: possible irq lock inversion dependency detected ]
> 2.6.35 #1
> ---------------------------------------------------------
> swapper/0 just changed the state of lock:
>  (&t->split_timeout_timer){+.-...}, at: [<c1032228>] run_timer_softirq+0x112/0x21c
> but this lock was taken by another, HARDIRQ-safe lock in the past:
>  (&(shost->host_lock)->rlock){-.-...}
> 
> and interrupts could create inverse lock ordering between them.
> 
> 
> other info that might help us debug this:
> no locks held by swapper/0.
> 
> the shortest dependencies between 2nd lock and 1st lock:
>  -> (&(shost->host_lock)->rlock){-.-...} ops: 123308 {
>     IN-HARDIRQ-W at:
>                           [<c104a080>] __lock_acquire+0x612/0x153e
>                           [<c104b008>] lock_acquire+0x5c/0x73
>                           [<c1268b57>] _raw_spin_lock_irqsave+0x2e/0x3e
>                           [<c11ad508>] scsi_eh_scmd_add+0x25/0x8a
>                           [<c11ad5b7>] scsi_times_out+0x4a/0x64
>                           [<c10e9fca>] blk_rq_timed_out+0xf/0x4b
>                           [<c10ea044>] blk_abort_request+0x3e/0x41
>                           [<c11c43c0>] ata_qc_schedule_eh+0x55/0x67
>                           [<c11b9c4e>] ata_qc_complete+0x59/0x1b6
>                           [<c11c6751>] ata_hsm_qc_complete+0xb5/0xcf
>                           [<c11c6e18>] ata_sff_hsm_move+0x6ad/0x6fa
>                           [<c11c7002>] __ata_sff_port_intr+0x9f/0xad
>                           [<c11c70c9>] ata_bmdma_port_intr+0xb9/0x127
>                           [<c11c71c3>] ata_bmdma_interrupt+0x8c/0x170
>                           [<c1059b58>] handle_IRQ_event+0x1d/0xa2
>                           [<c105b45f>] handle_fasteoi_irq+0x79/0xb2
>                           [<c10045bb>] handle_irq+0x3b/0x48
>                           [<c1003df9>] do_IRQ+0x45/0x9f
>                           [<c1002d2e>] common_interrupt+0x2e/0x34
>                           [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
>                           [<c1001a72>] cpu_idle+0x49/0x77
>                           [<c12594f7>] rest_init+0xbf/0xc4
>                           [<c13c98c9>] start_kernel+0x306/0x30b
>                           [<c13c90b7>] i386_start_kernel+0xb7/0xbf
>     IN-SOFTIRQ-W at:
>                           [<c104a0a1>] __lock_acquire+0x633/0x153e
>                           [<c104b008>] lock_acquire+0x5c/0x73
>                           [<c1268b57>] _raw_spin_lock_irqsave+0x2e/0x3e
>                           [<c11aec72>] scsi_device_unbusy+0x1e/0x8d
>                           [<c11a92fa>] scsi_finish_command+0x22/0xdf
>                           [<c11af291>] scsi_softirq_done+0xf4/0xfc
>                           [<c10e9d9c>] blk_done_softirq+0x5c/0x69
>                           [<c102defb>] __do_softirq+0x8b/0x10a
>                           [<c102dfa5>] do_softirq+0x2b/0x43
>                           [<c102e031>] run_ksoftirqd+0x74/0x15a
>                           [<c103ae19>] kthread+0x61/0x66
>                           [<c1002d3a>] kernel_thread_helper+0x6/0x1a
>     INITIAL USE at:
>                          [<c104a17b>] __lock_acquire+0x70d/0x153e
>                          [<c104b008>] lock_acquire+0x5c/0x73
>                          [<c1268b57>] _raw_spin_lock_irqsave+0x2e/0x3e
>                          [<c11ad5e0>] scsi_schedule_eh+0xf/0x4f
>                          [<c11c4125>] ata_port_schedule_eh+0x37/0x3a
>                          [<c11bd629>] async_port_probe+0x65/0x9c
>                          [<c1040051>] async_thread+0xe7/0x1c8
>                          [<c103ae19>] kthread+0x61/0x66
>                          [<c1002d3a>] kernel_thread_helper+0x6/0x1a
>   }
>   ... key      at: [<c19b0ab4>] __key.30680+0x0/0x8
>   ... acquired at:
>    [<c104b008>] lock_acquire+0x5c/0x73
>    [<c1032a3f>] del_timer_sync+0x29/0x6c
>    [<f84a6ea0>] close_transaction+0x93/0xb4 [firewire_core]
>    [<f84a6f00>] transmit_complete_callback+0x3f/0x43 [firewire_core]
>    [<f8da60ff>] at_context_transmit+0x59b/0x5a3 [firewire_ohci]
>    [<f8da6123>] ohci_send_request+0xd/0xf [firewire_ohci]
>    [<f84a7ce4>] fw_send_request+0x21c/0x224 [firewire_core]
>    [<f8dab307>] 0xf8dab307
>    [<f8dac075>] 0xf8dac075
>    [<c11a962a>] scsi_dispatch_cmd+0x182/0x1f2
>    [<c11ae4ed>] scsi_request_fn+0x359/0x492
>    [<c10e644f>] __generic_unplug_device+0x26/0x29
>    [<c10e9306>] blk_execute_rq_nowait+0x56/0x75
>    [<c10e93cd>] blk_execute_rq+0xa8/0xc7
>    [<c11af359>] scsi_execute+0xc0/0x105
>    [<c11af3f3>] scsi_execute_req+0x55/0x7d
>    [<c11b3d9a>] sd_start_stop_device+0x7b/0x117
>    [<c11b41a7>] sd_shutdown+0x108/0x117
>    [<c11b42d0>] sd_remove+0x42/0x6f
>    [<c11975af>] __device_release_driver+0x5e/0x96
>    [<c119768d>] device_release_driver+0x1a/0x25
>    [<c1196dcc>] bus_remove_device+0x7b/0x92
>    [<c119592a>] device_del+0xfa/0x154
>    [<c11b21ae>] __scsi_remove_device+0x3d/0x7b
>    [<c11b2208>] scsi_remove_device+0x1c/0x27
>    [<f8dab83e>] 0xf8dab83e
>    [<c10f2b89>] kref_put+0x39/0x44
>    [<f8dab0ff>] 0xf8dab0ff
>    [<c11975af>] __device_release_driver+0x5e/0x96
>    [<c119768d>] device_release_driver+0x1a/0x25
>    [<c1196dcc>] bus_remove_device+0x7b/0x92
>    [<c119592a>] device_del+0xfa/0x154
>    [<c119598f>] device_unregister+0xb/0x15
>    [<f84a4eb8>] shutdown_unit+0x8/0xc [firewire_core]
>    [<c1195270>] device_for_each_child+0x2a/0x54
>    [<f84a4f2e>] fw_device_shutdown+0x72/0xa4 [firewire_core]
>    [<c1037f34>] worker_thread+0x16a/0x242
>    [<c103ae19>] kthread+0x61/0x66
>    [<c1002d3a>] kernel_thread_helper+0x6/0x1a
> 
> -> (&t->split_timeout_timer){+.-...} ops: 14281 {
>    HARDIRQ-ON-W at:
>                         [<c104a100>] __lock_acquire+0x692/0x153e
>                         [<c104b008>] lock_acquire+0x5c/0x73
>                         [<c1032291>] run_timer_softirq+0x17b/0x21c
>                         [<c102defb>] __do_softirq+0x8b/0x10a
>                         [<c102dfa5>] do_softirq+0x2b/0x43
>                         [<c102e14f>] irq_exit+0x38/0x74
>                         [<c1003e3d>] do_IRQ+0x89/0x9f
>                         [<c1002d2e>] common_interrupt+0x2e/0x34
>                         [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
>                         [<c1001a72>] cpu_idle+0x49/0x77
>                         [<c12594f7>] rest_init+0xbf/0xc4
>                         [<c13c98c9>] start_kernel+0x306/0x30b
>                         [<c13c90b7>] i386_start_kernel+0xb7/0xbf
>    IN-SOFTIRQ-W at:
>                         [<c104a0a1>] __lock_acquire+0x633/0x153e
>                         [<c104b008>] lock_acquire+0x5c/0x73
>                         [<c1032a3f>] del_timer_sync+0x29/0x6c
>                         [<f84a6ddc>] fw_core_handle_response+0x108/0x139 [firewire_core]
>                         [<f8da5b58>] handle_ar_packet+0x108/0x114 [firewire_ohci]
>                         [<f8da6236>] ar_context_tasklet+0x111/0x11f [firewire_ohci]
>                         [<c102da6d>] tasklet_action+0x8d/0xe0
>                         [<c102defb>] __do_softirq+0x8b/0x10a
>                         [<c102dfa5>] do_softirq+0x2b/0x43
>                         [<c102e14f>] irq_exit+0x38/0x74
>                         [<c1003e3d>] do_IRQ+0x89/0x9f
>                         [<c1002d2e>] common_interrupt+0x2e/0x34
>                         [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
>                         [<c1001a72>] cpu_idle+0x49/0x77
>                         [<c12594f7>] rest_init+0xbf/0xc4
>                         [<c13c98c9>] start_kernel+0x306/0x30b
>                         [<c13c90b7>] i386_start_kernel+0xb7/0xbf
>    INITIAL USE at:
>                        [<c104a17b>] __lock_acquire+0x70d/0x153e
>                        [<c104b008>] lock_acquire+0x5c/0x73
>                        [<c1032a3f>] del_timer_sync+0x29/0x6c
>                        [<f84a6ddc>] fw_core_handle_response+0x108/0x139 [firewire_core]
>                        [<f8da5de2>] at_context_transmit+0x27e/0x5a3 [firewire_ohci]
>                        [<f8da6123>] ohci_send_request+0xd/0xf [firewire_ohci]
>                        [<f84a7ce4>] fw_send_request+0x21c/0x224 [firewire_core]
>                        [<f84a7d71>] fw_run_transaction+0x85/0xc5 [firewire_core]
>                        [<f84a2485>] bm_work+0x16c/0x34b [firewire_core]
>                        [<c1037f34>] worker_thread+0x16a/0x242
>                        [<c103ae19>] kthread+0x61/0x66
>                        [<c1002d3a>] kernel_thread_helper+0x6/0x1a
>  }
>  ... key      at: [<f84a97d0>] __key.22471+0x0/0xffffe611 [firewire_core]
>  ... acquired at:
>    [<c10494ba>] check_usage_backwards+0x94/0x9f
>    [<c1048b60>] mark_lock+0x2c0/0x4cd
>    [<c104a100>] __lock_acquire+0x692/0x153e
>    [<c104b008>] lock_acquire+0x5c/0x73
>    [<c1032291>] run_timer_softirq+0x17b/0x21c
>    [<c102defb>] __do_softirq+0x8b/0x10a
>    [<c102dfa5>] do_softirq+0x2b/0x43
>    [<c102e14f>] irq_exit+0x38/0x74
>    [<c1003e3d>] do_IRQ+0x89/0x9f
>    [<c1002d2e>] common_interrupt+0x2e/0x34
>    [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
>    [<c1001a72>] cpu_idle+0x49/0x77
>    [<c12594f7>] rest_init+0xbf/0xc4
>    [<c13c98c9>] start_kernel+0x306/0x30b
>    [<c13c90b7>] i386_start_kernel+0xb7/0xbf
> 
> 
> stack backtrace:
> Pid: 0, comm: swapper Not tainted 2.6.35 #1
> Call Trace:
>  [<c1266701>] ? printk+0xf/0x16
>  [<c104937c>] print_irq_inversion_bug+0xeb/0xf6
>  [<c10494ba>] check_usage_backwards+0x94/0x9f
>  [<c1048b60>] mark_lock+0x2c0/0x4cd
>  [<c1049426>] ? check_usage_backwards+0x0/0x9f
>  [<c104a100>] __lock_acquire+0x692/0x153e
>  [<c126910a>] ? _raw_spin_unlock_irqrestore+0x54/0x58
>  [<c10fbc1f>] ? debug_object_deactivate+0xa1/0xcc
>  [<c104b008>] lock_acquire+0x5c/0x73
>  [<c1032228>] ? run_timer_softirq+0x112/0x21c
>  [<c1032291>] run_timer_softirq+0x17b/0x21c
>  [<c1032228>] ? run_timer_softirq+0x112/0x21c
>  [<f84a77ce>] ? split_transaction_timeout_callback+0x0/0x8b [firewire_core]
>  [<c102defb>] __do_softirq+0x8b/0x10a
>  [<c102dfa5>] do_softirq+0x2b/0x43
>  [<c102e14f>] irq_exit+0x38/0x74
>  [<c1003e3d>] do_IRQ+0x89/0x9f
>  [<c1002d2e>] common_interrupt+0x2e/0x34
>  [<c1140f6d>] ? acpi_idle_enter_simple+0x122/0x14e
>  [<c11f4500>] cpuidle_idle_call+0x6d/0xa1
>  [<c1001a72>] cpu_idle+0x49/0x77
>  [<c12594f7>] rest_init+0xbf/0xc4
>  [<c13c98c9>] start_kernel+0x306/0x30b
>  [<c13c90b7>] i386_start_kernel+0xb7/0xbf
> 
> 
> -- 
> Stefan Richter
> -=====-==-=- =--- =----
> http://arcgraph.de/sr/
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-16 22:27   ` Johannes Berg
@ 2010-08-17 16:22     ` Stefan Richter
  2010-08-18 13:17       ` Stefan Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-08-17 16:22 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Peter Zijlstra, linux1394-devel, linux-kernel

Johannes Berg wrote:
> I'm convinced it's a false positive though, question is how to shut it
> up :-)

Is there anyway to differentiate the lockdep naming of shost->host_lock
(initialized in scsi_host_alloc) at runtime by means of the
scsi_host_template instance given to scsi_host_alloc?
-- 
Stefan Richter
-=====-==-=- =--- =---=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-17 14:35 ` Yong Zhang
@ 2010-08-17 16:23   ` Stefan Richter
  2010-08-18  7:01     ` Clemens Ladisch
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-08-17 16:23 UTC (permalink / raw)
  To: Yong Zhang
  Cc: linux1394-devel, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Johannes Berg

Yong Zhang wrote:
> I suspect it's introduced by commit 5c40cbfefa828208c671e2f58789e4dd04f79563
> which call del_timer_sync() in softirq.
> 
> comments on del_timer_sync() say "It must not be called from interrupt contexts."

I hope the del_timer_sync kerneldoc comment is about hardIRQ context,
*otherwise* commit 5c40cbfe is defective indeed.

The firewire stack calls del_timer_sync in process or softIRQ context
(firewire-ohci tasklets, block layer soft IRQ, networking soft IRQ...)
but never from hardIRQ.
-- 
Stefan Richter
-=====-==-=- =--- =---=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-17 16:23   ` Stefan Richter
@ 2010-08-18  7:01     ` Clemens Ladisch
  2010-08-18  8:09       ` Stefan Richter
  2010-08-18  8:59       ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Clemens Ladisch @ 2010-08-18  7:01 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Yong Zhang, Peter Zijlstra, Johannes Berg, Thomas Gleixner,
	linux1394-devel, linux-kernel

Stefan Richter wrote:
> Yong Zhang wrote:
> > I suspect it's introduced by commit 5c40cbfefa828208c671e2f58789e4dd04f79563
> > which call del_timer_sync() in softirq.
> > 
> > comments on del_timer_sync() say "It must not be called from interrupt contexts."
> 
> I hope the del_timer_sync kerneldoc comment is about hardIRQ context,

Then both the comment and its lockdep code would be wrong.

> *otherwise* commit 5c40cbfe is defective indeed.

--8<---------------------------------------------------------------->8--
firewire: core: do not use del_timer_sync() in interrupt context

Because we might be in interrupt context, replace del_timer_sync() with
del_timer().  If the timer was already running when we tried to delete
it, explicitly wait for it to finish to ensure that it does not access
the transaction data after the normal completion code might have freed
it.

Many thanks to Yong Zhang for diagnosing this and to Rusty Russell for
his Locking Guide.

Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
---
 drivers/firewire/core-transaction.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/drivers/firewire/core-transaction.c
+++ b/drivers/firewire/core-transaction.c
@@ -78,9 +78,16 @@ static int close_transaction(struct fw_t
 	struct fw_transaction *t;
 	unsigned long flags;
 
+retry:
 	spin_lock_irqsave(&card->lock, flags);
 	list_for_each_entry(t, &card->transaction_list, link) {
 		if (t == transaction) {
+			if (!del_timer(&t->split_timeout_timer)) {
+				/* wait for the timer to cancel it */
+				spin_unlock_irqrestore(&card->lock, flags);
+				cpu_relax();
+				goto retry;
+			}
 			list_del_init(&t->link);
 			card->tlabel_mask &= ~(1ULL << t->tlabel);
 			break;
@@ -89,7 +96,6 @@ static int close_transaction(struct fw_t
 	spin_unlock_irqrestore(&card->lock, flags);
 
 	if (&t->link != &card->transaction_list) {
-		del_timer_sync(&t->split_timeout_timer);
 		t->callback(card, rcode, NULL, 0, t->callback_data);
 		return 0;
 	}
@@ -918,9 +924,16 @@ void fw_core_handle_response(struct fw_c
 	source	= HEADER_GET_SOURCE(p->header[1]);
 	rcode	= HEADER_GET_RCODE(p->header[1]);
 
+retry:
 	spin_lock_irqsave(&card->lock, flags);
 	list_for_each_entry(t, &card->transaction_list, link) {
 		if (t->node_id == source && t->tlabel == tlabel) {
+			if (!del_timer(&t->split_timeout_timer)) {
+				/* wait for the timer to cancel it */
+				spin_unlock_irqrestore(&card->lock, flags);
+				cpu_relax();
+				goto retry;
+			}
 			list_del_init(&t->link);
 			card->tlabel_mask &= ~(1ULL << t->tlabel);
 			break;
@@ -963,8 +976,6 @@ void fw_core_handle_response(struct fw_c
 		break;
 	}
 
-	del_timer_sync(&t->split_timeout_timer);
-
 	/*
 	 * The response handler may be executed while the request handler
 	 * is still pending.  Cancel the request handler.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-18  7:01     ` Clemens Ladisch
@ 2010-08-18  8:09       ` Stefan Richter
  2010-08-18  8:53         ` Clemens Ladisch
  2010-08-18  9:08         ` Yong Zhang
  2010-08-18  8:59       ` Peter Zijlstra
  1 sibling, 2 replies; 17+ messages in thread
From: Stefan Richter @ 2010-08-18  8:09 UTC (permalink / raw)
  To: Clemens Ladisch
  Cc: Yong Zhang, Peter Zijlstra, Johannes Berg, Thomas Gleixner,
	linux1394-devel, linux-kernel

Clemens Ladisch wrote:
> firewire: core: do not use del_timer_sync() in interrupt context
> 
> Because we might be in interrupt context, replace del_timer_sync() with
> del_timer().

Again, is the term "interrupt contexts" meant to include softIRQ contexts?

(Nothing in firewire-core/-sbp2/-net etc. is called in hardware
interrupts.  BTW, we actually could and maybe should change all the
spinlocks in them from spin_lock_irq to spin_lock_bh.  Only
firewire-ohci's local ohci->lock needs to be IRQ safe.  OTOH this could
change if somebody comes up with a migration of the stack to threaded
IRQ handling.)
-- 
Stefan Richter
-=====-==-=- =--- =--=-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-18  8:09       ` Stefan Richter
@ 2010-08-18  8:53         ` Clemens Ladisch
  2010-08-18  9:08         ` Yong Zhang
  1 sibling, 0 replies; 17+ messages in thread
From: Clemens Ladisch @ 2010-08-18  8:53 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Yong Zhang, Peter Zijlstra, Johannes Berg, Thomas Gleixner,
	linux1394-devel, linux-kernel

Stefan Richter wrote:
> Clemens Ladisch wrote:
> > Because we might be in interrupt context, replace del_timer_sync() with
> > del_timer().
> 
> Again, is the term "interrupt contexts" meant to include softIRQ contexts?

Yes:
http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/g618.html#GLOSS-INTERRUPTCONTEXT

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-18  7:01     ` Clemens Ladisch
  2010-08-18  8:09       ` Stefan Richter
@ 2010-08-18  8:59       ` Peter Zijlstra
  2010-08-18  9:26         ` Clemens Ladisch
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2010-08-18  8:59 UTC (permalink / raw)
  To: Clemens Ladisch
  Cc: Stefan Richter, Yong Zhang, Johannes Berg, Thomas Gleixner,
	linux1394-devel, linux-kernel

On Wed, 2010-08-18 at 09:01 +0200, Clemens Ladisch wrote:
> +retry:
>         spin_lock_irqsave(&card->lock, flags);
>         list_for_each_entry(t, &card->transaction_list, link) {
>                 if (t == transaction) {
> +                       if (!del_timer(&t->split_timeout_timer)) {
> +                               /* wait for the timer to cancel it */
> +                               spin_unlock_irqrestore(&card->lock, flags);
> +                               cpu_relax();
> +                               goto retry;
> +                       } 

Open-coding spin loops like that is really ugly, and could cause trouble
for -rt.

Also, I believe that if you want the very same semantics as before, you
need to use try_to_del_timer_sync(), not del_timer().

Also, if del_timer_sync() is not allowed from any interrupt context
(including softirq) then doing the spin-loop like that doesn't actually
solve anything.

Thomas, any comments?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-18  8:09       ` Stefan Richter
  2010-08-18  8:53         ` Clemens Ladisch
@ 2010-08-18  9:08         ` Yong Zhang
  1 sibling, 0 replies; 17+ messages in thread
From: Yong Zhang @ 2010-08-18  9:08 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Clemens Ladisch, Peter Zijlstra, Johannes Berg, Thomas Gleixner,
	linux1394-devel, linux-kernel

On Wed, Aug 18, 2010 at 4:09 PM, Stefan Richter
<stefanr@s5r6.in-berlin.de> wrote:
> Again, is the term "interrupt contexts" meant to include softIRQ contexts?

I can't think out a scene which prevent del_timer_sync() from using in
softirq. :(

Then I go back and find that comments comes before linux-2.6.12,
then in linux-2.6.12, del_timer_sync() is like this:

int del_timer_sync(struct timer_list *timer)
{
	tvec_base_t *base;
	int i, ret = 0;

	check_timer(timer);

del_again:
	ret += del_timer(timer);

	for_each_online_cpu(i) {
		base = &per_cpu(tvec_bases, i);
		if (base->running_timer == timer) {
			while (base->running_timer == timer) {
				cpu_relax();
				preempt_check_resched();
			}
			break;
		}
	}
	smp_rmb();
	if (timer_pending(timer))
		goto del_again;

	return ret;
}
EXPORT_SYMBOL(del_timer_sync);

Seems preempt_check_resched() is the proof of that comments, but
now del_timer_sync() is changed.

I don't know if there is any other restriction.

Thanks,
Yong

>
> (Nothing in firewire-core/-sbp2/-net etc. is called in hardware
> interrupts.  BTW, we actually could and maybe should change all the
> spinlocks in them from spin_lock_irq to spin_lock_bh.  Only
> firewire-ohci's local ohci->lock needs to be IRQ safe.  OTOH this could
> change if somebody comes up with a migration of the stack to threaded
> IRQ handling.)
> --
> Stefan Richter
> -=====-==-=- =--- =--=-
> http://arcgraph.de/sr/
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-18  8:59       ` Peter Zijlstra
@ 2010-08-18  9:26         ` Clemens Ladisch
  2010-08-18  9:43           ` Peter Zijlstra
  2010-08-18 12:50           ` Stefan Richter
  0 siblings, 2 replies; 17+ messages in thread
From: Clemens Ladisch @ 2010-08-18  9:26 UTC (permalink / raw)
  To: Peter Zijlstra, Stefan Richter
  Cc: Yong Zhang, Johannes Berg, Thomas Gleixner, linux1394-devel,
	linux-kernel

Peter Zijlstra wrote:
> On Wed, 2010-08-18 at 09:01 +0200, Clemens Ladisch wrote:
> > +retry:
> >         spin_lock_irqsave(&card->lock, flags);
> >         list_for_each_entry(t, &card->transaction_list, link) {
> >                 if (t == transaction) {
> > +                       if (!del_timer(&t->split_timeout_timer)) {
> > +                               /* wait for the timer to cancel it */
> > +                               spin_unlock_irqrestore(&card->lock, flags);
> > +                               cpu_relax();
> > +                               goto retry;
> > +                       } 
> 
> Open-coding spin loops like that is really ugly, and could cause trouble
> for -rt.
> 
> Also, I believe that if you want the very same semantics as before, you
> need to use try_to_del_timer_sync(), not del_timer().

Like del_timer_sync(), it "must not be called from interrupt contexts."

> Also, if del_timer_sync() is not allowed from any interrupt context
> (including softirq) then doing the spin-loop like that doesn't actually
> solve anything.

Why?

Anyway, my first patch was crap because the loop isn't actually
necessary:

--8<---------------------------------------------------------------->8--
firewire: core: do not use del_timer_sync() in interrupt context

Because we might be in interrupt context, replace del_timer_sync() with
del_timer().  If the timer is already running, we know that it will
clean up the transaction, so we do not need to do any further processing
in the normal transaction handler.

Many thanks to Yong Zhang for diagnosing this.

Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
---
 drivers/firewire/core-transaction.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/drivers/firewire/core-transaction.c
+++ b/drivers/firewire/core-transaction.c
@@ -81,6 +81,8 @@ static int close_transaction(struct fw_transaction *transaction,
 	spin_lock_irqsave(&card->lock, flags);
 	list_for_each_entry(t, &card->transaction_list, link) {
 		if (t == transaction) {
+			if (!del_timer(&t->split_timeout_timer))
+				goto timed_out;
 			list_del_init(&t->link);
 			card->tlabel_mask &= ~(1ULL << t->tlabel);
 			break;
@@ -89,11 +91,11 @@ static int close_transaction(struct fw_transaction *transaction,
 	spin_unlock_irqrestore(&card->lock, flags);
 
 	if (&t->link != &card->transaction_list) {
-		del_timer_sync(&t->split_timeout_timer);
 		t->callback(card, rcode, NULL, 0, t->callback_data);
 		return 0;
 	}
 
+timed_out:
 	return -ENOENT;
 }
 
@@ -921,6 +923,8 @@ void fw_core_handle_response(struct fw_card *card, struct fw_packet *p)
 	spin_lock_irqsave(&card->lock, flags);
 	list_for_each_entry(t, &card->transaction_list, link) {
 		if (t->node_id == source && t->tlabel == tlabel) {
+			if (!del_timer(&t->split_timeout_timer))
+				goto timed_out;
 			list_del_init(&t->link);
 			card->tlabel_mask &= ~(1ULL << t->tlabel);
 			break;
@@ -929,6 +933,7 @@ void fw_core_handle_response(struct fw_card *card, struct fw_packet *p)
 	spin_unlock_irqrestore(&card->lock, flags);
 
 	if (&t->link == &card->transaction_list) {
+timed_out:
 		fw_notify("Unsolicited response (source %x, tlabel %x)\n",
 			  source, tlabel);
 		return;
@@ -963,8 +968,6 @@ void fw_core_handle_response(struct fw_card *card, struct fw_packet *p)
 		break;
 	}
 
-	del_timer_sync(&t->split_timeout_timer);
-
 	/*
 	 * The response handler may be executed while the request handler
 	 * is still pending.  Cancel the request handler.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-18  9:26         ` Clemens Ladisch
@ 2010-08-18  9:43           ` Peter Zijlstra
  2010-08-18 12:50           ` Stefan Richter
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2010-08-18  9:43 UTC (permalink / raw)
  To: Clemens Ladisch
  Cc: Stefan Richter, Yong Zhang, Johannes Berg, Thomas Gleixner,
	linux1394-devel, linux-kernel

On Wed, 2010-08-18 at 11:26 +0200, Clemens Ladisch wrote:
> Anyway, my first patch was crap because the loop isn't actually
> necessary.

Ok, without loop its all fine with me ;-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-18  9:26         ` Clemens Ladisch
  2010-08-18  9:43           ` Peter Zijlstra
@ 2010-08-18 12:50           ` Stefan Richter
  2010-08-18 13:05             ` Clemens Ladisch
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-08-18 12:50 UTC (permalink / raw)
  To: Clemens Ladisch
  Cc: Peter Zijlstra, Yong Zhang, Johannes Berg, Thomas Gleixner,
	linux1394-devel, linux-kernel

Clemens Ladisch wrote:
> firewire: core: do not use del_timer_sync() in interrupt context
> 
> Because we might be in interrupt context, replace del_timer_sync() with
> del_timer().

OK.  And thanks for the pointers into the locking guide.

> If the timer is already running, we know that it will
> clean up the transaction, so we do not need to do any further processing
> in the normal transaction handler.
> 
> Many thanks to Yong Zhang for diagnosing this.
> 
> Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
> Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
> ---
>  drivers/firewire/core-transaction.c |    9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> --- a/drivers/firewire/core-transaction.c
> +++ b/drivers/firewire/core-transaction.c
> @@ -81,6 +81,8 @@ static int close_transaction(struct fw_transaction *transaction,
>  	spin_lock_irqsave(&card->lock, flags);
>  	list_for_each_entry(t, &card->transaction_list, link) {
>  		if (t == transaction) {
> +			if (!del_timer(&t->split_timeout_timer))
> +				goto timed_out;

+			if (!del_timer(&t->split_timeout_timer)) {
+				spin_unlock_irqrestore(&card->lock, flags);
+				goto timed_out;
+			}

>  			list_del_init(&t->link);
>  			card->tlabel_mask &= ~(1ULL << t->tlabel);
>  			break;
> @@ -89,11 +91,11 @@ static int close_transaction(struct fw_transaction *transaction,
>  	spin_unlock_irqrestore(&card->lock, flags);
>  
>  	if (&t->link != &card->transaction_list) {
> -		del_timer_sync(&t->split_timeout_timer);
>  		t->callback(card, rcode, NULL, 0, t->callback_data);
>  		return 0;
>  	}
>  
> +timed_out:
>  	return -ENOENT;
>  }
>  
> @@ -921,6 +923,8 @@ void fw_core_handle_response(struct fw_card *card, struct fw_packet *p)
>  	spin_lock_irqsave(&card->lock, flags);
>  	list_for_each_entry(t, &card->transaction_list, link) {
>  		if (t->node_id == source && t->tlabel == tlabel) {
> +			if (!del_timer(&t->split_timeout_timer))
> +				goto timed_out;

Ditto.

>  			list_del_init(&t->link);
>  			card->tlabel_mask &= ~(1ULL << t->tlabel);
>  			break;
> @@ -929,6 +933,7 @@ void fw_core_handle_response(struct fw_card *card, struct fw_packet *p)
>  	spin_unlock_irqrestore(&card->lock, flags);
>  
>  	if (&t->link == &card->transaction_list) {
> +timed_out:
>  		fw_notify("Unsolicited response (source %x, tlabel %x)\n",
>  			  source, tlabel);
>  		return;
> @@ -963,8 +968,6 @@ void fw_core_handle_response(struct fw_card *card, struct fw_packet *p)
>  		break;
>  	}
>  
> -	del_timer_sync(&t->split_timeout_timer);
> -
>  	/*
>  	 * The response handler may be executed while the request handler
>  	 * is still pending.  Cancel the request handler.

Shall I commit with the added spin_unlocks?  Or do you prefer to send an update?
-- 
Stefan Richter
-=====-==-=- =--- =--=-
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-18 12:50           ` Stefan Richter
@ 2010-08-18 13:05             ` Clemens Ladisch
  0 siblings, 0 replies; 17+ messages in thread
From: Clemens Ladisch @ 2010-08-18 13:05 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Peter Zijlstra, Yong Zhang, Johannes Berg, Thomas Gleixner,
	linux1394-devel, linux-kernel

Stefan Richter wrote:
> Clemens Ladisch wrote:
> >  	spin_lock_irqsave(&card->lock, flags);
> >  	list_for_each_entry(t, &card->transaction_list, link) {
> >  		if (t == transaction) {
> > +			if (!del_timer(&t->split_timeout_timer))
> > +				goto timed_out;
> 
> +			if (!del_timer(&t->split_timeout_timer)) {
> +				spin_unlock_irqrestore(&card->lock, flags);
> +				goto timed_out;
> +			}

> Shall I commit with the added spin_unlocks?

I can obviously not be trusted to do this correctly; please go ahead.


Regards,
Clemens

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock
  2010-08-17 16:22     ` Stefan Richter
@ 2010-08-18 13:17       ` Stefan Richter
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Richter @ 2010-08-18 13:17 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Peter Zijlstra, linux1394-devel, linux-kernel, Clemens Ladisch

Stefan Richter wrote:
> Johannes Berg wrote:
>> I'm convinced it's a false positive though, question is how to shut it
>> up :-)
> 
> Is there anyway to differentiate the lockdep naming of shost->host_lock
> (initialized in scsi_host_alloc) at runtime by means of the
> scsi_host_template instance given to scsi_host_alloc?

So, even though the part of the lockdep report about nesting with a SCSI
host lock was a false positive, it did motivate Clemens' del_timer_sync
-> del_timer replacement, so it was helpful after all.  I guess there is
no immediate need anymore to do something about the conflation of
shost->host_lock instances.
-- 
Stefan Richter
-=====-==-=- =--- =--=-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-08-18 13:18 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-16 18:42 lockdep false positive? -- firewire-core transaction timer vs. scsi-core host lock Stefan Richter
2010-08-16 21:42 ` Peter Zijlstra
2010-08-16 22:09   ` Johannes Berg
2010-08-16 22:27   ` Johannes Berg
2010-08-17 16:22     ` Stefan Richter
2010-08-18 13:17       ` Stefan Richter
2010-08-17 14:35 ` Yong Zhang
2010-08-17 16:23   ` Stefan Richter
2010-08-18  7:01     ` Clemens Ladisch
2010-08-18  8:09       ` Stefan Richter
2010-08-18  8:53         ` Clemens Ladisch
2010-08-18  9:08         ` Yong Zhang
2010-08-18  8:59       ` Peter Zijlstra
2010-08-18  9:26         ` Clemens Ladisch
2010-08-18  9:43           ` Peter Zijlstra
2010-08-18 12:50           ` Stefan Richter
2010-08-18 13:05             ` Clemens Ladisch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).