netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Netpoll triggers soft lockup
@ 2013-04-11 13:42 Bart Van Assche
  2013-04-11 14:08 ` Neil Horman
  2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman
  0 siblings, 2 replies; 34+ messages in thread
From: Bart Van Assche @ 2013-04-11 13:42 UTC (permalink / raw)
  To: Neil Horman, David Miller, netdev

Hi,

While testing a driver against kernel 3.9-rc6 I ran into a soft lockup 
triggered by sending lots of kernel messages to a remote system via 
netconsole. This behavior was probably introduced by commit ca99ca14c 
("netpoll: protect napi_poll and poll_controller during 
dev_[open|close]"). That commit introduced a mutex in 
netpoll_poll_dev(), which can be called from interrupt context. Is there 
anyone who can tell me whether this is a bug in commit ca99ca14c or in 
netconsole ?

Read(10):------------[ cut here ]------------
WARNING: at kernel/mutex.c:434 mutex_trylock+0x16d/0x180()
Hardware name: P5Q DELUXE
Modules linked in: ib_srp scsi_transport_srp dm_mod qla2x00tgt(O) 
qla2xxx_scst(O) scsi_transport_fc iscsi_scst(O) ib_srpt(O) scst_vdisk(O) 
libcrc32c scst(O) crc32c brd netconsole configfs rdma_ucm rdma_cm iw_cm 
ib_addr scsi_tgt af_packet snd_hda_codec_hdmi snd_hda_codec_analog 
ib_ipoib ib_cm ib_uverbs ib_umad snd_hda_intel snd_hda_codec snd_hwdep 
snd_pcm snd_seq snd_timer mlx4_ib ib_sa ib_mad cpufreq_conservative 
ib_core cpufreq_userspace cpufreq_powersave snd_seq_device snd mlx4_core 
sr_mod skge soundcore pcspkr acpi_cpufreq button ehci_pci sg i2c_i801 
mperf snd_page_alloc cdrom microcode autofs4 ext3 jbd mbcache sd_mod 
crc_t10dif radeon uhci_hcd ttm drm_kms_helper ehci_hcd drm i2c_algo_bit 
i2c_core intel_agp intel_gtt agpgart usbcore usb_common processor 
thermal_sys hwmon scsi_dh_alua scsi_dh ata_generic ata_piix ahci libahci 
pata_marvell libata scsi_mod [last unloaded: scsi_transport_srp]
Pid: 178, comm: kworker/0:1H Tainted: G           O 3.9.0-rc6-debug+ #0
Call Trace:
  <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
  [<ffffffff8146f9f6>] printk+0x4d/0x4f
  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
  [<ffffffffa000b924>] scsi_io_completion+0x294/0x6c0 [scsi_mod]
  [<ffffffffa000113d>] scsi_finish_command+0xbd/0x120 [scsi_mod]
  [<ffffffffa000b58f>] scsi_softirq_done+0x13f/0x160 [scsi_mod]
  [<ffffffff8121dbc0>] blk_done_softirq+0x80/0xa0
  [<ffffffff81046e81>] ? __do_softirq+0xb1/0x3c0
  [<ffffffff81046ed1>] __do_softirq+0x101/0x3c0
  [<ffffffff810cc649>] ? handle_irq_event+0x59/0x80
  [<ffffffff81047355>] irq_exit+0xb5/0xc0
  [<ffffffff81484cd3>] do_IRQ+0x63/0xe0
  [<ffffffff8147a82f>] common_interrupt+0x6f/0x6f
  <EOI>  [<ffffffff810a1ea2>] ? mark_held_locks+0xb2/0x130
  [<ffffffff8147a46a>] ? _raw_spin_unlock_irq+0x3a/0x50
  [<ffffffff8147a460>] ? _raw_spin_unlock_irq+0x30/0x50
  [<ffffffff812148e5>] blk_delay_work+0x35/0x40
  [<ffffffff8106135d>] process_one_work+0x1fd/0x650
  [<ffffffff810612f2>] ? process_one_work+0x192/0x650
  [<ffffffff81061b50>] worker_thread+0x110/0x380
  [<ffffffff81061a40>] ? rescuer_thread+0x250/0x250
  [<ffffffff81067d6b>] kthread+0xdb/0xe0
  [<ffffffff81067c90>] ? kthread_create_on_node+0x140/0x140
  [<ffffffff8148345c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81067c90>] ? kthread_create_on_node+0x140/0x140
---[ end trace e3e3a22d8bb51cb7 ]---

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Netpoll triggers soft lockup
  2013-04-11 13:42 Netpoll triggers soft lockup Bart Van Assche
@ 2013-04-11 14:08 ` Neil Horman
  2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman
  1 sibling, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-11 14:08 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, netdev

On Thu, Apr 11, 2013 at 03:42:02PM +0200, Bart Van Assche wrote:
> Hi,
> 
> While testing a driver against kernel 3.9-rc6 I ran into a soft
> lockup triggered by sending lots of kernel messages to a remote
> system via netconsole. This behavior was probably introduced by
> commit ca99ca14c ("netpoll: protect napi_poll and poll_controller
> during dev_[open|close]"). That commit introduced a mutex in
> netpoll_poll_dev(), which can be called from interrupt context. Is
> there anyone who can tell me whether this is a bug in commit
> ca99ca14c or in netconsole ?
> 
Thanks for the report Bart, thats interesting.  You're right, the problem was
introduced by ca99ca14c, but it seems like a false positive to me.  The warning
is comming from DEBUG_LOCKS_WARN_ON in spin_lock_mutex.  It appears that the
intent here is to catch users of spin_lock_mutex (which is the mutex api), from
trying to use mutex locking in interrupt context as that results in a panic.  In
this case however, we're using the mutex_trylock call, which is guaranteed not
to sleep (which is the nature of the panic condition).  So we really don't need
to warn here.  I think the right thing to do in this case is to add a flag to
spin_lock_mutex that skips the warning if we're doing a trylock operation.  I'll
work up a patch shortly.

Thanks again!
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 13:42 Netpoll triggers soft lockup Bart Van Assche
  2013-04-11 14:08 ` Neil Horman
@ 2013-04-11 15:18 ` Neil Horman
  2013-04-11 15:54   ` Christoph Paasch
                     ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-11 15:18 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Neil Horman, David Miller, netdev

Bart, this patch should fix your problem.  Could you please test it and confirm?

Bart Van Assche recently reported a warning to me:

<IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
 [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
 [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
 [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
 [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
 [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
 [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
 [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
 [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
 [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
 [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
 [<ffffffff8146f9f6>] printk+0x4d/0x4f
 [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]

This resulted from my commit ca99ca14c which introduced a mutex_trylock
operation in a path that could execute in interrupt context.  When mutex
debugging is enabled, the above warns the user when we are in fact exectuting in
interrupt context.

I think this is a false positive however.  The check is intended to catch users
who might be issuing sleeping calls in irq context, but the use of mutex_trylock
here is guaranteed not to sleep.

We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
with a __might_sleep call in the appropriate parent mutex operations, but for
the sake of effiency (which It seems is why the check was put in the spin lock
code only when debug is enabled), lets split the spin_lock_mutex call into two
components, where the outer component does the debug checking.  Then
mutex_trylock can just call the inner part as its callable from irq context
safely.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
CC: Bart Van Assche <bvanassche@acm.org>
CC: David Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org
---
 kernel/mutex-debug.h | 11 +++++++++--
 kernel/mutex.c       |  4 ++--
 kernel/mutex.h       |  2 ++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/mutex-debug.h b/kernel/mutex-debug.h
index 0799fd3..bd0d80a 100644
--- a/kernel/mutex-debug.h
+++ b/kernel/mutex-debug.h
@@ -37,13 +37,20 @@ static inline void mutex_clear_owner(struct mutex *lock)
 	lock->owner = NULL;
 }
 
-#define spin_lock_mutex(lock, flags)			\
+#define spin_lock_mutex_raw(lock, flags)		\
 	do {						\
 		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
 							\
-		DEBUG_LOCKS_WARN_ON(in_interrupt());	\
 		local_irq_save(flags);			\
 		arch_spin_lock(&(lock)->rlock.raw_lock);\
+	} while (0)
+
+#define spin_lock_mutex(lock, flags)			\
+	do {						\
+		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
+							\
+		DEBUG_LOCKS_WARN_ON(in_interrupt());	\
+		spin_lock_mutex_raw(lock, flags);	\
 		DEBUG_LOCKS_WARN_ON(l->magic != l);	\
 	} while (0)
 
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 52f2301..5c012c7 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -431,7 +431,7 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
 	unsigned long flags;
 	int prev;
 
-	spin_lock_mutex(&lock->wait_lock, flags);
+	spin_lock_mutex_raw(&lock->wait_lock, flags);
 
 	prev = atomic_xchg(&lock->count, -1);
 	if (likely(prev == 1)) {
@@ -443,7 +443,7 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
 	if (likely(list_empty(&lock->wait_list)))
 		atomic_set(&lock->count, 0);
 
-	spin_unlock_mutex(&lock->wait_lock, flags);
+	spin_unlock_mutex_raw(&lock->wait_lock, flags);
 
 	return prev == 1;
 }
diff --git a/kernel/mutex.h b/kernel/mutex.h
index 4115fbf..af6ffb4 100644
--- a/kernel/mutex.h
+++ b/kernel/mutex.h
@@ -11,6 +11,8 @@
 
 #define spin_lock_mutex(lock, flags) \
 		do { spin_lock(lock); (void)(flags); } while (0)
+#define spin_lock_mutex_raw(lock, flags) spin_lock_mutex((lock), (flags))
+
 #define spin_unlock_mutex(lock, flags) \
 		do { spin_unlock(lock); (void)(flags); } while (0)
 #define mutex_remove_waiter(lock, waiter, ti) \
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman
@ 2013-04-11 15:54   ` Christoph Paasch
  2013-04-11 17:04     ` Neil Horman
  2013-04-11 15:57   ` Eric Dumazet
  2013-04-11 17:31   ` Bart Van Assche
  2 siblings, 1 reply; 34+ messages in thread
From: Christoph Paasch @ 2013-04-11 15:54 UTC (permalink / raw)
  To: Neil Horman; +Cc: Bart Van Assche, David Miller, netdev

On Thursday 11 April 2013 11:18:06 Neil Horman wrote:
> Bart, this patch should fix your problem.  Could you please test it and
> confirm?
> 
> Bart Van Assche recently reported a warning to me:
> 
> <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
>  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
>  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
>  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
>  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
>  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
>  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
>  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
>  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
>  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
>  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
>  [<ffffffff8146f9f6>] printk+0x4d/0x4f
>  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
> 
> This resulted from my commit ca99ca14c which introduced a mutex_trylock
> operation in a path that could execute in interrupt context.  When mutex
> debugging is enabled, the above warns the user when we are in fact
> exectuting in interrupt context.
> 
> I think this is a false positive however.  The check is intended to catch
> users who might be issuing sleeping calls in irq context, but the use of
> mutex_trylock here is guaranteed not to sleep.

Even if he does not sleep, may we still hit a deadlock like the following:

netpoll_rx_disable() calls mutex_lock(), who ends up in __mutex_lock_common, 
calling spin_lock_mutex().

Immediatly after that, on the same CPU, the interrupt comes and 
netpoll_poll_dev calls mutex_trylock and ends up also calling 
spin_lock_mutex(). Now, it seems to me that we are deadlocked - the interrupt 
is spinning on the lock, because netpoll_rx_disable() already took it.

Or maybe I am missing something?


Cheers,
Christoph

-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
UCLouvain
--

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman
  2013-04-11 15:54   ` Christoph Paasch
@ 2013-04-11 15:57   ` Eric Dumazet
  2013-04-11 16:56     ` Neil Horman
  2013-04-11 17:31   ` Bart Van Assche
  2 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2013-04-11 15:57 UTC (permalink / raw)
  To: Neil Horman; +Cc: Bart Van Assche, David Miller, netdev

On Thu, 2013-04-11 at 11:18 -0400, Neil Horman wrote:
> Bart, this patch should fix your problem.  Could you please test it and confirm?
> 
> Bart Van Assche recently reported a warning to me:
> 
> <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
>  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
>  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
>  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
>  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
>  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
>  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
>  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
>  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
>  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
>  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
>  [<ffffffff8146f9f6>] printk+0x4d/0x4f
>  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
> 
> This resulted from my commit ca99ca14c which introduced a mutex_trylock
> operation in a path that could execute in interrupt context.  When mutex
> debugging is enabled, the above warns the user when we are in fact exectuting in
> interrupt context.
> 
> I think this is a false positive however.  The check is intended to catch users
> who might be issuing sleeping calls in irq context, but the use of mutex_trylock
> here is guaranteed not to sleep.
> 
> We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
> with a __might_sleep call in the appropriate parent mutex operations, but for
> the sake of effiency (which It seems is why the check was put in the spin lock
> code only when debug is enabled), lets split the spin_lock_mutex call into two
> components, where the outer component does the debug checking.  Then
> mutex_trylock can just call the inner part as its callable from irq context
> safely.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: Bart Van Assche <bvanassche@acm.org>
> CC: Bart Van Assche <bvanassche@acm.org>
> CC: David Miller <davem@davemloft.net>
> CC: netdev@vger.kernel.org

Please post your patch to lkml ?

Thanks !

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 15:57   ` Eric Dumazet
@ 2013-04-11 16:56     ` Neil Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-11 16:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Bart Van Assche, David Miller, netdev

On Thu, Apr 11, 2013 at 08:57:01AM -0700, Eric Dumazet wrote:
> On Thu, 2013-04-11 at 11:18 -0400, Neil Horman wrote:
> > Bart, this patch should fix your problem.  Could you please test it and confirm?
> > 
> > Bart Van Assche recently reported a warning to me:
> > 
> > <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
> >  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
> >  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
> >  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
> >  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
> >  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
> >  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
> >  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
> >  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
> >  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
> >  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
> >  [<ffffffff8146f9f6>] printk+0x4d/0x4f
> >  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
> > 
> > This resulted from my commit ca99ca14c which introduced a mutex_trylock
> > operation in a path that could execute in interrupt context.  When mutex
> > debugging is enabled, the above warns the user when we are in fact exectuting in
> > interrupt context.
> > 
> > I think this is a false positive however.  The check is intended to catch users
> > who might be issuing sleeping calls in irq context, but the use of mutex_trylock
> > here is guaranteed not to sleep.
> > 
> > We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
> > with a __might_sleep call in the appropriate parent mutex operations, but for
> > the sake of effiency (which It seems is why the check was put in the spin lock
> > code only when debug is enabled), lets split the spin_lock_mutex call into two
> > components, where the outer component does the debug checking.  Then
> > mutex_trylock can just call the inner part as its callable from irq context
> > safely.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > Reported-by: Bart Van Assche <bvanassche@acm.org>
> > CC: Bart Van Assche <bvanassche@acm.org>
> > CC: David Miller <davem@davemloft.net>
> > CC: netdev@vger.kernel.org
> 
> Please post your patch to lkml ?
> 
I will, I just wanted Bart to test it first, and CCed everyone that he sent his
origional note too.

Thanks
Neil

> Thanks !
> 
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 15:54   ` Christoph Paasch
@ 2013-04-11 17:04     ` Neil Horman
  2013-04-11 17:51       ` Christoph Paasch
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-11 17:04 UTC (permalink / raw)
  To: g; +Cc: Bart Van Assche, David Miller, netdev

On Thu, Apr 11, 2013 at 05:54:28PM +0200, Christoph Paasch wrote:
> On Thursday 11 April 2013 11:18:06 Neil Horman wrote:
> > Bart, this patch should fix your problem.  Could you please test it and
> > confirm?
> > 
> > Bart Van Assche recently reported a warning to me:
> > 
> > <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
> >  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
> >  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
> >  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
> >  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
> >  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
> >  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
> >  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
> >  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
> >  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
> >  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
> >  [<ffffffff8146f9f6>] printk+0x4d/0x4f
> >  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
> > 
> > This resulted from my commit ca99ca14c which introduced a mutex_trylock
> > operation in a path that could execute in interrupt context.  When mutex
> > debugging is enabled, the above warns the user when we are in fact
> > exectuting in interrupt context.
> > 
> > I think this is a false positive however.  The check is intended to catch
> > users who might be issuing sleeping calls in irq context, but the use of
> > mutex_trylock here is guaranteed not to sleep.
> 
> Even if he does not sleep, may we still hit a deadlock like the following:
> 
> netpoll_rx_disable() calls mutex_lock(), who ends up in __mutex_lock_common, 
> calling spin_lock_mutex().
> 
> Immediatly after that, on the same CPU, the interrupt comes and 
> netpoll_poll_dev calls mutex_trylock and ends up also calling 
> spin_lock_mutex(). Now, it seems to me that we are deadlocked - the interrupt 
> is spinning on the lock, because netpoll_rx_disable() already took it.
> 
> Or maybe I am missing something?
> 
I dont believe the deadlock you describe can happen.  The spin_lock_mutex
operation disables irqs on the local cpu with local_irq_save, so we won't loose
the cpu while we're holding the spinlock.  Likewise we don't restore the irq
flags until after we release said lock.  Once we have the mutex, if we're
preempted by another path that goes through the netpoll_poll_dev path, then we
hit the trylock api call.  The spinlock is either released or held on another
cpu (read: no deadlock), and if the mutex is held, then the trylock simply
fails.

Thanks
Neil

> 
> Cheers,
> Christoph
> 
> -- 
> IP Networking Lab --- http://inl.info.ucl.ac.be
> MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
> UCLouvain
> --
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman
  2013-04-11 15:54   ` Christoph Paasch
  2013-04-11 15:57   ` Eric Dumazet
@ 2013-04-11 17:31   ` Bart Van Assche
  2013-04-11 17:52     ` Neil Horman
  2013-04-11 19:14     ` Neil Horman
  2 siblings, 2 replies; 34+ messages in thread
From: Bart Van Assche @ 2013-04-11 17:31 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev

On 04/11/13 17:18, Neil Horman wrote:
> Bart, this patch should fix your problem.  Could you please test it and confirm?

The fails to build on my setup after having applied that patch:

kernel/mutex.c: In function ‘__mutex_trylock_slowpath’:
kernel/mutex.c:446:2: error: implicit declaration of function 
‘spin_unlock_mutex_raw’ [-Werror=implicit-function-declaration]

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 17:04     ` Neil Horman
@ 2013-04-11 17:51       ` Christoph Paasch
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Paasch @ 2013-04-11 17:51 UTC (permalink / raw)
  To: Neil Horman; +Cc: g, Bart Van Assche, David Miller, netdev

On Thursday 11 April 2013 13:04:05 Neil Horman wrote:
> I dont believe the deadlock you describe can happen.  The spin_lock_mutex
> operation disables irqs on the local cpu with local_irq_save, so we won't
> loose the cpu while we're holding the spinlock.  Likewise we don't restore
> the irq flags until after we release said lock.  Once we have the mutex, if
> we're preempted by another path that goes through the netpoll_poll_dev
> path, then we hit the trylock api call.  The spinlock is either released or
> held on another cpu (read: no deadlock), and if the mutex is held, then the
> trylock simply fails.

Thanks for explaining this to me. I was looking for a local_bh_disable() - but 
local_irq_save does the job too.


Thanks,
Christoph

-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
UCLouvain
--

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 17:31   ` Bart Van Assche
@ 2013-04-11 17:52     ` Neil Horman
  2013-04-11 19:14     ` Neil Horman
  1 sibling, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-11 17:52 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, netdev

On Thu, Apr 11, 2013 at 07:31:07PM +0200, Bart Van Assche wrote:
> On 04/11/13 17:18, Neil Horman wrote:
> >Bart, this patch should fix your problem.  Could you please test it and confirm?
> 
> The fails to build on my setup after having applied that patch:
> 
> kernel/mutex.c: In function ‘__mutex_trylock_slowpath’:
> kernel/mutex.c:446:2: error: implicit declaration of function
> ‘spin_unlock_mutex_raw’ [-Werror=implicit-function-declaration]
> 
Strage, not sure why, although I didn't explicitly test it myself.  I'll take a
look this evening and fix it up for you.
Neil

> Bart.
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 17:31   ` Bart Van Assche
  2013-04-11 17:52     ` Neil Horman
@ 2013-04-11 19:14     ` Neil Horman
  2013-04-12  6:27       ` Bart Van Assche
  1 sibling, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-11 19:14 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, netdev

On Thu, Apr 11, 2013 at 07:31:07PM +0200, Bart Van Assche wrote:
> On 04/11/13 17:18, Neil Horman wrote:
> >Bart, this patch should fix your problem.  Could you please test it and confirm?
> 
> The fails to build on my setup after having applied that patch:
> 
> kernel/mutex.c: In function ‘__mutex_trylock_slowpath’:
> kernel/mutex.c:446:2: error: implicit declaration of function
> ‘spin_unlock_mutex_raw’ [-Werror=implicit-function-declaration]
> 
> Bart.
> 


Sorry, turned out to be a stupid typo on my part.  This is a fixed version, I've
verified that it builds with and without CONFIG_DEBUG_MUTEXS enabled


commit 2f78b4659f119302849f2b753cc1fe5ee87125bc
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Thu Apr 11 11:05:12 2013 -0400

    spinlock: split out debugging check from spin_lock_mutex
    
    Bart, this patch should fix your problem.  Could you please test it and confirm?
    
    Bart Van Assche recently reported a warning to me:
    
    <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
     [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
     [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
     [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
     [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
     [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
     [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
     [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
     [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
     [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
     [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
     [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
     [<ffffffff8146f9f6>] printk+0x4d/0x4f
     [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
    
    This resulted from my commit ca99ca14c which introduced a mutex_trylock
    operation in a path that could execute in interrupt context.  When mutex
    debugging is enabled, the above warns the user when we are in fact exectuting in
    interrupt context.
    
    I think this is a false positive however.  The check is intended to catch users
    who might be issuing sleeping calls in irq context, but the use of mutex_trylock
    here is guaranteed not to sleep.
    
    We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
    with a __might_sleep call in the appropriate parent mutex operations, but for
    the sake of effiency (which It seems is why the check was put in the spin lock
    code only when debug is enabled), lets split the spin_lock_mutex call into two
    components, where the outer component does the debug checking.  Then
    mutex_trylock can just call the inner part as its callable from irq context
    safely.
    
    Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    Reported-by: Bart Van Assche <bvanassche@acm.org>
    CC: Bart Van Assche <bvanassche@acm.org>
    CC: David Miller <davem@davemloft.net>
    CC: netdev@vger.kernel.org

diff --git a/kernel/mutex-debug.h b/kernel/mutex-debug.h
index 0799fd3..aac0858 100644
--- a/kernel/mutex-debug.h
+++ b/kernel/mutex-debug.h
@@ -37,13 +37,19 @@ static inline void mutex_clear_owner(struct mutex *lock)
 	lock->owner = NULL;
 }
 
+#define spin_lock_mutex_raw(lock, flags)		\
+	do {						\
+							\
+		local_irq_save(flags);			\
+		arch_spin_lock(&(lock)->rlock.raw_lock);\
+	} while (0)
+
 #define spin_lock_mutex(lock, flags)			\
 	do {						\
 		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
 							\
 		DEBUG_LOCKS_WARN_ON(in_interrupt());	\
-		local_irq_save(flags);			\
-		arch_spin_lock(&(lock)->rlock.raw_lock);\
+		spin_lock_mutex_raw(lock, flags);	\
 		DEBUG_LOCKS_WARN_ON(l->magic != l);	\
 	} while (0)
 
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 52f2301..a62a5f8 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -431,7 +431,7 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
 	unsigned long flags;
 	int prev;
 
-	spin_lock_mutex(&lock->wait_lock, flags);
+	spin_lock_mutex_raw(&lock->wait_lock, flags);
 
 	prev = atomic_xchg(&lock->count, -1);
 	if (likely(prev == 1)) {
diff --git a/kernel/mutex.h b/kernel/mutex.h
index 4115fbf..af6ffb4 100644
--- a/kernel/mutex.h
+++ b/kernel/mutex.h
@@ -11,6 +11,8 @@
 
 #define spin_lock_mutex(lock, flags) \
 		do { spin_lock(lock); (void)(flags); } while (0)
+#define spin_lock_mutex_raw(lock, flags) spin_lock_mutex((lock), (flags))
+
 #define spin_unlock_mutex(lock, flags) \
 		do { spin_unlock(lock); (void)(flags); } while (0)
 #define mutex_remove_waiter(lock, waiter, ti) \

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-11 19:14     ` Neil Horman
@ 2013-04-12  6:27       ` Bart Van Assche
  2013-04-12 11:32         ` Neil Horman
  0 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2013-04-12  6:27 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev, Ingo Molnar

On 04/11/13 21:14, Neil Horman wrote:
>      This resulted from my commit ca99ca14c which introduced a mutex_trylock
>      operation in a path that could execute in interrupt context.  When mutex
>      debugging is enabled, the above warns the user when we are in fact exectuting in
>      interrupt context.
>
>      I think this is a false positive however.  The check is intended to catch users
>      who might be issuing sleeping calls in irq context, but the use of mutex_trylock
>      here is guaranteed not to sleep.
>
>      We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
>      with a __might_sleep call in the appropriate parent mutex operations, but for
>      the sake of effiency (which It seems is why the check was put in the spin lock
>      code only when debug is enabled), lets split the spin_lock_mutex call into two
>      components, where the outer component does the debug checking.  Then
>      mutex_trylock can just call the inner part as its callable from irq context
>      safely.

Sorry but I'm not yet convinced that it's safe to invoke mutex_trylock() 
from IRQ context. Please have a look at the implementation of 
mutex_set_owner(), which is invoked by mutex_trylock(). 
mutex_set_owner() stores the value of the "current" pointer into 
lock->owner. The value of "current" does not have a meaning in IRQ context.

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-12  6:27       ` Bart Van Assche
@ 2013-04-12 11:32         ` Neil Horman
  2013-04-12 14:01           ` Bart Van Assche
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-12 11:32 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar

On Fri, Apr 12, 2013 at 08:27:31AM +0200, Bart Van Assche wrote:
> On 04/11/13 21:14, Neil Horman wrote:
> >     This resulted from my commit ca99ca14c which introduced a mutex_trylock
> >     operation in a path that could execute in interrupt context.  When mutex
> >     debugging is enabled, the above warns the user when we are in fact exectuting in
> >     interrupt context.
> >
> >     I think this is a false positive however.  The check is intended to catch users
> >     who might be issuing sleeping calls in irq context, but the use of mutex_trylock
> >     here is guaranteed not to sleep.
> >
> >     We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
> >     with a __might_sleep call in the appropriate parent mutex operations, but for
> >     the sake of effiency (which It seems is why the check was put in the spin lock
> >     code only when debug is enabled), lets split the spin_lock_mutex call into two
> >     components, where the outer component does the debug checking.  Then
> >     mutex_trylock can just call the inner part as its callable from irq context
> >     safely.
> 
> Sorry but I'm not yet convinced that it's safe to invoke
> mutex_trylock() from IRQ context. Please have a look at the
> implementation of mutex_set_owner(), which is invoked by
> mutex_trylock(). mutex_set_owner() stores the value of the "current"
> pointer into lock->owner. The value of "current" does not have a
> meaning in IRQ context.
> 
> Bart.
> 
> 


Thats irrelevant, at least as far as deadlock safety is concerned.  current will
be set to the process tht was running when we were interrupted, but it won't
change during the course of the irq handler, which is all that matters.  The
lock->owner field is used for optimistic spinning.  The worst that will happen
is, if CONFIG_MUTEX_SPIN_ON_OWNER is configured, another process may wait on
this mutex, spinning on the wrong task to release it (see mutex_spin_on_owner).
Thats not efficient, but its not deadlock prone, and its not even that
inefficient, when you consider that the critical path in the netpoll code is
relatively short.  And using the trylock here is certainly preferable to the
memory corruption that was possible previously.

Regards
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-12 11:32         ` Neil Horman
@ 2013-04-12 14:01           ` Bart Van Assche
  2013-04-12 18:45             ` Neil Horman
  2013-04-19  8:38             ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar
  0 siblings, 2 replies; 34+ messages in thread
From: Bart Van Assche @ 2013-04-12 14:01 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev, Ingo Molnar

On 04/12/13 13:32, Neil Horman wrote:
> On Fri, Apr 12, 2013 at 08:27:31AM +0200, Bart Van Assche wrote:
>> On 04/11/13 21:14, Neil Horman wrote:
>>>      This resulted from my commit ca99ca14c which introduced a mutex_trylock
>>>      operation in a path that could execute in interrupt context.  When mutex
>>>      debugging is enabled, the above warns the user when we are in fact exectuting in
>>>      interrupt context.
>>>
>>>      I think this is a false positive however.  The check is intended to catch users
>>>      who might be issuing sleeping calls in irq context, but the use of mutex_trylock
>>>      here is guaranteed not to sleep.
>>>
>>>      We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
>>>      with a __might_sleep call in the appropriate parent mutex operations, but for
>>>      the sake of effiency (which It seems is why the check was put in the spin lock
>>>      code only when debug is enabled), lets split the spin_lock_mutex call into two
>>>      components, where the outer component does the debug checking.  Then
>>>      mutex_trylock can just call the inner part as its callable from irq context
>>>      safely.
>>
>> Sorry but I'm not yet convinced that it's safe to invoke
>> mutex_trylock() from IRQ context. Please have a look at the
>> implementation of mutex_set_owner(), which is invoked by
>> mutex_trylock(). mutex_set_owner() stores the value of the "current"
>> pointer into lock->owner. The value of "current" does not have a
>> meaning in IRQ context.
> 
> Thats irrelevant, at least as far as deadlock safety is concerned.  current will
> be set to the process that was running when we were interrupted, but it won't
> change during the course of the irq handler, which is all that matters.  The
> lock->owner field is used for optimistic spinning.  The worst that will happen
> is, if CONFIG_MUTEX_SPIN_ON_OWNER is configured, another process may wait on
> this mutex, spinning on the wrong task to release it (see mutex_spin_on_owner).
> Thats not efficient, but its not deadlock prone, and its not even that
> inefficient, when you consider that the critical path in the netpoll code is
> relatively short.  And using the trylock here is certainly preferable to the
> memory corruption that was possible previously.

I think there is another issue with invoking mutex_trylock() and mutex_unlock()
from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled
__mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and
hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe.
Any thoughts about that ?

With v2 of your patch and CONFIG_DEBUG_MUTEXES enabled I get the warning below:

------------[ cut here ]------------
WARNING: at kernel/mutex.c:313 __mutex_unlock_slowpath+0x157/0x160()
Pid: 181, comm: kworker/0:1H Tainted: G           O 3.9.0-rc6-debug+ #1
Call Trace:
<IRQ>  [<ffffffff8103c3ef>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8103c44a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81432047>] __mutex_unlock_slowpath+0x157/0x160
[<ffffffff8143205e>] mutex_unlock+0xe/0x10
[<ffffffff8136d031>] netpoll_poll_dev+0x111/0x9a0
[<ffffffff81345f32>] ? __alloc_skb+0x82/0x2a0
[<ffffffff8136dac5>] netpoll_send_skb_on_dev+0x205/0x3b0
[<ffffffff8136e00a>] netpoll_send_udp+0x28a/0x3a0
[<ffffffffa0524843>] ? write_msg+0x53/0x110 [netconsole]
[<ffffffffa05248bf>] write_msg+0xcf/0x110 [netconsole]
[<ffffffff8103d7f1>] call_console_drivers.constprop.16+0xa1/0x120
[<ffffffff8103e848>] console_unlock+0x3f8/0x450
[<ffffffff8103ecce>] vprintk_emit+0x1ee/0x510
[<ffffffff812d1f2c>] dev_vprintk_emit+0x5c/0x70
[<ffffffff810ff047>] ? mempool_free_slab+0x17/0x20
[<ffffffff810ff047>] ? mempool_free_slab+0x17/0x20
[<ffffffff81145922>] ? kmem_cache_free+0x1c2/0x1d0
[<ffffffff812d1f79>] dev_printk_emit+0x39/0x40
[<ffffffff811f6702>] ? blk_update_request+0x3d2/0x520
[<ffffffffa000a110>] ? device_block+0x10/0x10 [scsi_mod]
[<ffffffff812d2a7e>] __dev_printk+0x5e/0x90
[<ffffffff812d2e05>] dev_printk+0x45/0x50
[<ffffffffa000b5a7>] scsi_io_completion+0x277/0x6c0 [scsi_mod]
[<ffffffffa000107d>] scsi_finish_command+0xbd/0x120 [scsi_mod]
[<ffffffffa000b22f>] scsi_softirq_done+0x13f/0x160 [scsi_mod]
[<ffffffff811fd4c0>] blk_done_softirq+0x80/0xa0
[<ffffffff81044d61>] __do_softirq+0x101/0x280
[<ffffffff81045095>] irq_exit+0xb5/0xc0
[<ffffffff8143f2be>] smp_apic_timer_interrupt+0x6e/0x99
[<ffffffff8143e5ef>] apic_timer_interrupt+0x6f/0x80
<EOI>  [<ffffffff810986b2>] ? mark_held_locks+0xb2/0x130
[<ffffffff8143516a>] ? _raw_spin_unlock_irq+0x3a/0x50
[<ffffffff81435160>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8120b074>] cfq_kick_queue+0x44/0x50
[<ffffffff81059e5d>] process_one_work+0x1fd/0x510
[<ffffffff81059df2>] ? process_one_work+0x192/0x510
[<ffffffff8105bccf>] worker_thread+0x10f/0x380
[<ffffffff8105bbc0>] ? busy_worker_rebind_fn+0xb0/0xb0
[<ffffffff8106209b>] kthread+0xdb/0xe0
[<ffffffff81061fc0>] ? kthread_create_on_node+0x140/0x140
[<ffffffff8143d95c>] ret_from_fork+0x7c/0xb0
[<ffffffff81061fc0>] ? kthread_create_on_node+0x140/0x140
---[ end trace dd7421d6dfb2c4ed ]---

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-12 14:01           ` Bart Van Assche
@ 2013-04-12 18:45             ` Neil Horman
  2013-04-13  7:35               ` Bart Van Assche
  2013-04-19  8:38             ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar
  1 sibling, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-12 18:45 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar

On Fri, Apr 12, 2013 at 04:01:04PM +0200, Bart Van Assche wrote:
> On 04/12/13 13:32, Neil Horman wrote:
> > On Fri, Apr 12, 2013 at 08:27:31AM +0200, Bart Van Assche wrote:
> >> On 04/11/13 21:14, Neil Horman wrote:
> >>>      This resulted from my commit ca99ca14c which introduced a mutex_trylock
> >>>      operation in a path that could execute in interrupt context.  When mutex
> >>>      debugging is enabled, the above warns the user when we are in fact exectuting in
> >>>      interrupt context.
> >>>
> >>>      I think this is a false positive however.  The check is intended to catch users
> >>>      who might be issuing sleeping calls in irq context, but the use of mutex_trylock
> >>>      here is guaranteed not to sleep.
> >>>
> >>>      We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
> >>>      with a __might_sleep call in the appropriate parent mutex operations, but for
> >>>      the sake of effiency (which It seems is why the check was put in the spin lock
> >>>      code only when debug is enabled), lets split the spin_lock_mutex call into two
> >>>      components, where the outer component does the debug checking.  Then
> >>>      mutex_trylock can just call the inner part as its callable from irq context
> >>>      safely.
> >>
> >> Sorry but I'm not yet convinced that it's safe to invoke
> >> mutex_trylock() from IRQ context. Please have a look at the
> >> implementation of mutex_set_owner(), which is invoked by
> >> mutex_trylock(). mutex_set_owner() stores the value of the "current"
> >> pointer into lock->owner. The value of "current" does not have a
> >> meaning in IRQ context.
> > 
> > Thats irrelevant, at least as far as deadlock safety is concerned.  current will
> > be set to the process that was running when we were interrupted, but it won't
> > change during the course of the irq handler, which is all that matters.  The
> > lock->owner field is used for optimistic spinning.  The worst that will happen
> > is, if CONFIG_MUTEX_SPIN_ON_OWNER is configured, another process may wait on
> > this mutex, spinning on the wrong task to release it (see mutex_spin_on_owner).
> > Thats not efficient, but its not deadlock prone, and its not even that
> > inefficient, when you consider that the critical path in the netpoll code is
> > relatively short.  And using the trylock here is certainly preferable to the
> > memory corruption that was possible previously.
> 
> I think there is another issue with invoking mutex_trylock() and mutex_unlock()
> from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled
> __mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and
> hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe.
> Any thoughts about that ?
> 
Yeah, its ugly, but in this specific case, its ok.  the netpoll code (in
netpoll_send_skb disables irq on the local cpu before entering the netpoll code
path any further, so whenver we frob this mutex from the local cpu, we're
guaranteed not to get pre-empted by an irq.

> With v2 of your patch and CONFIG_DEBUG_MUTEXES enabled I get the warning below:
> 
another instance of the same issue - its because we spin_lock_mutex in
mutex_unlock, rather than spin_lock_mutex_raw.  Can you fix it up by hand, or
shall I update the patch?

Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-12 18:45             ` Neil Horman
@ 2013-04-13  7:35               ` Bart Van Assche
  2013-04-13 12:03                 ` Neil Horman
  2013-04-15 14:16                 ` Neil Horman
  0 siblings, 2 replies; 34+ messages in thread
From: Bart Van Assche @ 2013-04-13  7:35 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev, Ingo Molnar

On 04/12/13 20:45, Neil Horman wrote:
> On Fri, Apr 12, 2013 at 04:01:04PM +0200, Bart Van Assche wrote:
>> On 04/12/13 13:32, Neil Horman wrote:
>> I think there is another issue with invoking mutex_trylock() and mutex_unlock()
>> from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled
>> __mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and
>> hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe.
>> Any thoughts about that ?
>>
> Yeah, its ugly, but in this specific case, its ok.  the netpoll code (in
> netpoll_send_skb disables irq on the local cpu before entering the netpoll code
> path any further, so whenver we frob this mutex from the local cpu, we're
> guaranteed not to get pre-empted by an irq.

As far as I know it is neither allowed nor safe to call 
netpoll_rx_disable() with IRQs disabled. But that function can lock the 
dev_lock mutex. What do you think will happen with 
CONFIG_DEBUG_MUTEXES=n if an interrupt occurs during the 
mutex_lock(&ni->dev_lock) call, that mutex_lock() call has already 
locked the mutex-internal spin lock via spin_lock() and mutex_trylock() 
is invoked from inside the interrupt ? Can that result in anything else 
than deadlock and "CPU stuck" messages ?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-13  7:35               ` Bart Van Assche
@ 2013-04-13 12:03                 ` Neil Horman
  2013-04-15 14:16                 ` Neil Horman
  1 sibling, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-13 12:03 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar

On Sat, Apr 13, 2013 at 09:35:07AM +0200, Bart Van Assche wrote:
> On 04/12/13 20:45, Neil Horman wrote:
> >On Fri, Apr 12, 2013 at 04:01:04PM +0200, Bart Van Assche wrote:
> >>On 04/12/13 13:32, Neil Horman wrote:
> >>I think there is another issue with invoking mutex_trylock() and mutex_unlock()
> >>from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled
> >>__mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and
> >>hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe.
> >>Any thoughts about that ?
> >>
> >Yeah, its ugly, but in this specific case, its ok.  the netpoll code (in
> >netpoll_send_skb disables irq on the local cpu before entering the netpoll code
> >path any further, so whenver we frob this mutex from the local cpu, we're
> >guaranteed not to get pre-empted by an irq.
> 
> As far as I know it is neither allowed nor safe to call
> netpoll_rx_disable() with IRQs disabled. But that function can lock
Where do you see netpoll_rx_disable getting called with irqs disabled?

> the dev_lock mutex. What do you think will happen with
> CONFIG_DEBUG_MUTEXES=n if an interrupt occurs during the
> mutex_lock(&ni->dev_lock) call, that mutex_lock() call has already
> locked the mutex-internal spin lock via spin_lock() and
> mutex_trylock() is invoked from inside the interrupt ? Can that
> result in anything else than deadlock and "CPU stuck" messages ?
> 
Please go back and look more closely.  Where do you see a mutex_lock call
getting made with interrupts enabled?

Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-13  7:35               ` Bart Van Assche
  2013-04-13 12:03                 ` Neil Horman
@ 2013-04-15 14:16                 ` Neil Horman
       [not found]                   ` <CAO+b5-oBfH3M0dnrQSs-p1BF_5hKy2tsU-dD=EP9+S=iqPs5ew@mail.gmail.com>
  1 sibling, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-15 14:16 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar

On Sat, Apr 13, 2013 at 09:35:07AM +0200, Bart Van Assche wrote:
> On 04/12/13 20:45, Neil Horman wrote:
> >On Fri, Apr 12, 2013 at 04:01:04PM +0200, Bart Van Assche wrote:
> >>On 04/12/13 13:32, Neil Horman wrote:
> >>I think there is another issue with invoking mutex_trylock() and mutex_unlock()
> >>from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled
> >>__mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and
> >>hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe.
> >>Any thoughts about that ?
> >>
> >Yeah, its ugly, but in this specific case, its ok.  the netpoll code (in
> >netpoll_send_skb disables irq on the local cpu before entering the netpoll code
> >path any further, so whenver we frob this mutex from the local cpu, we're
> >guaranteed not to get pre-empted by an irq.
> 
> As far as I know it is neither allowed nor safe to call
> netpoll_rx_disable() with IRQs disabled. But that function can lock
> the dev_lock mutex. What do you think will happen with
> CONFIG_DEBUG_MUTEXES=n if an interrupt occurs during the
> mutex_lock(&ni->dev_lock) call, that mutex_lock() call has already
> locked the mutex-internal spin lock via spin_lock() and
> mutex_trylock() is invoked from inside the interrupt ? Can that
> result in anything else than deadlock and "CPU stuck" messages ?
> 
> Thanks,
> 
> Bart.
> 


So I've been doing some reading over the weekend, and as a result I've been
developing some concerns about how mutex_trylock works.  I'm still convinced its
safe to use with the changes we're discussing here, but all the documentation
regarding mutex_trylock seems to say it simply shouldn't be used in interrupt
context (though it never goes into detail as to why).  As near as I can tell,
its because the spin locks that mutexes use are locked without disabling
interrupts (the implication being that, as we see in the patch we're looking at
here), you're safe if you disable irqs independently).  Ironically, Ingo patched
the mutex debug variants back in 2006 in commit
1fb00c6cbd8356f43b46322742f3c01c2a1f02da to disable irqs to work around a bug,
the result being that the debug variant gives you a WARNING about using
mutex_trylock in interrupts, when its actually safe, while the non-debug variant
provides no warning, but is actually prone to deadlock.

There are a few other use cases in the kernel in which mutex_trylock may be used
in an interrupt context (crash_kexec being the most notable, as its used for the
same purposes that we're using it in netpoll, and where I got the inspiration
for my previous netpoll changes.

The more I look at this, the more I'm starting to think that, in addition to the
debug changes I've got proposed here, we should additionally convert the
non-debug spin_lock_mutex variants to disable irqs while manipulating the
mutexes.  That would make mutex_trylock safe, and give us a nice mechanism to
create exclusion between sleepable paths that should not run atomically and
interrupt context paths.

Ingo, can you comment on these thoughts above please.  I'd like to get your
opinion on this prior to spinning up a new patch if I could please.

Thanks!
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
       [not found]                   ` <CAO+b5-oBfH3M0dnrQSs-p1BF_5hKy2tsU-dD=EP9+S=iqPs5ew@mail.gmail.com>
@ 2013-04-16 17:24                     ` Neil Horman
  2013-04-18 19:29                       ` Neil Horman
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-16 17:24 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar

On Tue, Apr 16, 2013 at 07:56:12AM -0700, Bart Van Assche wrote:
> On Mon, Apr 15, 2013 at 7:16 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > So I've been doing some reading over the weekend, and as a result I've been
> > developing some concerns about how mutex_trylock works.  I'm still
> > convinced its
> > safe to use with the changes we're discussing here, but all the
> > documentation
> > regarding mutex_trylock seems to say it simply shouldn't be used in
> > interrupt
> > context (though it never goes into detail as to why).  As near as I can
> > tell,
> > its because the spin locks that mutexes use are locked without disabling
> > interrupts (the implication being that, as we see in the patch we're
> > looking at
> > here), you're safe if you disable irqs independently).  Ironically, Ingo
> > patched
> > the mutex debug variants back in 2006 in commit
> > 1fb00c6cbd8356f43b46322742f3c01c2a1f02da to disable irqs to work around a
> > bug,
> > the result being that the debug variant gives you a WARNING about using
> > mutex_trylock in interrupts, when its actually safe, while the non-debug
> > variant
> > provides no warning, but is actually prone to deadlock.
> >
> > There are a few other use cases in the kernel in which mutex_trylock may
> > be used
> > in an interrupt context (crash_kexec being the most notable, as its used
> > for the
> > same purposes that we're using it in netpoll, and where I got the
> > inspiration
> > for my previous netpoll changes.
> >
> > The more I look at this, the more I'm starting to think that, in addition
> > to the
> > debug changes I've got proposed here, we should additionally convert the
> > non-debug spin_lock_mutex variants to disable irqs while manipulating the
> > mutexes.  That would make mutex_trylock safe, and give us a nice mechanism
> > to
> > create exclusion between sleepable paths that should not run atomically and
> > interrupt context paths.
> >
> > Ingo, can you comment on these thoughts above please.  I'd like to get your
> > opinion on this prior to spinning up a new patch if I could please.
> >
> 
> How about introducing a second type of mutex, and only using changing the
> spin_lock() into spin_lock_irq() for that new type of mutex ? That way no
> performance penalty would be introduced for existing mutex users and
> mutex_trylock() could be made safe for use inside IRQ-context for the new
> mutex type only.
> 
> Bart.
Thats a thought, I've actually been thinking about doing this with a completion
variable and a flag as well, but perhaps a mutex variant would be a better api
choice.  I'll put that together as an alternate, but I'd really like to see
what Ingo has to say regarding just switching the spin_lock to spin_lock_irqsave
for all cases as being a mutex, I'm not sure adding in irq protection is going
to be that much of a performance hit.

Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-16 17:24                     ` Neil Horman
@ 2013-04-18 19:29                       ` Neil Horman
  2013-04-22 20:12                         ` Neil Horman
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-18 19:29 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar

Well, I guess Ingo doesn't have time  to respond here.  Given that, I've been
looking at other ways of adressing this problem (as the more I think about it,
the more I think changing mutex semantics, even to be more accepting, is going
to be met with lots of friction).  I've come up with the below change, and it
works well for me (I've hammered it with ifup/downs while sending messages over
netconsole for 24 hours now without any incident).  I'm not sure I like all the
details yet (I modified the rx_flag field to be a gneral flags field that uses
test/set/clear_bit, which I don't think is necessecary, but it seems saner to
me).  the meat of the change though is that instead of a mutex, we use two flags
that, respectively, inhibit netpoll usage, and indicate when we're in a netpoll
service path.  Using these two flags, in conjunction with a wait_queue, we can
signal to executors in the netpoll path that we're inhibiting its use and expect
to any user in the path to complete and wake up any contexts in the inhibiting
path.  Seems to work well for me.  Let me know what you think, and I'll make any
desired changes and post an official patch.

Best
Neil


commit c277d54d471572b0ec7b27812d205fb303ad44f9
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Wed Apr 17 15:59:44 2013 -0400

    netpoll: convert mutex into a wait_event
    
    Bart Van Assche recently reported a warning to me:
    
    <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
     [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
     [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
     [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
     [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
     [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
     [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
     [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
     [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
     [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
     [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
     [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
     [<ffffffff8146f9f6>] printk+0x4d/0x4f
     [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
    
    This resulted from my commit ca99ca14c which introduced a mutex_trylock
    operation in a path that could execute in interrupt context.  When mutex
    debugging is enabled, the above warns the user when we are in fact exectuting in
    interrupt context.
    
    After some discussion, I'm not sure a mutex is the right choice here, as it
    seems unsafe to use (deadlock possibilities due to implementation details).
    Instead lets convert the mutex to a mechanism that makes use of a
    wait_queue_head and some flag bits.  The dev_close paths can now set a bit to
    suspend netpoll and then sleep waiting for the NETPOLL_POLLING flag to clear,
    which allows for flushing of any in-progress execution in the
    netpoll_poll_dev_path.
    
    Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    Reported-by: Bart Van Assche <bvanassche@acm.org>
    CC: Bart Van Assche <bvanassche@acm.org>
    CC: David Miller <davem@davemloft.net>
    CC: netdev@vger.kernel.org

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 9d7d8c6..52e1d14 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -35,12 +35,19 @@ struct netpoll {
 	struct work_struct cleanup_work;
 };
 
+/* Flag defs for netpoll_info->flags */
+
+#define NETPOLL_RX_ENABLED	1
+#define NETPOLL_RX_DROP		2
+#define NETPOLL_SUSPENDED	3
+#define NETPOLL_POLLING		4
+
 struct netpoll_info {
 	atomic_t refcnt;
 
-	unsigned long rx_flags;
+	unsigned long flags;
 	spinlock_t rx_lock;
-	struct mutex dev_lock;
+	wait_queue_head_t dev_wait;
 	struct list_head rx_np; /* netpolls that registered an rx_hook */
 
 	struct sk_buff_head neigh_tx; /* list of neigh requests to reply to */
@@ -53,11 +60,11 @@ struct netpoll_info {
 };
 
 #ifdef CONFIG_NETPOLL
-extern int netpoll_rx_disable(struct net_device *dev);
-extern void netpoll_rx_enable(struct net_device *dev);
+extern int netpoll_suspend(struct net_device *dev);
+extern void netpoll_resume(struct net_device *dev);
 #else
-static inline int netpoll_rx_disable(struct net_device *dev) { return 0; }
-static inline void netpoll_rx_enable(struct net_device *dev) { return; }
+static inline int netpoll_suspend(struct net_device *dev) { return 0; }
+static inline void netpoll_resume(struct net_device *dev) { return; }
 #endif
 
 void netpoll_send_udp(struct netpoll *np, const char *msg, int len);
@@ -88,7 +95,8 @@ static inline bool netpoll_rx_on(struct sk_buff *skb)
 {
 	struct netpoll_info *npinfo = rcu_dereference_bh(skb->dev->npinfo);
 
-	return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags);
+	return npinfo && (!list_empty(&npinfo->rx_np) ||
+	       test_bit(NETPOLL_RX_ENABLED, &npinfo->flags));
 }
 
 static inline bool netpoll_rx(struct sk_buff *skb)
@@ -104,8 +112,9 @@ static inline bool netpoll_rx(struct sk_buff *skb)
 
 	npinfo = rcu_dereference_bh(skb->dev->npinfo);
 	spin_lock(&npinfo->rx_lock);
-	/* check rx_flags again with the lock held */
-	if (npinfo->rx_flags && __netpoll_rx(skb, npinfo))
+	/* check flags again with the lock held */
+	if (test_bit(NETPOLL_RX_ENABLED, &npinfo->flags) &&
+	    __netpoll_rx(skb, npinfo))
 		ret = true;
 	spin_unlock(&npinfo->rx_lock);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 3655ff9..163dd3e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1200,7 +1200,7 @@ static int __dev_open(struct net_device *dev)
 	 * If we don't do this there is a chance ndo_poll_controller
 	 * or ndo_poll may be running while we open the device
 	 */
-	ret = netpoll_rx_disable(dev);
+	ret = netpoll_suspend(dev);
 	if (ret)
 		return ret;
 
@@ -1217,7 +1217,7 @@ static int __dev_open(struct net_device *dev)
 	if (!ret && ops->ndo_open)
 		ret = ops->ndo_open(dev);
 
-	netpoll_rx_enable(dev);
+	netpoll_resume(dev);
 
 	if (ret)
 		clear_bit(__LINK_STATE_START, &dev->state);
@@ -1311,7 +1311,7 @@ static int __dev_close(struct net_device *dev)
 	LIST_HEAD(single);
 
 	/* Temporarily disable netpoll until the interface is down */
-	retval = netpoll_rx_disable(dev);
+	retval = netpoll_suspend(dev);
 	if (retval)
 		return retval;
 
@@ -1319,7 +1319,7 @@ static int __dev_close(struct net_device *dev)
 	retval = __dev_close_many(&single);
 	list_del(&single);
 
-	netpoll_rx_enable(dev);
+	netpoll_resume(dev);
 	return retval;
 }
 
@@ -1360,7 +1360,7 @@ int dev_close(struct net_device *dev)
 		LIST_HEAD(single);
 
 		/* Block netpoll rx while the interface is going down */
-		ret = netpoll_rx_disable(dev);
+		ret = netpoll_suspend(dev);
 		if (ret)
 			return ret;
 
@@ -1368,7 +1368,7 @@ int dev_close(struct net_device *dev)
 		dev_close_many(&single);
 		list_del(&single);
 
-		netpoll_rx_enable(dev);
+		netpoll_resume(dev);
 	}
 	return ret;
 }
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index a3a17ae..a094227 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -50,8 +50,6 @@ static atomic_t trapped;
 DEFINE_STATIC_SRCU(netpoll_srcu);
 
 #define USEC_PER_POLL	50
-#define NETPOLL_RX_ENABLED  1
-#define NETPOLL_RX_DROP     2
 
 #define MAX_SKB_SIZE							\
 	(sizeof(struct ethhdr) +					\
@@ -155,7 +153,7 @@ static int poll_one_napi(struct netpoll_info *npinfo,
 	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
 		return budget;
 
-	npinfo->rx_flags |= NETPOLL_RX_DROP;
+	set_bit(NETPOLL_RX_DROP, &npinfo->flags);
 	atomic_inc(&trapped);
 	set_bit(NAPI_STATE_NPSVC, &napi->state);
 
@@ -164,7 +162,7 @@ static int poll_one_napi(struct netpoll_info *npinfo,
 
 	clear_bit(NAPI_STATE_NPSVC, &napi->state);
 	atomic_dec(&trapped);
-	npinfo->rx_flags &= ~NETPOLL_RX_DROP;
+	clear_bit(NETPOLL_RX_DROP, &npinfo->flags);
 
 	return budget - work;
 }
@@ -206,27 +204,22 @@ static void netpoll_poll_dev(struct net_device *dev)
 	 * the dev_open/close paths use this to block netpoll activity
 	 * while changing device state
 	 */
-	if (!mutex_trylock(&ni->dev_lock))
-		return;
+	set_bit(NETPOLL_POLLING, &ni->flags);
+	if (test_bit(NETPOLL_SUSPENDED, &ni->flags))
+		goto out;
 
-	if (!netif_running(dev)) {
-		mutex_unlock(&ni->dev_lock);
-		return;
-	}
+	if (!netif_running(dev))
+		goto out;
 
 	ops = dev->netdev_ops;
-	if (!ops->ndo_poll_controller) {
-		mutex_unlock(&ni->dev_lock);
-		return;
-	}
+	if (!ops->ndo_poll_controller)
+		goto out;
 
 	/* Process pending work on NIC */
 	ops->ndo_poll_controller(dev);
 
 	poll_napi(dev);
 
-	mutex_unlock(&ni->dev_lock);
-
 	if (dev->flags & IFF_SLAVE) {
 		if (ni) {
 			struct net_device *bond_dev;
@@ -245,32 +238,37 @@ static void netpoll_poll_dev(struct net_device *dev)
 	service_neigh_queue(ni);
 
 	zap_completion_queue();
+out:
+	clear_bit(NETPOLL_POLLING, &ni->dev_wait.lock);
+	wake_up(&ni->dev_wait);
 }
 
-int netpoll_rx_disable(struct net_device *dev)
+int netpoll_suspend(struct net_device *dev)
 {
 	struct netpoll_info *ni;
 	int idx;
+
 	might_sleep();
 	idx = srcu_read_lock(&netpoll_srcu);
 	ni = srcu_dereference(dev->npinfo, &netpoll_srcu);
-	if (ni)
-		mutex_lock(&ni->dev_lock);
+	if (ni && !test_and_set_bit(NETPOLL_SUSPENDED, &ni->flags))
+		wait_event(ni->dev_wait, !test_bit(NETPOLL_POLLING,
+			   &ni->flags));
 	srcu_read_unlock(&netpoll_srcu, idx);
 	return 0;
 }
-EXPORT_SYMBOL(netpoll_rx_disable);
+EXPORT_SYMBOL(netpoll_suspend);
 
-void netpoll_rx_enable(struct net_device *dev)
+void netpoll_resume(struct net_device *dev)
 {
 	struct netpoll_info *ni;
 	rcu_read_lock();
 	ni = rcu_dereference(dev->npinfo);
 	if (ni)
-		mutex_unlock(&ni->dev_lock);
+		clear_bit(NETPOLL_SUSPENDED, &ni->flags);
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(netpoll_rx_enable);
+EXPORT_SYMBOL(netpoll_resume);
 
 static void refill_skbs(void)
 {
@@ -1042,11 +1040,11 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp)
 			goto out;
 		}
 
-		npinfo->rx_flags = 0;
+		npinfo->flags = 0;
 		INIT_LIST_HEAD(&npinfo->rx_np);
 
 		spin_lock_init(&npinfo->rx_lock);
-		mutex_init(&npinfo->dev_lock);
+		init_waitqueue_head(&npinfo->dev_wait);
 		skb_queue_head_init(&npinfo->neigh_tx);
 		skb_queue_head_init(&npinfo->txq);
 		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
@@ -1068,7 +1066,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp)
 
 	if (np->rx_hook) {
 		spin_lock_irqsave(&npinfo->rx_lock, flags);
-		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
+		set_bit(NETPOLL_RX_ENABLED, &npinfo->flags);
 		list_add_tail(&np->rx, &npinfo->rx_np);
 		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 	}
@@ -1251,7 +1249,7 @@ void __netpoll_cleanup(struct netpoll *np)
 		spin_lock_irqsave(&npinfo->rx_lock, flags);
 		list_del(&np->rx);
 		if (list_empty(&npinfo->rx_np))
-			npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+			clear_bit(NETPOLL_RX_ENABLED, &npinfo->flags);
 		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 	}
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-12 14:01           ` Bart Van Assche
  2013-04-12 18:45             ` Neil Horman
@ 2013-04-19  8:38             ` Ingo Molnar
  2013-04-19 12:52               ` Neil Horman
  1 sibling, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2013-04-19  8:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Neil Horman, David Miller, netdev, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner


* Bart Van Assche <bvanassche@acm.org> wrote:

> WARNING: at kernel/mutex.c:313 __mutex_unlock_slowpath+0x157/0x160()
> Pid: 181, comm: kworker/0:1H Tainted: G           O 3.9.0-rc6-debug+ #1
> Call Trace:
> <IRQ>  [<ffffffff8103c3ef>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff8103c44a>] warn_slowpath_null+0x1a/0x20
> [<ffffffff81432047>] __mutex_unlock_slowpath+0x157/0x160
> [<ffffffff8143205e>] mutex_unlock+0xe/0x10
> [<ffffffff8136d031>] netpoll_poll_dev+0x111/0x9a0
> [<ffffffff81345f32>] ? __alloc_skb+0x82/0x2a0
> [<ffffffff8136dac5>] netpoll_send_skb_on_dev+0x205/0x3b0
> [<ffffffff8136e00a>] netpoll_send_udp+0x28a/0x3a0
> [<ffffffffa0524843>] ? write_msg+0x53/0x110 [netconsole]
> [<ffffffffa05248bf>] write_msg+0xcf/0x110 [netconsole]
> [<ffffffff8103d7f1>] call_console_drivers.constprop.16+0xa1/0x120
> [<ffffffff8103e848>] console_unlock+0x3f8/0x450
> [<ffffffff8103ecce>] vprintk_emit+0x1ee/0x510
> [<ffffffff812d1f2c>] dev_vprintk_emit+0x5c/0x70
> [<ffffffff810ff047>] ? mempool_free_slab+0x17/0x20

I *really* think that using a mutex from a low level debug interface like netpoll 
is a mistake. We want such interfaces to be as atomic and as self-contained as 
possible: using spinlocks, which could possibly be converted to raw spinlocks, 
etc.

mutexes should be used when there's an expectation of possibly long blocking time. 
That's not really the case for netpoll, we either are able to generate the skb 
right then and send it off, or we are in trouble, right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-19  8:38             ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar
@ 2013-04-19 12:52               ` Neil Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-19 12:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner

On Fri, Apr 19, 2013 at 10:38:02AM +0200, Ingo Molnar wrote:
> 
> * Bart Van Assche <bvanassche@acm.org> wrote:
> 
> > WARNING: at kernel/mutex.c:313 __mutex_unlock_slowpath+0x157/0x160()
> > Pid: 181, comm: kworker/0:1H Tainted: G           O 3.9.0-rc6-debug+ #1
> > Call Trace:
> > <IRQ>  [<ffffffff8103c3ef>] warn_slowpath_common+0x7f/0xc0
> > [<ffffffff8103c44a>] warn_slowpath_null+0x1a/0x20
> > [<ffffffff81432047>] __mutex_unlock_slowpath+0x157/0x160
> > [<ffffffff8143205e>] mutex_unlock+0xe/0x10
> > [<ffffffff8136d031>] netpoll_poll_dev+0x111/0x9a0
> > [<ffffffff81345f32>] ? __alloc_skb+0x82/0x2a0
> > [<ffffffff8136dac5>] netpoll_send_skb_on_dev+0x205/0x3b0
> > [<ffffffff8136e00a>] netpoll_send_udp+0x28a/0x3a0
> > [<ffffffffa0524843>] ? write_msg+0x53/0x110 [netconsole]
> > [<ffffffffa05248bf>] write_msg+0xcf/0x110 [netconsole]
> > [<ffffffff8103d7f1>] call_console_drivers.constprop.16+0xa1/0x120
> > [<ffffffff8103e848>] console_unlock+0x3f8/0x450
> > [<ffffffff8103ecce>] vprintk_emit+0x1ee/0x510
> > [<ffffffff812d1f2c>] dev_vprintk_emit+0x5c/0x70
> > [<ffffffff810ff047>] ? mempool_free_slab+0x17/0x20
> 
> I *really* think that using a mutex from a low level debug interface like netpoll 
> is a mistake. We want such interfaces to be as atomic and as self-contained as 
> possible: using spinlocks, which could possibly be converted to raw spinlocks, 
> etc.
> 
> mutexes should be used when there's an expectation of possibly long blocking time. 
> That's not really the case for netpoll, we either are able to generate the skb 
> right then and send it off, or we are in trouble, right?
> 
> Thanks,
> 
> 	Ingo
> 


Well, this is a very seldom used path.  The idea behind the use of a mutex was
simply to allow the close/open paths to sleep during those periods when they
occured in parallel with a netpoll operation. The only reason spinlocks weren't
used was because we're not supposed to enter the drivers ndo_open/close paths in
atomic context, as subsequents sleeps are possible.  Thats why the
netpoll_poll_napi path had a mutex_trylock operation.  As it turns out though,
thats unsafe for this use case because the mutex implementation uses a spin_lock
without disabling interrupts (leading to the possibility of deadlocks), which in
turn led to our discussion of weather or not converting the mutex internal
implementation to use spin_lock_irqsave, as that would make mutex_trylock safe
to use in irq/softirq context.

Its all a bit moot at this point though, as I've come up with an alternate
solution that I think satifies our needs just as well.  Using a wait_queue_head,
with some atomic flags, we can block dev_open and close while netpoll is
executing, and retain the ability to wake those processes up safely from irq
context.  Please take a look at my new patch and let me know what you think.

Thanks!
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-18 19:29                       ` Neil Horman
@ 2013-04-22 20:12                         ` Neil Horman
       [not found]                           ` <CAO+b5-r5jVJNZWuREUH5MQ3baeSPR8fVV1p9pMnukmiZd9nRhg@mail.gmail.com>
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-22 20:12 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar

On Thu, Apr 18, 2013 at 03:29:12PM -0400, Neil Horman wrote:
> Well, I guess Ingo doesn't have time  to respond here.  Given that, I've been
> looking at other ways of adressing this problem (as the more I think about it,
> the more I think changing mutex semantics, even to be more accepting, is going
> to be met with lots of friction).  I've come up with the below change, and it
> works well for me (I've hammered it with ifup/downs while sending messages over
> netconsole for 24 hours now without any incident).  I'm not sure I like all the
> details yet (I modified the rx_flag field to be a gneral flags field that uses
> test/set/clear_bit, which I don't think is necessecary, but it seems saner to
> me).  the meat of the change though is that instead of a mutex, we use two flags
> that, respectively, inhibit netpoll usage, and indicate when we're in a netpoll
> service path.  Using these two flags, in conjunction with a wait_queue, we can
> signal to executors in the netpoll path that we're inhibiting its use and expect
> to any user in the path to complete and wake up any contexts in the inhibiting
> path.  Seems to work well for me.  Let me know what you think, and I'll make any
> desired changes and post an official patch.
> 
> Best
> Neil
> 
Bart, any testing/thoughts here?  i'd like to get this fixed in the next few
days, but I'd like to get any feedback you might have first.

Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
       [not found]                           ` <CAO+b5-r5jVJNZWuREUH5MQ3baeSPR8fVV1p9pMnukmiZd9nRhg@mail.gmail.com>
@ 2013-04-23 13:23                             ` Neil Horman
       [not found]                               ` <CAO+b5-rQPyO9QE9v+oQTeo+G-ftcsehSB5=63AZ13QW4EJ1X0Q@mail.gmail.com>
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-23 13:23 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar

On Mon, Apr 22, 2013 at 01:41:11PM -0700, Bart Van Assche wrote:
> On Mon, Apr 22, 2013 at 1:12 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > Bart, any testing/thoughts here?  i'd like to get this fixed in the next
> > few
> > days, but I'd like to get any feedback you might have first.
> >
> 
> Sorry but I haven't been able to run any tests during the past eight days
> since I'm traveling and because of that I do not have access to my regular
> test setup.
> 
> Bart.
Copy that.  I can propose this change based on my own testing, or wait for you
to get back.  Do you have a preference?
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
       [not found]                               ` <CAO+b5-rQPyO9QE9v+oQTeo+G-ftcsehSB5=63AZ13QW4EJ1X0Q@mail.gmail.com>
@ 2013-04-23 13:44                                 ` Neil Horman
  2013-04-23 17:33                                   ` David Miller
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-23 13:44 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar

On Tue, Apr 23, 2013 at 06:38:19AM -0700, Bart Van Assche wrote:
> On Tue, Apr 23, 2013 at 6:23 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > Copy that.  I can propose this change based on my own testing, or wait for
> > you
> > to get back.  Do you have a preference?
> >
> 
> It will take until next week before I can run more tests. It depends on
> whether a fix can wait that long ?
> 
> Bart.
Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
the mutex solution is reporting a warning, but no actual problems).  So I think
its safe to wait a bit longer, unless you just want this squared away now.

FWIw, I think theres still some discussion to be had about the possibility of
converting mutexes to use spin_lock_irqsave internally, thereby making
mutex_trylock irq safe, but I think we can deal with that independently of this
issue.

Regards
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-23 13:44                                 ` Neil Horman
@ 2013-04-23 17:33                                   ` David Miller
  2013-04-23 17:50                                     ` Neil Horman
  0 siblings, 1 reply; 34+ messages in thread
From: David Miller @ 2013-04-23 17:33 UTC (permalink / raw)
  To: nhorman; +Cc: bart.vanassche, netdev, mingo

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 23 Apr 2013 09:44:43 -0400

> Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
> the mutex solution is reporting a warning, but no actual problems).  So I think
> its safe to wait a bit longer, unless you just want this squared away now.

If it's just a warning and people aren't actually hitting the
potential deadlock, it can wait.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-23 17:33                                   ` David Miller
@ 2013-04-23 17:50                                     ` Neil Horman
  2013-04-27 18:53                                       ` bvba Bart Van Assche
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-23 17:50 UTC (permalink / raw)
  To: David Miller; +Cc: bart.vanassche, netdev, mingo

On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Tue, 23 Apr 2013 09:44:43 -0400
> 
> > Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
> > proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
> > the mutex solution is reporting a warning, but no actual problems).  So I think
> > its safe to wait a bit longer, unless you just want this squared away now.
> 
> If it's just a warning and people aren't actually hitting the
> potential deadlock, it can wait.
> 
Copy that. Bart, I'll wait till you get back then.
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-23 17:50                                     ` Neil Horman
@ 2013-04-27 18:53                                       ` bvba Bart Van Assche
  2013-04-29 18:13                                         ` Neil Horman
  0 siblings, 1 reply; 34+ messages in thread
From: bvba Bart Van Assche @ 2013-04-27 18:53 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev, mingo

On 04/23/13 19:50, Neil Horman wrote:
> On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
>> From: Neil Horman <nhorman@tuxdriver.com>
>> Date: Tue, 23 Apr 2013 09:44:43 -0400
>>
>>> Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
>>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
>>> the mutex solution is reporting a warning, but no actual problems).  So I think
>>> its safe to wait a bit longer, unless you just want this squared away now.
>>
>> If it's just a warning and people aren't actually hitting the
>> potential deadlock, it can wait.
>>
> Copy that. Bart, I'll wait till you get back then.

(Just arrived home)

Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:

kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
kernel: irq event stamp: 1999192
kernel: hardirqs last  enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
kernel: softirqs last  enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
kernel: CPU 0 
kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G           O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
kernel: RIP: 0010:[<ffffffff8103e8a0>]  [<ffffffff8103e8a0>] console_unlock+0x420/0x450
kernel: Call Trace:
kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
kernel: [<ffffffff812bd764>] con_write+0x34/0x50
kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-27 18:53                                       ` bvba Bart Van Assche
@ 2013-04-29 18:13                                         ` Neil Horman
  2013-04-29 19:12                                           ` Bart Van Assche
  2013-04-30 15:35                                           ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman
  0 siblings, 2 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-29 18:13 UTC (permalink / raw)
  To: bvba Bart Van Assche; +Cc: David Miller, netdev, mingo

On Sat, Apr 27, 2013 at 08:53:54PM +0200, bvba Bart Van Assche wrote:
> On 04/23/13 19:50, Neil Horman wrote:
> > On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
> >> From: Neil Horman <nhorman@tuxdriver.com>
> >> Date: Tue, 23 Apr 2013 09:44:43 -0400
> >>
> >>> Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
> >>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
> >>> the mutex solution is reporting a warning, but no actual problems).  So I think
> >>> its safe to wait a bit longer, unless you just want this squared away now.
> >>
> >> If it's just a warning and people aren't actually hitting the
> >> potential deadlock, it can wait.
> >>
> > Copy that. Bart, I'll wait till you get back then.
> 
> (Just arrived home)
> 
> Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:
> 
> kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
> kernel: irq event stamp: 1999192
> kernel: hardirqs last  enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
> kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
> kernel: softirqs last  enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
> kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
> kernel: CPU 0 
> kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G           O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
> kernel: RIP: 0010:[<ffffffff8103e8a0>]  [<ffffffff8103e8a0>] console_unlock+0x420/0x450
> kernel: Call Trace:
> kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
> kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
> kernel: [<ffffffff812bd764>] con_write+0x34/0x50
> kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
> kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
> kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
> kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
> kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
> kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
> kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
> kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
> kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
> kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b
> 
> Bart.
> 

So, looking at this, I'm not 100% sure whats going on.  console_unlock has the
ability to detect the need for rescheduling when we call
raw_spin_unlock_irqrestore at the bottom of the function, so there should be no
reason that we don't give the per-cpu watchdog task a chance to run.  About the
only case I see in which we might be in trouble would be if somehow got into an
infinite loop if every message in the log buffer had LOG_NOCONS set, in which
case we would spin in this loop without ever scheduling or re-enabling
interrupts.  But for that to happen I think you would have to be flooding the
log with messages that were bigger than the default message buffer size (about
1000 bytes without a newline).  You're not doing that are you?

Regardless, this is a different problem from the one we were origionally
addressing, we should probably handle it in a different thread.  

On the up side, I did discover while looking at this that semaphores, unlike
mutexes are quite liberal in what contexts they can be used in (i.e. irq
contexts are perfectly legitimate places to call up and down_trylock).  I'll
respin my patch to use that, as its much simpler than a wait queue

Best
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
  2013-04-29 18:13                                         ` Neil Horman
@ 2013-04-29 19:12                                           ` Bart Van Assche
  2013-04-30 15:35                                           ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman
  1 sibling, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2013-04-29 19:12 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev, mingo

On 04/29/13 20:13, Neil Horman wrote:
> On Sat, Apr 27, 2013 at 08:53:54PM +0200, Bart Van Assche wrote:
>> Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:
>>
>> kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
>> kernel: irq event stamp: 1999192
>> kernel: hardirqs last  enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
>> kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
>> kernel: softirqs last  enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
>> kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
>> kernel: CPU 0
>> kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G           O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
>> kernel: RIP: 0010:[<ffffffff8103e8a0>]  [<ffffffff8103e8a0>] console_unlock+0x420/0x450
>> kernel: Call Trace:
>> kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
>> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>> kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
>> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>> kernel: [<ffffffff812bd764>] con_write+0x34/0x50
>> kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
>> kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
>> kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
>> kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
>> kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
>> kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
>> kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
>> kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
>> kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
>> kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b
>
> So, looking at this, I'm not 100% sure whats going on.  console_unlock has the
> ability to detect the need for rescheduling when we call
> raw_spin_unlock_irqrestore at the bottom of the function, so there should be no
> reason that we don't give the per-cpu watchdog task a chance to run.  About the
> only case I see in which we might be in trouble would be if somehow got into an
> infinite loop if every message in the log buffer had LOG_NOCONS set, in which
> case we would spin in this loop without ever scheduling or re-enabling
> interrupts.  But for that to happen I think you would have to be flooding the
> log with messages that were bigger than the default message buffer size (about
> 1000 bytes without a newline).  You're not doing that are you?

The above call trace was triggered by sending messages at a high rate 
over netconsole where each message has a length between 40 and (about) 
160 characters. I'm not sure though what exactly is causing this call 
trace - I haven't had the time yet to analyze this in detail.

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH RFC] netpoll: convert mutex into a semaphore
  2013-04-29 18:13                                         ` Neil Horman
  2013-04-29 19:12                                           ` Bart Van Assche
@ 2013-04-30 15:35                                           ` Neil Horman
  2013-05-01 19:00                                             ` David Miller
  1 sibling, 1 reply; 34+ messages in thread
From: Neil Horman @ 2013-04-30 15:35 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Bart Van Assche, David Miller

Bart Van Assche recently reported a warning to me:

<IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
[<ffffffff814761dd>] mutex_trylock+0x16d/0x180
[<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
[<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
[<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
[<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
[<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
[<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
[<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
[<ffffffff8103fb76>] console_unlock+0x2d6/0x450
[<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
[<ffffffff8146f9f6>] printk+0x4d/0x4f
[<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]

This resulted from my commit ca99ca14c which introduced a mutex_trylock
operation in a path that could execute in interrupt context.  When mutex
debugging is enabled, the above warns the user when we are in fact
exectuting in interrupt context
interrupt context.

After some discussion, It seems that a semaphore is the proper mechanism to use
here.  While mutexes are defined to be unusable in interrupt context, no such
condition exists for semaphores (save for the fact that the non blocking api
calls, like up and down_trylock must be used when in irq context).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
CC: Bart Van Assche <bvanassche@acm.org>
CC: David Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org
---
 include/linux/netpoll.h |  2 +-
 net/core/netpoll.c      | 14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 9d7d8c6..fa2cb76 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -40,7 +40,7 @@ struct netpoll_info {
 
 	unsigned long rx_flags;
 	spinlock_t rx_lock;
-	struct mutex dev_lock;
+	struct semaphore dev_lock;
 	struct list_head rx_np; /* netpolls that registered an rx_hook */
 
 	struct sk_buff_head neigh_tx; /* list of neigh requests to reply to */
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 209d842..a5802a8 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -206,17 +206,17 @@ static void netpoll_poll_dev(struct net_device *dev)
 	 * the dev_open/close paths use this to block netpoll activity
 	 * while changing device state
 	 */
-	if (!mutex_trylock(&ni->dev_lock))
+	if (!down_trylock(&ni->dev_lock))
 		return;
 
 	if (!netif_running(dev)) {
-		mutex_unlock(&ni->dev_lock);
+		up(&ni->dev_lock);
 		return;
 	}
 
 	ops = dev->netdev_ops;
 	if (!ops->ndo_poll_controller) {
-		mutex_unlock(&ni->dev_lock);
+		up(&ni->dev_lock);
 		return;
 	}
 
@@ -225,7 +225,7 @@ static void netpoll_poll_dev(struct net_device *dev)
 
 	poll_napi(dev);
 
-	mutex_unlock(&ni->dev_lock);
+	up(&ni->dev_lock);
 
 	if (dev->flags & IFF_SLAVE) {
 		if (ni) {
@@ -255,7 +255,7 @@ int netpoll_rx_disable(struct net_device *dev)
 	idx = srcu_read_lock(&netpoll_srcu);
 	ni = srcu_dereference(dev->npinfo, &netpoll_srcu);
 	if (ni)
-		mutex_lock(&ni->dev_lock);
+		down(&ni->dev_lock);
 	srcu_read_unlock(&netpoll_srcu, idx);
 	return 0;
 }
@@ -267,7 +267,7 @@ void netpoll_rx_enable(struct net_device *dev)
 	rcu_read_lock();
 	ni = rcu_dereference(dev->npinfo);
 	if (ni)
-		mutex_unlock(&ni->dev_lock);
+		up(&ni->dev_lock);
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(netpoll_rx_enable);
@@ -1047,7 +1047,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp)
 		INIT_LIST_HEAD(&npinfo->rx_np);
 
 		spin_lock_init(&npinfo->rx_lock);
-		mutex_init(&npinfo->dev_lock);
+		sema_init(&npinfo->dev_lock, 1);
 		skb_queue_head_init(&npinfo->neigh_tx);
 		skb_queue_head_init(&npinfo->txq);
 		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] netpoll: convert mutex into a semaphore
  2013-04-30 15:35                                           ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman
@ 2013-05-01 19:00                                             ` David Miller
  2013-05-01 19:34                                               ` Neil Horman
  0 siblings, 1 reply; 34+ messages in thread
From: David Miller @ 2013-05-01 19:00 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, bvanassche

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 30 Apr 2013 11:35:05 -0400

> Bart Van Assche recently reported a warning to me:
 ...
> This resulted from my commit ca99ca14c which introduced a mutex_trylock
> operation in a path that could execute in interrupt context.  When mutex
> debugging is enabled, the above warns the user when we are in fact
> exectuting in interrupt context
> interrupt context.
> 
> After some discussion, It seems that a semaphore is the proper mechanism to use
> here.  While mutexes are defined to be unusable in interrupt context, no such
> condition exists for semaphores (save for the fact that the non blocking api
> calls, like up and down_trylock must be used when in irq context).
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: Bart Van Assche <bvanassche@acm.org>

Neil this looks good to me so I'm going to toss it into my tree and queue
it up for -stable too, thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] netpoll: convert mutex into a semaphore
  2013-05-01 19:00                                             ` David Miller
@ 2013-05-01 19:34                                               ` Neil Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-05-01 19:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, bvanassche

On Wed, May 01, 2013 at 03:00:59PM -0400, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Tue, 30 Apr 2013 11:35:05 -0400
> 
> > Bart Van Assche recently reported a warning to me:
>  ...
> > This resulted from my commit ca99ca14c which introduced a mutex_trylock
> > operation in a path that could execute in interrupt context.  When mutex
> > debugging is enabled, the above warns the user when we are in fact
> > exectuting in interrupt context
> > interrupt context.
> > 
> > After some discussion, It seems that a semaphore is the proper mechanism to use
> > here.  While mutexes are defined to be unusable in interrupt context, no such
> > condition exists for semaphores (save for the fact that the non blocking api
> > calls, like up and down_trylock must be used when in irq context).
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > Reported-by: Bart Van Assche <bvanassche@acm.org>
> 
> Neil this looks good to me so I'm going to toss it into my tree and queue
> it up for -stable too, thanks!
> 
Ok, cool deal.  Thanks!
Neil

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
@ 2013-04-28  2:34 Neil Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-28  2:34 UTC (permalink / raw)
  To: bvba Bart Van Assche; +Cc: David Miller, netdev, mingo

You've got the wrong patch built then.  The netpoll path with my latest patch no longer has any mutexes in it.  This pears to be a separate problem.

Best
Neil

bvba Bart Van Assche <info@bartvanassche.be> wrote:

>On 04/23/13 19:50, Neil Horman wrote:
>> On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
>>> From: Neil Horman <nhorman@tuxdriver.com>
>>> Date: Tue, 23 Apr 2013 09:44:43 -0400
>>>
>>>> Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
>>>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
>>>> the mutex solution is reporting a warning, but no actual problems).  So I think
>>>> its safe to wait a bit longer, unless you just want this squared away now.
>>>
>>> If it's just a warning and people aren't actually hitting the
>>> potential deadlock, it can wait.
>>>
>> Copy that. Bart, I'll wait till you get back then.
>
>(Just arrived home)
>
>Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:
>
>kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
>kernel: irq event stamp: 1999192
>kernel: hardirqs last  enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
>kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
>kernel: softirqs last  enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
>kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
>kernel: CPU 0 
>kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G           O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
>kernel: RIP: 0010:[<ffffffff8103e8a0>]  [<ffffffff8103e8a0>] console_unlock+0x420/0x450
>kernel: Call Trace:
>kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
>kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
>kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>kernel: [<ffffffff812bd764>] con_write+0x34/0x50
>kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
>kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
>kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
>kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
>kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
>kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
>kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
>kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
>kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
>kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b
>
>Bart.
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-05-01 19:34 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-11 13:42 Netpoll triggers soft lockup Bart Van Assche
2013-04-11 14:08 ` Neil Horman
2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman
2013-04-11 15:54   ` Christoph Paasch
2013-04-11 17:04     ` Neil Horman
2013-04-11 17:51       ` Christoph Paasch
2013-04-11 15:57   ` Eric Dumazet
2013-04-11 16:56     ` Neil Horman
2013-04-11 17:31   ` Bart Van Assche
2013-04-11 17:52     ` Neil Horman
2013-04-11 19:14     ` Neil Horman
2013-04-12  6:27       ` Bart Van Assche
2013-04-12 11:32         ` Neil Horman
2013-04-12 14:01           ` Bart Van Assche
2013-04-12 18:45             ` Neil Horman
2013-04-13  7:35               ` Bart Van Assche
2013-04-13 12:03                 ` Neil Horman
2013-04-15 14:16                 ` Neil Horman
     [not found]                   ` <CAO+b5-oBfH3M0dnrQSs-p1BF_5hKy2tsU-dD=EP9+S=iqPs5ew@mail.gmail.com>
2013-04-16 17:24                     ` Neil Horman
2013-04-18 19:29                       ` Neil Horman
2013-04-22 20:12                         ` Neil Horman
     [not found]                           ` <CAO+b5-r5jVJNZWuREUH5MQ3baeSPR8fVV1p9pMnukmiZd9nRhg@mail.gmail.com>
2013-04-23 13:23                             ` Neil Horman
     [not found]                               ` <CAO+b5-rQPyO9QE9v+oQTeo+G-ftcsehSB5=63AZ13QW4EJ1X0Q@mail.gmail.com>
2013-04-23 13:44                                 ` Neil Horman
2013-04-23 17:33                                   ` David Miller
2013-04-23 17:50                                     ` Neil Horman
2013-04-27 18:53                                       ` bvba Bart Van Assche
2013-04-29 18:13                                         ` Neil Horman
2013-04-29 19:12                                           ` Bart Van Assche
2013-04-30 15:35                                           ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman
2013-05-01 19:00                                             ` David Miller
2013-05-01 19:34                                               ` Neil Horman
2013-04-19  8:38             ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar
2013-04-19 12:52               ` Neil Horman
2013-04-28  2:34 Neil Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).