* Netpoll triggers soft lockup @ 2013-04-11 13:42 Bart Van Assche 2013-04-11 14:08 ` Neil Horman 2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman 0 siblings, 2 replies; 34+ messages in thread From: Bart Van Assche @ 2013-04-11 13:42 UTC (permalink / raw) To: Neil Horman, David Miller, netdev Hi, While testing a driver against kernel 3.9-rc6 I ran into a soft lockup triggered by sending lots of kernel messages to a remote system via netconsole. This behavior was probably introduced by commit ca99ca14c ("netpoll: protect napi_poll and poll_controller during dev_[open|close]"). That commit introduced a mutex in netpoll_poll_dev(), which can be called from interrupt context. Is there anyone who can tell me whether this is a bug in commit ca99ca14c or in netconsole ? Read(10):------------[ cut here ]------------ WARNING: at kernel/mutex.c:434 mutex_trylock+0x16d/0x180() Hardware name: P5Q DELUXE Modules linked in: ib_srp scsi_transport_srp dm_mod qla2x00tgt(O) qla2xxx_scst(O) scsi_transport_fc iscsi_scst(O) ib_srpt(O) scst_vdisk(O) libcrc32c scst(O) crc32c brd netconsole configfs rdma_ucm rdma_cm iw_cm ib_addr scsi_tgt af_packet snd_hda_codec_hdmi snd_hda_codec_analog ib_ipoib ib_cm ib_uverbs ib_umad snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer mlx4_ib ib_sa ib_mad cpufreq_conservative ib_core cpufreq_userspace cpufreq_powersave snd_seq_device snd mlx4_core sr_mod skge soundcore pcspkr acpi_cpufreq button ehci_pci sg i2c_i801 mperf snd_page_alloc cdrom microcode autofs4 ext3 jbd mbcache sd_mod crc_t10dif radeon uhci_hcd ttm drm_kms_helper ehci_hcd drm i2c_algo_bit i2c_core intel_agp intel_gtt agpgart usbcore usb_common processor thermal_sys hwmon scsi_dh_alua scsi_dh ata_generic ata_piix ahci libahci pata_marvell libata scsi_mod [last unloaded: scsi_transport_srp] Pid: 178, comm: kworker/0:1H Tainted: G O 3.9.0-rc6-debug+ #0 Call Trace: <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 [<ffffffff8146f9f6>] printk+0x4d/0x4f [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] [<ffffffffa000b924>] scsi_io_completion+0x294/0x6c0 [scsi_mod] [<ffffffffa000113d>] scsi_finish_command+0xbd/0x120 [scsi_mod] [<ffffffffa000b58f>] scsi_softirq_done+0x13f/0x160 [scsi_mod] [<ffffffff8121dbc0>] blk_done_softirq+0x80/0xa0 [<ffffffff81046e81>] ? __do_softirq+0xb1/0x3c0 [<ffffffff81046ed1>] __do_softirq+0x101/0x3c0 [<ffffffff810cc649>] ? handle_irq_event+0x59/0x80 [<ffffffff81047355>] irq_exit+0xb5/0xc0 [<ffffffff81484cd3>] do_IRQ+0x63/0xe0 [<ffffffff8147a82f>] common_interrupt+0x6f/0x6f <EOI> [<ffffffff810a1ea2>] ? mark_held_locks+0xb2/0x130 [<ffffffff8147a46a>] ? _raw_spin_unlock_irq+0x3a/0x50 [<ffffffff8147a460>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff812148e5>] blk_delay_work+0x35/0x40 [<ffffffff8106135d>] process_one_work+0x1fd/0x650 [<ffffffff810612f2>] ? process_one_work+0x192/0x650 [<ffffffff81061b50>] worker_thread+0x110/0x380 [<ffffffff81061a40>] ? rescuer_thread+0x250/0x250 [<ffffffff81067d6b>] kthread+0xdb/0xe0 [<ffffffff81067c90>] ? kthread_create_on_node+0x140/0x140 [<ffffffff8148345c>] ret_from_fork+0x7c/0xb0 [<ffffffff81067c90>] ? kthread_create_on_node+0x140/0x140 ---[ end trace e3e3a22d8bb51cb7 ]--- Thanks, Bart. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Netpoll triggers soft lockup 2013-04-11 13:42 Netpoll triggers soft lockup Bart Van Assche @ 2013-04-11 14:08 ` Neil Horman 2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman 1 sibling, 0 replies; 34+ messages in thread From: Neil Horman @ 2013-04-11 14:08 UTC (permalink / raw) To: Bart Van Assche; +Cc: David Miller, netdev On Thu, Apr 11, 2013 at 03:42:02PM +0200, Bart Van Assche wrote: > Hi, > > While testing a driver against kernel 3.9-rc6 I ran into a soft > lockup triggered by sending lots of kernel messages to a remote > system via netconsole. This behavior was probably introduced by > commit ca99ca14c ("netpoll: protect napi_poll and poll_controller > during dev_[open|close]"). That commit introduced a mutex in > netpoll_poll_dev(), which can be called from interrupt context. Is > there anyone who can tell me whether this is a bug in commit > ca99ca14c or in netconsole ? > Thanks for the report Bart, thats interesting. You're right, the problem was introduced by ca99ca14c, but it seems like a false positive to me. The warning is comming from DEBUG_LOCKS_WARN_ON in spin_lock_mutex. It appears that the intent here is to catch users of spin_lock_mutex (which is the mutex api), from trying to use mutex locking in interrupt context as that results in a panic. In this case however, we're using the mutex_trylock call, which is guaranteed not to sleep (which is the nature of the panic condition). So we really don't need to warn here. I think the right thing to do in this case is to add a flag to spin_lock_mutex that skips the warning if we're doing a trylock operation. I'll work up a patch shortly. Thanks again! Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 13:42 Netpoll triggers soft lockup Bart Van Assche 2013-04-11 14:08 ` Neil Horman @ 2013-04-11 15:18 ` Neil Horman 2013-04-11 15:54 ` Christoph Paasch ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: Neil Horman @ 2013-04-11 15:18 UTC (permalink / raw) To: Bart Van Assche; +Cc: Neil Horman, David Miller, netdev Bart, this patch should fix your problem. Could you please test it and confirm? Bart Van Assche recently reported a warning to me: <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 [<ffffffff8146f9f6>] printk+0x4d/0x4f [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] This resulted from my commit ca99ca14c which introduced a mutex_trylock operation in a path that could execute in interrupt context. When mutex debugging is enabled, the above warns the user when we are in fact exectuting in interrupt context. I think this is a false positive however. The check is intended to catch users who might be issuing sleeping calls in irq context, but the use of mutex_trylock here is guaranteed not to sleep. We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex with a __might_sleep call in the appropriate parent mutex operations, but for the sake of effiency (which It seems is why the check was put in the spin lock code only when debug is enabled), lets split the spin_lock_mutex call into two components, where the outer component does the debug checking. Then mutex_trylock can just call the inner part as its callable from irq context safely. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Bart Van Assche <bvanassche@acm.org> CC: Bart Van Assche <bvanassche@acm.org> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org --- kernel/mutex-debug.h | 11 +++++++++-- kernel/mutex.c | 4 ++-- kernel/mutex.h | 2 ++ 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/kernel/mutex-debug.h b/kernel/mutex-debug.h index 0799fd3..bd0d80a 100644 --- a/kernel/mutex-debug.h +++ b/kernel/mutex-debug.h @@ -37,13 +37,20 @@ static inline void mutex_clear_owner(struct mutex *lock) lock->owner = NULL; } -#define spin_lock_mutex(lock, flags) \ +#define spin_lock_mutex_raw(lock, flags) \ do { \ struct mutex *l = container_of(lock, struct mutex, wait_lock); \ \ - DEBUG_LOCKS_WARN_ON(in_interrupt()); \ local_irq_save(flags); \ arch_spin_lock(&(lock)->rlock.raw_lock);\ + } while (0) + +#define spin_lock_mutex(lock, flags) \ + do { \ + struct mutex *l = container_of(lock, struct mutex, wait_lock); \ + \ + DEBUG_LOCKS_WARN_ON(in_interrupt()); \ + spin_lock_mutex_raw(lock, flags); \ DEBUG_LOCKS_WARN_ON(l->magic != l); \ } while (0) diff --git a/kernel/mutex.c b/kernel/mutex.c index 52f2301..5c012c7 100644 --- a/kernel/mutex.c +++ b/kernel/mutex.c @@ -431,7 +431,7 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count) unsigned long flags; int prev; - spin_lock_mutex(&lock->wait_lock, flags); + spin_lock_mutex_raw(&lock->wait_lock, flags); prev = atomic_xchg(&lock->count, -1); if (likely(prev == 1)) { @@ -443,7 +443,7 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count) if (likely(list_empty(&lock->wait_list))) atomic_set(&lock->count, 0); - spin_unlock_mutex(&lock->wait_lock, flags); + spin_unlock_mutex_raw(&lock->wait_lock, flags); return prev == 1; } diff --git a/kernel/mutex.h b/kernel/mutex.h index 4115fbf..af6ffb4 100644 --- a/kernel/mutex.h +++ b/kernel/mutex.h @@ -11,6 +11,8 @@ #define spin_lock_mutex(lock, flags) \ do { spin_lock(lock); (void)(flags); } while (0) +#define spin_lock_mutex_raw(lock, flags) spin_lock_mutex((lock), (flags)) + #define spin_unlock_mutex(lock, flags) \ do { spin_unlock(lock); (void)(flags); } while (0) #define mutex_remove_waiter(lock, waiter, ti) \ -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman @ 2013-04-11 15:54 ` Christoph Paasch 2013-04-11 17:04 ` Neil Horman 2013-04-11 15:57 ` Eric Dumazet 2013-04-11 17:31 ` Bart Van Assche 2 siblings, 1 reply; 34+ messages in thread From: Christoph Paasch @ 2013-04-11 15:54 UTC (permalink / raw) To: Neil Horman; +Cc: Bart Van Assche, David Miller, netdev On Thursday 11 April 2013 11:18:06 Neil Horman wrote: > Bart, this patch should fix your problem. Could you please test it and > confirm? > > Bart Van Assche recently reported a warning to me: > > <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 > [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 > [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 > [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 > [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 > [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 > [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 > [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] > [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] > [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 > [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 > [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 > [<ffffffff8146f9f6>] printk+0x4d/0x4f > [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] > > This resulted from my commit ca99ca14c which introduced a mutex_trylock > operation in a path that could execute in interrupt context. When mutex > debugging is enabled, the above warns the user when we are in fact > exectuting in interrupt context. > > I think this is a false positive however. The check is intended to catch > users who might be issuing sleeping calls in irq context, but the use of > mutex_trylock here is guaranteed not to sleep. Even if he does not sleep, may we still hit a deadlock like the following: netpoll_rx_disable() calls mutex_lock(), who ends up in __mutex_lock_common, calling spin_lock_mutex(). Immediatly after that, on the same CPU, the interrupt comes and netpoll_poll_dev calls mutex_trylock and ends up also calling spin_lock_mutex(). Now, it seems to me that we are deadlocked - the interrupt is spinning on the lock, because netpoll_rx_disable() already took it. Or maybe I am missing something? Cheers, Christoph -- IP Networking Lab --- http://inl.info.ucl.ac.be MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org UCLouvain -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 15:54 ` Christoph Paasch @ 2013-04-11 17:04 ` Neil Horman 2013-04-11 17:51 ` Christoph Paasch 0 siblings, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-11 17:04 UTC (permalink / raw) To: g; +Cc: Bart Van Assche, David Miller, netdev On Thu, Apr 11, 2013 at 05:54:28PM +0200, Christoph Paasch wrote: > On Thursday 11 April 2013 11:18:06 Neil Horman wrote: > > Bart, this patch should fix your problem. Could you please test it and > > confirm? > > > > Bart Van Assche recently reported a warning to me: > > > > <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 > > [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 > > [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 > > [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 > > [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 > > [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 > > [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 > > [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] > > [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] > > [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 > > [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 > > [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 > > [<ffffffff8146f9f6>] printk+0x4d/0x4f > > [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] > > > > This resulted from my commit ca99ca14c which introduced a mutex_trylock > > operation in a path that could execute in interrupt context. When mutex > > debugging is enabled, the above warns the user when we are in fact > > exectuting in interrupt context. > > > > I think this is a false positive however. The check is intended to catch > > users who might be issuing sleeping calls in irq context, but the use of > > mutex_trylock here is guaranteed not to sleep. > > Even if he does not sleep, may we still hit a deadlock like the following: > > netpoll_rx_disable() calls mutex_lock(), who ends up in __mutex_lock_common, > calling spin_lock_mutex(). > > Immediatly after that, on the same CPU, the interrupt comes and > netpoll_poll_dev calls mutex_trylock and ends up also calling > spin_lock_mutex(). Now, it seems to me that we are deadlocked - the interrupt > is spinning on the lock, because netpoll_rx_disable() already took it. > > Or maybe I am missing something? > I dont believe the deadlock you describe can happen. The spin_lock_mutex operation disables irqs on the local cpu with local_irq_save, so we won't loose the cpu while we're holding the spinlock. Likewise we don't restore the irq flags until after we release said lock. Once we have the mutex, if we're preempted by another path that goes through the netpoll_poll_dev path, then we hit the trylock api call. The spinlock is either released or held on another cpu (read: no deadlock), and if the mutex is held, then the trylock simply fails. Thanks Neil > > Cheers, > Christoph > > -- > IP Networking Lab --- http://inl.info.ucl.ac.be > MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org > UCLouvain > -- > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 17:04 ` Neil Horman @ 2013-04-11 17:51 ` Christoph Paasch 0 siblings, 0 replies; 34+ messages in thread From: Christoph Paasch @ 2013-04-11 17:51 UTC (permalink / raw) To: Neil Horman; +Cc: g, Bart Van Assche, David Miller, netdev On Thursday 11 April 2013 13:04:05 Neil Horman wrote: > I dont believe the deadlock you describe can happen. The spin_lock_mutex > operation disables irqs on the local cpu with local_irq_save, so we won't > loose the cpu while we're holding the spinlock. Likewise we don't restore > the irq flags until after we release said lock. Once we have the mutex, if > we're preempted by another path that goes through the netpoll_poll_dev > path, then we hit the trylock api call. The spinlock is either released or > held on another cpu (read: no deadlock), and if the mutex is held, then the > trylock simply fails. Thanks for explaining this to me. I was looking for a local_bh_disable() - but local_irq_save does the job too. Thanks, Christoph -- IP Networking Lab --- http://inl.info.ucl.ac.be MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org UCLouvain -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman 2013-04-11 15:54 ` Christoph Paasch @ 2013-04-11 15:57 ` Eric Dumazet 2013-04-11 16:56 ` Neil Horman 2013-04-11 17:31 ` Bart Van Assche 2 siblings, 1 reply; 34+ messages in thread From: Eric Dumazet @ 2013-04-11 15:57 UTC (permalink / raw) To: Neil Horman; +Cc: Bart Van Assche, David Miller, netdev On Thu, 2013-04-11 at 11:18 -0400, Neil Horman wrote: > Bart, this patch should fix your problem. Could you please test it and confirm? > > Bart Van Assche recently reported a warning to me: > > <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 > [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 > [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 > [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 > [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 > [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 > [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 > [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] > [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] > [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 > [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 > [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 > [<ffffffff8146f9f6>] printk+0x4d/0x4f > [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] > > This resulted from my commit ca99ca14c which introduced a mutex_trylock > operation in a path that could execute in interrupt context. When mutex > debugging is enabled, the above warns the user when we are in fact exectuting in > interrupt context. > > I think this is a false positive however. The check is intended to catch users > who might be issuing sleeping calls in irq context, but the use of mutex_trylock > here is guaranteed not to sleep. > > We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex > with a __might_sleep call in the appropriate parent mutex operations, but for > the sake of effiency (which It seems is why the check was put in the spin lock > code only when debug is enabled), lets split the spin_lock_mutex call into two > components, where the outer component does the debug checking. Then > mutex_trylock can just call the inner part as its callable from irq context > safely. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > Reported-by: Bart Van Assche <bvanassche@acm.org> > CC: Bart Van Assche <bvanassche@acm.org> > CC: David Miller <davem@davemloft.net> > CC: netdev@vger.kernel.org Please post your patch to lkml ? Thanks ! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 15:57 ` Eric Dumazet @ 2013-04-11 16:56 ` Neil Horman 0 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2013-04-11 16:56 UTC (permalink / raw) To: Eric Dumazet; +Cc: Bart Van Assche, David Miller, netdev On Thu, Apr 11, 2013 at 08:57:01AM -0700, Eric Dumazet wrote: > On Thu, 2013-04-11 at 11:18 -0400, Neil Horman wrote: > > Bart, this patch should fix your problem. Could you please test it and confirm? > > > > Bart Van Assche recently reported a warning to me: > > > > <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 > > [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 > > [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 > > [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 > > [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 > > [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 > > [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 > > [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] > > [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] > > [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 > > [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 > > [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 > > [<ffffffff8146f9f6>] printk+0x4d/0x4f > > [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] > > > > This resulted from my commit ca99ca14c which introduced a mutex_trylock > > operation in a path that could execute in interrupt context. When mutex > > debugging is enabled, the above warns the user when we are in fact exectuting in > > interrupt context. > > > > I think this is a false positive however. The check is intended to catch users > > who might be issuing sleeping calls in irq context, but the use of mutex_trylock > > here is guaranteed not to sleep. > > > > We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex > > with a __might_sleep call in the appropriate parent mutex operations, but for > > the sake of effiency (which It seems is why the check was put in the spin lock > > code only when debug is enabled), lets split the spin_lock_mutex call into two > > components, where the outer component does the debug checking. Then > > mutex_trylock can just call the inner part as its callable from irq context > > safely. > > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > Reported-by: Bart Van Assche <bvanassche@acm.org> > > CC: Bart Van Assche <bvanassche@acm.org> > > CC: David Miller <davem@davemloft.net> > > CC: netdev@vger.kernel.org > > Please post your patch to lkml ? > I will, I just wanted Bart to test it first, and CCed everyone that he sent his origional note too. Thanks Neil > Thanks ! > > > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman 2013-04-11 15:54 ` Christoph Paasch 2013-04-11 15:57 ` Eric Dumazet @ 2013-04-11 17:31 ` Bart Van Assche 2013-04-11 17:52 ` Neil Horman 2013-04-11 19:14 ` Neil Horman 2 siblings, 2 replies; 34+ messages in thread From: Bart Van Assche @ 2013-04-11 17:31 UTC (permalink / raw) To: Neil Horman; +Cc: David Miller, netdev On 04/11/13 17:18, Neil Horman wrote: > Bart, this patch should fix your problem. Could you please test it and confirm? The fails to build on my setup after having applied that patch: kernel/mutex.c: In function ‘__mutex_trylock_slowpath’: kernel/mutex.c:446:2: error: implicit declaration of function ‘spin_unlock_mutex_raw’ [-Werror=implicit-function-declaration] Bart. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 17:31 ` Bart Van Assche @ 2013-04-11 17:52 ` Neil Horman 2013-04-11 19:14 ` Neil Horman 1 sibling, 0 replies; 34+ messages in thread From: Neil Horman @ 2013-04-11 17:52 UTC (permalink / raw) To: Bart Van Assche; +Cc: David Miller, netdev On Thu, Apr 11, 2013 at 07:31:07PM +0200, Bart Van Assche wrote: > On 04/11/13 17:18, Neil Horman wrote: > >Bart, this patch should fix your problem. Could you please test it and confirm? > > The fails to build on my setup after having applied that patch: > > kernel/mutex.c: In function ‘__mutex_trylock_slowpath’: > kernel/mutex.c:446:2: error: implicit declaration of function > ‘spin_unlock_mutex_raw’ [-Werror=implicit-function-declaration] > Strage, not sure why, although I didn't explicitly test it myself. I'll take a look this evening and fix it up for you. Neil > Bart. > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 17:31 ` Bart Van Assche 2013-04-11 17:52 ` Neil Horman @ 2013-04-11 19:14 ` Neil Horman 2013-04-12 6:27 ` Bart Van Assche 1 sibling, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-11 19:14 UTC (permalink / raw) To: Bart Van Assche; +Cc: David Miller, netdev On Thu, Apr 11, 2013 at 07:31:07PM +0200, Bart Van Assche wrote: > On 04/11/13 17:18, Neil Horman wrote: > >Bart, this patch should fix your problem. Could you please test it and confirm? > > The fails to build on my setup after having applied that patch: > > kernel/mutex.c: In function ‘__mutex_trylock_slowpath’: > kernel/mutex.c:446:2: error: implicit declaration of function > ‘spin_unlock_mutex_raw’ [-Werror=implicit-function-declaration] > > Bart. > Sorry, turned out to be a stupid typo on my part. This is a fixed version, I've verified that it builds with and without CONFIG_DEBUG_MUTEXS enabled commit 2f78b4659f119302849f2b753cc1fe5ee87125bc Author: Neil Horman <nhorman@tuxdriver.com> Date: Thu Apr 11 11:05:12 2013 -0400 spinlock: split out debugging check from spin_lock_mutex Bart, this patch should fix your problem. Could you please test it and confirm? Bart Van Assche recently reported a warning to me: <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 [<ffffffff8146f9f6>] printk+0x4d/0x4f [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] This resulted from my commit ca99ca14c which introduced a mutex_trylock operation in a path that could execute in interrupt context. When mutex debugging is enabled, the above warns the user when we are in fact exectuting in interrupt context. I think this is a false positive however. The check is intended to catch users who might be issuing sleeping calls in irq context, but the use of mutex_trylock here is guaranteed not to sleep. We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex with a __might_sleep call in the appropriate parent mutex operations, but for the sake of effiency (which It seems is why the check was put in the spin lock code only when debug is enabled), lets split the spin_lock_mutex call into two components, where the outer component does the debug checking. Then mutex_trylock can just call the inner part as its callable from irq context safely. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Bart Van Assche <bvanassche@acm.org> CC: Bart Van Assche <bvanassche@acm.org> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org diff --git a/kernel/mutex-debug.h b/kernel/mutex-debug.h index 0799fd3..aac0858 100644 --- a/kernel/mutex-debug.h +++ b/kernel/mutex-debug.h @@ -37,13 +37,19 @@ static inline void mutex_clear_owner(struct mutex *lock) lock->owner = NULL; } +#define spin_lock_mutex_raw(lock, flags) \ + do { \ + \ + local_irq_save(flags); \ + arch_spin_lock(&(lock)->rlock.raw_lock);\ + } while (0) + #define spin_lock_mutex(lock, flags) \ do { \ struct mutex *l = container_of(lock, struct mutex, wait_lock); \ \ DEBUG_LOCKS_WARN_ON(in_interrupt()); \ - local_irq_save(flags); \ - arch_spin_lock(&(lock)->rlock.raw_lock);\ + spin_lock_mutex_raw(lock, flags); \ DEBUG_LOCKS_WARN_ON(l->magic != l); \ } while (0) diff --git a/kernel/mutex.c b/kernel/mutex.c index 52f2301..a62a5f8 100644 --- a/kernel/mutex.c +++ b/kernel/mutex.c @@ -431,7 +431,7 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count) unsigned long flags; int prev; - spin_lock_mutex(&lock->wait_lock, flags); + spin_lock_mutex_raw(&lock->wait_lock, flags); prev = atomic_xchg(&lock->count, -1); if (likely(prev == 1)) { diff --git a/kernel/mutex.h b/kernel/mutex.h index 4115fbf..af6ffb4 100644 --- a/kernel/mutex.h +++ b/kernel/mutex.h @@ -11,6 +11,8 @@ #define spin_lock_mutex(lock, flags) \ do { spin_lock(lock); (void)(flags); } while (0) +#define spin_lock_mutex_raw(lock, flags) spin_lock_mutex((lock), (flags)) + #define spin_unlock_mutex(lock, flags) \ do { spin_unlock(lock); (void)(flags); } while (0) #define mutex_remove_waiter(lock, waiter, ti) \ ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-11 19:14 ` Neil Horman @ 2013-04-12 6:27 ` Bart Van Assche 2013-04-12 11:32 ` Neil Horman 0 siblings, 1 reply; 34+ messages in thread From: Bart Van Assche @ 2013-04-12 6:27 UTC (permalink / raw) To: Neil Horman; +Cc: David Miller, netdev, Ingo Molnar On 04/11/13 21:14, Neil Horman wrote: > This resulted from my commit ca99ca14c which introduced a mutex_trylock > operation in a path that could execute in interrupt context. When mutex > debugging is enabled, the above warns the user when we are in fact exectuting in > interrupt context. > > I think this is a false positive however. The check is intended to catch users > who might be issuing sleeping calls in irq context, but the use of mutex_trylock > here is guaranteed not to sleep. > > We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex > with a __might_sleep call in the appropriate parent mutex operations, but for > the sake of effiency (which It seems is why the check was put in the spin lock > code only when debug is enabled), lets split the spin_lock_mutex call into two > components, where the outer component does the debug checking. Then > mutex_trylock can just call the inner part as its callable from irq context > safely. Sorry but I'm not yet convinced that it's safe to invoke mutex_trylock() from IRQ context. Please have a look at the implementation of mutex_set_owner(), which is invoked by mutex_trylock(). mutex_set_owner() stores the value of the "current" pointer into lock->owner. The value of "current" does not have a meaning in IRQ context. Bart. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-12 6:27 ` Bart Van Assche @ 2013-04-12 11:32 ` Neil Horman 2013-04-12 14:01 ` Bart Van Assche 0 siblings, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-12 11:32 UTC (permalink / raw) To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar On Fri, Apr 12, 2013 at 08:27:31AM +0200, Bart Van Assche wrote: > On 04/11/13 21:14, Neil Horman wrote: > > This resulted from my commit ca99ca14c which introduced a mutex_trylock > > operation in a path that could execute in interrupt context. When mutex > > debugging is enabled, the above warns the user when we are in fact exectuting in > > interrupt context. > > > > I think this is a false positive however. The check is intended to catch users > > who might be issuing sleeping calls in irq context, but the use of mutex_trylock > > here is guaranteed not to sleep. > > > > We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex > > with a __might_sleep call in the appropriate parent mutex operations, but for > > the sake of effiency (which It seems is why the check was put in the spin lock > > code only when debug is enabled), lets split the spin_lock_mutex call into two > > components, where the outer component does the debug checking. Then > > mutex_trylock can just call the inner part as its callable from irq context > > safely. > > Sorry but I'm not yet convinced that it's safe to invoke > mutex_trylock() from IRQ context. Please have a look at the > implementation of mutex_set_owner(), which is invoked by > mutex_trylock(). mutex_set_owner() stores the value of the "current" > pointer into lock->owner. The value of "current" does not have a > meaning in IRQ context. > > Bart. > > Thats irrelevant, at least as far as deadlock safety is concerned. current will be set to the process tht was running when we were interrupted, but it won't change during the course of the irq handler, which is all that matters. The lock->owner field is used for optimistic spinning. The worst that will happen is, if CONFIG_MUTEX_SPIN_ON_OWNER is configured, another process may wait on this mutex, spinning on the wrong task to release it (see mutex_spin_on_owner). Thats not efficient, but its not deadlock prone, and its not even that inefficient, when you consider that the critical path in the netpoll code is relatively short. And using the trylock here is certainly preferable to the memory corruption that was possible previously. Regards Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-12 11:32 ` Neil Horman @ 2013-04-12 14:01 ` Bart Van Assche 2013-04-12 18:45 ` Neil Horman 2013-04-19 8:38 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar 0 siblings, 2 replies; 34+ messages in thread From: Bart Van Assche @ 2013-04-12 14:01 UTC (permalink / raw) To: Neil Horman; +Cc: David Miller, netdev, Ingo Molnar On 04/12/13 13:32, Neil Horman wrote: > On Fri, Apr 12, 2013 at 08:27:31AM +0200, Bart Van Assche wrote: >> On 04/11/13 21:14, Neil Horman wrote: >>> This resulted from my commit ca99ca14c which introduced a mutex_trylock >>> operation in a path that could execute in interrupt context. When mutex >>> debugging is enabled, the above warns the user when we are in fact exectuting in >>> interrupt context. >>> >>> I think this is a false positive however. The check is intended to catch users >>> who might be issuing sleeping calls in irq context, but the use of mutex_trylock >>> here is guaranteed not to sleep. >>> >>> We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex >>> with a __might_sleep call in the appropriate parent mutex operations, but for >>> the sake of effiency (which It seems is why the check was put in the spin lock >>> code only when debug is enabled), lets split the spin_lock_mutex call into two >>> components, where the outer component does the debug checking. Then >>> mutex_trylock can just call the inner part as its callable from irq context >>> safely. >> >> Sorry but I'm not yet convinced that it's safe to invoke >> mutex_trylock() from IRQ context. Please have a look at the >> implementation of mutex_set_owner(), which is invoked by >> mutex_trylock(). mutex_set_owner() stores the value of the "current" >> pointer into lock->owner. The value of "current" does not have a >> meaning in IRQ context. > > Thats irrelevant, at least as far as deadlock safety is concerned. current will > be set to the process that was running when we were interrupted, but it won't > change during the course of the irq handler, which is all that matters. The > lock->owner field is used for optimistic spinning. The worst that will happen > is, if CONFIG_MUTEX_SPIN_ON_OWNER is configured, another process may wait on > this mutex, spinning on the wrong task to release it (see mutex_spin_on_owner). > Thats not efficient, but its not deadlock prone, and its not even that > inefficient, when you consider that the critical path in the netpoll code is > relatively short. And using the trylock here is certainly preferable to the > memory corruption that was possible previously. I think there is another issue with invoking mutex_trylock() and mutex_unlock() from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled __mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe. Any thoughts about that ? With v2 of your patch and CONFIG_DEBUG_MUTEXES enabled I get the warning below: ------------[ cut here ]------------ WARNING: at kernel/mutex.c:313 __mutex_unlock_slowpath+0x157/0x160() Pid: 181, comm: kworker/0:1H Tainted: G O 3.9.0-rc6-debug+ #1 Call Trace: <IRQ> [<ffffffff8103c3ef>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103c44a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81432047>] __mutex_unlock_slowpath+0x157/0x160 [<ffffffff8143205e>] mutex_unlock+0xe/0x10 [<ffffffff8136d031>] netpoll_poll_dev+0x111/0x9a0 [<ffffffff81345f32>] ? __alloc_skb+0x82/0x2a0 [<ffffffff8136dac5>] netpoll_send_skb_on_dev+0x205/0x3b0 [<ffffffff8136e00a>] netpoll_send_udp+0x28a/0x3a0 [<ffffffffa0524843>] ? write_msg+0x53/0x110 [netconsole] [<ffffffffa05248bf>] write_msg+0xcf/0x110 [netconsole] [<ffffffff8103d7f1>] call_console_drivers.constprop.16+0xa1/0x120 [<ffffffff8103e848>] console_unlock+0x3f8/0x450 [<ffffffff8103ecce>] vprintk_emit+0x1ee/0x510 [<ffffffff812d1f2c>] dev_vprintk_emit+0x5c/0x70 [<ffffffff810ff047>] ? mempool_free_slab+0x17/0x20 [<ffffffff810ff047>] ? mempool_free_slab+0x17/0x20 [<ffffffff81145922>] ? kmem_cache_free+0x1c2/0x1d0 [<ffffffff812d1f79>] dev_printk_emit+0x39/0x40 [<ffffffff811f6702>] ? blk_update_request+0x3d2/0x520 [<ffffffffa000a110>] ? device_block+0x10/0x10 [scsi_mod] [<ffffffff812d2a7e>] __dev_printk+0x5e/0x90 [<ffffffff812d2e05>] dev_printk+0x45/0x50 [<ffffffffa000b5a7>] scsi_io_completion+0x277/0x6c0 [scsi_mod] [<ffffffffa000107d>] scsi_finish_command+0xbd/0x120 [scsi_mod] [<ffffffffa000b22f>] scsi_softirq_done+0x13f/0x160 [scsi_mod] [<ffffffff811fd4c0>] blk_done_softirq+0x80/0xa0 [<ffffffff81044d61>] __do_softirq+0x101/0x280 [<ffffffff81045095>] irq_exit+0xb5/0xc0 [<ffffffff8143f2be>] smp_apic_timer_interrupt+0x6e/0x99 [<ffffffff8143e5ef>] apic_timer_interrupt+0x6f/0x80 <EOI> [<ffffffff810986b2>] ? mark_held_locks+0xb2/0x130 [<ffffffff8143516a>] ? _raw_spin_unlock_irq+0x3a/0x50 [<ffffffff81435160>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff8120b074>] cfq_kick_queue+0x44/0x50 [<ffffffff81059e5d>] process_one_work+0x1fd/0x510 [<ffffffff81059df2>] ? process_one_work+0x192/0x510 [<ffffffff8105bccf>] worker_thread+0x10f/0x380 [<ffffffff8105bbc0>] ? busy_worker_rebind_fn+0xb0/0xb0 [<ffffffff8106209b>] kthread+0xdb/0xe0 [<ffffffff81061fc0>] ? kthread_create_on_node+0x140/0x140 [<ffffffff8143d95c>] ret_from_fork+0x7c/0xb0 [<ffffffff81061fc0>] ? kthread_create_on_node+0x140/0x140 ---[ end trace dd7421d6dfb2c4ed ]--- Bart. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-12 14:01 ` Bart Van Assche @ 2013-04-12 18:45 ` Neil Horman 2013-04-13 7:35 ` Bart Van Assche 2013-04-19 8:38 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar 1 sibling, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-12 18:45 UTC (permalink / raw) To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar On Fri, Apr 12, 2013 at 04:01:04PM +0200, Bart Van Assche wrote: > On 04/12/13 13:32, Neil Horman wrote: > > On Fri, Apr 12, 2013 at 08:27:31AM +0200, Bart Van Assche wrote: > >> On 04/11/13 21:14, Neil Horman wrote: > >>> This resulted from my commit ca99ca14c which introduced a mutex_trylock > >>> operation in a path that could execute in interrupt context. When mutex > >>> debugging is enabled, the above warns the user when we are in fact exectuting in > >>> interrupt context. > >>> > >>> I think this is a false positive however. The check is intended to catch users > >>> who might be issuing sleeping calls in irq context, but the use of mutex_trylock > >>> here is guaranteed not to sleep. > >>> > >>> We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex > >>> with a __might_sleep call in the appropriate parent mutex operations, but for > >>> the sake of effiency (which It seems is why the check was put in the spin lock > >>> code only when debug is enabled), lets split the spin_lock_mutex call into two > >>> components, where the outer component does the debug checking. Then > >>> mutex_trylock can just call the inner part as its callable from irq context > >>> safely. > >> > >> Sorry but I'm not yet convinced that it's safe to invoke > >> mutex_trylock() from IRQ context. Please have a look at the > >> implementation of mutex_set_owner(), which is invoked by > >> mutex_trylock(). mutex_set_owner() stores the value of the "current" > >> pointer into lock->owner. The value of "current" does not have a > >> meaning in IRQ context. > > > > Thats irrelevant, at least as far as deadlock safety is concerned. current will > > be set to the process that was running when we were interrupted, but it won't > > change during the course of the irq handler, which is all that matters. The > > lock->owner field is used for optimistic spinning. The worst that will happen > > is, if CONFIG_MUTEX_SPIN_ON_OWNER is configured, another process may wait on > > this mutex, spinning on the wrong task to release it (see mutex_spin_on_owner). > > Thats not efficient, but its not deadlock prone, and its not even that > > inefficient, when you consider that the critical path in the netpoll code is > > relatively short. And using the trylock here is certainly preferable to the > > memory corruption that was possible previously. > > I think there is another issue with invoking mutex_trylock() and mutex_unlock() > from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled > __mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and > hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe. > Any thoughts about that ? > Yeah, its ugly, but in this specific case, its ok. the netpoll code (in netpoll_send_skb disables irq on the local cpu before entering the netpoll code path any further, so whenver we frob this mutex from the local cpu, we're guaranteed not to get pre-empted by an irq. > With v2 of your patch and CONFIG_DEBUG_MUTEXES enabled I get the warning below: > another instance of the same issue - its because we spin_lock_mutex in mutex_unlock, rather than spin_lock_mutex_raw. Can you fix it up by hand, or shall I update the patch? Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-12 18:45 ` Neil Horman @ 2013-04-13 7:35 ` Bart Van Assche 2013-04-13 12:03 ` Neil Horman 2013-04-15 14:16 ` Neil Horman 0 siblings, 2 replies; 34+ messages in thread From: Bart Van Assche @ 2013-04-13 7:35 UTC (permalink / raw) To: Neil Horman; +Cc: David Miller, netdev, Ingo Molnar On 04/12/13 20:45, Neil Horman wrote: > On Fri, Apr 12, 2013 at 04:01:04PM +0200, Bart Van Assche wrote: >> On 04/12/13 13:32, Neil Horman wrote: >> I think there is another issue with invoking mutex_trylock() and mutex_unlock() >> from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled >> __mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and >> hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe. >> Any thoughts about that ? >> > Yeah, its ugly, but in this specific case, its ok. the netpoll code (in > netpoll_send_skb disables irq on the local cpu before entering the netpoll code > path any further, so whenver we frob this mutex from the local cpu, we're > guaranteed not to get pre-empted by an irq. As far as I know it is neither allowed nor safe to call netpoll_rx_disable() with IRQs disabled. But that function can lock the dev_lock mutex. What do you think will happen with CONFIG_DEBUG_MUTEXES=n if an interrupt occurs during the mutex_lock(&ni->dev_lock) call, that mutex_lock() call has already locked the mutex-internal spin lock via spin_lock() and mutex_trylock() is invoked from inside the interrupt ? Can that result in anything else than deadlock and "CPU stuck" messages ? Thanks, Bart. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-13 7:35 ` Bart Van Assche @ 2013-04-13 12:03 ` Neil Horman 2013-04-15 14:16 ` Neil Horman 1 sibling, 0 replies; 34+ messages in thread From: Neil Horman @ 2013-04-13 12:03 UTC (permalink / raw) To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar On Sat, Apr 13, 2013 at 09:35:07AM +0200, Bart Van Assche wrote: > On 04/12/13 20:45, Neil Horman wrote: > >On Fri, Apr 12, 2013 at 04:01:04PM +0200, Bart Van Assche wrote: > >>On 04/12/13 13:32, Neil Horman wrote: > >>I think there is another issue with invoking mutex_trylock() and mutex_unlock() > >>from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled > >>__mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and > >>hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe. > >>Any thoughts about that ? > >> > >Yeah, its ugly, but in this specific case, its ok. the netpoll code (in > >netpoll_send_skb disables irq on the local cpu before entering the netpoll code > >path any further, so whenver we frob this mutex from the local cpu, we're > >guaranteed not to get pre-empted by an irq. > > As far as I know it is neither allowed nor safe to call > netpoll_rx_disable() with IRQs disabled. But that function can lock Where do you see netpoll_rx_disable getting called with irqs disabled? > the dev_lock mutex. What do you think will happen with > CONFIG_DEBUG_MUTEXES=n if an interrupt occurs during the > mutex_lock(&ni->dev_lock) call, that mutex_lock() call has already > locked the mutex-internal spin lock via spin_lock() and > mutex_trylock() is invoked from inside the interrupt ? Can that > result in anything else than deadlock and "CPU stuck" messages ? > Please go back and look more closely. Where do you see a mutex_lock call getting made with interrupts enabled? Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-13 7:35 ` Bart Van Assche 2013-04-13 12:03 ` Neil Horman @ 2013-04-15 14:16 ` Neil Horman [not found] ` <CAO+b5-oBfH3M0dnrQSs-p1BF_5hKy2tsU-dD=EP9+S=iqPs5ew@mail.gmail.com> 1 sibling, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-15 14:16 UTC (permalink / raw) To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar On Sat, Apr 13, 2013 at 09:35:07AM +0200, Bart Van Assche wrote: > On 04/12/13 20:45, Neil Horman wrote: > >On Fri, Apr 12, 2013 at 04:01:04PM +0200, Bart Van Assche wrote: > >>On 04/12/13 13:32, Neil Horman wrote: > >>I think there is another issue with invoking mutex_trylock() and mutex_unlock() > >>from IRQ context: as far as I can see if CONFIG_DEBUG_MUTEXES is disabled > >>__mutex_unlock_common_slowpath() uses spin_lock() to lock mutex.wait_lock and > >>hence invoking mutex_unlock() from both non-IRQ and IRQ context is not safe. > >>Any thoughts about that ? > >> > >Yeah, its ugly, but in this specific case, its ok. the netpoll code (in > >netpoll_send_skb disables irq on the local cpu before entering the netpoll code > >path any further, so whenver we frob this mutex from the local cpu, we're > >guaranteed not to get pre-empted by an irq. > > As far as I know it is neither allowed nor safe to call > netpoll_rx_disable() with IRQs disabled. But that function can lock > the dev_lock mutex. What do you think will happen with > CONFIG_DEBUG_MUTEXES=n if an interrupt occurs during the > mutex_lock(&ni->dev_lock) call, that mutex_lock() call has already > locked the mutex-internal spin lock via spin_lock() and > mutex_trylock() is invoked from inside the interrupt ? Can that > result in anything else than deadlock and "CPU stuck" messages ? > > Thanks, > > Bart. > So I've been doing some reading over the weekend, and as a result I've been developing some concerns about how mutex_trylock works. I'm still convinced its safe to use with the changes we're discussing here, but all the documentation regarding mutex_trylock seems to say it simply shouldn't be used in interrupt context (though it never goes into detail as to why). As near as I can tell, its because the spin locks that mutexes use are locked without disabling interrupts (the implication being that, as we see in the patch we're looking at here), you're safe if you disable irqs independently). Ironically, Ingo patched the mutex debug variants back in 2006 in commit 1fb00c6cbd8356f43b46322742f3c01c2a1f02da to disable irqs to work around a bug, the result being that the debug variant gives you a WARNING about using mutex_trylock in interrupts, when its actually safe, while the non-debug variant provides no warning, but is actually prone to deadlock. There are a few other use cases in the kernel in which mutex_trylock may be used in an interrupt context (crash_kexec being the most notable, as its used for the same purposes that we're using it in netpoll, and where I got the inspiration for my previous netpoll changes. The more I look at this, the more I'm starting to think that, in addition to the debug changes I've got proposed here, we should additionally convert the non-debug spin_lock_mutex variants to disable irqs while manipulating the mutexes. That would make mutex_trylock safe, and give us a nice mechanism to create exclusion between sleepable paths that should not run atomically and interrupt context paths. Ingo, can you comment on these thoughts above please. I'd like to get your opinion on this prior to spinning up a new patch if I could please. Thanks! Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <CAO+b5-oBfH3M0dnrQSs-p1BF_5hKy2tsU-dD=EP9+S=iqPs5ew@mail.gmail.com>]
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex [not found] ` <CAO+b5-oBfH3M0dnrQSs-p1BF_5hKy2tsU-dD=EP9+S=iqPs5ew@mail.gmail.com> @ 2013-04-16 17:24 ` Neil Horman 2013-04-18 19:29 ` Neil Horman 0 siblings, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-16 17:24 UTC (permalink / raw) To: Bart Van Assche; +Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar On Tue, Apr 16, 2013 at 07:56:12AM -0700, Bart Van Assche wrote: > On Mon, Apr 15, 2013 at 7:16 AM, Neil Horman <nhorman@tuxdriver.com> wrote: > > > So I've been doing some reading over the weekend, and as a result I've been > > developing some concerns about how mutex_trylock works. I'm still > > convinced its > > safe to use with the changes we're discussing here, but all the > > documentation > > regarding mutex_trylock seems to say it simply shouldn't be used in > > interrupt > > context (though it never goes into detail as to why). As near as I can > > tell, > > its because the spin locks that mutexes use are locked without disabling > > interrupts (the implication being that, as we see in the patch we're > > looking at > > here), you're safe if you disable irqs independently). Ironically, Ingo > > patched > > the mutex debug variants back in 2006 in commit > > 1fb00c6cbd8356f43b46322742f3c01c2a1f02da to disable irqs to work around a > > bug, > > the result being that the debug variant gives you a WARNING about using > > mutex_trylock in interrupts, when its actually safe, while the non-debug > > variant > > provides no warning, but is actually prone to deadlock. > > > > There are a few other use cases in the kernel in which mutex_trylock may > > be used > > in an interrupt context (crash_kexec being the most notable, as its used > > for the > > same purposes that we're using it in netpoll, and where I got the > > inspiration > > for my previous netpoll changes. > > > > The more I look at this, the more I'm starting to think that, in addition > > to the > > debug changes I've got proposed here, we should additionally convert the > > non-debug spin_lock_mutex variants to disable irqs while manipulating the > > mutexes. That would make mutex_trylock safe, and give us a nice mechanism > > to > > create exclusion between sleepable paths that should not run atomically and > > interrupt context paths. > > > > Ingo, can you comment on these thoughts above please. I'd like to get your > > opinion on this prior to spinning up a new patch if I could please. > > > > How about introducing a second type of mutex, and only using changing the > spin_lock() into spin_lock_irq() for that new type of mutex ? That way no > performance penalty would be introduced for existing mutex users and > mutex_trylock() could be made safe for use inside IRQ-context for the new > mutex type only. > > Bart. Thats a thought, I've actually been thinking about doing this with a completion variable and a flag as well, but perhaps a mutex variant would be a better api choice. I'll put that together as an alternate, but I'd really like to see what Ingo has to say regarding just switching the spin_lock to spin_lock_irqsave for all cases as being a mutex, I'm not sure adding in irq protection is going to be that much of a performance hit. Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-16 17:24 ` Neil Horman @ 2013-04-18 19:29 ` Neil Horman 2013-04-22 20:12 ` Neil Horman 0 siblings, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-18 19:29 UTC (permalink / raw) To: Bart Van Assche; +Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar Well, I guess Ingo doesn't have time to respond here. Given that, I've been looking at other ways of adressing this problem (as the more I think about it, the more I think changing mutex semantics, even to be more accepting, is going to be met with lots of friction). I've come up with the below change, and it works well for me (I've hammered it with ifup/downs while sending messages over netconsole for 24 hours now without any incident). I'm not sure I like all the details yet (I modified the rx_flag field to be a gneral flags field that uses test/set/clear_bit, which I don't think is necessecary, but it seems saner to me). the meat of the change though is that instead of a mutex, we use two flags that, respectively, inhibit netpoll usage, and indicate when we're in a netpoll service path. Using these two flags, in conjunction with a wait_queue, we can signal to executors in the netpoll path that we're inhibiting its use and expect to any user in the path to complete and wake up any contexts in the inhibiting path. Seems to work well for me. Let me know what you think, and I'll make any desired changes and post an official patch. Best Neil commit c277d54d471572b0ec7b27812d205fb303ad44f9 Author: Neil Horman <nhorman@tuxdriver.com> Date: Wed Apr 17 15:59:44 2013 -0400 netpoll: convert mutex into a wait_event Bart Van Assche recently reported a warning to me: <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 [<ffffffff8146f9f6>] printk+0x4d/0x4f [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] This resulted from my commit ca99ca14c which introduced a mutex_trylock operation in a path that could execute in interrupt context. When mutex debugging is enabled, the above warns the user when we are in fact exectuting in interrupt context. After some discussion, I'm not sure a mutex is the right choice here, as it seems unsafe to use (deadlock possibilities due to implementation details). Instead lets convert the mutex to a mechanism that makes use of a wait_queue_head and some flag bits. The dev_close paths can now set a bit to suspend netpoll and then sleep waiting for the NETPOLL_POLLING flag to clear, which allows for flushing of any in-progress execution in the netpoll_poll_dev_path. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Bart Van Assche <bvanassche@acm.org> CC: Bart Van Assche <bvanassche@acm.org> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h index 9d7d8c6..52e1d14 100644 --- a/include/linux/netpoll.h +++ b/include/linux/netpoll.h @@ -35,12 +35,19 @@ struct netpoll { struct work_struct cleanup_work; }; +/* Flag defs for netpoll_info->flags */ + +#define NETPOLL_RX_ENABLED 1 +#define NETPOLL_RX_DROP 2 +#define NETPOLL_SUSPENDED 3 +#define NETPOLL_POLLING 4 + struct netpoll_info { atomic_t refcnt; - unsigned long rx_flags; + unsigned long flags; spinlock_t rx_lock; - struct mutex dev_lock; + wait_queue_head_t dev_wait; struct list_head rx_np; /* netpolls that registered an rx_hook */ struct sk_buff_head neigh_tx; /* list of neigh requests to reply to */ @@ -53,11 +60,11 @@ struct netpoll_info { }; #ifdef CONFIG_NETPOLL -extern int netpoll_rx_disable(struct net_device *dev); -extern void netpoll_rx_enable(struct net_device *dev); +extern int netpoll_suspend(struct net_device *dev); +extern void netpoll_resume(struct net_device *dev); #else -static inline int netpoll_rx_disable(struct net_device *dev) { return 0; } -static inline void netpoll_rx_enable(struct net_device *dev) { return; } +static inline int netpoll_suspend(struct net_device *dev) { return 0; } +static inline void netpoll_resume(struct net_device *dev) { return; } #endif void netpoll_send_udp(struct netpoll *np, const char *msg, int len); @@ -88,7 +95,8 @@ static inline bool netpoll_rx_on(struct sk_buff *skb) { struct netpoll_info *npinfo = rcu_dereference_bh(skb->dev->npinfo); - return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags); + return npinfo && (!list_empty(&npinfo->rx_np) || + test_bit(NETPOLL_RX_ENABLED, &npinfo->flags)); } static inline bool netpoll_rx(struct sk_buff *skb) @@ -104,8 +112,9 @@ static inline bool netpoll_rx(struct sk_buff *skb) npinfo = rcu_dereference_bh(skb->dev->npinfo); spin_lock(&npinfo->rx_lock); - /* check rx_flags again with the lock held */ - if (npinfo->rx_flags && __netpoll_rx(skb, npinfo)) + /* check flags again with the lock held */ + if (test_bit(NETPOLL_RX_ENABLED, &npinfo->flags) && + __netpoll_rx(skb, npinfo)) ret = true; spin_unlock(&npinfo->rx_lock); diff --git a/net/core/dev.c b/net/core/dev.c index 3655ff9..163dd3e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1200,7 +1200,7 @@ static int __dev_open(struct net_device *dev) * If we don't do this there is a chance ndo_poll_controller * or ndo_poll may be running while we open the device */ - ret = netpoll_rx_disable(dev); + ret = netpoll_suspend(dev); if (ret) return ret; @@ -1217,7 +1217,7 @@ static int __dev_open(struct net_device *dev) if (!ret && ops->ndo_open) ret = ops->ndo_open(dev); - netpoll_rx_enable(dev); + netpoll_resume(dev); if (ret) clear_bit(__LINK_STATE_START, &dev->state); @@ -1311,7 +1311,7 @@ static int __dev_close(struct net_device *dev) LIST_HEAD(single); /* Temporarily disable netpoll until the interface is down */ - retval = netpoll_rx_disable(dev); + retval = netpoll_suspend(dev); if (retval) return retval; @@ -1319,7 +1319,7 @@ static int __dev_close(struct net_device *dev) retval = __dev_close_many(&single); list_del(&single); - netpoll_rx_enable(dev); + netpoll_resume(dev); return retval; } @@ -1360,7 +1360,7 @@ int dev_close(struct net_device *dev) LIST_HEAD(single); /* Block netpoll rx while the interface is going down */ - ret = netpoll_rx_disable(dev); + ret = netpoll_suspend(dev); if (ret) return ret; @@ -1368,7 +1368,7 @@ int dev_close(struct net_device *dev) dev_close_many(&single); list_del(&single); - netpoll_rx_enable(dev); + netpoll_resume(dev); } return ret; } diff --git a/net/core/netpoll.c b/net/core/netpoll.c index a3a17ae..a094227 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -50,8 +50,6 @@ static atomic_t trapped; DEFINE_STATIC_SRCU(netpoll_srcu); #define USEC_PER_POLL 50 -#define NETPOLL_RX_ENABLED 1 -#define NETPOLL_RX_DROP 2 #define MAX_SKB_SIZE \ (sizeof(struct ethhdr) + \ @@ -155,7 +153,7 @@ static int poll_one_napi(struct netpoll_info *npinfo, if (!test_bit(NAPI_STATE_SCHED, &napi->state)) return budget; - npinfo->rx_flags |= NETPOLL_RX_DROP; + set_bit(NETPOLL_RX_DROP, &npinfo->flags); atomic_inc(&trapped); set_bit(NAPI_STATE_NPSVC, &napi->state); @@ -164,7 +162,7 @@ static int poll_one_napi(struct netpoll_info *npinfo, clear_bit(NAPI_STATE_NPSVC, &napi->state); atomic_dec(&trapped); - npinfo->rx_flags &= ~NETPOLL_RX_DROP; + clear_bit(NETPOLL_RX_DROP, &npinfo->flags); return budget - work; } @@ -206,27 +204,22 @@ static void netpoll_poll_dev(struct net_device *dev) * the dev_open/close paths use this to block netpoll activity * while changing device state */ - if (!mutex_trylock(&ni->dev_lock)) - return; + set_bit(NETPOLL_POLLING, &ni->flags); + if (test_bit(NETPOLL_SUSPENDED, &ni->flags)) + goto out; - if (!netif_running(dev)) { - mutex_unlock(&ni->dev_lock); - return; - } + if (!netif_running(dev)) + goto out; ops = dev->netdev_ops; - if (!ops->ndo_poll_controller) { - mutex_unlock(&ni->dev_lock); - return; - } + if (!ops->ndo_poll_controller) + goto out; /* Process pending work on NIC */ ops->ndo_poll_controller(dev); poll_napi(dev); - mutex_unlock(&ni->dev_lock); - if (dev->flags & IFF_SLAVE) { if (ni) { struct net_device *bond_dev; @@ -245,32 +238,37 @@ static void netpoll_poll_dev(struct net_device *dev) service_neigh_queue(ni); zap_completion_queue(); +out: + clear_bit(NETPOLL_POLLING, &ni->dev_wait.lock); + wake_up(&ni->dev_wait); } -int netpoll_rx_disable(struct net_device *dev) +int netpoll_suspend(struct net_device *dev) { struct netpoll_info *ni; int idx; + might_sleep(); idx = srcu_read_lock(&netpoll_srcu); ni = srcu_dereference(dev->npinfo, &netpoll_srcu); - if (ni) - mutex_lock(&ni->dev_lock); + if (ni && !test_and_set_bit(NETPOLL_SUSPENDED, &ni->flags)) + wait_event(ni->dev_wait, !test_bit(NETPOLL_POLLING, + &ni->flags)); srcu_read_unlock(&netpoll_srcu, idx); return 0; } -EXPORT_SYMBOL(netpoll_rx_disable); +EXPORT_SYMBOL(netpoll_suspend); -void netpoll_rx_enable(struct net_device *dev) +void netpoll_resume(struct net_device *dev) { struct netpoll_info *ni; rcu_read_lock(); ni = rcu_dereference(dev->npinfo); if (ni) - mutex_unlock(&ni->dev_lock); + clear_bit(NETPOLL_SUSPENDED, &ni->flags); rcu_read_unlock(); } -EXPORT_SYMBOL(netpoll_rx_enable); +EXPORT_SYMBOL(netpoll_resume); static void refill_skbs(void) { @@ -1042,11 +1040,11 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp) goto out; } - npinfo->rx_flags = 0; + npinfo->flags = 0; INIT_LIST_HEAD(&npinfo->rx_np); spin_lock_init(&npinfo->rx_lock); - mutex_init(&npinfo->dev_lock); + init_waitqueue_head(&npinfo->dev_wait); skb_queue_head_init(&npinfo->neigh_tx); skb_queue_head_init(&npinfo->txq); INIT_DELAYED_WORK(&npinfo->tx_work, queue_process); @@ -1068,7 +1066,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp) if (np->rx_hook) { spin_lock_irqsave(&npinfo->rx_lock, flags); - npinfo->rx_flags |= NETPOLL_RX_ENABLED; + set_bit(NETPOLL_RX_ENABLED, &npinfo->flags); list_add_tail(&np->rx, &npinfo->rx_np); spin_unlock_irqrestore(&npinfo->rx_lock, flags); } @@ -1251,7 +1249,7 @@ void __netpoll_cleanup(struct netpoll *np) spin_lock_irqsave(&npinfo->rx_lock, flags); list_del(&np->rx); if (list_empty(&npinfo->rx_np)) - npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; + clear_bit(NETPOLL_RX_ENABLED, &npinfo->flags); spin_unlock_irqrestore(&npinfo->rx_lock, flags); } ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-18 19:29 ` Neil Horman @ 2013-04-22 20:12 ` Neil Horman [not found] ` <CAO+b5-r5jVJNZWuREUH5MQ3baeSPR8fVV1p9pMnukmiZd9nRhg@mail.gmail.com> 0 siblings, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-22 20:12 UTC (permalink / raw) To: Bart Van Assche; +Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar On Thu, Apr 18, 2013 at 03:29:12PM -0400, Neil Horman wrote: > Well, I guess Ingo doesn't have time to respond here. Given that, I've been > looking at other ways of adressing this problem (as the more I think about it, > the more I think changing mutex semantics, even to be more accepting, is going > to be met with lots of friction). I've come up with the below change, and it > works well for me (I've hammered it with ifup/downs while sending messages over > netconsole for 24 hours now without any incident). I'm not sure I like all the > details yet (I modified the rx_flag field to be a gneral flags field that uses > test/set/clear_bit, which I don't think is necessecary, but it seems saner to > me). the meat of the change though is that instead of a mutex, we use two flags > that, respectively, inhibit netpoll usage, and indicate when we're in a netpoll > service path. Using these two flags, in conjunction with a wait_queue, we can > signal to executors in the netpoll path that we're inhibiting its use and expect > to any user in the path to complete and wake up any contexts in the inhibiting > path. Seems to work well for me. Let me know what you think, and I'll make any > desired changes and post an official patch. > > Best > Neil > Bart, any testing/thoughts here? i'd like to get this fixed in the next few days, but I'd like to get any feedback you might have first. Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <CAO+b5-r5jVJNZWuREUH5MQ3baeSPR8fVV1p9pMnukmiZd9nRhg@mail.gmail.com>]
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex [not found] ` <CAO+b5-r5jVJNZWuREUH5MQ3baeSPR8fVV1p9pMnukmiZd9nRhg@mail.gmail.com> @ 2013-04-23 13:23 ` Neil Horman [not found] ` <CAO+b5-rQPyO9QE9v+oQTeo+G-ftcsehSB5=63AZ13QW4EJ1X0Q@mail.gmail.com> 0 siblings, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-23 13:23 UTC (permalink / raw) To: Bart Van Assche; +Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar On Mon, Apr 22, 2013 at 01:41:11PM -0700, Bart Van Assche wrote: > On Mon, Apr 22, 2013 at 1:12 PM, Neil Horman <nhorman@tuxdriver.com> wrote: > > > Bart, any testing/thoughts here? i'd like to get this fixed in the next > > few > > days, but I'd like to get any feedback you might have first. > > > > Sorry but I haven't been able to run any tests during the past eight days > since I'm traveling and because of that I do not have access to my regular > test setup. > > Bart. Copy that. I can propose this change based on my own testing, or wait for you to get back. Do you have a preference? Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <CAO+b5-rQPyO9QE9v+oQTeo+G-ftcsehSB5=63AZ13QW4EJ1X0Q@mail.gmail.com>]
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex [not found] ` <CAO+b5-rQPyO9QE9v+oQTeo+G-ftcsehSB5=63AZ13QW4EJ1X0Q@mail.gmail.com> @ 2013-04-23 13:44 ` Neil Horman 2013-04-23 17:33 ` David Miller 0 siblings, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-23 13:44 UTC (permalink / raw) To: Bart Van Assche; +Cc: David Miller, netdev, Ingo Molnar On Tue, Apr 23, 2013 at 06:38:19AM -0700, Bart Van Assche wrote: > On Tue, Apr 23, 2013 at 6:23 AM, Neil Horman <nhorman@tuxdriver.com> wrote: > > > Copy that. I can propose this change based on my own testing, or wait for > > you > > to get back. Do you have a preference? > > > > It will take until next week before I can run more tests. It depends on > whether a fix can wait that long ? > > Bart. Dave, how do you feel about it? I'm comfortable with the wait queue change I've proposed, but I've not received any reports of actual netpoll deadlocks (i.e. the mutex solution is reporting a warning, but no actual problems). So I think its safe to wait a bit longer, unless you just want this squared away now. FWIw, I think theres still some discussion to be had about the possibility of converting mutexes to use spin_lock_irqsave internally, thereby making mutex_trylock irq safe, but I think we can deal with that independently of this issue. Regards Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-23 13:44 ` Neil Horman @ 2013-04-23 17:33 ` David Miller 2013-04-23 17:50 ` Neil Horman 0 siblings, 1 reply; 34+ messages in thread From: David Miller @ 2013-04-23 17:33 UTC (permalink / raw) To: nhorman; +Cc: bart.vanassche, netdev, mingo From: Neil Horman <nhorman@tuxdriver.com> Date: Tue, 23 Apr 2013 09:44:43 -0400 > Dave, how do you feel about it? I'm comfortable with the wait queue change I've > proposed, but I've not received any reports of actual netpoll deadlocks (i.e. > the mutex solution is reporting a warning, but no actual problems). So I think > its safe to wait a bit longer, unless you just want this squared away now. If it's just a warning and people aren't actually hitting the potential deadlock, it can wait. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-23 17:33 ` David Miller @ 2013-04-23 17:50 ` Neil Horman 2013-04-27 18:53 ` bvba Bart Van Assche 0 siblings, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-23 17:50 UTC (permalink / raw) To: David Miller; +Cc: bart.vanassche, netdev, mingo On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote: > From: Neil Horman <nhorman@tuxdriver.com> > Date: Tue, 23 Apr 2013 09:44:43 -0400 > > > Dave, how do you feel about it? I'm comfortable with the wait queue change I've > > proposed, but I've not received any reports of actual netpoll deadlocks (i.e. > > the mutex solution is reporting a warning, but no actual problems). So I think > > its safe to wait a bit longer, unless you just want this squared away now. > > If it's just a warning and people aren't actually hitting the > potential deadlock, it can wait. > Copy that. Bart, I'll wait till you get back then. Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-23 17:50 ` Neil Horman @ 2013-04-27 18:53 ` bvba Bart Van Assche 2013-04-29 18:13 ` Neil Horman 0 siblings, 1 reply; 34+ messages in thread From: bvba Bart Van Assche @ 2013-04-27 18:53 UTC (permalink / raw) To: Neil Horman; +Cc: David Miller, netdev, mingo On 04/23/13 19:50, Neil Horman wrote: > On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote: >> From: Neil Horman <nhorman@tuxdriver.com> >> Date: Tue, 23 Apr 2013 09:44:43 -0400 >> >>> Dave, how do you feel about it? I'm comfortable with the wait queue change I've >>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e. >>> the mutex solution is reporting a warning, but no actual problems). So I think >>> its safe to wait a bit longer, unless you just want this squared away now. >> >> If it's just a warning and people aren't actually hitting the >> potential deadlock, it can wait. >> > Copy that. Bart, I'll wait till you get back then. (Just arrived home) Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271: kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601] kernel: irq event stamp: 1999192 kernel: hardirqs last enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450 kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80 kernel: softirqs last enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280 kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0 kernel: CPU 0 kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE kernel: RIP: 0010:[<ffffffff8103e8a0>] [<ffffffff8103e8a0>] console_unlock+0x420/0x450 kernel: Call Trace: kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040 kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350 kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 kernel: [<ffffffff812bd764>] con_write+0x34/0x50 kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210 kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70 kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480 kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0 kernel: [<ffffffff812a2839>] tty_write+0x159/0x300 kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180 kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0 kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170 kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0 kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b Bart. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-27 18:53 ` bvba Bart Van Assche @ 2013-04-29 18:13 ` Neil Horman 2013-04-29 19:12 ` Bart Van Assche 2013-04-30 15:35 ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman 0 siblings, 2 replies; 34+ messages in thread From: Neil Horman @ 2013-04-29 18:13 UTC (permalink / raw) To: bvba Bart Van Assche; +Cc: David Miller, netdev, mingo On Sat, Apr 27, 2013 at 08:53:54PM +0200, bvba Bart Van Assche wrote: > On 04/23/13 19:50, Neil Horman wrote: > > On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote: > >> From: Neil Horman <nhorman@tuxdriver.com> > >> Date: Tue, 23 Apr 2013 09:44:43 -0400 > >> > >>> Dave, how do you feel about it? I'm comfortable with the wait queue change I've > >>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e. > >>> the mutex solution is reporting a warning, but no actual problems). So I think > >>> its safe to wait a bit longer, unless you just want this squared away now. > >> > >> If it's just a warning and people aren't actually hitting the > >> potential deadlock, it can wait. > >> > > Copy that. Bart, I'll wait till you get back then. > > (Just arrived home) > > Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271: > > kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601] > kernel: irq event stamp: 1999192 > kernel: hardirqs last enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450 > kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80 > kernel: softirqs last enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280 > kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0 > kernel: CPU 0 > kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE > kernel: RIP: 0010:[<ffffffff8103e8a0>] [<ffffffff8103e8a0>] console_unlock+0x420/0x450 > kernel: Call Trace: > kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040 > kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 > kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350 > kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 > kernel: [<ffffffff812bd764>] con_write+0x34/0x50 > kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210 > kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70 > kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480 > kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0 > kernel: [<ffffffff812a2839>] tty_write+0x159/0x300 > kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180 > kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0 > kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170 > kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0 > kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b > > Bart. > So, looking at this, I'm not 100% sure whats going on. console_unlock has the ability to detect the need for rescheduling when we call raw_spin_unlock_irqrestore at the bottom of the function, so there should be no reason that we don't give the per-cpu watchdog task a chance to run. About the only case I see in which we might be in trouble would be if somehow got into an infinite loop if every message in the log buffer had LOG_NOCONS set, in which case we would spin in this loop without ever scheduling or re-enabling interrupts. But for that to happen I think you would have to be flooding the log with messages that were bigger than the default message buffer size (about 1000 bytes without a newline). You're not doing that are you? Regardless, this is a different problem from the one we were origionally addressing, we should probably handle it in a different thread. On the up side, I did discover while looking at this that semaphores, unlike mutexes are quite liberal in what contexts they can be used in (i.e. irq contexts are perfectly legitimate places to call up and down_trylock). I'll respin my patch to use that, as its much simpler than a wait queue Best Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-29 18:13 ` Neil Horman @ 2013-04-29 19:12 ` Bart Van Assche 2013-04-30 15:35 ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman 1 sibling, 0 replies; 34+ messages in thread From: Bart Van Assche @ 2013-04-29 19:12 UTC (permalink / raw) To: Neil Horman; +Cc: David Miller, netdev, mingo On 04/29/13 20:13, Neil Horman wrote: > On Sat, Apr 27, 2013 at 08:53:54PM +0200, Bart Van Assche wrote: >> Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271: >> >> kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601] >> kernel: irq event stamp: 1999192 >> kernel: hardirqs last enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450 >> kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80 >> kernel: softirqs last enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280 >> kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0 >> kernel: CPU 0 >> kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE >> kernel: RIP: 0010:[<ffffffff8103e8a0>] [<ffffffff8103e8a0>] console_unlock+0x420/0x450 >> kernel: Call Trace: >> kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040 >> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 >> kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350 >> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 >> kernel: [<ffffffff812bd764>] con_write+0x34/0x50 >> kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210 >> kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70 >> kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480 >> kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0 >> kernel: [<ffffffff812a2839>] tty_write+0x159/0x300 >> kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180 >> kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0 >> kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170 >> kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0 >> kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b > > So, looking at this, I'm not 100% sure whats going on. console_unlock has the > ability to detect the need for rescheduling when we call > raw_spin_unlock_irqrestore at the bottom of the function, so there should be no > reason that we don't give the per-cpu watchdog task a chance to run. About the > only case I see in which we might be in trouble would be if somehow got into an > infinite loop if every message in the log buffer had LOG_NOCONS set, in which > case we would spin in this loop without ever scheduling or re-enabling > interrupts. But for that to happen I think you would have to be flooding the > log with messages that were bigger than the default message buffer size (about > 1000 bytes without a newline). You're not doing that are you? The above call trace was triggered by sending messages at a high rate over netconsole where each message has a length between 40 and (about) 160 characters. I'm not sure though what exactly is causing this call trace - I haven't had the time yet to analyze this in detail. Bart. ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH RFC] netpoll: convert mutex into a semaphore 2013-04-29 18:13 ` Neil Horman 2013-04-29 19:12 ` Bart Van Assche @ 2013-04-30 15:35 ` Neil Horman 2013-05-01 19:00 ` David Miller 1 sibling, 1 reply; 34+ messages in thread From: Neil Horman @ 2013-04-30 15:35 UTC (permalink / raw) To: netdev; +Cc: Neil Horman, Bart Van Assche, David Miller Bart Van Assche recently reported a warning to me: <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 [<ffffffff8146f9f6>] printk+0x4d/0x4f [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] This resulted from my commit ca99ca14c which introduced a mutex_trylock operation in a path that could execute in interrupt context. When mutex debugging is enabled, the above warns the user when we are in fact exectuting in interrupt context interrupt context. After some discussion, It seems that a semaphore is the proper mechanism to use here. While mutexes are defined to be unusable in interrupt context, no such condition exists for semaphores (save for the fact that the non blocking api calls, like up and down_trylock must be used when in irq context). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Bart Van Assche <bvanassche@acm.org> CC: Bart Van Assche <bvanassche@acm.org> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org --- include/linux/netpoll.h | 2 +- net/core/netpoll.c | 14 +++++++------- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h index 9d7d8c6..fa2cb76 100644 --- a/include/linux/netpoll.h +++ b/include/linux/netpoll.h @@ -40,7 +40,7 @@ struct netpoll_info { unsigned long rx_flags; spinlock_t rx_lock; - struct mutex dev_lock; + struct semaphore dev_lock; struct list_head rx_np; /* netpolls that registered an rx_hook */ struct sk_buff_head neigh_tx; /* list of neigh requests to reply to */ diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 209d842..a5802a8 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -206,17 +206,17 @@ static void netpoll_poll_dev(struct net_device *dev) * the dev_open/close paths use this to block netpoll activity * while changing device state */ - if (!mutex_trylock(&ni->dev_lock)) + if (!down_trylock(&ni->dev_lock)) return; if (!netif_running(dev)) { - mutex_unlock(&ni->dev_lock); + up(&ni->dev_lock); return; } ops = dev->netdev_ops; if (!ops->ndo_poll_controller) { - mutex_unlock(&ni->dev_lock); + up(&ni->dev_lock); return; } @@ -225,7 +225,7 @@ static void netpoll_poll_dev(struct net_device *dev) poll_napi(dev); - mutex_unlock(&ni->dev_lock); + up(&ni->dev_lock); if (dev->flags & IFF_SLAVE) { if (ni) { @@ -255,7 +255,7 @@ int netpoll_rx_disable(struct net_device *dev) idx = srcu_read_lock(&netpoll_srcu); ni = srcu_dereference(dev->npinfo, &netpoll_srcu); if (ni) - mutex_lock(&ni->dev_lock); + down(&ni->dev_lock); srcu_read_unlock(&netpoll_srcu, idx); return 0; } @@ -267,7 +267,7 @@ void netpoll_rx_enable(struct net_device *dev) rcu_read_lock(); ni = rcu_dereference(dev->npinfo); if (ni) - mutex_unlock(&ni->dev_lock); + up(&ni->dev_lock); rcu_read_unlock(); } EXPORT_SYMBOL(netpoll_rx_enable); @@ -1047,7 +1047,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp) INIT_LIST_HEAD(&npinfo->rx_np); spin_lock_init(&npinfo->rx_lock); - mutex_init(&npinfo->dev_lock); + sema_init(&npinfo->dev_lock, 1); skb_queue_head_init(&npinfo->neigh_tx); skb_queue_head_init(&npinfo->txq); INIT_DELAYED_WORK(&npinfo->tx_work, queue_process); -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] netpoll: convert mutex into a semaphore 2013-04-30 15:35 ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman @ 2013-05-01 19:00 ` David Miller 2013-05-01 19:34 ` Neil Horman 0 siblings, 1 reply; 34+ messages in thread From: David Miller @ 2013-05-01 19:00 UTC (permalink / raw) To: nhorman; +Cc: netdev, bvanassche From: Neil Horman <nhorman@tuxdriver.com> Date: Tue, 30 Apr 2013 11:35:05 -0400 > Bart Van Assche recently reported a warning to me: ... > This resulted from my commit ca99ca14c which introduced a mutex_trylock > operation in a path that could execute in interrupt context. When mutex > debugging is enabled, the above warns the user when we are in fact > exectuting in interrupt context > interrupt context. > > After some discussion, It seems that a semaphore is the proper mechanism to use > here. While mutexes are defined to be unusable in interrupt context, no such > condition exists for semaphores (save for the fact that the non blocking api > calls, like up and down_trylock must be used when in irq context). > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > Reported-by: Bart Van Assche <bvanassche@acm.org> Neil this looks good to me so I'm going to toss it into my tree and queue it up for -stable too, thanks! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] netpoll: convert mutex into a semaphore 2013-05-01 19:00 ` David Miller @ 2013-05-01 19:34 ` Neil Horman 0 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2013-05-01 19:34 UTC (permalink / raw) To: David Miller; +Cc: netdev, bvanassche On Wed, May 01, 2013 at 03:00:59PM -0400, David Miller wrote: > From: Neil Horman <nhorman@tuxdriver.com> > Date: Tue, 30 Apr 2013 11:35:05 -0400 > > > Bart Van Assche recently reported a warning to me: > ... > > This resulted from my commit ca99ca14c which introduced a mutex_trylock > > operation in a path that could execute in interrupt context. When mutex > > debugging is enabled, the above warns the user when we are in fact > > exectuting in interrupt context > > interrupt context. > > > > After some discussion, It seems that a semaphore is the proper mechanism to use > > here. While mutexes are defined to be unusable in interrupt context, no such > > condition exists for semaphores (save for the fact that the non blocking api > > calls, like up and down_trylock must be used when in irq context). > > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > Reported-by: Bart Van Assche <bvanassche@acm.org> > > Neil this looks good to me so I'm going to toss it into my tree and queue > it up for -stable too, thanks! > Ok, cool deal. Thanks! Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-12 14:01 ` Bart Van Assche 2013-04-12 18:45 ` Neil Horman @ 2013-04-19 8:38 ` Ingo Molnar 2013-04-19 12:52 ` Neil Horman 1 sibling, 1 reply; 34+ messages in thread From: Ingo Molnar @ 2013-04-19 8:38 UTC (permalink / raw) To: Bart Van Assche Cc: Neil Horman, David Miller, netdev, Ingo Molnar, Peter Zijlstra, Thomas Gleixner * Bart Van Assche <bvanassche@acm.org> wrote: > WARNING: at kernel/mutex.c:313 __mutex_unlock_slowpath+0x157/0x160() > Pid: 181, comm: kworker/0:1H Tainted: G O 3.9.0-rc6-debug+ #1 > Call Trace: > <IRQ> [<ffffffff8103c3ef>] warn_slowpath_common+0x7f/0xc0 > [<ffffffff8103c44a>] warn_slowpath_null+0x1a/0x20 > [<ffffffff81432047>] __mutex_unlock_slowpath+0x157/0x160 > [<ffffffff8143205e>] mutex_unlock+0xe/0x10 > [<ffffffff8136d031>] netpoll_poll_dev+0x111/0x9a0 > [<ffffffff81345f32>] ? __alloc_skb+0x82/0x2a0 > [<ffffffff8136dac5>] netpoll_send_skb_on_dev+0x205/0x3b0 > [<ffffffff8136e00a>] netpoll_send_udp+0x28a/0x3a0 > [<ffffffffa0524843>] ? write_msg+0x53/0x110 [netconsole] > [<ffffffffa05248bf>] write_msg+0xcf/0x110 [netconsole] > [<ffffffff8103d7f1>] call_console_drivers.constprop.16+0xa1/0x120 > [<ffffffff8103e848>] console_unlock+0x3f8/0x450 > [<ffffffff8103ecce>] vprintk_emit+0x1ee/0x510 > [<ffffffff812d1f2c>] dev_vprintk_emit+0x5c/0x70 > [<ffffffff810ff047>] ? mempool_free_slab+0x17/0x20 I *really* think that using a mutex from a low level debug interface like netpoll is a mistake. We want such interfaces to be as atomic and as self-contained as possible: using spinlocks, which could possibly be converted to raw spinlocks, etc. mutexes should be used when there's an expectation of possibly long blocking time. That's not really the case for netpoll, we either are able to generate the skb right then and send it off, or we are in trouble, right? Thanks, Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex 2013-04-19 8:38 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar @ 2013-04-19 12:52 ` Neil Horman 0 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2013-04-19 12:52 UTC (permalink / raw) To: Ingo Molnar Cc: Bart Van Assche, David Miller, netdev, Ingo Molnar, Peter Zijlstra, Thomas Gleixner On Fri, Apr 19, 2013 at 10:38:02AM +0200, Ingo Molnar wrote: > > * Bart Van Assche <bvanassche@acm.org> wrote: > > > WARNING: at kernel/mutex.c:313 __mutex_unlock_slowpath+0x157/0x160() > > Pid: 181, comm: kworker/0:1H Tainted: G O 3.9.0-rc6-debug+ #1 > > Call Trace: > > <IRQ> [<ffffffff8103c3ef>] warn_slowpath_common+0x7f/0xc0 > > [<ffffffff8103c44a>] warn_slowpath_null+0x1a/0x20 > > [<ffffffff81432047>] __mutex_unlock_slowpath+0x157/0x160 > > [<ffffffff8143205e>] mutex_unlock+0xe/0x10 > > [<ffffffff8136d031>] netpoll_poll_dev+0x111/0x9a0 > > [<ffffffff81345f32>] ? __alloc_skb+0x82/0x2a0 > > [<ffffffff8136dac5>] netpoll_send_skb_on_dev+0x205/0x3b0 > > [<ffffffff8136e00a>] netpoll_send_udp+0x28a/0x3a0 > > [<ffffffffa0524843>] ? write_msg+0x53/0x110 [netconsole] > > [<ffffffffa05248bf>] write_msg+0xcf/0x110 [netconsole] > > [<ffffffff8103d7f1>] call_console_drivers.constprop.16+0xa1/0x120 > > [<ffffffff8103e848>] console_unlock+0x3f8/0x450 > > [<ffffffff8103ecce>] vprintk_emit+0x1ee/0x510 > > [<ffffffff812d1f2c>] dev_vprintk_emit+0x5c/0x70 > > [<ffffffff810ff047>] ? mempool_free_slab+0x17/0x20 > > I *really* think that using a mutex from a low level debug interface like netpoll > is a mistake. We want such interfaces to be as atomic and as self-contained as > possible: using spinlocks, which could possibly be converted to raw spinlocks, > etc. > > mutexes should be used when there's an expectation of possibly long blocking time. > That's not really the case for netpoll, we either are able to generate the skb > right then and send it off, or we are in trouble, right? > > Thanks, > > Ingo > Well, this is a very seldom used path. The idea behind the use of a mutex was simply to allow the close/open paths to sleep during those periods when they occured in parallel with a netpoll operation. The only reason spinlocks weren't used was because we're not supposed to enter the drivers ndo_open/close paths in atomic context, as subsequents sleeps are possible. Thats why the netpoll_poll_napi path had a mutex_trylock operation. As it turns out though, thats unsafe for this use case because the mutex implementation uses a spin_lock without disabling interrupts (leading to the possibility of deadlocks), which in turn led to our discussion of weather or not converting the mutex internal implementation to use spin_lock_irqsave, as that would make mutex_trylock safe to use in irq/softirq context. Its all a bit moot at this point though, as I've come up with an alternate solution that I think satifies our needs just as well. Using a wait_queue_head, with some atomic flags, we can block dev_open and close while netpoll is executing, and retain the ability to wake those processes up safely from irq context. Please take a look at my new patch and let me know what you think. Thanks! Neil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
@ 2013-04-28 2:34 Neil Horman
0 siblings, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-28 2:34 UTC (permalink / raw)
To: bvba Bart Van Assche; +Cc: David Miller, netdev, mingo
You've got the wrong patch built then. The netpoll path with my latest patch no longer has any mutexes in it. This pears to be a separate problem.
Best
Neil
bvba Bart Van Assche <info@bartvanassche.be> wrote:
>On 04/23/13 19:50, Neil Horman wrote:
>> On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
>>> From: Neil Horman <nhorman@tuxdriver.com>
>>> Date: Tue, 23 Apr 2013 09:44:43 -0400
>>>
>>>> Dave, how do you feel about it? I'm comfortable with the wait queue change I've
>>>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
>>>> the mutex solution is reporting a warning, but no actual problems). So I think
>>>> its safe to wait a bit longer, unless you just want this squared away now.
>>>
>>> If it's just a warning and people aren't actually hitting the
>>> potential deadlock, it can wait.
>>>
>> Copy that. Bart, I'll wait till you get back then.
>
>(Just arrived home)
>
>Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:
>
>kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
>kernel: irq event stamp: 1999192
>kernel: hardirqs last enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
>kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
>kernel: softirqs last enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
>kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
>kernel: CPU 0
>kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
>kernel: RIP: 0010:[<ffffffff8103e8a0>] [<ffffffff8103e8a0>] console_unlock+0x420/0x450
>kernel: Call Trace:
>kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
>kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
>kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>kernel: [<ffffffff812bd764>] con_write+0x34/0x50
>kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
>kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
>kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
>kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
>kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
>kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
>kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
>kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
>kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
>kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b
>
>Bart.
>
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2013-05-01 19:34 UTC | newest] Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-04-11 13:42 Netpoll triggers soft lockup Bart Van Assche 2013-04-11 14:08 ` Neil Horman 2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman 2013-04-11 15:54 ` Christoph Paasch 2013-04-11 17:04 ` Neil Horman 2013-04-11 17:51 ` Christoph Paasch 2013-04-11 15:57 ` Eric Dumazet 2013-04-11 16:56 ` Neil Horman 2013-04-11 17:31 ` Bart Van Assche 2013-04-11 17:52 ` Neil Horman 2013-04-11 19:14 ` Neil Horman 2013-04-12 6:27 ` Bart Van Assche 2013-04-12 11:32 ` Neil Horman 2013-04-12 14:01 ` Bart Van Assche 2013-04-12 18:45 ` Neil Horman 2013-04-13 7:35 ` Bart Van Assche 2013-04-13 12:03 ` Neil Horman 2013-04-15 14:16 ` Neil Horman [not found] ` <CAO+b5-oBfH3M0dnrQSs-p1BF_5hKy2tsU-dD=EP9+S=iqPs5ew@mail.gmail.com> 2013-04-16 17:24 ` Neil Horman 2013-04-18 19:29 ` Neil Horman 2013-04-22 20:12 ` Neil Horman [not found] ` <CAO+b5-r5jVJNZWuREUH5MQ3baeSPR8fVV1p9pMnukmiZd9nRhg@mail.gmail.com> 2013-04-23 13:23 ` Neil Horman [not found] ` <CAO+b5-rQPyO9QE9v+oQTeo+G-ftcsehSB5=63AZ13QW4EJ1X0Q@mail.gmail.com> 2013-04-23 13:44 ` Neil Horman 2013-04-23 17:33 ` David Miller 2013-04-23 17:50 ` Neil Horman 2013-04-27 18:53 ` bvba Bart Van Assche 2013-04-29 18:13 ` Neil Horman 2013-04-29 19:12 ` Bart Van Assche 2013-04-30 15:35 ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman 2013-05-01 19:00 ` David Miller 2013-05-01 19:34 ` Neil Horman 2013-04-19 8:38 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar 2013-04-19 12:52 ` Neil Horman 2013-04-28 2:34 Neil Horman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).