netdev.vger.kernel.org archive mirror
* Netpoll triggers soft lockup
@ 2013-04-11 13:42 Bart Van Assche
  2013-04-11 14:08 ` Neil Horman
  2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman
  0 siblings, 2 replies; 34+ messages in thread
From: Bart Van Assche @ 2013-04-11 13:42 UTC
  To: Neil Horman, David Miller, netdev

Hi,

While testing a driver against kernel 3.9-rc6 I ran into a soft lockup 
triggered by sending lots of kernel messages to a remote system via 
netconsole. This behavior was probably introduced by commit ca99ca14c 
("netpoll: protect napi_poll and poll_controller during 
dev_[open|close]"). That commit introduced a mutex in 
netpoll_poll_dev(), which can be called from interrupt context. Is there 
anyone who can tell me whether this is a bug in commit ca99ca14c or in 
netconsole?
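A minimal userspace sketch of why the warning fires, assuming a simplified model. The kernel documents mutex_trylock() as not usable from interrupt context, and a kernel built with mutex debugging checks for this. In the trace below, the call arrives from a printk issued in softirq context (blk_done_softirq), through netconsole into netpoll_poll_dev(). The names `sketch_in_interrupt`, `sketch_mutex`, `sketch_mutex_trylock()` and `sketch_poll_dev()` are hypothetical stand-ins, not kernel APIs: the flag plays the role of in_interrupt(), and the counter plays the role of the WARN output.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's in_interrupt(); set by the caller. */
static bool sketch_in_interrupt;
/* Counts how often the debug check fired, standing in for WARN output. */
static int sketch_warnings;

/* Hypothetical simplified mutex: just an ownership flag. */
struct sketch_mutex {
    bool locked;
};

/* Models a debug-enabled mutex_trylock(): returns 1 if acquired, 0 if busy,
 * and records a misuse when invoked from (simulated) interrupt context. */
static int sketch_mutex_trylock(struct sketch_mutex *m)
{
    if (sketch_in_interrupt) {
        sketch_warnings++;
        fprintf(stderr, "WARNING: mutex_trylock from interrupt context\n");
    }
    if (m->locked)
        return 0;
    m->locked = true;
    return 1;
}

static void sketch_mutex_unlock(struct sketch_mutex *m)
{
    m->locked = false;
}

/* Hypothetical model of a poll routine guarded by a mutex, as described
 * for commit ca99ca14c. Returns 1 if it polled, 0 if it skipped. */
static int sketch_poll_dev(struct sketch_mutex *dev_lock)
{
    if (!sketch_mutex_trylock(dev_lock))
        return 0; /* device busy (open/close in progress): skip this poll */
    /* ... the actual device polling would happen here ... */
    sketch_mutex_unlock(dev_lock);
    return 1;
}
```

In this model the trylock succeeds in both process and interrupt context; what the debug check reports is the calling context, not lock contention.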

Read(10):------------[ cut here ]------------
WARNING: at kernel/mutex.c:434 mutex_trylock+0x16d/0x180()
Hardware name: P5Q DELUXE
Modules linked in: ib_srp scsi_transport_srp dm_mod qla2x00tgt(O) 
qla2xxx_scst(O) scsi_transport_fc iscsi_scst(O) ib_srpt(O) scst_vdisk(O) 
libcrc32c scst(O) crc32c brd netconsole configfs rdma_ucm rdma_cm iw_cm 
ib_addr scsi_tgt af_packet snd_hda_codec_hdmi snd_hda_codec_analog 
ib_ipoib ib_cm ib_uverbs ib_umad snd_hda_intel snd_hda_codec snd_hwdep 
snd_pcm snd_seq snd_timer mlx4_ib ib_sa ib_mad cpufreq_conservative 
ib_core cpufreq_userspace cpufreq_powersave snd_seq_device snd mlx4_core 
sr_mod skge soundcore pcspkr acpi_cpufreq button ehci_pci sg i2c_i801 
mperf snd_page_alloc cdrom microcode autofs4 ext3 jbd mbcache sd_mod 
crc_t10dif radeon uhci_hcd ttm drm_kms_helper ehci_hcd drm i2c_algo_bit 
i2c_core intel_agp intel_gtt agpgart usbcore usb_common processor 
thermal_sys hwmon scsi_dh_alua scsi_dh ata_generic ata_piix ahci libahci 
pata_marvell libata scsi_mod [last unloaded: scsi_transport_srp]
Pid: 178, comm: kworker/0:1H Tainted: G           O 3.9.0-rc6-debug+ #0
Call Trace:
  <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
  [<ffffffff8146f9f6>] printk+0x4d/0x4f
  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
  [<ffffffffa000b924>] scsi_io_completion+0x294/0x6c0 [scsi_mod]
  [<ffffffffa000113d>] scsi_finish_command+0xbd/0x120 [scsi_mod]
  [<ffffffffa000b58f>] scsi_softirq_done+0x13f/0x160 [scsi_mod]
  [<ffffffff8121dbc0>] blk_done_softirq+0x80/0xa0
  [<ffffffff81046e81>] ? __do_softirq+0xb1/0x3c0
  [<ffffffff81046ed1>] __do_softirq+0x101/0x3c0
  [<ffffffff810cc649>] ? handle_irq_event+0x59/0x80
  [<ffffffff81047355>] irq_exit+0xb5/0xc0
  [<ffffffff81484cd3>] do_IRQ+0x63/0xe0
  [<ffffffff8147a82f>] common_interrupt+0x6f/0x6f
  <EOI>  [<ffffffff810a1ea2>] ? mark_held_locks+0xb2/0x130
  [<ffffffff8147a46a>] ? _raw_spin_unlock_irq+0x3a/0x50
  [<ffffffff8147a460>] ? _raw_spin_unlock_irq+0x30/0x50
  [<ffffffff812148e5>] blk_delay_work+0x35/0x40
  [<ffffffff8106135d>] process_one_work+0x1fd/0x650
  [<ffffffff810612f2>] ? process_one_work+0x192/0x650
  [<ffffffff81061b50>] worker_thread+0x110/0x380
  [<ffffffff81061a40>] ? rescuer_thread+0x250/0x250
  [<ffffffff81067d6b>] kthread+0xdb/0xe0
  [<ffffffff81067c90>] ? kthread_create_on_node+0x140/0x140
  [<ffffffff8148345c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81067c90>] ? kthread_create_on_node+0x140/0x140
---[ end trace e3e3a22d8bb51cb7 ]---

Thanks,

Bart.

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
@ 2013-04-28  2:34 Neil Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Neil Horman @ 2013-04-28  2:34 UTC
  To: bvba Bart Van Assche; +Cc: David Miller, netdev, mingo

You've got the wrong patch built then.  The netpoll path with my latest patch no longer has any mutexes in it.  This appears to be a separate problem.

Best
Neil
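The thread overview lists a later patch, "netpoll: convert mutex into a semaphore". A minimal userspace sketch of that idea, under stated assumptions: the kernel documents down_trylock() as usable from interrupt context (unlike mutex_trylock()), and it returns 0 on success, the inverse of mutex_trylock(). The names `sketch_sem`, `sketch_sem_init()`, `sketch_down_trylock()`, `sketch_up()` and `sketch_poll_dev_sem()` are hypothetical; this models the semantics with C11 atomics and is neither the kernel's struct semaphore nor the actual patch.

```c
#include <stdatomic.h>

/* Hypothetical userspace model of a counting semaphore. */
struct sketch_sem {
    atomic_int count;
};

static void sketch_sem_init(struct sketch_sem *s, int count)
{
    atomic_init(&s->count, count);
}

/* Same return convention as the kernel's down_trylock(): 0 on success,
 * 1 if the semaphore is unavailable. It never sleeps. */
static int sketch_down_trylock(struct sketch_sem *s)
{
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* On failure, c is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
            return 0;
    }
    return 1;
}

static void sketch_up(struct sketch_sem *s)
{
    atomic_fetch_add(&s->count, 1);
}

/* Counts completed polls so the skip-versus-poll behavior is observable. */
static int sketch_polls;

/* Poll path: skip instead of blocking when open/close holds the semaphore. */
static void sketch_poll_dev_sem(struct sketch_sem *dev_sem)
{
    if (sketch_down_trylock(dev_sem))
        return;
    sketch_polls++;
    sketch_up(dev_sem);
}
```

The poll path never waits: if the open/close side holds the semaphore, the poll is simply skipped, which suits a caller that may be running in softirq context.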

bvba Bart Van Assche <info@bartvanassche.be> wrote:

>On 04/23/13 19:50, Neil Horman wrote:
>> On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
>>> From: Neil Horman <nhorman@tuxdriver.com>
>>> Date: Tue, 23 Apr 2013 09:44:43 -0400
>>>
>>>> Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
>>>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
>>>> the mutex solution is reporting a warning, but no actual problems).  So I think
>>>> it's safe to wait a bit longer, unless you just want this squared away now.
>>>
>>> If it's just a warning and people aren't actually hitting the
>>> potential deadlock, it can wait.
>>>
>> Copy that. Bart, I'll wait till you get back then.
>
>(Just arrived home)
>
>Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:
>
>kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
>kernel: irq event stamp: 1999192
>kernel: hardirqs last  enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
>kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
>kernel: softirqs last  enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
>kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
>kernel: CPU 0 
>kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G           O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
>kernel: RIP: 0010:[<ffffffff8103e8a0>]  [<ffffffff8103e8a0>] console_unlock+0x420/0x450
>kernel: Call Trace:
>kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
>kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
>kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>kernel: [<ffffffff812bd764>] con_write+0x34/0x50
>kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
>kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
>kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
>kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
>kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
>kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
>kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
>kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
>kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
>kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b
>
>Bart.
>



Thread overview: 34+ messages
2013-04-11 13:42 Netpoll triggers soft lockup Bart Van Assche
2013-04-11 14:08 ` Neil Horman
2013-04-11 15:18 ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Neil Horman
2013-04-11 15:54   ` Christoph Paasch
2013-04-11 17:04     ` Neil Horman
2013-04-11 17:51       ` Christoph Paasch
2013-04-11 15:57   ` Eric Dumazet
2013-04-11 16:56     ` Neil Horman
2013-04-11 17:31   ` Bart Van Assche
2013-04-11 17:52     ` Neil Horman
2013-04-11 19:14     ` Neil Horman
2013-04-12  6:27       ` Bart Van Assche
2013-04-12 11:32         ` Neil Horman
2013-04-12 14:01           ` Bart Van Assche
2013-04-12 18:45             ` Neil Horman
2013-04-13  7:35               ` Bart Van Assche
2013-04-13 12:03                 ` Neil Horman
2013-04-15 14:16                 ` Neil Horman
     [not found]                   ` <CAO+b5-oBfH3M0dnrQSs-p1BF_5hKy2tsU-dD=EP9+S=iqPs5ew@mail.gmail.com>
2013-04-16 17:24                     ` Neil Horman
2013-04-18 19:29                       ` Neil Horman
2013-04-22 20:12                         ` Neil Horman
     [not found]                           ` <CAO+b5-r5jVJNZWuREUH5MQ3baeSPR8fVV1p9pMnukmiZd9nRhg@mail.gmail.com>
2013-04-23 13:23                             ` Neil Horman
     [not found]                               ` <CAO+b5-rQPyO9QE9v+oQTeo+G-ftcsehSB5=63AZ13QW4EJ1X0Q@mail.gmail.com>
2013-04-23 13:44                                 ` Neil Horman
2013-04-23 17:33                                   ` David Miller
2013-04-23 17:50                                     ` Neil Horman
2013-04-27 18:53                                       ` bvba Bart Van Assche
2013-04-29 18:13                                         ` Neil Horman
2013-04-29 19:12                                           ` Bart Van Assche
2013-04-30 15:35                                           ` [PATCH RFC] netpoll: convert mutex into a semaphore Neil Horman
2013-05-01 19:00                                             ` David Miller
2013-05-01 19:34                                               ` Neil Horman
2013-04-19  8:38             ` [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex Ingo Molnar
2013-04-19 12:52               ` Neil Horman
2013-04-28  2:34 Neil Horman
