* [Bug 60505] New: Heavy network traffic triggers vhost_net lockup
From: bugzilla-daemon @ 2013-07-04 16:55 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=60505

            Bug ID: 60505
           Summary: Heavy network traffic triggers vhost_net lockup
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.9.8
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: bvanassche@acm.org
        Regression: No

After some time under heavy network traffic between host and guests, the
qemu-kvm processes lock up and become unkillable. They appear to hang in an
ioctl, as shown by the output of echo w > /proc/sysrq-trigger:

SysRq : Show Blocked State
 task                        PC stack   pid father
qemu-kvm        D ffff88011fa128c0     0 30506      1 0x00000004
ffff88005dc11d38 0000000000000086 ffff880097c20720 ffff88005dc11fd8
ffff88005dc11fd8 ffff88005dc11fd8 ffffffff81810440 ffff880097c20720
ffff88005dc11d48 ffff88005ddb0058 ffff88005ddb8528 ffff88005ddb0000
Call Trace:
[<ffffffff813fa679>] schedule+0x29/0x70
[<ffffffffa07a39a5>] vhost_work_flush+0xa5/0x120 [vhost_net]
[<ffffffffa07a3b19>] vhost_poll_flush+0x19/0x20 [vhost_net]
[<ffffffffa07a6077>] vhost_net_flush_vq+0x37/0x50 [vhost_net]
[<ffffffffa07a71d2>] vhost_net_ioctl+0x502/0x660 [vhost_net]
[<ffffffff8114bed0>] do_vfs_ioctl+0x90/0x520
[<ffffffff8114c3b0>] sys_ioctl+0x50/0x90
[<ffffffff81403a42>] system_call_fastpath+0x16/0x1b
qemu-kvm        D ffff88011fad28c0     0 30575      1 0x00000004
ffff880097e05d38 0000000000000082 ffff8800a3c50000 ffff880097e05fd8
ffff880097e05fd8 ffff880097e05fd8 ffff88011acd5ca0 ffff8800a3c50000
ffff880097e05d48 ffff8800d41c0058 ffff8800d41c8528 ffff8800d41c0000
Call Trace:
[<ffffffff813fa679>] schedule+0x29/0x70
[<ffffffffa07a39a5>] vhost_work_flush+0xa5/0x120 [vhost_net]
[<ffffffffa07a3b19>] vhost_poll_flush+0x19/0x20 [vhost_net]
[<ffffffffa07a6077>] vhost_net_flush_vq+0x37/0x50 [vhost_net]
[<ffffffffa07a71d2>] vhost_net_ioctl+0x502/0x660 [vhost_net]
[<ffffffff8114bed0>] do_vfs_ioctl+0x90/0x520
[<ffffffff8114c3b0>] sys_ioctl+0x50/0x90
[<ffffffff81403a42>] system_call_fastpath+0x16/0x1b
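
Both blocked tasks are sleeping in vhost_work_flush(), reached from vhost_net_ioctl(). A flush can only return once the vhost worker thread has run the work that was queued before the flush started, so if that worker stops making progress the ioctl sleeps forever in uninterruptible (D) state. The sketch below is a simplified, single-threaded model of that bookkeeping, loosely based on the queue_seq/done_seq counters that drivers/vhost/vhost.c used around this time; the struct and function names are hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one vhost work item's flush bookkeeping
 * (loosely after queue_seq/done_seq in vhost.c; not the kernel code). */
struct work_model {
    unsigned queue_seq; /* bumped each time the work is queued */
    unsigned done_seq;  /* written by the worker after the work has run */
    bool pending;       /* work is on the worker's list */
};

/* Queue the work; queueing an already-pending item is a no-op. */
void work_queue(struct work_model *w)
{
    if (!w->pending) {
        w->pending = true;
        w->queue_seq++;
    }
}

/* One pass of the worker thread. A stuck worker (stuck == true) makes
 * no progress, so done_seq never advances. Returns true if work ran. */
bool worker_run_once(struct work_model *w, bool stuck)
{
    if (stuck || !w->pending)
        return false;
    unsigned seq = w->queue_seq;
    w->pending = false;
    /* ... the work function (e.g. a TX handler) would run here ... */
    w->done_seq = seq;
    return true;
}

/* The condition a flusher sleeps on: has the worker completed everything
 * queued up to sequence number seq? The signed difference keeps the
 * comparison correct when the unsigned counters wrap around. */
bool flush_done(const struct work_model *w, unsigned seq)
{
    return (int)(seq - w->done_seq) <= 0;
}
```

A flusher snapshots seq = queue_seq and then sleeps until flush_done() holds. With a stuck worker that condition never becomes true, which is consistent with tasks parked in schedule() under vhost_work_flush().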

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 60505] Heavy network traffic triggers vhost_net lockup
From: bugzilla-daemon @ 2013-07-05 15:05 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=60505

--- Comment #1 from Bart Van Assche <bvanassche@acm.org> ---
Note: this might be a consequence of bug 60518.

* [Bug 60505] Heavy network traffic triggers vhost_net lockup
From: bugzilla-daemon @ 2013-07-07  8:39 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=60505

Michael S. Tsirkin <m.s.tsirkin@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |m.s.tsirkin@gmail.com

--- Comment #2 from Michael S. Tsirkin <m.s.tsirkin@gmail.com> ---
Does this still trigger if you disable zerocopy TX?

* [Bug 60505] Heavy network traffic triggers vhost_net lockup
From: bugzilla-daemon @ 2013-07-07 11:27 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=60505

--- Comment #3 from Michael S. Tsirkin <m.s.tsirkin@gmail.com> ---
Also, I just posted a patch that fixes a bug in this function:

[PATCHv3] vhost-net: fix use-after-free in vhost_net_flush

Could you please try with this patch applied?

* [Bug 60505] Heavy network traffic triggers vhost_net lockup
From: bugzilla-daemon @ 2013-07-08  9:20 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=60505

--- Comment #4 from Bart Van Assche <bvanassche@acm.org> ---
I have not yet tried disabling zero-copy TX. However, even with the vhost-net
patch applied on top of kernel v3.9.9, I can still trigger this issue:

Jul  8 10:58:01 asus kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
Jul  8 10:58:01 asus kernel: IP: [<ffffffff810f73a9>] put_compound_page+0x89/0x170
Jul  8 10:58:01 asus kernel: PGD 0 
Jul  8 10:58:01 asus kernel: Oops: 0000 [#1] SMP 
Jul  8 10:58:01 asus kernel: Modules linked in: dm_queue_length dm_multipath
ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net tun fuse
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables af_packet bridge stp llc rdma_ucm rdma_cm iw_cm ib_addr ib_srp
scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib
ib_sa ib_mad ib_core dm_mod hid_generic usbhid hid acpi_cpufreq mperf kvm_intel
i2c_i801 kvm r8169 ehci_pci snd_hda_codec_hdmi qla2xxx snd_hda_codec_realtek
snd_hda_intel snd_hda_codec snd_hwdep ehci_hcd snd_pcm snd_seq mii sr_mod cdrom
sg snd_timer pcspkr snd_seq_device mlx4_core scsi_transport_fc wmi snd
soundcore snd_page_alloc crc32c_intel microcode autofs4 ext4 jbd2 mbcache crc16
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
raid10 raid0 raid1 sd_mod crc_t10dif i915 drm_kms_helper drm ahci libahci
intel_agp i2c_algo_bit intel_gtt agpgart xhci_hcd i2c_core video usbcore
usb_common button processor thermal_sys hwmon scsi_dh_alua scsi_dh pata_acpi
libata scsi_mod
Jul  8 10:58:01 asus kernel: CPU 3 
Jul  8 10:58:01 asus kernel: Pid: 5485, comm: vhost-5462 Not tainted 3.9.9+ #1 Gigabyte Technology Co., Ltd. Z68X-UD3H-B3/Z68X-UD3H-B3
Jul  8 10:58:01 asus kernel: RIP: 0010:[<ffffffff810f73a9>]  [<ffffffff810f73a9>] put_compound_page+0x89/0x170
Jul  8 10:58:01 asus kernel: RSP: 0018:ffff8800aab13bd8  EFLAGS: 00010286
Jul  8 10:58:01 asus kernel: RAX: ffff880118b0b600 RBX: ffff880118b0b800 RCX: ffffea000252801c
Jul  8 10:58:01 asus kernel: RDX: 0000000000000140 RSI: 0000000000000246 RDI: ffff880118b0b800
Jul  8 10:58:01 asus kernel: RBP: ffff8800aab13bf8 R08: ffff8800aa8f4518 R09: 0000000000000010
Jul  8 10:58:01 asus kernel: R10: 0000000000000000 R11: 00007fa0c0000000 R12: 0000000000000000
Jul  8 10:58:01 asus kernel: R13: ffffffffa078f96c R14: 00000000000091aa R15: ffff8800b3bb7500
Jul  8 10:58:01 asus kernel: FS:  0000000000000000(0000) GS:ffff88011fac0000(0000) knlGS:0000000000000000
Jul  8 10:58:01 asus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  8 10:58:01 asus kernel: CR2: 000000000000001c CR3: 00000000aab9f000 CR4: 00000000000427e0
Jul  8 10:58:01 asus kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  8 10:58:01 asus kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul  8 10:58:01 asus kernel: Process vhost-5462 (pid: 5485, threadinfo ffff8800aab12000, task ffff880107920000)
Jul  8 10:58:01 asus kernel: Stack:
Jul  8 10:58:01 asus kernel: ffffea0000ecae40 0000000000000012 ffff8800b3bb7500 ffffffffa078f96c
Jul  8 10:58:01 asus kernel: ffff8800aab13c08 ffffffff810f77ec ffff8800aab13c28 ffffffff8132045f
Jul  8 10:58:01 asus kernel: ffff8800b3bb7500 ffff8800b3bb7500 ffff8800aab13c48 ffffffff813204fe
Jul  8 10:58:01 asus kernel: Call Trace:
Jul  8 10:58:01 asus kernel: [<ffffffff810f77ec>] put_page+0x2c/0x40
Jul  8 10:58:01 asus kernel: [<ffffffff8132045f>] skb_release_data+0x8f/0x110
Jul  8 10:58:01 asus kernel: [<ffffffff813204fe>] __kfree_skb+0x1e/0xa0
Jul  8 10:58:01 asus kernel: [<ffffffff813205b6>] kfree_skb+0x36/0xa0
Jul  8 10:58:01 asus kernel: [<ffffffffa078f96c>] tun_get_user+0x71c/0x810 [tun]
Jul  8 10:58:01 asus kernel: [<ffffffffa078faba>] tun_sendmsg+0x5a/0x80 [tun]
Jul  8 10:58:01 asus kernel: [<ffffffffa079e607>] handle_tx+0x287/0x680 [vhost_net]
Jul  8 10:58:01 asus kernel: [<ffffffffa079ea35>] handle_tx_kick+0x15/0x20 [vhost_net]
Jul  8 10:58:01 asus kernel: [<ffffffffa079a80a>] vhost_worker+0xaa/0x1a0 [vhost_net]
Jul  8 10:58:01 asus kernel: [<ffffffff8105ef80>] kthread+0xc0/0xd0
Jul  8 10:58:01 asus kernel: [<ffffffff8140395c>] ret_from_fork+0x7c/0xb0
Jul  8 10:58:01 asus kernel: Code: 8b 6d f8 c9 c3 48 8b 07 f6 c4 80 75 0d f0 ff 4b 1c 0f 94 c0 84 c0 74 c9 eb bf 4c 8b 67 30 48 8b 07 f6 c4 80 74 e7 4c 39 e7 74 e2 <41> 8b 54 24 1c 49 8d 4c 24 1c 85 d2 74 d4 8d 72 01 89 d0 f0 0f 
Jul  8 10:58:01 asus kernel: RIP  [<ffffffff810f73a9>] put_compound_page+0x89/0x170
Jul  8 10:58:01 asus kernel: RSP <ffff8800aab13bd8>
Jul  8 10:58:01 asus kernel: CR2: 000000000000001c
Jul  8 10:58:01 asus kernel: ---[ end trace 481d0b283c089c9a ]---
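
The register dump narrows down the fault: CR2, the faulting address, is 0x1c; R12 is 0; and the bytes at the faulting instruction, 41 8b 54 24 1c, decode to mov 0x1c(%r12),%edx, a 4-byte load from offset 0x1c of a NULL pointer. The short sketch below only illustrates that arithmetic (faulting address = NULL base + field offset). The struct is hypothetical and simplified so that a 4-byte field lands at 0x1c on an LP64 target; it is not the kernel's struct page, whose layout depends on kernel version and configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layout for illustration only; NOT the
 * kernel's struct page. Offsets shown assume an LP64 target. */
struct page_model {
    unsigned long flags; /* 0x00 */
    void *mapping;       /* 0x08 */
    unsigned long index; /* 0x10 */
    int mapcount;        /* 0x18 */
    int count;           /* 0x1c */
};

/* Address touched by a load of the field at field_offset from base.
 * With base == 0 (a NULL pointer) this is just the field offset,
 * which is why CR2 reports a small value such as 0x1c. */
uintptr_t fault_address(uintptr_t base, size_t field_offset)
{
    return base + field_offset;
}
```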

The patch I ran this test with is as follows:

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index dfff647..98f81e6 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -857,7 +857,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
     mutex_unlock(&vq->mutex);

     if (oldubufs) {
-        vhost_ubuf_put_and_wait(oldubufs);
+        vhost_ubuf_put_wait_and_free(oldubufs);
         mutex_lock(&vq->mutex);
         vhost_zerocopy_signal_used(n, vq);
         mutex_unlock(&vq->mutex);
@@ -875,7 +875,7 @@ err_used:
     rcu_assign_pointer(vq->private_data, oldsock);
     vhost_net_enable_vq(n, vq);
     if (ubufs)
-        vhost_ubuf_put_and_wait(ubufs);
+        vhost_ubuf_put_wait_and_free(ubufs);
 err_ubufs:
     fput(sock->file);
 err_vq:
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 0d96700..348fce4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1576,5 +1576,10 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
 {
     kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
     wait_event(ubufs->wait, !atomic_read(&ubufs->kref.refcount));
+}
+
+void vhost_ubuf_put_wait_and_free(struct vhost_ubuf_ref *ubufs)
+{
+    vhost_ubuf_put_and_wait(ubufs);
     kfree(ubufs);
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 17261e2..ab2eb0d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -63,6 +63,7 @@ struct vhost_ubuf_ref {
 struct vhost_ubuf_ref *vhost_ubuf_alloc(struct vhost_virtqueue *, bool zcopy);
 void vhost_ubuf_put(struct vhost_ubuf_ref *);
 void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *);
+void vhost_ubuf_put_wait_and_free(struct vhost_ubuf_ref *ubufs);

 struct ubuf_info;

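The patch splits the old helper in two: vhost_ubuf_put_and_wait() now only drops the caller's reference and waits for the count to reach zero, while the new vhost_ubuf_put_wait_and_free() additionally frees the object and is used on the vhost_net_set_backend() paths. Going by the patch title, the flush path presumably keeps using the object after waiting, so freeing it inside the wait helper was the use-after-free. Below is a minimal userspace sketch of that put/wait/free split, assuming a pthread mutex and condition variable as stand-ins for kref and the kernel wait queue; every name is hypothetical and this models the idea rather than reproducing vhost.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vhost_ubuf_ref. */
struct ubuf_ref_model {
    int refcount;
    pthread_mutex_t lock;
    pthread_cond_t zero; /* broadcast when refcount reaches 0 */
};

struct ubuf_ref_model *ubuf_model_alloc(void)
{
    struct ubuf_ref_model *u = malloc(sizeof(*u));
    if (!u)
        return NULL;
    u->refcount = 1; /* like kref_init(): the caller owns one reference */
    pthread_mutex_init(&u->lock, NULL);
    pthread_cond_init(&u->zero, NULL);
    return u;
}

void ubuf_model_get(struct ubuf_ref_model *u)
{
    pthread_mutex_lock(&u->lock);
    u->refcount++;
    pthread_mutex_unlock(&u->lock);
}

/* Drop one reference; wake any waiter when the last one goes away. */
void ubuf_model_put(struct ubuf_ref_model *u)
{
    pthread_mutex_lock(&u->lock);
    if (--u->refcount == 0)
        pthread_cond_broadcast(&u->zero);
    pthread_mutex_unlock(&u->lock);
}

/* Drop our reference and block until all others are gone. Does NOT
 * free, so a caller may safely re-arm and reuse the object afterwards. */
void ubuf_model_put_and_wait(struct ubuf_ref_model *u)
{
    ubuf_model_put(u);
    pthread_mutex_lock(&u->lock);
    while (u->refcount != 0)
        pthread_cond_wait(&u->zero, &u->lock);
    pthread_mutex_unlock(&u->lock);
}

/* Teardown variant: wait, then free. Only for callers done with u. */
void ubuf_model_put_wait_and_free(struct ubuf_ref_model *u)
{
    ubuf_model_put_and_wait(u);
    pthread_mutex_destroy(&u->lock);
    pthread_cond_destroy(&u->zero);
    free(u);
}

/* Simulates completion of one in-flight zero-copy buffer. */
static void *completion_thread(void *arg)
{
    ubuf_model_put(arg);
    return NULL;
}

/* Flush-style wait, re-arm and reuse, then teardown. Returns the
 * refcount observed right after the flush-style wait (expected 0). */
int ubuf_model_demo(void)
{
    struct ubuf_ref_model *u = ubuf_model_alloc();
    pthread_t t;
    int after_flush;

    if (!u)
        return -1;
    ubuf_model_get(u); /* one buffer in flight */
    int have_thread = pthread_create(&t, NULL, completion_thread, u) == 0;
    if (!have_thread)
        ubuf_model_put(u); /* fall back: complete inline */
    ubuf_model_put_and_wait(u);
    if (have_thread)
        pthread_join(t, NULL);

    pthread_mutex_lock(&u->lock);
    after_flush = u->refcount;
    u->refcount = 1; /* re-arm: safe because u was not freed */
    pthread_mutex_unlock(&u->lock);

    ubuf_model_put_wait_and_free(u); /* teardown */
    return after_flush;
}
```

The design point the split captures: whether to free is the caller's decision, so the helper that only waits can be shared by paths that reuse the object and by paths that destroy it.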

* [Bug 60505] Heavy network traffic triggers vhost_net lockup
From: bugzilla-daemon @ 2013-07-08 10:09 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=60505

Bart Van Assche <bvanassche@acm.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Regression|No                          |Yes

* [Bug 60505] Heavy network traffic triggers vhost_net lockup
From: bugzilla-daemon @ 2013-07-08 10:11 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=60505

--- Comment #5 from Bart Van Assche <bvanassche@acm.org> ---
The lockup does not occur with kernel 3.8.12, but it does occur with at least
kernels 3.9.9 and 3.10. With kernel 3.10 I have been able to trigger the lockup
without any tasks hanging in vhost_work_flush().

* [Bug 60505] Heavy network traffic triggers vhost_net lockup
From: bugzilla-daemon @ 2013-08-13 18:58 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=60505

Bart Van Assche <bvanassche@acm.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #6 from Bart Van Assche <bvanassche@acm.org> ---
Kernel 3.10.5 passed my tests.
