* [PATCH 00/12] Multiqueue virtio-net
@ 2012-12-28 10:31 Jason Wang
  2012-12-28 10:31 ` [PATCH 01/12] tap: multiqueue support Jason Wang
                   ` (13 more replies)
  0 siblings, 14 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:31 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: krkumar2, kvm, mprivozn, Jason Wang, rusty, jwhan, shiyer

Hello all:

This series is an update of the last version of the multiqueue virtio-net support.

Recently, Linux tap gained multiqueue support. This series implements basic
support for multiqueue tap, nic and vhost, then uses it as the infrastructure
to enable multiqueue support for virtio-net.

Both vhost and userspace multiqueue are implemented for virtio-net, but
userspace does not get much benefit yet, since no dataplane-like parallelized
mechanism is implemented for it.

A user can start a multiqueue virtio-net card by adding a "queues" parameter
to tap:

./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0

Management tools such as libvirt can instead pass multiple pre-created fds:

./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device virtio-net-pci,netdev=hn0
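
Inside the guest, the extra queues then have to be enabled through ethtool
(assuming a guest driver with multiqueue support), for example:

ethtool -L eth0 combined 2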

You can fetch and try the code from:
git://github.com/jasowang/qemu.git

Patch 1 adds a generic method for creating multiqueue taps and implements the
Linux part.
Patches 2 - 4 introduce helpers which are used to refactor the nic emulation
code to support multiqueue.
Patch 5 introduces multiqueue support to the qemu networking core: each peer of
a NetClientState is abstracted as a queue (see the sketch after this list).
Through this, most of the code can be reused without change.
Patch 6 adds basic multiqueue support for vhost, which lets a vhost instance
handle just a subset of all virtqueues.
Patches 7 - 8 introduce new virtio helpers which are needed by multiqueue
virtio-net.
Patches 9 - 12 implement the multiqueue support of virtio-net.
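
A sketch of the idea behind patch 5 (the names and structure layout here are
illustrative assumptions, not the patch's actual code): a multiqueue nic owns
one NetClientState per queue, and queue i of the nic is peered with the i-th
tap fd of the backend.

    /* Illustrative only, not the patch's actual layout. */
    typedef struct NICState {
        NetClientState *ncs;    /* one NetClientState ("queue") per tap fd */
        int num_queues;
        void *opaque;           /* device state, e.g. VirtIONet */
    } NICState;

    static NetClientState *nic_queue(NICState *nic, int i)
    {
        return &nic->ncs[i];    /* tx/rx for queue i goes through here */
    }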

Changes from RFC v2:
- rebase to the latest qemu
- align the multiqueue virtio-net implementation with the virtio spec
- split the series into smaller patches
- set_link and hotplug support

Changes from RFC V1:
- rebase to the latest
- fix memory leak in parse_netdev
- fix guest notifiers assignment/de-assignment
- changes the command lines to:
   qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2

Reference:
v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481

Perf Numbers:

Two Intel Xeon 5620 machines, directly connected with Intel 82599EB NICs
Host/guest kernel: David Miller's net tree
vhost enabled

- lots of improvement in both latency and cpu utilization in request-response
  tests
- a regression in guest small-packet sending, because TCP tends to batch
  less when latency is improved
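
The tables were collected with netperf (TCP_RR/TCP_CRR/TCP_STREAM/TCP_MAERTS).
The exact invocation is not part of this posting; a representative TCP_RR run
is assumed to look like:

netperf -H $PEER -t TCP_RR -- -r 256,256

where "size" is the request/response (or message) size in bytes and
"#sessions" is the number of concurrent netperf instances.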

1q/2q/4q (each rate/norm column pair below is for 1, 2 and 4 queues)
TCP_RR
 size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
1 1     9393.26   595.64  9408.18   597.34  9375.19   584.12
1 20    72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
1 50    107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
64 1    9453.42   632.33  9371.37   616.13  9338.19   615.97
64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
64 50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
TCP_CRR
 size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
1 1     2848.37  163.41 2230.39  130.89 2013.09  120.47
1 20    23434.5  562.11 31057.43 531.07 49488.28 564.41
1 50    28514.88 582.17 40494.23 605.92 60113.35 654.97
1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
64 1    2780.08  159.4  2201.07  127.96 2006.8   117.63
64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
64 100  28747.37 584.17 49081.87 667.87 60612.94 662
256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
256 50  28354.7  579.85 40578.31 607    60261.71 657.87
256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
TCP_STREAM guest receiving
 size #sessions throughput  norm throughput  norm throughput  norm
1 1     16.27   1.33   16.1    1.12   16.13   0.99
1 2     33.04   2.08   32.96   2.19   32.75   1.98
1 4     66.62   6.83   68.3    5.56   66.14   2.65
64 1    896.55  56.67  914.02  58.14  898.9   61.56
64 2    1830.46 91.02  1812.02 64.59  1835.57 66.26
64 4    3626.61 142.55 3636.25 100.64 3607.46 75.03
256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
16384 2 7436.57 255.81 9381.86 220.85 9392    220.36
16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
TCP_MAERTS guest sending
 size #sessions throughput  norm throughput  norm throughput  norm
1 1     15.94   0.62   15.55   0.61   15.13   0.59
1 2     36.11   0.83   32.46   0.69   32.28   0.69
1 4     71.59   1      68.91   0.94   61.52   0.77
64 1    630.71  22.52  622.11  22.35  605.09  21.84
64 2    1442.36 30.57  1292.15 25.82  1282.67 25.55
64 4    3186.79 42.59  2844.96 36.03  2529.69 30.06
256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
1024 2  9429    258.32 7082.79 120.55 7403.53 134.78
1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47


Jason Wang (12):
  tap: multiqueue support
  net: introduce qemu_get_queue()
  net: introduce qemu_get_nic()
  net: introduce qemu_del_nic()
  net: multiqueue support
  vhost: multiqueue support
  virtio: introduce virtio_queue_del()
  virtio: add a queue_index to VirtQueue
  virtio-net: separate virtqueue from VirtIONet
  virtio-net: multiqueue support
  virtio-net: migration support for multiqueue
  virtio-net: compat multiqueue support

 hw/cadence_gem.c        |   16 +-
 hw/dp8393x.c            |   16 +-
 hw/e1000.c              |   28 ++--
 hw/eepro100.c           |   18 +-
 hw/etraxfs_eth.c        |   10 +-
 hw/lan9118.c            |   16 +-
 hw/lance.c              |    2 +-
 hw/mcf_fec.c            |   12 +-
 hw/milkymist-minimac2.c |   10 +-
 hw/mipsnet.c            |   10 +-
 hw/musicpal.c           |    6 +-
 hw/ne2000-isa.c         |    4 +-
 hw/ne2000.c             |   12 +-
 hw/opencores_eth.c      |   12 +-
 hw/pc_piix.c            |    4 +
 hw/pcnet-pci.c          |    4 +-
 hw/pcnet.c              |   12 +-
 hw/qdev-properties.c    |   46 ++++-
 hw/qdev-properties.h    |    6 +-
 hw/rtl8139.c            |   20 +-
 hw/smc91c111.c          |   10 +-
 hw/spapr_llan.c         |    8 +-
 hw/stellaris_enet.c     |   10 +-
 hw/usb/dev-network.c    |   16 +-
 hw/vhost.c              |   52 +++--
 hw/vhost.h              |    2 +
 hw/vhost_net.c          |    7 +-
 hw/vhost_net.h          |    2 +-
 hw/virtio-net.c         |  523 ++++++++++++++++++++++++++++++++++-------------
 hw/virtio-net.h         |   27 +++-
 hw/virtio.c             |   17 ++
 hw/virtio.h             |    3 +
 hw/xen_nic.c            |   14 +-
 hw/xgmac.c              |   10 +-
 hw/xilinx_axienet.c     |   10 +-
 hw/xilinx_ethlite.c     |   10 +-
 net.c                   |  198 ++++++++++++++----
 net.h                   |   31 +++-
 net/tap-aix.c           |   18 ++-
 net/tap-bsd.c           |   18 ++-
 net/tap-haiku.c         |   18 ++-
 net/tap-linux.c         |   70 ++++++-
 net/tap-linux.h         |    4 +
 net/tap-solaris.c       |   18 ++-
 net/tap-win32.c         |   10 +
 net/tap.c               |  248 ++++++++++++++++-------
 net/tap.h               |    8 +-
 qapi-schema.json        |    5 +-
 savevm.c                |    2 +-
 49 files changed, 1177 insertions(+), 456 deletions(-)


* [PATCH 01/12] tap: multiqueue support
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
@ 2012-12-28 10:31 ` Jason Wang
  2013-01-09  9:56   ` Stefan Hajnoczi
  2013-01-10 10:28   ` Stefan Hajnoczi
  2012-12-28 10:31 ` [PATCH 02/12] net: introduce qemu_get_queue() Jason Wang
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:31 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

Recently, Linux gained multiqueue tap support, which lets userspace call
TUNSETIFF on a single device many times to create multiple file descriptors
that act as independent queues. Userspace can also enable/disable a specific
queue through TUNSETQUEUE.
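
For reference, usage of the kernel interface looks roughly as follows (a
minimal standalone sketch, not code from this patch; IFF_MULTI_QUEUE and
friends come from recent kernel headers, which the patch defines locally
instead):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/if.h>
    #include <linux/if_tun.h>

    /* Open one queue of a multiqueue tap device. Calling this again with
     * the same name returns another fd backed by the same device. */
    static int mq_tap_open_queue(const char *name)
    {
        struct ifreq ifr;
        int fd = open("/dev/net/tun", O_RDWR);

        if (fd < 0) {
            return -1;
        }
        memset(&ifr, 0, sizeof(ifr));
        ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
        strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
        if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    /* Disable or re-enable one queue without closing its fd. */
    static int mq_tap_set_queue(int fd, int attach)
    {
        struct ifreq ifr;

        memset(&ifr, 0, sizeof(ifr));
        ifr.ifr_flags = attach ? IFF_ATTACH_QUEUE : IFF_DETACH_QUEUE;
        return ioctl(fd, TUNSETQUEUE, &ifr);
    }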

This patch adds the generic infrastructure for creating multiqueue taps. To
achieve this, a new parameter "queues" is introduced to specify how many
queues are to be created for the tap. The "fd" parameter is also changed to
accept a list of file descriptors, which management (such as libvirt) can use
to pass pre-created file descriptors (queues) to qemu.

Each TAPState is still associated with one tap fd, which means multiple
TAPStates are created when the user needs a multiqueue tap.

Only the Linux part is implemented for now, since Linux is the only OS that
supports multiqueue tap.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/tap-aix.c     |   18 ++++-
 net/tap-bsd.c     |   18 ++++-
 net/tap-haiku.c   |   18 ++++-
 net/tap-linux.c   |   70 +++++++++++++++-
 net/tap-linux.h   |    4 +
 net/tap-solaris.c |   18 ++++-
 net/tap-win32.c   |   10 ++
 net/tap.c         |  248 +++++++++++++++++++++++++++++++++++++----------------
 net/tap.h         |    8 ++-
 qapi-schema.json  |    5 +-
 10 files changed, 335 insertions(+), 82 deletions(-)

diff --git a/net/tap-aix.c b/net/tap-aix.c
index f27c177..f931ef3 100644
--- a/net/tap-aix.c
+++ b/net/tap-aix.c
@@ -25,7 +25,8 @@
 #include "net/tap.h"
 #include <stdio.h>
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     fprintf(stderr, "no tap on AIX\n");
     return -1;
@@ -59,3 +60,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
                         int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd)
+{
+    return -1;
+}
+
+int tap_get_ifname(int fd, char *ifname)
+{
+    return -1;
+}
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index a3b717d..07c287d 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -33,7 +33,8 @@
 #include <net/if_tap.h>
 #endif
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     int fd;
 #ifdef TAPGIFNAME
@@ -145,3 +146,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
                         int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd)
+{
+    return -1;
+}
+
+int tap_get_ifname(int fd, char *ifname)
+{
+    return -1;
+}
diff --git a/net/tap-haiku.c b/net/tap-haiku.c
index 34739d1..62ab423 100644
--- a/net/tap-haiku.c
+++ b/net/tap-haiku.c
@@ -25,7 +25,8 @@
 #include "net/tap.h"
 #include <stdio.h>
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     fprintf(stderr, "no tap on Haiku\n");
     return -1;
@@ -59,3 +60,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
                         int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd)
+{
+    return -1;
+}
+
+int tap_get_ifname(int fd, char *ifname)
+{
+    return -1;
+}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index c6521be..0854ef5 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -35,7 +35,8 @@
 
 #define PATH_NET_TUN "/dev/net/tun"
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     struct ifreq ifr;
     int fd, ret;
@@ -67,6 +68,20 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required
         }
     }
 
+    if (mq_required) {
+        unsigned int features;
+
+        if ((ioctl(fd, TUNGETFEATURES, &features) != 0) ||
+            !(features & IFF_MULTI_QUEUE)) {
+            error_report("multiqueue required, but no kernel "
+                         "support for IFF_MULTI_QUEUE available");
+            close(fd);
+            return -1;
+        } else {
+            ifr.ifr_flags |= IFF_MULTI_QUEUE;
+        }
+    }
+
     if (ifname[0] != '\0')
         pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
     else
@@ -200,3 +215,56 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
         }
     }
 }
+
+/* Attach a file descriptor to a TUN/TAP device. The descriptor must have
+ * been detached before this call.
+ */
+int tap_fd_attach(int fd)
+{
+    struct ifreq ifr;
+    int ret;
+
+    memset(&ifr, 0, sizeof(ifr));
+
+    ifr.ifr_flags = IFF_ATTACH_QUEUE;
+    ret = ioctl(fd, TUNSETQUEUE, (void *) &ifr);
+
+    if (ret != 0) {
+        error_report("could not attach fd to tap");
+    }
+
+    return ret;
+}
+
+/* Detach a file descriptor from a TUN/TAP device. The file descriptor must
+ * have been attached to a device before this call.
+ */
+int tap_fd_detach(int fd)
+{
+    struct ifreq ifr;
+    int ret;
+
+    memset(&ifr, 0, sizeof(ifr));
+
+    ifr.ifr_flags = IFF_DETACH_QUEUE;
+    ret = ioctl(fd, TUNSETQUEUE, (void *) &ifr);
+
+    if (ret != 0) {
+        error_report("could not detach fd");
+    }
+
+    return ret;
+}
+
+int tap_get_ifname(int fd, char *ifname)
+{
+    struct ifreq ifr;
+
+    if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
+        error_report("TUNGETIFF ioctl() failed: %s", strerror(errno));
+        return -1;
+    }
+
+    pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
+    return 0;
+}
diff --git a/net/tap-linux.h b/net/tap-linux.h
index 659e981..648d29f 100644
--- a/net/tap-linux.h
+++ b/net/tap-linux.h
@@ -29,6 +29,7 @@
 #define TUNSETSNDBUF   _IOW('T', 212, int)
 #define TUNGETVNETHDRSZ _IOR('T', 215, int)
 #define TUNSETVNETHDRSZ _IOW('T', 216, int)
+#define TUNSETQUEUE  _IOW('T', 217, int)
 
 #endif
 
@@ -36,6 +37,9 @@
 #define IFF_TAP		0x0002
 #define IFF_NO_PI	0x1000
 #define IFF_VNET_HDR	0x4000
+#define IFF_MULTI_QUEUE 0x0100
+#define IFF_ATTACH_QUEUE 0x0200
+#define IFF_DETACH_QUEUE 0x0400
 
 /* Features for GSO (TUNSETOFFLOAD). */
 #define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 5d6ac42..2df3ec1 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -173,7 +173,8 @@ static int tap_alloc(char *dev, size_t dev_size)
     return tap_fd;
 }
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     char  dev[10]="";
     int fd;
@@ -225,3 +226,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
                         int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd)
+{
+    return -1;
+}
+
+int tap_get_ifname(int fd, char *ifname)
+{
+    return -1;
+}
diff --git a/net/tap-win32.c b/net/tap-win32.c
index f9bd741..d7b1f7a 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -763,3 +763,13 @@ void tap_set_vnet_hdr_len(NetClientState *nc, int len)
 {
     assert(0);
 }
+
+int tap_attach(NetClientState *nc)
+{
+    assert(0);
+}
+
+int tap_detach(NetClientState *nc)
+{
+    assert(0);
+}
diff --git a/net/tap.c b/net/tap.c
index 1abfd44..01f826a 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -60,6 +60,7 @@ typedef struct TAPState {
     unsigned int write_poll : 1;
     unsigned int using_vnet_hdr : 1;
     unsigned int has_ufo: 1;
+    unsigned int enabled:1;
     VHostNetState *vhost_net;
     unsigned host_vnet_hdr_len;
 } TAPState;
@@ -73,9 +74,9 @@ static void tap_writable(void *opaque);
 static void tap_update_fd_handler(TAPState *s)
 {
     qemu_set_fd_handler2(s->fd,
-                         s->read_poll  ? tap_can_send : NULL,
-                         s->read_poll  ? tap_send     : NULL,
-                         s->write_poll ? tap_writable : NULL,
+                         s->read_poll && s->enabled ? tap_can_send : NULL,
+                         s->read_poll && s->enabled ? tap_send     : NULL,
+                         s->write_poll && s->enabled ? tap_writable : NULL,
                          s);
 }
 
@@ -340,6 +341,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
     s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
     s->using_vnet_hdr = 0;
     s->has_ufo = tap_probe_has_ufo(s->fd);
+    s->enabled = 1;
     tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
     /*
      * Make sure host header length is set correctly in tap:
@@ -559,17 +561,10 @@ int net_init_bridge(const NetClientOptions *opts, const char *name,
 
 static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
                         const char *setup_script, char *ifname,
-                        size_t ifname_sz)
+                        size_t ifname_sz, int mq_required)
 {
     int fd, vnet_hdr_required;
 
-    if (tap->has_ifname) {
-        pstrcpy(ifname, ifname_sz, tap->ifname);
-    } else {
-        assert(ifname_sz > 0);
-        ifname[0] = '\0';
-    }
-
     if (tap->has_vnet_hdr) {
         *vnet_hdr = tap->vnet_hdr;
         vnet_hdr_required = *vnet_hdr;
@@ -578,7 +573,8 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
         vnet_hdr_required = 0;
     }
 
-    TFR(fd = tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required));
+    TFR(fd = tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required,
+                      mq_required));
     if (fd < 0) {
         return -1;
     }
@@ -594,69 +590,37 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
     return fd;
 }
 
-int net_init_tap(const NetClientOptions *opts, const char *name,
-                 NetClientState *peer)
-{
-    const NetdevTapOptions *tap;
-
-    int fd, vnet_hdr = 0;
-    const char *model;
-    TAPState *s;
+#define MAX_TAP_QUEUES 1024
 
-    /* for the no-fd, no-helper case */
-    const char *script = NULL; /* suppress wrong "uninit'd use" gcc warning */
-    char ifname[128];
-
-    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_TAP);
-    tap = opts->tap;
-
-    if (tap->has_fd) {
-        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
-            tap->has_vnet_hdr || tap->has_helper) {
-            error_report("ifname=, script=, downscript=, vnet_hdr=, "
-                         "and helper= are invalid with fd=");
-            return -1;
-        }
-
-        fd = monitor_handle_fd_param(cur_mon, tap->fd);
-        if (fd == -1) {
-            return -1;
-        }
-
-        fcntl(fd, F_SETFL, O_NONBLOCK);
-
-        vnet_hdr = tap_probe_vnet_hdr(fd);
-
-        model = "tap";
-
-    } else if (tap->has_helper) {
-        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
-            tap->has_vnet_hdr) {
-            error_report("ifname=, script=, downscript=, and vnet_hdr= "
-                         "are invalid with helper=");
-            return -1;
-        }
-
-        fd = net_bridge_run_helper(tap->helper, DEFAULT_BRIDGE_INTERFACE);
-        if (fd == -1) {
-            return -1;
-        }
+static int tap_fd(const StringList *fd, const char **fds)
+{
+    const StringList *c = fd;
+    size_t i = 0, num_opts = 0;
 
-        fcntl(fd, F_SETFL, O_NONBLOCK);
+    while (c) {
+        num_opts++;
+        c = c->next;
+    }
 
-        vnet_hdr = tap_probe_vnet_hdr(fd);
+    if (num_opts == 0) {
+        return 0;
+    }
 
-        model = "bridge";
+    c = fd;
+    while (c) {
+        fds[i++] = c->value->str;
+        c = c->next;
+    }
 
-    } else {
-        script = tap->has_script ? tap->script : DEFAULT_NETWORK_SCRIPT;
-        fd = net_tap_init(tap, &vnet_hdr, script, ifname, sizeof ifname);
-        if (fd == -1) {
-            return -1;
-        }
+    return num_opts;
+}
 
-        model = "tap";
-    }
+static int __net_init_tap(const NetdevTapOptions *tap, NetClientState *peer,
+                          const char *model, const char *name,
+                          const char *ifname, const char *script,
+                          const char *downscript, int vnet_hdr, int fd)
+{
+    TAPState *s;
 
     s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
     if (!s) {
@@ -674,11 +638,6 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
         snprintf(s->nc.info_str, sizeof(s->nc.info_str), "helper=%s",
                  tap->helper);
     } else {
-        const char *downscript;
-
-        downscript = tap->has_downscript ? tap->downscript :
-                                           DEFAULT_NETWORK_DOWN_SCRIPT;
-
         snprintf(s->nc.info_str, sizeof(s->nc.info_str),
                  "ifname=%s,script=%s,downscript=%s", ifname, script,
                  downscript);
@@ -716,9 +675,150 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
     return 0;
 }
 
+int net_init_tap(const NetClientOptions *opts, const char *name,
+                 NetClientState *peer)
+{
+    const NetdevTapOptions *tap;
+    const char *fds[MAX_TAP_QUEUES];
+    int fd, vnet_hdr = 0, i, queues;
+    /* for the no-fd, no-helper case */
+    const char *script = NULL; /* suppress wrong "uninit'd use" gcc warning */
+    const char *downscript = NULL;
+    char ifname[128];
+
+    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_TAP);
+    tap = opts->tap;
+    queues = tap->has_queues ? tap->queues : 1;
+
+    if (tap->has_fd) {
+        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
+            tap->has_vnet_hdr || tap->has_helper) {
+            error_report("ifname=, script=, downscript=, vnet_hdr=, "
+                         "and helper= are invalid with fd=");
+            return -1;
+        }
+
+        if (queues != tap_fd(tap->fd, fds)) {
+            error_report("the number of fds were not equal to queues");
+            return -1;
+        }
+
+        for (i = 0; i < queues; i++) {
+            fd = monitor_handle_fd_param(cur_mon, fds[i]);
+            if (fd == -1) {
+                return -1;
+            }
+
+            fcntl(fd, F_SETFL, O_NONBLOCK);
+
+            if (i == 0) {
+                vnet_hdr = tap_probe_vnet_hdr(fd);
+            }
+
+            if (__net_init_tap(tap, peer, "tap", name, ifname,
+                               script, downscript, vnet_hdr, fd)) {
+                return -1;
+            }
+        }
+    } else if (tap->has_helper) {
+        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
+            tap->has_vnet_hdr) {
+            error_report("ifname=, script=, downscript=, and vnet_hdr= "
+                         "are invalid with helper=");
+            return -1;
+        }
+
+        /* FIXME: correct ? */
+        for (i = 0; i < queues; i++) {
+            fd = net_bridge_run_helper(tap->helper, DEFAULT_BRIDGE_INTERFACE);
+            if (fd == -1) {
+                return -1;
+            }
+
+            fcntl(fd, F_SETFL, O_NONBLOCK);
+
+            if (i == 0) {
+                vnet_hdr = tap_probe_vnet_hdr(fd);
+            }
+
+            if (__net_init_tap(tap, peer, "bridge", name, ifname,
+                               script, downscript, vnet_hdr, fd)) {
+                return -1;
+            }
+        }
+    } else {
+        script = tap->has_script ? tap->script : DEFAULT_NETWORK_SCRIPT;
+        downscript = tap->has_downscript ? tap->downscript :
+                                           DEFAULT_NETWORK_DOWN_SCRIPT;
+
+        if (tap->has_ifname) {
+            pstrcpy(ifname, sizeof ifname, tap->ifname);
+        } else {
+            ifname[0] = '\0';
+        }
+
+        for (i = 0; i < queues; i++) {
+            fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
+                              ifname, sizeof ifname, queues > 1);
+            if (fd == -1) {
+                return -1;
+            }
+
+            if (i == 0 && tap_get_ifname(fd, ifname) != 0) {
+                error_report("could not get ifname");
+                return -1;
+            }
+
+            if (__net_init_tap(tap, peer, "tap", name, ifname,
+                               i >= 1 ? "no" : script,
+                               i >= 1 ? "no" : downscript,
+                               vnet_hdr, fd)) {
+                return -1;
+            }
+        }
+    }
+
+    return 0;
+}
+
 VHostNetState *tap_get_vhost_net(NetClientState *nc)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
     assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP);
     return s->vhost_net;
 }
+
+int tap_attach(NetClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    int ret;
+
+    if (s->enabled) {
+        return 0;
+    } else {
+        ret = tap_fd_attach(s->fd);
+        if (ret == 0) {
+            s->enabled = 1;
+            tap_update_fd_handler(s);
+        }
+        return ret;
+    }
+}
+
+int tap_detach(NetClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    int ret;
+
+    if (s->enabled == 0) {
+        return 0;
+    } else {
+        ret = tap_fd_detach(s->fd);
+        if (ret == 0) {
+            qemu_purge_queued_packets(nc);
+            s->enabled = 0;
+            tap_update_fd_handler(s);
+        }
+        return ret;
+    }
+}
diff --git a/net/tap.h b/net/tap.h
index d44d83a..02f154e 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -32,7 +32,8 @@
 #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
 #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required);
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required);
 
 ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
 
@@ -49,6 +50,11 @@ int tap_probe_vnet_hdr_len(int fd, int len);
 int tap_probe_has_ufo(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo);
 void tap_fd_set_vnet_hdr_len(int fd, int len);
+int tap_fd_attach(int fd);
+int tap_fd_detach(int fd);
+int tap_attach(NetClientState *nc);
+int tap_detach(NetClientState *nc);
+int tap_get_ifname(int fd, char *ifname);
 
 int tap_get_fd(NetClientState *nc);
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 5dfa052..583eb7c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2465,7 +2465,7 @@
 { 'type': 'NetdevTapOptions',
   'data': {
     '*ifname':     'str',
-    '*fd':         'str',
+    '*fd':         ['String'],
     '*script':     'str',
     '*downscript': 'str',
     '*helper':     'str',
@@ -2473,7 +2473,8 @@
     '*vnet_hdr':   'bool',
     '*vhost':      'bool',
     '*vhostfd':    'str',
-    '*vhostforce': 'bool' } }
+    '*vhostforce': 'bool',
+    '*queues':     'uint32'} }
 
 ##
 # @NetdevSocketOptions
-- 
1.7.1



* [PATCH 02/12] net: introduce qemu_get_queue()
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
  2012-12-28 10:31 ` [PATCH 01/12] tap: multiqueue support Jason Wang
@ 2012-12-28 10:31 ` Jason Wang
  2012-12-28 10:31 ` [PATCH 03/12] net: introduce qemu_get_nic() Jason Wang
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:31 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

To support multiqueue, this patch introduces a helper, qemu_get_queue(),
which is used to get the NetClientState of a device. The following patches
will refactor this helper to support multiqueue.
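
At this point in the series the helper is essentially trivial; roughly (a
sketch of the idea only, the real definition lives in net.c/net.h per the
diffstat below):

    NetClientState *qemu_get_queue(NICState *nic)
    {
        return &nic->nc;
    }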

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/cadence_gem.c        |    8 +++---
 hw/dp8393x.c            |    8 +++---
 hw/e1000.c              |   20 +++++++-------
 hw/eepro100.c           |   12 ++++----
 hw/etraxfs_eth.c        |    4 +-
 hw/lan9118.c            |   10 +++---
 hw/mcf_fec.c            |    4 +-
 hw/milkymist-minimac2.c |    4 +-
 hw/mipsnet.c            |    4 +-
 hw/musicpal.c           |    2 +-
 hw/ne2000-isa.c         |    2 +-
 hw/ne2000.c             |    6 ++--
 hw/opencores_eth.c      |    6 ++--
 hw/pcnet-pci.c          |    2 +-
 hw/pcnet.c              |    6 ++--
 hw/rtl8139.c            |   12 ++++----
 hw/smc91c111.c          |    4 +-
 hw/spapr_llan.c         |    4 +-
 hw/stellaris_enet.c     |    4 +-
 hw/usb/dev-network.c    |   10 +++---
 hw/virtio-net.c         |   64 +++++++++++++++++++++++-----------------------
 hw/xen_nic.c            |   10 +++---
 hw/xgmac.c              |    4 +-
 hw/xilinx_axienet.c     |    4 +-
 hw/xilinx_ethlite.c     |    4 +-
 net.c                   |    5 +++
 net.h                   |    1 +
 savevm.c                |    2 +-
 28 files changed, 116 insertions(+), 110 deletions(-)

diff --git a/hw/cadence_gem.c b/hw/cadence_gem.c
index 0c037a2..9d27ecb 100644
--- a/hw/cadence_gem.c
+++ b/hw/cadence_gem.c
@@ -389,10 +389,10 @@ static void gem_init_register_masks(GemState *s)
  */
 static void phy_update_link(GemState *s)
 {
-    DB_PRINT("down %d\n", s->nic->nc.link_down);
+    DB_PRINT("down %d\n", qemu_get_queue(s->nic)->link_down);
 
     /* Autonegotiation status mirrors link status.  */
-    if (s->nic->nc.link_down) {
+    if (qemu_get_queue(s->nic)->link_down) {
         s->phy_regs[PHY_REG_STATUS] &= ~(PHY_REG_STATUS_ANEGCMPL |
                                          PHY_REG_STATUS_LINK);
         s->phy_regs[PHY_REG_INT_ST] |= PHY_REG_INT_ST_LINKC;
@@ -906,9 +906,9 @@ static void gem_transmit(GemState *s)
 
             /* Send the packet somewhere */
             if (s->phy_loop) {
-                gem_receive(&s->nic->nc, tx_packet, total_bytes);
+                gem_receive(qemu_get_queue(s->nic), tx_packet, total_bytes);
             } else {
-                qemu_send_packet(&s->nic->nc, tx_packet, total_bytes);
+                qemu_send_packet(qemu_get_queue(s->nic), tx_packet, total_bytes);
             }
 
             /* Prepare for next packet */
diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index 3f6386e..db50cc6 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -408,13 +408,13 @@ static void do_transmit_packets(dp8393xState *s)
         if (s->regs[SONIC_RCR] & (SONIC_RCR_LB1 | SONIC_RCR_LB0)) {
             /* Loopback */
             s->regs[SONIC_TCR] |= SONIC_TCR_CRSL;
-            if (s->nic->nc.info->can_receive(&s->nic->nc)) {
+            if (qemu_get_queue(s->nic)->info->can_receive(qemu_get_queue(s->nic))) {
                 s->loopback_packet = 1;
-                s->nic->nc.info->receive(&s->nic->nc, s->tx_buffer, tx_len);
+                qemu_get_queue(s->nic)->info->receive(qemu_get_queue(s->nic), s->tx_buffer, tx_len);
             }
         } else {
             /* Transmit packet */
-            qemu_send_packet(&s->nic->nc, s->tx_buffer, tx_len);
+            qemu_send_packet(qemu_get_queue(s->nic), s->tx_buffer, tx_len);
         }
         s->regs[SONIC_TCR] |= SONIC_TCR_PTX;
 
@@ -903,7 +903,7 @@ void dp83932_init(NICInfo *nd, hwaddr base, int it_shift,
 
     s->nic = qemu_new_nic(&net_dp83932_info, &s->conf, nd->model, nd->name, s);
 
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     qemu_register_reset(nic_reset, s);
     nic_reset(s);
 
diff --git a/hw/e1000.c b/hw/e1000.c
index 5537ad2..aaa4f88 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -164,7 +164,7 @@ static void
 set_phy_ctrl(E1000State *s, int index, uint16_t val)
 {
     if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) {
-        s->nic->nc.link_down = true;
+        qemu_get_queue(s->nic)->link_down = true;
         e1000_link_down(s);
         s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;
         DBGOUT(PHY, "Start link auto negotiation\n");
@@ -176,7 +176,7 @@ static void
 e1000_autoneg_timer(void *opaque)
 {
     E1000State *s = opaque;
-    s->nic->nc.link_down = false;
+    qemu_get_queue(s->nic)->link_down = false;
     e1000_link_up(s);
     s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE;
     DBGOUT(PHY, "Auto negotiation is completed\n");
@@ -279,7 +279,7 @@ static void e1000_reset(void *opaque)
     d->rxbuf_min_shift = 1;
     memset(&d->tx, 0, sizeof d->tx);
 
-    if (d->nic->nc.link_down) {
+    if (qemu_get_queue(d->nic)->link_down) {
         e1000_link_down(d);
     }
 
@@ -307,7 +307,7 @@ set_rx_control(E1000State *s, int index, uint32_t val)
     s->rxbuf_min_shift = ((val / E1000_RCTL_RDMTS_QUAT) & 3) + 1;
     DBGOUT(RX, "RCTL: %d, mac_reg[RCTL] = 0x%x\n", s->mac_reg[RDT],
            s->mac_reg[RCTL]);
-    qemu_flush_queued_packets(&s->nic->nc);
+    qemu_flush_queued_packets(qemu_get_queue(s->nic));
 }
 
 static void
@@ -459,9 +459,9 @@ static void
 e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
 {
     if (s->phy_reg[PHY_CTRL] & MII_CR_LOOPBACK) {
-        s->nic->nc.info->receive(&s->nic->nc, buf, size);
+        qemu_get_queue(s->nic)->info->receive(qemu_get_queue(s->nic), buf, size);
     } else {
-        qemu_send_packet(&s->nic->nc, buf, size);
+        qemu_send_packet(qemu_get_queue(s->nic), buf, size);
     }
 }
 
@@ -945,7 +945,7 @@ set_rdt(E1000State *s, int index, uint32_t val)
 {
     s->mac_reg[index] = val & 0xffff;
     if (e1000_has_rxbufs(s, 1)) {
-        qemu_flush_queued_packets(&s->nic->nc);
+        qemu_flush_queued_packets(qemu_get_queue(s->nic));
     }
 }
 
@@ -1102,7 +1102,7 @@ static int e1000_post_load(void *opaque, int version_id)
 
     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in mac_reg[STATUS] */
-    s->nic->nc.link_down = (s->mac_reg[STATUS] & E1000_STATUS_LU) == 0;
+    qemu_get_queue(s->nic)->link_down = (s->mac_reg[STATUS] & E1000_STATUS_LU) == 0;
 
     return 0;
 }
@@ -1234,7 +1234,7 @@ pci_e1000_uninit(PCIDevice *dev)
     qemu_free_timer(d->autoneg_timer);
     memory_region_destroy(&d->mmio);
     memory_region_destroy(&d->io);
-    qemu_del_net_client(&d->nic->nc);
+    qemu_del_net_client(qemu_get_queue(d->nic));
 }
 
 static NetClientInfo net_e1000_info = {
@@ -1281,7 +1281,7 @@ static int pci_e1000_init(PCIDevice *pci_dev)
     d->nic = qemu_new_nic(&net_e1000_info, &d->conf,
                           object_get_typename(OBJECT(d)), d->dev.qdev.id, d);
 
-    qemu_format_nic_info_str(&d->nic->nc, macaddr);
+    qemu_format_nic_info_str(qemu_get_queue(d->nic), macaddr);
 
     add_boot_device_path(d->conf.bootindex, &pci_dev->qdev, "/ethernet-phy@0");
 
diff --git a/hw/eepro100.c b/hw/eepro100.c
index a189474..c6e91c7 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -828,7 +828,7 @@ static void tx_command(EEPRO100State *s)
         }
     }
     TRACE(RXTX, logout("%p sending frame, len=%d,%s\n", s, size, nic_dump(buf, size)));
-    qemu_send_packet(&s->nic->nc, buf, size);
+    qemu_send_packet(qemu_get_queue(s->nic), buf, size);
     s->statistics.tx_good_frames++;
     /* Transmit with bad status would raise an CX/TNO interrupt.
      * (82557 only). Emulation never has bad status. */
@@ -1036,7 +1036,7 @@ static void eepro100_ru_command(EEPRO100State * s, uint8_t val)
         }
         set_ru_state(s, ru_ready);
         s->ru_offset = e100_read_reg4(s, SCBPointer);
-        qemu_flush_queued_packets(&s->nic->nc);
+        qemu_flush_queued_packets(qemu_get_queue(s->nic));
         TRACE(OTHER, logout("val=0x%02x (rx start)\n", val));
         break;
     case RX_RESUME:
@@ -1849,7 +1849,7 @@ static void pci_nic_uninit(PCIDevice *pci_dev)
     memory_region_destroy(&s->flash_bar);
     vmstate_unregister(&pci_dev->qdev, s->vmstate, s);
     eeprom93xx_free(&pci_dev->qdev, s->eeprom);
-    qemu_del_net_client(&s->nic->nc);
+    qemu_del_net_client(qemu_get_queue(s->nic));
 }
 
 static NetClientInfo net_eepro100_info = {
@@ -1895,14 +1895,14 @@ static int e100_nic_init(PCIDevice *pci_dev)
     s->nic = qemu_new_nic(&net_eepro100_info, &s->conf,
                           object_get_typename(OBJECT(pci_dev)), pci_dev->qdev.id, s);
 
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
-    TRACE(OTHER, logout("%s\n", s->nic->nc.info_str));
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
+    TRACE(OTHER, logout("%s\n", qemu_get_queue(s->nic)->info_str));
 
     qemu_register_reset(nic_reset, s);
 
     s->vmstate = g_malloc(sizeof(vmstate_eepro100));
     memcpy(s->vmstate, &vmstate_eepro100, sizeof(vmstate_eepro100));
-    s->vmstate->name = s->nic->nc.model;
+    s->vmstate->name = qemu_get_queue(s->nic)->model;
     vmstate_register(&pci_dev->qdev, -1, s->vmstate, s);
 
     add_boot_device_path(s->conf.bootindex, &pci_dev->qdev, "/ethernet-phy@0");
diff --git a/hw/etraxfs_eth.c b/hw/etraxfs_eth.c
index 3d42426..dbafb55 100644
--- a/hw/etraxfs_eth.c
+++ b/hw/etraxfs_eth.c
@@ -545,7 +545,7 @@ static int eth_tx_push(void *opaque, unsigned char *buf, int len, bool eop)
 	struct fs_eth *eth = opaque;
 
 	D(printf("%s buf=%p len=%d\n", __func__, buf, len));
-	qemu_send_packet(&eth->nic->nc, buf, len);
+	qemu_send_packet(qemu_get_queue(eth->nic), buf, len);
 	return len;
 }
 
@@ -606,7 +606,7 @@ static int fs_eth_init(SysBusDevice *dev)
 	qemu_macaddr_default_if_unset(&s->conf.macaddr);
 	s->nic = qemu_new_nic(&net_etraxfs_info, &s->conf,
 			      object_get_typename(OBJECT(s)), dev->qdev.id, s);
-	qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+	qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
 	tdk_init(&s->phy);
 	mdio_attach(&s->mdio_bus, &s->phy, s->phyaddr);
diff --git a/hw/lan9118.c b/hw/lan9118.c
index f724e1c..8f340a5 100644
--- a/hw/lan9118.c
+++ b/hw/lan9118.c
@@ -341,7 +341,7 @@ static void lan9118_update(lan9118_state *s)
 
 static void lan9118_mac_changed(lan9118_state *s)
 {
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
 static void lan9118_reload_eeprom(lan9118_state *s)
@@ -373,7 +373,7 @@ static void phy_update_irq(lan9118_state *s)
 static void phy_update_link(lan9118_state *s)
 {
     /* Autonegotiation status mirrors link status.  */
-    if (s->nic->nc.link_down) {
+    if (qemu_get_queue(s->nic)->link_down) {
         s->phy_status &= ~0x0024;
         s->phy_int |= PHY_INT_DOWN;
     } else {
@@ -657,9 +657,9 @@ static void do_tx_packet(lan9118_state *s)
     /* FIXME: Honor TX disable, and allow queueing of packets.  */
     if (s->phy_control & 0x4000)  {
         /* This assumes the receive routine doesn't touch the VLANClient.  */
-        lan9118_receive(&s->nic->nc, s->txp->data, s->txp->len);
+        lan9118_receive(qemu_get_queue(s->nic), s->txp->data, s->txp->len);
     } else {
-        qemu_send_packet(&s->nic->nc, s->txp->data, s->txp->len);
+        qemu_send_packet(qemu_get_queue(s->nic), s->txp->data, s->txp->len);
     }
     s->txp->fifo_used = 0;
 
@@ -1335,7 +1335,7 @@ static int lan9118_init1(SysBusDevice *dev)
 
     s->nic = qemu_new_nic(&net_lan9118_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     s->eeprom[0] = 0xa5;
     for (i = 0; i < 6; i++) {
         s->eeprom[i + 1] = s->conf.macaddr.a[i];
diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c
index 1ed193c..d7532b1 100644
--- a/hw/mcf_fec.c
+++ b/hw/mcf_fec.c
@@ -174,7 +174,7 @@ static void mcf_fec_do_tx(mcf_fec_state *s)
         if (bd.flags & FEC_BD_L) {
             /* Last buffer in frame.  */
             DPRINTF("Sending packet\n");
-            qemu_send_packet(&s->nic->nc, frame, len);
+            qemu_send_packet(qemu_get_queue(s->nic), frame, len);
             ptr = frame;
             frame_size = 0;
             s->eir |= FEC_INT_TXF;
@@ -476,5 +476,5 @@ void mcf_fec_init(MemoryRegion *sysmem, NICInfo *nd,
 
     s->nic = qemu_new_nic(&net_mcf_fec_info, &s->conf, nd->model, nd->name, s);
 
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
diff --git a/hw/milkymist-minimac2.c b/hw/milkymist-minimac2.c
index b204e5f..8a5cc40 100644
--- a/hw/milkymist-minimac2.c
+++ b/hw/milkymist-minimac2.c
@@ -257,7 +257,7 @@ static void minimac2_tx(MilkymistMinimac2State *s)
     trace_milkymist_minimac2_tx_frame(txcount - 12);
 
     /* send packet, skipping preamble and sfd */
-    qemu_send_packet_raw(&s->nic->nc, buf + 8, txcount - 12);
+    qemu_send_packet_raw(qemu_get_queue(s->nic), buf + 8, txcount - 12);
 
     s->regs[R_TXCOUNT] = 0;
 
@@ -480,7 +480,7 @@ static int milkymist_minimac2_init(SysBusDevice *dev)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_milkymist_minimac2_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     return 0;
 }
diff --git a/hw/mipsnet.c b/hw/mipsnet.c
index bece332..5d1ab5a 100644
--- a/hw/mipsnet.c
+++ b/hw/mipsnet.c
@@ -173,7 +173,7 @@ static void mipsnet_ioport_write(void *opaque, hwaddr addr,
         if (s->tx_written == s->tx_count) {
             /* Send buffer. */
             trace_mipsnet_send(s->tx_count);
-            qemu_send_packet(&s->nic->nc, s->tx_buffer, s->tx_count);
+            qemu_send_packet(qemu_get_queue(s->nic), s->tx_buffer, s->tx_count);
             s->tx_count = s->tx_written = 0;
             s->intctl |= MIPSNET_INTCTL_TXDONE;
             s->busy = 1;
@@ -241,7 +241,7 @@ static int mipsnet_sysbus_init(SysBusDevice *dev)
 
     s->nic = qemu_new_nic(&net_mipsnet_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     return 0;
 }
diff --git a/hw/musicpal.c b/hw/musicpal.c
index e0c57c8..2b62264 100644
--- a/hw/musicpal.c
+++ b/hw/musicpal.c
@@ -256,7 +256,7 @@ static void eth_send(mv88w8618_eth_state *s, int queue_index)
             len = desc.bytes;
             if (len < 2048) {
                 cpu_physical_memory_read(desc.buffer, buf, len);
-                qemu_send_packet(&s->nic->nc, buf, len);
+                qemu_send_packet(qemu_get_queue(s->nic), buf, len);
             }
             desc.cmdstat &= ~MP_ETH_TX_OWN;
             s->icr |= 1 << (MP_ETH_IRQ_TXLO_BIT - queue_index);
diff --git a/hw/ne2000-isa.c b/hw/ne2000-isa.c
index 69982a9..4a6105e 100644
--- a/hw/ne2000-isa.c
+++ b/hw/ne2000-isa.c
@@ -77,7 +77,7 @@ static int isa_ne2000_initfn(ISADevice *dev)
 
     s->nic = qemu_new_nic(&net_ne2000_isa_info, &s->c,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->c.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->c.macaddr.a);
 
     return 0;
 }
diff --git a/hw/ne2000.c b/hw/ne2000.c
index d3dd9a6..21d9ace 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -300,7 +300,7 @@ static void ne2000_ioport_write(void *opaque, uint32_t addr, uint32_t val)
                     index -= NE2000_PMEM_SIZE;
                 /* fail safe: check range on the transmitted length  */
                 if (index + s->tcnt <= NE2000_PMEM_END) {
-                    qemu_send_packet(&s->nic->nc, s->mem + index, s->tcnt);
+                    qemu_send_packet(qemu_get_queue(s->nic), s->mem + index, s->tcnt);
                 }
                 /* signal end of transfer */
                 s->tsr = ENTSR_PTX;
@@ -737,7 +737,7 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
 
     s->nic = qemu_new_nic(&net_ne2000_info, &s->c,
                           object_get_typename(OBJECT(pci_dev)), pci_dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->c.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->c.macaddr.a);
 
     add_boot_device_path(s->c.bootindex, &pci_dev->qdev, "/ethernet-phy@0");
 
@@ -750,7 +750,7 @@ static void pci_ne2000_exit(PCIDevice *pci_dev)
     NE2000State *s = &d->ne2000;
 
     memory_region_destroy(&s->io);
-    qemu_del_net_client(&s->nic->nc);
+    qemu_del_net_client(qemu_get_queue(s->nic));
 }
 
 static Property ne2000_properties[] = {
diff --git a/hw/opencores_eth.c b/hw/opencores_eth.c
index b2780b9..821e54f 100644
--- a/hw/opencores_eth.c
+++ b/hw/opencores_eth.c
@@ -339,7 +339,7 @@ static void open_eth_reset(void *opaque)
     s->rx_desc = 0x40;
 
     mii_reset(&s->mii);
-    open_eth_set_link_status(&s->nic->nc);
+    open_eth_set_link_status(qemu_get_queue(s->nic));
 }
 
 static int open_eth_can_receive(NetClientState *nc)
@@ -499,7 +499,7 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
     if (tx_len > len) {
         memset(buf + len, 0, tx_len - len);
     }
-    qemu_send_packet(&s->nic->nc, buf, tx_len);
+    qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len);
 
     if (tx->len_flags & TXD_WR) {
         s->tx_desc = 0;
@@ -606,7 +606,7 @@ static void open_eth_mii_command_host_write(OpenEthState *s, uint32_t val)
         } else {
             s->regs[MIIRX_DATA] = 0xffff;
         }
-        SET_REGFIELD(s, MIISTATUS, LINKFAIL, s->nic->nc.link_down);
+        SET_REGFIELD(s, MIISTATUS, LINKFAIL, qemu_get_queue(s->nic)->link_down);
     }
 }
 
diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index 0bf438f..d9fb591 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -279,7 +279,7 @@ static void pci_pcnet_uninit(PCIDevice *dev)
     memory_region_destroy(&d->io_bar);
     qemu_del_timer(d->state.poll_timer);
     qemu_free_timer(d->state.poll_timer);
-    qemu_del_net_client(&d->state.nic->nc);
+    qemu_del_net_client(qemu_get_queue(d->state.nic));
 }
 
 static NetClientInfo net_pci_pcnet_info = {
diff --git a/hw/pcnet.c b/hw/pcnet.c
index 54eecd0..b3dc309 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1261,11 +1261,11 @@ static void pcnet_transmit(PCNetState *s)
                 if (BCR_SWSTYLE(s) == 1)
                     add_crc = !GET_FIELD(tmd.status, TMDS, NOFCS);
                 s->looptest = add_crc ? PCNET_LOOPTEST_CRC : PCNET_LOOPTEST_NOCRC;
-                pcnet_receive(&s->nic->nc, s->buffer, s->xmit_pos);
+                pcnet_receive(qemu_get_queue(s->nic), s->buffer, s->xmit_pos);
                 s->looptest = 0;
             } else
                 if (s->nic)
-                    qemu_send_packet(&s->nic->nc, s->buffer, s->xmit_pos);
+                    qemu_send_packet(qemu_get_queue(s->nic), s->buffer, s->xmit_pos);
 
             s->csr[0] &= ~0x0008;   /* clear TDMD */
             s->csr[4] |= 0x0004;    /* set TXSTRT */
@@ -1730,7 +1730,7 @@ int pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(info, &s->conf, object_get_typename(OBJECT(dev)), dev->id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     add_boot_device_path(s->conf.bootindex, dev, "/ethernet-phy@0");
 
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index e3aa8bf..cb975b2 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -1786,7 +1786,7 @@ static void rtl8139_transfer_frame(RTL8139State *s, uint8_t *buf, int size,
         }
 
         DPRINTF("+++ transmit loopback mode\n");
-        rtl8139_do_receive(&s->nic->nc, buf, size, do_interrupt);
+        rtl8139_do_receive(qemu_get_queue(s->nic), buf, size, do_interrupt);
 
         if (iov) {
             g_free(buf2);
@@ -1795,9 +1795,9 @@ static void rtl8139_transfer_frame(RTL8139State *s, uint8_t *buf, int size,
     else
     {
         if (iov) {
-            qemu_sendv_packet(&s->nic->nc, iov, 3);
+            qemu_sendv_packet(qemu_get_queue(s->nic), iov, 3);
         } else {
-            qemu_send_packet(&s->nic->nc, buf, size);
+            qemu_send_packet(qemu_get_queue(s->nic), buf, size);
         }
     }
 }
@@ -3229,7 +3229,7 @@ static int rtl8139_post_load(void *opaque, int version_id)
 
     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in BasicModeStatus */
-    s->nic->nc.link_down = (s->BasicModeStatus & 0x04) == 0;
+    qemu_get_queue(s->nic)->link_down = (s->BasicModeStatus & 0x04) == 0;
 
     return 0;
 }
@@ -3445,7 +3445,7 @@ static void pci_rtl8139_uninit(PCIDevice *dev)
     }
     qemu_del_timer(s->timer);
     qemu_free_timer(s->timer);
-    qemu_del_net_client(&s->nic->nc);
+    qemu_del_net_client(qemu_get_queue(s->nic));
 }
 
 static void rtl8139_set_link_status(NetClientState *nc)
@@ -3502,7 +3502,7 @@ static int pci_rtl8139_init(PCIDevice *dev)
 
     s->nic = qemu_new_nic(&net_rtl8139_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     s->cplus_txbuffer = NULL;
     s->cplus_txbuffer_len = 0;
diff --git a/hw/smc91c111.c b/hw/smc91c111.c
index 4ceed01..b466d66 100644
--- a/hw/smc91c111.c
+++ b/hw/smc91c111.c
@@ -237,7 +237,7 @@ static void smc91c111_do_tx(smc91c111_state *s)
             smc91c111_release_packet(s, packetnum);
         else if (s->tx_fifo_done_len < NUM_PACKETS)
             s->tx_fifo_done[s->tx_fifo_done_len++] = packetnum;
-        qemu_send_packet(&s->nic->nc, p, len);
+        qemu_send_packet(qemu_get_queue(s->nic), p, len);
     }
     s->tx_fifo_len = 0;
     smc91c111_update(s);
@@ -753,7 +753,7 @@ static int smc91c111_init1(SysBusDevice *dev)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_smc91c111_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     /* ??? Save/restore.  */
     return 0;
 }
diff --git a/hw/spapr_llan.c b/hw/spapr_llan.c
index 09ad69f..5232852 100644
--- a/hw/spapr_llan.c
+++ b/hw/spapr_llan.c
@@ -199,7 +199,7 @@ static int spapr_vlan_init(VIOsPAPRDevice *sdev)
 
     dev->nic = qemu_new_nic(&net_spapr_vlan_info, &dev->nicconf,
                             object_get_typename(OBJECT(sdev)), sdev->qdev.id, dev);
-    qemu_format_nic_info_str(&dev->nic->nc, dev->nicconf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(dev->nic), dev->nicconf.macaddr.a);
 
     return 0;
 }
@@ -462,7 +462,7 @@ static target_ulong h_send_logical_lan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         p += VLAN_BD_LEN(bufs[i]);
     }
 
-    qemu_send_packet(&dev->nic->nc, lbuf, total_len);
+    qemu_send_packet(qemu_get_queue(dev->nic), lbuf, total_len);
 
     return H_SUCCESS;
 }
diff --git a/hw/stellaris_enet.c b/hw/stellaris_enet.c
index a530b10..65d73a0 100644
--- a/hw/stellaris_enet.c
+++ b/hw/stellaris_enet.c
@@ -259,7 +259,7 @@ static void stellaris_enet_write(void *opaque, hwaddr offset,
                     memset(&s->tx_fifo[s->tx_frame_len], 0, 60 - s->tx_frame_len);
                     s->tx_fifo_len = 60;
                 }
-                qemu_send_packet(&s->nic->nc, s->tx_fifo, s->tx_frame_len);
+                qemu_send_packet(qemu_get_queue(s->nic), s->tx_fifo, s->tx_frame_len);
                 s->tx_frame_len = -1;
                 s->ris |= SE_INT_TXEMP;
                 stellaris_enet_update(s);
@@ -412,7 +412,7 @@ static int stellaris_enet_init(SysBusDevice *dev)
 
     s->nic = qemu_new_nic(&net_stellaris_enet_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     stellaris_enet_reset(s);
     register_savevm(&s->busdev.qdev, "stellaris_enet", -1, 1,
diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index 30cb033..ab7e7a7 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1011,7 +1011,7 @@ static int rndis_keepalive_response(USBNetState *s,
 static void usb_net_reset_in_buf(USBNetState *s)
 {
     s->in_ptr = s->in_len = 0;
-    qemu_flush_queued_packets(&s->nic->nc);
+    qemu_flush_queued_packets(qemu_get_queue(s->nic));
 }
 
 static int rndis_parse(USBNetState *s, uint8_t *data, int length)
@@ -1195,7 +1195,7 @@ static void usb_net_handle_dataout(USBNetState *s, USBPacket *p)
 
     if (!is_rndis(s)) {
         if (p->iov.size < 64) {
-            qemu_send_packet(&s->nic->nc, s->out_buf, s->out_ptr);
+            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf, s->out_ptr);
             s->out_ptr = 0;
         }
         return;
@@ -1208,7 +1208,7 @@ static void usb_net_handle_dataout(USBNetState *s, USBPacket *p)
         uint32_t offs = 8 + le32_to_cpu(msg->DataOffset);
         uint32_t size = le32_to_cpu(msg->DataLength);
         if (offs + size <= len)
-            qemu_send_packet(&s->nic->nc, s->out_buf + offs, size);
+            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf + offs, size);
     }
     s->out_ptr -= len;
     memmove(s->out_buf, &s->out_buf[len], s->out_ptr);
@@ -1329,7 +1329,7 @@ static void usb_net_handle_destroy(USBDevice *dev)
 
     /* TODO: remove the nd_table[] entry */
     rndis_clear_responsequeue(s);
-    qemu_del_net_client(&s->nic->nc);
+    qemu_del_net_client(qemu_get_queue(s->nic));
 }
 
 static NetClientInfo net_usbnet_info = {
@@ -1360,7 +1360,7 @@ static int usb_net_initfn(USBDevice *dev)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_usbnet_info, &s->conf,
                           object_get_typename(OBJECT(s)), s->dev.qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     snprintf(s->usbstring_mac, sizeof(s->usbstring_mac),
              "%02x%02x%02x%02x%02x%02x",
              0x40,
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 108ce07..1c59db1 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -95,7 +95,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
 
     if (memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
         memcpy(n->mac, netcfg.mac, ETH_ALEN);
-        qemu_format_nic_info_str(&n->nic->nc, n->mac);
+        qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
     }
 }
 
@@ -107,26 +107,26 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
 
 static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 {
-    if (!n->nic->nc.peer) {
+    if (!qemu_get_queue(n->nic)->peer) {
         return;
     }
-    if (n->nic->nc.peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
+    if (qemu_get_queue(n->nic)->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
         return;
     }
 
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+    if (!tap_get_vhost_net(qemu_get_queue(n->nic)->peer)) {
         return;
     }
     if (!!n->vhost_started == virtio_net_started(n, status) &&
-                              !n->nic->nc.peer->link_down) {
+                              !qemu_get_queue(n->nic)->peer->link_down) {
         return;
     }
     if (!n->vhost_started) {
         int r;
-        if (!vhost_net_query(tap_get_vhost_net(n->nic->nc.peer), &n->vdev)) {
+        if (!vhost_net_query(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev)) {
             return;
         }
-        r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
+        r = vhost_net_start(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev);
         if (r < 0) {
             error_report("unable to start vhost net: %d: "
                          "falling back on userspace virtio", -r);
@@ -134,7 +134,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
             n->vhost_started = 1;
         }
     } else {
-        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
+        vhost_net_stop(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev);
         n->vhost_started = 0;
     }
 }
@@ -204,13 +204,13 @@ static void virtio_net_reset(VirtIODevice *vdev)
 
 static void peer_test_vnet_hdr(VirtIONet *n)
 {
-    if (!n->nic->nc.peer)
+    if (!qemu_get_queue(n->nic)->peer)
         return;
 
-    if (n->nic->nc.peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP)
+    if (qemu_get_queue(n->nic)->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP)
         return;
 
-    n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->nc.peer);
+    n->has_vnet_hdr = tap_has_vnet_hdr(qemu_get_queue(n->nic)->peer);
 }
 
 static int peer_has_vnet_hdr(VirtIONet *n)
@@ -223,7 +223,7 @@ static int peer_has_ufo(VirtIONet *n)
     if (!peer_has_vnet_hdr(n))
         return 0;
 
-    n->has_ufo = tap_has_ufo(n->nic->nc.peer);
+    n->has_ufo = tap_has_ufo(qemu_get_queue(n->nic)->peer);
 
     return n->has_ufo;
 }
@@ -236,8 +236,8 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs)
         sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr);
 
     if (peer_has_vnet_hdr(n) &&
-        tap_has_vnet_hdr_len(n->nic->nc.peer, n->guest_hdr_len)) {
-        tap_set_vnet_hdr_len(n->nic->nc.peer, n->guest_hdr_len);
+        tap_has_vnet_hdr_len(qemu_get_queue(n->nic)->peer, n->guest_hdr_len)) {
+        tap_set_vnet_hdr_len(qemu_get_queue(n->nic)->peer, n->guest_hdr_len);
         n->host_hdr_len = n->guest_hdr_len;
     }
 }
@@ -265,14 +265,14 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
         features &= ~(0x1 << VIRTIO_NET_F_HOST_UFO);
     }
 
-    if (!n->nic->nc.peer ||
-        n->nic->nc.peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
+    if (!qemu_get_queue(n->nic)->peer ||
+        qemu_get_queue(n->nic)->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
         return features;
     }
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+    if (!tap_get_vhost_net(qemu_get_queue(n->nic)->peer)) {
         return features;
     }
-    return vhost_net_get_features(tap_get_vhost_net(n->nic->nc.peer), features);
+    return vhost_net_get_features(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), features);
 }
 
 static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
@@ -297,21 +297,21 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
     virtio_net_set_mrg_rx_bufs(n, !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF)));
 
     if (n->has_vnet_hdr) {
-        tap_set_offload(n->nic->nc.peer,
+        tap_set_offload(qemu_get_queue(n->nic)->peer,
                         (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
                         (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
                         (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
                         (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
                         (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
     }
-    if (!n->nic->nc.peer ||
-        n->nic->nc.peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
+    if (!qemu_get_queue(n->nic)->peer ||
+        qemu_get_queue(n->nic)->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
         return;
     }
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+    if (!tap_get_vhost_net(qemu_get_queue(n->nic)->peer)) {
         return;
     }
-    vhost_net_ack_features(tap_get_vhost_net(n->nic->nc.peer), features);
+    vhost_net_ack_features(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), features);
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -463,7 +463,7 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
 
-    qemu_flush_queued_packets(&n->nic->nc);
+    qemu_flush_queued_packets(qemu_get_queue(n->nic));
 }
 
 static int virtio_net_can_receive(NetClientState *nc)
@@ -605,7 +605,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
     unsigned mhdr_cnt = 0;
     size_t offset, i, guest_offset;
 
-    if (!virtio_net_can_receive(&n->nic->nc))
+    if (!virtio_net_can_receive(qemu_get_queue(n->nic)))
         return -1;
 
     /* hdr_len refers to the header we supply to the guest */
@@ -754,7 +754,7 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
 
         len = n->guest_hdr_len;
 
-        ret = qemu_sendv_packet_async(&n->nic->nc, out_sg, out_num,
+        ret = qemu_sendv_packet_async(qemu_get_queue(n->nic), out_sg, out_num,
                                       virtio_net_tx_complete);
         if (ret == 0) {
             virtio_queue_set_notification(n->tx_vq, 0);
@@ -951,7 +951,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
         }
 
         if (n->has_vnet_hdr) {
-            tap_set_offload(n->nic->nc.peer,
+            tap_set_offload(qemu_get_queue(n->nic)->peer,
                     (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
                     (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
                     (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
@@ -989,7 +989,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 
     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in n->status */
-    n->nic->nc.link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    qemu_get_queue(n->nic)->link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
 
     return 0;
 }
@@ -1051,13 +1051,13 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     n->nic = qemu_new_nic(&net_virtio_info, conf, object_get_typename(OBJECT(dev)), dev->id, n);
     peer_test_vnet_hdr(n);
     if (peer_has_vnet_hdr(n)) {
-        tap_using_vnet_hdr(n->nic->nc.peer, 1);
+        tap_using_vnet_hdr(qemu_get_queue(n->nic)->peer, 1);
         n->host_hdr_len = sizeof(struct virtio_net_hdr);
     } else {
         n->host_hdr_len = 0;
     }
 
-    qemu_format_nic_info_str(&n->nic->nc, conf->macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(n->nic), conf->macaddr.a);
 
     n->tx_waiting = 0;
     n->tx_burst = net->txburst;
@@ -1084,7 +1084,7 @@ void virtio_net_exit(VirtIODevice *vdev)
     /* This will stop vhost backend if appropriate. */
     virtio_net_set_status(vdev, 0);
 
-    qemu_purge_queued_packets(&n->nic->nc);
+    qemu_purge_queued_packets(qemu_get_queue(n->nic));
 
     unregister_savevm(n->qdev, "virtio-net", n);
 
@@ -1098,6 +1098,6 @@ void virtio_net_exit(VirtIODevice *vdev)
         qemu_bh_delete(n->tx_bh);
     }
 
-    qemu_del_net_client(&n->nic->nc);
+    qemu_del_net_client(qemu_get_queue(n->nic));
     virtio_cleanup(&n->vdev);
 }
diff --git a/hw/xen_nic.c b/hw/xen_nic.c
index cf7d559..cfd6def 100644
--- a/hw/xen_nic.c
+++ b/hw/xen_nic.c
@@ -186,9 +186,9 @@ static void net_tx_packets(struct XenNetDev *netdev)
                 }
                 memcpy(tmpbuf, page + txreq.offset, txreq.size);
                 net_checksum_calculate(tmpbuf, txreq.size);
-                qemu_send_packet(&netdev->nic->nc, tmpbuf, txreq.size);
+                qemu_send_packet(qemu_get_queue(netdev->nic), tmpbuf, txreq.size);
             } else {
-                qemu_send_packet(&netdev->nic->nc, page + txreq.offset, txreq.size);
+                qemu_send_packet(qemu_get_queue(netdev->nic), page + txreq.offset, txreq.size);
             }
             xc_gnttab_munmap(netdev->xendev.gnttabdev, page, 1);
             net_tx_response(netdev, &txreq, NETIF_RSP_OKAY);
@@ -330,7 +330,7 @@ static int net_init(struct XenDevice *xendev)
     netdev->nic = qemu_new_nic(&net_xen_info, &netdev->conf,
                                "xen", NULL, netdev);
 
-    snprintf(netdev->nic->nc.info_str, sizeof(netdev->nic->nc.info_str),
+    snprintf(qemu_get_queue(netdev->nic)->info_str, sizeof(qemu_get_queue(netdev->nic)->info_str),
              "nic: xenbus vif macaddr=%s", netdev->mac);
 
     /* fill info */
@@ -406,7 +406,7 @@ static void net_disconnect(struct XenDevice *xendev)
         netdev->rxs = NULL;
     }
     if (netdev->nic) {
-        qemu_del_net_client(&netdev->nic->nc);
+        qemu_del_net_client(qemu_get_queue(netdev->nic));
         netdev->nic = NULL;
     }
 }
@@ -415,7 +415,7 @@ static void net_event(struct XenDevice *xendev)
 {
     struct XenNetDev *netdev = container_of(xendev, struct XenNetDev, xendev);
     net_tx_packets(netdev);
-    qemu_flush_queued_packets(&netdev->nic->nc);
+    qemu_flush_queued_packets(qemu_get_queue(netdev->nic));
 }
 
 static int net_free(struct XenDevice *xendev)
diff --git a/hw/xgmac.c b/hw/xgmac.c
index ec50c74..0ec3c85 100644
--- a/hw/xgmac.c
+++ b/hw/xgmac.c
@@ -235,7 +235,7 @@ static void xgmac_enet_send(struct XgmacState *s)
         frame_size += len;
         if (bd.ctl_stat & 0x20000000) {
             /* Last buffer in frame.  */
-            qemu_send_packet(&s->nic->nc, frame, len);
+            qemu_send_packet(qemu_get_queue(s->nic), frame, len);
             ptr = frame;
             frame_size = 0;
             s->regs[DMA_STATUS] |= DMA_STATUS_TI | DMA_STATUS_NIS;
@@ -391,7 +391,7 @@ static int xgmac_enet_init(SysBusDevice *dev)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_xgmac_enet_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     s->regs[XGMAC_ADDR_HIGH(0)] = (s->conf.macaddr.a[5] << 8) |
                                    s->conf.macaddr.a[4];
diff --git a/hw/xilinx_axienet.c b/hw/xilinx_axienet.c
index f2e3bf1..254fe72 100644
--- a/hw/xilinx_axienet.c
+++ b/hw/xilinx_axienet.c
@@ -827,7 +827,7 @@ axienet_stream_push(StreamSlave *obj, uint8_t *buf, size_t size, uint32_t *hdr)
         buf[write_off + 1] = csum & 0xff;
     }
 
-    qemu_send_packet(&s->nic->nc, buf, size);
+    qemu_send_packet(qemu_get_queue(s->nic), buf, size);
 
     s->stats.tx_bytes += size;
     s->regs[R_IS] |= IS_TX_COMPLETE;
@@ -854,7 +854,7 @@ static int xilinx_enet_init(SysBusDevice *dev)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_xilinx_enet_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     tdk_init(&s->TEMAC.phy);
     mdio_attach(&s->TEMAC.mdio_bus, &s->TEMAC.phy, s->c_phyaddr);
diff --git a/hw/xilinx_ethlite.c b/hw/xilinx_ethlite.c
index 13bd456..aa19c97 100644
--- a/hw/xilinx_ethlite.c
+++ b/hw/xilinx_ethlite.c
@@ -117,7 +117,7 @@ eth_write(void *opaque, hwaddr addr,
 
             D(qemu_log("%s addr=%x val=%x\n", __func__, addr * 4, value));
             if ((value & (CTRL_P | CTRL_S)) == CTRL_S) {
-                qemu_send_packet(&s->nic->nc,
+                qemu_send_packet(qemu_get_queue(s->nic),
                                  (void *) &s->regs[base],
                                  s->regs[base + R_TX_LEN0]);
                 D(qemu_log("eth_tx %d\n", s->regs[base + R_TX_LEN0]));
@@ -223,7 +223,7 @@ static int xilinx_ethlite_init(SysBusDevice *dev)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_xilinx_ethlite_info, &s->conf,
                           object_get_typename(OBJECT(dev)), dev->qdev.id, s);
-    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     return 0;
 }
 
diff --git a/net.c b/net.c
index e8ae13e..23fcbce 100644
--- a/net.c
+++ b/net.c
@@ -233,6 +233,11 @@ NICState *qemu_new_nic(NetClientInfo *info,
     return nic;
 }
 
+NetClientState *qemu_get_queue(NICState *nic)
+{
+    return &nic->nc;
+}
+
 static void qemu_cleanup_net_client(NetClientState *nc)
 {
     QTAILQ_REMOVE(&net_clients, nc, next);
diff --git a/net.h b/net.h
index 04fda1d..fe44cb8 100644
--- a/net.h
+++ b/net.h
@@ -77,6 +77,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
                        const char *model,
                        const char *name,
                        void *opaque);
+NetClientState *qemu_get_queue(NICState *nic);
 void qemu_del_net_client(NetClientState *nc);
 NetClientState *qemu_find_vlan_client_by_name(Monitor *mon, int vlan_id,
                                               const char *client_str);
diff --git a/savevm.c b/savevm.c
index 5d04d59..2bc513b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -129,7 +129,7 @@ static void qemu_announce_self_iter(NICState *nic, void *opaque)
 
     len = announce_self_create(buf, nic->conf->macaddr.a);
 
-    qemu_send_packet_raw(&nic->nc, buf, len);
+    qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
 }
 
 
-- 
1.7.1



* [PATCH 03/12] net: introduce qemu_get_nic()
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
  2012-12-28 10:31 ` [PATCH 01/12] tap: multiqueue support Jason Wang
  2012-12-28 10:31 ` [PATCH 02/12] net: introduce qemu_get_queue() Jason Wang
@ 2012-12-28 10:31 ` Jason Wang
  2012-12-28 10:31 ` [PATCH 04/12] net: introduce qemu_del_nic() Jason Wang
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:31 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

To support multiqueue, this patch introduces a helper, qemu_get_nic(), to get
the NICState from a NetClientState. The following patches will refactor this
helper to support multiqueue.
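
As a rough illustration (not part of the patch; MyNICState and my_can_receive
are made-up names), the conversion in each device model looks like this:

    /* Before: every callback open-coded the upcast from the
     * NetClientState embedded in NICState. */
    static int my_can_receive(NetClientState *nc)
    {
        MyNICState *s = DO_UPCAST(NICState, nc, nc)->opaque;
        return s->rx_enabled;
    }

    /* After: qemu_get_nic_opaque() hides how the device state is
     * reached from a queue, so later patches can change it freely. */
    static int my_can_receive(NetClientState *nc)
    {
        MyNICState *s = qemu_get_nic_opaque(nc);
        return s->rx_enabled;
    }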

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/cadence_gem.c        |    8 ++++----
 hw/dp8393x.c            |    6 +++---
 hw/e1000.c              |    8 ++++----
 hw/eepro100.c           |    6 +++---
 hw/etraxfs_eth.c        |    6 +++---
 hw/lan9118.c            |    6 +++---
 hw/lance.c              |    2 +-
 hw/mcf_fec.c            |    6 +++---
 hw/milkymist-minimac2.c |    6 +++---
 hw/mipsnet.c            |    6 +++---
 hw/musicpal.c           |    4 ++--
 hw/ne2000-isa.c         |    2 +-
 hw/ne2000.c             |    6 +++---
 hw/opencores_eth.c      |    6 +++---
 hw/pcnet-pci.c          |    2 +-
 hw/pcnet.c              |    6 +++---
 hw/rtl8139.c            |    8 ++++----
 hw/smc91c111.c          |    6 +++---
 hw/spapr_llan.c         |    4 ++--
 hw/stellaris_enet.c     |    6 +++---
 hw/usb/dev-network.c    |    6 +++---
 hw/virtio-net.c         |   10 +++++-----
 hw/xen_nic.c            |    4 ++--
 hw/xgmac.c              |    6 +++---
 hw/xilinx_axienet.c     |    6 +++---
 hw/xilinx_ethlite.c     |    6 +++---
 net.c                   |   20 ++++++++++++++++----
 net.h                   |    2 ++
 26 files changed, 92 insertions(+), 78 deletions(-)

diff --git a/hw/cadence_gem.c b/hw/cadence_gem.c
index 9d27ecb..6874de9 100644
--- a/hw/cadence_gem.c
+++ b/hw/cadence_gem.c
@@ -409,7 +409,7 @@ static int gem_can_receive(NetClientState *nc)
 {
     GemState *s;
 
-    s = DO_UPCAST(NICState, nc, nc)->opaque;
+    s = qemu_get_nic_opaque(nc);
 
     DB_PRINT("\n");
 
@@ -612,7 +612,7 @@ static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size)
     uint8_t    rxbuf[2048];
     uint8_t   *rxbuf_ptr;
 
-    s = DO_UPCAST(NICState, nc, nc)->opaque;
+    s = qemu_get_nic_opaque(nc);
 
     /* Do nothing if receive is not enabled. */
     if (!(s->regs[GEM_NWCTRL] & GEM_NWCTRL_RXENA)) {
@@ -1148,7 +1148,7 @@ static const MemoryRegionOps gem_ops = {
 
 static void gem_cleanup(NetClientState *nc)
 {
-    GemState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    GemState *s = qemu_get_nic_opaque(nc);
 
     DB_PRINT("\n");
     s->nic = NULL;
@@ -1157,7 +1157,7 @@ static void gem_cleanup(NetClientState *nc)
 static void gem_set_link(NetClientState *nc)
 {
     DB_PRINT("\n");
-    phy_update_link(DO_UPCAST(NICState, nc, nc)->opaque);
+    phy_update_link(qemu_get_nic_opaque(nc));
 }
 
 static NetClientInfo net_gem_info = {
diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index db50cc6..8f20a4a 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -675,7 +675,7 @@ static const MemoryRegionOps dp8393x_ops = {
 
 static int nic_can_receive(NetClientState *nc)
 {
-    dp8393xState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    dp8393xState *s = qemu_get_nic_opaque(nc);
 
     if (!(s->regs[SONIC_CR] & SONIC_CR_RXEN))
         return 0;
@@ -724,7 +724,7 @@ static int receive_filter(dp8393xState *s, const uint8_t * buf, int size)
 
 static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
 {
-    dp8393xState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    dp8393xState *s = qemu_get_nic_opaque(nc);
     uint16_t data[10];
     int packet_type;
     uint32_t available, address;
@@ -860,7 +860,7 @@ static void nic_reset(void *opaque)
 
 static void nic_cleanup(NetClientState *nc)
 {
-    dp8393xState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    dp8393xState *s = qemu_get_nic_opaque(nc);
 
     memory_region_del_subregion(s->address_space, &s->mmio);
     memory_region_destroy(&s->mmio);
diff --git a/hw/e1000.c b/hw/e1000.c
index aaa4f88..004f057 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -735,7 +735,7 @@ receive_filter(E1000State *s, const uint8_t *buf, int size)
 static void
 e1000_set_link_status(NetClientState *nc)
 {
-    E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    E1000State *s = qemu_get_nic_opaque(nc);
     uint32_t old_status = s->mac_reg[STATUS];
 
     if (nc->link_down) {
@@ -769,7 +769,7 @@ static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
 static int
 e1000_can_receive(NetClientState *nc)
 {
-    E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    E1000State *s = qemu_get_nic_opaque(nc);
 
     return (s->mac_reg[RCTL] & E1000_RCTL_EN) && e1000_has_rxbufs(s, 1);
 }
@@ -785,7 +785,7 @@ static uint64_t rx_desc_base(E1000State *s)
 static ssize_t
 e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    E1000State *s = qemu_get_nic_opaque(nc);
     struct e1000_rx_desc desc;
     dma_addr_t base;
     unsigned int n, rdt;
@@ -1220,7 +1220,7 @@ e1000_mmio_setup(E1000State *d)
 static void
 e1000_cleanup(NetClientState *nc)
 {
-    E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    E1000State *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/eepro100.c b/hw/eepro100.c
index c6e91c7..c09a315 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1619,7 +1619,7 @@ static const MemoryRegionOps eepro100_ops = {
 
 static int nic_can_receive(NetClientState *nc)
 {
-    EEPRO100State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    EEPRO100State *s = qemu_get_nic_opaque(nc);
     TRACE(RXTX, logout("%p\n", s));
     return get_ru_state(s) == ru_ready;
 #if 0
@@ -1633,7 +1633,7 @@ static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
      * - Magic packets should set bit 30 in power management driver register.
      * - Interesting packets should set bit 29 in power management driver register.
      */
-    EEPRO100State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    EEPRO100State *s = qemu_get_nic_opaque(nc);
     uint16_t rfd_status = 0xa000;
 #if defined(CONFIG_PAD_RECEIVED_FRAMES)
     uint8_t min_buf[60];
@@ -1835,7 +1835,7 @@ static const VMStateDescription vmstate_eepro100 = {
 
 static void nic_cleanup(NetClientState *nc)
 {
-    EEPRO100State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    EEPRO100State *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/etraxfs_eth.c b/hw/etraxfs_eth.c
index dbafb55..ee6d1ad 100644
--- a/hw/etraxfs_eth.c
+++ b/hw/etraxfs_eth.c
@@ -515,7 +515,7 @@ static int eth_can_receive(NetClientState *nc)
 static ssize_t eth_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
 	unsigned char sa_bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
-	struct fs_eth *eth = DO_UPCAST(NICState, nc, nc)->opaque;
+	struct fs_eth *eth = qemu_get_nic_opaque(nc);
 	int use_ma0 = eth->regs[RW_REC_CTRL] & 1;
 	int use_ma1 = eth->regs[RW_REC_CTRL] & 2;
 	int r_bcast = eth->regs[RW_REC_CTRL] & 8;
@@ -551,7 +551,7 @@ static int eth_tx_push(void *opaque, unsigned char *buf, int len, bool eop)
 
 static void eth_set_link(NetClientState *nc)
 {
-	struct fs_eth *eth = DO_UPCAST(NICState, nc, nc)->opaque;
+	struct fs_eth *eth = qemu_get_nic_opaque(nc);
 	D(printf("%s %d\n", __func__, nc->link_down));
 	eth->phy.link = !nc->link_down;
 }
@@ -568,7 +568,7 @@ static const MemoryRegionOps eth_ops = {
 
 static void eth_cleanup(NetClientState *nc)
 {
-	struct fs_eth *eth = DO_UPCAST(NICState, nc, nc)->opaque;
+	struct fs_eth *eth = qemu_get_nic_opaque(nc);
 
 	/* Disconnect the client.  */
 	eth->dma_out->client.push = NULL;
diff --git a/hw/lan9118.c b/hw/lan9118.c
index 8f340a5..d1f40d5 100644
--- a/hw/lan9118.c
+++ b/hw/lan9118.c
@@ -386,7 +386,7 @@ static void phy_update_link(lan9118_state *s)
 
 static void lan9118_set_link(NetClientState *nc)
 {
-    phy_update_link(DO_UPCAST(NICState, nc, nc)->opaque);
+    phy_update_link(qemu_get_nic_opaque(nc));
 }
 
 static void phy_reset(lan9118_state *s)
@@ -512,7 +512,7 @@ static int lan9118_filter(lan9118_state *s, const uint8_t *addr)
 static ssize_t lan9118_receive(NetClientState *nc, const uint8_t *buf,
                                size_t size)
 {
-    lan9118_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    lan9118_state *s = qemu_get_nic_opaque(nc);
     int fifo_len;
     int offset;
     int src_pos;
@@ -1306,7 +1306,7 @@ static const MemoryRegionOps lan9118_16bit_mem_ops = {
 
 static void lan9118_cleanup(NetClientState *nc)
 {
-    lan9118_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    lan9118_state *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/lance.c b/hw/lance.c
index a3e6dd9..dc27078 100644
--- a/hw/lance.c
+++ b/hw/lance.c
@@ -87,7 +87,7 @@ static const MemoryRegionOps lance_mem_ops = {
 
 static void lance_cleanup(NetClientState *nc)
 {
-    PCNetState *d = DO_UPCAST(NICState, nc, nc)->opaque;
+    PCNetState *d = qemu_get_nic_opaque(nc);
 
     pcnet_common_cleanup(d);
 }
diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c
index d7532b1..7fc89b5 100644
--- a/hw/mcf_fec.c
+++ b/hw/mcf_fec.c
@@ -353,13 +353,13 @@ static void mcf_fec_write(void *opaque, hwaddr addr,
 
 static int mcf_fec_can_receive(NetClientState *nc)
 {
-    mcf_fec_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    mcf_fec_state *s = qemu_get_nic_opaque(nc);
     return s->rx_enabled;
 }
 
 static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    mcf_fec_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    mcf_fec_state *s = qemu_get_nic_opaque(nc);
     mcf_fec_bd bd;
     uint32_t flags = 0;
     uint32_t addr;
@@ -441,7 +441,7 @@ static const MemoryRegionOps mcf_fec_ops = {
 
 static void mcf_fec_cleanup(NetClientState *nc)
 {
-    mcf_fec_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    mcf_fec_state *s = qemu_get_nic_opaque(nc);
 
     memory_region_del_subregion(s->sysmem, &s->iomem);
     memory_region_destroy(&s->iomem);
diff --git a/hw/milkymist-minimac2.c b/hw/milkymist-minimac2.c
index 8a5cc40..9723628 100644
--- a/hw/milkymist-minimac2.c
+++ b/hw/milkymist-minimac2.c
@@ -280,7 +280,7 @@ static void update_rx_interrupt(MilkymistMinimac2State *s)
 
 static ssize_t minimac2_rx(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    MilkymistMinimac2State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    MilkymistMinimac2State *s = qemu_get_nic_opaque(nc);
 
     uint32_t r_count;
     uint32_t r_state;
@@ -410,7 +410,7 @@ static const MemoryRegionOps minimac2_ops = {
 
 static int minimac2_can_rx(NetClientState *nc)
 {
-    MilkymistMinimac2State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    MilkymistMinimac2State *s = qemu_get_nic_opaque(nc);
 
     if (s->regs[R_STATE0] == STATE_LOADED) {
         return 1;
@@ -424,7 +424,7 @@ static int minimac2_can_rx(NetClientState *nc)
 
 static void minimac2_cleanup(NetClientState *nc)
 {
-    MilkymistMinimac2State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    MilkymistMinimac2State *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/mipsnet.c b/hw/mipsnet.c
index 5d1ab5a..c411bd5 100644
--- a/hw/mipsnet.c
+++ b/hw/mipsnet.c
@@ -64,7 +64,7 @@ static int mipsnet_buffer_full(MIPSnetState *s)
 
 static int mipsnet_can_receive(NetClientState *nc)
 {
-    MIPSnetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    MIPSnetState *s = qemu_get_nic_opaque(nc);
 
     if (s->busy)
         return 0;
@@ -73,7 +73,7 @@ static int mipsnet_can_receive(NetClientState *nc)
 
 static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    MIPSnetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    MIPSnetState *s = qemu_get_nic_opaque(nc);
 
     trace_mipsnet_receive(size);
     if (!mipsnet_can_receive(nc))
@@ -211,7 +211,7 @@ static const VMStateDescription vmstate_mipsnet = {
 
 static void mipsnet_cleanup(NetClientState *nc)
 {
-    MIPSnetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    MIPSnetState *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/musicpal.c b/hw/musicpal.c
index 2b62264..29ece8b 100644
--- a/hw/musicpal.c
+++ b/hw/musicpal.c
@@ -189,7 +189,7 @@ static int eth_can_receive(NetClientState *nc)
 
 static ssize_t eth_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    mv88w8618_eth_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    mv88w8618_eth_state *s = qemu_get_nic_opaque(nc);
     uint32_t desc_addr;
     mv88w8618_rx_desc desc;
     int i;
@@ -368,7 +368,7 @@ static const MemoryRegionOps mv88w8618_eth_ops = {
 
 static void eth_cleanup(NetClientState *nc)
 {
-    mv88w8618_eth_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    mv88w8618_eth_state *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/ne2000-isa.c b/hw/ne2000-isa.c
index 4a6105e..1f168ec 100644
--- a/hw/ne2000-isa.c
+++ b/hw/ne2000-isa.c
@@ -38,7 +38,7 @@ typedef struct ISANE2000State {
 
 static void isa_ne2000_cleanup(NetClientState *nc)
 {
-    NE2000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    NE2000State *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/ne2000.c b/hw/ne2000.c
index 21d9ace..64b73fe 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -167,7 +167,7 @@ static int ne2000_buffer_full(NE2000State *s)
 
 int ne2000_can_receive(NetClientState *nc)
 {
-    NE2000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    NE2000State *s = qemu_get_nic_opaque(nc);
 
     if (s->cmd & E8390_STOP)
         return 1;
@@ -178,7 +178,7 @@ int ne2000_can_receive(NetClientState *nc)
 
 ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
 {
-    NE2000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    NE2000State *s = qemu_get_nic_opaque(nc);
     int size = size_;
     uint8_t *p;
     unsigned int total_len, next, avail, len, index, mcast_idx;
@@ -705,7 +705,7 @@ void ne2000_setup_io(NE2000State *s, unsigned size)
 
 static void ne2000_cleanup(NetClientState *nc)
 {
-    NE2000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    NE2000State *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/opencores_eth.c b/hw/opencores_eth.c
index 821e54f..e91d3f6 100644
--- a/hw/opencores_eth.c
+++ b/hw/opencores_eth.c
@@ -313,7 +313,7 @@ static void open_eth_int_source_write(OpenEthState *s,
 
 static void open_eth_set_link_status(NetClientState *nc)
 {
-    OpenEthState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    OpenEthState *s = qemu_get_nic_opaque(nc);
 
     if (GET_REGBIT(s, MIICOMMAND, SCANSTAT)) {
         SET_REGFIELD(s, MIISTATUS, LINKFAIL, nc->link_down);
@@ -344,7 +344,7 @@ static void open_eth_reset(void *opaque)
 
 static int open_eth_can_receive(NetClientState *nc)
 {
-    OpenEthState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    OpenEthState *s = qemu_get_nic_opaque(nc);
 
     return GET_REGBIT(s, MODER, RXEN) &&
         (s->regs[TX_BD_NUM] < 0x80) &&
@@ -354,7 +354,7 @@ static int open_eth_can_receive(NetClientState *nc)
 static ssize_t open_eth_receive(NetClientState *nc,
         const uint8_t *buf, size_t size)
 {
-    OpenEthState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    OpenEthState *s = qemu_get_nic_opaque(nc);
     size_t maxfl = GET_REGFIELD(s, PACKETLEN, MAXFL);
     size_t minfl = GET_REGFIELD(s, PACKETLEN, MINFL);
     size_t fcsl = 4;
diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index d9fb591..f4a03eb 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -266,7 +266,7 @@ static void pci_physical_memory_read(void *dma_opaque, hwaddr addr,
 
 static void pci_pcnet_cleanup(NetClientState *nc)
 {
-    PCNetState *d = DO_UPCAST(NICState, nc, nc)->opaque;
+    PCNetState *d = qemu_get_nic_opaque(nc);
 
     pcnet_common_cleanup(d);
 }
diff --git a/hw/pcnet.c b/hw/pcnet.c
index b3dc309..4ad4bcb 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1006,7 +1006,7 @@ static int pcnet_tdte_poll(PCNetState *s)
 
 int pcnet_can_receive(NetClientState *nc)
 {
-    PCNetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    PCNetState *s = qemu_get_nic_opaque(nc);
     if (CSR_STOP(s) || CSR_SPND(s))
         return 0;
 
@@ -1017,7 +1017,7 @@ int pcnet_can_receive(NetClientState *nc)
 
 ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
 {
-    PCNetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    PCNetState *s = qemu_get_nic_opaque(nc);
     int is_padr = 0, is_bcast = 0, is_ladr = 0;
     uint8_t buf1[60];
     int remaining;
@@ -1199,7 +1199,7 @@ ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
 
 void pcnet_set_link_status(NetClientState *nc)
 {
-    PCNetState *d = DO_UPCAST(NICState, nc, nc)->opaque;
+    PCNetState *d = qemu_get_nic_opaque(nc);
 
     d->lnkst = nc->link_down ? 0 : 0x40;
 }
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index cb975b2..b6cb90e 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -786,7 +786,7 @@ static bool rtl8139_cp_rx_valid(RTL8139State *s)
 
 static int rtl8139_can_receive(NetClientState *nc)
 {
-    RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    RTL8139State *s = qemu_get_nic_opaque(nc);
     int avail;
 
     /* Receive (drop) packets if card is disabled.  */
@@ -808,7 +808,7 @@ static int rtl8139_can_receive(NetClientState *nc)
 
 static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt)
 {
-    RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    RTL8139State *s = qemu_get_nic_opaque(nc);
     /* size is the length of the buffer passed to the driver */
     int size = size_;
     const uint8_t *dot1q_buf = NULL;
@@ -3428,7 +3428,7 @@ static void rtl8139_timer(void *opaque)
 
 static void rtl8139_cleanup(NetClientState *nc)
 {
-    RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    RTL8139State *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
@@ -3450,7 +3450,7 @@ static void pci_rtl8139_uninit(PCIDevice *dev)
 
 static void rtl8139_set_link_status(NetClientState *nc)
 {
-    RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    RTL8139State *s = qemu_get_nic_opaque(nc);
 
     if (nc->link_down) {
         s->BasicModeStatus &= ~0x04;
diff --git a/hw/smc91c111.c b/hw/smc91c111.c
index b466d66..b00d338 100644
--- a/hw/smc91c111.c
+++ b/hw/smc91c111.c
@@ -630,7 +630,7 @@ static uint32_t smc91c111_readl(void *opaque, hwaddr offset)
 
 static int smc91c111_can_receive(NetClientState *nc)
 {
-    smc91c111_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    smc91c111_state *s = qemu_get_nic_opaque(nc);
 
     if ((s->rcr & RCR_RXEN) == 0 || (s->rcr & RCR_SOFT_RST))
         return 1;
@@ -641,7 +641,7 @@ static int smc91c111_can_receive(NetClientState *nc)
 
 static ssize_t smc91c111_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    smc91c111_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    smc91c111_state *s = qemu_get_nic_opaque(nc);
     int status;
     int packetsize;
     uint32_t crc;
@@ -730,7 +730,7 @@ static const MemoryRegionOps smc91c111_mem_ops = {
 
 static void smc91c111_cleanup(NetClientState *nc)
 {
-    smc91c111_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    smc91c111_state *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/spapr_llan.c b/hw/spapr_llan.c
index 5232852..899831b 100644
--- a/hw/spapr_llan.c
+++ b/hw/spapr_llan.c
@@ -85,7 +85,7 @@ typedef struct VIOsPAPRVLANDevice {
 
 static int spapr_vlan_can_receive(NetClientState *nc)
 {
-    VIOsPAPRVLANDevice *dev = DO_UPCAST(NICState, nc, nc)->opaque;
+    VIOsPAPRVLANDevice *dev = qemu_get_nic_opaque(nc);
 
     return (dev->isopen && dev->rx_bufs > 0);
 }
@@ -93,7 +93,7 @@ static int spapr_vlan_can_receive(NetClientState *nc)
 static ssize_t spapr_vlan_receive(NetClientState *nc, const uint8_t *buf,
                                   size_t size)
 {
-    VIOsPAPRDevice *sdev = DO_UPCAST(NICState, nc, nc)->opaque;
+    VIOsPAPRDevice *sdev = qemu_get_nic_opaque(nc);
     VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
     vlan_bd_t rxq_bd = vio_ldq(sdev, dev->buf_list + VLAN_RXQ_BD_OFF);
     vlan_bd_t bd;
diff --git a/hw/stellaris_enet.c b/hw/stellaris_enet.c
index 65d73a0..1332c11 100644
--- a/hw/stellaris_enet.c
+++ b/hw/stellaris_enet.c
@@ -80,7 +80,7 @@ static void stellaris_enet_update(stellaris_enet_state *s)
 /* TODO: Implement MAC address filtering.  */
 static ssize_t stellaris_enet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    stellaris_enet_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    stellaris_enet_state *s = qemu_get_nic_opaque(nc);
     int n;
     uint8_t *p;
     uint32_t crc;
@@ -122,7 +122,7 @@ static ssize_t stellaris_enet_receive(NetClientState *nc, const uint8_t *buf, si
 
 static int stellaris_enet_can_receive(NetClientState *nc)
 {
-    stellaris_enet_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    stellaris_enet_state *s = qemu_get_nic_opaque(nc);
 
     if ((s->rctl & SE_RCTL_RXEN) == 0)
         return 1;
@@ -383,7 +383,7 @@ static int stellaris_enet_load(QEMUFile *f, void *opaque, int version_id)
 
 static void stellaris_enet_cleanup(NetClientState *nc)
 {
-    stellaris_enet_state *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    stellaris_enet_state *s = qemu_get_nic_opaque(nc);
 
     unregister_savevm(&s->busdev.qdev, "stellaris_enet", s);
 
diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index ab7e7a7..3de6218 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1260,7 +1260,7 @@ static void usb_net_handle_data(USBDevice *dev, USBPacket *p)
 
 static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    USBNetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    USBNetState *s = qemu_get_nic_opaque(nc);
     uint8_t *in_buf = s->in_buf;
     size_t total_size = size;
 
@@ -1307,7 +1307,7 @@ static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf, size_t siz
 
 static int usbnet_can_receive(NetClientState *nc)
 {
-    USBNetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    USBNetState *s = qemu_get_nic_opaque(nc);
 
     if (is_rndis(s) && s->rndis_state != RNDIS_DATA_INITIALIZED) {
         return 1;
@@ -1318,7 +1318,7 @@ static int usbnet_can_receive(NetClientState *nc)
 
 static void usbnet_cleanup(NetClientState *nc)
 {
-    USBNetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    USBNetState *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 1c59db1..bf6414b 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -167,7 +167,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 
 static void virtio_net_set_link_status(NetClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = qemu_get_nic_opaque(nc);
     uint16_t old_status = n->status;
 
     if (nc->link_down)
@@ -468,7 +468,7 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 
 static int virtio_net_can_receive(NetClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = qemu_get_nic_opaque(nc);
     if (!n->vdev.vm_running) {
         return 0;
     }
@@ -599,7 +599,7 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 
 static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = qemu_get_nic_opaque(nc);
     struct iovec mhdr_sg[VIRTQUEUE_MAX_SIZE];
     struct virtio_net_hdr_mrg_rxbuf mhdr;
     unsigned mhdr_cnt = 0;
@@ -697,7 +697,7 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq);
 
 static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = qemu_get_nic_opaque(nc);
 
     virtqueue_push(n->tx_vq, &n->async_tx.elem, 0);
     virtio_notify(&n->vdev, n->tx_vq);
@@ -996,7 +996,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 
 static void virtio_net_cleanup(NetClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = qemu_get_nic_opaque(nc);
 
     n->nic = NULL;
 }
diff --git a/hw/xen_nic.c b/hw/xen_nic.c
index cfd6def..055adf4 100644
--- a/hw/xen_nic.c
+++ b/hw/xen_nic.c
@@ -235,7 +235,7 @@ static void net_rx_response(struct XenNetDev *netdev,
 
 static int net_rx_ok(NetClientState *nc)
 {
-    struct XenNetDev *netdev = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
     RING_IDX rc, rp;
 
     if (netdev->xendev.be_state != XenbusStateConnected) {
@@ -256,7 +256,7 @@ static int net_rx_ok(NetClientState *nc)
 
 static ssize_t net_rx_packet(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    struct XenNetDev *netdev = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct XenNetDev *netdev = qemu_get_nic_opaque(nc);
     netif_rx_request_t rxreq;
     RING_IDX rc, rp;
     void *page;
diff --git a/hw/xgmac.c b/hw/xgmac.c
index 0ec3c85..bbb0433 100644
--- a/hw/xgmac.c
+++ b/hw/xgmac.c
@@ -310,7 +310,7 @@ static const MemoryRegionOps enet_mem_ops = {
 
 static int eth_can_rx(NetClientState *nc)
 {
-    struct XgmacState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct XgmacState *s = qemu_get_nic_opaque(nc);
 
     /* RX enabled?  */
     return s->regs[DMA_CONTROL] & DMA_CONTROL_SR;
@@ -318,7 +318,7 @@ static int eth_can_rx(NetClientState *nc)
 
 static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    struct XgmacState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct XgmacState *s = qemu_get_nic_opaque(nc);
     static const unsigned char sa_bcast[6] = {0xff, 0xff, 0xff,
                                               0xff, 0xff, 0xff};
     int unicast, broadcast, multicast;
@@ -366,7 +366,7 @@ out:
 
 static void eth_cleanup(NetClientState *nc)
 {
-    struct XgmacState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct XgmacState *s = qemu_get_nic_opaque(nc);
     s->nic = NULL;
 }
 
diff --git a/hw/xilinx_axienet.c b/hw/xilinx_axienet.c
index 254fe72..370f16e 100644
--- a/hw/xilinx_axienet.c
+++ b/hw/xilinx_axienet.c
@@ -618,7 +618,7 @@ static const MemoryRegionOps enet_ops = {
 
 static int eth_can_rx(NetClientState *nc)
 {
-    struct XilinxAXIEnet *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct XilinxAXIEnet *s = qemu_get_nic_opaque(nc);
 
     /* RX enabled?  */
     return !axienet_rx_resetting(s) && axienet_rx_enabled(s);
@@ -641,7 +641,7 @@ static int enet_match_addr(const uint8_t *buf, uint32_t f0, uint32_t f1)
 
 static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    struct XilinxAXIEnet *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct XilinxAXIEnet *s = qemu_get_nic_opaque(nc);
     static const unsigned char sa_bcast[6] = {0xff, 0xff, 0xff,
                                               0xff, 0xff, 0xff};
     static const unsigned char sa_ipmcast[3] = {0x01, 0x00, 0x52};
@@ -786,7 +786,7 @@ static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size)
 static void eth_cleanup(NetClientState *nc)
 {
     /* FIXME.  */
-    struct XilinxAXIEnet *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct XilinxAXIEnet *s = qemu_get_nic_opaque(nc);
     g_free(s->rxmem);
     g_free(s);
 }
diff --git a/hw/xilinx_ethlite.c b/hw/xilinx_ethlite.c
index aa19c97..b2ace04 100644
--- a/hw/xilinx_ethlite.c
+++ b/hw/xilinx_ethlite.c
@@ -162,7 +162,7 @@ static const MemoryRegionOps eth_ops = {
 
 static int eth_can_rx(NetClientState *nc)
 {
-    struct xlx_ethlite *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct xlx_ethlite *s = qemu_get_nic_opaque(nc);
     int r;
     r = !(s->regs[R_RX_CTRL0] & CTRL_S);
     return r;
@@ -170,7 +170,7 @@ static int eth_can_rx(NetClientState *nc)
 
 static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-    struct xlx_ethlite *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct xlx_ethlite *s = qemu_get_nic_opaque(nc);
     unsigned int rxbase = s->rxbuf * (0x800 / 4);
 
     /* DA filter.  */
@@ -196,7 +196,7 @@ static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size)
 
 static void eth_cleanup(NetClientState *nc)
 {
-    struct xlx_ethlite *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    struct xlx_ethlite *s = qemu_get_nic_opaque(nc);
 
     s->nic = NULL;
 }
diff --git a/net.c b/net.c
index 23fcbce..ef5b8c9 100644
--- a/net.c
+++ b/net.c
@@ -226,7 +226,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
 
     nc = qemu_new_net_client(info, conf->peer, model, name);
 
-    nic = DO_UPCAST(NICState, nc, nc);
+    nic = qemu_get_nic(nc);
     nic->conf = conf;
     nic->opaque = opaque;
 
@@ -238,6 +238,18 @@ NetClientState *qemu_get_queue(NICState *nic)
     return &nic->nc;
 }
 
+NICState *qemu_get_nic(NetClientState *nc)
+{
+    return DO_UPCAST(NICState, nc, nc);
+}
+
+void *qemu_get_nic_opaque(NetClientState *nc)
+{
+    NICState *nic = qemu_get_nic(nc);
+
+    return nic->opaque;
+}
+
 static void qemu_cleanup_net_client(NetClientState *nc)
 {
     QTAILQ_REMOVE(&net_clients, nc, next);
@@ -264,7 +276,7 @@ void qemu_del_net_client(NetClientState *nc)
 {
     /* If there is a peer NIC, delete and cleanup client, but do not free. */
     if (nc->peer && nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
-        NICState *nic = DO_UPCAST(NICState, nc, nc->peer);
+        NICState *nic = qemu_get_nic(nc->peer);
         if (nic->peer_deleted) {
             return;
         }
@@ -280,7 +292,7 @@ void qemu_del_net_client(NetClientState *nc)
 
     /* If this is a peer NIC and peer has already been deleted, free it now. */
     if (nc->peer && nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
-        NICState *nic = DO_UPCAST(NICState, nc, nc);
+        NICState *nic = qemu_get_nic(nc);
         if (nic->peer_deleted) {
             qemu_free_net_client(nc->peer);
         }
@@ -296,7 +308,7 @@ void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
 
     QTAILQ_FOREACH(nc, &net_clients, next) {
         if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
-            func(DO_UPCAST(NICState, nc, nc), opaque);
+            func(qemu_get_nic(nc), opaque);
         }
     }
 }
diff --git a/net.h b/net.h
index fe44cb8..56b79fb 100644
--- a/net.h
+++ b/net.h
@@ -78,6 +78,8 @@ NICState *qemu_new_nic(NetClientInfo *info,
                        const char *name,
                        void *opaque);
 NetClientState *qemu_get_queue(NICState *nic);
+NICState *qemu_get_nic(NetClientState *nc);
+void *qemu_get_nic_opaque(NetClientState *nc);
 void qemu_del_net_client(NetClientState *nc);
 NetClientState *qemu_find_vlan_client_by_name(Monitor *mon, int vlan_id,
                                               const char *client_str);
-- 
1.7.1



* [PATCH 04/12] net: introduce qemu_del_nic()
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (2 preceding siblings ...)
  2012-12-28 10:31 ` [PATCH 03/12] net: introduce qemu_get_nic() Jason Wang
@ 2012-12-28 10:31 ` Jason Wang
  2012-12-28 10:31 ` [PATCH 05/12] net: multiqueue support Jason Wang
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:31 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: krkumar2, kvm, mprivozn, Jason Wang, rusty, jwhan, shiyer

To support multiqueue NICs, this patch separates the NIC destructor from
qemu_del_net_client() into a new helper, qemu_del_nic(). The following patches
will refactor this function to support multiqueue NICs.
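
For instance (a minimal sketch, with MyDeviceState as a hypothetical device
type), a device uninit path changes from deleting the queue's NetClientState
to deleting the NIC as a whole:

    static void my_device_uninit(MyDeviceState *d)
    {
        /* Previously: qemu_del_net_client(qemu_get_queue(d->nic));
         * qemu_del_nic() now owns the NIC teardown, including the
         * deferred free when the peer was deleted first. */
        qemu_del_nic(d->nic);
        d->nic = NULL;
    }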

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/e1000.c           |    2 +-
 hw/eepro100.c        |    2 +-
 hw/ne2000.c          |    2 +-
 hw/pcnet-pci.c       |    2 +-
 hw/rtl8139.c         |    2 +-
 hw/usb/dev-network.c |    2 +-
 hw/virtio-net.c      |    2 +-
 hw/xen_nic.c         |    2 +-
 net.c                |   15 ++++++++++++++-
 net.h                |    1 +
 10 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 004f057..eb181d8 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1234,7 +1234,7 @@ pci_e1000_uninit(PCIDevice *dev)
     qemu_free_timer(d->autoneg_timer);
     memory_region_destroy(&d->mmio);
     memory_region_destroy(&d->io);
-    qemu_del_net_client(qemu_get_queue(d->nic));
+    qemu_del_nic(d->nic);
 }
 
 static NetClientInfo net_e1000_info = {
diff --git a/hw/eepro100.c b/hw/eepro100.c
index c09a315..d072131 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1849,7 +1849,7 @@ static void pci_nic_uninit(PCIDevice *pci_dev)
     memory_region_destroy(&s->flash_bar);
     vmstate_unregister(&pci_dev->qdev, s->vmstate, s);
     eeprom93xx_free(&pci_dev->qdev, s->eeprom);
-    qemu_del_net_client(qemu_get_queue(s->nic));
+    qemu_del_nic(s->nic);
 }
 
 static NetClientInfo net_eepro100_info = {
diff --git a/hw/ne2000.c b/hw/ne2000.c
index 64b73fe..e6810ad 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -750,7 +750,7 @@ static void pci_ne2000_exit(PCIDevice *pci_dev)
     NE2000State *s = &d->ne2000;
 
     memory_region_destroy(&s->io);
-    qemu_del_net_client(qemu_get_queue(s->nic));
+    qemu_del_nic(s->nic);
 }
 
 static Property ne2000_properties[] = {
diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index f4a03eb..c910fa3 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -279,7 +279,7 @@ static void pci_pcnet_uninit(PCIDevice *dev)
     memory_region_destroy(&d->io_bar);
     qemu_del_timer(d->state.poll_timer);
     qemu_free_timer(d->state.poll_timer);
-    qemu_del_net_client(qemu_get_queue(d->state.nic));
+    qemu_del_nic(d->state.nic);
 }
 
 static NetClientInfo net_pci_pcnet_info = {
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index b6cb90e..a8acd19 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -3445,7 +3445,7 @@ static void pci_rtl8139_uninit(PCIDevice *dev)
     }
     qemu_del_timer(s->timer);
     qemu_free_timer(s->timer);
-    qemu_del_net_client(qemu_get_queue(s->nic));
+    qemu_del_nic(s->nic);
 }
 
 static void rtl8139_set_link_status(NetClientState *nc)
diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index 3de6218..6de0521 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1329,7 +1329,7 @@ static void usb_net_handle_destroy(USBDevice *dev)
 
     /* TODO: remove the nd_table[] entry */
     rndis_clear_responsequeue(s);
-    qemu_del_net_client(qemu_get_queue(s->nic));
+    qemu_del_nic(s->nic);
 }
 
 static NetClientInfo net_usbnet_info = {
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index bf6414b..d57a5a5 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -1098,6 +1098,6 @@ void virtio_net_exit(VirtIODevice *vdev)
         qemu_bh_delete(n->tx_bh);
     }
 
-    qemu_del_net_client(qemu_get_queue(n->nic));
+    qemu_del_nic(n->nic);
     virtio_cleanup(&n->vdev);
 }
diff --git a/hw/xen_nic.c b/hw/xen_nic.c
index 055adf4..6275cd3 100644
--- a/hw/xen_nic.c
+++ b/hw/xen_nic.c
@@ -406,7 +406,7 @@ static void net_disconnect(struct XenDevice *xendev)
         netdev->rxs = NULL;
     }
     if (netdev->nic) {
-        qemu_del_net_client(qemu_get_queue(netdev->nic));
+        qemu_del_nic(netdev->nic);
         netdev->nic = NULL;
     }
 }
diff --git a/net.c b/net.c
index ef5b8c9..97ee542 100644
--- a/net.c
+++ b/net.c
@@ -290,6 +290,15 @@ void qemu_del_net_client(NetClientState *nc)
         return;
     }
 
+    assert(nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC);
+
+    qemu_cleanup_net_client(nc);
+    qemu_free_net_client(nc);
+}
+
+void qemu_del_nic(NICState *nic)
+{
+    NetClientState *nc = qemu_get_queue(nic);
     /* If this is a peer NIC and peer has already been deleted, free it now. */
     if (nc->peer && nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
         NICState *nic = qemu_get_nic(nc);
@@ -932,7 +941,11 @@ void net_cleanup(void)
     NetClientState *nc, *next_vc;
 
     QTAILQ_FOREACH_SAFE(nc, &net_clients, next, next_vc) {
-        qemu_del_net_client(nc);
+        if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
+            qemu_del_nic(qemu_get_nic(nc));
+        } else {
+            qemu_del_net_client(nc);
+        }
     }
 }
 
diff --git a/net.h b/net.h
index 56b79fb..0d53337 100644
--- a/net.h
+++ b/net.h
@@ -77,6 +77,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
                        const char *model,
                        const char *name,
                        void *opaque);
+void qemu_del_nic(NICState *nic);
 NetClientState *qemu_get_queue(NICState *nic);
 NICState *qemu_get_nic(NetClientState *nc);
 void *qemu_get_nic_opaque(NetClientState *nc);
-- 
1.7.1


* [PATCH 05/12] net: multiqueue support
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (3 preceding siblings ...)
  2012-12-28 10:31 ` [PATCH 04/12] net: introduce qemu_del_nic() Jason Wang
@ 2012-12-28 10:31 ` Jason Wang
  2012-12-28 18:06   ` Blue Swirl
  2012-12-28 10:31 ` [PATCH 06/12] vhost: " Jason Wang
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:31 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

This patch adds basic multiqueue support for qemu. The idea is simple: an array
of NetClientStates is introduced in NICState, and parse_netdev() is extended to
find and match all NetClientStates that belong to the backend, placing their
pointers in NICConf. qemu_new_nic() can then set up an N:N mapping between the
NetClientStates that belong to a NIC and those that belong to the netdev. After
this, each peer of a NICState is abstracted as a queue.

To adapt to this change, the set_link and netdev_del commands will find all the
NetClientStates of a NIC or a netdev, and change all their states in one run.
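
Assuming a NICConf whose peers were already filled in by parse_netdev() (a
sketch, not taken from the patch; net_my_info is a hypothetical NetClientInfo),
a device model can then reach each queue by index:

    NICState *nic = qemu_new_nic(&net_my_info, &conf, model, name, s);
    int i;

    /* Queue 0 keeps the old single-queue semantics... */
    qemu_format_nic_info_str(qemu_get_queue(nic), conf.macaddr.a);

    /* ...and the remaining queues are addressed explicitly. */
    for (i = 0; i < conf.queues; i++) {
        qemu_flush_queued_packets(qemu_get_subqueue(nic, i));
    }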

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/dp8393x.c         |    2 +-
 hw/mcf_fec.c         |    2 +-
 hw/qdev-properties.c |   46 +++++++++++--
 hw/qdev-properties.h |    6 +-
 net.c                |  172 +++++++++++++++++++++++++++++++++++++-------------
 net.h                |   27 +++++++-
 6 files changed, 195 insertions(+), 60 deletions(-)

diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index 8f20a4a..fad0837 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -899,7 +899,7 @@ void dp83932_init(NICInfo *nd, hwaddr base, int it_shift,
     s->regs[SONIC_SR] = 0x0004; /* only revision recognized by Linux */
 
     s->conf.macaddr = nd->macaddr;
-    s->conf.peer = nd->netdev;
+    s->conf.peers.ncs[0] = nd->netdev;
 
     s->nic = qemu_new_nic(&net_dp83932_info, &s->conf, nd->model, nd->name, s);
 
diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c
index 7fc89b5..c298bec 100644
--- a/hw/mcf_fec.c
+++ b/hw/mcf_fec.c
@@ -472,7 +472,7 @@ void mcf_fec_init(MemoryRegion *sysmem, NICInfo *nd,
     memory_region_add_subregion(sysmem, base, &s->iomem);
 
     s->conf.macaddr = nd->macaddr;
-    s->conf.peer = nd->netdev;
+    s->conf.peers.ncs[0] = nd->netdev;
 
     s->nic = qemu_new_nic(&net_mcf_fec_info, &s->conf, nd->model, nd->name, s);
 
diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 81d901c..6e45def 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -585,16 +585,47 @@ PropertyInfo qdev_prop_chr = {
 
 static int parse_netdev(DeviceState *dev, const char *str, void **ptr)
 {
-    NetClientState *netdev = qemu_find_netdev(str);
+    NICPeers *peers_ptr = (NICPeers *)ptr;
+    NICConf *conf = container_of(peers_ptr, NICConf, peers);
+    NetClientState **ncs = peers_ptr->ncs;
+    NetClientState *peers[MAX_QUEUE_NUM];
+    int queues, i = 0;
+    int ret;
 
-    if (netdev == NULL) {
-        return -ENOENT;
+    queues = qemu_find_net_clients_except(str, peers,
+                                          NET_CLIENT_OPTIONS_KIND_NIC,
+                                          MAX_QUEUE_NUM);
+    if (queues == 0) {
+        ret = -ENOENT;
+        goto err;
     }
-    if (netdev->peer) {
-        return -EEXIST;
+
+    if (queues > MAX_QUEUE_NUM) {
+        ret = -E2BIG;
+        goto err;
+    }
+
+    for (i = 0; i < queues; i++) {
+        if (peers[i] == NULL) {
+            ret = -ENOENT;
+            goto err;
+        }
+
+        if (peers[i]->peer) {
+            ret = -EEXIST;
+            goto err;
+        }
+
+        ncs[i] = peers[i];
+        ncs[i]->queue_index = i;
     }
-    *ptr = netdev;
+
+    conf->queues = queues;
+
     return 0;
+
+err:
+    return ret;
 }
 
 static const char *print_netdev(void *ptr)
@@ -661,7 +692,8 @@ static void set_vlan(Object *obj, Visitor *v, void *opaque,
 {
     DeviceState *dev = DEVICE(obj);
     Property *prop = opaque;
-    NetClientState **ptr = qdev_get_prop_ptr(dev, prop);
+    NICPeers *peers_ptr = qdev_get_prop_ptr(dev, prop);
+    NetClientState **ptr = &peers_ptr->ncs[0];
     Error *local_err = NULL;
     int32_t id;
     NetClientState *hubport;
diff --git a/hw/qdev-properties.h b/hw/qdev-properties.h
index 5b046ab..2d90848 100644
--- a/hw/qdev-properties.h
+++ b/hw/qdev-properties.h
@@ -31,7 +31,7 @@ extern PropertyInfo qdev_prop_pci_host_devaddr;
         .name      = (_name),                                    \
         .info      = &(_prop),                                   \
         .offset    = offsetof(_state, _field)                    \
-            + type_check(_type,typeof_field(_state, _field)),    \
+            + type_check(_type, typeof_field(_state, _field)),   \
         }
 #define DEFINE_PROP_DEFAULT(_name, _state, _field, _defval, _prop, _type) { \
         .name      = (_name),                                           \
@@ -77,9 +77,9 @@ extern PropertyInfo qdev_prop_pci_host_devaddr;
 #define DEFINE_PROP_STRING(_n, _s, _f)             \
     DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*)
 #define DEFINE_PROP_NETDEV(_n, _s, _f)             \
-    DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, NetClientState*)
+    DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, NICPeers)
 #define DEFINE_PROP_VLAN(_n, _s, _f)             \
-    DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, NetClientState*)
+    DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, NICPeers)
 #define DEFINE_PROP_DRIVE(_n, _s, _f) \
     DEFINE_PROP(_n, _s, _f, qdev_prop_drive, BlockDriverState *)
 #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
diff --git a/net.c b/net.c
index 97ee542..4ceba33 100644
--- a/net.c
+++ b/net.c
@@ -181,17 +181,12 @@ static char *assign_name(NetClientState *nc1, const char *model)
     return g_strdup(buf);
 }
 
-NetClientState *qemu_new_net_client(NetClientInfo *info,
-                                    NetClientState *peer,
-                                    const char *model,
-                                    const char *name)
+void qemu_net_client_setup(NetClientState *nc,
+                           NetClientInfo *info,
+                           NetClientState *peer,
+                           const char *model,
+                           const char *name)
 {
-    NetClientState *nc;
-
-    assert(info->size >= sizeof(NetClientState));
-
-    nc = g_malloc0(info->size);
-
     nc->info = info;
     nc->model = g_strdup(model);
     if (name) {
@@ -208,6 +203,20 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
     QTAILQ_INSERT_TAIL(&net_clients, nc, next);
 
     nc->send_queue = qemu_new_net_queue(nc);
+}
+
+
+NetClientState *qemu_new_net_client(NetClientInfo *info,
+                                    NetClientState *peer,
+                                    const char *model,
+                                    const char *name)
+{
+    NetClientState *nc;
+
+    assert(info->size >= sizeof(NetClientState));
+
+    nc = g_malloc0(info->size);
+    qemu_net_client_setup(nc, info, peer, model, name);
 
     return nc;
 }
@@ -219,28 +228,43 @@ NICState *qemu_new_nic(NetClientInfo *info,
                        void *opaque)
 {
     NetClientState *nc;
+    NetClientState **peers = conf->peers.ncs;
     NICState *nic;
+    int i;
 
     assert(info->type == NET_CLIENT_OPTIONS_KIND_NIC);
     assert(info->size >= sizeof(NICState));
 
-    nc = qemu_new_net_client(info, conf->peer, model, name);
+    nc = qemu_new_net_client(info, peers[0], model, name);
+    nc->queue_index = 0;
 
     nic = qemu_get_nic(nc);
     nic->conf = conf;
     nic->opaque = opaque;
 
+    for (i = 1; i < conf->queues; i++) {
+        qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, nc->name);
+        nic->ncs[i].queue_index = i;
+    }
+
     return nic;
 }
 
+NetClientState *qemu_get_subqueue(NICState *nic, int queue_index)
+{
+    return &nic->ncs[queue_index];
+}
+
 NetClientState *qemu_get_queue(NICState *nic)
 {
-    return &nic->nc;
+    return qemu_get_subqueue(nic, 0);
 }
 
 NICState *qemu_get_nic(NetClientState *nc)
 {
-    return DO_UPCAST(NICState, nc, nc);
+    NetClientState *nc0 = nc - nc->queue_index;
+
+    return DO_UPCAST(NICState, ncs[0], nc0);
 }
 
 void *qemu_get_nic_opaque(NetClientState *nc)
@@ -254,12 +278,10 @@ static void qemu_cleanup_net_client(NetClientState *nc)
 {
     QTAILQ_REMOVE(&net_clients, nc, next);
 
-    if (nc->info->cleanup) {
-        nc->info->cleanup(nc);
-    }
+    nc->info->cleanup(nc);
 }
 
-static void qemu_free_net_client(NetClientState *nc)
+static void qemu_free_net_client(NetClientState *nc, bool free)
 {
     if (nc->send_queue) {
         qemu_del_net_queue(nc->send_queue);
@@ -269,11 +291,24 @@ static void qemu_free_net_client(NetClientState *nc)
     }
     g_free(nc->name);
     g_free(nc->model);
-    g_free(nc);
+
+    if (free) {
+        g_free(nc);
+    }
 }
 
 void qemu_del_net_client(NetClientState *nc)
 {
+    NetClientState *ncs[MAX_QUEUE_NUM];
+    int queues, i;
+
+    /* If the NetClientState belongs to a multiqueue backend, we need to
+     * update all the other NetClientStates as well.
+     */
+    queues = qemu_find_net_clients_except(nc->name, ncs,
+                                          NET_CLIENT_OPTIONS_KIND_NIC,
+                                          MAX_QUEUE_NUM);
+    assert(queues != 0);
+
     /* If there is a peer NIC, delete and cleanup client, but do not free. */
     if (nc->peer && nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
         NICState *nic = qemu_get_nic(nc->peer);
@@ -281,34 +316,50 @@ void qemu_del_net_client(NetClientState *nc)
             return;
         }
         nic->peer_deleted = true;
-        /* Let NIC know peer is gone. */
-        nc->peer->link_down = true;
+
+        for (i = 0; i < queues; i++) {
+            ncs[i]->peer->link_down = true;
+        }
+
         if (nc->peer->info->link_status_changed) {
             nc->peer->info->link_status_changed(nc->peer);
         }
-        qemu_cleanup_net_client(nc);
+
+        for (i = 0; i < queues; i++) {
+            qemu_cleanup_net_client(ncs[i]);
+        }
+
         return;
     }
 
     assert(nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC);
 
-    qemu_cleanup_net_client(nc);
-    qemu_free_net_client(nc);
+    for (i = 0; i < queues; i++) {
+        qemu_cleanup_net_client(ncs[i]);
+        qemu_free_net_client(ncs[i], true);
+    }
 }
 
 void qemu_del_nic(NICState *nic)
 {
-    NetClientState *nc = qemu_get_queue(nic);
+    int i, queues = nic->conf->queues;
+
     /* If this is a peer NIC and peer has already been deleted, free it now. */
-    if (nc->peer && nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
-        NICState *nic = qemu_get_nic(nc);
-        if (nic->peer_deleted) {
-            qemu_free_net_client(nc->peer);
+    if (nic->peer_deleted) {
+        for (i = 0; i < queues; i++) {
+            qemu_free_net_client(qemu_get_subqueue(nic, i)->peer, true);
         }
     }
 
-    qemu_cleanup_net_client(nc);
-    qemu_free_net_client(nc);
+    for (i = 1; i < queues; i++) {
+        NetClientState *nc = qemu_get_subqueue(nic, i);
+
+        qemu_cleanup_net_client(nc);
+        qemu_free_net_client(nc, false);
+    }
+
+    qemu_cleanup_net_client(qemu_get_subqueue(nic, 0));
+    qemu_free_net_client(qemu_get_subqueue(nic, 0), true);
 }
 
 void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
@@ -317,7 +368,9 @@ void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
 
     QTAILQ_FOREACH(nc, &net_clients, next) {
         if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
-            func(qemu_get_nic(nc), opaque);
+            if (nc->queue_index == 0) {
+                func(qemu_get_nic(nc), opaque);
+            }
         }
     }
 }
@@ -507,6 +560,27 @@ NetClientState *qemu_find_netdev(const char *id)
     return NULL;
 }
 
+int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
+                                 NetClientOptionsKind type, int max)
+{
+    NetClientState *nc;
+    int ret = 0;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        if (nc->info->type == type) {
+            continue;
+        }
+        if (!strcmp(nc->name, id)) {
+            if (ret < max) {
+                ncs[ret] = nc;
+            }
+            ret++;
+        }
+    }
+
+    return ret;
+}
+
 static int nic_get_free_idx(void)
 {
     int index;
@@ -873,8 +947,11 @@ void qmp_netdev_del(const char *id, Error **errp)
 
 void print_net_client(Monitor *mon, NetClientState *nc)
 {
-    monitor_printf(mon, "%s: type=%s,%s\n", nc->name,
-                   NetClientOptionsKind_lookup[nc->info->type], nc->info_str);
+    monitor_printf(mon, "%s: index=%d,type=%s,%s,link=%s\n", nc->name,
+                       nc->queue_index,
+                       NetClientOptionsKind_lookup[nc->info->type],
+                       nc->info_str,
+                       nc->link_down ? "down" : "up");
 }
 
 void do_info_network(Monitor *mon)
@@ -905,20 +982,23 @@ void do_info_network(Monitor *mon)
 
 void qmp_set_link(const char *name, bool up, Error **errp)
 {
-    NetClientState *nc = NULL;
+    NetClientState *ncs[MAX_QUEUE_NUM];
+    NetClientState *nc;
+    int queues, i;
 
-    QTAILQ_FOREACH(nc, &net_clients, next) {
-        if (!strcmp(nc->name, name)) {
-            goto done;
-        }
-    }
-done:
-    if (!nc) {
+    queues = qemu_find_net_clients_except(name, ncs,
+                                          NET_CLIENT_OPTIONS_KIND_MAX,
+                                          MAX_QUEUE_NUM);
+
+    if (queues == 0) {
         error_set(errp, QERR_DEVICE_NOT_FOUND, name);
         return;
     }
+    nc = ncs[0];
 
-    nc->link_down = !up;
+    for (i = 0; i < queues; i++) {
+        ncs[i]->link_down = !up;
+    }
 
     if (nc->info->link_status_changed) {
         nc->info->link_status_changed(nc);
@@ -938,9 +1018,13 @@ done:
 
 void net_cleanup(void)
 {
-    NetClientState *nc, *next_vc;
+    NetClientState *nc;
 
-    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, next_vc) {
+    /* qemu_del_net_client() may delete multiple entries in one call,
+     * so even QTAILQ_FOREACH_SAFE() is not safe here.
+     */
+    while (!QTAILQ_EMPTY(&net_clients)) {
+        nc = QTAILQ_FIRST(&net_clients);
         if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
             qemu_del_nic(qemu_get_nic(nc));
         } else {
diff --git a/net.h b/net.h
index 0d53337..6ff1afc 100644
--- a/net.h
+++ b/net.h
@@ -9,24 +9,32 @@
 #include "vmstate.h"
 #include "qapi-types.h"
 
+#define MAX_QUEUE_NUM 1024
+
 struct MACAddr {
     uint8_t a[6];
 };
 
 /* qdev nic properties */
 
+typedef struct NICPeers {
+    NetClientState *ncs[MAX_QUEUE_NUM];
+} NICPeers;
+
 typedef struct NICConf {
     MACAddr macaddr;
-    NetClientState *peer;
+    NICPeers peers;
     int32_t bootindex;
+    int32_t queues;
 } NICConf;
 
 #define DEFINE_NIC_PROPERTIES(_state, _conf)                            \
     DEFINE_PROP_MACADDR("mac",   _state, _conf.macaddr),                \
-    DEFINE_PROP_VLAN("vlan",     _state, _conf.peer),                   \
-    DEFINE_PROP_NETDEV("netdev", _state, _conf.peer),                   \
+    DEFINE_PROP_VLAN("vlan",     _state, _conf.peers),                   \
+    DEFINE_PROP_NETDEV("netdev", _state, _conf.peers),                   \
     DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1)
 
 /* Net clients */
 
 typedef void (NetPoll)(NetClientState *, bool enable);
@@ -58,16 +66,26 @@ struct NetClientState {
     char *name;
     char info_str[256];
     unsigned receive_disabled : 1;
+    unsigned int queue_index;
 };
 
 typedef struct NICState {
-    NetClientState nc;
+    NetClientState ncs[MAX_QUEUE_NUM];
     NICConf *conf;
     void *opaque;
     bool peer_deleted;
 } NICState;
 
 NetClientState *qemu_find_netdev(const char *id);
+int qemu_find_net_clients_except(const char *id,
+                                 NetClientState **ncs,
+                                 NetClientOptionsKind type,
+                                 int max);
+void qemu_net_client_setup(NetClientState *nc,
+                           NetClientInfo *info,
+                           NetClientState *peer,
+                           const char *model,
+                           const char *name);
 NetClientState *qemu_new_net_client(NetClientInfo *info,
                                     NetClientState *peer,
                                     const char *model,
@@ -78,6 +96,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
                        const char *name,
                        void *opaque);
 void qemu_del_nic(NICState *nic);
+NetClientState *qemu_get_subqueue(NICState *nic, int queue_index);
 NetClientState *qemu_get_queue(NICState *nic);
 NICState *qemu_get_nic(NetClientState *nc);
 void *qemu_get_nic_opaque(NetClientState *nc);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 06/12] vhost: multiqueue support
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (4 preceding siblings ...)
  2012-12-28 10:31 ` [PATCH 05/12] net: multiqueue support Jason Wang
@ 2012-12-28 10:31 ` Jason Wang
  2012-12-28 10:31 ` [PATCH 07/12] virtio: introduce virtio_queue_del() Jason Wang
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:31 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

This patch lets vhost support multiqueue. The idea is simple: launch multiple
vhost threads and let each thread process a subset of the device's virtqueues.

The only thing needed is to pass a virtqueue index when starting a vhost
device; it records the first virtqueue that this vhost thread serves.
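
As a rough illustration (not part of the patch), a caller with several queue
pairs could start one vhost_net instance per pair, passing i * 2 as the index
of the first virtqueue that instance serves. The helper and the nets[] array
below are hypothetical; only vhost_net_start()/vhost_net_stop() come from
this series:

static int start_all_vhost(VHostNetState **nets, VirtIODevice *vdev,
                           int queue_pairs)
{
    int i, r;

    for (i = 0; i < queue_pairs; i++) {
        /* queue pair i owns virtqueues 2*i (rx) and 2*i + 1 (tx) */
        r = vhost_net_start(nets[i], vdev, i * 2);
        if (r < 0) {
            /* unwind the instances already started */
            while (--i >= 0) {
                vhost_net_stop(nets[i], vdev);
            }
            return r;
        }
    }

    return 0;
}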

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/vhost.c      |   52 +++++++++++++++++++++++++++++++++-------------------
 hw/vhost.h      |    2 ++
 hw/vhost_net.c  |    7 +++++--
 hw/vhost_net.h  |    2 +-
 hw/virtio-net.c |    3 ++-
 5 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 16322a1..63c76d6 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -619,11 +619,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 {
     hwaddr s, l, a;
     int r;
+    int vhost_vq_index = idx % dev->nvqs;
     struct vhost_vring_file file = {
-        .index = idx,
+        .index = vhost_vq_index
     };
     struct vhost_vring_state state = {
-        .index = idx,
+        .index = vhost_vq_index
     };
     struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
 
@@ -669,11 +670,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
         goto fail_alloc_ring;
     }
 
-    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
+    r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled);
     if (r < 0) {
         r = -errno;
         goto fail_alloc;
     }
+
     file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
     r = ioctl(dev->control, VHOST_SET_VRING_KICK, &file);
     if (r) {
@@ -714,7 +716,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
                                     unsigned idx)
 {
     struct vhost_vring_state state = {
-        .index = idx,
+        .index = idx % dev->nvqs,
     };
     int r;
     r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state);
@@ -829,7 +831,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     }
 
     for (i = 0; i < hdev->nvqs; ++i) {
-        r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, true);
+        r = vdev->binding->set_host_notifier(vdev->binding_opaque,
+                                             hdev->vq_index + i,
+                                             true);
         if (r < 0) {
             fprintf(stderr, "vhost VQ %d notifier binding failed: %d\n", i, -r);
             goto fail_vq;
@@ -839,7 +843,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     return 0;
 fail_vq:
     while (--i >= 0) {
-        r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false);
+        r = vdev->binding->set_host_notifier(vdev->binding_opaque,
+                                             hdev->vq_index + i,
+                                             false);
         if (r < 0) {
             fprintf(stderr, "vhost VQ %d notifier cleanup error: %d\n", i, -r);
             fflush(stderr);
@@ -860,7 +866,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     int i, r;
 
     for (i = 0; i < hdev->nvqs; ++i) {
-        r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false);
+        r = vdev->binding->set_host_notifier(vdev->binding_opaque,
+                                             hdev->vq_index + i,
+                                             false);
         if (r < 0) {
             fprintf(stderr, "vhost VQ %d notifier cleanup failed: %d\n", i, -r);
             fflush(stderr);
@@ -879,10 +887,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         goto fail;
     }
 
-    r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
-    if (r < 0) {
-        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
-        goto fail_notifiers;
+    if (hdev->vq_index == 0) {
+        r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
+        if (r < 0) {
+            fprintf(stderr, "Error binding guest notifier: %d\n", -r);
+            goto fail_notifiers;
+        }
     }
 
     r = vhost_dev_set_features(hdev, hdev->log_enabled);
@@ -898,7 +908,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         r = vhost_virtqueue_init(hdev,
                                  vdev,
                                  hdev->vqs + i,
-                                 i);
+                                 hdev->vq_index + i);
         if (r < 0) {
             goto fail_vq;
         }
@@ -925,8 +935,9 @@ fail_vq:
         vhost_virtqueue_cleanup(hdev,
                                 vdev,
                                 hdev->vqs + i,
-                                i);
+                                hdev->vq_index + i);
     }
+    i = hdev->nvqs;
 fail_mem:
 fail_features:
     vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
@@ -944,21 +955,24 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
         vhost_virtqueue_cleanup(hdev,
                                 vdev,
                                 hdev->vqs + i,
-                                i);
+                                hdev->vq_index + i);
     }
     for (i = 0; i < hdev->n_mem_sections; ++i) {
         vhost_sync_dirty_bitmap(hdev, &hdev->mem_sections[i],
                                 0, (hwaddr)~0x0ull);
     }
-    r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
-    if (r < 0) {
-        fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
-        fflush(stderr);
+    if (hdev->vq_index == 0) {
+        r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
+        if (r < 0) {
+            fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
+            fflush(stderr);
+        }
+        assert(r >= 0);
     }
-    assert (r >= 0);
 
     hdev->started = false;
     g_free(hdev->log);
     hdev->log = NULL;
     hdev->log_size = 0;
 }
diff --git a/hw/vhost.h b/hw/vhost.h
index 0c47229..e94a9f7 100644
--- a/hw/vhost.h
+++ b/hw/vhost.h
@@ -34,6 +34,8 @@ struct vhost_dev {
     MemoryRegionSection *mem_sections;
     struct vhost_virtqueue *vqs;
     int nvqs;
+    /* the first virtqueue which this vhost dev serves */
+    int vq_index;
     unsigned long long features;
     unsigned long long acked_features;
     unsigned long long backend_features;
diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index 8241601..cdb294c 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -138,13 +138,15 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 }
 
 int vhost_net_start(struct vhost_net *net,
-                    VirtIODevice *dev)
+                    VirtIODevice *dev,
+                    int vq_index)
 {
     struct vhost_vring_file file = { };
     int r;
 
     net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
+    net->dev.vq_index = vq_index;
 
     r = vhost_dev_enable_notifiers(&net->dev, dev);
     if (r < 0) {
@@ -214,7 +216,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 }
 
 int vhost_net_start(struct vhost_net *net,
-		    VirtIODevice *dev)
+                    VirtIODevice *dev,
+                    int vq_index)
 {
     return -ENOSYS;
 }
diff --git a/hw/vhost_net.h b/hw/vhost_net.h
index a9db234..c9a8429 100644
--- a/hw/vhost_net.h
+++ b/hw/vhost_net.h
@@ -9,7 +9,7 @@ typedef struct vhost_net VHostNetState;
 VHostNetState *vhost_net_init(NetClientState *backend, int devfd, bool force);
 
 bool vhost_net_query(VHostNetState *net, VirtIODevice *dev);
-int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
+int vhost_net_start(VHostNetState *net, VirtIODevice *dev, int vq_index);
 void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
 
 void vhost_net_cleanup(VHostNetState *net);
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index d57a5a5..70bc0e6 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -126,7 +126,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
         if (!vhost_net_query(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev)) {
             return;
         }
-        r = vhost_net_start(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev);
+        r = vhost_net_start(tap_get_vhost_net(qemu_get_queue(n->nic)->peer),
+                            &n->vdev, 0);
         if (r < 0) {
             error_report("unable to start vhost net: %d: "
                          "falling back on userspace virtio", -r);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 07/12] virtio: introduce virtio_queue_del()
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (5 preceding siblings ...)
  2012-12-28 10:31 ` [PATCH 06/12] vhost: " Jason Wang
@ 2012-12-28 10:31 ` Jason Wang
  2013-01-08  7:14   ` Michael S. Tsirkin
  2012-12-28 10:32 ` [PATCH 08/12] virtio: add a queue_index to VirtQueue Jason Wang
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:31 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: krkumar2, kvm, mprivozn, Jason Wang, rusty, jwhan, shiyer

Some devices (such as virtio-net) need the ability to destroy or re-order
their virtqueues; this patch adds a helper to do so.
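
For example (a hypothetical sketch, not code from this series), a device that
shrinks its queue count could drop the trailing virtqueues with the new
helper; only virtio_del_queue() is real here:

static void shrink_queues(VirtIODevice *vdev, int old_n, int new_n)
{
    int i;

    /* virtio_del_queue() frees the slot by setting vring.num = 0 */
    for (i = new_n; i < old_n; i++) {
        virtio_del_queue(vdev, i);
    }
}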

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio.c |    9 +++++++++
 hw/virtio.h |    2 ++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index f40a8c5..bc3c9c3 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -700,6 +700,15 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
     return &vdev->vq[i];
 }
 
+void virtio_del_queue(VirtIODevice *vdev, int n)
+{
+    if (n < 0 || n >= VIRTIO_PCI_QUEUE_MAX) {
+        abort();
+    }
+
+    vdev->vq[n].vring.num = 0;
+}
+
 void virtio_irq(VirtQueue *vq)
 {
     trace_virtio_irq(vq);
diff --git a/hw/virtio.h b/hw/virtio.h
index 7c17f7b..f6cb0f9 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -138,6 +138,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                             void (*handle_output)(VirtIODevice *,
                                                   VirtQueue *));
 
+void virtio_del_queue(VirtIODevice *vdev, int n);
+
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len);
 void virtqueue_flush(VirtQueue *vq, unsigned int count);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 08/12] virtio: add a queue_index to VirtQueue
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (6 preceding siblings ...)
  2012-12-28 10:31 ` [PATCH 07/12] virtio: introduce virtio_queue_del() Jason Wang
@ 2012-12-28 10:32 ` Jason Wang
  2012-12-28 10:32 ` [PATCH 09/12] virtio-net: separate virtqueue from VirtIONet Jason Wang
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:32 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

Add a queue_index to VirtQueue and a helper to fetch it; this can be used by
devices that support multiqueue.
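
As a quick sketch of the intended use (MyDevice, per_queue[] and
process_queue() are made up; only virtio_get_queue_index() comes from this
patch), a handler that receives just the VirtQueue can recover which queue
fired:

static void my_handle_output(VirtIODevice *vdev, VirtQueue *vq)
{
    /* assumes MyDevice embeds a VirtIODevice named vdev */
    MyDevice *dev = container_of(vdev, MyDevice, vdev);
    uint16_t idx = virtio_get_queue_index(vq);

    /* dispatch to the matching per-queue state */
    process_queue(&dev->per_queue[idx]);
}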

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio.c |    8 ++++++++
 hw/virtio.h |    1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index bc3c9c3..726c139 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -72,6 +72,8 @@ struct VirtQueue
     /* Notification enabled? */
     bool notification;
 
+    uint16_t queue_index;
+
     int inuse;
 
     uint16_t vector;
@@ -929,6 +931,7 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
     for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
         vdev->vq[i].vector = VIRTIO_NO_VECTOR;
         vdev->vq[i].vdev = vdev;
+        vdev->vq[i].queue_index = i;
     }
 
     vdev->name = name;
@@ -1008,6 +1011,11 @@ VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n)
     return vdev->vq + n;
 }
 
+uint16_t virtio_get_queue_index(VirtQueue *vq)
+{
+    return vq->queue_index;
+}
+
 static void virtio_queue_guest_notifier_read(EventNotifier *n)
 {
     VirtQueue *vq = container_of(n, VirtQueue, guest_notifier);
diff --git a/hw/virtio.h b/hw/virtio.h
index f6cb0f9..07b38d6 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -237,6 +237,7 @@ hwaddr virtio_queue_get_ring_size(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
 void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
 VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n);
+uint16_t virtio_get_queue_index(VirtQueue *vq);
 int virtio_queue_get_id(VirtQueue *vq);
 EventNotifier *virtio_queue_get_guest_notifier(VirtQueue *vq);
 void virtio_queue_set_guest_notifier_fd_handler(VirtQueue *vq, bool assign,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 09/12] virtio-net: separate virtqueue from VirtIONet
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (7 preceding siblings ...)
  2012-12-28 10:32 ` [PATCH 08/12] virtio: add a queue_index to VirtQueue Jason Wang
@ 2012-12-28 10:32 ` Jason Wang
  2012-12-28 10:32 ` [PATCH 10/12] virtio-net: multiqueue support Jason Wang
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:32 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

To support multiqueue virtio-net, the first step is to move the
virtqueue-related fields out of VirtIONet into a new structure,
VirtIONetQueue. The following patches build on this by turning that single
VirtIONetQueue into an array inside VirtIONet.
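
Schematically (a sketch simplified from the diff below, not new code), the
per-queue state moves behind a single struct, so the later multiqueue patch
only has to change how that struct is looked up:

static void kick_tx(VirtIONet *n)
{
    VirtIONetQueue *q = &n->vq;   /* later: &n->vqs[i] */

    virtio_net_flush_tx(q);       /* was: virtio_net_flush_tx(n, n->tx_vq) */
}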

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio-net.c |  209 ++++++++++++++++++++++++++++++++-----------------------
 1 files changed, 121 insertions(+), 88 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 70bc0e6..c6f0915 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -26,28 +26,34 @@
 #define MAC_TABLE_ENTRIES    64
 #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
 
+typedef struct VirtIONetQueue {
+    VirtQueue *rx_vq;
+    VirtQueue *tx_vq;
+    QEMUTimer *tx_timer;
+    QEMUBH *tx_bh;
+    int tx_waiting;
+    struct {
+        VirtQueueElement elem;
+        ssize_t len;
+    } async_tx;
+    struct VirtIONet *n;
+    uint8_t vhost_started;
+} VirtIONetQueue;
+
 typedef struct VirtIONet
 {
     VirtIODevice vdev;
     uint8_t mac[ETH_ALEN];
     uint16_t status;
-    VirtQueue *rx_vq;
-    VirtQueue *tx_vq;
+    VirtIONetQueue vq;
     VirtQueue *ctrl_vq;
     NICState *nic;
-    QEMUTimer *tx_timer;
-    QEMUBH *tx_bh;
     uint32_t tx_timeout;
     int32_t tx_burst;
-    int tx_waiting;
     uint32_t has_vnet_hdr;
     size_t host_hdr_len;
     size_t guest_hdr_len;
     uint8_t has_ufo;
-    struct {
-        VirtQueueElement elem;
-        ssize_t len;
-    } async_tx;
     int mergeable_rx_bufs;
     uint8_t promisc;
     uint8_t allmulti;
@@ -55,7 +61,6 @@ typedef struct VirtIONet
     uint8_t nomulti;
     uint8_t nouni;
     uint8_t nobcast;
-    uint8_t vhost_started;
     struct {
         int in_use;
         int first_multi;
@@ -67,6 +72,12 @@ typedef struct VirtIONet
     DeviceState *qdev;
 } VirtIONet;
 
+static VirtIONetQueue *virtio_net_get_queue(NetClientState *nc)
+{
+    VirtIONet *n = qemu_get_nic_opaque(nc);
+
+    return &n->vq;
+}
+
 /* TODO
  * - we could suppress RX interrupt if we were so inclined.
  */
@@ -107,6 +118,8 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
 
 static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 {
+    VirtIONetQueue *q = &n->vq;
+
     if (!qemu_get_queue(n->nic)->peer) {
         return;
     }
@@ -117,11 +130,11 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     if (!tap_get_vhost_net(qemu_get_queue(n->nic)->peer)) {
         return;
     }
-    if (!!n->vhost_started == virtio_net_started(n, status) &&
+    if (!!q->vhost_started == virtio_net_started(n, status) &&
                               !qemu_get_queue(n->nic)->peer->link_down) {
         return;
     }
-    if (!n->vhost_started) {
+    if (!q->vhost_started) {
         int r;
         if (!vhost_net_query(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev)) {
             return;
@@ -132,36 +145,37 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
             error_report("unable to start vhost net: %d: "
                          "falling back on userspace virtio", -r);
         } else {
-            n->vhost_started = 1;
+            q->vhost_started = 1;
         }
     } else {
         vhost_net_stop(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev);
-        n->vhost_started = 0;
+        q->vhost_started = 0;
     }
 }
 
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    VirtIONetQueue *q = &n->vq;
 
     virtio_net_vhost_status(n, status);
 
-    if (!n->tx_waiting) {
+    if (!q->tx_waiting) {
         return;
     }
 
-    if (virtio_net_started(n, status) && !n->vhost_started) {
-        if (n->tx_timer) {
-            qemu_mod_timer(n->tx_timer,
+    if (virtio_net_started(n, status) && !q->vhost_started) {
+        if (q->tx_timer) {
+            qemu_mod_timer(q->tx_timer,
                            qemu_get_clock_ns(vm_clock) + n->tx_timeout);
         } else {
-            qemu_bh_schedule(n->tx_bh);
+            qemu_bh_schedule(q->tx_bh);
         }
     } else {
-        if (n->tx_timer) {
-            qemu_del_timer(n->tx_timer);
+        if (q->tx_timer) {
+            qemu_del_timer(q->tx_timer);
         } else {
-            qemu_bh_cancel(n->tx_bh);
+            qemu_bh_cancel(q->tx_bh);
         }
     }
 }
@@ -470,35 +484,40 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 static int virtio_net_can_receive(NetClientState *nc)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
+    VirtIONetQueue *q = virtio_net_get_queue(nc);
+
     if (!n->vdev.vm_running) {
         return 0;
     }
 
-    if (!virtio_queue_ready(n->rx_vq) ||
-        !(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
+    if (!virtio_queue_ready(q->rx_vq) ||
+        !(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         return 0;
+    }
 
     return 1;
 }
 
-static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
+static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
 {
-    if (virtio_queue_empty(n->rx_vq) ||
+    VirtIONet *n = q->n;
+    if (virtio_queue_empty(q->rx_vq) ||
         (n->mergeable_rx_bufs &&
-         !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
-        virtio_queue_set_notification(n->rx_vq, 1);
+         !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
+        virtio_queue_set_notification(q->rx_vq, 1);
 
         /* To avoid a race condition where the guest has made some buffers
          * available after the above check but before notification was
          * enabled, check for available buffers again.
          */
-        if (virtio_queue_empty(n->rx_vq) ||
+        if (virtio_queue_empty(q->rx_vq) ||
             (n->mergeable_rx_bufs &&
-             !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
+             !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
             return 0;
+        }
     }
 
-    virtio_queue_set_notification(n->rx_vq, 0);
+    virtio_queue_set_notification(q->rx_vq, 0);
     return 1;
 }
 
@@ -601,6 +620,7 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
+    VirtIONetQueue *q = virtio_net_get_queue(nc);
     struct iovec mhdr_sg[VIRTQUEUE_MAX_SIZE];
     struct virtio_net_hdr_mrg_rxbuf mhdr;
     unsigned mhdr_cnt = 0;
@@ -610,8 +630,9 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
         return -1;
 
     /* hdr_len refers to the header we supply to the guest */
-    if (!virtio_net_has_buffers(n, size + n->guest_hdr_len - n->host_hdr_len))
+    if (!virtio_net_has_buffers(q, size + n->guest_hdr_len - n->host_hdr_len)) {
         return 0;
+    }
 
     if (!receive_filter(n, buf, size))
         return size;
@@ -625,7 +646,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
 
         total = 0;
 
-        if (virtqueue_pop(n->rx_vq, &elem) == 0) {
+        if (virtqueue_pop(q->rx_vq, &elem) == 0) {
             if (i == 0)
                 return -1;
             error_report("virtio-net unexpected empty queue: "
@@ -678,7 +699,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
         }
 
         /* signal other side */
-        virtqueue_fill(n->rx_vq, &elem, total, i++);
+        virtqueue_fill(q->rx_vq, &elem, total, i++);
     }
 
     if (mhdr_cnt) {
@@ -688,30 +709,32 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
                      &mhdr.num_buffers, sizeof mhdr.num_buffers);
     }
 
-    virtqueue_flush(n->rx_vq, i);
-    virtio_notify(&n->vdev, n->rx_vq);
+    virtqueue_flush(q->rx_vq, i);
+    virtio_notify(&n->vdev, q->rx_vq);
 
     return size;
 }
 
-static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq);
+static int32_t virtio_net_flush_tx(VirtIONetQueue *q);
 
 static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
+    VirtIONetQueue *q = virtio_net_get_queue(nc);
 
-    virtqueue_push(n->tx_vq, &n->async_tx.elem, 0);
-    virtio_notify(&n->vdev, n->tx_vq);
+    virtqueue_push(q->tx_vq, &q->async_tx.elem, 0);
+    virtio_notify(&n->vdev, q->tx_vq);
 
-    n->async_tx.elem.out_num = n->async_tx.len = 0;
+    q->async_tx.elem.out_num = q->async_tx.len = 0;
 
-    virtio_queue_set_notification(n->tx_vq, 1);
-    virtio_net_flush_tx(n, n->tx_vq);
+    virtio_queue_set_notification(q->tx_vq, 1);
+    virtio_net_flush_tx(q);
 }
 
 /* TX */
-static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
+static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 {
+    VirtIONet *n = q->n;
     VirtQueueElement elem;
     int32_t num_packets = 0;
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
@@ -720,12 +743,12 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
 
     assert(n->vdev.vm_running);
 
-    if (n->async_tx.elem.out_num) {
-        virtio_queue_set_notification(n->tx_vq, 0);
+    if (q->async_tx.elem.out_num) {
+        virtio_queue_set_notification(q->tx_vq, 0);
         return num_packets;
     }
 
-    while (virtqueue_pop(vq, &elem)) {
+    while (virtqueue_pop(q->tx_vq, &elem)) {
         ssize_t ret, len;
         unsigned int out_num = elem.out_num;
         struct iovec *out_sg = &elem.out_sg[0];
@@ -758,16 +781,16 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
         ret = qemu_sendv_packet_async(qemu_get_queue(n->nic), out_sg, out_num,
                                       virtio_net_tx_complete);
         if (ret == 0) {
-            virtio_queue_set_notification(n->tx_vq, 0);
-            n->async_tx.elem = elem;
-            n->async_tx.len  = len;
+            virtio_queue_set_notification(q->tx_vq, 0);
+            q->async_tx.elem = elem;
+            q->async_tx.len  = len;
             return -EBUSY;
         }
 
         len += ret;
 
-        virtqueue_push(vq, &elem, 0);
-        virtio_notify(&n->vdev, vq);
+        virtqueue_push(q->tx_vq, &elem, 0);
+        virtio_notify(&n->vdev, q->tx_vq);
 
         if (++num_packets >= n->tx_burst) {
             break;
@@ -779,22 +802,23 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
 static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    VirtIONetQueue *q = &n->vq;
 
     /* This happens when device was stopped but VCPU wasn't. */
     if (!n->vdev.vm_running) {
-        n->tx_waiting = 1;
+        q->tx_waiting = 1;
         return;
     }
 
-    if (n->tx_waiting) {
+    if (q->tx_waiting) {
         virtio_queue_set_notification(vq, 1);
-        qemu_del_timer(n->tx_timer);
-        n->tx_waiting = 0;
-        virtio_net_flush_tx(n, vq);
+        qemu_del_timer(q->tx_timer);
+        q->tx_waiting = 0;
+        virtio_net_flush_tx(q);
     } else {
-        qemu_mod_timer(n->tx_timer,
+        qemu_mod_timer(q->tx_timer,
                        qemu_get_clock_ns(vm_clock) + n->tx_timeout);
-        n->tx_waiting = 1;
+        q->tx_waiting = 1;
         virtio_queue_set_notification(vq, 0);
     }
 }
@@ -802,48 +826,51 @@ static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    VirtIONetQueue *q = &n->vq;
 
-    if (unlikely(n->tx_waiting)) {
+    if (unlikely(q->tx_waiting)) {
         return;
     }
-    n->tx_waiting = 1;
+    q->tx_waiting = 1;
     /* This happens when device was stopped but VCPU wasn't. */
     if (!n->vdev.vm_running) {
         return;
     }
     virtio_queue_set_notification(vq, 0);
-    qemu_bh_schedule(n->tx_bh);
+    qemu_bh_schedule(q->tx_bh);
 }
 
 static void virtio_net_tx_timer(void *opaque)
 {
-    VirtIONet *n = opaque;
+    VirtIONetQueue *q = opaque;
+    VirtIONet *n = q->n;
     assert(n->vdev.vm_running);
 
-    n->tx_waiting = 0;
+    q->tx_waiting = 0;
 
     /* Just in case the driver is not ready on more */
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
         return;
 
-    virtio_queue_set_notification(n->tx_vq, 1);
-    virtio_net_flush_tx(n, n->tx_vq);
+    virtio_queue_set_notification(q->tx_vq, 1);
+    virtio_net_flush_tx(q);
 }
 
 static void virtio_net_tx_bh(void *opaque)
 {
-    VirtIONet *n = opaque;
+    VirtIONetQueue *q = opaque;
+    VirtIONet *n = q->n;
     int32_t ret;
 
     assert(n->vdev.vm_running);
 
-    n->tx_waiting = 0;
+    q->tx_waiting = 0;
 
     /* Just in case the driver is not ready on more */
     if (unlikely(!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)))
         return;
 
-    ret = virtio_net_flush_tx(n, n->tx_vq);
+    ret = virtio_net_flush_tx(q);
     if (ret == -EBUSY) {
         return; /* Notification re-enable handled by tx_complete */
     }
@@ -851,33 +878,34 @@ static void virtio_net_tx_bh(void *opaque)
     /* If we flush a full burst of packets, assume there are
      * more coming and immediately reschedule */
     if (ret >= n->tx_burst) {
-        qemu_bh_schedule(n->tx_bh);
-        n->tx_waiting = 1;
+        qemu_bh_schedule(q->tx_bh);
+        q->tx_waiting = 1;
         return;
     }
 
     /* If less than a full burst, re-enable notification and flush
      * anything that may have come in while we weren't looking.  If
      * we find something, assume the guest is still active and reschedule */
-    virtio_queue_set_notification(n->tx_vq, 1);
-    if (virtio_net_flush_tx(n, n->tx_vq) > 0) {
-        virtio_queue_set_notification(n->tx_vq, 0);
-        qemu_bh_schedule(n->tx_bh);
-        n->tx_waiting = 1;
+    virtio_queue_set_notification(q->tx_vq, 1);
+    if (virtio_net_flush_tx(q) > 0) {
+        virtio_queue_set_notification(q->tx_vq, 0);
+        qemu_bh_schedule(q->tx_bh);
+        q->tx_waiting = 1;
     }
 }
 
 static void virtio_net_save(QEMUFile *f, void *opaque)
 {
     VirtIONet *n = opaque;
+    VirtIONetQueue *q = &n->vq;
 
     /* At this point, backend must be stopped, otherwise
      * it might keep writing to memory. */
-    assert(!n->vhost_started);
+    assert(!q->vhost_started);
     virtio_save(&n->vdev, f);
 
     qemu_put_buffer(f, n->mac, ETH_ALEN);
-    qemu_put_be32(f, n->tx_waiting);
+    qemu_put_be32(f, q->tx_waiting);
     qemu_put_be32(f, n->mergeable_rx_bufs);
     qemu_put_be16(f, n->status);
     qemu_put_byte(f, n->promisc);
@@ -898,6 +926,7 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 {
     VirtIONet *n = opaque;
+    VirtIONetQueue *q = &n->vq;
     int i;
     int ret;
 
@@ -910,7 +939,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     qemu_get_buffer(f, n->mac, ETH_ALEN);
-    n->tx_waiting = qemu_get_be32(f);
+    q->tx_waiting = qemu_get_be32(f);
 
     virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
 
@@ -1027,7 +1056,8 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     n->vdev.bad_features = virtio_net_bad_features;
     n->vdev.reset = virtio_net_reset;
     n->vdev.set_status = virtio_net_set_status;
-    n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
+    n->vq.rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
+    n->vq.n = n;
 
     if (net->tx && strcmp(net->tx, "timer") && strcmp(net->tx, "bh")) {
         error_report("virtio-net: "
@@ -1037,12 +1067,14 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     }
 
     if (net->tx && !strcmp(net->tx, "timer")) {
-        n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_timer);
-        n->tx_timer = qemu_new_timer_ns(vm_clock, virtio_net_tx_timer, n);
+        n->vq.tx_vq = virtio_add_queue(&n->vdev, 256,
+                                       virtio_net_handle_tx_timer);
+        n->vq.tx_timer = qemu_new_timer_ns(vm_clock,
+                                           virtio_net_tx_timer, &n->vq);
         n->tx_timeout = net->txtimer;
     } else {
-        n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
-        n->tx_bh = qemu_bh_new(virtio_net_tx_bh, n);
+        n->vq.tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
+        n->vq.tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vq);
     }
     n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
     qemu_macaddr_default_if_unset(&conf->macaddr);
@@ -1060,7 +1092,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 
     qemu_format_nic_info_str(qemu_get_queue(n->nic), conf->macaddr.a);
 
-    n->tx_waiting = 0;
+    n->vq.tx_waiting = 0;
     n->tx_burst = net->txburst;
     virtio_net_set_mrg_rx_bufs(n, 0);
     n->promisc = 1; /* for compatibility */
@@ -1081,6 +1113,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 void virtio_net_exit(VirtIODevice *vdev)
 {
     VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
+    VirtIONetQueue *q = &n->vq;
 
     /* This will stop vhost backend if appropriate. */
     virtio_net_set_status(vdev, 0);
@@ -1092,11 +1125,11 @@ void virtio_net_exit(VirtIODevice *vdev)
     g_free(n->mac_table.macs);
     g_free(n->vlans);
 
-    if (n->tx_timer) {
-        qemu_del_timer(n->tx_timer);
-        qemu_free_timer(n->tx_timer);
+    if (q->tx_timer) {
+        qemu_del_timer(q->tx_timer);
+        qemu_free_timer(q->tx_timer);
     } else {
-        qemu_bh_delete(n->tx_bh);
+        qemu_bh_delete(q->tx_bh);
     }
 
     qemu_del_nic(n->nic);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 10/12] virtio-net: multiqueue support
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (8 preceding siblings ...)
  2012-12-28 10:32 ` [PATCH 09/12] virtio-net: separate virtqueue from VirtIONet Jason Wang
@ 2012-12-28 10:32 ` Jason Wang
  2012-12-28 17:52   ` Blue Swirl
  2013-01-08  9:07   ` [Qemu-devel] " Wanlong Gao
  2012-12-28 10:32 ` [PATCH 11/12] virtio-net: migration support for multiqueue Jason Wang
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:32 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

This patch implements both userspace and vhost support for multiqueue
virtio-net (VIRTIO_NET_F_MQ). This is done by introducing an array of
VirtIONetQueue into VirtIONet.
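
For orientation, a sketch of the virtqueue layout the patch assumes (this
restates the virtio spec's VIRTIO_NET_F_MQ layout, which the vq2q() helper in
the diff relies on; the three helpers are illustrative only):

/* vq 0: rx0, vq 1: tx0, vq 2: rx1, vq 3: tx1, ..., last vq: ctrl */
static inline int pair_of(int vq_index) { return vq_index / 2; }
static inline int rx_vq_of(int pair)    { return pair * 2; }
static inline int tx_vq_of(int pair)    { return pair * 2 + 1; }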

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio-net.c |  318 ++++++++++++++++++++++++++++++++++++++++++-------------
 hw/virtio-net.h |   27 +++++-
 2 files changed, 271 insertions(+), 74 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index c6f0915..aaeef1b 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -45,7 +45,7 @@ typedef struct VirtIONet
     VirtIODevice vdev;
     uint8_t mac[ETH_ALEN];
     uint16_t status;
-    VirtIONetQueue vq;
+    VirtIONetQueue vqs[MAX_QUEUE_NUM];
     VirtQueue *ctrl_vq;
     NICState *nic;
     uint32_t tx_timeout;
@@ -70,14 +70,23 @@ typedef struct VirtIONet
     } mac_table;
     uint32_t *vlans;
     DeviceState *qdev;
+    int multiqueue;
+    uint16_t max_queues;
+    uint16_t curr_queues;
 } VirtIONet;
 
-static VirtIONetQueue *virtio_net_get_queue(NetClientState *nc)
+static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
 
-    return &n->vq;
+    return &n->vqs[nc->queue_index];
 }
+
+static int vq2q(int queue_index)
+{
+    return queue_index / 2;
+}
+
 /* TODO
  * - we could suppress RX interrupt if we were so inclined.
  */
@@ -93,6 +102,7 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
     struct virtio_net_config netcfg;
 
     stw_p(&netcfg.status, n->status);
+    stw_p(&netcfg.max_virtqueue_pairs, n->max_queues);
     memcpy(netcfg.mac, n->mac, ETH_ALEN);
     memcpy(config, &netcfg, sizeof(netcfg));
 }
@@ -116,31 +126,33 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
         (n->status & VIRTIO_NET_S_LINK_UP) && n->vdev.vm_running;
 }
 
-static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
+static void virtio_net_vhost_status(VirtIONet *n, int queue_index,
+                                    uint8_t status)
 {
-    VirtIONetQueue *q = &n->vq;
+    NetClientState *nc = qemu_get_subqueue(n->nic, queue_index);
+    VirtIONetQueue *q = &n->vqs[queue_index];
 
-    if (!qemu_get_queue(n->nic)->peer) {
+    if (!nc->peer) {
         return;
     }
-    if (qemu_get_queue(n->nic)->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
+    if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
         return;
     }
 
-    if (!tap_get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+    if (!tap_get_vhost_net(nc->peer)) {
         return;
     }
-    if (!!q->vhost_started == virtio_net_started(n, status) &&
-                              !qemu_get_queue(n->nic)->peer->link_down) {
+    if (!!q->vhost_started ==
+        (virtio_net_started(n, status) && !nc->peer->link_down)) {
         return;
     }
     if (!q->vhost_started) {
         int r;
-        if (!vhost_net_query(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev)) {
+        if (!vhost_net_query(tap_get_vhost_net(nc->peer), &n->vdev)) {
             return;
         }
-        r = vhost_net_start(tap_get_vhost_net(qemu_get_queue(n->nic)->peer),
-                            &n->vdev, 0);
+        r = vhost_net_start(tap_get_vhost_net(nc->peer), &n->vdev,
+                            queue_index * 2);
         if (r < 0) {
             error_report("unable to start vhost net: %d: "
                          "falling back on userspace virtio", -r);
@@ -148,7 +160,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
             q->vhost_started = 1;
         }
     } else {
-        vhost_net_stop(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev);
+        vhost_net_stop(tap_get_vhost_net(nc->peer), &n->vdev);
         q->vhost_started = 0;
     }
 }
@@ -156,26 +168,35 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
     VirtIONet *n = to_virtio_net(vdev);
-    VirtIONetQueue *q = &n->vq;
+    int i;
 
-    virtio_net_vhost_status(n, status);
+    for (i = 0; i < n->max_queues; i++) {
+        VirtIONetQueue *q = &n->vqs[i];
+        uint8_t queue_status = status;
 
-    if (!q->tx_waiting) {
-        return;
-    }
+        if ((!n->multiqueue && i != 0) || i >= n->curr_queues) {
+            queue_status = 0;
+        }
 
-    if (virtio_net_started(n, status) && !q->vhost_started) {
-        if (q->tx_timer) {
-            qemu_mod_timer(q->tx_timer,
-                           qemu_get_clock_ns(vm_clock) + n->tx_timeout);
-        } else {
-            qemu_bh_schedule(q->tx_bh);
+        virtio_net_vhost_status(n, i, queue_status);
+
+        if (!q->tx_waiting) {
+            continue;
         }
-    } else {
-        if (q->tx_timer) {
-            qemu_del_timer(q->tx_timer);
+
+        if (virtio_net_started(n, status) && !q->vhost_started) {
+            if (q->tx_timer) {
+                qemu_mod_timer(q->tx_timer,
+                               qemu_get_clock_ns(vm_clock) + n->tx_timeout);
+            } else {
+                qemu_bh_schedule(q->tx_bh);
+            }
         } else {
-            qemu_bh_cancel(q->tx_bh);
+            if (q->tx_timer) {
+                qemu_del_timer(q->tx_timer);
+            } else {
+                qemu_bh_cancel(q->tx_bh);
+            }
         }
     }
 }
@@ -207,6 +228,8 @@ static void virtio_net_reset(VirtIODevice *vdev)
     n->nomulti = 0;
     n->nouni = 0;
     n->nobcast = 0;
+    /* multiqueue is disabled by default */
+    n->curr_queues = 1;
 
     /* Flush any MAC and VLAN filter table state */
     n->mac_table.in_use = 0;
@@ -245,18 +268,72 @@ static int peer_has_ufo(VirtIONet *n)
 
 static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs)
 {
+    int i;
+    NetClientState *nc;
+
     n->mergeable_rx_bufs = mergeable_rx_bufs;
 
     n->guest_hdr_len = n->mergeable_rx_bufs ?
         sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr);
 
-    if (peer_has_vnet_hdr(n) &&
-        tap_has_vnet_hdr_len(qemu_get_queue(n->nic)->peer, n->guest_hdr_len)) {
-        tap_set_vnet_hdr_len(qemu_get_queue(n->nic)->peer, n->guest_hdr_len);
-        n->host_hdr_len = n->guest_hdr_len;
+    for (i = 0; i < n->max_queues; i++) {
+        nc = qemu_get_subqueue(n->nic, i);
+
+        if (peer_has_vnet_hdr(n) &&
+            tap_has_vnet_hdr_len(nc->peer, n->guest_hdr_len)) {
+            tap_set_vnet_hdr_len(nc->peer, n->guest_hdr_len);
+            n->host_hdr_len = n->guest_hdr_len;
+        }
     }
 }
 
+static int peer_attach(VirtIONet *n, int index)
+{
+    NetClientState *nc = qemu_get_subqueue(n->nic, index);
+    int ret;
+
+    if (!nc->peer) {
+        ret = -1;
+    } else if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
+        ret = -1;
+    } else {
+        ret = tap_attach(nc->peer);
+    }
+
+    return ret;
+}
+
+static int peer_detach(VirtIONet *n, int index)
+{
+    NetClientState *nc = qemu_get_subqueue(n->nic, index);
+    int ret;
+
+    if (!nc->peer) {
+        ret = -1;
+    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
+        ret = -1;
+    } else {
+        ret = tap_detach(nc->peer);
+    }
+
+    return ret;
+}
+
+static void virtio_net_set_queues(VirtIONet *n)
+{
+    int i;
+
+    for (i = 0; i < n->max_queues; i++) {
+        if (i < n->curr_queues) {
+            assert(!peer_attach(n, i));
+        } else {
+            assert(!peer_detach(n, i));
+        }
+    }
+}
+
+static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
+
 static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
 {
     VirtIONet *n = to_virtio_net(vdev);
@@ -308,25 +385,33 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
 static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    int i;
+
+    virtio_net_set_multiqueue(n, !!(features & (1 << VIRTIO_NET_F_MQ)),
+                              !!(features & (1 << VIRTIO_NET_F_CTRL_VQ)));
 
     virtio_net_set_mrg_rx_bufs(n, !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF)));
 
     if (n->has_vnet_hdr) {
-        tap_set_offload(qemu_get_queue(n->nic)->peer,
+        tap_set_offload(qemu_get_subqueue(n->nic, 0)->peer,
                         (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
                         (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
                         (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
                         (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
                         (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
     }
-    if (!qemu_get_queue(n->nic)->peer ||
-        qemu_get_queue(n->nic)->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
-        return;
-    }
-    if (!tap_get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-        return;
+
+    for (i = 0;  i < n->max_queues; i++) {
+        NetClientState *nc = qemu_get_subqueue(n->nic, i);
+
+        if (!nc->peer || nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
+            continue;
+        }
+        if (!tap_get_vhost_net(nc->peer)) {
+            continue;
+        }
+        vhost_net_ack_features(tap_get_vhost_net(nc->peer), features);
     }
-    vhost_net_ack_features(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), features);
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -436,6 +521,35 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
     return VIRTIO_NET_OK;
 }
 
+static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
+                                VirtQueueElement *elem)
+{
+    struct virtio_net_ctrl_mq s;
+
+    if (elem->out_num != 2 ||
+        elem->out_sg[1].iov_len != sizeof(struct virtio_net_ctrl_mq)) {
+        error_report("virtio-net ctrl invalid steering command");
+        return VIRTIO_NET_ERR;
+    }
+
+    if (cmd != VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
+        return VIRTIO_NET_ERR;
+    }
+
+    memcpy(&s, elem->out_sg[1].iov_base, sizeof(struct virtio_net_ctrl_mq));
+
+    if (s.virtqueue_pairs < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN ||
+        s.virtqueue_pairs > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX ||
+        s.virtqueue_pairs > n->max_queues) {
+        return VIRTIO_NET_ERR;
+    }
+
+    n->curr_queues = s.virtqueue_pairs;
+    virtio_net_set_queues(n);
+    virtio_net_set_status(&n->vdev, n->vdev.status);
+
+    return VIRTIO_NET_OK;
+}
+
 static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
@@ -464,6 +578,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
             status = virtio_net_handle_mac(n, ctrl.cmd, &elem);
         else if (ctrl.class == VIRTIO_NET_CTRL_VLAN)
             status = virtio_net_handle_vlan_table(n, ctrl.cmd, &elem);
+        else if (ctrl.class == VIRTIO_NET_CTRL_MQ)
+            status = virtio_net_handle_mq(n, ctrl.cmd, &elem);
 
         stb_p(elem.in_sg[elem.in_num - 1].iov_base, status);
 
@@ -477,19 +593,24 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    int queue_index = vq2q(virtio_get_queue_index(vq));
 
-    qemu_flush_queued_packets(qemu_get_queue(n->nic));
+    qemu_flush_queued_packets(qemu_get_subqueue(n->nic, queue_index));
 }
 
 static int virtio_net_can_receive(NetClientState *nc)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
-    VirtIONetQueue *q = virtio_net_get_queue(nc);
+    VirtIONetQueue *q = virtio_net_get_subqueue(nc);
 
     if (!n->vdev.vm_running) {
         return 0;
     }
 
+    if (nc->queue_index >= n->curr_queues) {
+        return 0;
+    }
+
     if (!virtio_queue_ready(q->rx_vq) ||
         !(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         return 0;
@@ -620,14 +741,15 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
-    VirtIONetQueue *q = virtio_net_get_queue(nc);
+    VirtIONetQueue *q = virtio_net_get_subqueue(nc);
     struct iovec mhdr_sg[VIRTQUEUE_MAX_SIZE];
     struct virtio_net_hdr_mrg_rxbuf mhdr;
     unsigned mhdr_cnt = 0;
     size_t offset, i, guest_offset;
 
-    if (!virtio_net_can_receive(qemu_get_queue(n->nic)))
+    if (!virtio_net_can_receive(nc)) {
         return -1;
+    }
 
     /* hdr_len refers to the header we supply to the guest */
     if (!virtio_net_has_buffers(q, size + n->guest_hdr_len - n->host_hdr_len)) {
@@ -720,7 +842,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q);
 static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
-    VirtIONetQueue *q = virtio_net_get_queue(nc);
+    VirtIONetQueue *q = virtio_net_get_subqueue(nc);
 
     virtqueue_push(q->tx_vq, &q->async_tx.elem, 0);
     virtio_notify(&n->vdev, q->tx_vq);
@@ -737,6 +859,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
     VirtIONet *n = q->n;
     VirtQueueElement elem;
     int32_t num_packets = 0;
+    int queue_index = vq2q(virtio_get_queue_index(q->tx_vq));
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         return num_packets;
     }
@@ -778,8 +901,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 
         len = n->guest_hdr_len;
 
-        ret = qemu_sendv_packet_async(qemu_get_queue(n->nic), out_sg, out_num,
-                                      virtio_net_tx_complete);
+        ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
+                                      out_sg, out_num, virtio_net_tx_complete);
         if (ret == 0) {
             virtio_queue_set_notification(q->tx_vq, 0);
             q->async_tx.elem = elem;
@@ -802,7 +925,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
-    VirtIONetQueue *q = &n->vq;
+    VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
 
     /* This happens when device was stopped but VCPU wasn't. */
     if (!n->vdev.vm_running) {
@@ -826,7 +949,7 @@ static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
-    VirtIONetQueue *q = &n->vq;
+    VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
 
     if (unlikely(q->tx_waiting)) {
         return;
@@ -894,10 +1017,49 @@ static void virtio_net_tx_bh(void *opaque)
     }
 }
 
+static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
+{
+    VirtIODevice *vdev = &n->vdev;
+    int i;
+
+    n->multiqueue = multiqueue;
+
+    if (!multiqueue)
+        n->curr_queues = 1;
+
+    for (i = 2; i <= n->max_queues * 2 + 1; i++) {
+        virtio_del_queue(vdev, i);
+    }
+
+    for (i = 1; i < n->max_queues; i++) {
+        n->vqs[i].rx_vq = virtio_add_queue(vdev, 256, virtio_net_handle_rx);
+        if (n->vqs[i].tx_timer) {
+            n->vqs[i].tx_vq =
+                virtio_add_queue(vdev, 256, virtio_net_handle_tx_timer);
+            n->vqs[i].tx_timer = qemu_new_timer_ns(vm_clock,
+                                                   virtio_net_tx_timer,
+                                                   &n->vqs[i]);
+        } else {
+            n->vqs[i].tx_vq =
+                virtio_add_queue(vdev, 256, virtio_net_handle_tx_bh);
+            n->vqs[i].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[i]);
+        }
+
+        n->vqs[i].tx_waiting = 0;
+        n->vqs[i].n = n;
+    }
+
+    if (ctrl) {
+        n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);
+    }
+
+    virtio_net_set_queues(n);
+}
+
 static void virtio_net_save(QEMUFile *f, void *opaque)
 {
     VirtIONet *n = opaque;
-    VirtIONetQueue *q = &n->vq;
+    VirtIONetQueue *q = &n->vqs[0];
 
     /* At this point, backend must be stopped, otherwise
      * it might keep writing to memory. */
@@ -926,9 +1088,8 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 {
     VirtIONet *n = opaque;
-    VirtIONetQueue *q = &n->vq;
-    int i;
-    int ret;
+    VirtIONetQueue *q = &n->vqs[0];
+    int ret, i;
 
     if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
         return -EINVAL;
@@ -1044,6 +1205,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
                               virtio_net_conf *net)
 {
     VirtIONet *n;
+    int i;
 
     n = (VirtIONet *)virtio_common_init("virtio-net", VIRTIO_ID_NET,
                                         sizeof(struct virtio_net_config),
@@ -1056,8 +1218,11 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     n->vdev.bad_features = virtio_net_bad_features;
     n->vdev.reset = virtio_net_reset;
     n->vdev.set_status = virtio_net_set_status;
-    n->vq.rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
-    n->vq.n = n;
+    n->vqs[0].rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
+    n->max_queues = conf->queues;
+    n->curr_queues = 1;
+    n->vqs[0].n = n;
+    n->tx_timeout = net->txtimer;
 
     if (net->tx && strcmp(net->tx, "timer") && strcmp(net->tx, "bh")) {
         error_report("virtio-net: "
@@ -1067,14 +1232,14 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     }
 
     if (net->tx && !strcmp(net->tx, "timer")) {
-        n->vq.tx_vq = virtio_add_queue(&n->vdev, 256,
-                                       virtio_net_handle_tx_timer);
-        n->vq.tx_timer = qemu_new_timer_ns(vm_clock,
-                                           virtio_net_tx_timer, &n->vq);
-        n->tx_timeout = net->txtimer;
+        n->vqs[0].tx_vq = virtio_add_queue(&n->vdev, 256,
+                                           virtio_net_handle_tx_timer);
+        n->vqs[0].tx_timer = qemu_new_timer_ns(vm_clock, virtio_net_tx_timer,
+                                               &n->vqs[0]);
     } else {
-        n->vq.tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
-        n->vq.tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vq);
+        n->vqs[0].tx_vq = virtio_add_queue(&n->vdev, 256,
+                                           virtio_net_handle_tx_bh);
+        n->vqs[0].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[0]);
     }
     n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
     qemu_macaddr_default_if_unset(&conf->macaddr);
@@ -1084,7 +1249,9 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     n->nic = qemu_new_nic(&net_virtio_info, conf, object_get_typename(OBJECT(dev)), dev->id, n);
     peer_test_vnet_hdr(n);
     if (peer_has_vnet_hdr(n)) {
-        tap_using_vnet_hdr(qemu_get_queue(n->nic)->peer, 1);
+        for (i = 0; i < n->max_queues; i++) {
+            tap_using_vnet_hdr(qemu_get_subqueue(n->nic, i)->peer, 1);
+        }
         n->host_hdr_len = sizeof(struct virtio_net_hdr);
     } else {
         n->host_hdr_len = 0;
@@ -1092,7 +1259,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 
     qemu_format_nic_info_str(qemu_get_queue(n->nic), conf->macaddr.a);
 
-    n->vq.tx_waiting = 0;
+    n->vqs[0].tx_waiting = 0;
     n->tx_burst = net->txburst;
     virtio_net_set_mrg_rx_bufs(n, 0);
     n->promisc = 1; /* for compatibility */
@@ -1113,23 +1280,28 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 void virtio_net_exit(VirtIODevice *vdev)
 {
     VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
-    VirtIONetQueue *q = &n->vq;
+    int i;
 
     /* This will stop vhost backend if appropriate. */
     virtio_net_set_status(vdev, 0);
 
-    qemu_purge_queued_packets(qemu_get_queue(n->nic));
-
     unregister_savevm(n->qdev, "virtio-net", n);
 
     g_free(n->mac_table.macs);
     g_free(n->vlans);
 
-    if (q->tx_timer) {
-        qemu_del_timer(q->tx_timer);
-        qemu_free_timer(q->tx_timer);
-    } else {
-        qemu_bh_delete(q->tx_bh);
+    for (i = 0; i < n->max_queues; i++) {
+        VirtIONetQueue *q = &n->vqs[i];
+        NetClientState *nc = qemu_get_subqueue(n->nic, i);
+
+        qemu_purge_queued_packets(nc);
+
+        if (q->tx_timer) {
+            qemu_del_timer(q->tx_timer);
+            qemu_free_timer(q->tx_timer);
+        } else {
+            qemu_bh_delete(q->tx_bh);
+        }
     }
 
     qemu_del_nic(n->nic);
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index 36aa463..bc5857a 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -44,6 +44,8 @@
 #define VIRTIO_NET_F_CTRL_RX    18      /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN  19      /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
+#define VIRTIO_NET_F_MQ         22      /* Device supports Receive Flow
+                                         * Steering */
 
 #define VIRTIO_NET_S_LINK_UP    1       /* Link is up */
 
@@ -72,6 +74,8 @@ struct virtio_net_config
     uint8_t mac[ETH_ALEN];
     /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
     uint16_t status;
+    /* Max virtqueue pairs supported by the device */
+    uint16_t max_virtqueue_pairs;
 } QEMU_PACKED;
 
 /* This is the first element of the scatter-gather list.  If you don't
@@ -168,6 +172,26 @@ struct virtio_net_ctrl_mac {
  #define VIRTIO_NET_CTRL_VLAN_ADD             0
  #define VIRTIO_NET_CTRL_VLAN_DEL             1
 
+/*
+ * Control Multiqueue
+ *
+ * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET enables multiqueue,
+ * specifying the number of transmit and receive queues to be used.
+ * After the command is consumed and acked by the device, the device
+ * will not steer new packets to receive virtqueues other than those
+ * specified, nor read from transmit virtqueues other than those
+ * specified. Accordingly, the driver should not transmit new packets
+ * on virtqueues other than those specified.
+ */
+struct virtio_net_ctrl_mq {
+    uint16_t virtqueue_pairs;
+};
+
+#define VIRTIO_NET_CTRL_MQ   4
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
+
 #define DEFINE_VIRTIO_NET_FEATURES(_state, _field) \
         DEFINE_VIRTIO_COMMON_FEATURES(_state, _field), \
         DEFINE_PROP_BIT("csum", _state, _field, VIRTIO_NET_F_CSUM, true), \
@@ -186,5 +210,6 @@ struct virtio_net_ctrl_mac {
         DEFINE_PROP_BIT("ctrl_vq", _state, _field, VIRTIO_NET_F_CTRL_VQ, true), \
         DEFINE_PROP_BIT("ctrl_rx", _state, _field, VIRTIO_NET_F_CTRL_RX, true), \
         DEFINE_PROP_BIT("ctrl_vlan", _state, _field, VIRTIO_NET_F_CTRL_VLAN, true), \
-        DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true)
+        DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true), \
+        DEFINE_PROP_BIT("mq", _state, _field, VIRTIO_NET_F_MQ, true)
 #endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 11/12] virtio-net: migration support for multiqueue
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (9 preceding siblings ...)
  2012-12-28 10:32 ` [PATCH 10/12] virtio-net: multiqueue support Jason Wang
@ 2012-12-28 10:32 ` Jason Wang
  2013-01-08  7:10   ` Michael S. Tsirkin
  2012-12-28 10:32 ` [PATCH 12/12] virtio-net: compat multiqueue support Jason Wang
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:32 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

This patch adds migration support for multiqueue virtio-net. The migration
version is bumped to 12.
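
Concretely, version 12 appends these fields to the stream (a sketch that
mirrors the save hunk below):

    qemu_put_be16(f, n->max_queues);    /* checked against the destination */
    qemu_put_be16(f, n->curr_queues);
    for (i = 1; i < n->curr_queues; i++) {
        qemu_put_be32(f, n->vqs[i].tx_waiting); /* queue 0 keeps its old slot */
    }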

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio-net.c |   45 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index aaeef1b..ca4b804 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -21,7 +21,7 @@
 #include "virtio-net.h"
 #include "vhost_net.h"
 
-#define VIRTIO_NET_VM_VERSION    11
+#define VIRTIO_NET_VM_VERSION    12
 
 #define MAC_TABLE_ENTRIES    64
 #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
@@ -1058,16 +1058,18 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
 
 static void virtio_net_save(QEMUFile *f, void *opaque)
 {
+    int i;
     VirtIONet *n = opaque;
-    VirtIONetQueue *q = &n->vqs[0];
 
-    /* At this point, backend must be stopped, otherwise
-     * it might keep writing to memory. */
-    assert(!q->vhost_started);
+    for (i = 0; i < n->max_queues; i++) {
+        /* At this point, backend must be stopped, otherwise
+         * it might keep writing to memory. */
+        assert(!n->vqs[i].vhost_started);
+    }
     virtio_save(&n->vdev, f);
 
     qemu_put_buffer(f, n->mac, ETH_ALEN);
-    qemu_put_be32(f, q->tx_waiting);
+    qemu_put_be32(f, n->vqs[0].tx_waiting);
     qemu_put_be32(f, n->mergeable_rx_bufs);
     qemu_put_be16(f, n->status);
     qemu_put_byte(f, n->promisc);
@@ -1083,13 +1085,17 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
     qemu_put_byte(f, n->nouni);
     qemu_put_byte(f, n->nobcast);
     qemu_put_byte(f, n->has_ufo);
+    qemu_put_be16(f, n->max_queues);
+    qemu_put_be16(f, n->curr_queues);
+    for (i = 1; i < n->curr_queues; i++) {
+        qemu_put_be32(f, n->vqs[i].tx_waiting);
+    }
 }
 
 static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 {
     VirtIONet *n = opaque;
-    VirtIONetQueue *q = &n->vqs[0];
-    int ret, i;
+    int ret, i, link_down;
 
     if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
         return -EINVAL;
@@ -1100,7 +1106,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     qemu_get_buffer(f, n->mac, ETH_ALEN);
-    q->tx_waiting = qemu_get_be32(f);
+    n->vqs[0].tx_waiting = qemu_get_be32(f);
 
     virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
 
@@ -1170,6 +1176,22 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
         }
     }
 
+    if (version_id >= 12) {
+        if (n->max_queues != qemu_get_be16(f)) {
+            error_report("virtio-net: different max_queues");
+            return -1;
+        }
+
+        n->curr_queues = qemu_get_be16(f);
+        for (i = 1; i < n->curr_queues; i++) {
+            n->vqs[i].tx_waiting = qemu_get_be32(f);
+        }
+    }
+
+    virtio_net_set_queues(n);
+    /* Must do this again, since we may have more than one active queue. */
+    virtio_net_set_status(&n->vdev, n->status);
+
     /* Find the first multicast entry in the saved MAC filter */
     for (i = 0; i < n->mac_table.in_use; i++) {
         if (n->mac_table.macs[i * ETH_ALEN] & 1) {
@@ -1180,7 +1202,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 
     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in n->status */
-    qemu_get_queue(n->nic)->link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    for (i = 0; i < n->max_queues; i++) {
+        qemu_get_subqueue(n->nic, i)->link_down = link_down;
+    }
 
     return 0;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 12/12] virtio-net: compat multiqueue support
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (10 preceding siblings ...)
  2012-12-28 10:32 ` [PATCH 11/12] virtio-net: migration support for multiqueue Jason Wang
@ 2012-12-28 10:32 ` Jason Wang
  2013-01-09 14:29 ` [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net Stefan Hajnoczi
  2013-01-14 19:44 ` Anthony Liguori
  13 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2012-12-28 10:32 UTC (permalink / raw)
  To: mst, aliguori, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

Disable multiqueue support for machine types before 1.4.
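
For example (a hypothetical invocation; pc-1.3 is one of the machine types
that receives the compat property below), "mq" then defaults to off and has
to be re-enabled explicitly:

./qemu -M pc-1.3 -netdev tap,id=hn0,queues=2 -device virtio-net-pci,netdev=hn0,mq=on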

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/pc_piix.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 19e342a..0145370 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -295,6 +295,10 @@ static QEMUMachine pc_machine_v1_4 = {
             .driver   = "usb-tablet",\
             .property = "usb_version",\
             .value    = stringify(1),\
+        },{ \
+            .driver   = "virtio-net-pci", \
+            .property = "mq", \
+            .value    = "off", \
         }
 
 static QEMUMachine pc_machine_v1_3 = {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2012-12-28 10:32 ` [PATCH 10/12] virtio-net: multiqueue support Jason Wang
@ 2012-12-28 17:52   ` Blue Swirl
  2013-01-04  5:12     ` Jason Wang
  2013-01-08  9:07   ` [Qemu-devel] " Wanlong Gao
  1 sibling, 1 reply; 58+ messages in thread
From: Blue Swirl @ 2012-12-28 17:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On Fri, Dec 28, 2012 at 10:32 AM, Jason Wang <jasowang@redhat.com> wrote:
> This patch implements both userspace and vhost support for multiqueue
> virtio-net (VIRTIO_NET_F_MQ). This is done by introducing an array of
> VirtIONetQueues into VirtIONet.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/virtio-net.c |  318 ++++++++++++++++++++++++++++++++++++++++++-------------
>  hw/virtio-net.h |   27 +++++-
>  2 files changed, 271 insertions(+), 74 deletions(-)
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index c6f0915..aaeef1b 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -45,7 +45,7 @@ typedef struct VirtIONet
>      VirtIODevice vdev;
>      uint8_t mac[ETH_ALEN];
>      uint16_t status;
> -    VirtIONetQueue vq;
> +    VirtIONetQueue vqs[MAX_QUEUE_NUM];
>      VirtQueue *ctrl_vq;
>      NICState *nic;
>      uint32_t tx_timeout;
> @@ -70,14 +70,23 @@ typedef struct VirtIONet
>      } mac_table;
>      uint32_t *vlans;
>      DeviceState *qdev;
> +    int multiqueue;
> +    uint16_t max_queues;
> +    uint16_t curr_queues;
>  } VirtIONet;
>
> -static VirtIONetQueue *virtio_net_get_queue(NetClientState *nc)
> +static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc)
>  {
>      VirtIONet *n = qemu_get_nic_opaque(nc);
>
> -    return &n->vq;
> +    return &n->vqs[nc->queue_index];
>  }
> +
> +static int vq2q(int queue_index)
> +{
> +    return queue_index / 2;
> +}
> +
>  /* TODO
>   * - we could suppress RX interrupt if we were so inclined.
>   */
> @@ -93,6 +102,7 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
>      struct virtio_net_config netcfg;
>
>      stw_p(&netcfg.status, n->status);
> +    stw_p(&netcfg.max_virtqueue_pairs, n->max_queues);
>      memcpy(netcfg.mac, n->mac, ETH_ALEN);
>      memcpy(config, &netcfg, sizeof(netcfg));
>  }
> @@ -116,31 +126,33 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
>          (n->status & VIRTIO_NET_S_LINK_UP) && n->vdev.vm_running;
>  }
>
> -static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> +static void virtio_net_vhost_status(VirtIONet *n, int queue_index,
> +                                    uint8_t status)
>  {
> -    VirtIONetQueue *q = &n->vq;
> +    NetClientState *nc = qemu_get_subqueue(n->nic, queue_index);
> +    VirtIONetQueue *q = &n->vqs[queue_index];
>
> -    if (!qemu_get_queue(n->nic)->peer) {
> +    if (!nc->peer) {
>          return;
>      }
> -    if (qemu_get_queue(n->nic)->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
> +    if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
>          return;
>      }
>
> -    if (!tap_get_vhost_net(qemu_get_queue(n->nic)->peer)) {
> +    if (!tap_get_vhost_net(nc->peer)) {
>          return;
>      }
> -    if (!!q->vhost_started == virtio_net_started(n, status) &&
> -                              !qemu_get_queue(n->nic)->peer->link_down) {
> +    if (!!q->vhost_started ==
> +        (virtio_net_started(n, status) && !nc->peer->link_down)) {
>          return;
>      }
>      if (!q->vhost_started) {
>          int r;
> -        if (!vhost_net_query(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev)) {
> +        if (!vhost_net_query(tap_get_vhost_net(nc->peer), &n->vdev)) {
>              return;
>          }
> -        r = vhost_net_start(tap_get_vhost_net(qemu_get_queue(n->nic)->peer),
> -                            &n->vdev, 0);
> +        r = vhost_net_start(tap_get_vhost_net(nc->peer), &n->vdev,
> +                            queue_index * 2);
>          if (r < 0) {
>              error_report("unable to start vhost net: %d: "
>                           "falling back on userspace virtio", -r);
> @@ -148,7 +160,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>              q->vhost_started = 1;
>          }
>      } else {
> -        vhost_net_stop(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), &n->vdev);
> +        vhost_net_stop(tap_get_vhost_net(nc->peer), &n->vdev);
>          q->vhost_started = 0;
>      }
>  }
> @@ -156,26 +168,35 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>  static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIONet *n = to_virtio_net(vdev);
> -    VirtIONetQueue *q = &n->vq;
> +    int i;
>
> -    virtio_net_vhost_status(n, status);
> +    for (i = 0; i < n->max_queues; i++) {
> +        VirtIONetQueue *q = &n->vqs[i];
> +        uint8_t queue_status = status;
>
> -    if (!q->tx_waiting) {
> -        return;
> -    }
> +        if ((!n->multiqueue && i != 0) || i >= n->curr_queues) {
> +            queue_status = 0;
> +        }
>
> -    if (virtio_net_started(n, status) && !q->vhost_started) {
> -        if (q->tx_timer) {
> -            qemu_mod_timer(q->tx_timer,
> -                           qemu_get_clock_ns(vm_clock) + n->tx_timeout);
> -        } else {
> -            qemu_bh_schedule(q->tx_bh);
> +        virtio_net_vhost_status(n, i, queue_status);
> +
> +        if (!q->tx_waiting) {
> +            continue;
>          }
> -    } else {
> -        if (q->tx_timer) {
> -            qemu_del_timer(q->tx_timer);
> +
> +        if (virtio_net_started(n, status) && !q->vhost_started) {
> +            if (q->tx_timer) {
> +                qemu_mod_timer(q->tx_timer,
> +                               qemu_get_clock_ns(vm_clock) + n->tx_timeout);
> +            } else {
> +                qemu_bh_schedule(q->tx_bh);
> +            }
>          } else {
> -            qemu_bh_cancel(q->tx_bh);
> +            if (q->tx_timer) {
> +                qemu_del_timer(q->tx_timer);
> +            } else {
> +                qemu_bh_cancel(q->tx_bh);
> +            }
>          }
>      }
>  }
> @@ -207,6 +228,8 @@ static void virtio_net_reset(VirtIODevice *vdev)
>      n->nomulti = 0;
>      n->nouni = 0;
>      n->nobcast = 0;
> +    /* multiqueue is disabled by default */
> +    n->curr_queues = 1;
>
>      /* Flush any MAC and VLAN filter table state */
>      n->mac_table.in_use = 0;
> @@ -245,18 +268,72 @@ static int peer_has_ufo(VirtIONet *n)
>
>  static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs)
>  {
> +    int i;
> +    NetClientState *nc;
> +
>      n->mergeable_rx_bufs = mergeable_rx_bufs;
>
>      n->guest_hdr_len = n->mergeable_rx_bufs ?
>          sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr);
>
> -    if (peer_has_vnet_hdr(n) &&
> -        tap_has_vnet_hdr_len(qemu_get_queue(n->nic)->peer, n->guest_hdr_len)) {
> -        tap_set_vnet_hdr_len(qemu_get_queue(n->nic)->peer, n->guest_hdr_len);
> -        n->host_hdr_len = n->guest_hdr_len;
> +    for (i = 0; i < n->max_queues; i++) {
> +        nc = qemu_get_subqueue(n->nic, i);
> +
> +        if (peer_has_vnet_hdr(n) &&
> +            tap_has_vnet_hdr_len(nc->peer, n->guest_hdr_len)) {
> +            tap_set_vnet_hdr_len(nc->peer, n->guest_hdr_len);
> +            n->host_hdr_len = n->guest_hdr_len;
> +        }
>      }
>  }
>
> +static int peer_attach(VirtIONet *n, int index)
> +{
> +    NetClientState *nc = qemu_get_subqueue(n->nic, index);
> +    int ret;
> +
> +    if (!nc->peer) {
> +        ret = -1;
> +    } else if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
> +        ret = -1;
> +    } else {
> +        ret = tap_attach(nc->peer);
> +    }
> +
> +    return ret;
> +}
> +
> +static int peer_detach(VirtIONet *n, int index)
> +{
> +    NetClientState *nc = qemu_get_subqueue(n->nic, index);
> +    int ret;
> +
> +    if (!nc->peer) {
> +        ret = -1;
> +    } else if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
> +        ret = -1;
> +    } else {
> +        ret = tap_detach(nc->peer);
> +    }
> +
> +    return ret;
> +}
> +
> +static void virtio_net_set_queues(VirtIONet *n)
> +{
> +    int i;
> +
> +    for (i = 0; i < n->max_queues; i++) {
> +        if (i < n->curr_queues) {
> +            assert(!peer_attach(n, i));
> +        } else {
> +            assert(!peer_detach(n, i));
> +        }
> +    }
> +}
> +
> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
> +
>  static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
>  {
>      VirtIONet *n = to_virtio_net(vdev);
> @@ -308,25 +385,33 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
>  static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
>  {
>      VirtIONet *n = to_virtio_net(vdev);
> +    int i;
> +
> +    virtio_net_set_multiqueue(n, !!(features & (1 << VIRTIO_NET_F_MQ)),
> +                              !!(features & (1 << VIRTIO_NET_F_CTRL_VQ)));
>
>      virtio_net_set_mrg_rx_bufs(n, !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF)));
>
>      if (n->has_vnet_hdr) {
> -        tap_set_offload(qemu_get_queue(n->nic)->peer,
> +        tap_set_offload(qemu_get_subqueue(n->nic, 0)->peer,
>                          (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
>                          (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
>                          (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
>                          (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
>                          (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
>      }
> -    if (!qemu_get_queue(n->nic)->peer ||
> -        qemu_get_queue(n->nic)->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
> -        return;
> -    }
> -    if (!tap_get_vhost_net(qemu_get_queue(n->nic)->peer)) {
> -        return;
> +
> +    for (i = 0;  i < n->max_queues; i++) {
> +        NetClientState *nc = qemu_get_subqueue(n->nic, i);
> +
> +        if (!nc->peer || nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
> +            continue;
> +        }
> +        if (!tap_get_vhost_net(nc->peer)) {
> +            continue;
> +        }
> +        vhost_net_ack_features(tap_get_vhost_net(nc->peer), features);
>      }
> -    vhost_net_ack_features(tap_get_vhost_net(qemu_get_queue(n->nic)->peer), features);
>  }
>
>  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> @@ -436,6 +521,35 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>      return VIRTIO_NET_OK;
>  }
>
> +static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
> +                                VirtQueueElement *elem)
> +{
> +    struct virtio_net_ctrl_mq s;
> +
> +    if (elem->out_num != 2 ||
> +        elem->out_sg[1].iov_len != sizeof(struct virtio_net_ctrl_mq)) {
> +        error_report("virtio-net ctrl invalid steering command");
> +        return VIRTIO_NET_ERR;
> +    }
> +
> +    if (cmd != VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
> +        return VIRTIO_NET_ERR;
> +    }
> +
> +    memcpy(&s, elem->out_sg[1].iov_base, sizeof(struct virtio_net_ctrl_mq));
> +
> +    if (s.virtqueue_pairs < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN ||
> +        s.virtqueue_pairs > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX ||
> +        s.virtqueue_pairs > n->max_queues) {
> +        return VIRTIO_NET_ERR;
> +    }
> +
> +    n->curr_queues = s.virtqueue_pairs;
> +    virtio_net_set_queues(n);
> +    virtio_net_set_status(&n->vdev, n->vdev.status);
> +
> +    return VIRTIO_NET_OK;
> +}
>  static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      VirtIONet *n = to_virtio_net(vdev);
> @@ -464,6 +578,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>              status = virtio_net_handle_mac(n, ctrl.cmd, &elem);
>          else if (ctrl.class == VIRTIO_NET_CTRL_VLAN)
>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, &elem);
> +        else if (ctrl.class == VIRTIO_NET_CTRL_MQ)

Please add braces.
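
Something like (a sketch; converting the pre-existing unbraced branches
would be a separate cleanup):

    else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
        status = virtio_net_handle_mq(n, ctrl.cmd, &elem);
    }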

> +            status = virtio_net_handle_mq(n, ctrl.cmd, &elem);
>
>          stb_p(elem.in_sg[elem.in_num - 1].iov_base, status);
>
> @@ -477,19 +593,24 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>  static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      VirtIONet *n = to_virtio_net(vdev);
> +    int queue_index = vq2q(virtio_get_queue_index(vq));
>
> -    qemu_flush_queued_packets(qemu_get_queue(n->nic));
> +    qemu_flush_queued_packets(qemu_get_subqueue(n->nic, queue_index));
>  }
>
>  static int virtio_net_can_receive(NetClientState *nc)
>  {
>      VirtIONet *n = qemu_get_nic_opaque(nc);
> -    VirtIONetQueue *q = virtio_net_get_queue(nc);
> +    VirtIONetQueue *q = virtio_net_get_subqueue(nc);
>
>      if (!n->vdev.vm_running) {
>          return 0;
>      }
>
> +    if (nc->queue_index >= n->curr_queues) {
> +        return 0;
> +    }
> +
>      if (!virtio_queue_ready(q->rx_vq) ||
>          !(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>          return 0;
> @@ -620,14 +741,15 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
>  static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
>  {
>      VirtIONet *n = qemu_get_nic_opaque(nc);
> -    VirtIONetQueue *q = virtio_net_get_queue(nc);
> +    VirtIONetQueue *q = virtio_net_get_subqueue(nc);
>      struct iovec mhdr_sg[VIRTQUEUE_MAX_SIZE];
>      struct virtio_net_hdr_mrg_rxbuf mhdr;
>      unsigned mhdr_cnt = 0;
>      size_t offset, i, guest_offset;
>
> -    if (!virtio_net_can_receive(qemu_get_queue(n->nic)))
> +    if (!virtio_net_can_receive(nc)) {
>          return -1;
> +    }
>
>      /* hdr_len refers to the header we supply to the guest */
>      if (!virtio_net_has_buffers(q, size + n->guest_hdr_len - n->host_hdr_len)) {
> @@ -720,7 +842,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q);
>  static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
>  {
>      VirtIONet *n = qemu_get_nic_opaque(nc);
> -    VirtIONetQueue *q = virtio_net_get_queue(nc);
> +    VirtIONetQueue *q = virtio_net_get_subqueue(nc);
>
>      virtqueue_push(q->tx_vq, &q->async_tx.elem, 0);
>      virtio_notify(&n->vdev, q->tx_vq);
> @@ -737,6 +859,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>      VirtIONet *n = q->n;
>      VirtQueueElement elem;
>      int32_t num_packets = 0;
> +    int queue_index = vq2q(virtio_get_queue_index(q->tx_vq));
>      if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>          return num_packets;
>      }
> @@ -778,8 +901,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>
>          len = n->guest_hdr_len;
>
> -        ret = qemu_sendv_packet_async(qemu_get_queue(n->nic), out_sg, out_num,
> -                                      virtio_net_tx_complete);
> +        ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
> +                                      out_sg, out_num, virtio_net_tx_complete);
>          if (ret == 0) {
>              virtio_queue_set_notification(q->tx_vq, 0);
>              q->async_tx.elem = elem;
> @@ -802,7 +925,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>  static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      VirtIONet *n = to_virtio_net(vdev);
> -    VirtIONetQueue *q = &n->vq;
> +    VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
>
>      /* This happens when device was stopped but VCPU wasn't. */
>      if (!n->vdev.vm_running) {
> @@ -826,7 +949,7 @@ static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
>  static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      VirtIONet *n = to_virtio_net(vdev);
> -    VirtIONetQueue *q = &n->vq;
> +    VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
>
>      if (unlikely(q->tx_waiting)) {
>          return;
> @@ -894,10 +1017,49 @@ static void virtio_net_tx_bh(void *opaque)
>      }
>  }
>
> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
> +{
> +    VirtIODevice *vdev = &n->vdev;
> +    int i;
> +
> +    n->multiqueue = multiqueue;
> +
> +    if (!multiqueue)
> +        n->curr_queues = 1;

Ditto. Didn't checkpatch.pl catch these or did you not check?
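
I.e.:

    if (!multiqueue) {
        n->curr_queues = 1;
    }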

> +
> +    for (i = 2; i <= n->max_queues * 2 + 1; i++) {
> +        virtio_del_queue(vdev, i);
> +    }
> +
> +    for (i = 1; i < n->max_queues; i++) {
> +        n->vqs[i].rx_vq = virtio_add_queue(vdev, 256, virtio_net_handle_rx);
> +        if (n->vqs[i].tx_timer) {
> +            n->vqs[i].tx_vq =
> +                virtio_add_queue(vdev, 256, virtio_net_handle_tx_timer);
> +            n->vqs[i].tx_timer = qemu_new_timer_ns(vm_clock,
> +                                                   virtio_net_tx_timer,
> +                                                   &n->vqs[i]);
> +        } else {
> +            n->vqs[i].tx_vq =
> +                virtio_add_queue(vdev, 256, virtio_net_handle_tx_bh);
> +            n->vqs[i].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[i]);
> +        }
> +
> +        n->vqs[i].tx_waiting = 0;
> +        n->vqs[i].n = n;
> +    }
> +
> +    if (ctrl) {
> +        n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);
> +    }
> +
> +    virtio_net_set_queues(n);
> +}
> +
>  static void virtio_net_save(QEMUFile *f, void *opaque)
>  {
>      VirtIONet *n = opaque;
> -    VirtIONetQueue *q = &n->vq;
> +    VirtIONetQueue *q = &n->vqs[0];
>
>      /* At this point, backend must be stopped, otherwise
>       * it might keep writing to memory. */
> @@ -926,9 +1088,8 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
>  static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      VirtIONet *n = opaque;
> -    VirtIONetQueue *q = &n->vq;
> -    int i;
> -    int ret;
> +    VirtIONetQueue *q = &n->vqs[0];
> +    int ret, i;
>
>      if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
>          return -EINVAL;
> @@ -1044,6 +1205,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>                                virtio_net_conf *net)
>  {
>      VirtIONet *n;
> +    int i;
>
>      n = (VirtIONet *)virtio_common_init("virtio-net", VIRTIO_ID_NET,
>                                          sizeof(struct virtio_net_config),
> @@ -1056,8 +1218,11 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>      n->vdev.bad_features = virtio_net_bad_features;
>      n->vdev.reset = virtio_net_reset;
>      n->vdev.set_status = virtio_net_set_status;
> -    n->vq.rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
> -    n->vq.n = n;
> +    n->vqs[0].rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
> +    n->max_queues = conf->queues;
> +    n->curr_queues = 1;
> +    n->vqs[0].n = n;
> +    n->tx_timeout = net->txtimer;
>
>      if (net->tx && strcmp(net->tx, "timer") && strcmp(net->tx, "bh")) {
>          error_report("virtio-net: "
> @@ -1067,14 +1232,14 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>      }
>
>      if (net->tx && !strcmp(net->tx, "timer")) {
> -        n->vq.tx_vq = virtio_add_queue(&n->vdev, 256,
> -                                       virtio_net_handle_tx_timer);
> -        n->vq.tx_timer = qemu_new_timer_ns(vm_clock,
> -                                           virtio_net_tx_timer, &n->vq);
> -        n->tx_timeout = net->txtimer;
> +        n->vqs[0].tx_vq = virtio_add_queue(&n->vdev, 256,
> +                                           virtio_net_handle_tx_timer);
> +        n->vqs[0].tx_timer = qemu_new_timer_ns(vm_clock, virtio_net_tx_timer,
> +                                               &n->vqs[0]);
>      } else {
> -        n->vq.tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
> -        n->vq.tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vq);
> +        n->vqs[0].tx_vq = virtio_add_queue(&n->vdev, 256,
> +                                           virtio_net_handle_tx_bh);
> +        n->vqs[0].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[0]);
>      }
>      n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
>      qemu_macaddr_default_if_unset(&conf->macaddr);
> @@ -1084,7 +1249,9 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>      n->nic = qemu_new_nic(&net_virtio_info, conf, object_get_typename(OBJECT(dev)), dev->id, n);
>      peer_test_vnet_hdr(n);
>      if (peer_has_vnet_hdr(n)) {
> -        tap_using_vnet_hdr(qemu_get_queue(n->nic)->peer, 1);
> +        for (i = 0; i < n->max_queues; i++) {
> +            tap_using_vnet_hdr(qemu_get_subqueue(n->nic, i)->peer, 1);
> +        }
>          n->host_hdr_len = sizeof(struct virtio_net_hdr);
>      } else {
>          n->host_hdr_len = 0;
> @@ -1092,7 +1259,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>
>      qemu_format_nic_info_str(qemu_get_queue(n->nic), conf->macaddr.a);
>
> -    n->vq.tx_waiting = 0;
> +    n->vqs[0].tx_waiting = 0;
>      n->tx_burst = net->txburst;
>      virtio_net_set_mrg_rx_bufs(n, 0);
>      n->promisc = 1; /* for compatibility */
> @@ -1113,23 +1280,28 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>  void virtio_net_exit(VirtIODevice *vdev)
>  {
>      VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
> -    VirtIONetQueue *q = &n->vq;
> +    int i;
>
>      /* This will stop vhost backend if appropriate. */
>      virtio_net_set_status(vdev, 0);
>
> -    qemu_purge_queued_packets(qemu_get_queue(n->nic));
> -
>      unregister_savevm(n->qdev, "virtio-net", n);
>
>      g_free(n->mac_table.macs);
>      g_free(n->vlans);
>
> -    if (q->tx_timer) {
> -        qemu_del_timer(q->tx_timer);
> -        qemu_free_timer(q->tx_timer);
> -    } else {
> -        qemu_bh_delete(q->tx_bh);
> +    for (i = 0; i < n->max_queues; i++) {
> +        VirtIONetQueue *q = &n->vqs[i];
> +        NetClientState *nc = qemu_get_subqueue(n->nic, i);
> +
> +        qemu_purge_queued_packets(nc);
> +
> +        if (q->tx_timer) {
> +            qemu_del_timer(q->tx_timer);
> +            qemu_free_timer(q->tx_timer);
> +        } else {
> +            qemu_bh_delete(q->tx_bh);
> +        }
>      }
>
>      qemu_del_nic(n->nic);
> diff --git a/hw/virtio-net.h b/hw/virtio-net.h
> index 36aa463..bc5857a 100644
> --- a/hw/virtio-net.h
> +++ b/hw/virtio-net.h
> @@ -44,6 +44,8 @@
>  #define VIRTIO_NET_F_CTRL_RX    18      /* Control channel RX mode support */
>  #define VIRTIO_NET_F_CTRL_VLAN  19      /* Control channel VLAN filtering */
>  #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
> +#define VIRTIO_NET_F_MQ         22      /* Device supports Receive Flow
> +                                         * Steering */
>
>  #define VIRTIO_NET_S_LINK_UP    1       /* Link is up */
>
> @@ -72,6 +74,8 @@ struct virtio_net_config
>      uint8_t mac[ETH_ALEN];
>      /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
>      uint16_t status;
> +    /* Max virtqueue pairs supported by the device */
> +    uint16_t max_virtqueue_pairs;
>  } QEMU_PACKED;
>
>  /* This is the first element of the scatter-gather list.  If you don't
> @@ -168,6 +172,26 @@ struct virtio_net_ctrl_mac {
>   #define VIRTIO_NET_CTRL_VLAN_ADD             0
>   #define VIRTIO_NET_CTRL_VLAN_DEL             1
>
> +/*
> + * Control Multiqueue
> + *
> + * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET enables multiqueue,
> + * specifying the number of transmit and receive queues to be used.
> + * After the command is consumed and acked by the device, the device
> + * will not steer new packets to receive virtqueues other than those
> + * specified, nor read from transmit virtqueues other than those
> + * specified. Accordingly, the driver should not transmit new packets
> + * on virtqueues other than those specified.
> + */
> +struct virtio_net_ctrl_mq {

VirtIONetCtrlMQ and please don't forget the typedef.
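
E.g.:

    typedef struct VirtIONetCtrlMQ {
        uint16_t virtqueue_pairs;
    } VirtIONetCtrlMQ;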

> +    uint16_t virtqueue_pairs;
> +};
> +
> +#define VIRTIO_NET_CTRL_MQ   4
> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
> +
>  #define DEFINE_VIRTIO_NET_FEATURES(_state, _field) \
>          DEFINE_VIRTIO_COMMON_FEATURES(_state, _field), \
>          DEFINE_PROP_BIT("csum", _state, _field, VIRTIO_NET_F_CSUM, true), \
> @@ -186,5 +210,6 @@ struct virtio_net_ctrl_mac {
>          DEFINE_PROP_BIT("ctrl_vq", _state, _field, VIRTIO_NET_F_CTRL_VQ, true), \
>          DEFINE_PROP_BIT("ctrl_rx", _state, _field, VIRTIO_NET_F_CTRL_RX, true), \
>          DEFINE_PROP_BIT("ctrl_vlan", _state, _field, VIRTIO_NET_F_CTRL_VLAN, true), \
> -        DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true)
> +        DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true), \
> +        DEFINE_PROP_BIT("mq", _state, _field, VIRTIO_NET_F_MQ, true)
>  #endif
> --
> 1.7.1
>
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 05/12] net: multiqueue support
  2012-12-28 10:31 ` [PATCH 05/12] net: multiqueue support Jason Wang
@ 2012-12-28 18:06   ` Blue Swirl
  0 siblings, 0 replies; 58+ messages in thread
From: Blue Swirl @ 2012-12-28 18:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On Fri, Dec 28, 2012 at 10:31 AM, Jason Wang <jasowang@redhat.com> wrote:
> This patch adds basic multiqueue support for qemu. The idea is simple: an array
> of NetClientStates is introduced in NICState, and parse_netdev() is extended to
> find and match all NetClientStates belonging to the backend, placing their
> pointers in NICConf. Then qemu_new_nic() can set up an N:N mapping between the
> NetClientStates that belong to a nic and those that belong to the netdev. After
> this, each peer of a NICState is abstracted as a queue.
>
> To adapt to this change, the set_link/netdev_del commands find all the
> NetClientStates of a nic or a netdev and change all their states in one run.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/dp8393x.c         |    2 +-
>  hw/mcf_fec.c         |    2 +-
>  hw/qdev-properties.c |   46 +++++++++++--
>  hw/qdev-properties.h |    6 +-
>  net.c                |  172 +++++++++++++++++++++++++++++++++++++-------------
>  net.h                |   27 +++++++-
>  6 files changed, 195 insertions(+), 60 deletions(-)
>
> diff --git a/hw/dp8393x.c b/hw/dp8393x.c
> index 8f20a4a..fad0837 100644
> --- a/hw/dp8393x.c
> +++ b/hw/dp8393x.c
> @@ -899,7 +899,7 @@ void dp83932_init(NICInfo *nd, hwaddr base, int it_shift,
>      s->regs[SONIC_SR] = 0x0004; /* only revision recognized by Linux */
>
>      s->conf.macaddr = nd->macaddr;
> -    s->conf.peer = nd->netdev;
> +    s->conf.peers.ncs[0] = nd->netdev;
>
>      s->nic = qemu_new_nic(&net_dp83932_info, &s->conf, nd->model, nd->name, s);
>
> diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c
> index 7fc89b5..c298bec 100644
> --- a/hw/mcf_fec.c
> +++ b/hw/mcf_fec.c
> @@ -472,7 +472,7 @@ void mcf_fec_init(MemoryRegion *sysmem, NICInfo *nd,
>      memory_region_add_subregion(sysmem, base, &s->iomem);
>
>      s->conf.macaddr = nd->macaddr;
> -    s->conf.peer = nd->netdev;
> +    s->conf.peers[0] = nd->netdev;
>
>      s->nic = qemu_new_nic(&net_mcf_fec_info, &s->conf, nd->model, nd->name, s);
>
> diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
> index 81d901c..6e45def 100644
> --- a/hw/qdev-properties.c
> +++ b/hw/qdev-properties.c
> @@ -585,16 +585,47 @@ PropertyInfo qdev_prop_chr = {
>
>  static int parse_netdev(DeviceState *dev, const char *str, void **ptr)
>  {
> -    NetClientState *netdev = qemu_find_netdev(str);
> +    NICPeers *peers_ptr = (NICPeers *)ptr;
> +    NICConf *conf = container_of(peers_ptr, NICConf, peers);
> +    NetClientState **ncs = peers_ptr->ncs;
> +    NetClientState *peers[MAX_QUEUE_NUM];
> +    int queues, i = 0;
> +    int ret;
>
> -    if (netdev == NULL) {
> -        return -ENOENT;
> +    queues = qemu_find_net_clients_except(str, peers,
> +                                          NET_CLIENT_OPTIONS_KIND_NIC,
> +                                          MAX_QUEUE_NUM);
> +    if (queues == 0) {
> +        ret = -ENOENT;
> +        goto err;
>      }
> -    if (netdev->peer) {
> -        return -EEXIST;
> +
> +    if (queues > MAX_QUEUE_NUM) {
> +        ret = -E2BIG;
> +        goto err;
> +    }
> +
> +    for (i = 0; i < queues; i++) {
> +        if (peers[i] == NULL) {
> +            ret = -ENOENT;
> +            goto err;
> +        }
> +
> +        if (peers[i]->peer) {
> +            ret = -EEXIST;
> +            goto err;
> +        }
> +
> +        ncs[i] = peers[i];
> +        ncs[i]->queue_index = i;
>      }
> -    *ptr = netdev;
> +
> +    conf->queues = queues;
> +
>      return 0;
> +
> +err:
> +    return ret;
>  }
>
>  static const char *print_netdev(void *ptr)
> @@ -661,7 +692,8 @@ static void set_vlan(Object *obj, Visitor *v, void *opaque,
>  {
>      DeviceState *dev = DEVICE(obj);
>      Property *prop = opaque;
> -    NetClientState **ptr = qdev_get_prop_ptr(dev, prop);
> +    NICPeers *peers_ptr = qdev_get_prop_ptr(dev, prop);
> +    NetClientState **ptr = &peers_ptr->ncs[0];
>      Error *local_err = NULL;
>      int32_t id;
>      NetClientState *hubport;
> diff --git a/hw/qdev-properties.h b/hw/qdev-properties.h
> index 5b046ab..2d90848 100644
> --- a/hw/qdev-properties.h
> +++ b/hw/qdev-properties.h
> @@ -31,7 +31,7 @@ extern PropertyInfo qdev_prop_pci_host_devaddr;
>          .name      = (_name),                                    \
>          .info      = &(_prop),                                   \
>          .offset    = offsetof(_state, _field)                    \
> -            + type_check(_type,typeof_field(_state, _field)),    \
> +            + type_check(_type, typeof_field(_state, _field)),   \
>          }
>  #define DEFINE_PROP_DEFAULT(_name, _state, _field, _defval, _prop, _type) { \
>          .name      = (_name),                                           \
> @@ -77,9 +77,9 @@ extern PropertyInfo qdev_prop_pci_host_devaddr;
>  #define DEFINE_PROP_STRING(_n, _s, _f)             \
>      DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*)
>  #define DEFINE_PROP_NETDEV(_n, _s, _f)             \
> -    DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, NetClientState*)
> +    DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, NICPeers)
>  #define DEFINE_PROP_VLAN(_n, _s, _f)             \
> -    DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, NetClientState*)
> +    DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, NICPeers)
>  #define DEFINE_PROP_DRIVE(_n, _s, _f) \
>      DEFINE_PROP(_n, _s, _f, qdev_prop_drive, BlockDriverState *)
>  #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
> diff --git a/net.c b/net.c
> index 97ee542..4ceba33 100644
> --- a/net.c
> +++ b/net.c
> @@ -181,17 +181,12 @@ static char *assign_name(NetClientState *nc1, const char *model)
>      return g_strdup(buf);
>  }
>
> -NetClientState *qemu_new_net_client(NetClientInfo *info,
> -                                    NetClientState *peer,
> -                                    const char *model,
> -                                    const char *name)
> +void qemu_net_client_setup(NetClientState *nc,
> +                           NetClientInfo *info,
> +                           NetClientState *peer,
> +                           const char *model,
> +                           const char *name)
>  {
> -    NetClientState *nc;
> -
> -    assert(info->size >= sizeof(NetClientState));
> -
> -    nc = g_malloc0(info->size);
> -
>      nc->info = info;
>      nc->model = g_strdup(model);
>      if (name) {
> @@ -208,6 +203,20 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
>      QTAILQ_INSERT_TAIL(&net_clients, nc, next);
>
>      nc->send_queue = qemu_new_net_queue(nc);
> +}
> +
> +
> +NetClientState *qemu_new_net_client(NetClientInfo *info,
> +                                    NetClientState *peer,
> +                                    const char *model,
> +                                    const char *name)
> +{
> +    NetClientState *nc;
> +
> +    assert(info->size >= sizeof(NetClientState));
> +
> +    nc = g_malloc0(info->size);
> +    qemu_net_client_setup(nc, info, peer, model, name);
>
>      return nc;
>  }
> @@ -219,28 +228,43 @@ NICState *qemu_new_nic(NetClientInfo *info,
>                         void *opaque)
>  {
>      NetClientState *nc;
> +    NetClientState **peers = conf->peers.ncs;
>      NICState *nic;
> +    int i;
>
>      assert(info->type == NET_CLIENT_OPTIONS_KIND_NIC);
>      assert(info->size >= sizeof(NICState));
>
> -    nc = qemu_new_net_client(info, conf->peer, model, name);
> +    nc = qemu_new_net_client(info, peers[0], model, name);
> +    nc->queue_index = 0;
>
>      nic = qemu_get_nic(nc);
>      nic->conf = conf;
>      nic->opaque = opaque;
>
> +    for (i = 1; i < conf->queues; i++) {
> +        qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, nc->name);
> +        nic->ncs[i].queue_index = i;
> +    }
> +
>      return nic;
>  }
>
> +NetClientState *qemu_get_subqueue(NICState *nic, int queue_index)
> +{
> +    return &nic->ncs[queue_index];
> +}
> +
>  NetClientState *qemu_get_queue(NICState *nic)
>  {
> -    return &nic->nc;
> +    return qemu_get_subqueue(nic, 0);
>  }
>
>  NICState *qemu_get_nic(NetClientState *nc)
>  {
> -    return DO_UPCAST(NICState, nc, nc);
> +    NetClientState *nc0 = nc - nc->queue_index;
> +
> +    return DO_UPCAST(NICState, ncs[0], nc0);
>  }
>
>  void *qemu_get_nic_opaque(NetClientState *nc)
> @@ -254,12 +278,10 @@ static void qemu_cleanup_net_client(NetClientState *nc)
>  {
>      QTAILQ_REMOVE(&net_clients, nc, next);
>
> -    if (nc->info->cleanup) {
> -        nc->info->cleanup(nc);
> -    }
> +    nc->info->cleanup(nc);
>  }
>
> -static void qemu_free_net_client(NetClientState *nc)
> +static void qemu_free_net_client(NetClientState *nc, bool free)
>  {
>      if (nc->send_queue) {
>          qemu_del_net_queue(nc->send_queue);
> @@ -269,11 +291,24 @@ static void qemu_free_net_client(NetClientState *nc)
>      }
>      g_free(nc->name);
>      g_free(nc->model);
> -    g_free(nc);
> +
> +    if (free)

Missing braces.
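
I.e.:

    if (free) {
        g_free(nc);
    }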

> +        g_free(nc);
>  }
>
>  void qemu_del_net_client(NetClientState *nc)
>  {
> +    NetClientState *ncs[MAX_QUEUE_NUM];
> +    int queues, i;
> +
> +    /* If the NetClientState belongs to a multiqueue backend, we will change all
> +     * other NetClientStates also.
> +     */
> +    queues = qemu_find_net_clients_except(nc->name, ncs,
> +                                          NET_CLIENT_OPTIONS_KIND_NIC,
> +                                          MAX_QUEUE_NUM);
> +    assert(queues != 0);
> +
>      /* If there is a peer NIC, delete and cleanup client, but do not free. */
>      if (nc->peer && nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
>          NICState *nic = qemu_get_nic(nc->peer);
> @@ -281,34 +316,50 @@ void qemu_del_net_client(NetClientState *nc)
>              return;
>          }
>          nic->peer_deleted = true;
> -        /* Let NIC know peer is gone. */
> -        nc->peer->link_down = true;
> +
> +        for (i = 0; i < queues; i++) {
> +            ncs[i]->peer->link_down = true;
> +        }
> +
>          if (nc->peer->info->link_status_changed) {
>              nc->peer->info->link_status_changed(nc->peer);
>          }
> -        qemu_cleanup_net_client(nc);
> +
> +        for (i = 0; i < queues; i++) {
> +            qemu_cleanup_net_client(ncs[i]);
> +        }
> +
>          return;
>      }
>
>      assert(nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC);
>
> -    qemu_cleanup_net_client(nc);
> -    qemu_free_net_client(nc);
> +    for (i = 0; i < queues; i++) {
> +        qemu_cleanup_net_client(ncs[i]);
> +        qemu_free_net_client(ncs[i], true);
> +    }
>  }
>
>  void qemu_del_nic(NICState *nic)
>  {
> -    NetClientState *nc = qemu_get_queue(nic);
> +    int i, queues = nic->conf->queues;
> +
>      /* If this is a peer NIC and peer has already been deleted, free it now. */
> -    if (nc->peer && nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
> -        NICState *nic = qemu_get_nic(nc);
> -        if (nic->peer_deleted) {
> -            qemu_free_net_client(nc->peer);
> +    if (nic->peer_deleted) {
> +        for (i = 0; i < queues; i++) {
> +            qemu_free_net_client(qemu_get_subqueue(nic, i)->peer, true);
>          }
>      }
>
> -    qemu_cleanup_net_client(nc);
> -    qemu_free_net_client(nc);
> +    for (i = 1; i < queues; i++) {
> +        NetClientState *nc = qemu_get_subqueue(nic, i);
> +
> +        qemu_cleanup_net_client(nc);
> +        qemu_free_net_client(nc, false);
> +    }
> +
> +    qemu_cleanup_net_client(qemu_get_subqueue(nic, 0));
> +    qemu_free_net_client(qemu_get_subqueue(nic, 0), true);
>  }
>
>  void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
> @@ -317,7 +368,9 @@ void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
>
>      QTAILQ_FOREACH(nc, &net_clients, next) {
>          if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
> -            func(qemu_get_nic(nc), opaque);
> +            if (nc->queue_index == 0) {
> +                func(qemu_get_nic(nc), opaque);
> +            }
>          }
>      }
>  }
> @@ -507,6 +560,27 @@ NetClientState *qemu_find_netdev(const char *id)
>      return NULL;
>  }
>
> +int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
> +                                 NetClientOptionsKind type, int max)
> +{
> +    NetClientState *nc;
> +    int ret = 0;
> +
> +    QTAILQ_FOREACH(nc, &net_clients, next) {
> +        if (nc->info->type == type) {
> +            continue;
> +        }
> +        if (!strcmp(nc->name, id)) {
> +            if (ret < max) {
> +                ncs[ret] = nc;
> +            }
> +            ret++;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
>  static int nic_get_free_idx(void)
>  {
>      int index;
> @@ -873,8 +947,11 @@ void qmp_netdev_del(const char *id, Error **errp)
>
>  void print_net_client(Monitor *mon, NetClientState *nc)
>  {
> -    monitor_printf(mon, "%s: type=%s,%s\n", nc->name,
> -                   NetClientOptionsKind_lookup[nc->info->type], nc->info_str);
> +    monitor_printf(mon, "%s: index=%d,type=%s,%s,link=%s\n", nc->name,
> +                       nc->queue_index,
> +                       NetClientOptionsKind_lookup[nc->info->type],
> +                       nc->info_str,
> +                       nc->link_down ? "down" : "up");
>  }
>
>  void do_info_network(Monitor *mon)
> @@ -905,20 +982,23 @@ void do_info_network(Monitor *mon)
>
>  void qmp_set_link(const char *name, bool up, Error **errp)
>  {
> -    NetClientState *nc = NULL;
> +    NetClientState *ncs[MAX_QUEUE_NUM];
> +    NetClientState *nc;
> +    int queues, i;
>
> -    QTAILQ_FOREACH(nc, &net_clients, next) {
> -        if (!strcmp(nc->name, name)) {
> -            goto done;
> -        }
> -    }
> -done:
> -    if (!nc) {
> +    queues = qemu_find_net_clients_except(name, ncs,
> +                                          NET_CLIENT_OPTIONS_KIND_MAX,
> +                                          MAX_QUEUE_NUM);
> +
> +    if (queues == 0) {
>          error_set(errp, QERR_DEVICE_NOT_FOUND, name);
>          return;
>      }
> +    nc = ncs[0];
>
> -    nc->link_down = !up;
> +    for (i = 0; i < queues; i++) {
> +        ncs[i]->link_down = !up;
> +    }
>
>      if (nc->info->link_status_changed) {
>          nc->info->link_status_changed(nc);
> @@ -938,9 +1018,13 @@ done:
>
>  void net_cleanup(void)
>  {
> -    NetClientState *nc, *next_vc;
> +    NetClientState *nc;
>
> -    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, next_vc) {
> +    /* We may delete multiple entries during qemu_del_net_client(),
> +     * so QTAILQ_FOREACH_SAFE() is also not safe here.
> +     */
> +    while (!QTAILQ_EMPTY(&net_clients)) {
> +        nc = QTAILQ_FIRST(&net_clients);
>          if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
>              qemu_del_nic(qemu_get_nic(nc));
>          } else {
> diff --git a/net.h b/net.h
> index 0d53337..6ff1afc 100644
> --- a/net.h
> +++ b/net.h
> @@ -9,24 +9,32 @@
>  #include "vmstate.h"
>  #include "qapi-types.h"
>
> +#define MAX_QUEUE_NUM 1024
> +
>  struct MACAddr {
>      uint8_t a[6];
>  };
>
>  /* qdev nic properties */
>
> +typedef struct NICPeers {
> +    NetClientState *ncs[MAX_QUEUE_NUM];
> +} NICPeers;
> +
>  typedef struct NICConf {
>      MACAddr macaddr;
> -    NetClientState *peer;
> +    NICPeers peers;
>      int32_t bootindex;
> +    int32_t queues;
>  } NICConf;
>
>  #define DEFINE_NIC_PROPERTIES(_state, _conf)                            \
>      DEFINE_PROP_MACADDR("mac",   _state, _conf.macaddr),                \
> -    DEFINE_PROP_VLAN("vlan",     _state, _conf.peer),                   \
> -    DEFINE_PROP_NETDEV("netdev", _state, _conf.peer),                   \
> +    DEFINE_PROP_VLAN("vlan",     _state, _conf.peers),                   \
> +    DEFINE_PROP_NETDEV("netdev", _state, _conf.peers),                   \
>      DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1)
>
> +
>  /* Net clients */
>
>  typedef void (NetPoll)(NetClientState *, bool enable);
> @@ -58,16 +66,26 @@ struct NetClientState {
>      char *name;
>      char info_str[256];
>      unsigned receive_disabled : 1;
> +    unsigned int queue_index;
>  };
>
>  typedef struct NICState {
> -    NetClientState nc;
> +    NetClientState ncs[MAX_QUEUE_NUM];
>      NICConf *conf;
>      void *opaque;
>      bool peer_deleted;
>  } NICState;
>
>  NetClientState *qemu_find_netdev(const char *id);
> +int qemu_find_net_clients_except(const char *id,
> +                                 NetClientState **ncs,
> +                                 NetClientOptionsKind type,
> +                                 int max);
> +void qemu_net_client_setup(NetClientState *nc,
> +                           NetClientInfo *info,
> +                           NetClientState *peer,
> +                           const char *model,
> +                           const char *name);
>  NetClientState *qemu_new_net_client(NetClientInfo *info,
>                                      NetClientState *peer,
>                                      const char *model,
> @@ -78,6 +96,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
>                         const char *name,
>                         void *opaque);
>  void qemu_del_nic(NICState *nic);
> +NetClientState *qemu_get_subqueue(NICState *nic, int queue_index);
>  NetClientState *qemu_get_queue(NICState *nic);
>  NICState *qemu_get_nic(NetClientState *nc);
>  void *qemu_get_nic_opaque(NetClientState *nc);
> --
> 1.7.1
>
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2012-12-28 17:52   ` Blue Swirl
@ 2013-01-04  5:12     ` Jason Wang
  2013-01-04 20:41       ` Blue Swirl
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-04  5:12 UTC (permalink / raw)
  To: Blue Swirl
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 12/29/2012 01:52 AM, Blue Swirl wrote:
> On Fri, Dec 28, 2012 at 10:32 AM, Jason Wang <jasowang@redhat.com> wrote:
>> This patch implements both userspace and vhost support for multiple queue
>> virtio-net (VIRTIO_NET_F_MQ). This is done by introducing an array of
>> VirtIONetQueue to VirtIONet.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  hw/virtio-net.c |  318 ++++++++++++++++++++++++++++++++++++++++++-------------
>>  hw/virtio-net.h |   27 +++++-
>>  2 files changed, 271 insertions(+), 74 deletions(-)
[...]
>>  static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>  {
>>      VirtIONet *n = to_virtio_net(vdev);
>> @@ -464,6 +578,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>              status = virtio_net_handle_mac(n, ctrl.cmd, &elem);
>>          else if (ctrl.class == VIRTIO_NET_CTRL_VLAN)
>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, &elem);
>> +        else if (ctrl.class == VIRTIO_NET_CTRL_MQ)
> Please add braces.

Sure.
>
>> +            status = virtio_net_handle_mq(n, ctrl.cmd, &elem);
>>
>>          stb_p(elem.in_sg[elem.in_num - 1].iov_base, status);
>>
>> @@ -477,19 +593,24 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>  static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
>>  {
>>      VirtIONet *n = to_virtio_net(vdev);
>> +    int queue_index = vq2q(virtio_get_queue_index(vq));
>>
>> -    qemu_flush_queued_packets(qemu_get_queue(n->nic));
>> +    qemu_flush_queued_packets(qemu_get_subqueue(n->nic, queue_index));
>>  }
>>
>>  
[...]
>>
>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
>> +{
>> +    VirtIODevice *vdev = &n->vdev;
>> +    int i;
>> +
>> +    n->multiqueue = multiqueue;
>> +
>> +    if (!multiqueue)
>> +        n->curr_queues = 1;
> Ditto. Didn't checkpatch.pl catch these or did you not check?

Sorry, will add braces here. I ran checkpatch.pl but found that much of
the existing code (such as this file) does not obey the rules. So I'm
not sure whether I should correct my own code, or leave it consistent
with this file and fix them all in the future.
>
[...]
>>  } QEMU_PACKED;
>>
>>  /* This is the first element of the scatter-gather list.  If you don't
>> @@ -168,6 +172,26 @@ struct virtio_net_ctrl_mac {
>>   #define VIRTIO_NET_CTRL_VLAN_ADD             0
>>   #define VIRTIO_NET_CTRL_VLAN_DEL             1
>>
>> +/*
>> + * Control Multiqueue
>> + *
>> + * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
>> + * enables multiqueue, specifying the number of the transmit and
>> + * receive queues that will be used. After the command is consumed and acked by
>> + * the device, the device will not steer new packets on receive virtqueues
>> + * other than specified nor read from transmit virtqueues other than specified.
>> + * Accordingly, driver should not transmit new packets  on virtqueues other than
>> + * specified.
>> + */
>> +struct virtio_net_ctrl_mq {
> VirtIONetCtrlMQ and please don't forget the typedef.

Sure, but the same question as above. (See other structures in this file).
>
>> +    uint16_t virtqueue_pairs;
>> +};
>> +
>> +#define VIRTIO_NET_CTRL_MQ   4
>> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
>> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
>> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
>> +
>>  #define DEFINE_VIRTIO_NET_FEATURES(_state, _field) \
>>          DEFINE_VIRTIO_COMMON_FEATURES(_state, _field), \
>>          DEFINE_PROP_BIT("csum", _state, _field, VIRTIO_NET_F_CSUM, true), \
>> @@ -186,5 +210,6 @@ struct virtio_net_ctrl_mac {
>>          DEFINE_PROP_BIT("ctrl_vq", _state, _field, VIRTIO_NET_F_CTRL_VQ, true), \
>>          DEFINE_PROP_BIT("ctrl_rx", _state, _field, VIRTIO_NET_F_CTRL_RX, true), \
>>          DEFINE_PROP_BIT("ctrl_vlan", _state, _field, VIRTIO_NET_F_CTRL_VLAN, true), \
>> -        DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true)
>> +        DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true), \
>> +        DEFINE_PROP_BIT("mq", _state, _field, VIRTIO_NET_F_MQ, true)
>>  #endif
>> --
>> 1.7.1
>>
>>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2013-01-04  5:12     ` Jason Wang
@ 2013-01-04 20:41       ` Blue Swirl
  0 siblings, 0 replies; 58+ messages in thread
From: Blue Swirl @ 2013-01-04 20:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On Fri, Jan 4, 2013 at 5:12 AM, Jason Wang <jasowang@redhat.com> wrote:
> On 12/29/2012 01:52 AM, Blue Swirl wrote:
>> On Fri, Dec 28, 2012 at 10:32 AM, Jason Wang <jasowang@redhat.com> wrote:
>>> This patch implements both userspace and vhost support for multiple queue
>>> virtio-net (VIRTIO_NET_F_MQ). This is done by introducing an array of
>>> VirtIONetQueue to VirtIONet.
>>>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>>  hw/virtio-net.c |  318 ++++++++++++++++++++++++++++++++++++++++++-------------
>>>  hw/virtio-net.h |   27 +++++-
>>>  2 files changed, 271 insertions(+), 74 deletions(-)
> [...]
>>>  static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>>  {
>>>      VirtIONet *n = to_virtio_net(vdev);
>>> @@ -464,6 +578,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>>              status = virtio_net_handle_mac(n, ctrl.cmd, &elem);
>>>          else if (ctrl.class == VIRTIO_NET_CTRL_VLAN)
>>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, &elem);
>>> +        else if (ctrl.class == VIRTIO_NET_CTRL_MQ)
>> Please add braces.
>
> Sure.
>>
>>> +            status = virtio_net_handle_mq(n, ctrl.cmd, &elem);
>>>
>>>          stb_p(elem.in_sg[elem.in_num - 1].iov_base, status);
>>>
>>> @@ -477,19 +593,24 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>>  static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
>>>  {
>>>      VirtIONet *n = to_virtio_net(vdev);
>>> +    int queue_index = vq2q(virtio_get_queue_index(vq));
>>>
>>> -    qemu_flush_queued_packets(qemu_get_queue(n->nic));
>>> +    qemu_flush_queued_packets(qemu_get_subqueue(n->nic, queue_index));
>>>  }
>>>
>>>
> [...]
>>>
>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
>>> +{
>>> +    VirtIODevice *vdev = &n->vdev;
>>> +    int i;
>>> +
>>> +    n->multiqueue = multiqueue;
>>> +
>>> +    if (!multiqueue)
>>> +        n->curr_queues = 1;
>> Ditto. Didn't checkpatch.pl catch these or did you not check?
>
> Sorry, will add braces here. I run checkpatch.pl but finally find that
> some or lots of the existed codes (such as this file) does not obey the
> rules. So I'm not sure whether I need to correct my own codes, or left
> them as this file does and correct them all in the future.

The goal is to make the QEMU codebase conform to CODING_STYLE.
Currently this is not the case for some of the code, but we should use
opportunities like this to advance towards that goal.
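
For illustration, the braced form that CODING_STYLE asks for would look
something like this (just a sketch of the quoted snippets, not the
final patch):

    /* braces are required even around single-statement bodies */
    else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
        status = virtio_net_handle_mq(n, ctrl.cmd, &elem);
    }

    if (!multiqueue) {
        n->curr_queues = 1;
    }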

>>
> [...]
>>>  } QEMU_PACKED;
>>>
>>>  /* This is the first element of the scatter-gather list.  If you don't
>>> @@ -168,6 +172,26 @@ struct virtio_net_ctrl_mac {
>>>   #define VIRTIO_NET_CTRL_VLAN_ADD             0
>>>   #define VIRTIO_NET_CTRL_VLAN_DEL             1
>>>
>>> +/*
>>> + * Control Multiqueue
>>> + *
>>> + * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
>>> + * enables multiqueue, specifying the number of the transmit and
>>> + * receive queues that will be used. After the command is consumed and acked by
>>> + * the device, the device will not steer new packets on receive virtqueues
>>> + * other than specified nor read from transmit virtqueues other than specified.
>>> + * Accordingly, driver should not transmit new packets  on virtqueues other than
>>> + * specified.
>>> + */
>>> +struct virtio_net_ctrl_mq {
>> VirtIONetCtrlMQ and please don't forget the typedef.
>
> Sure, but the same question as above. (See other structures in this file).
>>
>>> +    uint16_t virtqueue_pairs;
>>> +};
>>> +
>>> +#define VIRTIO_NET_CTRL_MQ   4
>>> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
>>> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
>>> + #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
>>> +
>>>  #define DEFINE_VIRTIO_NET_FEATURES(_state, _field) \
>>>          DEFINE_VIRTIO_COMMON_FEATURES(_state, _field), \
>>>          DEFINE_PROP_BIT("csum", _state, _field, VIRTIO_NET_F_CSUM, true), \
>>> @@ -186,5 +210,6 @@ struct virtio_net_ctrl_mac {
>>>          DEFINE_PROP_BIT("ctrl_vq", _state, _field, VIRTIO_NET_F_CTRL_VQ, true), \
>>>          DEFINE_PROP_BIT("ctrl_rx", _state, _field, VIRTIO_NET_F_CTRL_RX, true), \
>>>          DEFINE_PROP_BIT("ctrl_vlan", _state, _field, VIRTIO_NET_F_CTRL_VLAN, true), \
>>> -        DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true)
>>> +        DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true), \
>>> +        DEFINE_PROP_BIT("mq", _state, _field, VIRTIO_NET_F_MQ, true)
>>>  #endif
>>> --
>>> 1.7.1
>>>
>>>
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 11/12] virtio-net: migration support for multiqueue
  2012-12-28 10:32 ` [PATCH 11/12] virtio-net: migration support for multiqueue Jason Wang
@ 2013-01-08  7:10   ` Michael S. Tsirkin
  2013-01-08  9:27     ` Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Michael S. Tsirkin @ 2013-01-08  7:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: aliguori, stefanha, qemu-devel, rusty, kvm, mprivozn, shiyer,
	krkumar2, jwhan

On Fri, Dec 28, 2012 at 06:32:03PM +0800, Jason Wang wrote:
> This patch adds migration support for multiqueue virtio-net. The version was
> bumped to 12.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/virtio-net.c |   45 +++++++++++++++++++++++++++++++++++----------
>  1 files changed, 35 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index aaeef1b..ca4b804 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -21,7 +21,7 @@
>  #include "virtio-net.h"
>  #include "vhost_net.h"
>  
> -#define VIRTIO_NET_VM_VERSION    11
> +#define VIRTIO_NET_VM_VERSION    12

Please don't; use a subsection instead.

>  #define MAC_TABLE_ENTRIES    64
>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
> @@ -1058,16 +1058,18 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
>  
>  static void virtio_net_save(QEMUFile *f, void *opaque)
>  {
> +    int i;
>      VirtIONet *n = opaque;
> -    VirtIONetQueue *q = &n->vqs[0];
>  
> -    /* At this point, backend must be stopped, otherwise
> -     * it might keep writing to memory. */
> -    assert(!q->vhost_started);
> +    for (i = 0; i < n->max_queues; i++) {
> +        /* At this point, backend must be stopped, otherwise
> +         * it might keep writing to memory. */
> +        assert(!n->vqs[i].vhost_started);
> +    }
>      virtio_save(&n->vdev, f);
>  
>      qemu_put_buffer(f, n->mac, ETH_ALEN);
> -    qemu_put_be32(f, q->tx_waiting);
> +    qemu_put_be32(f, n->vqs[0].tx_waiting);
>      qemu_put_be32(f, n->mergeable_rx_bufs);
>      qemu_put_be16(f, n->status);
>      qemu_put_byte(f, n->promisc);
> @@ -1083,13 +1085,17 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
>      qemu_put_byte(f, n->nouni);
>      qemu_put_byte(f, n->nobcast);
>      qemu_put_byte(f, n->has_ufo);
> +    qemu_put_be16(f, n->max_queues);

Above is specified by user so seems unnecessary in the migration stream.

Below should only be put if relevant: check host feature bit
set and/or max_queues > 1.

> +    qemu_put_be16(f, n->curr_queues);
> +    for (i = 1; i < n->curr_queues; i++) {
> +        qemu_put_be32(f, n->vqs[i].tx_waiting);
> +    }
>  }
>  
>  static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      VirtIONet *n = opaque;
> -    VirtIONetQueue *q = &n->vqs[0];
> -    int ret, i;
> +    int ret, i, link_down;
>  
>      if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
>          return -EINVAL;
> @@ -1100,7 +1106,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>      }
>  
>      qemu_get_buffer(f, n->mac, ETH_ALEN);
> -    q->tx_waiting = qemu_get_be32(f);
> +    n->vqs[0].tx_waiting = qemu_get_be32(f);
>  
>      virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
>  
> @@ -1170,6 +1176,22 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>          }
>      }
>  
> +    if (version_id >= 12) {
> +        if (n->max_queues != qemu_get_be16(f)) {
> +            error_report("virtio-net: different max_queues ");
> +            return -1;
> +        }
> +
> +        n->curr_queues = qemu_get_be16(f);
> +        for (i = 1; i < n->curr_queues; i++) {
> +            n->vqs[i].tx_waiting = qemu_get_be32(f);
> +        }
> +    }
> +
> +    virtio_net_set_queues(n);
> +    /* Must do this again, since we may have more than one active queues. */

s/queues/queue/

Also I didn't understand why it's here.
It seems that virtio has vm running callback,
and that will invoke virtio_net_set_status after vm load.
No?


> +    virtio_net_set_status(&n->vdev, n->status);
> +
>      /* Find the first multicast entry in the saved MAC filter */
>      for (i = 0; i < n->mac_table.in_use; i++) {
>          if (n->mac_table.macs[i * ETH_ALEN] & 1) {
> @@ -1180,7 +1202,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>  
>      /* nc.link_down can't be migrated, so infer link_down according
>       * to link status bit in n->status */
> -    qemu_get_queue(n->nic)->link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> +    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> +    for (i = 0; i < n->max_queues; i++) {
> +        qemu_get_subqueue(n->nic, i)->link_down = link_down;
> +    }
>  
>      return 0;
>  }
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 07/12] virtio: introduce virtio_queue_del()
  2012-12-28 10:31 ` [PATCH 07/12] virtio: introduce virtio_queue_del() Jason Wang
@ 2013-01-08  7:14   ` Michael S. Tsirkin
  2013-01-08  9:28     ` Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Michael S. Tsirkin @ 2013-01-08  7:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mprivozn, rusty, qemu-devel, stefanha,
	jwhan, shiyer

On Fri, Dec 28, 2012 at 06:31:59PM +0800, Jason Wang wrote:
> Some devices (such as virtio-net) need the ability to destroy or re-order the
> virtqueues; this patch adds a helper to do this.
> 
> Signed-off-by: Jason Wang <jasowang>

Actually it's virtio_del_queue(), unlike what the subject says :)

> ---
>  hw/virtio.c |    9 +++++++++
>  hw/virtio.h |    2 ++
>  2 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/virtio.c b/hw/virtio.c
> index f40a8c5..bc3c9c3 100644
> --- a/hw/virtio.c
> +++ b/hw/virtio.c
> @@ -700,6 +700,15 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>      return &vdev->vq[i];
>  }
>  
> +void virtio_del_queue(VirtIODevice *vdev, int n)
> +{
> +    if (n < 0 || n >= VIRTIO_PCI_QUEUE_MAX) {
> +        abort();
> +    }
> +
> +    vdev->vq[n].vring.num = 0;
> +}
> +
>  void virtio_irq(VirtQueue *vq)
>  {
>      trace_virtio_irq(vq);
> diff --git a/hw/virtio.h b/hw/virtio.h
> index 7c17f7b..f6cb0f9 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -138,6 +138,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>                              void (*handle_output)(VirtIODevice *,
>                                                    VirtQueue *));
>  
> +void virtio_del_queue(VirtIODevice *vdev, int n);
> +
>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>                      unsigned int len);
>  void virtqueue_flush(VirtQueue *vq, unsigned int count);
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2012-12-28 10:32 ` [PATCH 10/12] virtio-net: multiqueue support Jason Wang
  2012-12-28 17:52   ` Blue Swirl
@ 2013-01-08  9:07   ` Wanlong Gao
  2013-01-08  9:29     ` Jason Wang
  1 sibling, 1 reply; 58+ messages in thread
From: Wanlong Gao @ 2013-01-08  9:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, aliguori, stefanha, qemu-devel, krkumar2, kvm, mprivozn,
	rusty, jwhan, shiyer

On 12/28/2012 06:32 PM, Jason Wang wrote:
> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
> +        ret = -1;
> +    } else {
> +        ret = tap_detach(nc->peer);
> +    }
> +
> +    return ret;
> +}
> +
> +static void virtio_net_set_queues(VirtIONet *n)
> +{
> +    int i;
> +
> +    for (i = 0; i < n->max_queues; i++) {
> +        if (i < n->curr_queues) {
> +            assert(!peer_attach(n, i));
> +        } else {
> +            assert(!peer_detach(n, i));

I got an assert here,
qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.

Any thoughts?

Thanks,
Wanlong Gao

> +        }
> +    }
> +}
> +
> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
> +


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 11/12] virtio-net: migration support for multiqueue
  2013-01-08  7:10   ` Michael S. Tsirkin
@ 2013-01-08  9:27     ` Jason Wang
  0 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2013-01-08  9:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, aliguori, kvm, mprivozn, rusty, qemu-devel, stefanha,
	jwhan, shiyer

On 01/08/2013 03:10 PM, Michael S. Tsirkin wrote:
> On Fri, Dec 28, 2012 at 06:32:03PM +0800, Jason Wang wrote:
>> This patch add migration support for multiqueue virtio-net. The version were
>> bumped to 12.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  hw/virtio-net.c |   45 +++++++++++++++++++++++++++++++++++----------
>>  1 files changed, 35 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
>> index aaeef1b..ca4b804 100644
>> --- a/hw/virtio-net.c
>> +++ b/hw/virtio-net.c
>> @@ -21,7 +21,7 @@
>>  #include "virtio-net.h"
>>  #include "vhost_net.h"
>>  
>> -#define VIRTIO_NET_VM_VERSION    11
>> +#define VIRTIO_NET_VM_VERSION    12
> Please don't, use a subsection instead.

Ok, but virtio-net is not converted to VMState, so we can just emulate
the subsection.
>>  #define MAC_TABLE_ENTRIES    64
>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
>> @@ -1058,16 +1058,18 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
>>  
>>  static void virtio_net_save(QEMUFile *f, void *opaque)
>>  {
>> +    int i;
>>      VirtIONet *n = opaque;
>> -    VirtIONetQueue *q = &n->vqs[0];
>>  
>> -    /* At this point, backend must be stopped, otherwise
>> -     * it might keep writing to memory. */
>> -    assert(!q->vhost_started);
>> +    for (i = 0; i < n->max_queues; i++) {
>> +        /* At this point, backend must be stopped, otherwise
>> +         * it might keep writing to memory. */
>> +        assert(!n->vqs[i].vhost_started);
>> +    }
>>      virtio_save(&n->vdev, f);
>>  
>>      qemu_put_buffer(f, n->mac, ETH_ALEN);
>> -    qemu_put_be32(f, q->tx_waiting);
>> +    qemu_put_be32(f, n->vqs[0].tx_waiting);
>>      qemu_put_be32(f, n->mergeable_rx_bufs);
>>      qemu_put_be16(f, n->status);
>>      qemu_put_byte(f, n->promisc);
>> @@ -1083,13 +1085,17 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
>>      qemu_put_byte(f, n->nouni);
>>      qemu_put_byte(f, n->nobcast);
>>      qemu_put_byte(f, n->has_ufo);
>> +    qemu_put_be16(f, n->max_queues);
> Above is specified by user so seems unnecessary in the migration stream.

It is used to prevent the following case:

Moving a VM from a 4-queue device to a 2-queue one with 1 queue active.
If we don't do this, after migration the guest may still think it can
have 4 queues.
> Below should only be put if relevant: check host feature bit
> set and/or max_queues > 1.

Right.
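
A rough sketch of what that could look like in the old-style save
handler -- emitting the multiqueue fields only when they are in use,
which also roughly emulates a subsection (hypothetical, not the final
code):

static void virtio_net_save_mq(QEMUFile *f, VirtIONet *n)
{
    uint8_t has_mq = n->max_queues > 1;
    int i;

    /* the presence byte plays the role of a subsection marker */
    qemu_put_byte(f, has_mq);
    if (has_mq) {
        qemu_put_be16(f, n->curr_queues);
        for (i = 1; i < n->curr_queues; i++) {
            qemu_put_be32(f, n->vqs[i].tx_waiting);
        }
    }
}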
>
>> +    qemu_put_be16(f, n->curr_queues);
>> +    for (i = 1; i < n->curr_queues; i++) {
>> +        qemu_put_be32(f, n->vqs[i].tx_waiting);
>> +    }
>>  }
>>  
>>  static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>  {
>>      VirtIONet *n = opaque;
>> -    VirtIONetQueue *q = &n->vqs[0];
>> -    int ret, i;
>> +    int ret, i, link_down;
>>  
>>      if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
>>          return -EINVAL;
>> @@ -1100,7 +1106,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>      }
>>  
>>      qemu_get_buffer(f, n->mac, ETH_ALEN);
>> -    q->tx_waiting = qemu_get_be32(f);
>> +    n->vqs[0].tx_waiting = qemu_get_be32(f);
>>  
>>      virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
>>  
>> @@ -1170,6 +1176,22 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>          }
>>      }
>>  
>> +    if (version_id >= 12) {
>> +        if (n->max_queues != qemu_get_be16(f)) {
>> +            error_report("virtio-net: different max_queues ");
>> +            return -1;
>> +        }
>> +
>> +        n->curr_queues = qemu_get_be16(f);
>> +        for (i = 1; i < n->curr_queues; i++) {
>> +            n->vqs[i].tx_waiting = qemu_get_be32(f);
>> +        }
>> +    }
>> +
>> +    virtio_net_set_queues(n);
>> +    /* Must do this again, since we may have more than one active queues. */
> s/queues/queue/
>
> Also I didn't understand why it's here.
> It seems that virtio has vm running callback,
> and that will invoke virtio_net_set_status after vm load.
> No?

True, will remove it next version.

Thanks
>
>> +    virtio_net_set_status(&n->vdev, n->status);
>> +
>>      /* Find the first multicast entry in the saved MAC filter */
>>      for (i = 0; i < n->mac_table.in_use; i++) {
>>          if (n->mac_table.macs[i * ETH_ALEN] & 1) {
>> @@ -1180,7 +1202,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>  
>>      /* nc.link_down can't be migrated, so infer link_down according
>>       * to link status bit in n->status */
>> -    qemu_get_queue(n->nic)->link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
>> +    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
>> +    for (i = 0; i < n->max_queues; i++) {
>> +        qemu_get_subqueue(n->nic, i)->link_down = link_down;
>> +    }
>>  
>>      return 0;
>>  }
>> -- 
>> 1.7.1

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 07/12] virtio: introduce virtio_queue_del()
  2013-01-08  7:14   ` Michael S. Tsirkin
@ 2013-01-08  9:28     ` Jason Wang
  0 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2013-01-08  9:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, stefanha, qemu-devel, rusty, kvm, mprivozn, shiyer,
	krkumar2, jwhan

On 01/08/2013 03:14 PM, Michael S. Tsirkin wrote:
> On Fri, Dec 28, 2012 at 06:31:59PM +0800, Jason Wang wrote:
>> Some device (such as virtio-net) needs the ability to destroy or re-order the
>> virtqueues, this patch adds a helper to do this.
>>
>> Signed-off-by: Jason Wang <jasowang>
> Actually del_queue unlike what the subject says :)

Oh, yes, will correct this.
>
>> ---
>>  hw/virtio.c |    9 +++++++++
>>  hw/virtio.h |    2 ++
>>  2 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/virtio.c b/hw/virtio.c
>> index f40a8c5..bc3c9c3 100644
>> --- a/hw/virtio.c
>> +++ b/hw/virtio.c
>> @@ -700,6 +700,15 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>      return &vdev->vq[i];
>>  }
>>  
>> +void virtio_del_queue(VirtIODevice *vdev, int n)
>> +{
>> +    if (n < 0 || n >= VIRTIO_PCI_QUEUE_MAX) {
>> +        abort();
>> +    }
>> +
>> +    vdev->vq[n].vring.num = 0;
>> +}
>> +
>>  void virtio_irq(VirtQueue *vq)
>>  {
>>      trace_virtio_irq(vq);
>> diff --git a/hw/virtio.h b/hw/virtio.h
>> index 7c17f7b..f6cb0f9 100644
>> --- a/hw/virtio.h
>> +++ b/hw/virtio.h
>> @@ -138,6 +138,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>                              void (*handle_output)(VirtIODevice *,
>>                                                    VirtQueue *));
>>  
>> +void virtio_del_queue(VirtIODevice *vdev, int n);
>> +
>>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>>                      unsigned int len);
>>  void virtqueue_flush(VirtQueue *vq, unsigned int count);
>> -- 
>> 1.7.1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08  9:07   ` [Qemu-devel] " Wanlong Gao
@ 2013-01-08  9:29     ` Jason Wang
  2013-01-08  9:32       ` [Qemu-devel] " Wanlong Gao
  2013-01-08  9:49       ` Wanlong Gao
  0 siblings, 2 replies; 58+ messages in thread
From: Jason Wang @ 2013-01-08  9:29 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/08/2013 05:07 PM, Wanlong Gao wrote:
> On 12/28/2012 06:32 PM, Jason Wang wrote:
>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>> +        ret = -1;
>> +    } else {
>> +        ret = tap_detach(nc->peer);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static void virtio_net_set_queues(VirtIONet *n)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < n->max_queues; i++) {
>> +        if (i < n->curr_queues) {
>> +            assert(!peer_attach(n, i));
>> +        } else {
>> +            assert(!peer_detach(n, i));
> I got a assert here,
> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>
> Any thoughts?
>
> Thanks,
> Wanlong Gao

Thanks for the testing. In which step or case did you hit this assertion:
migration, reboot, or just changing the number of virtqueues?

>> +        }
>> +    }
>> +}
>> +
>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>> +
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08  9:29     ` Jason Wang
@ 2013-01-08  9:32       ` Wanlong Gao
  2013-01-08  9:49       ` Wanlong Gao
  1 sibling, 0 replies; 58+ messages in thread
From: Wanlong Gao @ 2013-01-08  9:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, aliguori, stefanha, qemu-devel, krkumar2, kvm, mprivozn,
	rusty, jwhan, shiyer

On 01/08/2013 05:29 PM, Jason Wang wrote:
> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>> +        ret = -1;
>>> +    } else {
>>> +        ret = tap_detach(nc->peer);
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static void virtio_net_set_queues(VirtIONet *n)
>>> +{
>>> +    int i;
>>> +
>>> +    for (i = 0; i < n->max_queues; i++) {
>>> +        if (i < n->curr_queues) {
>>> +            assert(!peer_attach(n, i));
>>> +        } else {
>>> +            assert(!peer_detach(n, i));
>> I got a assert here,
>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>
>> Any thoughts?
>>
>> Thanks,
>> Wanlong Gao
> 
> Thanks for the testing, which steps or cases did you met this assertion,
> migration, reboot or just changing the number of virtqueues?

It may be because my host doesn't support multiqueue tap; I'll try with the upstream kernel again.

Thanks,
Wanlong Gao

> 
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>> +
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08  9:29     ` Jason Wang
  2013-01-08  9:32       ` [Qemu-devel] " Wanlong Gao
@ 2013-01-08  9:49       ` Wanlong Gao
  2013-01-08  9:51         ` Jason Wang
  1 sibling, 1 reply; 58+ messages in thread
From: Wanlong Gao @ 2013-01-08  9:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, aliguori, stefanha, qemu-devel, krkumar2, kvm, mprivozn,
	rusty, jwhan, shiyer

On 01/08/2013 05:29 PM, Jason Wang wrote:
> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>> +        ret = -1;
>>> +    } else {
>>> +        ret = tap_detach(nc->peer);
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static void virtio_net_set_queues(VirtIONet *n)
>>> +{
>>> +    int i;
>>> +
>>> +    for (i = 0; i < n->max_queues; i++) {
>>> +        if (i < n->curr_queues) {
>>> +            assert(!peer_attach(n, i));
>>> +        } else {
>>> +            assert(!peer_detach(n, i));
>> I got a assert here,
>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>
>> Any thoughts?
>>
>> Thanks,
>> Wanlong Gao
> 
> Thanks for the testing, which steps or cases did you met this assertion,
> migration, reboot or just changing the number of virtqueues?

I used 3.8-rc2 to test it again; I saw this tag has the multiqueue tap support.

I just can't start QEMU using  -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3

I pre-opened two tap fds; did I miss something?

Thanks,
Wanlong Gao

> 
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>> +
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08  9:49       ` Wanlong Gao
@ 2013-01-08  9:51         ` Jason Wang
  2013-01-08 10:00           ` [Qemu-devel] " Wanlong Gao
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-08  9:51 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/08/2013 05:49 PM, Wanlong Gao wrote:
> On 01/08/2013 05:29 PM, Jason Wang wrote:
>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>> +        ret = -1;
>>>> +    } else {
>>>> +        ret = tap_detach(nc->peer);
>>>> +    }
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>> +        if (i < n->curr_queues) {
>>>> +            assert(!peer_attach(n, i));
>>>> +        } else {
>>>> +            assert(!peer_detach(n, i));
>>> I got a assert here,
>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>> Wanlong Gao
>> Thanks for the testing, which steps or cases did you met this assertion,
>> migration, reboot or just changing the number of virtqueues?
> I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support.
>
> I just can't start the QEMU use  -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>
> I pre-opened two tap fds, did I missing something?

Nothing missed :) It should work.

Could you please try not using fd=X and let qemu create the file
descriptors by itself? Btw, how did you create the two tap fds?

Thanks
>
> Thanks,
> Wanlong Gao
>
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>> +
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08  9:51         ` Jason Wang
@ 2013-01-08 10:00           ` Wanlong Gao
  2013-01-08 10:14             ` Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Wanlong Gao @ 2013-01-08 10:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/08/2013 05:51 PM, Jason Wang wrote:
> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>> +        ret = -1;
>>>>> +    } else {
>>>>> +        ret = tap_detach(nc->peer);
>>>>> +    }
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>>> +{
>>>>> +    int i;
>>>>> +
>>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>>> +        if (i < n->curr_queues) {
>>>>> +            assert(!peer_attach(n, i));
>>>>> +        } else {
>>>>> +            assert(!peer_detach(n, i));
>>>> I got a assert here,
>>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>>
>>>> Any thoughts?
>>>>
>>>> Thanks,
>>>> Wanlong Gao
>>> Thanks for the testing, which steps or cases did you met this assertion,
>>> migration, reboot or just changing the number of virtqueues?
>> I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support.
>>
>> I just can't start the QEMU use  -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>>
>> I pre-opened two tap fds, did I missing something?
> 
> Nothing missed :) It should work.
> 
> Could you please try not use fd=X and let qemu to create the file
> descriptors by itself? Btw, how did you create the two tap fds?

Can it create descriptors itself? I get 
qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized

I create the tap fd like this, then dup() to create the second and third fds, right?

	int tap_fd = open("/dev/net/tun", O_RDWR);
	int vhost_fd = open("/dev/vhost-net", O_RDWR);
	char *tap_name = "tap";
	char cmd[2048];
	char brctl[256];
	char netup[256];
	struct ifreq ifr;
	if (tap_fd < 0) {
		printf("open tun device failed\n");
		return -1;
	}
	if (vhost_fd < 0) {
		printf("open vhost-net device failed\n");
		return -1;
	}
	memset(&ifr, 0, sizeof(ifr));
	memcpy(ifr.ifr_name, tap_name, sizeof(tap_name));
	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;

	/*
	 * setup tap net device
	 */
	if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
		printf("setup tap net device failed\n");
		return -1;
	}

	sprintf(brctl, "brctl addif virbr0 %s", tap_name);
	sprintf(netup, "ifconfig %s up", tap_name);
	system(brctl);
	system(netup);

Thanks,
Wanlong Gao


> 
> Thanks
>>
>> Thanks,
>> Wanlong Gao
>>
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>>> +
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08 10:00           ` [Qemu-devel] " Wanlong Gao
@ 2013-01-08 10:14             ` Jason Wang
  2013-01-08 11:24               ` [Qemu-devel] " Wanlong Gao
  2013-01-09  8:23               ` Wanlong Gao
  0 siblings, 2 replies; 58+ messages in thread
From: Jason Wang @ 2013-01-08 10:14 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/08/2013 06:00 PM, Wanlong Gao wrote:
> On 01/08/2013 05:51 PM, Jason Wang wrote:
>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>> +        ret = -1;
>>>>>> +    } else {
>>>>>> +        ret = tap_detach(nc->peer);
>>>>>> +    }
>>>>>> +
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>>>> +{
>>>>>> +    int i;
>>>>>> +
>>>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>>>> +        if (i < n->curr_queues) {
>>>>>> +            assert(!peer_attach(n, i));
>>>>>> +        } else {
>>>>>> +            assert(!peer_detach(n, i));
>>>>> I got a assert here,
>>>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Thanks,
>>>>> Wanlong Gao
>>>> Thanks for the testing, which steps or cases did you met this assertion,
>>>> migration, reboot or just changing the number of virtqueues?
>>> I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support.
>>>
>>> I just can't start the QEMU use  -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>>>
>>> I pre-opened two tap fds, did I missing something?
>> Nothing missed :) It should work.
>>
>> Could you please try not use fd=X and let qemu to create the file
>> descriptors by itself? Btw, how did you create the two tap fds?
> Can it create descriptors itself? I get 
> qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized

You need to prepare an ifup script, which defaults to /etc/qemu-ifup
(like the following). Or you may try to add script=no instead:

#!/bin/sh

switch=kvmbr0

/sbin/ifconfig $1 0.0.0.0 up
/usr/sbin/brctl addif $switch $1
/usr/sbin/brctl stp $switch off

This will let qemu create a tap fd itself and connect it to a port of
the bridge called kvmbr0.
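
With such a script in place, the earlier command line reduces to
something like (illustrative):

qemu-system-x86_64 -netdev tap,id=hostnet0,queues=2 -device virtio-net-pci,netdev=hostnet0,id=net0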
>
> I create the tap fd like this, and dup create the second fd, third fd, right?

The second and third fds should also be created with TUNSETIFF with the
same tap_name. Btw, you need to specify the IFF_MULTI_QUEUE flag to tell
the kernel you want to create a multiqueue tap device; otherwise the
second and third calls to TUNSETIFF will fail.
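
Roughly like this (a minimal sketch, assuming a 3.8+ kernel with
IFF_MULTI_QUEUE; error handling trimmed):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Open one queue of the multiqueue tap device "name"; call once per
 * queue fd you want to pass to qemu via fd=X. */
static int open_mq_tap_queue(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0) {
        return -1;
    }
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
    /* IFF_MULTI_QUEUE lets repeated TUNSETIFF calls attach more
     * queues to the same device instead of failing */
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}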

Thanks
>
> 	int tap_fd = open("/dev/net/tun", O_RDWR);
> 	int vhost_fd = open("/dev/vhost-net", O_RDWR);
> 	char *tap_name = "tap";
> 	char cmd[2048];
> 	char brctl[256];
> 	char netup[256];
> 	struct ifreq ifr;
> 	if (tap_fd < 0) {
> 		printf("open tun device failed\n");
> 		return -1;
> 	}
> 	if (vhost_fd < 0) {
> 		printf("open vhost-net device failed\n");
> 		return -1;
> 	}
> 	memset(&ifr, 0, sizeof(ifr));
> 	memcpy(ifr.ifr_name, tap_name, sizeof(tap_name));
> 	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
>
> 	/*
> 	 * setup tap net device
> 	 */
> 	if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
> 		printf("setup tap net device failed\n");
> 		return -1;
> 	}
>
> 	sprintf(brctl, "brctl addif virbr0 %s", tap_name);
> 	sprintf(netup, "ifconfig %s up", tap_name);
> 	system(brctl);
> 	system(netup);
>
> Thanks,
> Wanlong Gao
>
>
>> Thanks
>>> Thanks,
>>> Wanlong Gao
>>>
>>>>>> +        }
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>>>> +
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08 10:14             ` Jason Wang
@ 2013-01-08 11:24               ` Wanlong Gao
  2013-01-09  3:11                 ` Jason Wang
  2013-01-09  8:23               ` Wanlong Gao
  1 sibling, 1 reply; 58+ messages in thread
From: Wanlong Gao @ 2013-01-08 11:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/08/2013 06:14 PM, Jason Wang wrote:
> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>> +        ret = -1;
>>>>>>> +    } else {
>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>>>>> +{
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>>>>> +        if (i < n->curr_queues) {
>>>>>>> +            assert(!peer_attach(n, i));
>>>>>>> +        } else {
>>>>>>> +            assert(!peer_detach(n, i));
>>>>>> I got a assert here,
>>>>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>> Wanlong Gao
>>>>> Thanks for the testing, which steps or cases did you met this assertion,
>>>>> migration, reboot or just changing the number of virtqueues?
>>>> I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support.
>>>>
>>>> I just can't start the QEMU use  -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>>>>
>>>> I pre-opened two tap fds, did I missing something?
>>> Nothing missed :) It should work.
>>>
>>> Could you please try not use fd=X and let qemu to create the file
>>> descriptors by itself? Btw, how did you create the two tap fds?
>> Can it create descriptors itself? I get 
>> qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized
> 
> You need prepare an ifup script which default at /etc/qemu-ifup (like
> following). Or you may try to add a script=no after:
> 
> #!/bin/sh
> 
> switch=kvmbr0
> 
> /sbin/ifconfig $1 0.0.0.0 up
> /usr/sbin/brctl addif $switch $1
> /usr/sbin/brctl stp $switch off
> 
> This will let qemu create a tap fd itself and make it to be connected to
> a port of the bridge caled kvmbr0.
>>
>> I create the tap fd like this, and dup create the second fd, third fd, right?
> 
> The second and third fd should be created with TUNSETIFF with the same
> tap_name also. Btw, you need to specify a IFF_MULTI_QUEUE flag to tell
> the kernel you want to create a multiqueue tap device, otherwise the
> second and third calling of TUNSETIFF will fail.

Thank you for teaching me, I'll try it tomorrow.

Regards,
Wanlong Gao

> 
> Thanks
>>
>> 	int tap_fd = open("/dev/net/tun", O_RDWR);
>> 	int vhost_fd = open("/dev/vhost-net", O_RDWR);
>> 	char *tap_name = "tap";
>> 	char cmd[2048];
>> 	char brctl[256];
>> 	char netup[256];
>> 	struct ifreq ifr;
>> 	if (tap_fd < 0) {
>> 		printf("open tun device failed\n");
>> 		return -1;
>> 	}
>> 	if (vhost_fd < 0) {
>> 		printf("open vhost-net device failed\n");
>> 		return -1;
>> 	}
>> 	memset(&ifr, 0, sizeof(ifr));
>> 	memcpy(ifr.ifr_name, tap_name, sizeof(tap_name));
>> 	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
>>
>> 	/*
>> 	 * setup tap net device
>> 	 */
>> 	if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
>> 		printf("setup tap net device failed\n");
>> 		return -1;
>> 	}
>>
>> 	sprintf(brctl, "brctl addif virbr0 %s", tap_name);
>> 	sprintf(netup, "ifconfig %s up", tap_name);
>> 	system(brctl);
>> 	system(netup);
>>
>> Thanks,
>> Wanlong Gao
>>
>>
>>> Thanks
>>>> Thanks,
>>>> Wanlong Gao
>>>>
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>>>>> +
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
> 
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08 11:24               ` [Qemu-devel] " Wanlong Gao
@ 2013-01-09  3:11                 ` Jason Wang
  0 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2013-01-09  3:11 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/08/2013 07:24 PM, Wanlong Gao wrote:
> On 01/08/2013 06:14 PM, Jason Wang wrote:
>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>>> +        ret = -1;
>>>>>>>> +    } else {
>>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>>>>>> +{
>>>>>>>> +    int i;
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>>>>>> +        if (i < n->curr_queues) {
>>>>>>>> +            assert(!peer_attach(n, i));
>>>>>>>> +        } else {
>>>>>>>> +            assert(!peer_detach(n, i));
>>>>>>> I got a assert here,
>>>>>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>>>>>
>>>>>>> Any thoughts?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Wanlong Gao
>>>>>> Thanks for the testing, which steps or cases did you met this assertion,
>>>>>> migration, reboot or just changing the number of virtqueues?
>>>>> I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support.
>>>>>
>>>>> I just can't start the QEMU use  -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>>>>>
>>>>> I pre-opened two tap fds, did I missing something?
>>>> Nothing missed :) It should work.
>>>>
>>>> Could you please try not use fd=X and let qemu to create the file
>>>> descriptors by itself? Btw, how did you create the two tap fds?
>>> Can it create descriptors itself? I get 
>>> qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized
>> You need prepare an ifup script which default at /etc/qemu-ifup (like
>> following). Or you may try to add a script=no after:
>>
>> #!/bin/sh
>>
>> switch=kvmbr0
>>
>> /sbin/ifconfig $1 0.0.0.0 up
>> /usr/sbin/brctl addif $switch $1
>> /usr/sbin/brctl stp $switch off
>>
>> This will let qemu create a tap fd itself and make it to be connected to
>> a port of the bridge caled kvmbr0.
>>> I create the tap fd like this, and dup create the second fd, third fd, right?
>> The second and third fd should be created with TUNSETIFF with the same
>> tap_name also. Btw, you need to specify a IFF_MULTI_QUEUE flag to tell
>> the kernel you want to create a multiqueue tap device, otherwise the
>> second and third calling of TUNSETIFF will fail.
> Thank you for teaching me, I'll try it tomorrow.
>
> Regards,
> Wanlong Gao

Thanks. The multiqueue API should be documented in
Documentation/networking/tuntap.txt; it's on my TODO list.
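
For reference until then, the attach/detach side is TUNSETQUEUE with
IFF_ATTACH_QUEUE/IFF_DETACH_QUEUE (also new in 3.8); a minimal sketch,
untested:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Temporarily disable, or re-enable, one queue fd of a multiqueue tap */
static int tap_queue_set_attached(int queue_fd, int attach)
{
    struct ifreq ifr;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = attach ? IFF_ATTACH_QUEUE : IFF_DETACH_QUEUE;
    return ioctl(queue_fd, TUNSETQUEUE, &ifr);
}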
>
>> Thanks
>>> 	int tap_fd = open("/dev/net/tun", O_RDWR);
>>> 	int vhost_fd = open("/dev/vhost-net", O_RDWR);
>>> 	char *tap_name = "tap";
>>> 	char cmd[2048];
>>> 	char brctl[256];
>>> 	char netup[256];
>>> 	struct ifreq ifr;
>>> 	if (tap_fd < 0) {
>>> 		printf("open tun device failed\n");
>>> 		return -1;
>>> 	}
>>> 	if (vhost_fd < 0) {
>>> 		printf("open vhost-net device failed\n");
>>> 		return -1;
>>> 	}
>>> 	memset(&ifr, 0, sizeof(ifr));
>>> 	memcpy(ifr.ifr_name, tap_name, sizeof(tap_name));
>>> 	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
>>>
>>> 	/*
>>> 	 * setup tap net device
>>> 	 */
>>> 	if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
>>> 		printf("setup tap net device failed\n");
>>> 		return -1;
>>> 	}
>>>
>>> 	sprintf(brctl, "brctl addif virbr0 %s", tap_name);
>>> 	sprintf(netup, "ifconfig %s up", tap_name);
>>> 	system(brctl);
>>> 	system(netup);
>>>
>>> Thanks,
>>> Wanlong Gao
>>>
>>>
>>>> Thanks
>>>>> Thanks,
>>>>> Wanlong Gao
>>>>>
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>>>>>> +
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-08 10:14             ` Jason Wang
  2013-01-08 11:24               ` [Qemu-devel] " Wanlong Gao
@ 2013-01-09  8:23               ` Wanlong Gao
  2013-01-09  9:30                 ` Jason Wang
  1 sibling, 1 reply; 58+ messages in thread
From: Wanlong Gao @ 2013-01-09  8:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/08/2013 06:14 PM, Jason Wang wrote:
> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>> +        ret = -1;
>>>>>>> +    } else {
>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>>>>> +{
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>>>>> +        if (i < n->curr_queues) {
>>>>>>> +            assert(!peer_attach(n, i));
>>>>>>> +        } else {
>>>>>>> +            assert(!peer_detach(n, i));
>>>>>> I got a assert here,
>>>>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>> Wanlong Gao
>>>>> Thanks for the testing, which steps or cases did you met this assertion,
>>>>> migration, reboot or just changing the number of virtqueues?
>>>> I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support.
>>>>
>>>> I just can't start the QEMU use  -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>>>>
>>>> I pre-opened two tap fds, did I missing something?
>>> Nothing missed :) It should work.
>>>
>>> Could you please try not use fd=X and let qemu to create the file
>>> descriptors by itself? Btw, how did you create the two tap fds?
>> Can it create descriptors itself? I get 
>> qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized
> 
> You need prepare an ifup script which default at /etc/qemu-ifup (like
> following). Or you may try to add a script=no after:
> 
> #!/bin/sh
> 
> switch=kvmbr0
> 
> /sbin/ifconfig $1 0.0.0.0 up
> /usr/sbin/brctl addif $switch $1
> /usr/sbin/brctl stp $switch off
> 
> This will let qemu create a tap fd itself and make it to be connected to
> a port of the bridge caled kvmbr0.

But how is multiqueue supported in this way?
I got a guest kernel panic when using this way with queues=4.

Thanks,
Wanlong Gao

>>
>> I create the tap fd like this, and dup create the second fd, third fd, right?
> 
> The second and third fd should be created with TUNSETIFF with the same
> tap_name also. Btw, you need to specify a IFF_MULTI_QUEUE flag to tell
> the kernel you want to create a multiqueue tap device, otherwise the
> second and third calling of TUNSETIFF will fail.
> 
> Thanks
>>
>> 	int tap_fd = open("/dev/net/tun", O_RDWR);
>> 	int vhost_fd = open("/dev/vhost-net", O_RDWR);
>> 	char *tap_name = "tap";
>> 	char cmd[2048];
>> 	char brctl[256];
>> 	char netup[256];
>> 	struct ifreq ifr;
>> 	if (tap_fd < 0) {
>> 		printf("open tun device failed\n");
>> 		return -1;
>> 	}
>> 	if (vhost_fd < 0) {
>> 		printf("open vhost-net device failed\n");
>> 		return -1;
>> 	}
>> 	memset(&ifr, 0, sizeof(ifr));
>> 	memcpy(ifr.ifr_name, tap_name, sizeof(tap_name));
>> 	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
>>
>> 	/*
>> 	 * setup tap net device
>> 	 */
>> 	if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
>> 		printf("setup tap net device failed\n");
>> 		return -1;
>> 	}
>>
>> 	sprintf(brctl, "brctl addif virbr0 %s", tap_name);
>> 	sprintf(netup, "ifconfig %s up", tap_name);
>> 	system(brctl);
>> 	system(netup);
>>
>> Thanks,
>> Wanlong Gao
>>
>>
>>> Thanks
>>>> Thanks,
>>>> Wanlong Gao
>>>>
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>>>>> +
>>>
> 
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2013-01-09  8:23               ` Wanlong Gao
@ 2013-01-09  9:30                 ` Jason Wang
  2013-01-09 10:01                   ` [Qemu-devel] " Wanlong Gao
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-09  9:30 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/09/2013 04:23 PM, Wanlong Gao wrote:
> On 01/08/2013 06:14 PM, Jason Wang wrote:
>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>>> +        ret = -1;
>>>>>>>> +    } else {
>>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>>>>>> +{
>>>>>>>> +    int i;
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>>>>>> +        if (i < n->curr_queues) {
>>>>>>>> +            assert(!peer_attach(n, i));
>>>>>>>> +        } else {
>>>>>>>> +            assert(!peer_detach(n, i));
>>>>>>> I got an assert here,
>>>>>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>>>>>
>>>>>>> Any thoughts?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Wanlong Gao
>>>>>> Thanks for the testing. In which steps or cases did you hit this assertion:
>>>>>> migration, reboot, or just changing the number of virtqueues?
>>>>> I used 3.8-rc2 to test it again; I saw this tag has the multi-tap support.
>>>>>
>>>>> I just can't start QEMU using -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>>>>>
>>>>> I pre-opened two tap fds; did I miss something?
>>>> Nothing missed :) It should work.
>>>>
>>>> Could you please try not using fd=X and let qemu create the file
>>>> descriptors by itself? Btw, how did you create the two tap fds?
>>> Can it create descriptors itself? I get 
>>> qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized
>> You need to prepare an ifup script, which defaults to /etc/qemu-ifup (like
>> the following). Or you may try adding script=no instead:
>>
>> #!/bin/sh
>>
>> switch=kvmbr0
>>
>> /sbin/ifconfig $1 0.0.0.0 up
>> /usr/sbin/brctl addif $switch $1
>> /usr/sbin/brctl stp $switch off
>>
>> This will let qemu create a tap fd itself and connect it to
>> a port of the bridge called kvmbr0.
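One more note, since qemu runs that script itself: it must be executable, and the bridge it references has to exist beforehand, e.g. (assuming bridge-utils is installed):

chmod +x /etc/qemu-ifup
brctl addbr kvmbr0
ifconfig kvmbr0 up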
> But how can multi-queue be supported this way?

Qemu will create the necessary multiqueue tap by itself, see patch 0/12.
> I got a guest kernel panic when using this method with queues=4.

Does it happen with or without an fd parameter? What's the qemu command line?
Did you hit it during boot time?

Thanks
>
> Thanks,
> Wanlong Gao
>
>>> I create the tap fd like this, and dup() to create the second and third fds, right?
>> The second and third fds should also be created with TUNSETIFF, using the
>> same tap_name. Btw, you need to specify the IFF_MULTI_QUEUE flag to tell
>> the kernel you want to create a multiqueue tap device; otherwise the
>> second and third TUNSETIFF calls will fail.
>>
>> Thanks
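Concretely, a minimal sketch of that loop (assuming a 3.8+ host kernel whose <linux/if_tun.h> defines IFF_MULTI_QUEUE; error handling and cleanup trimmed):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>

/* Open one fd per queue: each fd issues TUNSETIFF with the same ifname
 * and with IFF_MULTI_QUEUE set, so all of them attach to the same tap
 * device. Without IFF_MULTI_QUEUE, the second TUNSETIFF call fails. */
static int open_mq_tap(const char *ifname, int *fds, int queues)
{
    int i;

    for (i = 0; i < queues; i++) {
        struct ifreq ifr;

        fds[i] = open("/dev/net/tun", O_RDWR);
        if (fds[i] < 0)
            return -1;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
        if (ioctl(fds[i], TUNSETIFF, &ifr) < 0)
            return -1;
    }
    return 0;
}

Each resulting fd is then what gets passed via the fd= options.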
>>> 	int tap_fd = open("/dev/net/tun", O_RDWR);
>>> 	int vhost_fd = open("/dev/vhost-net", O_RDWR);
>>> 	char *tap_name = "tap";
>>> 	char cmd[2048];
>>> 	char brctl[256];
>>> 	char netup[256];
>>> 	struct ifreq ifr;
>>> 	if (tap_fd < 0) {
>>> 		printf("open tun device failed\n");
>>> 		return -1;
>>> 	}
>>> 	if (vhost_fd < 0) {
>>> 		printf("open vhost-net device failed\n");
>>> 		return -1;
>>> 	}
>>> 	memset(&ifr, 0, sizeof(ifr));
>>> 	strncpy(ifr.ifr_name, tap_name, sizeof(ifr.ifr_name) - 1);
>>> 	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
>>>
>>> 	/*
>>> 	 * setup tap net device
>>> 	 */
>>> 	if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
>>> 		printf("setup tap net device failed\n");
>>> 		return -1;
>>> 	}
>>>
>>> 	sprintf(brctl, "brctl addif virbr0 %s", tap_name);
>>> 	sprintf(netup, "ifconfig %s up", tap_name);
>>> 	system(brctl);
>>> 	system(netup);
>>>
>>> Thanks,
>>> Wanlong Gao
>>>
>>>
>>>> Thanks
>>>>> Thanks,
>>>>> Wanlong Gao
>>>>>
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>>>>>> +
>>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/12] tap: multiqueue support
  2012-12-28 10:31 ` [PATCH 01/12] tap: multiqueue support Jason Wang
@ 2013-01-09  9:56   ` Stefan Hajnoczi
  2013-01-09 15:25     ` Jason Wang
  2013-01-10 10:28   ` Stefan Hajnoczi
  1 sibling, 1 reply; 58+ messages in thread
From: Stefan Hajnoczi @ 2013-01-09  9:56 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, aliguori, stefanha, qemu-devel, rusty, kvm, mprivozn,
	shiyer, krkumar2, jwhan

On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 5dfa052..583eb7c 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2465,7 +2465,7 @@
>  { 'type': 'NetdevTapOptions',
>    'data': {
>      '*ifname':     'str',
> -    '*fd':         'str',
> +    '*fd':         ['String'],

This change is not backwards-compatible.  You need to add a '*fds':
['String'] field instead.

>      '*script':     'str',
>      '*downscript': 'str',
>      '*helper':     'str',
> @@ -2473,7 +2473,8 @@
>      '*vnet_hdr':   'bool',
>      '*vhost':      'bool',
>      '*vhostfd':    'str',
> -    '*vhostforce': 'bool' } }
> +    '*vhostforce': 'bool',
> +    '*queues':     'uint32'} }

The 'queues' parameter should not be necessary when fd passing is used
since we can learn the number of queues by looking at the list length.
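Something along these lines would keep existing users working (a sketch only; the exact spelling of the new member is up to review):

 { 'type': 'NetdevTapOptions',
   'data': {
     '*ifname':     'str',
     '*fd':         'str',
     '*fds':        ['String'],
     ...

with the remaining members as they are today, so a lone fd=X keeps its old meaning and multiqueue users pass the whole list through the new field.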

Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-09  9:30                 ` Jason Wang
@ 2013-01-09 10:01                   ` Wanlong Gao
  2013-01-09 15:26                     ` Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Wanlong Gao @ 2013-01-09 10:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer, Wanlong Gao

On 01/09/2013 05:30 PM, Jason Wang wrote:
> On 01/09/2013 04:23 PM, Wanlong Gao wrote:
>> On 01/08/2013 06:14 PM, Jason Wang wrote:
>>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>>>> +        ret = -1;
>>>>>>>>> +    } else {
>>>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    return ret;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>>>>>>> +{
>>>>>>>>> +    int i;
>>>>>>>>> +
>>>>>>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>>>>>>> +        if (i < n->curr_queues) {
>>>>>>>>> +            assert(!peer_attach(n, i));
>>>>>>>>> +        } else {
>>>>>>>>> +            assert(!peer_detach(n, i));
>>>>>>>> I got an assert here,
>>>>>>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>>>>>>
>>>>>>>> Any thoughts?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Wanlong Gao
>>>>>>> Thanks for the testing. In which steps or cases did you hit this assertion:
>>>>>>> migration, reboot, or just changing the number of virtqueues?
>>>>>> I used 3.8-rc2 to test it again; I saw this tag has the multi-tap support.
>>>>>>
>>>>>> I just can't start QEMU using -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>>>>>>
>>>>>> I pre-opened two tap fds; did I miss something?
>>>>> Nothing missed :) It should work.
>>>>>
>>>>> Could you please try not using fd=X and let qemu create the file
>>>>> descriptors by itself? Btw, how did you create the two tap fds?
>>>> Can it create descriptors itself? I get 
>>>> qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized
>>> You need to prepare an ifup script, which defaults to /etc/qemu-ifup (like
>>> the following). Or you may try adding script=no instead:
>>>
>>> #!/bin/sh
>>>
>>> switch=kvmbr0
>>>
>>> /sbin/ifconfig $1 0.0.0.0 up
>>> /usr/sbin/brctl addif $switch $1
>>> /usr/sbin/brctl stp $switch off
>>>
>>> This will let qemu create a tap fd itself and connect it to
>>> a port of the bridge called kvmbr0.
>> But how can multi-queue be supported this way?
> 
> Qemu will create the necessary multiqueue tap by itself, see patch 0/12.
>> I got a guest kernel panic when using this method with queues=4.
> 
> Does it happen with or without an fd parameter? What's the qemu command line?
> Did you hit it during boot time?

The QEMU command line is 

/work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15 -enable-kvm -m 3096 \
-smp 4,sockets=4,cores=1,threads=1 \
-uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc -no-shutdown \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-drive file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 \
-drive file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id=scsi0-0-2-0,removable=on \
-drive file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id=scsi0-0-3-0 \
-drive file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
-drive file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 \
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
-chardev file,id=charserial1,path=/vm/f17.log \
-device isa-serial,chardev=charserial1,id=serial1 \
-device usb-tablet,id=input0 -vga std \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-netdev tap,id=hostnet0,vhost=on,queues=4 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3 \
-monitor stdio

I got the panic just after booting the system: I did nothing, waited for a while, and the guest panicked.

[   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
[   28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core virtio_scsi virtio_blk floppy
[   28.053004] CPU 1 
[   28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs Bochs
[   28.053004] RIP: 0010:[<ffffffff8137a9ab>]  [<ffffffff8137a9ab>] virtqueue_get_buf+0xb/0x120
[   28.053004] RSP: 0018:ffff8800bc913550  EFLAGS: 00000246
[   28.053004] RAX: 0000000000000000 RBX: ffff8800bc49c000 RCX: ffff8800bc49e000
[   28.053004] RDX: 0000000000000000 RSI: ffff8800bc913584 RDI: ffff8800bcfd4000
[   28.053004] RBP: ffff8800bc913558 R08: ffff8800bcfd0800 R09: 0000000000000000
[   28.053004] R10: ffff8800bc49c000 R11: ffff880036cc4de0 R12: ffff8800bcfd4000
[   28.053004] R13: ffff8800bc913558 R14: ffffffff8137ad73 R15: 00000000000200d0
[   28.053004] FS:  00007fb27a589740(0000) GS:ffff8800c1480000(0000) knlGS:0000000000000000
[   28.053004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.053004] CR2: 0000000000640530 CR3: 00000000baeff000 CR4: 00000000000006e0
[   28.053004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   28.053004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   28.053004] Process ip (pid: 592, threadinfo ffff8800bc912000, task ffff880036da2e20)
[   28.053004] Stack:
[   28.053004]  ffff8800bcfd0800 ffff8800bc913638 ffffffffa003e9bb ffff8800bc913656
[   28.053004]  0000000100000002 ffff8800c17ebb08 000000500000ff10 ffffea0002f244c0
[   28.053004]  0000000200000582 0000000000000000 0000000000000000 ffffea0002f244c0
[   28.053004] Call Trace:
[   28.053004]  [<ffffffffa003e9bb>] virtnet_send_command.constprop.26+0x24b/0x270 [virtio_net]
[   28.053004]  [<ffffffff812ed963>] ? sg_init_table+0x23/0x50
[   28.053004]  [<ffffffffa0040629>] virtnet_set_rx_mode+0x99/0x300 [virtio_net]
[   28.053004]  [<ffffffff8152306f>] __dev_set_rx_mode+0x5f/0xb0
[   28.053004]  [<ffffffff815230ef>] dev_set_rx_mode+0x2f/0x50
[   28.053004]  [<ffffffff815231b7>] __dev_open+0xa7/0xf0
[   28.053004]  [<ffffffff81523461>] __dev_change_flags+0xa1/0x180
[   28.053004]  [<ffffffff815235f8>] dev_change_flags+0x28/0x70
[   28.053004]  [<ffffffff8152ff20>] do_setlink+0x3b0/0xa50
[   28.053004]  [<ffffffff812fb6b1>] ? nla_parse+0x31/0xe0
[   28.053004]  [<ffffffff815325de>] rtnl_newlink+0x36e/0x580
[   28.053004]  [<ffffffff811355cc>] ? get_page_from_freelist+0x37c/0x730
[   28.053004]  [<ffffffff81531e13>] rtnetlink_rcv_msg+0x113/0x2f0
[   28.053004]  [<ffffffff8117d973>] ? __kmalloc_node_track_caller+0x63/0x1c0
[   28.053004]  [<ffffffff8151526b>] ? __alloc_skb+0x8b/0x2a0
[   28.053004]  [<ffffffff81531d00>] ? __rtnl_unlock+0x20/0x20
[   28.053004]  [<ffffffff8154b571>] netlink_rcv_skb+0xb1/0xc0
[   28.053004]  [<ffffffff8152ea05>] rtnetlink_rcv+0x25/0x40
[   28.053004]  [<ffffffff8154ae91>] netlink_unicast+0x1a1/0x220
[   28.053004]  [<ffffffff8154b211>] netlink_sendmsg+0x301/0x3c0
[   28.053004]  [<ffffffff81508530>] sock_sendmsg+0xb0/0xe0
[   28.053004]  [<ffffffff8113a45b>] ? lru_cache_add_lru+0x3b/0x60
[   28.053004]  [<ffffffff811608b7>] ? page_add_new_anon_rmap+0xc7/0x180
[   28.053004]  [<ffffffff81509efc>] __sys_sendmsg+0x3ac/0x3c0
[   28.053004]  [<ffffffff8162e47c>] ? __do_page_fault+0x23c/0x4d0
[   28.053004]  [<ffffffff8115c9ef>] ? do_brk+0x1ff/0x370
[   28.053004]  [<ffffffff8150bec9>] sys_sendmsg+0x49/0x90
[   28.053004]  [<ffffffff81632d59>] system_call_fastpath+0x16/0x1b
[   28.053004] Code: 04 0f ae f0 48 8b 47 50 5d 0f b7 50 02 66 39 57 64 0f 94 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 <53> 80 7f 59 00 48 89 fb 0f 85 90 00 00 00 48 8b 47 50 0f b7 50 


The QEMU tree I used is git://github.com/jasowang/qemu.git

Thanks,
Wanlong Gao

> 
> Thanks
>>
>> Thanks,
>> Wanlong Gao
>>
>>>> I create the tap fd like this, and dup() to create the second and third fds, right?
>>> The second and third fds should also be created with TUNSETIFF, using the
>>> same tap_name. Btw, you need to specify the IFF_MULTI_QUEUE flag to tell
>>> the kernel you want to create a multiqueue tap device; otherwise the
>>> second and third TUNSETIFF calls will fail.
>>>
>>> Thanks
>>>> 	int tap_fd = open("/dev/net/tun", O_RDWR);
>>>> 	int vhost_fd = open("/dev/vhost-net", O_RDWR);
>>>> 	char *tap_name = "tap";
>>>> 	char cmd[2048];
>>>> 	char brctl[256];
>>>> 	char netup[256];
>>>> 	struct ifreq ifr;
>>>> 	if (tap_fd < 0) {
>>>> 		printf("open tun device failed\n");
>>>> 		return -1;
>>>> 	}
>>>> 	if (vhost_fd < 0) {
>>>> 		printf("open vhost-net device failed\n");
>>>> 		return -1;
>>>> 	}
>>>> 	memset(&ifr, 0, sizeof(ifr));
>>>> 	strncpy(ifr.ifr_name, tap_name, sizeof(ifr.ifr_name) - 1);
>>>> 	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
>>>>
>>>> 	/*
>>>> 	 * setup tap net device
>>>> 	 */
>>>> 	if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
>>>> 		printf("setup tap net device failed\n");
>>>> 		return -1;
>>>> 	}
>>>>
>>>> 	sprintf(brctl, "brctl addif virbr0 %s", tap_name);
>>>> 	sprintf(netup, "ifconfig %s up", tap_name);
>>>> 	system(brctl);
>>>> 	system(netup);
>>>>
>>>> Thanks,
>>>> Wanlong Gao
>>>>
>>>>
>>>>> Thanks
>>>>>> Thanks,
>>>>>> Wanlong Gao
>>>>>>
>>>>>>>>> +        }
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>>>>>>> +
>>>
> 
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (11 preceding siblings ...)
  2012-12-28 10:32 ` [PATCH 12/12] virtio-net: compat multiqueue support Jason Wang
@ 2013-01-09 14:29 ` Stefan Hajnoczi
  2013-01-09 15:32   ` Michael S. Tsirkin
  2013-01-14 19:44 ` Anthony Liguori
  13 siblings, 1 reply; 58+ messages in thread
From: Stefan Hajnoczi @ 2013-01-09 14:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, aliguori, stefanha, qemu-devel, krkumar2, kvm, mprivozn,
	rusty, jwhan, shiyer

On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> Perf Numbers:
> 
> Two Intel Xeon 5620 with direct connected intel 82599EB
> Host/Guest kernel: David net tree
> vhost enabled
> 
> - lots of improvements in both latency and cpu utilization in request-response tests
> - a regression for guests sending small packets, because TCP tends to batch
>   less when latency is improved
> 
> 1q/2q/4q
> TCP_RR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1     9393.26   595.64  9408.18   597.34  9375.19   584.12
> 1 20    72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> 1 50    107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> 64 1    9453.42   632.33  9371.37   616.13  9338.19   615.97
> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> 64 50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> TCP_CRR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1     2848.37  163.41 2230.39  130.89 2013.09  120.47
> 1 20    23434.5  562.11 31057.43 531.07 49488.28 564.41
> 1 50    28514.88 582.17 40494.23 605.92 60113.35 654.97
> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> 64 1    2780.08  159.4  2201.07  127.96 2006.8   117.63
> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> 256 50  28354.7  579.85 40578.31 607    60261.71 657.87
> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> TCP_STREAM guest receiving
>  size #sessions throughput  norm throughput  norm throughput  norm
> 1 1     16.27   1.33   16.1    1.12   16.13   0.99
> 1 2     33.04   2.08   32.96   2.19   32.75   1.98
> 1 4     66.62   6.83   68.3    5.56   66.14   2.65
> 64 1    896.55  56.67  914.02  58.14  898.9   61.56
> 64 2    1830.46 91.02  1812.02 64.59  1835.57 66.26
> 64 4    3626.61 142.55 3636.25 100.64 3607.46 75.03
> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> 16384 2 7436.57 255.81 9381.86 220.85 9392    220.36
> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> TCP_MAERTS guest sending
>  size #sessions throughput  norm throughput  norm throughput  norm
> 1 1     15.94   0.62   15.55   0.61   15.13   0.59
> 1 2     36.11   0.83   32.46   0.69   32.28   0.69
> 1 4     71.59   1      68.91   0.94   61.52   0.77
> 64 1    630.71  22.52  622.11  22.35  605.09  21.84
> 64 2    1442.36 30.57  1292.15 25.82  1282.67 25.55
> 64 4    3186.79 42.59  2844.96 36.03  2529.69 30.06
> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> 1024 2  9429    258.32 7082.79 120.55 7403.53 134.78
> 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
> 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
> 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
> 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
> 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
> 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47

Trying to understand the performance results:

What is the host device configuration?  tap + bridge?

Did you use host CPU affinity for the vhost threads?

Can multiqueue tap take advantage of multiqueue host NICs or is
virtio-net multiqueue unaware of the physical NIC multiqueue
capabilities?

The results seem pretty mixed - as a user it's not obvious what to
choose as a good all-round setting.  Any observations on how multiqueue
should be configured?

What is the "norm" statistic?

Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/12] tap: multiqueue support
  2013-01-09  9:56   ` Stefan Hajnoczi
@ 2013-01-09 15:25     ` Jason Wang
  2013-01-10  8:32       ` Stefan Hajnoczi
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-09 15:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/09/2013 05:56 PM, Stefan Hajnoczi wrote:
> On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 5dfa052..583eb7c 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -2465,7 +2465,7 @@
>>  { 'type': 'NetdevTapOptions',
>>    'data': {
>>      '*ifname':     'str',
>> -    '*fd':         'str',
>> +    '*fd':         ['String'],
> This change is not backwards-compatible.  You need to add a '*fds':
> ['String'] field instead.

I don't quite understand this case; I think it still works when we
just specify one fd.
>>      '*script':     'str',
>>      '*downscript': 'str',
>>      '*helper':     'str',
>> @@ -2473,7 +2473,8 @@
>>      '*vnet_hdr':   'bool',
>>      '*vhost':      'bool',
>>      '*vhostfd':    'str',
>> -    '*vhostforce': 'bool' } }
>> +    '*vhostforce': 'bool',
>> +    '*queues':     'uint32'} }
> The 'queues' parameter should not be necessary when fd passing is used
> since we can learn the number of queues by looking at the list length.

Ok.
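Something like walking the list, i.e. (a self-contained sketch; StrList is a hypothetical stand-in for the node layout the QAPI generator emits for a ['String'] member):

/* hypothetical stand-in for the generated StringList type */
typedef struct StrList {
    struct StrList *next;
    const char *value;
} StrList;

/* derive the queue count from the length of the fd list */
static int count_fds(const StrList *fds)
{
    int n = 0;

    for (; fds; fds = fds->next) {
        n++;
    }
    return n;
}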
>
> Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-09 10:01                   ` [Qemu-devel] " Wanlong Gao
@ 2013-01-09 15:26                     ` Jason Wang
  2013-01-10  6:43                       ` Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-09 15:26 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/09/2013 06:01 PM, Wanlong Gao wrote:
> On 01/09/2013 05:30 PM, Jason Wang wrote:
>> On 01/09/2013 04:23 PM, Wanlong Gao wrote:
>>> On 01/08/2013 06:14 PM, Jason Wang wrote:
>>>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>>>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>>>>> +    } else if (nc->peer->info->type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>>>>> +        ret = -1;
>>>>>>>>>> +    } else {
>>>>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    return ret;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static void virtio_net_set_queues(VirtIONet *n)
>>>>>>>>>> +{
>>>>>>>>>> +    int i;
>>>>>>>>>> +
>>>>>>>>>> +    for (i = 0; i < n->max_queues; i++) {
>>>>>>>>>> +        if (i < n->curr_queues) {
>>>>>>>>>> +            assert(!peer_attach(n, i));
>>>>>>>>>> +        } else {
>>>>>>>>>> +            assert(!peer_detach(n, i));
>>>>>>>>> I got an assert here,
>>>>>>>>> qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.
>>>>>>>>>
>>>>>>>>> Any thoughts?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Wanlong Gao
>>>>>>>> Thanks for the testing. In which steps or cases did you hit this assertion:
>>>>>>>> migration, reboot, or just changing the number of virtqueues?
>>>>>>> I used 3.8-rc2 to test it again; I saw this tag has the multi-tap support.
>>>>>>>
>>>>>>> I just can't start QEMU using -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
>>>>>>>
>>>>>>> I pre-opened two tap fds; did I miss something?
>>>>>> Nothing missed :) It should work.
>>>>>>
>>>>>> Could you please try not using fd=X and let qemu create the file
>>>>>> descriptors by itself? Btw, how did you create the two tap fds?
>>>>> Can it create descriptors itself? I get 
>>>>> qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized
>>>> You need to prepare an ifup script, which defaults to /etc/qemu-ifup (like
>>>> the following). Or you may try adding script=no instead:
>>>>
>>>> #!/bin/sh
>>>>
>>>> switch=kvmbr0
>>>>
>>>> /sbin/ifconfig $1 0.0.0.0 up
>>>> /usr/sbin/brctl addif $switch $1
>>>> /usr/sbin/brctl stp $switch off
>>>>
>>>> This will let qemu create a tap fd itself and connect it to
>>>> a port of the bridge called kvmbr0.
>>> But how can multi-queue be supported this way?
>> Qemu will create the necessary multiqueue tap by itself, see patch 0/12.
>>> I got a guest kernel panic when using this method with queues=4.
>> Does it happen with or without an fd parameter? What's the qemu command line?
>> Did you hit it during boot time?
> The QEMU command line is 
>
> [...]
>
> I got the panic just after booting the system: I did nothing, waited for a while, and the guest panicked.
>
> [...]
>
>
> The QEMU tree I used is git://github.com/jasowang/qemu.git

Thanks a lot, I will try to reproduce it myself tomorrow. From the
call trace, it looks like we are sending a command to an rx/tx queue.
> Thanks,
> Wanlong Gao
>
>> Thanks
>>> Thanks,
>>> Wanlong Gao
>>>
>>>>> I create the tap fd like this, and dup() to create the second and third fds, right?
>>>> The second and third fds should also be created with TUNSETIFF, using the
>>>> same tap_name. Btw, you need to specify the IFF_MULTI_QUEUE flag to tell
>>>> the kernel you want to create a multiqueue tap device; otherwise the
>>>> second and third TUNSETIFF calls will fail.
>>>>
>>>> Thanks
>>>>> 	int tap_fd = open("/dev/net/tun", O_RDWR);
>>>>> 	int vhost_fd = open("/dev/vhost-net", O_RDWR);
>>>>> 	char *tap_name = "tap";
>>>>> 	char cmd[2048];
>>>>> 	char brctl[256];
>>>>> 	char netup[256];
>>>>> 	struct ifreq ifr;
>>>>> 	if (tap_fd < 0) {
>>>>> 		printf("open tun device failed\n");
>>>>> 		return -1;
>>>>> 	}
>>>>> 	if (vhost_fd < 0) {
>>>>> 		printf("open vhost-net device failed\n");
>>>>> 		return -1;
>>>>> 	}
>>>>> 	memset(&ifr, 0, sizeof(ifr));
>>>>> 	strncpy(ifr.ifr_name, tap_name, sizeof(ifr.ifr_name) - 1);
>>>>> 	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
>>>>>
>>>>> 	/*
>>>>> 	 * setup tap net device
>>>>> 	 */
>>>>> 	if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
>>>>> 		printf("setup tap net device failed\n");
>>>>> 		return -1;
>>>>> 	}
>>>>>
>>>>> 	sprintf(brctl, "brctl addif virbr0 %s", tap_name);
>>>>> 	sprintf(netup, "ifconfig %s up", tap_name);
>>>>> 	system(brctl);
>>>>> 	system(netup);
>>>>>
>>>>> Thanks,
>>>>> Wanlong Gao
>>>>>
>>>>>
>>>>>> Thanks
>>>>>>> Thanks,
>>>>>>> Wanlong Gao
>>>>>>>
>>>>>>>>>> +        }
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl);
>>>>>>>>>> +
>>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
  2013-01-09 14:29 ` [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net Stefan Hajnoczi
@ 2013-01-09 15:32   ` Michael S. Tsirkin
  2013-01-09 15:33     ` Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Michael S. Tsirkin @ 2013-01-09 15:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Jason Wang, aliguori, stefanha, qemu-devel, krkumar2, kvm,
	mprivozn, rusty, jwhan, shiyer

On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> > Perf Numbers:
> > [...]
> 
> Trying to understand the performance results:
> 
> What is the host device configuration?  tap + bridge?
> 
> Did you use host CPU affinity for the vhost threads?
> 
> Can multiqueue tap take advantage of multiqueue host NICs or is
> virtio-net multiqueue unaware of the physical NIC multiqueue
> capabilities?
> 
> The results seem pretty mixed - as a user it's not obvious what to
> choose as a good all-round setting.

Yes, I think this is the reason it's disabled by default ATM;
the guest admin has to enable it using ethtool.
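For reference, that's the channels interface; inside the guest it is something like the following (assuming the virtio-net interface is eth0 and 4 queue pairs were configured on the host side):

ethtool -L eth0 combined 4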

From what I saw, it looks like with a guest streaming to an external
benchmark we sometimes get smaller packets and so worse performance.
We are still investigating; what's going on seems to be a strange
interaction with the guest TCP stack.

Other workloads seem to benefit.

>  Any observations on how multiqueue
> should be configured?

I think the right thing to do is to enable it on the host and
let the guest admin enable it if appropriate.

> What is the "norm" statistic?
> 
> Stefan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 00/12] Multiqueue virtio-net
  2013-01-09 15:32   ` Michael S. Tsirkin
@ 2013-01-09 15:33     ` Jason Wang
  2013-01-10  8:44       ` Stefan Hajnoczi
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-09 15:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, aliguori, kvm, Stefan Hajnoczi, rusty, qemu-devel,
	stefanha, mprivozn, jwhan, shiyer

On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
>> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
>>> Perf Numbers:
>>> [...]
>> Trying to understand the performance results:
>>
>> What is the host device configuration?  tap + bridge?

Yes.
>>
>> Did you use host CPU affinity for the vhost threads?

I use numactl to pin the vcpu threads and vhost threads to the same numa node.
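Roughly like this (a sketch, assuming node 0 is the target; the vhost workers show up as kernel threads named vhost-<qemu-pid>, so they can be pinned once the guest is running):

numactl --cpunodebind=0 --membind=0 qemu-system-x86_64 ... &
# 0-7 is an example cpu list for node 0
for pid in $(pgrep vhost); do taskset -pc 0-7 $pid; done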
>> Can multiqueue tap take advantage of multiqueue host NICs or is
>> virtio-net multiqueue unaware of the physical NIC multiqueue
>> capabilities?

Tap is unaware of the physical multiqueue NIC, but we can still benefit
from it since we use multiple vhost threads.
>>
>> The results seem pretty mixed - as a user it's not obvious what to
>> choose as a good all-round setting.
> Yes, I think this is the reason it's disabled by default ATM;
> the guest admin has to enable it using ethtool.
>
> From what I saw, it looks like with a guest streaming to an external
> benchmark we sometimes get smaller packets and so worse performance.
> We are still investigating; what's going on seems to be a strange
> interaction with the guest TCP stack.

Yes, guest TCP tends to batch less when multiqueue is enabled
(latency is improved), so many more small packets are sent in this
case, which leads to bad performance.
> Other workloads seem to benefit.
>
>>  Any observations on how multiqueue
>> should be configured?
> I think the right thing to do is to enable it on the host and
> let the guest admin enable it if appropriate.
>
>> What is the "norm" statistic?

Sorry for not being clear: it's short for normalized result (the
result divided by cpu utilization).
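For example, in the single-session 1-byte TCP_RR row, 9393.26 trans/s with a norm of 595.64 implies a measured cpu utilization of roughly 9393.26 / 595.64 ≈ 15.8, in whatever utilization units netperf reports.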
>> Stefan
>
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-09 15:26                     ` Jason Wang
@ 2013-01-10  6:43                       ` Jason Wang
  2013-01-10  6:49                         ` Wanlong Gao
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-10  6:43 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote:
> On 01/09/2013 06:01 PM, Wanlong Gao wrote:
> > On 01/09/2013 05:30 PM, Jason Wang wrote:
> >> On 01/09/2013 04:23 PM, Wanlong Gao wrote:
> >>> On 01/08/2013 06:14 PM, Jason Wang wrote:
> >>>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
> >>>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
> >>>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
> >>>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
> >>>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
> >>>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
> >>>>>>>>>> +    } else if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
> >>>>>>>>>> +        ret = -1;
> >>>>>>>>>> +    } else {
> >>>>>>>>>> +        ret = tap_detach(nc->peer);
> >>>>>>>>>> +    }
> >>>>>>>>>> +
> >>>>>>>>>> +    return ret;
> >>>>>>>>>> +}
> >>>>>>>>>> +
[...]
> >>> I got a guest kernel panic when using this method with queues=4.
> >> 
> >> Does it happen with or without an fd parameter? What's the qemu command line?
> >> Did you hit it during boot time?
> > 
> > The QEMU command line is
> > 
> > [...]
> > 
> > I got the panic just after booting the system: I did nothing, waited for a
> > while, and the guest panicked.
> > 
> > [...]
> > 
> > 
> > The QEMU tree I used is git://github.com/jasowang/qemu.git
> 
> Thanks a lot, will try to reproduce myself tomorrow. From the
> calltrace, it looks like we send a command to an rx/tx queue.

Right, the virtqueues that will not be used by a single queue guest were initialized.
Please try the following patch, or use my qemu.git on github, which has this fix.

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 8b4f079..cfd9af1 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -186,7 +186,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
             continue;
         }
 
-        if (virtio_net_started(n, status) && !q->vhost_started) {
+        if (virtio_net_started(n, queue_status) && !q->vhost_started) {
             if (q->tx_timer) {
                 qemu_mod_timer(q->tx_timer,
                                qemu_get_clock_ns(vm_clock) + n->tx_timeout);
@@ -545,7 +545,8 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
 
     if (s.virtqueue_pairs < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN ||
         s.virtqueue_pairs > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX ||
-        s.virtqueue_pairs > n->max_queues) {
+        s.virtqueue_pairs > n->max_queues ||
+        !n->multiqueue) {
         return VIRTIO_NET_ERR;
     }
 
@@ -1026,19 +1027,15 @@ static void virtio_net_tx_bh(void *opaque)
 static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
 {
     VirtIODevice *vdev = &n->vdev;
-    int i;
+    int i, max = multiqueue ? n->max_queues : 1;
 
     n->multiqueue = multiqueue;
 
-    if (!multiqueue) {
-        n->max_queues = 1;
-    }
-
     for (i = 2; i <= n->max_queues * 2 + 1; i++) {
         virtio_del_queue(vdev, i);
     }
 
-    for (i = 1; i < n->max_queues; i++) {
+    for (i = 1; i < max; i++) {
         n->vqs[i].rx_vq = virtio_add_queue(vdev, 256, virtio_net_handle_rx);
         if (n->vqs[i].tx_timer) {
             n->vqs[i].tx_vq =
-- 
1.7.1


Thanks



> 
> > Thanks,
> > Wanlong Gao
> > 
[...]

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-10  6:43                       ` Jason Wang
@ 2013-01-10  6:49                         ` Wanlong Gao
  2013-01-10  7:16                           ` Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Wanlong Gao @ 2013-01-10  6:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer, Wanlong Gao

On 01/10/2013 02:43 PM, Jason Wang wrote:
> On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote:
>> On 01/09/2013 06:01 PM, Wanlong Gao wrote:
>>> On 01/09/2013 05:30 PM, Jason Wang wrote:
>>>> On 01/09/2013 04:23 PM, Wanlong Gao wrote:
>>>>> On 01/08/2013 06:14 PM, Jason Wang wrote:
>>>>>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>>>>>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>>>>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>>>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>>>>>>> +    } else if (nc->peer->info->type != 
>>>>>>>>>>>> NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>>>>>>> +        ret = -1;
>>>>>>>>>>>> +    } else {
>>>>>>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
> [...]
>>>>> I got a guest kernel panic when using this way and setting queues=4.
>>>>
>>>> Does it happen w/o or w/ an fd parameter? What's the qemu command line?
>>>> Did you meet it during boot time?
>>>
>>> The QEMU command line is
>>>
>>> /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15
>>> -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \
>>> -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \
>>> -chardev
>>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowai
>>> t \ -mon chardev=charmonitor,id=monitor,mode=control \
>>> -rtc base=utc -no-shutdown \
>>> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
>>> -device
>>> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \
>>> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
>>> -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \
>>> -device
>>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=vi
>>> rtio-disk0,bootindex=1 \ -drive
>>> file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \
>>> -device
>>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=vi
>>> rtio-disk1 \ -drive
>>> file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \
>>> -device
>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id=
>>> scsi0-0-2-0,removable=on \ -drive
>>> file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \
>>> -device
>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id=
>>> scsi0-0-3-0 \ -drive
>>> file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \
>>> -device
>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=
>>> scsi0-0-0-0 \ -drive
>>> file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \
>>> -device
>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id=
>>> scsi0-0-1-0 \ -chardev pty,id=charserial0 -device
>>> isa-serial,chardev=charserial0,id=serial0 \ -chardev
>>> file,id=charserial1,path=/vm/f17.log \
>>> -device isa-serial,chardev=charserial1,id=serial1 \
>>> -device usb-tablet,id=input0 -vga std \
>>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
>>> -netdev tap,id=hostnet0,vhost=on,queues=4 \
>>> -device
>>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,ad
>>> dr=0x3 \ -monitor stdio
>>>
>>> I got a panic just after booting the system; I did nothing, waited for a
>>> while, and the guest panicked.
>>>
>>> [   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
>>> [...]
>>>
>>>
>>> The QEMU tree I used is git://github.com/jasowang/qemu.git
>>
>> Thanks a lot, will try to reproduce myself tomorrow. From the
>> calltrace, it looks like we send a command to an rx/tx queue.
> 
> Right, the virtqueues that will not be used by a single queue guest were initialized.
> Please try the following patch, or use my qemu.git on github, which has this fix.

It's odd, why didn't I get a guest panic by using your python start script this morning?

Thanks,
Wanlong Gao

> 
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 8b4f079..cfd9af1 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> [...]


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2013-01-10  6:49                         ` Wanlong Gao
@ 2013-01-10  7:16                           ` Jason Wang
  2013-01-10  9:06                             ` Wanlong Gao
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-10  7:16 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On Thursday, January 10, 2013 02:49:14 PM Wanlong Gao wrote:
> On 01/10/2013 02:43 PM, Jason Wang wrote:
> > On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote:
> >> On 01/09/2013 06:01 PM, Wanlong Gao wrote:
> >>> On 01/09/2013 05:30 PM, Jason Wang wrote:
> >>>> On 01/09/2013 04:23 PM, Wanlong Gao wrote:
> >>>>> On 01/08/2013 06:14 PM, Jason Wang wrote:
> >>>>>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
> >>>>>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
> >>>>>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
> >>>>>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
> >>>>>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
> >>>>>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
> >>>>>>>>>>>> +    } else if (nc->peer->info->type !=
> >>>>>>>>>>>> NET_CLIENT_OPTIONS_KIND_TAP) {
> >>>>>>>>>>>> +        ret = -1;
> >>>>>>>>>>>> +    } else {
> >>>>>>>>>>>> +        ret = tap_detach(nc->peer);
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    return ret;
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> > 
> > [...]
> > 
> >>>>> I got a guest kernel panic when using this way and setting queues=4.
> >>>> 
> >>>> Does it happen w/o or w/ an fd parameter? What's the qemu command line?
> >>>> Did you meet it during boot time?
> >>> 
> >>> The QEMU command line is
> >>> 
> >>> [...]
> >>> 
> >>> I got a panic just after booting the system; I did nothing, waited for a
> >>> while, and the guest panicked.
> >>> 
> >>> [   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
> >>> [...]
> >>> 
> >>> 
> >>> The QEMU tree I used is git://github.com/jasowang/qemu.git
> >> 
> >> Thanks a lot, will try to reproduce myself tomorrow. From the
> >> calltrace, it looks like we send a command to an rx/tx queue.
> > 
> > Right, the virtqueues that will not be used by a single queue guest were
> > initialized. Please try the following patch, or use my qemu.git on
> > github, which has this fix.
> It's odd, why didn't I get a guest panic by using your python start script
> this morning?
> 
That's strange, I can reproduce it. Did you try booting a single queue guest
under a multiqueue virtio-net?

It could only be triggered when you boot a single queue guest with
queues >= 2. Let's take 2 as an example. Without the patch, all virtqueues
will be initialized even if the guest doesn't support multiqueue. So the ctrl
vq will be virtqueue 4 (counting from 0), but the guest thinks it's virtqueue
2. So the guest will send the command to an rx/tx queue and won't get any
response.

So if you're using the python script to boot a single queue guest with
queues = 1, or to boot a multiqueue guest, it would not be triggerable.
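
To make the numbering concrete, here is a small sketch of the virtqueue
layout involved (editor's illustration of the indexing described above,
not the actual QEMU or guest driver code):

    /* virtio-net with N queue pairs lays its virtqueues out as
     * rx0, tx0, rx1, tx1, ..., ctrl -- indices counted from 0. */
    #include <stdio.h>

    static int ctrl_vq_index(int queue_pairs)
    {
        return 2 * queue_pairs;  /* ctrl comes after all rx/tx pairs */
    }

    int main(void)
    {
        /* Host created with queues=2; the guest only knows one pair. */
        printf("host ctrl vq:  %d\n", ctrl_vq_index(2));  /* prints 4 */
        printf("guest ctrl vq: %d\n", ctrl_vq_index(1));  /* prints 2 */
        /* Index 2 on the host side is the rx vq of pair 1, so the
         * guest's ctrl command gets no reply and the driver spins in
         * virtqueue_get_buf() -- the soft lockup seen in the trace. */
        return 0;
    }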

Thanks
> Thanks,
> Wanlong Gao
> 
> > diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> > index 8b4f079..cfd9af1 100644
> > --- a/hw/virtio-net.c
> > +++ b/hw/virtio-net.c
[...]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/12] tap: multiqueue support
  2013-01-09 15:25     ` Jason Wang
@ 2013-01-10  8:32       ` Stefan Hajnoczi
  0 siblings, 0 replies; 58+ messages in thread
From: Stefan Hajnoczi @ 2013-01-10  8:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Stefan Hajnoczi, mst, aliguori, qemu-devel, rusty, kvm, mprivozn,
	shiyer, krkumar2, jwhan

On Wed, Jan 09, 2013 at 11:25:24PM +0800, Jason Wang wrote:
> On 01/09/2013 05:56 PM, Stefan Hajnoczi wrote:
> > On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:
> >> diff --git a/qapi-schema.json b/qapi-schema.json
> >> index 5dfa052..583eb7c 100644
> >> --- a/qapi-schema.json
> >> +++ b/qapi-schema.json
> >> @@ -2465,7 +2465,7 @@
> >>  { 'type': 'NetdevTapOptions',
> >>    'data': {
> >>      '*ifname':     'str',
> >> -    '*fd':         'str',
> >> +    '*fd':         ['String'],
> > This change is not backwards-compatible.  You need to add a '*fds':
> > ['String'] field instead.
> 
> I don't quite understand this case; I think it still works when we
> just specify one fd.

You are right, the QemuOpts visitor shows no incompatibility.

But there is also a QMP interface: netdev_add.  I think changing the
type to a string list breaks compatibility there.

Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 00/12] Multiqueue virtio-net
  2013-01-09 15:33     ` Jason Wang
@ 2013-01-10  8:44       ` Stefan Hajnoczi
  2013-01-10  9:34         ` [Qemu-devel] " Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Hajnoczi @ 2013-01-10  8:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, Michael S. Tsirkin, Stefan Hajnoczi,
	rusty, qemu-devel, mprivozn, jwhan, shiyer

On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
> On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
> > On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
> >> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> >>> Perf Numbers:
> >>>
> >>> Two Intel Xeon 5620 with direct connected intel 82599EB
> >>> Host/Guest kernel: David net tree
> >>> vhost enabled
> >>>
> >>> - lots of improvents of both latency and cpu utilization in request-reponse test
> >>> - get regression of guest sending small packets which because TCP tends to batch
> >>>   less when the latency were improved
> >>>
> >>> 1q/2q/4q
> >>> TCP_RR
> >>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> >>> 1 1     9393.26   595.64  9408.18   597.34  9375.19   584.12
> >>> 1 20    72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> >>> 1 50    107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> >>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> >>> 64 1    9453.42   632.33  9371.37   616.13  9338.19   615.97
> >>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> >>> 64 50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
> >>> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> >>> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> >>> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> >>> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> >>> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> >>> TCP_CRR
> >>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> >>> 1 1     2848.37  163.41 2230.39  130.89 2013.09  120.47
> >>> 1 20    23434.5  562.11 31057.43 531.07 49488.28 564.41
> >>> 1 50    28514.88 582.17 40494.23 605.92 60113.35 654.97
> >>> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> >>> 64 1    2780.08  159.4  2201.07  127.96 2006.8   117.63
> >>> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> >>> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> >>> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> >>> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> >>> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> >>> 256 50  28354.7  579.85 40578.31 607    60261.71 657.87
> >>> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> >>> TCP_STREAM guest receiving
> >>>  size #sessions throughput  norm throughput  norm throughput  norm
> >>> 1 1     16.27   1.33   16.1    1.12   16.13   0.99
> >>> 1 2     33.04   2.08   32.96   2.19   32.75   1.98
> >>> 1 4     66.62   6.83   68.3    5.56   66.14   2.65
> >>> 64 1    896.55  56.67  914.02  58.14  898.9   61.56
> >>> 64 2    1830.46 91.02  1812.02 64.59  1835.57 66.26
> >>> 64 4    3626.61 142.55 3636.25 100.64 3607.46 75.03
> >>> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> >>> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> >>> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> >>> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> >>> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> >>> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> >>> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> >>> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> >>> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> >>> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> >>> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> >>> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> >>> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> >>> 16384 2 7436.57 255.81 9381.86 220.85 9392    220.36
> >>> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> >>> TCP_MAERTS guest sending
> >>>  size #sessions throughput  norm throughput  norm throughput  norm
> >>> 1 1     15.94   0.62   15.55   0.61   15.13   0.59
> >>> 1 2     36.11   0.83   32.46   0.69   32.28   0.69
> >>> 1 4     71.59   1      68.91   0.94   61.52   0.77
> >>> 64 1    630.71  22.52  622.11  22.35  605.09  21.84
> >>> 64 2    1442.36 30.57  1292.15 25.82  1282.67 25.55
> >>> 64 4    3186.79 42.59  2844.96 36.03  2529.69 30.06
> >>> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> >>> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> >>> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> >>> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> >>> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> >>> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> >>> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> >>> 1024 2  9429    258.32 7082.79 120.55 7403.53 134.78
> >>> 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> >>> 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
> >>> 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
> >>> 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
> >>> 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
> >>> 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
> >>> 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47
> >> Trying to understand the performance results:
> >>
> >> What is the host device configuration?  tap + bridge?
> 
> Yes.
> >>
> >> Did you use host CPU affinity for the vhost threads?
> 
> I use numactl to pin cpu threads and vhost threads in the same numa node.
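(For reference, such pinning is usually done along these lines -- an
illustrative invocation, not the exact one used here:

    numactl --cpunodebind=0 --membind=0 qemu-system-x86_64 ...

with the vhost kernel threads then moved onto the same node, e.g. with
taskset -p on their pids.)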
> >> Can multiqueue tap take advantage of multiqueue host NICs or is
> >> virtio-net multiqueue unaware of the physical NIC multiqueue
> >> capabilities?
> 
> Tap is unaware of the physical multiqueue NIC, but we can benefit from it
> since we use multiple vhost threads.

I wonder if it makes a difference to bind tap queues to physical NIC
queues.  Maybe this is only possible in macvlan or can you preset the
queue index of outgoing skbs so the network stack doesn't recalculate
the flow?

> >>
> >> The results seem pretty mixed - as a user it's not obvious what to
> >> choose as a good all-round setting.
> > Yes, this I think is the reason it's disabled by default ATM,
> > guest admin has to enable it using ethtool.
> >
> > From what I saw, it looks like with a streaming guest to external
> > benchmark, we sometimes get smaller packets and
> > so worse performance. We are still investigating - what's
> > going on seems to be a strange interaction with guest TCP stack.
> 
> Yes, guest TCP tends to batch less when multiqueue is enabled
> (latency is improved). So many more small packets were sent in this
> case, which leads to bad performance.

Okay, this makes sense.

> > Other workloads seem to benefit.
> >
> >>  Any observations on how multiqueue
> >> should be configured?
> > I think the right thing to do is to enable it on the host and
> > let guest admin enable it if appropriate.
> >
> >> What is the "norm" statistic?
> 
> Sorry for not being clear, it's short for normalized result (the
> result divided by cpu utilization, i.e. throughput per unit of cpu).

Okay, that explains the results a little.  When norm doesn't change much
across 1q/2q/4q we're getting linear scaling.  It scales further because
the queues allow for more CPUs to be used.  That's good.
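
As a quick check of that reading, the 1-byte, 20-session TCP_RR row from
the table above works out as follows (editor's arithmetic on the quoted
numbers, nothing more):

    #include <stdio.h>

    int main(void)
    {
        /* TCP_RR, size 1, 20 sessions: 1q/2q/4q from the cover letter */
        double rate[] = { 72162.1, 129880.22, 196949.81 };
        double norm[] = { 2214.24, 2456.13,   2298.13   };

        for (int i = 1; i < 3; i++)
            printf("%dq: rate x%.2f, norm x%.2f of 1q\n",
                   2 * i, rate[i] / rate[0], norm[i] / norm[0]);
        return 0;  /* 2q: x1.80 / x1.11,  4q: x2.73 / x1.04 */
    }

The transaction rate climbs with the queue count while norm stays nearly
flat, which is the near-linear scaling described above.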

Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/12] virtio-net: multiqueue support
  2013-01-10  7:16                           ` Jason Wang
@ 2013-01-10  9:06                             ` Wanlong Gao
  2013-01-10  9:40                               ` [Qemu-devel] " Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Wanlong Gao @ 2013-01-10  9:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	Wanlong Gao, stefanha, jwhan, shiyer

On 01/10/2013 03:16 PM, Jason Wang wrote:
> On Thursday, January 10, 2013 02:49:14 PM Wanlong Gao wrote:
>> On 01/10/2013 02:43 PM, Jason Wang wrote:
>>> On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote:
>>>> On 01/09/2013 06:01 PM, Wanlong Gao wrote:
>>>>> On 01/09/2013 05:30 PM, Jason Wang wrote:
>>>>>> On 01/09/2013 04:23 PM, Wanlong Gao wrote:
>>>>>>> On 01/08/2013 06:14 PM, Jason Wang wrote:
>>>>>>>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>>>>>>>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>>>>>>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>>>>>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>>>>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>>>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>>>>>>>>> +    } else if (nc->peer->info->type !=
>>>>>>>>>>>>>> NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>>>>>>>>> +        ret = -1;
>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>
>>> [...]
>>>
>>>>>>> I got a guest kernel panic when using this way and setting queues=4.
>>>>>>
>>>>>> Does it happen w/o or w/ an fd parameter? What's the qemu command line?
>>>>>> Did you meet it during boot time?
>>>>>
>>>>> The QEMU command line is
>>>>>
>>>>> [...]
>>>>>
>>>>> I got a panic just after booting the system; I did nothing, waited for a
>>>>> while, and the guest panicked.
>>>>>
>>>>> [   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
>>>>> [...]
>>>>>
>>>>>
>>>>> The QEMU tree I used is git://github.com/jasowang/qemu.git
>>>>
>>>> Thanks a lot, will try to reproduce myself tomorrow. From the
>>>> calltrace, it looks like we send a command to an rx/tx queue.
>>>
>>> Right, the virtqueues that will not be used by a single queue guest were
>>> initialized. Please try the following patch, or use my qemu.git on
>>> github, which has this fix.
>> It's odd, why didn't I get a guest panic by using your python start script
>> this morning?
>>
> That's strange, I can reproduce it. Did you try booting a single queue guest 
> under a multiqueue virtio-net?
> 
> It could only be triggered when you boot a single queue guest with
> queues >= 2. Let's take 2 as an example. Without the patch, all virtqueues
> will be initialized even if the guest doesn't support multiqueue. So the ctrl
> vq will be virtqueue 4 (counting from 0), but the guest thinks it's virtqueue
> 2. So the guest will send the command to an rx/tx queue and won't get any
> response.

Anyway, with your updated github tree, the guest panic is gone.
As you say here, the guest panic is triggered by using a guest kernel that
only supports a single queue? But I think my guest kernel has always
supported multi-queue virtio-net. Am I missing something about guest
kernel support for multi-queue virtio-net?

Thanks,
Wanlong Gao

> 
> So if you're using the python script to boot a single queue guest with
> queues = 1, or to boot a multiqueue guest, it would not be triggerable.
> 
> Thanks
>> Thanks,
>> Wanlong Gao
>>
>>> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
>>> index 8b4f079..cfd9af1 100644
>>> --- a/hw/virtio-net.c
>>> +++ b/hw/virtio-net.c
> [...]
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
  2013-01-10  8:44       ` Stefan Hajnoczi
@ 2013-01-10  9:34         ` Jason Wang
  2013-01-10 11:49           ` Stefan Hajnoczi
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-10  9:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, aliguori, qemu-devel,
	krkumar2, kvm, mprivozn, rusty, jwhan, shiyer

On 01/10/2013 04:44 PM, Stefan Hajnoczi wrote:
> On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
>> On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
>>> On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
>>>> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
>>>>> Perf Numbers:
>>>>>
>>>>> Two Intel Xeon 5620 with direct connected intel 82599EB
>>>>> Host/Guest kernel: David net tree
>>>>> vhost enabled
>>>>>
>>>>> - lots of improvents of both latency and cpu utilization in request-reponse test
>>>>> - get regression of guest sending small packets which because TCP tends to batch
>>>>>   less when the latency were improved
>>>>>
>>>>> [...]
>>>> Trying to understand the performance results:
>>>>
>>>> What is the host device configuration?  tap + bridge?
>> Yes.
>>>> Did you use host CPU affinity for the vhost threads?
>> I use numactl to pin cpu threads and vhost threads in the same numa node.
>>>> Can multiqueue tap take advantage of multiqueue host NICs or is
>>>> virtio-net multiqueue unaware of the physical NIC multiqueue
>>>> capabilities?
>> Tap is unaware of the physical multiqueue NIC, but we can benefit from it
>> since we use multiple vhost threads.
> I wonder if it makes a difference to bind tap queues to physical NIC
> queues.  Maybe this is only possible in macvlan or can you preset the
> queue index of outgoing skbs so the network stack doesn't recalculate
> the flow?

There are some issues here:

- For tap, we know nothing about the physical card, especially how many
queues it has.
- We can present the queue index information in the skb (see the sketch
below). But there's no standard txq selection / rxq smp affinity setting
method for multiqueue card drivers in linux. For example, ixgbe and efx
use completely different methods. So we could easily find a method for
ixgbe but not for all the others.
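
A sketch of the second point, using the kernel's existing skb helpers
(illustrative only, not code from this series):

    #include <linux/skbuff.h>

    /* Remember the queue a packet arrived on ... */
    static void remember_rx_queue(struct sk_buff *skb, u16 rx_queue)
    {
        skb_record_rx_queue(skb, rx_queue);
    }

    /* ... so a tx path could reuse it instead of recomputing the flow. */
    static u16 reuse_rx_queue(const struct sk_buff *skb)
    {
        return skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) : 0;
    }

The missing piece, as noted above, is a driver-independent way to map
such a hint onto the physical NIC's tx queue selection.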

>>>> The results seem pretty mixed - as a user it's not obvious what to
>>>> choose as a good all-round setting.
>>> Yes, this I think is the reason it's disabled by default ATM,
>>> guest admin has to enable it using ethtool.
>>>
>>> From what I saw, it looks like with a streaming guest to external
>>> benchmark, we sometimes get smaller packets and
>>> so worse performance. We are still investigating - what's
>>> going on seems to be a strange interaction with guest TCP stack.
>> Yes, guest TCP tends to batch less when multiqueue is enabled
>> (latency is improved). So many more small packets were sent in this
>> case, which leads to bad performance.
> Okay, this makes sense.
>
>>> Other workloads seem to benefit.
>>>
>>>>  Any observations on how multiqueue
>>>> should be configured?
>>> I think the right thing to do is to enable it on the host and
>>> let guest admin enable it if appropriate.
>>>
>>>> What is the "norm" statistic?
>> Sorry for not being clear, it's short for normalized result (the
>> result divided by cpu utilization, i.e. throughput per unit of cpu).
> Okay, that explains the results a little.  When norm doesn't change much
> across 1q/2q/4q we're getting linear scaling.  It scales further because
> the queues allow for more CPUs to be used.  That's good.
>
> Stefan


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
  2013-01-10  9:06                             ` Wanlong Gao
@ 2013-01-10  9:40                               ` Jason Wang
  0 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2013-01-10  9:40 UTC (permalink / raw)
  To: gaowanlong
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/10/2013 05:06 PM, Wanlong Gao wrote:
> On 01/10/2013 03:16 PM, Jason Wang wrote:
>> On Thursday, January 10, 2013 02:49:14 PM Wanlong Gao wrote:
>>> On 01/10/2013 02:43 PM, Jason Wang wrote:
>>>> On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote:
>>>>> On 01/09/2013 06:01 PM, Wanlong Gao wrote:
>>>>>> On 01/09/2013 05:30 PM, Jason Wang wrote:
>>>>>>> On 01/09/2013 04:23 PM, Wanlong Gao wrote:
>>>>>>>> On 01/08/2013 06:14 PM, Jason Wang wrote:
>>>>>>>>> On 01/08/2013 06:00 PM, Wanlong Gao wrote:
>>>>>>>>>> On 01/08/2013 05:51 PM, Jason Wang wrote:
>>>>>>>>>>> On 01/08/2013 05:49 PM, Wanlong Gao wrote:
>>>>>>>>>>>> On 01/08/2013 05:29 PM, Jason Wang wrote:
>>>>>>>>>>>>> On 01/08/2013 05:07 PM, Wanlong Gao wrote:
>>>>>>>>>>>>>> On 12/28/2012 06:32 PM, Jason Wang wrote:
>>>>>>>>>>>>>>> +    } else if (nc->peer->info->type !=
>>>>>>>>>>>>>>> NET_CLIENT_OPTIONS_KIND_TAP) {
>>>>>>>>>>>>>>> +        ret = -1;
>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>> +        ret = tap_detach(nc->peer);
>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>> [...]
>>>>
>>>>>>>> I got a guest kernel panic when using this way and setting queues=4.
>>>>>>> Does it happen w/o or w/ an fd parameter? What's the qemu command line?
>>>>>>> Did you meet it during boot time?
>>>>>> The QEMU command line is
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> I got a panic just after booting the system; I did nothing, waited for a
>>>>>> while, and the guest panicked.
>>>>>>
>>>>>> [   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
>>>>>> [...]
>>>>>>
>>>>>>
>>>>>> The QEMU tree I used is git://github.com/jasowang/qemu.git
>>>>> Thanks a lot, will try to reproduce it myself tomorrow. From the
>>>>> calltrace, it looks like we sent a command to an rx/tx queue.
>>>> Right, the virtqueues that will not be used by a single queue guest were
>>>> still initialized. Please try the following patch or use my qemu.git on
>>>> github with this fix.
>>> It's odd, why didn't I get a guest panic when using your python start
>>> script this morning?
>>>
>> That's strange, I can reproduce it. Did you try booting a single queue guest 
>> under a multiqueue virtio-net?
>>
>> It can only be triggered when you boot a single queue guest with 
>> queues >= 2. Let's take 2 as an example. Without the patch, all virtqueues 
>> will be initialized even if the guest doesn't support multiqueue. So the 
>> ctrl vq will be at index 4 but the guest thinks it's at index 2. So the 
>> guest will send the command to an rx/tx queue and will never get a response.
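
To make the index mismatch concrete, here is a minimal sketch (mine, not
code from the series) of the virtqueue layout the virtio spec mandates
for N queue pairs:

    /* rx queue i sits at index 2*i, tx queue i at index 2*i + 1, and
     * the ctrl vq comes after all of the rx/tx queues. */
    static int ctrl_vq_index(int queue_pairs)
    {
        return 2 * queue_pairs;
    }

A device initialized with 2 pairs has the ctrl vq at index 4
(ctrl_vq_index(2)), while a single queue guest sends ctrl commands to
index 2 (ctrl_vq_index(1)), which on the device side is an rx queue.
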
> Anyway, with your updated github tree, the guest panic is gone.

Good to know that.
> As you say here, the guest panic is triggered by using a guest kernel that
> only supports a single queue? But I think my guest kernel has always
> supported multi-queue virtio-net. Am I missing something about guest kernel
> support for multi-queue virtio-net?

I didn't know the steps you used to set up your guest. I assume your
steps were:

1) boot a 'legacy' kernel without a multiqueue virtio-net driver
2) install the new kernel with multiqueue support
3) reboot

So it looks like the hang can occur only in step 1, and only if you
start qemu with queues > 1. If you use queues = 1 in step 1, you will
not get the hang. If you still have the old kernel, you can reproduce it
by booting the old one with queues > 1.

> Thanks,
> Wanlong Gao
>
>> So if you're using the python script to boot a single queue guest with 
>> queues = 1, or to boot a multiqueue guest, it would not be triggerable.
>>
>> Thanks
>>> Thanks,
>>> Wanlong Gao
>>>
>>>> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
>>>> index 8b4f079..cfd9af1 100644
>>>> --- a/hw/virtio-net.c
>>>> +++ b/hw/virtio-net.c
>> [...]
>>
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/12] tap: multiqueue support
  2012-12-28 10:31 ` [PATCH 01/12] tap: multiqueue support Jason Wang
  2013-01-09  9:56   ` Stefan Hajnoczi
@ 2013-01-10 10:28   ` Stefan Hajnoczi
  2013-01-10 13:52     ` Jason Wang
  1 sibling, 1 reply; 58+ messages in thread
From: Stefan Hajnoczi @ 2013-01-10 10:28 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:

Mainly suggestions to make the code easier to understand, but see the
comment about the 1:1 queue/NetClientState model for a general issue
with this approach.

> Recently, linux support multiqueue tap which could let userspace call TUNSETIFF
> for a signle device many times to create multiple file descriptors as

s/signle/single/

(Noting these if you respin.)

> independent queues. User could also enable/disabe a specific queue through

s/disabe/disable/

> TUNSETQUEUE.
> 
> The patch adds the generic infrastructure to create multiqueue taps. To achieve
> this a new parameter "queues" were introduced to specify how many queues were
> expected to be created for tap. The "fd" parameter were also changed to support
> a list of file descriptors which could be used by management (such as libvirt)
> to pass pre-created file descriptors (queues) to qemu.
> 
> Each TAPState were still associated to a tap fd, which mean multiple TAPStates
> were created when user needs multiqueue taps.
> 
> Only linux part were implemented now, since it's the only OS that support
> multiqueue tap.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  net/tap-aix.c     |   18 ++++-
>  net/tap-bsd.c     |   18 ++++-
>  net/tap-haiku.c   |   18 ++++-
>  net/tap-linux.c   |   70 +++++++++++++++-
>  net/tap-linux.h   |    4 +
>  net/tap-solaris.c |   18 ++++-
>  net/tap-win32.c   |   10 ++
>  net/tap.c         |  248 +++++++++++++++++++++++++++++++++++++----------------
>  net/tap.h         |    8 ++-
>  qapi-schema.json  |    5 +-
>  10 files changed, 335 insertions(+), 82 deletions(-)

This patch should be split up:
1. linux-headers: import linux/if_tun.h multiqueue constants
2. tap: add Linux multiqueue support (tap_open(), tap_fd_attach(), tap_fd_detach())
3. tap: queue attach/detach (tap_attach(), tap_detach())
4. tap: split out net_init_one_tap() function (pure code motion, to make later diffs easy to review)
5. tap: add "queues" and multi-"fd" options (net_init_tap()/net_init_one_tap() changes)

Each commit description can explain how this works in more detail.  I
think I've figured it out now but it would have helped to separate
things out from the start.

> diff --git a/net/tap-aix.c b/net/tap-aix.c
> index f27c177..f931ef3 100644
> --- a/net/tap-aix.c
> +++ b/net/tap-aix.c
> @@ -25,7 +25,8 @@
>  #include "net/tap.h"
>  #include <stdio.h>
>  
> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
> +             int vnet_hdr_required, int mq_required)
>  {
>      fprintf(stderr, "no tap on AIX\n");
>      return -1;
> @@ -59,3 +60,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>                          int tso6, int ecn, int ufo)
>  {
>  }
> +
> +int tap_fd_attach(int fd)
> +{
> +    return -1;
> +}
> +
> +int tap_fd_detach(int fd)
> +{
> +    return -1;
> +}
> +
> +int tap_fd_ifname(int fd, char *ifname)
> +{
> +    return -1;
> +}
> diff --git a/net/tap-bsd.c b/net/tap-bsd.c
> index a3b717d..07c287d 100644
> --- a/net/tap-bsd.c
> +++ b/net/tap-bsd.c
> @@ -33,7 +33,8 @@
>  #include <net/if_tap.h>
>  #endif
>  
> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
> +             int vnet_hdr_required, int mq_required)
>  {
>      int fd;
>  #ifdef TAPGIFNAME
> @@ -145,3 +146,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>                          int tso6, int ecn, int ufo)
>  {
>  }
> +
> +int tap_fd_attach(int fd)
> +{
> +    return -1;
> +}
> +
> +int tap_fd_detach(int fd)
> +{
> +    return -1;
> +}
> +
> +int tap_fd_ifname(int fd, char *ifname)
> +{
> +    return -1;
> +}
> diff --git a/net/tap-haiku.c b/net/tap-haiku.c
> index 34739d1..62ab423 100644
> --- a/net/tap-haiku.c
> +++ b/net/tap-haiku.c
> @@ -25,7 +25,8 @@
>  #include "net/tap.h"
>  #include <stdio.h>
>  
> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
> +             int vnet_hdr_required, int mq_required)
>  {
>      fprintf(stderr, "no tap on Haiku\n");
>      return -1;
> @@ -59,3 +60,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>                          int tso6, int ecn, int ufo)
>  {
>  }
> +
> +int tap_fd_attach(int fd)
> +{
> +    return -1;
> +}
> +
> +int tap_fd_detach(int fd)
> +{
> +    return -1;
> +}
> +
> +int tap_fd_ifname(int fd, char *ifname)
> +{
> +    return -1;
> +}
> diff --git a/net/tap-linux.c b/net/tap-linux.c
> index c6521be..0854ef5 100644
> --- a/net/tap-linux.c
> +++ b/net/tap-linux.c
> @@ -35,7 +35,8 @@
>  
>  #define PATH_NET_TUN "/dev/net/tun"
>  
> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
> +             int vnet_hdr_required, int mq_required)
>  {
>      struct ifreq ifr;
>      int fd, ret;
> @@ -67,6 +68,20 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required
>          }
>      }
>  
> +    if (mq_required) {
> +        unsigned int features;
> +
> +        if ((ioctl(fd, TUNGETFEATURES, &features) != 0) ||
> +            !(features & IFF_MULTI_QUEUE)) {
> +            error_report("multiqueue required, but no kernel "
> +                         "support for IFF_MULTI_QUEUE available");
> +            close(fd);
> +            return -1;
> +        } else {
> +            ifr.ifr_flags |= IFF_MULTI_QUEUE;
> +        }
> +    }
> +
>      if (ifname[0] != '\0')
>          pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
>      else
> @@ -200,3 +215,56 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>          }
>      }
>  }
> +
> +/* Attach a file descriptor to a TUN/TAP device. This descriptor should be
> + * detached before.
> + */
> +int tap_fd_attach(int fd)
> +{
> +    struct ifreq ifr;
> +    int ret;
> +
> +    memset(&ifr, 0, sizeof(ifr));
> +
> +    ifr.ifr_flags = IFF_ATTACH_QUEUE;
> +    ret = ioctl(fd, TUNSETQUEUE, (void *) &ifr);
> +
> +    if (ret != 0) {
> +        error_report("could not attach fd to tap");
> +    }
> +
> +    return ret;
> +}
> +
> +/* Detach a file descriptor to a TUN/TAP device. This file descriptor must have
> + * been attach to a device.
> + */
> +int tap_fd_detach(int fd)
> +{
> +    struct ifreq ifr;
> +    int ret;
> +
> +    memset(&ifr, 0, sizeof(ifr));
> +
> +    ifr.ifr_flags = IFF_DETACH_QUEUE;
> +    ret = ioctl(fd, TUNSETQUEUE, (void *) &ifr);
> +
> +    if (ret != 0) {
> +        error_report("could not detach fd");
> +    }
> +
> +    return ret;
> +}
> +
> +int tap_get_ifname(int fd, char *ifname)

Please document that ifname must have IFNAMSIZ size.

> +{
> +    struct ifreq ifr;
> +
> +    if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
> +        error_report("TUNGETIFF ioctl() failed: %s", strerror(errno));
> +        return -1;
> +    }
> +
> +    pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
> +    return 0;
> +}
> diff --git a/net/tap-linux.h b/net/tap-linux.h
> index 659e981..648d29f 100644
> --- a/net/tap-linux.h
> +++ b/net/tap-linux.h
> @@ -29,6 +29,7 @@
>  #define TUNSETSNDBUF   _IOW('T', 212, int)
>  #define TUNGETVNETHDRSZ _IOR('T', 215, int)
>  #define TUNSETVNETHDRSZ _IOW('T', 216, int)
> +#define TUNSETQUEUE  _IOW('T', 217, int)
>  
>  #endif
>  
> @@ -36,6 +37,9 @@
>  #define IFF_TAP		0x0002
>  #define IFF_NO_PI	0x1000
>  #define IFF_VNET_HDR	0x4000
> +#define IFF_MULTI_QUEUE 0x0100
> +#define IFF_ATTACH_QUEUE 0x0200
> +#define IFF_DETACH_QUEUE 0x0400
>  
>  /* Features for GSO (TUNSETOFFLOAD). */
>  #define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
> diff --git a/net/tap-solaris.c b/net/tap-solaris.c
> index 5d6ac42..2df3ec1 100644
> --- a/net/tap-solaris.c
> +++ b/net/tap-solaris.c
> @@ -173,7 +173,8 @@ static int tap_alloc(char *dev, size_t dev_size)
>      return tap_fd;
>  }
>  
> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
> +             int vnet_hdr_required, int mq_required)
>  {
>      char  dev[10]="";
>      int fd;
> @@ -225,3 +226,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>                          int tso6, int ecn, int ufo)
>  {
>  }
> +
> +int tap_fd_attach(int fd)
> +{
> +    return -1;
> +}
> +
> +int tap_fd_detach(int fd)
> +{
> +    return -1;
> +}
> +
> +int tap_fd_ifname(int fd, char *ifname)
> +{
> +    return -1;
> +}
> diff --git a/net/tap-win32.c b/net/tap-win32.c
> index f9bd741..d7b1f7a 100644
> --- a/net/tap-win32.c
> +++ b/net/tap-win32.c
> @@ -763,3 +763,13 @@ void tap_set_vnet_hdr_len(NetClientState *nc, int len)
>  {
>      assert(0);
>  }
> +
> +int tap_attach(NetClientState *nc)
> +{
> +    assert(0);
> +}
> +
> +int tap_detach(NetClientState *nc)
> +{
> +    assert(0);
> +}
> diff --git a/net/tap.c b/net/tap.c
> index 1abfd44..01f826a 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -60,6 +60,7 @@ typedef struct TAPState {
>      unsigned int write_poll : 1;
>      unsigned int using_vnet_hdr : 1;
>      unsigned int has_ufo: 1;
> +    unsigned int enabled:1;

For consistency, please use "enabled : 1".

>      VHostNetState *vhost_net;
>      unsigned host_vnet_hdr_len;
>  } TAPState;
> @@ -73,9 +74,9 @@ static void tap_writable(void *opaque);
>  static void tap_update_fd_handler(TAPState *s)
>  {
>      qemu_set_fd_handler2(s->fd,
> -                         s->read_poll  ? tap_can_send : NULL,
> -                         s->read_poll  ? tap_send     : NULL,
> -                         s->write_poll ? tap_writable : NULL,
> +                         s->read_poll && s->enabled ? tap_can_send : NULL,
> +                         s->read_poll && s->enabled ? tap_send     : NULL,
> +                         s->write_poll && s->enabled ? tap_writable : NULL,
>                           s);
>  }
>  
> @@ -340,6 +341,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
>      s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
>      s->using_vnet_hdr = 0;
>      s->has_ufo = tap_probe_has_ufo(s->fd);
> +    s->enabled = 1;
>      tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
>      /*
>       * Make sure host header length is set correctly in tap:
> @@ -559,17 +561,10 @@ int net_init_bridge(const NetClientOptions *opts, const char *name,
>  
>  static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
>                          const char *setup_script, char *ifname,
> -                        size_t ifname_sz)
> +                        size_t ifname_sz, int mq_required)
>  {
>      int fd, vnet_hdr_required;
>  
> -    if (tap->has_ifname) {
> -        pstrcpy(ifname, ifname_sz, tap->ifname);
> -    } else {
> -        assert(ifname_sz > 0);
> -        ifname[0] = '\0';
> -    }
> -
>      if (tap->has_vnet_hdr) {
>          *vnet_hdr = tap->vnet_hdr;
>          vnet_hdr_required = *vnet_hdr;
> @@ -578,7 +573,8 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
>          vnet_hdr_required = 0;
>      }
>  
> -    TFR(fd = tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required));
> +    TFR(fd = tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required,
> +                      mq_required));
>      if (fd < 0) {
>          return -1;
>      }
> @@ -594,69 +590,37 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
>      return fd;
>  }
>  
> -int net_init_tap(const NetClientOptions *opts, const char *name,
> -                 NetClientState *peer)
> -{
> -    const NetdevTapOptions *tap;
> -
> -    int fd, vnet_hdr = 0;
> -    const char *model;
> -    TAPState *s;
> +#define MAX_TAP_QUEUES 1024
>  
> -    /* for the no-fd, no-helper case */
> -    const char *script = NULL; /* suppress wrong "uninit'd use" gcc warning */
> -    char ifname[128];
> -
> -    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_TAP);
> -    tap = opts->tap;
> -
> -    if (tap->has_fd) {
> -        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
> -            tap->has_vnet_hdr || tap->has_helper) {
> -            error_report("ifname=, script=, downscript=, vnet_hdr=, "
> -                         "and helper= are invalid with fd=");
> -            return -1;
> -        }
> -
> -        fd = monitor_handle_fd_param(cur_mon, tap->fd);
> -        if (fd == -1) {
> -            return -1;
> -        }
> -
> -        fcntl(fd, F_SETFL, O_NONBLOCK);
> -
> -        vnet_hdr = tap_probe_vnet_hdr(fd);
> -
> -        model = "tap";
> -
> -    } else if (tap->has_helper) {
> -        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
> -            tap->has_vnet_hdr) {
> -            error_report("ifname=, script=, downscript=, and vnet_hdr= "
> -                         "are invalid with helper=");
> -            return -1;
> -        }
> -
> -        fd = net_bridge_run_helper(tap->helper, DEFAULT_BRIDGE_INTERFACE);
> -        if (fd == -1) {
> -            return -1;
> -        }
> +static int tap_fd(const StringList *fd, const char **fds)

This function can be dropped if you change it so the "queues" parameter
is not used together with "fd".  There's no need to pass both: it simply
adds more code to check they are consistent and is a pain for human
users.

Then you can iterate the StringList directly in __net_init_tap() without
the need for the temporary fds[] array.

In other words:

1. For multiqueue without fd passing, use queues=<n>.
2. For multiqueue with fd passing, use fd=<fd>.
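
Roughly, a sketch of what the loop could then look like (untested, just
to illustrate the shape):

    const StringList *c;

    for (c = tap->fd; c; c = c->next) {
        int fd = monitor_handle_fd_param(cur_mon, c->value->str);

        if (fd == -1) {
            return -1;
        }
        fcntl(fd, F_SETFL, O_NONBLOCK);
        /* probe vnet_hdr on the first fd, then init one queue per fd */
    }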

> +{
> +    const StringList *c = fd;
> +    size_t i = 0, num_opts = 0;
>  
> -        fcntl(fd, F_SETFL, O_NONBLOCK);
> +    while (c) {
> +        num_opts++;
> +        c = c->next;
> +    }
>  
> -        vnet_hdr = tap_probe_vnet_hdr(fd);
> +    if (num_opts == 0) {
> +        return 0;
> +    }
>  
> -        model = "bridge";
> +    c = fd;
> +    while (c) {
> +        fds[i++] = c->value->str;
> +        c = c->next;
> +    }
>  
> -    } else {
> -        script = tap->has_script ? tap->script : DEFAULT_NETWORK_SCRIPT;
> -        fd = net_tap_init(tap, &vnet_hdr, script, ifname, sizeof ifname);
> -        if (fd == -1) {
> -            return -1;
> -        }
> +    return num_opts;
> +}
>  
> -        model = "tap";
> -    }
> +static int __net_init_tap(const NetdevTapOptions *tap, NetClientState *peer,
> +                          const char *model, const char *name,
> +                          const char *ifname, const char *script,
> +                          const char *downscript, int vnet_hdr, int fd)

Function names starting with underscore are avoided in QEMU.  According
to the C standard these names are reserved.  Please rename, how about
net_init_one_tap()?

> +{
> +    TAPState *s;
>  
>      s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);

Every queue has the same name so qemu_find_netdev() doesn't work anymore.  I
think we need snprintf(queue_name, "%s.%d", name, queue_index).
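
Something along these lines (illustrative only, buffer size arbitrary):

    char queue_name[128];

    snprintf(queue_name, sizeof(queue_name), "%s.%d", name, queue_index);
    s = net_tap_fd_init(peer, model, queue_name, fd, vnet_hdr);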

The model where we have one NetClientState per queue has a few other
issues.  Maybe you have addressed these in later patches:

1. netdev_del doesn't work because it only deletes 1 queue!
2. set_link changes link up/down for a single queue only
3. info network output will show many more entries now - I doubt
   management tools like libvirt are prepared to handle this and they
   may show 1 network interface per queue now!

I think it's very likely that this simple 1:1 queue/NetClientState model
won't work without more changes.

>      if (!s) {
> @@ -674,11 +638,6 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
>          snprintf(s->nc.info_str, sizeof(s->nc.info_str), "helper=%s",
>                   tap->helper);
>      } else {
> -        const char *downscript;
> -
> -        downscript = tap->has_downscript ? tap->downscript :
> -                                           DEFAULT_NETWORK_DOWN_SCRIPT;
> -
>          snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>                   "ifname=%s,script=%s,downscript=%s", ifname, script,
>                   downscript);
> @@ -716,9 +675,150 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
>      return 0;
>  }
>  
> +int net_init_tap(const NetClientOptions *opts, const char *name,
> +                 NetClientState *peer)
> +{
> +    const NetdevTapOptions *tap;
> +    const char *fds[MAX_TAP_QUEUES];

Not a good idea to duplicate a hard-coded value from the tun driver.  I
suggested how to get rid of fds[] above, that way the tun driver could
change this limit in the future without requiring a QEMU change too.

> +    int fd, vnet_hdr = 0, i, queues;
> +    /* for the no-fd, no-helper case */
> +    const char *script = NULL; /* suppress wrong "uninit'd use" gcc warning */
> +    const char *downscript = NULL;
> +    char ifname[128];
> +
> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_TAP);
> +    tap = opts->tap;
> +    queues = tap->has_queues ? tap->queues : 1;
> +
> +    if (tap->has_fd) {
> +        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
> +            tap->has_vnet_hdr || tap->has_helper) {
> +            error_report("ifname=, script=, downscript=, vnet_hdr=, "
> +                         "and helper= are invalid with fd=");

Please add tap->has_queues here to prevent "queues" and "fd" from being
used together.

> +            return -1;
> +        }
> +
> +        if (queues != tap_fd(tap->fd, fds)) {
> +            error_report("the number of fds were not equal to queues");
> +            return -1;
> +        }
> +
> +        for (i = 0; i < queues; i++) {
> +            fd = monitor_handle_fd_param(cur_mon, fds[i]);
> +            if (fd == -1) {
> +                return -1;
> +            }
> +
> +            fcntl(fd, F_SETFL, O_NONBLOCK);
> +
> +            if (i == 0) {
> +                vnet_hdr = tap_probe_vnet_hdr(fd);
> +            }

The paranoid thing to do is:

if (i == 0) {
    vnet_hdr = tap_probe_vnet_hdr(fd);
} else if (vnet_hdr != tap_probe_vnet_hdr(fd)) {
    error_report("vnet_hdr not consistent across given tap fds");
    return -1;
}

> +
> +            if (__net_init_tap(tap, peer, "tap", name, ifname,
> +                               script, downscript, vnet_hdr, fd)) {
> +                return -1;
> +            }
> +        }
> +    } else if (tap->has_helper) {
> +        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
> +            tap->has_vnet_hdr) {
> +            error_report("ifname=, script=, downscript=, and vnet_hdr= "
> +                         "are invalid with helper=");
> +            return -1;
> +        }
> +
> +        /* FIXME: correct ? */
> +        for (i = 0; i < queues; i++) {
> +            fd = net_bridge_run_helper(tap->helper, DEFAULT_BRIDGE_INTERFACE);

The bridge helper doesn't support multiqueue tap devices (it doesn't use
IFF_MULTI_QUEUE).  Even if it did, SIOCBRADDIF would fail with EBUSY
because the network interface has already been added to the bridge.

It seems qemu-bridge-helper.c needs to be extended to support --queues.

Right now this code is broken.

> +            if (fd == -1) {
> +                return -1;
> +            }
> +
> +            fcntl(fd, F_SETFL, O_NONBLOCK);
> +
> +            if (i == 0) {
> +                vnet_hdr = tap_probe_vnet_hdr(fd);
> +            }
> +
> +            if (__net_init_tap(tap, peer, "bridge", name, ifname,
> +                               script, downscript, vnet_hdr, fd)) {
> +                return -1;
> +            }
> +        }
> +    } else {
> +        script = tap->has_script ? tap->script : DEFAULT_NETWORK_SCRIPT;
> +        downscript = tap->has_downscript ? tap->downscript :
> +                                           DEFAULT_NETWORK_DOWN_SCRIPT;
> +
> +        if (tap->has_ifname) {
> +            pstrcpy(ifname, sizeof ifname, tap->ifname);
> +        } else {
> +            ifname[0] = '\0';
> +        }
> +
> +        for (i = 0; i < queues; i++) {
> +            fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
> +                              ifname, sizeof ifname, queues > 1);
> +            if (fd == -1) {
> +                return -1;
> +            }
> +
> +            if (i == 0 && tap_get_ifname(fd, ifname) != 0) {
> +                error_report("could not get ifname");
> +                return -1;
> +            }
> +
> +            if (__net_init_tap(tap, peer, "tap", name, ifname,
> +                               i >= 1 ? "no" : script,
> +                               i >= 1 ? "no" : downscript,
> +                               vnet_hdr, fd)) {
> +                return -1;
> +            }

It's cleaner to avoid passing script/downscript into __net_init_tap()
because the fd passing and helper cases don't use it.

Move the nc.info_str setting code out of __net_init_tap().  Then the
script/downscript arguments are unnecessary and we have fewer if
statements checking for tap->has_fd, tap->has_helper, and else.
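
I.e. something like this (a sketch only, with the return type changed
so the caller can reach the TAPState):

    s = net_init_one_tap(tap, peer, "tap", name, vnet_hdr, fd);
    if (!s) {
        return -1;
    }
    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
             "ifname=%s,script=%s,downscript=%s", ifname, script,
             downscript);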

> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  VHostNetState *tap_get_vhost_net(NetClientState *nc)
>  {
>      TAPState *s = DO_UPCAST(TAPState, nc, nc);
>      assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP);
>      return s->vhost_net;
>  }
> +
> +int tap_attach(NetClientState *nc)

The tap_attach()/tap_detach() naming isn't obvious.  I wouldn't be sure
what these functions actually do.  You called the variable "enabled" -
how about tap_enable()/tap_disable()?  (Even if you don't rename, please
add a doc comment and make the s->enabled variable name consistent with
the function naming.)
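
One possible shape for the renamed helper (my sketch, reusing the
patch's own tap_fd_attach() and tap_update_fd_handler()):

    int tap_enable(NetClientState *nc)
    {
        TAPState *s = DO_UPCAST(TAPState, nc, nc);
        int ret = tap_fd_attach(s->fd);

        if (ret == 0) {
            s->enabled = 1;
            tap_update_fd_handler(s);
        }
        return ret;
    }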

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
  2013-01-10  9:34         ` [Qemu-devel] " Jason Wang
@ 2013-01-10 11:49           ` Stefan Hajnoczi
  2013-01-10 14:15             ` Jason Wang
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Hajnoczi @ 2013-01-10 11:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, aliguori, qemu-devel,
	krkumar2, kvm, mprivozn, rusty, jwhan, shiyer

On Thu, Jan 10, 2013 at 05:34:14PM +0800, Jason Wang wrote:
> On 01/10/2013 04:44 PM, Stefan Hajnoczi wrote:
> > On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
> >> On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
> >>>> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> >>>>> Perf Numbers:
> >>>>>
> >>>>> Two Intel Xeon 5620 with direct connected intel 82599EB
> >>>>> Host/Guest kernel: David net tree
> >>>>> vhost enabled
> >>>>>
> >>>>> - lots of improvents of both latency and cpu utilization in request-reponse test
> >>>>> - get regression of guest sending small packets which because TCP tends to batch
> >>>>>   less when the latency were improved
> >>>>>
> >>>>> 1q/2q/4q
> >>>>> TCP_RR
> >>>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> >>>>> 1 1     9393.26   595.64  9408.18   597.34  9375.19   584.12
> >>>>> 1 20    72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> >>>>> 1 50    107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> >>>>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> >>>>> 64 1    9453.42   632.33  9371.37   616.13  9338.19   615.97
> >>>>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> >>>>> 64 50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
> >>>>> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> >>>>> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> >>>>> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> >>>>> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> >>>>> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> >>>>> TCP_CRR
> >>>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> >>>>> 1 1     2848.37  163.41 2230.39  130.89 2013.09  120.47
> >>>>> 1 20    23434.5  562.11 31057.43 531.07 49488.28 564.41
> >>>>> 1 50    28514.88 582.17 40494.23 605.92 60113.35 654.97
> >>>>> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> >>>>> 64 1    2780.08  159.4  2201.07  127.96 2006.8   117.63
> >>>>> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> >>>>> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> >>>>> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> >>>>> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> >>>>> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> >>>>> 256 50  28354.7  579.85 40578.31 607    60261.71 657.87
> >>>>> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> >>>>> TCP_STREAM guest receiving
> >>>>>  size #sessions throughput  norm throughput  norm throughput  norm
> >>>>> 1 1     16.27   1.33   16.1    1.12   16.13   0.99
> >>>>> 1 2     33.04   2.08   32.96   2.19   32.75   1.98
> >>>>> 1 4     66.62   6.83   68.3    5.56   66.14   2.65
> >>>>> 64 1    896.55  56.67  914.02  58.14  898.9   61.56
> >>>>> 64 2    1830.46 91.02  1812.02 64.59  1835.57 66.26
> >>>>> 64 4    3626.61 142.55 3636.25 100.64 3607.46 75.03
> >>>>> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> >>>>> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> >>>>> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> >>>>> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> >>>>> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> >>>>> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> >>>>> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> >>>>> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> >>>>> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> >>>>> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> >>>>> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> >>>>> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> >>>>> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> >>>>> 16384 2 7436.57 255.81 9381.86 220.85 9392    220.36
> >>>>> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> >>>>> TCP_MAERTS guest sending
> >>>>>  size #sessions throughput  norm throughput  norm throughput  norm
> >>>>> 1 1     15.94   0.62   15.55   0.61   15.13   0.59
> >>>>> 1 2     36.11   0.83   32.46   0.69   32.28   0.69
> >>>>> 1 4     71.59   1      68.91   0.94   61.52   0.77
> >>>>> 64 1    630.71  22.52  622.11  22.35  605.09  21.84
> >>>>> 64 2    1442.36 30.57  1292.15 25.82  1282.67 25.55
> >>>>> 64 4    3186.79 42.59  2844.96 36.03  2529.69 30.06
> >>>>> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> >>>>> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> >>>>> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> >>>>> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> >>>>> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> >>>>> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> >>>>> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> >>>>> 1024 2  9429    258.32 7082.79 120.55 7403.53 134.78
> >>>>> 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> >>>>> 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
> >>>>> 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
> >>>>> 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
> >>>>> 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
> >>>>> 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
> >>>>> 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47
> >>>> Trying to understand the performance results:
> >>>>
> >>>> What is the host device configuration?  tap + bridge?
> >> Yes.
> >>>> Did you use host CPU affinity for the vhost threads?
> >> I use numactl to pin cpu threads and vhost threads in the same numa node.
> >>>> Can multiqueue tap take advantage of multiqueue host NICs or is
> >>>> virtio-net multiqueue unaware of the physical NIC multiqueue
> >>>> capabilities?
>> Tap is unaware of the physical multiqueue NIC, but we can benefit from it
>> since we use multiple vhost threads.
> > I wonder if it makes a difference to bind tap queues to physical NIC
> > queues.  Maybe this is only possible in macvlan or can you preset the
> > queue index of outgoing skbs so the network stack doesn't recalculate
> > the flow?
> 
> There are some issues here:
> 
> - For tap, we know nothing about the physical card, especially how many
> queues it has.
> - We can present the queue index information in the skb, but there's no
> standard txq selection / rxq smp affinity setting method for
> multiqueue card drivers in linux. For example, ixgbe and efx use
> completely different methods, so we could easily find a method for
> ixgbe but not for all the others.

It's an interesting problem because it seems like doing multiqueue
through the entire stack would be more efficient than doing multiqueue
twice at different layers.

I wonder how much of a difference it can make.

Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/12] tap: multiqueue support
  2013-01-10 10:28   ` Stefan Hajnoczi
@ 2013-01-10 13:52     ` Jason Wang
  0 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2013-01-10 13:52 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: krkumar2, aliguori, kvm, mst, mprivozn, rusty, qemu-devel,
	stefanha, jwhan, shiyer

On 01/10/2013 06:28 PM, Stefan Hajnoczi wrote:
> On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:
>
> Mainly suggestions to make the code easier to understand, but see the
> comment about the 1:1 queue/NetClientState model for a general issue
> with this approach.

Ok, thanks for the review.
>> Recently, linux support multiqueue tap which could let userspace call TUNSETIFF
>> for a signle device many times to create multiple file descriptors as
> s/signle/single/
>
> (Noting these if you respin.)

Sorry about this, will be careful.
>> independent queues. User could also enable/disabe a specific queue through
> s/disabe/disable/
>
>> TUNSETQUEUE.
>>
>> The patch adds the generic infrastructure to create multiqueue taps. To achieve
>> this a new parameter "queues" were introduced to specify how many queues were
>> expected to be created for tap. The "fd" parameter were also changed to support
>> a list of file descriptors which could be used by management (such as libvirt)
>> to pass pre-created file descriptors (queues) to qemu.
>>
>> Each TAPState were still associated to a tap fd, which mean multiple TAPStates
>> were created when user needs multiqueue taps.
>>
>> Only linux part were implemented now, since it's the only OS that support
>> multiqueue tap.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  net/tap-aix.c     |   18 ++++-
>>  net/tap-bsd.c     |   18 ++++-
>>  net/tap-haiku.c   |   18 ++++-
>>  net/tap-linux.c   |   70 +++++++++++++++-
>>  net/tap-linux.h   |    4 +
>>  net/tap-solaris.c |   18 ++++-
>>  net/tap-win32.c   |   10 ++
>>  net/tap.c         |  248 +++++++++++++++++++++++++++++++++++++----------------
>>  net/tap.h         |    8 ++-
>>  qapi-schema.json  |    5 +-
>>  10 files changed, 335 insertions(+), 82 deletions(-)
> This patch should be split up:
> 1. linux-headers: import linux/if_tun.h multiqueue constants
> 2. tap: add Linux multiqueue support (tap_open(), tap_fd_attach(), tap_fd_detach())
> 3. tap: queue attach/detach (tap_attach(), tap_detach())
> 4. tap: split out net_init_one_tap() function (pure code motion, to make later diffs easy to review)
> 5. tap: add "queues" and multi-"fd" options (net_init_tap()/net_init_one_tap() changes)
>
> Each commit description can explain how this works in more detail.  I
> think I've figured it out now but it would have helped to separate
> things out from the start.

Ok.
>> diff --git a/net/tap-aix.c b/net/tap-aix.c
>> index f27c177..f931ef3 100644
>> --- a/net/tap-aix.c
>> +++ b/net/tap-aix.c
>> @@ -25,7 +25,8 @@
>>  #include "net/tap.h"
>>  #include <stdio.h>
>>  
>> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
>> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
>> +             int vnet_hdr_required, int mq_required)
>>  {
>>      fprintf(stderr, "no tap on AIX\n");
>>      return -1;
>> @@ -59,3 +60,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>>                          int tso6, int ecn, int ufo)
>>  {
>>  }
>> +
>> +int tap_fd_attach(int fd)
>> +{
>> +    return -1;
>> +}
>> +
>> +int tap_fd_detach(int fd)
>> +{
>> +    return -1;
>> +}
>> +
>> +int tap_fd_ifname(int fd, char *ifname)
>> +{
>> +    return -1;
>> +}
>> diff --git a/net/tap-bsd.c b/net/tap-bsd.c
>> index a3b717d..07c287d 100644
>> --- a/net/tap-bsd.c
>> +++ b/net/tap-bsd.c
>> @@ -33,7 +33,8 @@
>>  #include <net/if_tap.h>
>>  #endif
>>  
>> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
>> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
>> +             int vnet_hdr_required, int mq_required)
>>  {
>>      int fd;
>>  #ifdef TAPGIFNAME
>> @@ -145,3 +146,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>>                          int tso6, int ecn, int ufo)
>>  {
>>  }
>> +
>> +int tap_fd_attach(int fd)
>> +{
>> +    return -1;
>> +}
>> +
>> +int tap_fd_detach(int fd)
>> +{
>> +    return -1;
>> +}
>> +
>> +int tap_fd_ifname(int fd, char *ifname)
>> +{
>> +    return -1;
>> +}
>> diff --git a/net/tap-haiku.c b/net/tap-haiku.c
>> index 34739d1..62ab423 100644
>> --- a/net/tap-haiku.c
>> +++ b/net/tap-haiku.c
>> @@ -25,7 +25,8 @@
>>  #include "net/tap.h"
>>  #include <stdio.h>
>>  
>> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
>> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
>> +             int vnet_hdr_required, int mq_required)
>>  {
>>      fprintf(stderr, "no tap on Haiku\n");
>>      return -1;
>> @@ -59,3 +60,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>>                          int tso6, int ecn, int ufo)
>>  {
>>  }
>> +
>> +int tap_fd_attach(int fd)
>> +{
>> +    return -1;
>> +}
>> +
>> +int tap_fd_detach(int fd)
>> +{
>> +    return -1;
>> +}
>> +
>> +int tap_fd_ifname(int fd, char *ifname)
>> +{
>> +    return -1;
>> +}
>> diff --git a/net/tap-linux.c b/net/tap-linux.c
>> index c6521be..0854ef5 100644
>> --- a/net/tap-linux.c
>> +++ b/net/tap-linux.c
>> @@ -35,7 +35,8 @@
>>  
>>  #define PATH_NET_TUN "/dev/net/tun"
>>  
>> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
>> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
>> +             int vnet_hdr_required, int mq_required)
>>  {
>>      struct ifreq ifr;
>>      int fd, ret;
>> @@ -67,6 +68,20 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required
>>          }
>>      }
>>  
>> +    if (mq_required) {
>> +        unsigned int features;
>> +
>> +        if ((ioctl(fd, TUNGETFEATURES, &features) != 0) ||
>> +            !(features & IFF_MULTI_QUEUE)) {
>> +            error_report("multiqueue required, but no kernel "
>> +                         "support for IFF_MULTI_QUEUE available");
>> +            close(fd);
>> +            return -1;
>> +        } else {
>> +            ifr.ifr_flags |= IFF_MULTI_QUEUE;
>> +        }
>> +    }
>> +
>>      if (ifname[0] != '\0')
>>          pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
>>      else
>> @@ -200,3 +215,56 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>>          }
>>      }
>>  }
>> +
>> +/* Attach a file descriptor to a TUN/TAP device. This descriptor should be
>> + * detached before.
>> + */
>> +int tap_fd_attach(int fd)
>> +{
>> +    struct ifreq ifr;
>> +    int ret;
>> +
>> +    memset(&ifr, 0, sizeof(ifr));
>> +
>> +    ifr.ifr_flags = IFF_ATTACH_QUEUE;
>> +    ret = ioctl(fd, TUNSETQUEUE, (void *) &ifr);
>> +
>> +    if (ret != 0) {
>> +        error_report("could not attach fd to tap");
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +/* Detach a file descriptor to a TUN/TAP device. This file descriptor must have
>> + * been attach to a device.
>> + */
>> +int tap_fd_detach(int fd)
>> +{
>> +    struct ifreq ifr;
>> +    int ret;
>> +
>> +    memset(&ifr, 0, sizeof(ifr));
>> +
>> +    ifr.ifr_flags = IFF_DETACH_QUEUE;
>> +    ret = ioctl(fd, TUNSETQUEUE, (void *) &ifr);
>> +
>> +    if (ret != 0) {
>> +        error_report("could not detach fd");
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +int tap_get_ifname(int fd, char *ifname)
> Please document that ifname must have IFNAMSIZ size.

Ok.
>> +{
>> +    struct ifreq ifr;
>> +
>> +    if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
>> +        error_report("TUNGETIFF ioctl() failed: %s", strerror(errno));
>> +        return -1;
>> +    }
>> +
>> +    pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
>> +    return 0;
>> +}
>> diff --git a/net/tap-linux.h b/net/tap-linux.h
>> index 659e981..648d29f 100644
>> --- a/net/tap-linux.h
>> +++ b/net/tap-linux.h
>> @@ -29,6 +29,7 @@
>>  #define TUNSETSNDBUF   _IOW('T', 212, int)
>>  #define TUNGETVNETHDRSZ _IOR('T', 215, int)
>>  #define TUNSETVNETHDRSZ _IOW('T', 216, int)
>> +#define TUNSETQUEUE  _IOW('T', 217, int)
>>  
>>  #endif
>>  
>> @@ -36,6 +37,9 @@
>>  #define IFF_TAP		0x0002
>>  #define IFF_NO_PI	0x1000
>>  #define IFF_VNET_HDR	0x4000
>> +#define IFF_MULTI_QUEUE 0x0100
>> +#define IFF_ATTACH_QUEUE 0x0200
>> +#define IFF_DETACH_QUEUE 0x0400
>>  
>>  /* Features for GSO (TUNSETOFFLOAD). */
>>  #define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
>> diff --git a/net/tap-solaris.c b/net/tap-solaris.c
>> index 5d6ac42..2df3ec1 100644
>> --- a/net/tap-solaris.c
>> +++ b/net/tap-solaris.c
>> @@ -173,7 +173,8 @@ static int tap_alloc(char *dev, size_t dev_size)
>>      return tap_fd;
>>  }
>>  
>> -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
>> +int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
>> +             int vnet_hdr_required, int mq_required)
>>  {
>>      char  dev[10]="";
>>      int fd;
>> @@ -225,3 +226,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
>>                          int tso6, int ecn, int ufo)
>>  {
>>  }
>> +
>> +int tap_fd_attach(int fd)
>> +{
>> +    return -1;
>> +}
>> +
>> +int tap_fd_detach(int fd)
>> +{
>> +    return -1;
>> +}
>> +
>> +int tap_fd_ifname(int fd, char *ifname)
>> +{
>> +    return -1;
>> +}
>> diff --git a/net/tap-win32.c b/net/tap-win32.c
>> index f9bd741..d7b1f7a 100644
>> --- a/net/tap-win32.c
>> +++ b/net/tap-win32.c
>> @@ -763,3 +763,13 @@ void tap_set_vnet_hdr_len(NetClientState *nc, int len)
>>  {
>>      assert(0);
>>  }
>> +
>> +int tap_attach(NetClientState *nc)
>> +{
>> +    assert(0);
>> +}
>> +
>> +int tap_detach(NetClientState *nc)
>> +{
>> +    assert(0);
>> +}
>> diff --git a/net/tap.c b/net/tap.c
>> index 1abfd44..01f826a 100644
>> --- a/net/tap.c
>> +++ b/net/tap.c
>> @@ -60,6 +60,7 @@ typedef struct TAPState {
>>      unsigned int write_poll : 1;
>>      unsigned int using_vnet_hdr : 1;
>>      unsigned int has_ufo: 1;
>> +    unsigned int enabled:1;
> For consistency, please use "enabled : 1".

Ok.
>>      VHostNetState *vhost_net;
>>      unsigned host_vnet_hdr_len;
>>  } TAPState;
>> @@ -73,9 +74,9 @@ static void tap_writable(void *opaque);
>>  static void tap_update_fd_handler(TAPState *s)
>>  {
>>      qemu_set_fd_handler2(s->fd,
>> -                         s->read_poll  ? tap_can_send : NULL,
>> -                         s->read_poll  ? tap_send     : NULL,
>> -                         s->write_poll ? tap_writable : NULL,
>> +                         s->read_poll && s->enabled ? tap_can_send : NULL,
>> +                         s->read_poll && s->enabled ? tap_send     : NULL,
>> +                         s->write_poll && s->enabled ? tap_writable : NULL,
>>                           s);
>>  }
>>  
>> @@ -340,6 +341,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
>>      s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
>>      s->using_vnet_hdr = 0;
>>      s->has_ufo = tap_probe_has_ufo(s->fd);
>> +    s->enabled = 1;
>>      tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
>>      /*
>>       * Make sure host header length is set correctly in tap:
>> @@ -559,17 +561,10 @@ int net_init_bridge(const NetClientOptions *opts, const char *name,
>>  
>>  static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
>>                          const char *setup_script, char *ifname,
>> -                        size_t ifname_sz)
>> +                        size_t ifname_sz, int mq_required)
>>  {
>>      int fd, vnet_hdr_required;
>>  
>> -    if (tap->has_ifname) {
>> -        pstrcpy(ifname, ifname_sz, tap->ifname);
>> -    } else {
>> -        assert(ifname_sz > 0);
>> -        ifname[0] = '\0';
>> -    }
>> -
>>      if (tap->has_vnet_hdr) {
>>          *vnet_hdr = tap->vnet_hdr;
>>          vnet_hdr_required = *vnet_hdr;
>> @@ -578,7 +573,8 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
>>          vnet_hdr_required = 0;
>>      }
>>  
>> -    TFR(fd = tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required));
>> +    TFR(fd = tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required,
>> +                      mq_required));
>>      if (fd < 0) {
>>          return -1;
>>      }
>> @@ -594,69 +590,37 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
>>      return fd;
>>  }
>>  
>> -int net_init_tap(const NetClientOptions *opts, const char *name,
>> -                 NetClientState *peer)
>> -{
>> -    const NetdevTapOptions *tap;
>> -
>> -    int fd, vnet_hdr = 0;
>> -    const char *model;
>> -    TAPState *s;
>> +#define MAX_TAP_QUEUES 1024
>>  
>> -    /* for the no-fd, no-helper case */
>> -    const char *script = NULL; /* suppress wrong "uninit'd use" gcc warning */
>> -    char ifname[128];
>> -
>> -    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_TAP);
>> -    tap = opts->tap;
>> -
>> -    if (tap->has_fd) {
>> -        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
>> -            tap->has_vnet_hdr || tap->has_helper) {
>> -            error_report("ifname=, script=, downscript=, vnet_hdr=, "
>> -                         "and helper= are invalid with fd=");
>> -            return -1;
>> -        }
>> -
>> -        fd = monitor_handle_fd_param(cur_mon, tap->fd);
>> -        if (fd == -1) {
>> -            return -1;
>> -        }
>> -
>> -        fcntl(fd, F_SETFL, O_NONBLOCK);
>> -
>> -        vnet_hdr = tap_probe_vnet_hdr(fd);
>> -
>> -        model = "tap";
>> -
>> -    } else if (tap->has_helper) {
>> -        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
>> -            tap->has_vnet_hdr) {
>> -            error_report("ifname=, script=, downscript=, and vnet_hdr= "
>> -                         "are invalid with helper=");
>> -            return -1;
>> -        }
>> -
>> -        fd = net_bridge_run_helper(tap->helper, DEFAULT_BRIDGE_INTERFACE);
>> -        if (fd == -1) {
>> -            return -1;
>> -        }
>> +static int tap_fd(const StringList *fd, const char **fds)
> This function can be dropped if you change it so the "queues" parameter
> is not used together with "fd".  There's no need to pass both: it simply
> adds more code to check they are consistent and is a pain for human
> users.
>
> Then you can iterate the StringList directly in __net_init_tap() without
> the need for the temporary fds[] array.
>
> In other words:
>
> 1. For multiqueue without fd passing, use queues=<n>.
> 2. For multiqueue with fd passing, use fd=<fd>.

Ok, sure.
>> +{
>> +    const StringList *c = fd;
>> +    size_t i = 0, num_opts = 0;
>>  
>> -        fcntl(fd, F_SETFL, O_NONBLOCK);
>> +    while (c) {
>> +        num_opts++;
>> +        c = c->next;
>> +    }
>>  
>> -        vnet_hdr = tap_probe_vnet_hdr(fd);
>> +    if (num_opts == 0) {
>> +        return 0;
>> +    }
>>  
>> -        model = "bridge";
>> +    c = fd;
>> +    while (c) {
>> +        fds[i++] = c->value->str;
>> +        c = c->next;
>> +    }
>>  
>> -    } else {
>> -        script = tap->has_script ? tap->script : DEFAULT_NETWORK_SCRIPT;
>> -        fd = net_tap_init(tap, &vnet_hdr, script, ifname, sizeof ifname);
>> -        if (fd == -1) {
>> -            return -1;
>> -        }
>> +    return num_opts;
>> +}
>>  
>> -        model = "tap";
>> -    }
>> +static int __net_init_tap(const NetdevTapOptions *tap, NetClientState *peer,
>> +                          const char *model, const char *name,
>> +                          const char *ifname, const char *script,
>> +                          const char *downscript, int vnet_hdr, int fd)
> Function names starting with underscore are avoided in QEMU.  According
> to the C standard these names are reserved.  Please rename, how about
> net_init_one_tap()?

Ok, the name sounds better.
>> +{
>> +    TAPState *s;
>>  
>>      s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
> Every queue has the same name so qemu_find_netdev() doesn't work anymore.  I
> think we need snprintf(queue_name, "%s.%d", name, queue_index).
>
> The model where we have one NetClientState per queue has a few other
> issues.  Maybe you have addressed these in later patches:
>
> 1. netdev_del doesn't work because it only deletes 1 queue!
> 2. set_link changes link up/down for a single queue only
> 3. info network output will show many more entries now - I doubt
>    management tools like libvirt are prepared to handle this and they
>    may show 1 network interface per queue now!
>
> I think it's very likely that this simple 1:1 queue/NetClientState model
> won't work without more changes.

Yes, all these issues have been addressed in patch 4/12 (net: multiqueue
support). The solution is straightforward: changing or deleting a single
NetClientState out of the set that belongs to the same nic or tap is not
allowed; netdev_del/set_link will set the link state or delete all of the
NetClientStates that belong to the same nic or tap. This simplifies the
management and minimizes the changeset.
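
In code, the idea is roughly this (my sketch, not the actual patch 4/12
hunk; same_device() stands in for however queues of one device are
matched):

    NetClientState *nc, *next;

    QTAILQ_FOREACH_SAFE(nc, &net_clients, next, next) {
        if (same_device(nc, queue)) {
            qemu_del_net_client(nc); /* or flip its link state */
        }
    }
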
>>      if (!s) {
>> @@ -674,11 +638,6 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
>>          snprintf(s->nc.info_str, sizeof(s->nc.info_str), "helper=%s",
>>                   tap->helper);
>>      } else {
>> -        const char *downscript;
>> -
>> -        downscript = tap->has_downscript ? tap->downscript :
>> -                                           DEFAULT_NETWORK_DOWN_SCRIPT;
>> -
>>          snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>>                   "ifname=%s,script=%s,downscript=%s", ifname, script,
>>                   downscript);
>> @@ -716,9 +675,150 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
>>      return 0;
>>  }
>>  
>> +int net_init_tap(const NetClientOptions *opts, const char *name,
>> +                 NetClientState *peer)
>> +{
>> +    const NetdevTapOptions *tap;
>> +    const char *fds[MAX_TAP_QUEUES];
> Not a good idea to duplicate a hard-coded value from the tun driver.  I
> suggested how to get rid of fds[] above, that way the tun driver could
> change this limit in the future without requiring a QEMU change too.

Ok.
>
>> +    int fd, vnet_hdr = 0, i, queues;
>> +    /* for the no-fd, no-helper case */
>> +    const char *script = NULL; /* suppress wrong "uninit'd use" gcc warning */
>> +    const char *downscript = NULL;
>> +    char ifname[128];
>> +
>> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_TAP);
>> +    tap = opts->tap;
>> +    queues = tap->has_queues ? tap->queues : 1;
>> +
>> +    if (tap->has_fd) {
>> +        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
>> +            tap->has_vnet_hdr || tap->has_helper) {
>> +            error_report("ifname=, script=, downscript=, vnet_hdr=, "
>> +                         "and helper= are invalid with fd=");
> Please add tap->has_queues here to prevent "queues" and "fd" from being
> used together.

Ok.
>> +            return -1;
>> +        }
>> +
>> +        if (queues != tap_fd(tap->fd, fds)) {
>> +            error_report("the number of fds were not equal to queues");
>> +            return -1;
>> +        }
>> +
>> +        for (i = 0; i < queues; i++) {
>> +            fd = monitor_handle_fd_param(cur_mon, fds[i]);
>> +            if (fd == -1) {
>> +                return -1;
>> +            }
>> +
>> +            fcntl(fd, F_SETFL, O_NONBLOCK);
>> +
>> +            if (i == 0) {
>> +                vnet_hdr = tap_probe_vnet_hdr(fd);
>> +            }
> The paranoid thing to do is:
>
> if (i == 0) {
>     vnet_hdr = tap_probe_vnet_hdr(fd);
> } else if (vnet_hdr != tap_probe_vnet_hdr(fd)) {
>     error_report("vnet_hdr not consistent across given tap fds");
>     return -1;
> }

Sure, I can add this.
>
>> +
>> +            if (__net_init_tap(tap, peer, "tap", name, ifname,
>> +                               script, downscript, vnet_hdr, fd)) {
>> +                return -1;
>> +            }
>> +        }
>> +    } else if (tap->has_helper) {
>> +        if (tap->has_ifname || tap->has_script || tap->has_downscript ||
>> +            tap->has_vnet_hdr) {
>> +            error_report("ifname=, script=, downscript=, and vnet_hdr= "
>> +                         "are invalid with helper=");
>> +            return -1;
>> +        }
>> +
>> +        /* FIXME: correct ? */
>> +        for (i = 0; i < queues; i++) {
>> +            fd = net_bridge_run_helper(tap->helper, DEFAULT_BRIDGE_INTERFACE);
> The bridge helper doesn't support multiqueue tap devices (it doesn't use
> IFF_MULTI_QUEUE).  Even if it did, SIOCBRADDIF would fail with EBUSY
> because the network interface has already been added to the bridge.
>
> It seems qemu-bridge-helper.c needs to be extended to support --queues.
>
> Right now this code is broken.

Right, I will add bridge-helper support in the next version.
>> +            if (fd == -1) {
>> +                return -1;
>> +            }
>> +
>> +            fcntl(fd, F_SETFL, O_NONBLOCK);
>> +
>> +            if (i == 0) {
>> +                vnet_hdr = tap_probe_vnet_hdr(fd);
>> +            }
>> +
>> +            if (__net_init_tap(tap, peer, "bridge", name, ifname,
>> +                               script, downscript, vnet_hdr, fd)) {
>> +                return -1;
>> +            }
>> +        }
>> +    } else {
>> +        script = tap->has_script ? tap->script : DEFAULT_NETWORK_SCRIPT;
>> +        downscript = tap->has_downscript ? tap->downscript :
>> +                                           DEFAULT_NETWORK_DOWN_SCRIPT;
>> +
>> +        if (tap->has_ifname) {
>> +            pstrcpy(ifname, sizeof ifname, tap->ifname);
>> +        } else {
>> +            ifname[0] = '\0';
>> +        }
>> +
>> +        for (i = 0; i < queues; i++) {
>> +            fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
>> +                              ifname, sizeof ifname, queues > 1);
>> +            if (fd == -1) {
>> +                return -1;
>> +            }
>> +
>> +            if (i == 0 && tap_get_ifname(fd, ifname) != 0) {
>> +                error_report("could not get ifname");
>> +                return -1;
>> +            }
>> +
>> +            if (__net_init_tap(tap, peer, "tap", name, ifname,
>> +                               i >= 1 ? "no" : script,
>> +                               i >= 1 ? "no" : downscript,
>> +                               vnet_hdr, fd)) {
>> +                return -1;
>> +            }
> It's cleaner to avoid passing script/downscript into __net_init_tap()
> because the fd passing and helper cases don't use it.
>
> Move the nc.info_str setting code out of __net_init_tap().  Then the
> script/downscript arguments are unnecessary and we have fewer if
> statements checking for tap->has_fd, tap->has_helper, and else.
>

Makes sense, will do this.
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  VHostNetState *tap_get_vhost_net(NetClientState *nc)
>>  {
>>      TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>      assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP);
>>      return s->vhost_net;
>>  }
>> +
>> +int tap_attach(NetClientState *nc)
> The tap_attach()/tap_detach() naming isn't obvious.  I wouldn't be sure
> what these functions actually do.  You called the variable "enabled" -
> how about tap_enable()/tap_disable()?  (Even if you don't rename, please
> add a doc comment and make the s->enabled variable name consistent with
> the function naming.)

Good suggestion, I will rename both functions to tap_enable()/tap_disable()
and add doc comments; see the sketch below.
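
A rough sketch of the renamed helper, based on the TUNSETQUEUE ioctl
with IFF_ATTACH_QUEUE/IFF_DETACH_QUEUE (linux-only; details may still
change):

    /* Attach the queue behind @nc to the device again so the kernel
     * resumes using it.  Returns 0 on success, negative errno on
     * failure. */
    int tap_enable(NetClientState *nc)
    {
        TAPState *s = DO_UPCAST(TAPState, nc, nc);
        struct ifreq ifr;

        assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP);
        if (s->enabled) {
            return 0;
        }
        memset(&ifr, 0, sizeof(ifr));
        ifr.ifr_flags = IFF_ATTACH_QUEUE;
        if (ioctl(s->fd, TUNSETQUEUE, &ifr) != 0) {
            return -errno;
        }
        s->enabled = true;
        return 0;
    }

    /* tap_disable() is symmetric, using IFF_DETACH_QUEUE. */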


* Re: [PATCH 00/12] Multiqueue virtio-net
  2013-01-10 11:49           ` Stefan Hajnoczi
@ 2013-01-10 14:15             ` Jason Wang
  0 siblings, 0 replies; 58+ messages in thread
From: Jason Wang @ 2013-01-10 14:15 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: krkumar2, aliguori, kvm, Michael S. Tsirkin, Stefan Hajnoczi,
	rusty, qemu-devel, mprivozn, jwhan, shiyer

On 01/10/2013 07:49 PM, Stefan Hajnoczi wrote:
> On Thu, Jan 10, 2013 at 05:34:14PM +0800, Jason Wang wrote:
>> On 01/10/2013 04:44 PM, Stefan Hajnoczi wrote:
>>> On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
>>>> On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
>>>>>> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
>>>>>>> Perf Numbers:
>>>>>>>
>>>>>>> Two Intel Xeon 5620 with direct connected intel 82599EB
>>>>>>> Host/Guest kernel: David net tree
>>>>>>> vhost enabled
>>>>>>>
>>>>>>> [observation notes and performance tables trimmed]
>>>>>> Trying to understand the performance results:
>>>>>>
>>>>>> What is the host device configuration?  tap + bridge?
>>>> Yes.
>>>>>> Did you use host CPU affinity for the vhost threads?
>>>> I use numactl to pin the vCPU threads and the vhost threads to the same
>>>> NUMA node.
>>>>>> Can multiqueue tap take advantage of multiqueue host NICs or is
>>>>>> virtio-net multiqueue unaware of the physical NIC multiqueue
>>>>>> capabilities?
>>>> Tap is unaware of the physical multiqueue NIC, but we can still benefit
>>>> from it since we use multiple vhost threads.
>>> I wonder if it makes a difference to bind tap queues to physical NIC
>>> queues.  Maybe this is only possible in macvlan or can you preset the
>>> queue index of outgoing skbs so the network stack doesn't recalculate
>>> the flow?
>> There are some issues here:
>>
>> - For tap, we know nothing about the physical card, especially how many
>> queues it has.
>> - We can present the queue index information in the skb, but there is no
>> standard txq selection / rxq smp affinity setting method for
>> multiqueue card drivers in linux. For example, ixgbe and efx use
>> completely different methods, so we could find a method for ixgbe
>> but not for all the others.
> It's an interesting problem because it seems like doing multiqueue
> through the entire stack would be more efficient than doing multiqueue
> twice at different layers.

Yes, I agree that more co-operation across the whole stack is better. Anyway,
we can start from this series and improve on top.
> I wonder how much of a difference it can make.

Not an easy task, at least from my point of view. It may need 1) unifying
the flow steering mechanism of all multiqueue drivers or cards and 2) passing
more information between the various layers. The sketch below illustrates
the queue-index idea.
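
Purely to illustrate what presenting the queue index in the skb could
mean for a driver's txq selection -- a hypothetical fragment, not taken
from any real driver:

    /* If the rx queue was recorded (e.g. via skb_record_rx_queue()
     * when the skb entered from a tap queue), reuse that index for
     * tx instead of rehashing the flow. */
    static u16 hypothetical_select_queue(struct net_device *dev,
                                         struct sk_buff *skb)
    {
        if (skb_rx_queue_recorded(skb)) {
            return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
        }
        return skb_tx_hash(dev, skb); /* fall back to flow hashing */
    }
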
> Stefan


* Re: [PATCH 00/12] Multiqueue virtio-net
  2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
                   ` (12 preceding siblings ...)
  2013-01-09 14:29 ` [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net Stefan Hajnoczi
@ 2013-01-14 19:44 ` Anthony Liguori
  2013-01-15 10:12   ` Jason Wang
  13 siblings, 1 reply; 58+ messages in thread
From: Anthony Liguori @ 2013-01-14 19:44 UTC (permalink / raw)
  To: Jason Wang, mst, stefanha, qemu-devel
  Cc: rusty, kvm, mprivozn, shiyer, krkumar2, jwhan, Jason Wang

Jason Wang <jasowang@redhat.com> writes:

> [...]
>
> Management tools such as libvirt can pass multiple pre-created fds through
>
> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> virtio-net-pci,netdev=hn0

I'm confused/frightened that this syntax works.  You shouldn't be
allowed to have two values for the same property.  Better to have a
syntax like fd[0]=X,fd[1]=Y or something along those lines.

Regards,

Anthony Liguori

> [rest of cover letter, performance numbers, and diffstat trimmed]



* Re: [PATCH 00/12] Multiqueue virtio-net
  2013-01-14 19:44 ` Anthony Liguori
@ 2013-01-15 10:12   ` Jason Wang
  2013-01-16 15:09     ` Anthony Liguori
  0 siblings, 1 reply; 58+ messages in thread
From: Jason Wang @ 2013-01-15 10:12 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: krkumar2, kvm, mst, mprivozn, rusty, qemu-devel, stefanha, jwhan, shiyer

On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> Jason Wang <jasowang@redhat.com> writes:
>
>> [...]
>>
>> Management tools such as libvirt can pass multiple pre-created fds through
>>
>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
>> virtio-net-pci,netdev=hn0
> I'm confused/frightened that this syntax works.  You shouldn't be
> allowed to have two values for the same property.  Better to have a
> syntax like fd[0]=X,fd[1]=Y or something along those lines.

Yes, but this is how the StringList type currently works for the command
line. Some other parameters, such as dnssearch, hostfwd and guestfwd,
already work this way. It looks like your suggestion needs some extension
of the QemuOpts visitor; maybe we can do this on top.

Thanks
>
> Regards,
>
> Anthony Liguori
>
>> [rest of cover letter, performance numbers, and diffstat trimmed]


* Re: [PATCH 00/12] Multiqueue virtio-net
  2013-01-15 10:12   ` Jason Wang
@ 2013-01-16 15:09     ` Anthony Liguori
  2013-01-16 15:19       ` Michael S. Tsirkin
  0 siblings, 1 reply; 58+ messages in thread
From: Anthony Liguori @ 2013-01-16 15:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, stefanha, qemu-devel, rusty, kvm, mprivozn, shiyer, krkumar2, jwhan

Jason Wang <jasowang@redhat.com> writes:

> On 01/15/2013 03:44 AM, Anthony Liguori wrote:
>> Jason Wang <jasowang@redhat.com> writes:
>>
>>> [...]
>>>
>>> Management tools such as libvirt can pass multiple pre-created fds through
>>>
>>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
>>> virtio-net-pci,netdev=hn0
>> I'm confused/frightened that this syntax works.  You shouldn't be
>> allowed to have two values for the same property.  Better to have a
>> syntax like fd[0]=X,fd[1]=Y or something along those lines.
>
> Yes, but this is how the StringList type currently works for the command
> line. Some other parameters, such as dnssearch, hostfwd and guestfwd,
> already work this way. It looks like your suggestion needs some extension
> of the QemuOpts visitor; maybe we can do this on top.

It's a silly syntax and breaks compatibility.  This is valid syntax:

-net tap,fd=3,fd=4

In this case, it means 'fd=4' because the last fd overwrites the first
one.

Now you've changed it to mean something else.  Having one thing mean
something in one context, but something else in another context is
terrible interface design.

Regards,

Anthony Liguori

> [rest of quoted text trimmed]



* Re: [PATCH 00/12] Multiqueue virtio-net
  2013-01-16 15:09     ` Anthony Liguori
@ 2013-01-16 15:19       ` Michael S. Tsirkin
  2013-01-16 16:14         ` Anthony Liguori
  0 siblings, 1 reply; 58+ messages in thread
From: Michael S. Tsirkin @ 2013-01-16 15:19 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: krkumar2, kvm, mprivozn, Jason Wang, rusty, qemu-devel, stefanha,
	jwhan, shiyer

On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
> Jason Wang <jasowang@redhat.com> writes:
> 
> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> >> Jason Wang <jasowang@redhat.com> writes:
> >>
> >>> [...]
> >>>
> >>> Management tools such as libvirt can pass multiple pre-created fds through
> >>>
> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> >>> virtio-net-pci,netdev=hn0
> >> I'm confused/frightened that this syntax works.  You shouldn't be
> >> allowed to have two values for the same property.  Better to have a
> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
> >
> > Yes, but this is how the StringList type currently works for the command
> > line. Some other parameters, such as dnssearch, hostfwd and guestfwd,
> > already work this way. It looks like your suggestion needs some extension
> > of the QemuOpts visitor; maybe we can do this on top.
> 
> It's a silly syntax and breaks compatibility.  This is valid syntax:
> 
> -net tap,fd=3,fd=4
> 
> In this case, it means 'fd=4' because the last fd overwrites the first
> one.
> 
> Now you've changed it to mean something else.  Having one thing mean
> something in one context, but something else in another context is
> terrible interface design.
> 
> Regards,
> 
> Anthony Liguori

Aha, so just renaming the field to 'fds' would address this issue?


* Re: [PATCH 00/12] Multiqueue virtio-net
  2013-01-16 15:19       ` Michael S. Tsirkin
@ 2013-01-16 16:14         ` Anthony Liguori
  2013-01-16 16:48           ` Michael S. Tsirkin
  2013-01-17 10:31           ` [Qemu-devel] " Michael S. Tsirkin
  0 siblings, 2 replies; 58+ messages in thread
From: Anthony Liguori @ 2013-01-16 16:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, kvm, mprivozn, Jason Wang, rusty, qemu-devel, stefanha,
	jwhan, shiyer

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
>> Jason Wang <jasowang@redhat.com> writes:
>> 
>> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
>> >> Jason Wang <jasowang@redhat.com> writes:
>> >>
>> >>> [...]
>> >>>
>> >>> Management tools such as libvirt can pass multiple pre-created fds through
>> >>>
>> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
>> >>> virtio-net-pci,netdev=hn0
>> >> I'm confused/frightened that this syntax works.  You shouldn't be
>> >> allowed to have two values for the same property.  Better to have a
>> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
>> >
>> > Yes, but this is how the StringList type currently works for the command
>> > line. Some other parameters, such as dnssearch, hostfwd and guestfwd,
>> > already work this way. It looks like your suggestion needs some extension
>> > of the QemuOpts visitor; maybe we can do this on top.
>> 
>> It's a silly syntax and breaks compatibility.  This is valid syntax:
>> 
>> -net tap,fd=3,fd=4
>> 
>> In this case, it means 'fd=4' because the last fd overwrites the first
>> one.
>> 
>> Now you've changed it to mean something else.  Having one thing mean
>> something in one context, but something else in another context is
>> terrible interface design.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Aha, so just renaming the field to 'fds' would address this issue?

No, you still have the problem of different meanings.

-netdev tap,fd=X,fd=Y

-netdev tap,fds=X,fds=Y

Would have wildly different behavior.

Just do:

-netdev tap,fds=X:Y

And then we're staying consistent wrt the interpretation of multiple
properties of the same name.
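
Something like this, as a rough sketch -- a single "fds" property and
MAX_TAP_QUEUES are made up here, and fd names passed in with getfd would
need a monitor lookup rather than atoi():

    char **fds = g_strsplit(tap->fds, ":", MAX_TAP_QUEUES);
    int i;

    for (i = 0; fds[i]; i++) {
        int fd = atoi(fds[i]);           /* numeric fds only */

        fcntl(fd, F_SETFL, O_NONBLOCK);
        /* hand each fd to __net_init_tap() just as in the fd= case */
    }
    g_strfreev(fds);
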

Regards,

Anthony Liguori


* Re: [PATCH 00/12] Multiqueue virtio-net
  2013-01-16 16:14         ` Anthony Liguori
@ 2013-01-16 16:48           ` Michael S. Tsirkin
  2013-01-17 10:31           ` [Qemu-devel] " Michael S. Tsirkin
  1 sibling, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2013-01-16 16:48 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: krkumar2, kvm, mprivozn, Jason Wang, rusty, qemu-devel, stefanha,
	jwhan, shiyer

On Wed, Jan 16, 2013 at 10:14:33AM -0600, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
> >> Jason Wang <jasowang@redhat.com> writes:
> >> 
> >> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> >> >> Jason Wang <jasowang@redhat.com> writes:
> >> >>
> >> >>> [...]
> >> >>>
> >> >>> Management tools such as libvirt can pass multiple pre-created fds through
> >> >>>
> >> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> >> >>> virtio-net-pci,netdev=hn0
> >> >> I'm confused/frightened that this syntax works.  You shouldn't be
> >> >> allowed to have two values for the same property.  Better to have a
> >> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
> >> >
> >> > Yes, but this is how the StringList type currently works for the command
> >> > line. Some other parameters, such as dnssearch, hostfwd and guestfwd,
> >> > already work this way. It looks like your suggestion needs some extension
> >> > of the QemuOpts visitor; maybe we can do this on top.
> >> 
> >> It's a silly syntax and breaks compatibility.  This is valid syntax:
> >> 
> >> -net tap,fd=3,fd=4
> >> 
> >> In this case, it means 'fd=4' because the last fd overwrites the first
> >> one.
> >> 
> >> Now you've changed it to mean something else.  Having one thing mean
> >> something in one context, but something else in another context is
> >> terrible interface design.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >
> > Aha, so just renaming the field to 'fds' would address this issue?
> 
> No, you still have the problem of different meanings.
> 
> -netdev tap,fd=X,fd=Y
> 
> -netdev tap,fds=X,fds=Y
> 
> Would have wildly different behavior.

fd=X,fd=Y is more a bug than a feature. It could have failed
just as well.

> Just do:
> 
> -netdev tap,fds=X:Y
> 
> And then we're staying consistent wrt the interpretation of multiple
> properties of the same name.
> 
> Regards,
> 
> Anthony Liguori


The issue is that ':' would only work for a list of numbers.
As Jason points out, StringList is already used; do we really want to
invent yet another syntax for a list that will work only for this case?

-- 
MST


* Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
  2013-01-16 16:14         ` Anthony Liguori
  2013-01-16 16:48           ` Michael S. Tsirkin
@ 2013-01-17 10:31           ` Michael S. Tsirkin
  1 sibling, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2013-01-17 10:31 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: krkumar2, kvm, mprivozn, Jason Wang, rusty, qemu-devel, stefanha,
	jwhan, shiyer

On Wed, Jan 16, 2013 at 10:14:33AM -0600, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
> >> Jason Wang <jasowang@redhat.com> writes:
> >> 
> >> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> >> >> Jason Wang <jasowang@redhat.com> writes:
> >> >>
> >> >>> [...]
> >> >>>
> >> >>> Management tools such as libvirt can pass multiple pre-created fds through
> >> >>>
> >> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> >> >>> virtio-net-pci,netdev=hn0
> >> >> I'm confused/frightened that this syntax works.  You shouldn't be
> >> >> allowed to have two values for the same property.  Better to have a
> >> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
> >> >
> >> > Yes, but this is how the StringList type currently works for the command
> >> > line. Some other parameters, such as dnssearch, hostfwd and guestfwd,
> >> > already work this way. It looks like your suggestion needs some extension
> >> > of the QemuOpts visitor; maybe we can do this on top.
> >> 
> >> It's a silly syntax and breaks compatibility.  This is valid syntax:
> >> 
> >> -net tap,fd=3,fd=4
> >> 
> >> In this case, it means 'fd=4' because the last fd overwrites the first
> >> one.
> >> 
> >> Now you've changed it to mean something else.  Having one thing mean
> >> something in one context, but something else in another context is
> >> terrible interface design.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >
> > Aha, so just renaming the field to 'fds' would address this issue?
> 
> No, you still have the problem of different meanings.
> 
> -netdev tap,fd=X,fd=Y
> 
> -netdev tap,fds=X,fds=Y
> 
> Would have wildly different behavior.

I think even caring about -net tap,fd=1,fd=2 is a bit silly.  If this
resulted in fd=2 by mistake, I don't think it was ever intentionally
legal.
As Jason points out, we have list support, and for better or worse it
currently uses repeated options, e.g. with dnssearch, hostfwd and
guestfwd.
Isn't it better to be consistent?

> Just do:
> 
> -netdev tap,fds=X:Y
> 
> And then we're staying consistent wrt the interpretation of multiple
> properties of the same name.
> 
> Regards,
> 
> Anthony Liguori

This introduces ':' as a special character. However, fds can be fd
names passed in with getfd, where ':' is a legal character.

-- 
MST


end of thread

Thread overview: 58+ messages
2012-12-28 10:31 [PATCH 00/12] Multiqueue virtio-net Jason Wang
2012-12-28 10:31 ` [PATCH 01/12] tap: multiqueue support Jason Wang
2013-01-09  9:56   ` Stefan Hajnoczi
2013-01-09 15:25     ` Jason Wang
2013-01-10  8:32       ` Stefan Hajnoczi
2013-01-10 10:28   ` Stefan Hajnoczi
2013-01-10 13:52     ` Jason Wang
2012-12-28 10:31 ` [PATCH 02/12] net: introduce qemu_get_queue() Jason Wang
2012-12-28 10:31 ` [PATCH 03/12] net: introduce qemu_get_nic() Jason Wang
2012-12-28 10:31 ` [PATCH 04/12] net: intorduce qemu_del_nic() Jason Wang
2012-12-28 10:31 ` [PATCH 05/12] net: multiqueue support Jason Wang
2012-12-28 18:06   ` Blue Swirl
2012-12-28 10:31 ` [PATCH 06/12] vhost: " Jason Wang
2012-12-28 10:31 ` [PATCH 07/12] virtio: introduce virtio_queue_del() Jason Wang
2013-01-08  7:14   ` Michael S. Tsirkin
2013-01-08  9:28     ` Jason Wang
2012-12-28 10:32 ` [PATCH 08/12] virtio: add a queue_index to VirtQueue Jason Wang
2012-12-28 10:32 ` [PATCH 09/12] virtio-net: separate virtqueue from VirtIONet Jason Wang
2012-12-28 10:32 ` [PATCH 10/12] virtio-net: multiqueue support Jason Wang
2012-12-28 17:52   ` Blue Swirl
2013-01-04  5:12     ` Jason Wang
2013-01-04 20:41       ` Blue Swirl
2013-01-08  9:07   ` [Qemu-devel] " Wanlong Gao
2013-01-08  9:29     ` Jason Wang
2013-01-08  9:32       ` [Qemu-devel] " Wanlong Gao
2013-01-08  9:49       ` Wanlong Gao
2013-01-08  9:51         ` Jason Wang
2013-01-08 10:00           ` [Qemu-devel] " Wanlong Gao
2013-01-08 10:14             ` Jason Wang
2013-01-08 11:24               ` [Qemu-devel] " Wanlong Gao
2013-01-09  3:11                 ` Jason Wang
2013-01-09  8:23               ` Wanlong Gao
2013-01-09  9:30                 ` Jason Wang
2013-01-09 10:01                   ` [Qemu-devel] " Wanlong Gao
2013-01-09 15:26                     ` Jason Wang
2013-01-10  6:43                       ` Jason Wang
2013-01-10  6:49                         ` Wanlong Gao
2013-01-10  7:16                           ` Jason Wang
2013-01-10  9:06                             ` Wanlong Gao
2013-01-10  9:40                               ` [Qemu-devel] " Jason Wang
2012-12-28 10:32 ` [PATCH 11/12] virtio-net: migration support for multiqueue Jason Wang
2013-01-08  7:10   ` Michael S. Tsirkin
2013-01-08  9:27     ` Jason Wang
2012-12-28 10:32 ` [PATCH 12/12] virtio-net: compat multiqueue support Jason Wang
2013-01-09 14:29 ` [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net Stefan Hajnoczi
2013-01-09 15:32   ` Michael S. Tsirkin
2013-01-09 15:33     ` Jason Wang
2013-01-10  8:44       ` Stefan Hajnoczi
2013-01-10  9:34         ` [Qemu-devel] " Jason Wang
2013-01-10 11:49           ` Stefan Hajnoczi
2013-01-10 14:15             ` Jason Wang
2013-01-14 19:44 ` Anthony Liguori
2013-01-15 10:12   ` Jason Wang
2013-01-16 15:09     ` Anthony Liguori
2013-01-16 15:19       ` Michael S. Tsirkin
2013-01-16 16:14         ` Anthony Liguori
2013-01-16 16:48           ` Michael S. Tsirkin
2013-01-17 10:31           ` [Qemu-devel] " Michael S. Tsirkin
