From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, davem@davemloft.net,
	virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	rusty@rustcorp.com.au, krkumar2@in.ibm.com
Cc: kvm@vger.kernel.org, Jason Wang <jasowang@redhat.com>
Subject: [rfc net-next v6 0/3] Multiqueue virtio-net
Date: Tue, 30 Oct 2012 18:03:20 +0800
Message-ID: <1351591403-23065-1-git-send-email-jasowang@redhat.com>

Hi all:

This series is an updated version of the multiqueue virtio-net driver based on
Krishna Kumar's work, which lets virtio-net use multiple rx/tx queues for
packet reception and transmission. Please review and comment.

Changes from v5:
- Align the implementation with the RFC spec update v4
- Switch between single-queue and multiqueue mode without a reset
- Remove the 256-queue limitation
- Use helpers to map between virtqueues and tx/rx queues (see the sketch
after this list)
- Use combined channels instead of separate rx/tx queues when configuring the
queue number
- Address other coding style comments from Michael
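
For reference, the virtqueue mapping mentioned above follows the layout in the
RFC spec update: rx queue i uses virtqueue 2*i, tx queue i uses virtqueue
2*i + 1, and the control virtqueue comes last. Below is a minimal sketch of
such helpers, with illustrative names only, not the actual driver code:

/* Sketch only: assumes the even/odd virtqueue layout from the spec RFC. */
static int rxq2vq(int rxq)
{
	return rxq * 2;
}

static int txq2vq(int txq)
{
	return txq * 2 + 1;
}

static int vq2rxq(int vq)
{
	return vq / 2;		/* even virtqueues carry rx */
}

static int vq2txq(int vq)
{
	return (vq - 1) / 2;	/* odd virtqueues carry tx */
}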

Reference:
- A prototype implementation of qemu-kvm support can be found at
git://github.com/jasowang/qemu-kvm-mq.git
- V5 can be found at http://lwn.net/Articles/505388/
- V4 can be found at https://lkml.org/lkml/2012/6/25/120
- V2 can be found at http://lwn.net/Articles/467283/
- Michael's virtio-spec update: http://www.spinics.net/lists/netdev/msg209986.html

Perf Numbers:

- The pktgen test shows that the receiving capability of multiqueue virtio-net
  is dramatically improved.
- Netperf results show that latency is greatly improved. Throughput is kept or
improved when transferring large packets, but there is a regression when
transmitting/receiving small packets (<1500 bytes). According to the
statistics, TCP tends to batch less when mq is enabled, which means many more
but smaller packets are sent/received; this leads to much higher CPU
utilization and degrades throughput. In the future, either TCP tuning or an
automatic switch between mq and sq mode is needed.
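
Until such an automatic switch exists, the queue count is changed by hand
through the standard ethtool channels interface (patch 3/3 of this series).
The sketch below shows the general shape of the combined-channel hooks a
driver registers in its ethtool_ops; the virtnet_info fields and the
virtnet_set_queues() helper are illustrative assumptions, not the actual
patch:

/* Illustrative only: expose/change the number of combined channels. */
struct virtnet_info {			/* assumed private state */
	u16 max_queue_pairs;
	u16 curr_queue_pairs;
	/* ... */
};

static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs);
					/* hypothetical helper */

static void virtnet_get_channels(struct net_device *dev,
				 struct ethtool_channels *channels)
{
	struct virtnet_info *vi = netdev_priv(dev);

	channels->max_combined = vi->max_queue_pairs;
	channels->combined_count = vi->curr_queue_pairs;
}

static int virtnet_set_channels(struct net_device *dev,
				struct ethtool_channels *channels)
{
	struct virtnet_info *vi = netdev_priv(dev);
	u16 queue_pairs = channels->combined_count;

	/* rx/tx cannot be set separately; only combined channels are used. */
	if (channels->rx_count || channels->tx_count || channels->other_count)
		return -EINVAL;

	if (!queue_pairs || queue_pairs > vi->max_queue_pairs)
		return -EINVAL;

	return virtnet_set_queues(vi, queue_pairs);
}

static const struct ethtool_ops virtnet_ethtool_ops = {
	.get_channels	= virtnet_get_channels,
	.set_channels	= virtnet_set_channels,
};

From inside the guest, something like "ethtool -L eth0 combined 4" would then
select four queue pairs.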

Test environment:
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores, 2 NUMA nodes
- Two directly connected 82599 NICs
- Host/guest kernel: net-next with the mq virtio-net patches and mq tuntap
patches

Pktgen test:
- The local host generates 64-byte UDP packets to the guest.
- average of 20 runs

#q #vcpu   kpps     +improvement
1q 1vcpu:  264kpps  +0%
2q 2vcpu:  451kpps  +70%
3q 3vcpu:  661kpps  +150%
4q 4vcpu:  941kpps  +250%

Netperf local VM-to-VM test:
- VM1 and its vcpu/vhost threads in NUMA node 0
- VM2 and its vcpu/vhost threads in NUMA node 1
- a script is used to launch netperf in demo mode and post-process the output,
  using timestamps to measure the aggregate result
- average of 3 runs

TCP_RR:
size/session/+lat%/+normalize%
    1/     1/    0%/    0%
    1/    10/  +52%/   +6%
    1/    20/  +27%/   +5%
   64/     1/    0%/    0%
   64/    10/  +45%/   +4%
   64/    20/  +28%/   +7%
  256/     1/   -1%/    0%
  256/    10/  +38%/   +2%
  256/    20/  +27%/   +6%
TCP_CRR:
size/session/+lat%/+normalize%
    1/     1/   -7%/  -12%
    1/    10/  +34%/   +3%
    1/    20/   +3%/   -8%
   64/     1/   -7%/   -3%
   64/    10/  +32%/   +1%
   64/    20/   +4%/   -7%
  256/     1/   -6%/  -18%
  256/    10/  +33%/    0%
  256/    20/   +4%/   -8%
STREAM:
size/session/+thu%/+normalize%
    1/     1/   -3%/    0%
    1/     2/   -1%/    0%
    1/     4/   -2%/    0%
   64/     1/    0%/   +1%
   64/     2/   -6%/   -6%
   64/     4/   -8%/  -14%
  256/     1/    0%/    0%
  256/     2/  -48%/  -52%
  256/     4/  -50%/  -55%
  512/     1/   +4%/   +5%
  512/     2/  -29%/  -33%
  512/     4/  -37%/  -49%
 1024/     1/   +6%/   +7%
 1024/     2/  -46%/  -51%
 1024/     4/  -15%/  -17%
 4096/     1/   +1%/   +1%
 4096/     2/  +16%/   -2%
 4096/     4/  +31%/  -10%
16384/     1/    0%/    0%
16384/     2/  +16%/   +9%
16384/     4/  +17%/   -9%

Netperf test between an external host and the guest over 10GbE (ixgbe):
- VM threads and vhost threads were pinned to node 0
- a script is used to launch netperf in demo mode and post-process the output,
  using timestamps to measure the aggregate result
- average of 3 runs

TCP_RR:
size/session/+lat%/+normalize%
    1/     1/    0%/   +6%
    1/    10/  +41%/   +2%
    1/    20/  +10%/   -3%
   64/     1/    0%/  -10%
   64/    10/  +39%/   +1%
   64/    20/  +22%/   +2%
  256/     1/    0%/   +2%
  256/    10/  +26%/  -17%
  256/    20/  +24%/  +10%
TCP_CRR:
size/session/+lat%/+normalize%
    1/     1/   -3%/   -3%
    1/    10/  +34%/   -3%
    1/    20/    0%/  -15%
   64/     1/   -3%/   -3%
   64/    10/  +34%/   -3%
   64/    20/   -1%/  -16%
  256/     1/   -1%/   -3%
  256/    10/  +38%/   -2%
  256/    20/   -2%/  -17%
TCP_STREAM:(guest receiving)
size/session/+thu%/+normalize%
    1/     1/   +1%/  +14%
    1/     2/    0%/   +4%
    1/     4/   -2%/  -24%
   64/     1/   -6%/   +1%
   64/     2/   +1%/   +1%
   64/     4/   -1%/  -11%
  256/     1/   +3%/   +4%
  256/     2/    0%/   -1%
  256/     4/    0%/  -15%
  512/     1/   +4%/    0%
  512/     2/  -10%/  -12%
  512/     4/    0%/  -11%
 1024/     1/   -5%/    0%
 1024/     2/  -11%/  -16%
 1024/     4/   +3%/  -11%
 4096/     1/  +27%/   +6%
 4096/     2/    0%/  -12%
 4096/     4/    0%/  -20%
16384/     1/    0%/   -2%
16384/     2/    0%/   -9%
16384/     4/  +10%/   -2%
TCP_MAERTS:(guest sending)
size/session/+thu%/+normalize%
    1/     1/   -1%/    0%
    1/     2/    0%/    0%
    1/     4/   -5%/    0%
   64/     1/    0%/    0%
   64/     2/   -7%/   -8%
   64/     4/   -7%/   -8%
  256/     1/    0%/    0%
  256/     2/  -28%/  -28%
  256/     4/  -28%/  -29%
  512/     1/    0%/    0%
  512/     2/  -15%/  -13%
  512/     4/  -53%/  -59%
 1024/     1/   +4%/  +13%
 1024/     2/   -7%/  -18%
 1024/     4/   +1%/  -18%
 4096/     1/   +2%/    0%
 4096/     2/   +3%/  -19%
 4096/     4/   -1%/  -19%
16384/     1/   -3%/   -1%
16384/     2/    0%/  -12%
16384/     4/    0%/  -10%

Jason Wang (2):
  virtio_net: multiqueue support
  virtio-net: change the number of queues through ethtool

Krishna Kumar (1):
  virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE
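
The feature bit added in 1/3 is negotiated like any other virtio feature. A
rough illustration of the guest-side check with a single-queue-pair fallback
follows; the config field name and the helper are assumptions for the example,
not the patch itself:

#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <linux/virtio_net.h>

/* Illustration only: read the advertised queue pair count, default to 1. */
static u16 virtnet_max_queue_pairs(struct virtio_device *vdev)
{
	u16 max_pairs = 1;

	if (virtio_has_feature(vdev, VIRTIO_NET_F_MULTIQUEUE))
		vdev->config->get(vdev,
				  offsetof(struct virtio_net_config,
					   max_virtqueue_pairs),
				  &max_pairs, sizeof(max_pairs));

	return max_pairs;
}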

 drivers/net/virtio_net.c        |  790 ++++++++++++++++++++++++++++-----------
 include/uapi/linux/virtio_net.h |   19 +
 2 files changed, 594 insertions(+), 215 deletions(-)

