From: Jason Wang <jasowang@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: krkumar2@in.ibm.com, kvm@vger.kernel.org, mst@redhat.com,
	qemu-devel@nongnu.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	mirq-linux@rere.qmqm.pl, davem@davemloft.net
Subject: [net-next RFC PATCH 0/7] multiqueue support for tun/tap
Date: Fri, 12 Aug 2011 10:11:31 +0800
Message-ID: <20036.35795.314019.270841__21588.1518707194$1313115265$gmane$org@gargle.gargle.HOWL>
In-Reply-To: <20110812015221.31613.95001.stgit@intel-e5620-16-2.englab.nay.redhat.com>

Jason Wang writes:
 > As multi-queue NICs are commonly used in high-end servers,
 > the current single-queue tap cannot satisfy the
 > requirement of scaling guest network performance as the
 > number of vcpus increases. So the following series
 > implements multiple queue support in tun/tap.
 > 
 > In order to take advantage of this, a multi-queue capable
 > driver and qemu are also needed. I just rebased the latest
 > version of Krishna's multi-queue virtio-net driver into this
 > series to simplify testing. For a multiqueue-capable
 > qemu, you can refer to the patches I posted at
 > http://www.spinics.net/lists/kvm/msg52808.html. Vhost is
 > also a must to achieve high performance, and its code can
 > be used for multi-queue without modification. Alternatively,
 > this series can also be used for Krishna's M:N
 > implementation of multiqueue, but I didn't test it.
 > 
 > The idea is simple: each socket is abstracted as a queue
 > for tun/tap, and userspace may open as many files as
 > required and then attach them to the device. In order to
 > keep ABI compatibility, device creation is still
 > done in TUNSETIFF, and two new ioctls, TUNATTACHQUEUE and
 > TUNDETACHQUEUE, are added for the user to manipulate the
 > number of queues for the tun/tap.
 > 
 > I've done some basic performance testing of multi queue
 > tap. For tun, I just tested it through vpnc.
 > 
 > Notes:
 > - Tests show improvement when receiving packets from
 > local/external host to guest, and when sending big packets
 > from guest to local/external host.
 > - The current multiqueue based virtio-net/tap introduces a
 > regression when sending small packets (512 byte) from guest
 > to local/external host. I suspect it's an issue of queue
 > selection in both guest driver and tap. I will continue to
 > investigate.
 > - I will post the performance numbers as a reply to this
 > mail.
 > 
 > TODO:
 > - solve the issue of packet transmission of small packets.
 > - address the comments on the virtio-net driver
 > - performance tuning
 > 
 > Please review and comment. Thanks.
 > 
 > ---
 > 
 > Jason Wang (5):
 >       tuntap: move socket/sock related structures to tun_file
 >       tuntap: categorize ioctl
 >       tuntap: introduce multiqueue related flags
 >       tuntap: multiqueue support
 >       tuntap: add ioctls to attach or detach a file form tap device
 > 
 > Krishna Kumar (2):
 >       Change virtqueue structure
 >       virtio-net changes
 > 
 > 
 >  drivers/net/tun.c           |  738 ++++++++++++++++++++++++++-----------------
 >  drivers/net/virtio_net.c    |  578 ++++++++++++++++++++++++----------
 >  drivers/virtio/virtio_pci.c |   10 -
 >  include/linux/if_tun.h      |    5 
 >  include/linux/virtio.h      |    1 
 >  include/linux/virtio_net.h  |    3 
 >  6 files changed, 867 insertions(+), 468 deletions(-)
 > 
 > -- 
 > Jason Wang

Here are some performance results for multiqueue tap.

For multiqueue, the test used qemu-kvm + mq patches and
net-next-2.6 + tap mq patches + mq driver.
For single queue, the test used qemu-kvm and net-next-2.6. RFS
was also enabled in the guest during the test.

All tests were done with netperf on two i7 (Intel(R) Xeon(R) CPU
E5620 2.40GHz) machines with directly connected 82599 cards.

Quick notes on the results:
- Regression for Guest to External/Local host with 512 byte packets.
- For External host to guest, multiqueue scales, or at least
performs the same as the single queue implementation.

1 Guest to External Host TCP 512 byte

Multiqueue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2054.11                23.43       87
2          2037.32                22.64       89
4          2007.53                22.87       87
8          1993.41                23.82       83
== smp=2 queue=2 ==
sessions  | throughput        | cpu      | normalized
1          1960.58                24.30       80
2          9250.41                32.19       287
4          3897.49                49.31       79
8          4088.44                46.85       87
== smp=4 queue=4 ==
sessions  | throughput        | cpu      | normalized
1          1986.87                23.17       85
2          4431.79                44.64       99
4          8705.83                51.89       167
8          9420.63                45.96       204
== smp=8 queue=8 ==
sessions  | throughput        | cpu      | normalized
1          1820.38                20.17       90
2          3707.64                42.19       87
4          8930.71                63.65       140
8          9391.13                51.90       180
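
As a reading aid (not stated in the mail itself): in the rows above,
the normalized column is consistent with throughput divided by cpu,
truncated to an integer. A minimal Python sketch under that
assumption; the function name is made up for illustration:

```python
def normalized(throughput: float, cpu: float) -> int:
    """Throughput per unit of CPU utilization, truncated to an integer.

    Assumption: this matches how the 'normalized' column was derived;
    the mail does not spell out the formula.
    """
    return int(throughput / cpu)


# Rows copied from the smp=1 queue=1 multiqueue table above:
# (throughput, cpu, reported normalized value)
rows = [(2054.11, 23.43, 87), (2037.32, 22.64, 89), (2007.53, 22.87, 87)]
for throughput, cpu, reported in rows:
    print(throughput, cpu, normalized(throughput, cpu), reported)
```

If the assumption holds, the computed and reported values agree for
each row, so a higher normalized value means more throughput for the
same CPU cost.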

Single-queue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2032.64                22.96       88
2          2058.76                23.22       88
4          2028.97                22.84       88
8          1989.41                23.89       83
== smp=2 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2444.50                25.00       97
2          9298.64                30.76       302
4          8788.58                30.82       285
8          9158.28                30.45       300
== smp=4 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2359.50                25.10       94
2          9325.88                29.83       312
4          9198.29                32.96       279
8          8980.73                32.25       278
== smp=8 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2170.15                23.77       91
2          8329.73                28.79       289
4          8152.25                36.11       225
8          9121.11                40.08       227
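
To put a number on the 512 byte regression in this test, matching
rows of the multiqueue and single queue tables can be compared
directly. A small Python sketch; the helper name is hypothetical and
the two values are copied from the smp=2, 4 sessions rows above:

```python
def relative_change(multiqueue: float, single_queue: float) -> float:
    """Percentage change of multiqueue throughput relative to single queue."""
    return (multiqueue - single_queue) / single_queue * 100.0


# Guest to external host, TCP 512 byte, smp=2, 4 sessions
mq_throughput = 3897.49   # smp=2 queue=2
sq_throughput = 8788.58   # smp=2 queue=1
print(f"{relative_change(mq_throughput, sq_throughput):+.1f}%")
```

A negative result means multiqueue was slower than single queue for
that row; the same helper can be applied to any pair of rows in the
later tests.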

2 Guest to external host TCP with default size

Multiqueue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          7767.87                18.43       421
2          9399.18                21.48       437
4          8373.23                21.37       391
8          9310.84                21.91       424
== smp=2 queue=2 ==
sessions  | throughput        | cpu      | normalized
1          9358.75                20.27       461
2          9405.25                30.67       306
4          9407.63                26.24       358
8          9412.77                28.75       327
== smp=4 queue=4 ==
sessions  | throughput        | cpu      | normalized
1          9358.39                22.11       423
2          9401.27                27.29       344
4          9414.98                28.75       327
8          9420.93                31.09       303
== smp=8 queue=8 ==
sessions  | throughput        | cpu      | normalized
1          9057.52                20.09       450
2          8486.72                28.18       301
4          9330.96                40.13       232
8          9377.99                59.41       157

Single Queue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          8192.58                19.30       424
2          9400.31                22.55       416
4          8771.94                21.75       403
8          8922.61                22.50       396
== smp=2 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          9387.28                23.13       405
2          8322.94                24.58       338
4          9404.86                26.22       358
8          9145.79                26.57       344
== smp=4 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2377.83                9.86       241
2          9403.32                26.96       348
4          8822.57                27.23       324
8          9380.85                26.90       348
== smp=8 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          7275.95                21.47       338
2          9407.34                27.39       343
4          8365.05                25.99       321
8          9150.65                27.78       329

3 External Host to guest TCP, default packet size

Multiqueue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          8944.69                25.59       349
2          8503.67                24.95       340
4          7910.54                25.88       305
8          7455.13                26.35       282
== smp=2 queue=2 ==
sessions  | throughput        | cpu      | normalized
1          9370.11                23.70       395
2          9365.97                31.91       293
4          9389.83                34.99       268
8          9405.52                34.83       270
== smp=4 queue=4 ==
sessions  | throughput        | cpu      | normalized
1          9061.71                23.45       386
2          9373.92                22.38       418
4          9399.83                40.89       229
8          9412.92                48.99       192
== smp=8 queue=8 ==
sessions  | throughput        | cpu      | normalized
1          8203.61                24.64       332
2          9286.28                32.68       284
4          9403.61                49.33       190
8          9411.42                64.38       146

Single Queue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          8999.39                26.24       342
2          8921.23                25.00       356
4          7918.52                26.60       297
8          6901.77                25.92       266
== smp=2 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          9016.77                25.82       349
2          8572.92                33.19       258
4          7962.34                28.88       275
8          6959.10                32.77       212
== smp=4 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          8951.43                25.76       347
2          8411.78                35.51       236
4          7874.05                35.99       218
8          6869.55                36.80       186
== smp=8 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          9332.84                25.95       359
2          9103.57                30.37       299
4          7907.03                33.94       232
8          6919.99                38.82       178

4 External Host to guest TCP with 512 byte packet size

Multiqueue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          3354.22                15.75       212
2          6419.73                22.59       284
4          7545.04                25.06       301
8          7550.39                26.32       286
== smp=2 queue=2 ==
sessions  | throughput        | cpu      | normalized
1          3146.17                14.08       223
2          6414.55                21.01       305
4          9389.08                37.86       247
8          9402.39                40.24       233
== smp=4 queue=4 ==
sessions  | throughput        | cpu      | normalized
1          3247.65                14.91       217
2          6528.78                29.89       218
4          9402.89                37.79       248
8          9404.06                47.87       196
== smp=8 queue=8 ==
sessions  | throughput        | cpu      | normalized
1          4367.90                14.16       308
2          6962.76                27.99       248
4          9404.83                41.26       227
8          9412.09                57.74       163

Single Queue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          3253.88                14.53       223
2          6385.90                20.83       306
4          7581.40                26.07       290
8          7025.62                26.54       264
== smp=2 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          3257.61                13.85       235
2          6385.06                20.66       309
4          7465.50                32.27       231
8          7021.31                31.42       223
== smp=4 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          3186.60                15.88       200
2          6298.92                27.40       229
4          7474.69                32.53       229
8          6985.72                33.36       209
== smp=8 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          3279.81                17.63       186
2          6513.77                29.78       218
4          7413.30                35.44       209
8          6936.96                32.68       212


5 Guest to Local host TCP with 512 byte packet size

Multiqueue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          1961.31                35.43       55
2          1974.04                34.76       56
4          1906.74                34.04       56
8          1907.94                34.75       54
== smp=2 queue=2 ==
sessions  | throughput        | cpu      | normalized
1          1971.22                31.95       61
2          2484.96                58.75       42
4          3290.77                53.18       61
8          3031.99                54.11       56
== smp=4 queue=4 ==
sessions  | throughput        | cpu      | normalized
1          1107.56                31.22       35
2          2811.83                59.57       47
4          10276.05                79.79       128
8          12760.93                96.93       131
== smp=8 queue=8 ==
sessions  | throughput        | cpu      | normalized
1          1888.28                32.15       58
2          2335.03                56.72       41
4          9785.72                82.22       119
8          11274.42                95.60       117

Single Queue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          1981.08                31.89       62
2          1970.74                32.57       60
4          1944.63                32.02       60
8          1943.50                31.45       61
== smp=2 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2118.23                34.80       60
2          7221.95                45.63       158
4          7924.92                47.06       168
8          8651.28                47.40       182
== smp=4 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2110.70                33.18       63
2          6602.25                42.86       154
4          9715.38                47.38       205
8          20131.98                61.94       325
== smp=8 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          1881.33                40.69       46
2          7631.25                48.56       157
4          13366.28                59.47       224
8          19949.45                68.85       289

6 Guest to Local host with default packet size.

Multiqueue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          8674.81                34.86       248
2          8576.14                34.72       247
4          8503.87                34.62       245
8          8247.43                33.77       244
== smp=2 queue=2 ==
sessions  | throughput        | cpu      | normalized
1          7785.02                32.25       241
2          14696.71                58.14       252
4          12339.64                51.43       239
8          12997.55                52.53       247
== smp=4 queue=4 ==
sessions  | throughput        | cpu      | normalized
1          8557.25                32.38       264
2          12164.88                58.56       207
4          18144.19                73.69       246
8          29756.33                96.15       309
== smp=8 queue=8 ==
sessions  | throughput        | cpu      | normalized
1          6808.67                36.55       186
2          11590.04                61.14       189
4          23667.67                81.50       290
8          25501.89                92.44       275

Single Queue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          8053.49                36.35       221
2          8493.95                35.21       241
4          8367.26                34.61       241
8          8435.64                35.45       237
== smp=2 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          9259.56                35.24       262
2          17153.83                44.07       389
4          16901.67                45.88       368
8          18180.81                42.34       429
== smp=4 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          8928.11                31.22       285
2          16835.27                47.79       352
4          16923.83                47.78       354
8          18050.62                45.86       393
== smp=8 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2978.88                25.75       115
2          15422.18                41.97       367
4          16137.10                45.90       351
8          16628.30                48.99       339

7 Local host to Guest with 512 byte packet size

Multiqueue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          3665.90                31.88       114
2          5709.15                38.16       149
4          8803.25                42.92       205
8          10530.33                45.21       232
== smp=2 queue=2 ==
sessions  | throughput        | cpu      | normalized
1          3390.07                31.28       108
2          7502.21                62.42       120
4          14247.63                67.23       211
8          16766.93                69.66       240
== smp=4 queue=4 ==
sessions  | throughput        | cpu      | normalized
1          3580.96                31.90       112
2          4353.46                62.85       69
4          8264.18                77.94       106
8          16014.00                80.11       199
== smp=8 queue=8 ==
sessions  | throughput        | cpu      | normalized
1          1745.36                41.84       41
2          4472.03                73.50       60
4          12646.92                79.86       158
8          18212.21                89.79       202

Single Queue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          4220.96                31.88       132
2          5732.38                37.12       154
4          7006.81                41.60       168
8          10529.09                45.92       229
== smp=2 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2665.41                40.53       65
2          9864.49                59.44       165
4          11678.42                60.20       193
8          16042.60                57.85       277
== smp=4 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2609.10                42.67       61
2          5496.83                68.52       80
4          16848.24                60.49       278
8          14829.66                60.54       244
== smp=8 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          2567.15                44.54       57
2          5902.02                59.32       99
4          13265.99                68.48       193
8          15301.16                63.95       239

8 Local host to Guest with default packet size

Multiqueue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          12531.65                29.95       418
2          12495.93                30.05       415
4          12487.40                31.28       399
8          11501.68                33.51       343
== smp=2 queue=2 ==
sessions  | throughput        | cpu      | normalized
1          12566.08                28.86       435
2          21756.15                54.33       400
4          19899.84                56.37       353
8          19326.62                61.57       313
== smp=4 queue=4 ==
sessions  | throughput        | cpu      | normalized
1          12383.42                28.69       431
2          19714.34                57.62       342
4          20609.45                64.13       321
8          18935.57                95.05       199
== smp=8 queue=8 ==
sessions  | throughput        | cpu      | normalized
1          13736.90                31.95       429
2          26157.13                71.77       364
4          22874.41                78.54       291
8          19960.91                96.08       207

Single Queue Result:

== smp=1 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          12501.11                30.01       416
2          12497.01                28.51       438
4          12429.25                31.09       399
8          12152.53                28.20       430
== smp=2 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          13632.87                35.32       385
2          19900.82                46.28       430
4          17510.87                42.21       414
8          14443.78                35.48       407
== smp=4 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          14584.61                37.70       386
2          12646.50                31.39       402
4          16248.16                49.22       330
8          14131.34                47.48       297
== smp=8 queue=1 ==
sessions  | throughput        | cpu      | normalized
1          16279.89                39.51       412
2          16958.02                53.87       314
4          16906.03                50.35       335
8          14686.25                47.30       310

-- 
Jason Wang
