From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756005Ab2GEKg0 (ORCPT ); Thu, 5 Jul 2012 06:36:26 -0400
Received: from mx1.redhat.com ([209.132.183.28]:48903 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1755693Ab2GEKgY (ORCPT ); Thu, 5 Jul 2012 06:36:24 -0400
From: Jason Wang
To: mst@redhat.com, mashirle@us.ibm.com, krkumar2@in.ibm.com,
        habanero@linux.vnet.ibm.com, rusty@rustcorp.com.au,
        netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
        virtualization@lists.linux-foundation.org, edumazet@google.com,
        tahm@linux.vnet.ibm.com, jwhan@filewood.snu.ac.kr, davem@davemloft.net
Cc: akong@redhat.com, kvm@vger.kernel.org, sri@us.ibm.com, Jason Wang
Subject: [net-next RFC V5 0/5] Multiqueue virtio-net
Date: Thu, 5 Jul 2012 18:29:49 +0800
Message-Id: <1341484194-8108-1-git-send-email-jasowang@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello All:

This series is an updated version of the multiqueue virtio-net driver, based
on Krishna Kumar's work, which lets virtio-net use multiple rx/tx queues for
packet reception and transmission. Please review and comment.

Test Environment:
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores 2 numa nodes
- Two directly connected 82599 NICs

Test Summary:
- Highlights: huge improvements on TCP_RR test
- Lowlights: regression on small packet transmission, higher cpu utilization
  than single queue, need further optimization

Analysis of the performance result:
- I counted the number of packets sent and received during the test, and
  multiqueue handles many more packets per second.
- For the tx regression, multiqueue sends about 1-2 times more packets than
  single queue, and the packets are much smaller than with single queue.
  I suspect TCP does less batching in multiqueue, so I hacked
  tcp_write_xmit() to force more batching; with that hack, multiqueue works
  as well as single queue for both small packet transmission and throughput.
- I didn't include accelerated RFS for virtio-net in this series as it still
  needs further shaping; for those interested in it, please see:
  http://www.mail-archive.com/kvm@vger.kernel.org/msg64111.html

Changes from V4:
- Add ability to negotiate the number of queues through control virtqueue
- Ethtool -{L|l} support and default the tx/rx queue number to 1
- Expose the API to set irq affinity instead of irq itself

Changes from V3:
- Rebase to the net-next
- Let queue 2 be the control virtqueue to obey the spec
- Provide irq affinity
- Choose txq based on processor id

References:
- V4: https://lkml.org/lkml/2012/6/25/120
- V3: http://lwn.net/Articles/467283/

Test result:

1) 1 vm 2 vcpu 1q vs 2q, 1 - 1q, 2 - 2q, no pinning

- Guest to External Host TCP STREAM
sessions  size throughput1 throughput2    %    norm1    norm2    %
       1    64      650.55      655.61 100%    24.88    24.86  99%
       2    64     1446.81     1309.44  90%    30.49    27.16  89%
       4    64     1430.52     1305.59  91%    30.78    26.80  87%
       8    64     1450.89     1270.82  87%    30.83    25.95  84%
       1   256     1699.45     1779.58 104%    56.75    59.08 104%
       2   256     4902.71     3446.59  70%    98.53    62.78  63%
       4   256     4803.76     2980.76  62%    97.44    54.68  56%
       8   256     5128.88     3158.74  61%   104.68    58.61  55%
       1   512     2837.98     2838.42 100%    89.76    90.41 100%
       2   512     6742.59     5495.83  81%   155.03    99.07  63%
       4   512     9193.70     5900.17  64%   202.84   106.44  52%
       8   512     9287.51     7107.79  76%   202.18   129.08  63%
       1  1024     4166.42     4224.98 101%   128.55   129.86 101%
       2  1024     6196.94     7823.08 126%   181.80   168.81  92%
       4  1024     9113.62     9219.49 101%   235.15   190.93  81%
       8  1024     9324.25     9402.66 100%   239.10   179.99  75%
       1  2048     7441.63     6534.04  87%   248.01   215.63  86%
       2  2048     7024.61     7414.90 105%   225.79   219.62  97%
       4  2048     8971.49     9269.00 103%   278.94   220.84  79%
       8  2048     9314.20     9359.96 100%   268.36   192.23  71%
       1  4096     8282.60     8990.08 108%   277.45   320.05 115%
       2  4096     9194.80     9293.78 101%   317.02   248.76  78%
       4  4096     9340.73     9313.19  99%   300.34   230.35  76%
       8  4096     9148.23     9347.95 102%   279.49   199.43  71%
       1 16384     8787.89     8766.31  99%   312.38   316.53 101%
       2 16384     9306.35     9156.14  98%   319.53   279.83  87%
       4 16384     9177.81     9307.50 101%   312.69   230.07  73%
       8 16384     9035.82     9188.00 101%   298.32   199.17  66%

- TCP RR
sessions  size throughput1 throughput2    %    norm1    norm2    %
      50     1    54695.41    84164.98 153%  1957.33  1901.31  97%
     100     1    60141.88    88598.94 147%  2157.90  2000.45  92%
     250     1    74763.56   135584.22 181%  2541.94  2628.59 103%
      50    64    51628.38    82867.50 160%  1872.55  1812.16  96%
     100    64    60367.73    84080.60 139%  2215.69  1867.69  84%
     250    64    68502.70   124910.59 182%  2321.43  2495.76 107%
      50   128    53477.08    77625.07 145%  1905.10  1870.99  98%
     100   128    59697.56    74902.37 125%  2230.66  1751.03  78%
     250   128    71248.74   133963.55 188%  2453.12  2711.72 110%
      50   256    47663.86    67742.63 142%  1880.45  1735.30  92%
     100   256    54051.84    68738.57 127%  2123.03  1778.59  83%
     250   256    68250.06   124487.90 182%  2321.89  2598.60 111%

- External Host to Guest TCP STREAM
sessions  size throughput1 throughput2    %    norm1    norm2    %
       1    64      847.71      864.83 102%    57.99    57.93  99%
       2    64     1690.82     1544.94  91%    80.13    55.09  68%
       4    64     3434.98     3455.53 100%   127.17    89.00  69%
       8    64     5890.19     6557.35 111%   194.70   146.52  75%
       1   256     2094.04     2109.14 100%   130.73   127.14  97%
       2   256     5218.13     3731.97  71%   219.15   114.02  52%
       4   256     6734.51     9213.47 136%   227.87   208.31  91%
       8   256     6452.86     9402.78 145%   224.83   207.77  92%
       1   512     3945.07     4203.68 106%   279.72   273.30  97%
       2   512     7878.96     8122.55 103%   278.25   231.71  83%
       4   512     7645.89     9402.13 122%   252.10   217.42  86%
       8   512     6657.06     9403.71 141%   239.81   214.89  89%
       1  1024     5729.06     5111.21  89%   289.38   303.09 104%
       2  1024     8097.27     8159.67 100%   269.29   242.97  90%
       4  1024     7778.93     8919.02 114%   261.28   205.50  78%
       8  1024     6458.02     9360.02 144%   221.26   208.09  94%
       1  2048     6426.94     5195.59  80%   292.52   307.47 105%
       2  2048     8221.90     9025.66 109%   283.80   242.25  85%
       4  2048     7364.72     8527.79 115%   248.10   198.36  79%
       8  2048     6760.63     9161.07 135%   230.53   205.12  88%
       1  4096     7247.02     6874.21  94%   276.23   287.68 104%
       2  4096     8346.04     8818.65 105%   281.49   254.81  90%
       4  4096     6710.00     9354.59 139%   216.41   210.13  97%
       8  4096     6265.69     9406.87 150%   206.69   210.92 102%
       1 16384     8159.50     8048.79  98%   266.94   283.11 106%
       2 16384     8525.66     8552.41 100%   294.36   239.27  81%
       4 16384     6042.24     8447.86 139%   200.21   196.40  98%
       8 16384     6432.63     9403.49 146%   211.48   206.13  97%

2) 1 vm 4 vcpu 1q vs 4q, 1 - 1q, 2 - 4q, no pinning

- Guest to External Host TCP STREAM
sessions  size throughput1 throughput2    %    norm1    norm2    %
       1    64      636.93      657.69 103%    23.55    24.42 103%
       2    64     1457.46     1268.78  87%    30.97    26.02  84%
       4    64     3062.86     2302.43  75%    41.00    29.64  72%
       8    64     3107.68     2308.32  74%    41.62    29.07  69%
       1   256     1743.50     1750.11 100%    59.00    56.63  95%
       2   256     4582.61     2870.31  62%    92.47    51.97  56%
       4   256     8440.96     4795.37  56%   135.10    56.39  41%
       8   256     9240.31     6654.82  72%   144.76    74.89  51%
       1   512     2918.25     2735.26  93%    91.08    86.47  94%
       2   512     8978.32     5107.95  56%   200.00    94.97  47%
       4   512     8850.39     6864.37  77%   190.32   101.09  53%
       8   512     9270.30     8483.01  91%   193.44   118.73  61%
       1  1024     4416.10     3679.70  83%   135.54   110.63  81%
       2  1024     9085.20     8770.48  96%   242.23   175.59  72%
       4  1024     9158.57     9011.56  98%   234.39   159.17  67%
       8  1024     9345.89     9067.43  97%   233.35   138.73  59%
       1  2048     8455.19     6077.94  71%   338.52   190.16  56%
       2  2048     9223.32     8237.73  89%   270.00   198.27  73%
       4  2048     9080.75     9257.63 101%   261.30   172.80  66%
       8  2048     9177.39     8977.10  97%   256.89   147.50  57%
       1  4096     8665.35     8394.78  96%   289.63   289.85 100%
       2  4096     7850.73     8857.86 112%   253.33   252.62  99%
       4  4096     9332.55     8508.37  91%   289.19   151.29  52%
       8  4096     8482.30     9146.80 107%   255.41   156.02  61%
       1 16384     8825.72     8778.26  99%   314.60   308.89  98%
       2 16384     9283.85     8927.40  96%   316.48   246.98  78%
       4 16384     7766.95     8708.06 112%   265.25   155.59  58%
       8 16384     8945.55     8940.23  99%   298.45   151.32  50%

- TCP_RR
sessions  size throughput1 throughput2    %    norm1    norm2    %
      50     1    60848.70    81719.39 134%  2196.86  1551.05  70%
     100     1    61886.19    81425.02 131%  2215.76  1517.52  68%
     250     1    72058.41   162597.84 225%  2441.84  2278.14  93%
      50    64    51646.93    74160.10 143%  1861.07  1322.22  71%
     100    64    57574.86    83488.26 145%  2076.54  1479.79  71%
     250    64    67583.35   138482.15 204%  2314.46  2022.83  87%
      50   128    59931.51    71633.03 119%  2244.60  1309.18  58%
     100   128    58329.80    73104.90 125%  2202.98  1329.52  60%
     250   128    71021.55   161067.73 226%  2469.11  2205.28  89%
      50   256    47509.24    64330.24 135%  1915.75  1269.90  66%
     100   256    49293.03    68507.94 138%  1939.75  1263.64  65%
     250   256    63169.07   138390.68 219%  2255.47  2098.13  93%

- External Host to Guest TCP STREAM
sessions  size throughput1 throughput2    %    norm1    norm2    %
       1    64      850.18      854.96 100%    56.94    58.25 102%
       2    64     1659.12     1730.25 104%    81.65    67.57  82%
       4    64     3254.70     3397.17 104%   118.57    76.21  64%
       8    64     6251.97     6389.29 102%   207.68   104.21  50%
       1   256     2029.14     2105.18 103%   116.45   119.69 102%
       2   256     5412.02     4260.32  78%   240.87   139.73  58%
       4   256     7777.28     8743.12 112%   263.20   174.65  66%
       8   256     6459.51     9388.93 145%   218.94   158.37  72%
       1   512     4566.31     4269.30  93%   274.74   289.83 105%
       2   512     7444.52     8240.64 110%   286.24   243.74  85%
       4   512     7722.29     9391.16 121%   261.96   180.36  68%
       8   512     6228.50     9134.52 146%   209.17   161.00  76%
       1  1024     4965.50     4953.68  99%   307.64   280.48  91%
       2  1024     8270.08     7733.71  93%   288.32   197.04  68%
       4  1024     7551.04     9394.58 124%   268.41   206.62  76%
       8  1024     6307.78     9179.03 145%   216.67   159.63  73%
       1  2048     5741.12     5948.80 103%   290.34   268.66  92%
       2  2048     7932.79     8766.05 110%   262.96   215.90  82%
       4  2048     6907.55     9255.97 133%   233.56   203.96  87%
       8  2048     6037.22     9399.41 155%   197.14   164.09  83%
       1  4096     7131.70     7535.10 105%   279.43   275.12  98%
       2  4096     8109.17     9348.04 115%   274.29   211.49  77%
       4  4096     6878.92     9319.13 135%   244.21   192.06  78%
       8  4096     6265.92     9408.35 150%   211.85   159.26  75%
       1 16384     8288.01     8596.39 103%   272.85   290.22 106%
       2 16384     8166.29     9280.12 113%   277.04   236.61  85%
       4 16384     6446.97     9382.22 145%   222.91   187.24  83%
       8 16384     6066.98     9405.51 155%   198.98   157.09  78%

3) 2 vms each with 2 vcpus, 1q vs 2q - pin vhost/vcpu in the same node

- 2 Guests to External Hosts TCP STREAM
sessions  size throughput1 throughput2    %    norm1    norm2    %
       1    64     1442.07     1475.11 102%    30.82    31.21 101%
       2    64     3124.87     2900.93  92%    40.29    35.95  89%
       4    64     3166.52     2864.04  90%    40.70    35.47  87%
       8    64     3141.45     2848.94  90%    40.38    35.34  87%
       1   256     3628.54     3711.73 102%    68.47    70.22 102%
       2   256     7806.95     7586.69  97%   111.23    84.38  75%
       4   256     8823.65     7612.74  86%   132.92    85.04  63%
       8   256     9194.89     9373.41 101%   135.98   119.62  87%
       1   512     7106.67     7128.00 100%   124.79   124.30  99%
       2   512     9190.22     9397.33 102%   180.84   149.34  82%
       4   512     9401.01     9376.67  99%   173.00   140.15  81%
       8   512     8572.84     9032.90 105%   150.49   127.58  84%
       1  1024     9361.93     9379.24 100%   205.81   202.94  98%
       2  1024     9386.69     9389.04 100%   201.78   165.75  82%
       4  1024     9403.43     9378.54  99%   195.33   152.06  77%
       8  1024     9213.63     9180.64  99%   178.99   141.51  79%
       1  2048     9338.95     9384.67 100%   223.22   227.86 102%
       2  2048     9389.28     9389.45 100%   202.37   170.08  84%
       4  2048     9405.86     9388.71  99%   193.76   161.54  83%
       8  2048     9352.40     9384.06 100%   189.16   157.06  83%
       1  4096     9380.74     9384.90 100%   239.37   241.56 100%
       2  4096     9393.47     9376.74  99%   213.84   195.61  91%
       4  4096     9393.85     9381.50  99%   198.06   170.18  85%
       8  4096     9400.41     9232.31  98%   192.87   163.56  84%
       1 16384     9348.18     9335.55  99%   253.02   254.86 100%
       2 16384     9384.97     9359.53  99%   218.56   208.59  95%
       4 16384     9326.60     9382.15 100%   206.24   179.72  87%
       8 16384     9355.82     9392.85 100%   198.22   172.89  87%

- TCP RR
sessions  size throughput1 throughput2    %    norm1    norm2    %
      50     1   200340.33   261750.19 130%  2935.27  3018.59 102%
     100     1   236141.58   266304.49 112%  3452.16  3071.74  88%
     250     1   361574.59   320825.08  88%  4972.98  3705.70  74%
      50    64   225748.53   242671.12 107%  3011.48  2869.07  95%
     100    64   249885.37   260453.72 104%  3240.21  3063.67  94%
     250    64   360341.12   310775.60  86%  4682.42  3657.91  78%
      50   128   227995.27   289320.38 126%  2950.92  3479.37 117%
     100   128   239491.11   291135.77 121%  3099.55  3508.75 113%
     250   128   390390.68   362484.35  92%  5042.30  4368.52  86%
      50   256   222604.51   317140.97 142%  3058.08  3839.39 125%
     100   256   254770.92   335606.03 131%  3326.16  4046.65 121%
     250   256   400584.52   436749.22 109%  5220.79  5278.86 101%

- External Host to 2 Guests
sessions  size throughput1 throughput2    %    norm1    norm2    %
       1    64     1667.99     1684.50 100%    59.66    60.77 101%
       2    64     3338.83     3379.97 101%    83.61    64.82  77%
       4    64     6613.65     6619.11 100%   131.00    97.19  74%
       8    64     6553.07     6418.31  97%   141.35    98.27  69%
       1   256     3938.40     4068.52 103%   125.21   123.76  98%
       2   256     9215.57     9210.88  99%   185.31   154.27  83%
       4   256     9407.29     9008.13  95%   186.72   150.01  80%
       8   256     9377.17     9385.57 100%   190.28   137.59  72%
       1   512     7360.19     6984.80  94%   214.09   211.66  98%
       2   512     9392.91     9401.88 100%   193.92   173.11  89%
       4   512     9382.64     9394.34 100%   189.27   145.80  77%
       8   512     9308.60     9094.08  97%   189.70   141.26  74%
       1  1024     9153.26     9066.06  99%   223.07   219.95  98%
       2  1024     9393.38     9398.43 100%   194.02   173.82  89%
       4  1024     9395.92     8960.73  95%   192.61   145.82  75%
       8  1024     9388.92     9399.08 100%   191.18   143.87  75%
       1  2048     9355.32     9240.63  98%   221.50   223.03 100%
       2  2048     9395.68     9399.62 100%   193.31   177.21  91%
       4  2048     9397.67     9399.56 100%   195.25   157.53  80%
       8  2048     9397.89     9401.70 100%   197.57   146.96  74%
       1  4096     9375.84     9381.72 100%   223.06   225.06 100%
       2  4096     9389.47     9396.00 100%   193.91   197.13 101%
       4  4096     9397.45     9400.11 100%   192.33   163.60  85%
       8  4096     9105.40     9415.76 103%   192.71   140.41  72%
       1 16384     9381.53     9381.40  99%   223.53   225.66 100%
       2 16384     9387.90     9395.44 100%   193.34   177.03  91%
       4 16384     9397.92     9410.98 100%   195.04   151.14  77%
       8 16384     9259.00     9419.48 101%   194.91   153.48  78%

4) Local vm to vm 2 vcpu 1q vs 2q - pin vcpu/thread in the same numa node

- VM to VM TCP STREAM
sessions  size throughput1 throughput2    %    norm1    norm2    %
       1    64      576.05      576.14 100%    12.25    12.32 100%
       2    64     1266.75     1160.04  91%    19.10    16.05  84%
       4    64     1267.34     1123.70  88%    19.08    15.51  81%
       8    64     1230.88     1174.70  95%    18.53    15.58  84%
       1   256     1311.00     1303.02  99%    25.34    25.35 100%
       2   256     5400.26     2794.00  51%    75.92    36.43  47%
       4   256     5200.67     2818.88  54%    72.81    33.92  46%
       8   256     5234.55     2893.74  55%    73.10    34.97  47%
       1   512     3244.09     3263.72 100%    56.48    56.65 100%
       2   512     8172.16     4661.15  57%   119.05    67.89  57%
       4   512    10567.44     7063.25  66%   147.76    77.27  52%
       8   512    10477.87     8471.33  80%   145.94   102.91  70%
       1  1024     5432.54     5333.99  98%    93.69    92.38  98%
       2  1024    12590.24     9259.97  73%   185.37   135.28  72%
       4  1024    15600.53    10731.93  68%   222.20   123.60  55%
       8  1024    16222.87    10704.85  65%   227.05   113.81  50%
       1  2048     6667.61     7484.37 112%   116.75   129.72 111%
       2  2048     8180.43    11500.88 140%   137.84   156.64 113%
       4  2048    15127.93    14416.16  95%   227.60   154.59  67%
       8  2048    16381.79    14794.10  90%   244.29   158.45  64%
       1  4096     7375.63     8948.90 121%   131.97   156.57 118%
       2  4096     9321.16    14443.21 154%   161.24   163.74 101%
       4  4096    13028.45    15984.94 122%   212.78   171.26  80%
       8  4096    15611.28    18810.54 120%   245.15   198.65  81%
       1 16384    15304.38    14202.08  92%   259.94   244.04  93%
       2 16384    15508.97    15913.09 102%   261.30   244.26  93%
       4 16384    14859.98    20164.34 135%   248.29   214.26  86%
       8 16384    15594.59    19960.99 127%   253.79   211.27  83%

- TCP RR
sessions  size throughput1 throughput2    %    norm1    norm2    %
      50     1    54972.51    69820.99 127%  1133.58  1063.58  93%
     100     1    55847.16    72407.93 129%  1155.73  1024.35  88%
     250     1    60066.23   108266.50 180%  1114.30  1323.55 118%
      50    64    48727.63    62378.32 128%  1014.29   888.78  87%
     100    64    51804.65    69250.51 133%  1077.78   986.97  91%
     250    64    61278.68   100015.78 163%  1076.93  1243.18 115%
      50   256    51593.29    62046.22 120%  1069.14   871.08  81%
     100   256    51647.00    68197.43 132%  1071.66   958.51  89%
     250   256    60433.88    99072.59 163%  1072.41  1199.10 111%
      50   512    52177.79    66483.77 127%  1082.65   960.82  88%
     100   512    50351.67    62537.63 124%  1041.61   876.41  84%
     250   512    60510.14   103856.79 171%  1055.21  1245.17 118%

Jason Wang (4):
  virtio_ring: move queue_index to vring_virtqueue
  virtio: intorduce an API to set affinity for a virtqueue
  virtio_net: multiqueue support
  virtio_net: support negotiating the number of queues through ctrl vq

Krishna Kumar (1):
  virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE

 drivers/net/virtio_net.c      |  792 +++++++++++++++++++++++++++++------------
 drivers/virtio/virtio_mmio.c  |    5 +-
 drivers/virtio/virtio_pci.c   |   58 +++-
 drivers/virtio/virtio_ring.c  |   17 +
 include/linux/virtio.h        |    4 +
 include/linux/virtio_config.h |   21 ++
 include/linux/virtio_net.h    |   10 +
 7 files changed, 677 insertions(+), 230 deletions(-)
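
A note on reading the result tables: the two percentage columns are not labeled in the posting. They appear to be the multiqueue value as a percentage of the single-queue value (throughput2/throughput1 and norm2/norm1), truncated rather than rounded to a whole percent. The sketch below checks that reading against a few rows from test 1; the function name and the truncation rule are assumptions made for illustration, not something the posting states.

```python
# Assumption: the unlabeled "%" columns are the multiqueue value divided by
# the single-queue value, truncated (not rounded) to a whole percent.
def ratio_percent(single_queue: float, multi_queue: float) -> int:
    """Return multi_queue as a truncated percentage of single_queue."""
    if single_queue <= 0:
        raise ValueError("single-queue value must be positive")
    return int(100 * multi_queue / single_queue)


# Rows copied from test 1: (label, 1q value, 2q value, percent as posted).
ROWS = [
    ("TCP RR, 50 sessions, size 1, throughput", 54695.41, 84164.98, 153),
    ("TCP RR, 50 sessions, size 1, norm", 1957.33, 1901.31, 97),
    ("Guest to External Host, 2 sessions, size 64, throughput",
     1446.81, 1309.44, 90),
]

for label, one_q, two_q, posted in ROWS:
    computed = ratio_percent(one_q, two_q)
    print(f"{label}: computed {computed}%, posted {posted}%")
```

Under this reading, a throughput percentage above 100 means multiqueue moved more data or transactions than single queue, while a norm percentage below 100 means the normalized figure dropped.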