From: Jason Wang <jasowang@redhat.com>
To: Rick Jones <rick.jones2@hp.com>
Cc: mst@redhat.com, mashirle@us.ibm.com, krkumar2@in.ibm.com,
	habanero@linux.vnet.ibm.com, rusty@rustcorp.com.au,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, edumazet@google.com,
	tahm@linux.vnet.ibm.com, jwhan@filewood.snu.ac.kr,
	davem@davemloft.net, akong@redhat.com, kvm@vger.kernel.org,
	sri@us.ibm.com
Subject: Re: [net-next RFC V5 0/5] Multiqueue virtio-net
Date: Fri, 06 Jul 2012 15:42:01 +0800
Message-ID: <4FF696C9.5070907@redhat.com>
In-Reply-To: <4FF5D2B7.6080602@hp.com>

On 07/06/2012 01:45 AM, Rick Jones wrote:
> On 07/05/2012 03:29 AM, Jason Wang wrote:
>
>>
>> Test result:
>>
>> 1) 1 vm 2 vcpu 1q vs 2q, 1 - 1q, 2 - 2q, no pinning
>>
>> - Guest to External Host TCP STREAM
>> sessions size throughput1 throughput2   norm1 norm2
>> 1 64 650.55 655.61 100% 24.88 24.86 99%
>> 2 64 1446.81 1309.44 90% 30.49 27.16 89%
>> 4 64 1430.52 1305.59 91% 30.78 26.80 87%
>> 8 64 1450.89 1270.82 87% 30.83 25.95 84%
>
> Was the -D test-specific option used to set TCP_NODELAY?  I'm guessing 
> from your description of how packet sizes were smaller with multiqueue 
> and your need to hack tcp_write_xmit() it wasn't but since we don't 
> have the specific netperf command lines (hint hint :) I wanted to make 
> certain.
Hi Rick:

I didn't specify -D to disable Nagle. I also collected rx packets and 
the average packet size:

Guest to External Host (2 vcpu, 1q vs 2q)
sessions size tput-sq tput-mq    % norm-sq norm-mq    % #tx-pkts-sq #tx-pkts-mq    % avg-sz-sq avg-sz-mq    %
       1   64  668.85  671.13 100%   25.80   26.86 104%      629038      627126  99%      1395      1403 100%
       2   64 1421.29 1345.40  94%   32.06   27.57  85%     1318498     1246721  94%      1413      1414 100%
       4   64 1469.96 1365.42  92%   32.44   27.04  83%     1362542     1277848  93%      1414      1401  99%
       8   64 1131.00 1361.58 120%   24.81   26.76 107%     1223700     1280970 104%      1395      1394  99%
       1  256 1883.98 1649.87  87%   60.67   58.48  96%     1542775     1465836  95%      1592      1472  92%
       2  256 4847.09 3539.74  73%   98.35   64.05  65%     2683346     3074046 114%      2323      1505  64%
       4  256 5197.33 3283.48  63%  109.14   62.39  57%     1819814     2929486 160%      3636      1467  40%
       8  256 5953.53 3359.22  56%  122.75   64.21  52%      906071     2924148 322%      8282      1502  18%
       1  512 3019.70 2646.07  87%   93.89   86.78  92%     2003780     2256077 112%      1949      1532  78%
       2  512 7455.83 5861.03  78%  173.79  104.43  60%     1200322     3577142 298%      7831      2114  26%
       4  512 8962.28 7062.20  78%  213.08  127.82  59%      468142     2594812 554%     24030      3468  14%
       8  512 7849.82 8523.85 108%  175.41  154.19  87%      304923     1662023 545%     38640      6479  16%
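
A minimal sketch of how the % columns in the table above appear to be
derived: the multiqueue value divided by the singlequeue value, as a
whole percentage. Truncation rather than rounding is an assumption
inferred from the listed figures (for example 1345.40 / 1421.29 is
shown as 94%), and the helper name mq_vs_sq_percent() is hypothetical.

```c
#include <stdio.h>

/*
 * Ratio of the multiqueue (mq) value to the singlequeue (sq) value,
 * expressed as a whole percentage.
 *
 * ASSUMPTION: the result is truncated, not rounded. This is inferred
 * from the table rows, not stated in the message.
 */
static inline int mq_vs_sq_percent(double sq, double mq)
{
	return (int)(mq * 100.0 / sq);
}

/* Print one comparison in the same "value value pct%" layout. */
static inline void print_ratio(const char *label, double sq, double mq)
{
	printf("%s %.2f %.2f %d%%\n", label, sq, mq, mq_vs_sq_percent(sq, mq));
}
```

For the 8-session, 256-byte row, mq_vs_sq_percent(906071, 2924148)
gives 322 for the packet count and mq_vs_sq_percent(8282, 1502) gives
18 for the average size, which matches the table.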

With multiqueue enabled, the packet rate is higher but the packets are 
much smaller. It looks to me as if multiqueue transmits faster, so the 
guest TCP has less opportunity to build larger skbs before sending. 
Many more small packets have to be sent, which leads to many more exits 
and more vhost work. One interesting thing: if I run tcpdump on the 
host where the guest runs, throughput increases noticeably. To verify 
the assumption, I hacked tcp_write_xmit() with the following patch and 
set tcp_tso_win_divisor=1. With that, multiqueue can outperform or at 
least match singlequeue throughput, though it could introduce latency, 
which I haven't measured.

I'm not a TCP expert, but the changes look reasonable to me:
- we can do the full-sized TSO check in tcp_tso_should_defer() only for 
westwood, according to tcp westwood
- run tcp_tso_should_defer() for tso_segs = 1 when TSO is enabled.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c465d3e..166a888 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1567,7 +1567,7 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)

         in_flight = tcp_packets_in_flight(tp);

-       BUG_ON(tcp_skb_pcount(skb) <= 1 || (tp->snd_cwnd <= in_flight));
+       BUG_ON(tp->snd_cwnd <= in_flight);

         send_win = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;

@@ -1576,9 +1576,11 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)

         limit = min(send_win, cong_win);

+#if 0
         /* If a full-sized TSO skb can be sent, do it. */
         if (limit >= sk->sk_gso_max_size)
                 goto send_now;
+#endif

         /* Middle in queue won't get any more data, full sendable already? */
         if ((skb != tcp_write_queue_tail(sk)) && (limit >= skb->len))
@@ -1795,10 +1797,9 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
                                                      (tcp_skb_is_last(sk, skb) ?
                                                       nonagle : TCP_NAGLE_PUSH))))
                                 break;
                                 break;
-               } else {
-                       if (!push_one && tcp_tso_should_defer(sk, skb))
-                               break;
                 }
+               if (!push_one && tcp_tso_should_defer(sk, skb))
+                       break;

                 limit = mss_now;
                 if (tso_segs > 1 && !tcp_urg_mode(tp))
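
The last hunk above changes when the transmit loop stops. A toy model
of that decision, before and after the patch, might look as follows.
It is a sketch under assumptions, not kernel code: struct xmit_state
and both functions are hypothetical stand-ins, the boolean fields
replace the real tcp_nagle_test() and tcp_tso_should_defer() calls,
and the `tso_segs == 1` condition is not visible in the hunk and is
assumed from the second bullet above.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to one loop iteration. */
struct xmit_state {
	unsigned int tso_segs;	/* segments in the skb at the queue head */
	bool nagle_allows;	/* stand-in for tcp_nagle_test() passing */
	bool push_one;		/* caller wants a single skb pushed */
	bool should_defer;	/* stand-in for tcp_tso_should_defer() */
};

/* Before the patch: the defer check is reached only when tso_segs > 1. */
static inline bool stops_before_patch(const struct xmit_state *s)
{
	if (s->tso_segs == 1)
		return !s->nagle_allows;
	return !s->push_one && s->should_defer;
}

/* After the patch: the defer check also runs for single-segment skbs. */
static inline bool stops_after_patch(const struct xmit_state *s)
{
	if (s->tso_segs == 1 && !s->nagle_allows)
		return true;
	return !s->push_one && s->should_defer;
}
```

In this model the two versions differ only for a single-segment skb
that passes the Nagle test while the defer check asks to wait: the
unpatched loop sends it, the patched loop stops and lets more data
accumulate.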




>
> Instead of calling them throughput1 and throughput2, it might be more 
> clear in future to identify them as singlequeue and multiqueue.
>

Sure.
> Also, how are you combining the concurrent netperf results?  Are you 
> taking sums of what netperf reports, or are you gathering statistics 
> outside of netperf?
>

The throughput figures were simply summed from the netperf results, as 
the netperf manual suggests. CPU utilization was measured with mpstat.
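
A minimal sketch of that aggregation, for illustration only. The
function names are hypothetical, and the formula for the "norm"
columns (aggregate throughput divided by CPU utilization in percent)
is an assumption; the message does not spell it out.

```c
#include <stddef.h>

/* Sum the throughput reported by each concurrent netperf instance. */
static inline double aggregate_throughput(const double *per_session, size_t n)
{
	double total = 0.0;

	for (size_t i = 0; i < n; i++)
		total += per_session[i];
	return total;
}

/*
 * ASSUMPTION: normalized throughput = aggregate throughput divided by
 * CPU utilization in percent (for example as reported by mpstat).
 * Returns 0.0 when no utilization was recorded, to avoid dividing by
 * zero.
 */
static inline double normalized_throughput(double total, double cpu_util_percent)
{
	return cpu_util_percent > 0.0 ? total / cpu_util_percent : 0.0;
}
```

With two sessions reporting 700.0 and 721.0 and a CPU utilization of
50%, this would give an aggregate of 1421.0 and a normalized value of
28.42.
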
>> - TCP RR
>> sessions size throughput1 throughput2   norm1 norm2
>> 50 1 54695.41 84164.98 153% 1957.33 1901.31 97%
>
> A single instance TCP_RR test would help confirm/refute any 
> non-trivial change in (effective) path length between the two cases.
>

Yes, I will test this, thanks.
> happy benchmarking,
>
> rick jones

