linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] virtio-net: interrupt related improvements
@ 2018-12-05 22:54 Michael S. Tsirkin
  2018-12-05 22:54 ` [PATCH RFC 1/2] virtio-net: bql support Michael S. Tsirkin
  2018-12-05 22:54 ` [PATCH RFC 2/2] virtio_net: bulk free tx skbs Michael S. Tsirkin
  0 siblings, 2 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2018-12-05 22:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: maxime.coquelin, tiwei.bie, wexu, jfreimann, virtualization,
	netdev, Jason Wang

Now that we have brought the virtio overhead way down with a fast packed
ring implementation, we seem to be actually observing TCP drops
indicative of bufferbloat. So let's try to enable TSQ.  Note: it isn't
clear that the default pacing is great for the virt use case. It's worth
trying to play with sk_pacing_shift_update to see what happens.

For this reason, and for a more important one (I haven't had time to
test it yet), I'm sending this as an RFC.

Michael S. Tsirkin (2):
  virtio-net: bql support
  virtio_net: bulk free tx skbs

 drivers/net/virtio_net.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

-- 
MST


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFC 1/2] virtio-net: bql support
  2018-12-05 22:54 [PATCH RFC 0/2] virtio-net: interrupt related improvements Michael S. Tsirkin
@ 2018-12-05 22:54 ` Michael S. Tsirkin
  2018-12-06  8:17   ` Jason Wang
  2018-12-05 22:54 ` [PATCH RFC 2/2] virtio_net: bulk free tx skbs Michael S. Tsirkin
  1 sibling, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2018-12-05 22:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: maxime.coquelin, tiwei.bie, wexu, jfreimann, Jason Wang,
	David S. Miller, virtualization, netdev

When use_napi is set, let's enable BQLs.  Note: some of the issues are
similar to wifi.  It's worth considering whether something similar to
commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
beneficial.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cecfd77c9f3c..b657bde6b94b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1325,7 +1325,8 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 	return stats.packets;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
+static void free_old_xmit_skbs(struct send_queue *sq, struct netdev_queue *txq,
+			       bool use_napi)
 {
 	struct sk_buff *skb;
 	unsigned int len;
@@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	if (!packets)
 		return;
 
+	if (use_napi)
+		netdev_tx_completed_queue(txq, packets, bytes);
+
 	u64_stats_update_begin(&sq->stats.syncp);
 	sq->stats.bytes += bytes;
 	sq->stats.packets += packets;
@@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 		return;
 
 	if (__netif_tx_trylock(txq)) {
-		free_old_xmit_skbs(sq);
+		free_old_xmit_skbs(sq, txq, true);
 		__netif_tx_unlock(txq);
 	}
 
@@ -1440,7 +1444,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
 
 	__netif_tx_lock(txq, raw_smp_processor_id());
-	free_old_xmit_skbs(sq);
+	free_old_xmit_skbs(sq, txq, true);
 	__netif_tx_unlock(txq);
 
 	virtqueue_napi_complete(napi, sq->vq, 0);
@@ -1505,13 +1509,15 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct send_queue *sq = &vi->sq[qnum];
 	int err;
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
-	bool kick = !skb->xmit_more;
+	bool more = skb->xmit_more;
 	bool use_napi = sq->napi.weight;
+	unsigned int bytes = skb->len;
+	bool kick;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	free_old_xmit_skbs(sq, txq, use_napi);
 
-	if (use_napi && kick)
+	if (use_napi && !more)
 		virtqueue_enable_cb_delayed(sq->vq);
 
 	/* timestamp packet in software */
@@ -1552,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (!use_napi &&
 		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(sq, txq, false);
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
@@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
-	if (kick || netif_xmit_stopped(txq)) {
+	if (use_napi)
+		kick = __netdev_tx_sent_queue(txq, bytes, more);
+	else
+		kick = !more || netif_xmit_stopped(txq);
+
+	if (kick) {
 		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
 			u64_stats_update_begin(&sq->stats.syncp);
 			sq->stats.kicks++;
-- 
MST


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC 2/2] virtio_net: bulk free tx skbs
  2018-12-05 22:54 [PATCH RFC 0/2] virtio-net: interrupt related improvements Michael S. Tsirkin
  2018-12-05 22:54 ` [PATCH RFC 1/2] virtio-net: bql support Michael S. Tsirkin
@ 2018-12-05 22:54 ` Michael S. Tsirkin
  1 sibling, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2018-12-05 22:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: maxime.coquelin, tiwei.bie, wexu, jfreimann, Jason Wang,
	David S. Miller, virtualization, netdev

Use napi_consume_skb() to get bulk free.  Note that napi_consume_skb is
safe to call in a non-napi context as long as the napi_budget flag is
correct.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b657bde6b94b..18c06322be40 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1339,7 +1339,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, struct netdev_queue *txq,
 		bytes += skb->len;
 		packets++;
 
-		dev_consume_skb_any(skb);
+		napi_consume_skb(skb, use_napi);
 	}
 
 	/* Avoid overhead when no packets have been processed
-- 
MST


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-05 22:54 ` [PATCH RFC 1/2] virtio-net: bql support Michael S. Tsirkin
@ 2018-12-06  8:17   ` Jason Wang
  2018-12-06  8:31     ` Jason Wang
                       ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Jason Wang @ 2018-12-06  8:17 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: maxime.coquelin, tiwei.bie, wexu, jfreimann, David S. Miller,
	virtualization, netdev


On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
> When use_napi is set, let's enable BQLs.  Note: some of the issues are
> similar to wifi.  It's worth considering whether something similar to
> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> beneficial.


I played with a similar patch several days ago. The tricky part is the
mode switching between napi and non-napi. We should make sure that when
a packet is sent and tracked by BQL, it is consumed by BQL as well. I
did this by tracking it through skb->cb, and dealt with the freeze by
resetting the BQL status. Patch attached.

But when testing with vhost-net, I didn't see a stable performance; it
was probably because we batch the used ring updates, so tx interrupts
may come randomly. We probably need to implement a time-bounded
coalescing mechanism which could be configured from userspace.

Btw, maybe it's time to just enable napi TX by default. I get a ~10%
TCP_RR regression on a machine without APICv (haven't found time to test
an APICv machine). But considering it was for correctness, I think it's
acceptable? Then we can do optimization on top?


Thanks


> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>   drivers/net/virtio_net.c | 27 +++++++++++++++++++--------
>   1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index cecfd77c9f3c..b657bde6b94b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1325,7 +1325,8 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>   	return stats.packets;
>   }
>   
> -static void free_old_xmit_skbs(struct send_queue *sq)
> +static void free_old_xmit_skbs(struct send_queue *sq, struct netdev_queue *txq,
> +			       bool use_napi)
>   {
>   	struct sk_buff *skb;
>   	unsigned int len;
> @@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct send_queue *sq)
>   	if (!packets)
>   		return;
>   
> +	if (use_napi)
> +		netdev_tx_completed_queue(txq, packets, bytes);
> +
>   	u64_stats_update_begin(&sq->stats.syncp);
>   	sq->stats.bytes += bytes;
>   	sq->stats.packets += packets;
> @@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>   		return;
>   
>   	if (__netif_tx_trylock(txq)) {
> -		free_old_xmit_skbs(sq);
> +		free_old_xmit_skbs(sq, txq, true);
>   		__netif_tx_unlock(txq);
>   	}
>   
> @@ -1440,7 +1444,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
>   
>   	__netif_tx_lock(txq, raw_smp_processor_id());
> -	free_old_xmit_skbs(sq);
> +	free_old_xmit_skbs(sq, txq, true);
>   	__netif_tx_unlock(txq);
>   
>   	virtqueue_napi_complete(napi, sq->vq, 0);
> @@ -1505,13 +1509,15 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	struct send_queue *sq = &vi->sq[qnum];
>   	int err;
>   	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> -	bool kick = !skb->xmit_more;
> +	bool more = skb->xmit_more;
>   	bool use_napi = sq->napi.weight;
> +	unsigned int bytes = skb->len;
> +	bool kick;
>   
>   	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq);
> +	free_old_xmit_skbs(sq, txq, use_napi);
>   
> -	if (use_napi && kick)
> +	if (use_napi && !more)
>   		virtqueue_enable_cb_delayed(sq->vq);
>   
>   	/* timestamp packet in software */
> @@ -1552,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>   		if (!use_napi &&
>   		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>   			/* More just got used, free them then recheck. */
> -			free_old_xmit_skbs(sq);
> +			free_old_xmit_skbs(sq, txq, false);
>   			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>   				netif_start_subqueue(dev, qnum);
>   				virtqueue_disable_cb(sq->vq);
> @@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>   		}
>   	}
>   
> -	if (kick || netif_xmit_stopped(txq)) {
> +	if (use_napi)
> +		kick = __netdev_tx_sent_queue(txq, bytes, more);
> +	else
> +		kick = !more || netif_xmit_stopped(txq);
> +
> +	if (kick) {
>   		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
>   			u64_stats_update_begin(&sq->stats.syncp);
>   			sq->stats.kicks++;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-06  8:17   ` Jason Wang
@ 2018-12-06  8:31     ` Jason Wang
  2018-12-26 15:15     ` Michael S. Tsirkin
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2018-12-06  8:31 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: netdev, virtualization, maxime.coquelin, wexu, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 5396 bytes --]


On 2018/12/6 4:17 PM, Jason Wang wrote:
>
> On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
>> When use_napi is set, let's enable BQLs. Note: some of the issues are
>> similar to wifi.  It's worth considering whether something similar to
>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>> beneficial.
>
>
> I played with a similar patch several days ago. The tricky part is
> the mode switching between napi and non-napi. We should make sure that
> when a packet is sent and tracked by BQL, it is consumed by BQL as
> well. I did this by tracking it through skb->cb, and dealt with the
> freeze by resetting the BQL status. Patch attached.


Adding the patch.

Thanks


>
> But when testing with vhost-net, I didn't see a stable performance; it
> was probably because we batch the used ring updates, so tx interrupts
> may come randomly. We probably need to implement a time-bounded
> coalescing mechanism which could be configured from userspace.
>
> Btw, maybe it's time to just enable napi TX by default. I get a ~10%
> TCP_RR regression on a machine without APICv (haven't found time to
> test an APICv machine). But considering it was for correctness, I
> think it's acceptable? Then we can do optimization on top?
>
>
> Thanks
>
>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>   drivers/net/virtio_net.c | 27 +++++++++++++++++++--------
>>   1 file changed, 19 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index cecfd77c9f3c..b657bde6b94b 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1325,7 +1325,8 @@ static int virtnet_receive(struct receive_queue 
>> *rq, int budget,
>>       return stats.packets;
>>   }
>>   -static void free_old_xmit_skbs(struct send_queue *sq)
>> +static void free_old_xmit_skbs(struct send_queue *sq, struct 
>> netdev_queue *txq,
>> +                   bool use_napi)
>>   {
>>       struct sk_buff *skb;
>>       unsigned int len;
>> @@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct 
>> send_queue *sq)
>>       if (!packets)
>>           return;
>>   +    if (use_napi)
>> +        netdev_tx_completed_queue(txq, packets, bytes);
>> +
>>       u64_stats_update_begin(&sq->stats.syncp);
>>       sq->stats.bytes += bytes;
>>       sq->stats.packets += packets;
>> @@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct 
>> receive_queue *rq)
>>           return;
>>         if (__netif_tx_trylock(txq)) {
>> -        free_old_xmit_skbs(sq);
>> +        free_old_xmit_skbs(sq, txq, true);
>>           __netif_tx_unlock(txq);
>>       }
>>   @@ -1440,7 +1444,7 @@ static int virtnet_poll_tx(struct napi_struct 
>> *napi, int budget)
>>       struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, 
>> vq2txq(sq->vq));
>>         __netif_tx_lock(txq, raw_smp_processor_id());
>> -    free_old_xmit_skbs(sq);
>> +    free_old_xmit_skbs(sq, txq, true);
>>       __netif_tx_unlock(txq);
>>         virtqueue_napi_complete(napi, sq->vq, 0);
>> @@ -1505,13 +1509,15 @@ static netdev_tx_t start_xmit(struct sk_buff 
>> *skb, struct net_device *dev)
>>       struct send_queue *sq = &vi->sq[qnum];
>>       int err;
>>       struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>> -    bool kick = !skb->xmit_more;
>> +    bool more = skb->xmit_more;
>>       bool use_napi = sq->napi.weight;
>> +    unsigned int bytes = skb->len;
>> +    bool kick;
>>         /* Free up any pending old buffers before queueing new ones. */
>> -    free_old_xmit_skbs(sq);
>> +    free_old_xmit_skbs(sq, txq, use_napi);
>>   -    if (use_napi && kick)
>> +    if (use_napi && !more)
>>           virtqueue_enable_cb_delayed(sq->vq);
>>         /* timestamp packet in software */
>> @@ -1552,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_buff 
>> *skb, struct net_device *dev)
>>           if (!use_napi &&
>>               unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>>               /* More just got used, free them then recheck. */
>> -            free_old_xmit_skbs(sq);
>> +            free_old_xmit_skbs(sq, txq, false);
>>               if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>>                   netif_start_subqueue(dev, qnum);
>>                   virtqueue_disable_cb(sq->vq);
>> @@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff 
>> *skb, struct net_device *dev)
>>           }
>>       }
>>   -    if (kick || netif_xmit_stopped(txq)) {
>> +    if (use_napi)
>> +        kick = __netdev_tx_sent_queue(txq, bytes, more);
>> +    else
>> +        kick = !more || netif_xmit_stopped(txq);
>> +
>> +    if (kick) {
>>           if (virtqueue_kick_prepare(sq->vq) && 
>> virtqueue_notify(sq->vq)) {
>>               u64_stats_update_begin(&sq->stats.syncp);
>>               sq->stats.kicks++;
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[-- Attachment #2: 0001-virtio-net-byte-queue-limit-support.patch --]
[-- Type: text/x-patch, Size: 4777 bytes --]

From f1c27543dc412778e682b63addbb0a471afc5153 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 20 Nov 2018 14:25:30 +0800
Subject: [PATCH] virtio-net: byte queue limit support

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 46 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 47979fc..8712c11 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -279,6 +279,14 @@ static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)
 	return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
 }
 
+
+static inline int *skb_cb_bql(struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(struct virtio_net_hdr_mrg_rxbuf) +
+		     sizeof(int) > sizeof(skb->cb));
+	return (int *)(skb->cb + sizeof(struct virtio_net_hdr_mrg_rxbuf));
+}
+
 /*
  * private is used to chain pages for big packets, put the whole
  * most recent used list in the beginning for reuse
@@ -1325,12 +1333,14 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 	return stats.packets;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
+static void free_old_xmit_skbs(struct send_queue *sq,
+			       struct netdev_queue *txq)
 {
 	struct sk_buff *skb;
 	unsigned int len;
-	unsigned int packets = 0;
-	unsigned int bytes = 0;
+	unsigned int packets = 0, bql_packets = 0;
+	unsigned int bytes = 0, bql_bytes = 0;
+	int *bql;
 
 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
@@ -1338,6 +1348,12 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 		bytes += skb->len;
 		packets++;
 
+		bql = skb_cb_bql(skb);
+		if (*bql) {
+			bql_packets++;
+			bql_bytes += skb->len;
+		}
+
 		dev_consume_skb_any(skb);
 	}
 
@@ -1351,6 +1367,8 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	sq->stats.bytes += bytes;
 	sq->stats.packets += packets;
 	u64_stats_update_end(&sq->stats.syncp);
+
+	netdev_tx_completed_queue(txq, bql_packets, bql_bytes);
 }
 
 static void virtnet_poll_cleantx(struct receive_queue *rq)
@@ -1364,7 +1382,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 		return;
 
 	if (__netif_tx_trylock(txq)) {
-		free_old_xmit_skbs(sq);
+		free_old_xmit_skbs(sq, txq);
 		__netif_tx_unlock(txq);
 	}
 
@@ -1440,7 +1458,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
 
 	__netif_tx_lock(txq, raw_smp_processor_id());
-	free_old_xmit_skbs(sq);
+	free_old_xmit_skbs(sq, txq);
 	__netif_tx_unlock(txq);
 
 	virtqueue_napi_complete(napi, sq->vq, 0);
@@ -1459,6 +1477,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 	int num_sg;
 	unsigned hdr_len = vi->hdr_len;
 	bool can_push;
+	int *bql = skb_cb_bql(skb);
 
 	pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
 
@@ -1495,6 +1514,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 			return num_sg;
 		num_sg++;
 	}
+
+	*bql = sq->napi.weight ? 1 : 0;
 	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
 }
 
@@ -1509,7 +1530,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool use_napi = sq->napi.weight;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	free_old_xmit_skbs(sq, txq);
 
 	if (use_napi && kick)
 		virtqueue_enable_cb_delayed(sq->vq);
@@ -1537,6 +1558,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		nf_reset(skb);
 	}
 
+	if (use_napi)
+		netdev_tx_sent_queue(txq, skb->len);
+
 	/* If running out of space, stop queue to avoid getting packets that we
 	 * are then unable to transmit.
 	 * An alternative would be to force queuing layer to requeue the skb by
@@ -1552,7 +1576,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (!use_napi &&
 		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(sq, txq);
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
@@ -2275,8 +2299,14 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
+			struct send_queue *sq = &vi->sq[i];
+			struct netdev_queue *txq =
+			       netdev_get_tx_queue(vi->dev, i);
+
 			napi_disable(&vi->rq[i].napi);
-			virtnet_napi_tx_disable(&vi->sq[i].napi);
+			virtnet_napi_tx_disable(&sq->napi);
+			if (sq->napi.weight)
+				netdev_tx_reset_queue(txq);
 		}
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-06  8:17   ` Jason Wang
  2018-12-06  8:31     ` Jason Wang
@ 2018-12-26 15:15     ` Michael S. Tsirkin
  2018-12-27  9:56       ` Jason Wang
  2018-12-26 15:19     ` Michael S. Tsirkin
  2018-12-26 15:22     ` Michael S. Tsirkin
  3 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2018-12-26 15:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> 
> On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
> > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > similar to wifi.  It's worth considering whether something similar to
> > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > beneficial.
> 
> 
> I played with a similar patch several days ago. The tricky part is the mode
> switching between napi and non-napi. We should make sure that when a packet
> is sent and tracked by BQL, it is consumed by BQL as well. I did this by
> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> status. Patch attached.
> 
> But when testing with vhost-net, I didn't see a stable performance; it was
> probably because we batch the used ring updates, so tx interrupts may come
> randomly. We probably need to implement a time-bounded coalescing mechanism
> which could be configured from userspace.
> 
> Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
> regression on a machine without APICv (haven't found time to test an APICv
> machine). But considering it was for correctness, I think it's acceptable?
> Then we can do optimization on top?
> 
> 
> Thanks

To be frank, I don't see how it's for correctness.
What if we just do the bulk free? Does that fix the regression?


> 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >   drivers/net/virtio_net.c | 27 +++++++++++++++++++--------
> >   1 file changed, 19 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index cecfd77c9f3c..b657bde6b94b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1325,7 +1325,8 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >   	return stats.packets;
> >   }
> > -static void free_old_xmit_skbs(struct send_queue *sq)
> > +static void free_old_xmit_skbs(struct send_queue *sq, struct netdev_queue *txq,
> > +			       bool use_napi)
> >   {
> >   	struct sk_buff *skb;
> >   	unsigned int len;
> > @@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> >   	if (!packets)
> >   		return;
> > +	if (use_napi)
> > +		netdev_tx_completed_queue(txq, packets, bytes);
> > +
> >   	u64_stats_update_begin(&sq->stats.syncp);
> >   	sq->stats.bytes += bytes;
> >   	sq->stats.packets += packets;
> > @@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >   		return;
> >   	if (__netif_tx_trylock(txq)) {
> > -		free_old_xmit_skbs(sq);
> > +		free_old_xmit_skbs(sq, txq, true);
> >   		__netif_tx_unlock(txq);
> >   	}
> > @@ -1440,7 +1444,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >   	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> >   	__netif_tx_lock(txq, raw_smp_processor_id());
> > -	free_old_xmit_skbs(sq);
> > +	free_old_xmit_skbs(sq, txq, true);
> >   	__netif_tx_unlock(txq);
> >   	virtqueue_napi_complete(napi, sq->vq, 0);
> > @@ -1505,13 +1509,15 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   	struct send_queue *sq = &vi->sq[qnum];
> >   	int err;
> >   	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> > -	bool kick = !skb->xmit_more;
> > +	bool more = skb->xmit_more;
> >   	bool use_napi = sq->napi.weight;
> > +	unsigned int bytes = skb->len;
> > +	bool kick;
> >   	/* Free up any pending old buffers before queueing new ones. */
> > -	free_old_xmit_skbs(sq);
> > +	free_old_xmit_skbs(sq, txq, use_napi);
> > -	if (use_napi && kick)
> > +	if (use_napi && !more)
> >   		virtqueue_enable_cb_delayed(sq->vq);
> >   	/* timestamp packet in software */
> > @@ -1552,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   		if (!use_napi &&
> >   		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> >   			/* More just got used, free them then recheck. */
> > -			free_old_xmit_skbs(sq);
> > +			free_old_xmit_skbs(sq, txq, false);
> >   			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> >   				netif_start_subqueue(dev, qnum);
> >   				virtqueue_disable_cb(sq->vq);
> > @@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   		}
> >   	}
> > -	if (kick || netif_xmit_stopped(txq)) {
> > +	if (use_napi)
> > +		kick = __netdev_tx_sent_queue(txq, bytes, more);
> > +	else
> > +		kick = !more || netif_xmit_stopped(txq);
> > +
> > +	if (kick) {
> >   		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> >   			u64_stats_update_begin(&sq->stats.syncp);
> >   			sq->stats.kicks++;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-06  8:17   ` Jason Wang
  2018-12-06  8:31     ` Jason Wang
  2018-12-26 15:15     ` Michael S. Tsirkin
@ 2018-12-26 15:19     ` Michael S. Tsirkin
  2018-12-27 10:00       ` Jason Wang
  2018-12-26 15:22     ` Michael S. Tsirkin
  3 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2018-12-26 15:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> 
> On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
> > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > similar to wifi.  It's worth considering whether something similar to
> > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > beneficial.
> 
> 
> I played with a similar patch several days ago. The tricky part is the mode
> switching between napi and non-napi. We should make sure that when a packet
> is sent and tracked by BQL, it is consumed by BQL as well. I did this by
> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> status. Patch attached.
> 
> But when testing with vhost-net, I didn't see a stable performance,

So how about increasing TSQ pacing shift then?

> it was
> probably because we batch the used ring updates, so tx interrupts may come
> randomly. We probably need to implement a time-bounded coalescing mechanism
> which could be configured from userspace.

I don't think it's reasonable to expect userspace to be that smart ...
Why do we need time bounded? The used ring is always updated when the
ring becomes empty.

> Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
> regression on a machine without APICv (haven't found time to test an APICv
> machine). But considering it was for correctness, I think it's acceptable?
> Then we can do optimization on top?
> 
> 
> Thanks
> 
> 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >   drivers/net/virtio_net.c | 27 +++++++++++++++++++--------
> >   1 file changed, 19 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index cecfd77c9f3c..b657bde6b94b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1325,7 +1325,8 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >   	return stats.packets;
> >   }
> > -static void free_old_xmit_skbs(struct send_queue *sq)
> > +static void free_old_xmit_skbs(struct send_queue *sq, struct netdev_queue *txq,
> > +			       bool use_napi)
> >   {
> >   	struct sk_buff *skb;
> >   	unsigned int len;
> > @@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> >   	if (!packets)
> >   		return;
> > +	if (use_napi)
> > +		netdev_tx_completed_queue(txq, packets, bytes);
> > +
> >   	u64_stats_update_begin(&sq->stats.syncp);
> >   	sq->stats.bytes += bytes;
> >   	sq->stats.packets += packets;
> > @@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >   		return;
> >   	if (__netif_tx_trylock(txq)) {
> > -		free_old_xmit_skbs(sq);
> > +		free_old_xmit_skbs(sq, txq, true);
> >   		__netif_tx_unlock(txq);
> >   	}
> > @@ -1440,7 +1444,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >   	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> >   	__netif_tx_lock(txq, raw_smp_processor_id());
> > -	free_old_xmit_skbs(sq);
> > +	free_old_xmit_skbs(sq, txq, true);
> >   	__netif_tx_unlock(txq);
> >   	virtqueue_napi_complete(napi, sq->vq, 0);
> > @@ -1505,13 +1509,15 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   	struct send_queue *sq = &vi->sq[qnum];
> >   	int err;
> >   	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> > -	bool kick = !skb->xmit_more;
> > +	bool more = skb->xmit_more;
> >   	bool use_napi = sq->napi.weight;
> > +	unsigned int bytes = skb->len;
> > +	bool kick;
> >   	/* Free up any pending old buffers before queueing new ones. */
> > -	free_old_xmit_skbs(sq);
> > +	free_old_xmit_skbs(sq, txq, use_napi);
> > -	if (use_napi && kick)
> > +	if (use_napi && !more)
> >   		virtqueue_enable_cb_delayed(sq->vq);
> >   	/* timestamp packet in software */
> > @@ -1552,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   		if (!use_napi &&
> >   		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> >   			/* More just got used, free them then recheck. */
> > -			free_old_xmit_skbs(sq);
> > +			free_old_xmit_skbs(sq, txq, false);
> >   			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> >   				netif_start_subqueue(dev, qnum);
> >   				virtqueue_disable_cb(sq->vq);
> > @@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   		}
> >   	}
> > -	if (kick || netif_xmit_stopped(txq)) {
> > +	if (use_napi)
> > +		kick = __netdev_tx_sent_queue(txq, bytes, more);
> > +	else
> > +		kick = !more || netif_xmit_stopped(txq);
> > +
> > +	if (kick) {
> >   		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> >   			u64_stats_update_begin(&sq->stats.syncp);
> >   			sq->stats.kicks++;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-06  8:17   ` Jason Wang
                       ` (2 preceding siblings ...)
  2018-12-26 15:19     ` Michael S. Tsirkin
@ 2018-12-26 15:22     ` Michael S. Tsirkin
  2018-12-27 10:04       ` Jason Wang
  3 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2018-12-26 15:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> 
> On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
> > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > similar to wifi.  It's worth considering whether something similar to
> > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > beneficial.
> 
> 
> I played with a similar patch several days ago. The tricky part is the mode
> switching between napi and no napi. We should make sure that when a packet is
> sent and tracked by BQL, it is consumed by BQL as well.


I just went over the patch again and I don't understand this comment.
This patch only enabled BQL with tx napi.

Thus there's no mode switching.

What did I miss?


> I did it by
> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> status. Patch attached.
> 
> But when testing with vhost-net, I didn't see very stable performance; it was
> probably because we batch the used ring updates, so tx interrupts may come
> randomly. We probably need to implement a time-bounded coalescing mechanism
> which could be configured from userspace.
> 
> Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
> regression on a machine without APICv (haven't found time to test an APICv
> machine). But considering it was for correctness, I think it's acceptable? Then
> we can do optimization on top?
> 
> 
> Thanks
> 
> 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >   drivers/net/virtio_net.c | 27 +++++++++++++++++++--------
> >   1 file changed, 19 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index cecfd77c9f3c..b657bde6b94b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1325,7 +1325,8 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >   	return stats.packets;
> >   }
> > -static void free_old_xmit_skbs(struct send_queue *sq)
> > +static void free_old_xmit_skbs(struct send_queue *sq, struct netdev_queue *txq,
> > +			       bool use_napi)
> >   {
> >   	struct sk_buff *skb;
> >   	unsigned int len;
> > @@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> >   	if (!packets)
> >   		return;
> > +	if (use_napi)
> > +		netdev_tx_completed_queue(txq, packets, bytes);
> > +
> >   	u64_stats_update_begin(&sq->stats.syncp);
> >   	sq->stats.bytes += bytes;
> >   	sq->stats.packets += packets;
> > @@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >   		return;
> >   	if (__netif_tx_trylock(txq)) {
> > -		free_old_xmit_skbs(sq);
> > +		free_old_xmit_skbs(sq, txq, true);
> >   		__netif_tx_unlock(txq);
> >   	}
> > @@ -1440,7 +1444,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >   	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> >   	__netif_tx_lock(txq, raw_smp_processor_id());
> > -	free_old_xmit_skbs(sq);
> > +	free_old_xmit_skbs(sq, txq, true);
> >   	__netif_tx_unlock(txq);
> >   	virtqueue_napi_complete(napi, sq->vq, 0);
> > @@ -1505,13 +1509,15 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   	struct send_queue *sq = &vi->sq[qnum];
> >   	int err;
> >   	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> > -	bool kick = !skb->xmit_more;
> > +	bool more = skb->xmit_more;
> >   	bool use_napi = sq->napi.weight;
> > +	unsigned int bytes = skb->len;
> > +	bool kick;
> >   	/* Free up any pending old buffers before queueing new ones. */
> > -	free_old_xmit_skbs(sq);
> > +	free_old_xmit_skbs(sq, txq, use_napi);
> > -	if (use_napi && kick)
> > +	if (use_napi && !more)
> >   		virtqueue_enable_cb_delayed(sq->vq);
> >   	/* timestamp packet in software */
> > @@ -1552,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   		if (!use_napi &&
> >   		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> >   			/* More just got used, free them then recheck. */
> > -			free_old_xmit_skbs(sq);
> > +			free_old_xmit_skbs(sq, txq, false);
> >   			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> >   				netif_start_subqueue(dev, qnum);
> >   				virtqueue_disable_cb(sq->vq);
> > @@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   		}
> >   	}
> > -	if (kick || netif_xmit_stopped(txq)) {
> > +	if (use_napi)
> > +		kick = __netdev_tx_sent_queue(txq, bytes, more);
> > +	else
> > +		kick = !more || netif_xmit_stopped(txq);
> > +
> > +	if (kick) {
> >   		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> >   			u64_stats_update_begin(&sq->stats.syncp);
> >   			sq->stats.kicks++;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-26 15:15     ` Michael S. Tsirkin
@ 2018-12-27  9:56       ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2018-12-27  9:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2018/12/26 11:15 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>> On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>> similar to wifi.  It's worth considering whether something similar to
>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>> beneficial.
>>
>> I played with a similar patch several days ago. The tricky part is the mode
>> switching between napi and no napi. We should make sure that when a packet is
>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>> status. Patch attached.
>>
>> But when testing with vhost-net, I didn't see very stable performance; it was
>> probably because we batch the used ring updates, so tx interrupts may come
>> randomly. We probably need to implement a time-bounded coalescing mechanism
>> which could be configured from userspace.
>>
>> Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
>> regression on a machine without APICv (haven't found time to test an APICv
>> machine). But considering it was for correctness, I think it's acceptable? Then
>> we can do optimization on top?
>>
>>
>> Thanks
> I don't see how it's for correctness to be frank.


Socket accounting is wrong in this case. This is in fact a bug.


> What if we just do the bulk free? Does that fix the regression?


I can test it.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-26 15:19     ` Michael S. Tsirkin
@ 2018-12-27 10:00       ` Jason Wang
  2018-12-30 18:45         ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2018-12-27 10:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2018/12/26 11:19 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>> On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>> similar to wifi.  It's worth considering whether something similar to
>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>> beneficial.
>>
>> I played with a similar patch several days ago. The tricky part is the mode
>> switching between napi and no napi. We should make sure that when a packet is
>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>> status. Patch attached.
>>
>> But when testing with vhost-net, I didn't see very stable performance,
> So how about increasing TSQ pacing shift then?


I can test this. But changing the default TCP value is much more than a 
virtio-net specific thing.


>
>> it was
>> probably because we batch the used ring updates, so tx interrupts may come
>> randomly. We probably need to implement a time-bounded coalescing mechanism
>> which could be configured from userspace.
> I don't think it's reasonable to expect userspace to be that smart ...
> Why do we need time bounded? used ring is always updated when ring
> becomes empty.


We don't add used buffers immediately, which means BQL may not see the 
consumed packets in time. And the delay varies with the workload, since we 
count packets, not bytes or time, before doing the batched update.

Thanks


>
>> Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
>> regression on a machine without APICv (haven't found time to test an APICv
>> machine). But considering it was for correctness, I think it's acceptable? Then
>> we can do optimization on top?
>>
>>
>> Thanks

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-26 15:22     ` Michael S. Tsirkin
@ 2018-12-27 10:04       ` Jason Wang
  2018-12-30 18:48         ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2018-12-27 10:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2018/12/26 11:22 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>> On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>> similar to wifi.  It's worth considering whether something similar to
>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>> beneficial.
>>
>> I played with a similar patch several days ago. The tricky part is the mode
>> switching between napi and no napi. We should make sure that when a packet is
>> sent and tracked by BQL, it is consumed by BQL as well.
>
> I just went over the patch again and I don't understand this comment.
> This patch only enabled BQL with tx napi.
>
> Thus there's no mode switching.
>
> What did I miss?


Consider the case:


TX NAPI is disabled:

send N packets

turn TX NAPI on:

get tx interrupt

BQL tries to consume those packets, which leads to a WARN in dql.


Thanks



>
>
>> I did it by
>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>> status. Patch attached.
>>
>> But when testing with vhost-net, I didn't see very stable performance; it was
>> probably because we batch the used ring updates, so tx interrupts may come
>> randomly. We probably need to implement a time-bounded coalescing mechanism
>> which could be configured from userspace.
>>
>> Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
>> regression on a machine without APICv (haven't found time to test an APICv
>> machine). But considering it was for correctness, I think it's acceptable? Then
>> we can do optimization on top?
>>
>>
>> Thanks

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-27 10:00       ` Jason Wang
@ 2018-12-30 18:45         ` Michael S. Tsirkin
  2019-01-02  3:28           ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2018-12-30 18:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
> 
> On 2018/12/26 11:19 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
> > > > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > > > similar to wifi.  It's worth considering whether something similar to
> > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > beneficial.
> > > 
> > > I played with a similar patch several days ago. The tricky part is the mode
> > > switching between napi and no napi. We should make sure that when a packet is
> > > sent and tracked by BQL, it is consumed by BQL as well. I did it by
> > > tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> > > status. Patch attached.
> > > 
> > > But when testing with vhost-net, I didn't see very stable performance,
> > So how about increasing TSQ pacing shift then?
> 
> 
> I can test this. But changing the default TCP value is much more than a
> virtio-net specific thing.

Well same logic as wifi applies. Unpredictable latencies related
to radio in one case, to host scheduler in the other.

> 
> > 
> > > it was
> > > probably because we batch the used ring updates, so tx interrupts may come
> > > randomly. We probably need to implement a time-bounded coalescing mechanism
> > > which could be configured from userspace.
> > I don't think it's reasonable to expect userspace to be that smart ...
> > Why do we need time bounded? used ring is always updated when ring
> > becomes empty.
> 
> 
> We don't add used buffers immediately, which means BQL may not see the
> consumed packets in time. And the delay varies with the workload, since we
> count packets, not bytes or time, before doing the batched update.
> 
> Thanks

Sorry, I still don't get it.
When nothing is outstanding, we do update the used ring.
So if BQL stops userspace from sending packets, then
we get an interrupt and packets start flowing again.

It might be suboptimal and we might need to tune it, but I doubt running
timers is a solution: timer interrupts cause VM exits.

> 
> > 
> > > Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
> > > regression on a machine without APICv (haven't found time to test an APICv
> > > machine). But considering it was for correctness, I think it's acceptable? Then
> > > we can do optimization on top?
> > > 
> > > 
> > > Thanks

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-27 10:04       ` Jason Wang
@ 2018-12-30 18:48         ` Michael S. Tsirkin
  2019-01-02  3:30           ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2018-12-30 18:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Thu, Dec 27, 2018 at 06:04:53PM +0800, Jason Wang wrote:
> 
> On 2018/12/26 11:22 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
> > > > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > > > similar to wifi.  It's worth considering whether something similar to
> > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > beneficial.
> > > 
> > > I played with a similar patch several days ago. The tricky part is the mode
> > > switching between napi and no napi. We should make sure that when a packet is
> > > sent and tracked by BQL, it is consumed by BQL as well.
> > 
> > I just went over the patch again and I don't understand this comment.
> > This patch only enabled BQL with tx napi.
> > 
> > Thus there's no mode switching.
> > 
> > What did I miss?
> 
> 
> Consider the case:
> 
> 
> TX NAPI is disabled:
> 
> send N packets
> 
> turn TX NAPI on:
> 
> get tx interrupt
> 
> BQL tries to consume those packets, which leads to a WARN in dql.
> 
> 
> Thanks

Can one really switch tx napi on and off? How?
While root can change the napi_tx module parameter, I don't think
that has any effect outside device probe time. What did I miss?



> 
> 
> > 
> > 
> > > I did it by
> > > tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> > > status. Patch attached.
> > > 
> > > But when testing with vhost-net, I didn't see very stable performance; it was
> > > probably because we batch the used ring updates, so tx interrupts may come
> > > randomly. We probably need to implement a time-bounded coalescing mechanism
> > > which could be configured from userspace.
> > > 
> > > Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
> > > regression on a machine without APICv (haven't found time to test an APICv
> > > machine). But considering it was for correctness, I think it's acceptable? Then
> > > we can do optimization on top?
> > > 
> > > 
> > > Thanks
> > > > @@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > >    		}
> > > >    	}
> > > > -	if (kick || netif_xmit_stopped(txq)) {
> > > > +	if (use_napi)
> > > > +		kick = __netdev_tx_sent_queue(txq, bytes, more);
> > > > +	else
> > > > +		kick = !more || netif_xmit_stopped(txq);
> > > > +
> > > > +	if (kick) {
> > > >    		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> > > >    			u64_stats_update_begin(&sq->stats.syncp);
> > > >    			sq->stats.kicks++;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-30 18:45         ` Michael S. Tsirkin
@ 2019-01-02  3:28           ` Jason Wang
  2019-01-02 13:59             ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2019-01-02  3:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
> On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
>> On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>> On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
>>>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>>>> similar to wifi.  It's worth considering whether something similar to
>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>> beneficial.
>>>> I played with a similar patch several days ago. The tricky part is the mode
>>>> switching between napi and no napi. We should make sure that when a packet is
>>>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>>>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>>>> status. Patch attached.
>>>>
>>>> But when testing with vhost-net, I don't see stable performance,
>>> So how about increasing TSQ pacing shift then?
>>
>> I can test this. But changing a default TCP value is much more than a
>> virtio-net-specific thing.
> Well same logic as wifi applies. Unpredictable latencies related
> to radio in one case, to host scheduler in the other.
>
>>>> it was
>>>> probably because we batch the used ring updates, so tx interrupts may come
>>>> randomly. We probably need to implement a time-bounded coalescing mechanism
>>>> which could be configured from userspace.
>>> I don't think it's reasonable to expect userspace to be that smart ...
>>> Why does it need to be time-bounded? The used ring is always updated when
>>> the ring becomes empty.
>>
>> We don't add used immediately, which means BQL may not see the consumed
>> packets in time. And the delay varies with the workload, since we count
>> packets rather than bytes or time before doing the batched update.
>>
>> Thanks
> Sorry I still don't get it.
> When nothing is outstanding then we do update the used.
> So if BQL stops userspace from sending packets then
> we get an interrupt and packets start flowing again.


Yes, but what about the case of multiple flows? That's where I see
unstable results.


>
> It might be suboptimal, we might need to tune it but I doubt running
> timers is a solution, timer interrupts cause VM exits.


Probably not a timer, but a time counter (or even a byte counter) in vhost
to add used and signal the guest once it exceeds a value, instead of
waiting for a number of packets.


Thanks


>
>>>> Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
>>>> regression on a machine without APICv (haven't found time to test an APICv
>>>> machine). But considering it was for correctness, I think it's acceptable? Then
>>>> we can do optimization on top?
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> ---
>>>>>     drivers/net/virtio_net.c | 27 +++++++++++++++++++--------
>>>>>     1 file changed, 19 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index cecfd77c9f3c..b657bde6b94b 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -1325,7 +1325,8 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>>>>>     	return stats.packets;
>>>>>     }
>>>>> -static void free_old_xmit_skbs(struct send_queue *sq)
>>>>> +static void free_old_xmit_skbs(struct send_queue *sq, struct netdev_queue *txq,
>>>>> +			       bool use_napi)
>>>>>     {
>>>>>     	struct sk_buff *skb;
>>>>>     	unsigned int len;
>>>>> @@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct send_queue *sq)
>>>>>     	if (!packets)
>>>>>     		return;
>>>>> +	if (use_napi)
>>>>> +		netdev_tx_completed_queue(txq, packets, bytes);
>>>>> +
>>>>>     	u64_stats_update_begin(&sq->stats.syncp);
>>>>>     	sq->stats.bytes += bytes;
>>>>>     	sq->stats.packets += packets;
>>>>> @@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>>>>>     		return;
>>>>>     	if (__netif_tx_trylock(txq)) {
>>>>> -		free_old_xmit_skbs(sq);
>>>>> +		free_old_xmit_skbs(sq, txq, true);
>>>>>     		__netif_tx_unlock(txq);
>>>>>     	}
>>>>> @@ -1440,7 +1444,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>>>>>     	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
>>>>>     	__netif_tx_lock(txq, raw_smp_processor_id());
>>>>> -	free_old_xmit_skbs(sq);
>>>>> +	free_old_xmit_skbs(sq, txq, true);
>>>>>     	__netif_tx_unlock(txq);
>>>>>     	virtqueue_napi_complete(napi, sq->vq, 0);
>>>>> @@ -1505,13 +1509,15 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>     	struct send_queue *sq = &vi->sq[qnum];
>>>>>     	int err;
>>>>>     	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>>>>> -	bool kick = !skb->xmit_more;
>>>>> +	bool more = skb->xmit_more;
>>>>>     	bool use_napi = sq->napi.weight;
>>>>> +	unsigned int bytes = skb->len;
>>>>> +	bool kick;
>>>>>     	/* Free up any pending old buffers before queueing new ones. */
>>>>> -	free_old_xmit_skbs(sq);
>>>>> +	free_old_xmit_skbs(sq, txq, use_napi);
>>>>> -	if (use_napi && kick)
>>>>> +	if (use_napi && !more)
>>>>>     		virtqueue_enable_cb_delayed(sq->vq);
>>>>>     	/* timestamp packet in software */
>>>>> @@ -1552,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>     		if (!use_napi &&
>>>>>     		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>>>>>     			/* More just got used, free them then recheck. */
>>>>> -			free_old_xmit_skbs(sq);
>>>>> +			free_old_xmit_skbs(sq, txq, false);
>>>>>     			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>>>>>     				netif_start_subqueue(dev, qnum);
>>>>>     				virtqueue_disable_cb(sq->vq);
>>>>> @@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>     		}
>>>>>     	}
>>>>> -	if (kick || netif_xmit_stopped(txq)) {
>>>>> +	if (use_napi)
>>>>> +		kick = __netdev_tx_sent_queue(txq, bytes, more);
>>>>> +	else
>>>>> +		kick = !more || netif_xmit_stopped(txq);
>>>>> +
>>>>> +	if (kick) {
>>>>>     		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
>>>>>     			u64_stats_update_begin(&sq->stats.syncp);
>>>>>     			sq->stats.kicks++;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2018-12-30 18:48         ` Michael S. Tsirkin
@ 2019-01-02  3:30           ` Jason Wang
  2019-01-02 13:54             ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2019-01-02  3:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2018/12/31 上午2:48, Michael S. Tsirkin wrote:
> On Thu, Dec 27, 2018 at 06:04:53PM +0800, Jason Wang wrote:
>> On 2018/12/26 下午11:22, Michael S. Tsirkin wrote:
>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>> On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
>>>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>>>> similar to wifi.  It's worth considering whether something similar to
>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>> beneficial.
>>>> I played with a similar patch several days ago. The tricky part is the mode
>>>> switching between napi and no napi. We should make sure that when a packet is
>>>> sent and tracked by BQL, it is consumed by BQL as well.
>>> I just went over the patch again and I don't understand this comment.
>>> This patch only enabled BQL with tx napi.
>>>
>>> Thus there's no mode switching.
>>>
>>> What did I miss?
>> Consider the case:
>>
>>
>> TX NAPI is disabled:
>>
>> send N packets
>>
>> turn TX NAPI on:
>>
>> get tx interrupt
>>
>> BQL tries to consume those packets, which leads to a WARN in dql.
>>
>>
>> Thanks
> Can one really switch tx napi on and off? How?
> While root can change the napi_tx module parameter, I don't think
> that has any effect outside device probe time. What did I miss?
>
>
>

We recently added support for switching the mode through ethtool. See

commit 0c465be183c7c57a26446df6ea96d8676b865f92
Author: Jason Wang <jasowang@redhat.com>
Date:   Tue Oct 9 10:06:26 2018 +0800

     virtio_net: ethtool tx napi configuration

     Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
     Interrupt moderation is currently not supported, so these accept and
     display the default settings of 0 usec and 1 frame.

     Toggle tx napi through setting tx-frames. So as to not interfere
     with possible future interrupt moderation, value 1 means tx napi while
     value 0 means not.

     Only allow the switching when device is down for simplicity.

     Link: https://patchwork.ozlabs.org/patch/948149/
     Suggested-by: Jason Wang <jasowang@redhat.com>
     Signed-off-by: Willem de Bruijn <willemb@google.com>
     Signed-off-by: Jason Wang <jasowang@redhat.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>

Thanks
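For reference, the resulting knob looks roughly like this from userspace (a hedged usage sketch, not taken from the thread; "eth0" is a placeholder interface, and per the commit message the switch is only allowed while the device is down):

```shell
# Toggle tx napi via the coalescing interface added in 0c465be183c7
# ("virtio_net: ethtool tx napi configuration").
ip link set dev eth0 down
ethtool -C eth0 tx-frames 1    # tx-frames 1: tx napi on
ethtool -C eth0 tx-frames 0    # tx-frames 0: tx napi off
ip link set dev eth0 up
ethtool -c eth0                # show the current coalescing settings
```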


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-02  3:30           ` Jason Wang
@ 2019-01-02 13:54             ` Michael S. Tsirkin
  2019-01-17 13:09               ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2019-01-02 13:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Wed, Jan 02, 2019 at 11:30:11AM +0800, Jason Wang wrote:
> 
> On 2018/12/31 上午2:48, Michael S. Tsirkin wrote:
> > On Thu, Dec 27, 2018 at 06:04:53PM +0800, Jason Wang wrote:
> > > On 2018/12/26 下午11:22, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > > > On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
> > > > > > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > > > > > similar to wifi.  It's worth considering whether something similar to
> > > > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > > > beneficial.
> > > > > I played with a similar patch several days ago. The tricky part is the mode
> > > > > switching between napi and no napi. We should make sure that when a packet is
> > > > > sent and tracked by BQL, it is consumed by BQL as well.
> > > > I just went over the patch again and I don't understand this comment.
> > > > This patch only enabled BQL with tx napi.
> > > > 
> > > > Thus there's no mode switching.
> > > > 
> > > > What did I miss?
> > > Consider the case:
> > > 
> > > 
> > > TX NAPI is disabled:
> > > 
> > > send N packets
> > > 
> > > turn TX NAPI on:
> > > 
> > > get tx interrupt
> > > 
> > > BQL tries to consume those packets, which leads to a WARN in dql.
> > > 
> > > 
> > > Thanks
> > Can one really switch tx napi on and off? How?
> > While root can change the napi_tx module parameter, I don't think
> > that has any effect outside device probe time. What did I miss?
> > 
> > 
> > 
> 
> We recently added support for switching the mode through ethtool. See
> 
> commit 0c465be183c7c57a26446df6ea96d8676b865f92
> Author: Jason Wang <jasowang@redhat.com>
> Date:   Tue Oct 9 10:06:26 2018 +0800
> 
>     virtio_net: ethtool tx napi configuration
> 
>     Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
>     Interrupt moderation is currently not supported, so these accept and
>     display the default settings of 0 usec and 1 frame.
> 
>     Toggle tx napi through setting tx-frames. So as to not interfere
>     with possible future interrupt moderation, value 1 means tx napi while
>     value 0 means not.
> 
>     Only allow the switching when device is down for simplicity.
> 
>     Link: https://patchwork.ozlabs.org/patch/948149/
>     Suggested-by: Jason Wang <jasowang@redhat.com>
>     Signed-off-by: Willem de Bruijn <willemb@google.com>
>     Signed-off-by: Jason Wang <jasowang@redhat.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Thanks


It's disabled when the device is up - isn't that enough?

-- 
MST

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-02  3:28           ` Jason Wang
@ 2019-01-02 13:59             ` Michael S. Tsirkin
  2019-01-07  2:14               ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2019-01-02 13:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
> 
> On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
> > On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
> > > On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > > > On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
> > > > > > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > > > > > similar to wifi.  It's worth considering whether something similar to
> > > > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > > > beneficial.
> > > > > I played with a similar patch several days ago. The tricky part is the mode
> > > > > switching between napi and no napi. We should make sure that when a packet is
> > > > > sent and tracked by BQL, it is consumed by BQL as well. I did it by
> > > > > tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> > > > > status. Patch attached.
> > > > > 
> > > > > But when testing with vhost-net, I don't see stable performance,
> > > > So how about increasing TSQ pacing shift then?
> > > 
> > > I can test this. But changing a default TCP value is much more than a
> > > virtio-net-specific thing.
> > Well same logic as wifi applies. Unpredictable latencies related
> > to radio in one case, to host scheduler in the other.
> > 
> > > > > it was
> > > > > probably because we batch the used ring updates, so tx interrupts may come
> > > > > randomly. We probably need to implement a time-bounded coalescing mechanism
> > > > > which could be configured from userspace.
> > > > I don't think it's reasonable to expect userspace to be that smart ...
> > > > Why does it need to be time-bounded? The used ring is always updated when
> > > > the ring becomes empty.
> > > 
> > > We don't add used immediately, which means BQL may not see the consumed
> > > packets in time. And the delay varies with the workload, since we count
> > > packets rather than bytes or time before doing the batched update.
> > > 
> > > Thanks
> > Sorry I still don't get it.
> > When nothing is outstanding then we do update the used.
> > So if BQL stops userspace from sending packets then
> > we get an interrupt and packets start flowing again.
> 
> 
> > Yes, but what about the case of multiple flows? That's where I see unstable
> > results.
> 
> 
> > 
> > It might be suboptimal, we might need to tune it but I doubt running
> > timers is a solution, timer interrupts cause VM exits.
> 
> 
> Probably not a timer, but a time counter (or even a byte counter) in vhost to
> add used and signal the guest once it exceeds a value, instead of waiting for
> a number of packets.
> 
> 
> Thanks

Well we already have VHOST_NET_WEIGHT - is it too big then?

And maybe we should expose the "MORE" flag in the descriptor -
do you think that will help?



> 
> > 
> > > > > Btw, maybe it's time to just enable napi TX by default. I get a ~10% TCP_RR
> > > > > regression on a machine without APICv (haven't found time to test an APICv
> > > > > machine). But considering it was for correctness, I think it's acceptable? Then
> > > > > we can do optimization on top?
> > > > > 
> > > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > ---
> > > > > >     drivers/net/virtio_net.c | 27 +++++++++++++++++++--------
> > > > > >     1 file changed, 19 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index cecfd77c9f3c..b657bde6b94b 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -1325,7 +1325,8 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > >     	return stats.packets;
> > > > > >     }
> > > > > > -static void free_old_xmit_skbs(struct send_queue *sq)
> > > > > > +static void free_old_xmit_skbs(struct send_queue *sq, struct netdev_queue *txq,
> > > > > > +			       bool use_napi)
> > > > > >     {
> > > > > >     	struct sk_buff *skb;
> > > > > >     	unsigned int len;
> > > > > > @@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > > > > >     	if (!packets)
> > > > > >     		return;
> > > > > > +	if (use_napi)
> > > > > > +		netdev_tx_completed_queue(txq, packets, bytes);
> > > > > > +
> > > > > >     	u64_stats_update_begin(&sq->stats.syncp);
> > > > > >     	sq->stats.bytes += bytes;
> > > > > >     	sq->stats.packets += packets;
> > > > > > @@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > > > > >     		return;
> > > > > >     	if (__netif_tx_trylock(txq)) {
> > > > > > -		free_old_xmit_skbs(sq);
> > > > > > +		free_old_xmit_skbs(sq, txq, true);
> > > > > >     		__netif_tx_unlock(txq);
> > > > > >     	}
> > > > > > @@ -1440,7 +1444,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> > > > > >     	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> > > > > >     	__netif_tx_lock(txq, raw_smp_processor_id());
> > > > > > -	free_old_xmit_skbs(sq);
> > > > > > +	free_old_xmit_skbs(sq, txq, true);
> > > > > >     	__netif_tx_unlock(txq);
> > > > > >     	virtqueue_napi_complete(napi, sq->vq, 0);
> > > > > > @@ -1505,13 +1509,15 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >     	struct send_queue *sq = &vi->sq[qnum];
> > > > > >     	int err;
> > > > > >     	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> > > > > > -	bool kick = !skb->xmit_more;
> > > > > > +	bool more = skb->xmit_more;
> > > > > >     	bool use_napi = sq->napi.weight;
> > > > > > +	unsigned int bytes = skb->len;
> > > > > > +	bool kick;
> > > > > >     	/* Free up any pending old buffers before queueing new ones. */
> > > > > > -	free_old_xmit_skbs(sq);
> > > > > > +	free_old_xmit_skbs(sq, txq, use_napi);
> > > > > > -	if (use_napi && kick)
> > > > > > +	if (use_napi && !more)
> > > > > >     		virtqueue_enable_cb_delayed(sq->vq);
> > > > > >     	/* timestamp packet in software */
> > > > > > @@ -1552,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >     		if (!use_napi &&
> > > > > >     		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> > > > > >     			/* More just got used, free them then recheck. */
> > > > > > -			free_old_xmit_skbs(sq);
> > > > > > +			free_old_xmit_skbs(sq, txq, false);
> > > > > >     			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> > > > > >     				netif_start_subqueue(dev, qnum);
> > > > > >     				virtqueue_disable_cb(sq->vq);
> > > > > > @@ -1560,7 +1566,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >     		}
> > > > > >     	}
> > > > > > -	if (kick || netif_xmit_stopped(txq)) {
> > > > > > +	if (use_napi)
> > > > > > +		kick = __netdev_tx_sent_queue(txq, bytes, more);
> > > > > > +	else
> > > > > > +		kick = !more || netif_xmit_stopped(txq);
> > > > > > +
> > > > > > +	if (kick) {
> > > > > >     		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> > > > > >     			u64_stats_update_begin(&sq->stats.syncp);
> > > > > >     			sq->stats.kicks++;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-02 13:59             ` Michael S. Tsirkin
@ 2019-01-07  2:14               ` Jason Wang
  2019-01-07  3:17                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2019-01-07  2:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2019/1/2 下午9:59, Michael S. Tsirkin wrote:
> On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
>> On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
>>> On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
>>>> On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>>>> On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
>>>>>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>>>>>> similar to wifi.  It's worth considering whether something similar to
>>>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>>>> beneficial.
>>>>>> I played with a similar patch several days ago. The tricky part is the mode
>>>>>> switching between napi and no napi. We should make sure that when a packet is
>>>>>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>>>>>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>>>>>> status. Patch attached.
>>>>>>
>>>>>> But when testing with vhost-net, I don't see stable performance,
>>>>> So how about increasing TSQ pacing shift then?
>>>> I can test this. But changing a default TCP value is much more than a
>>>> virtio-net-specific thing.
>>> Well same logic as wifi applies. Unpredictable latencies related
>>> to radio in one case, to host scheduler in the other.
>>>
>>>>>> it was
>>>>>> probably because we batch the used ring updates, so tx interrupts may come
>>>>>> randomly. We probably need to implement a time-bounded coalescing mechanism
>>>>>> which could be configured from userspace.
>>>>> I don't think it's reasonable to expect userspace to be that smart ...
>>>>> Why does it need to be time-bounded? The used ring is always updated when
>>>>> the ring becomes empty.
>>>> We don't add used immediately, which means BQL may not see the consumed
>>>> packets in time. And the delay varies with the workload, since we count
>>>> packets rather than bytes or time before doing the batched update.
>>>>
>>>> Thanks
>>> Sorry I still don't get it.
>>> When nothing is outstanding then we do update the used.
>>> So if BQL stops userspace from sending packets then
>>> we get an interrupt and packets start flowing again.
>> Yes, but what about the case of multiple flows? That's where I see unstable
>> results.
>>
>>
>>> It might be suboptimal, we might need to tune it but I doubt running
>>> timers is a solution, timer interrupts cause VM exits.
>> Probably not a timer, but a time counter (or even a byte counter) in vhost to
>> add used and signal the guest once it exceeds a value, instead of waiting for
>> a number of packets.
>>
>>
>> Thanks
> Well we already have VHOST_NET_WEIGHT - is it too big then?


I'm not sure, it might be too big.


>
> And maybe we should expose the "MORE" flag in the descriptor -
> do you think that will help?
>

I don't know. But how can a "more" flag help here?

Thanks


>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-07  2:14               ` Jason Wang
@ 2019-01-07  3:17                 ` Michael S. Tsirkin
  2019-01-07  3:51                   ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2019-01-07  3:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
> 
> On 2019/1/2 下午9:59, Michael S. Tsirkin wrote:
> > On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
> > > On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
> > > > > On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > > > > > On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
> > > > > > > > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > > > > > > > similar to wifi.  It's worth considering whether something similar to
> > > > > > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > > > > > beneficial.
> > > > > > > I played with a similar patch several days ago. The tricky part is the mode
> > > > > > > switching between napi and no napi. We should make sure that when a packet is
> > > > > > > sent and tracked by BQL, it is consumed by BQL as well. I did it by
> > > > > > > tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> > > > > > > status. Patch attached.
> > > > > > > 
> > > > > > > But when testing with vhost-net, I don't see stable performance,
> > > > > > So how about increasing TSQ pacing shift then?
> > > > > I can test this. But changing a default TCP value is much more than a
> > > > > virtio-net-specific thing.
> > > > Well same logic as wifi applies. Unpredictable latencies related
> > > > to radio in one case, to host scheduler in the other.
> > > > 
> > > > > > > it was
> > > > > > > probably because we batch the used ring updates, so tx interrupts may come
> > > > > > > randomly. We probably need to implement a time-bounded coalescing mechanism
> > > > > > > which could be configured from userspace.
> > > > > > I don't think it's reasonable to expect userspace to be that smart ...
> > > > > > Why does it need to be time-bounded? The used ring is always updated when
> > > > > > the ring becomes empty.
> > > > > We don't add used immediately, which means BQL may not see the consumed
> > > > > packets in time. And the delay varies with the workload, since we count
> > > > > packets rather than bytes or time before doing the batched update.
> > > > > 
> > > > > Thanks
> > > > Sorry I still don't get it.
> > > > When nothing is outstanding then we do update the used.
> > > > So if BQL stops userspace from sending packets then
> > > > we get an interrupt and packets start flowing again.
> > > Yes, but what about the case of multiple flows? That's where I see unstable
> > > results.
> > > 
> > > 
> > > > It might be suboptimal, we might need to tune it but I doubt running
> > > > timers is a solution, timer interrupts cause VM exits.
> > > Probably not a timer, but a time counter (or even a byte counter) in vhost to
> > > add used and signal the guest once it exceeds a value, instead of waiting for
> > > a number of packets.
> > > 
> > > 
> > > Thanks
> > Well we already have VHOST_NET_WEIGHT - is it too big then?
> 
> 
> I'm not sure, it might be too big.
> 
> 
> > 
> > And maybe we should expose the "MORE" flag in the descriptor -
> > do you think that will help?
> > 
> 
> I don't know. But how can a "more" flag help here?
> 
> Thanks

It sounds like we should be a bit more aggressive in updating the used ring.
But if we just do it naively we will harm performance for sure, as that
is how we are doing batching right now. Instead we could let the guest
control batching using the more flag - if it's not set, we write out
the used ring.

> 
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-07  3:17                 ` Michael S. Tsirkin
@ 2019-01-07  3:51                   ` Jason Wang
  2019-01-07  4:01                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2019-01-07  3:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2019/1/7 上午11:17, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
>> On 2019/1/2 下午9:59, Michael S. Tsirkin wrote:
>>> On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
>>>> On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
>>>>>> On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
>>>>>>>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>>>>>>>> similar to wifi.  It's worth considering whether something similar to
>>>>>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>>>>>> beneficial.
>>>>>>>> I played with a similar patch several days ago. The tricky part is the mode
>>>>>>>> switching between napi and no napi. We should make sure that when a packet is
>>>>>>>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>>>>>>>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>>>>>>>> status. Patch attached.
>>>>>>>>
>>>>>>>> But when testing with vhost-net, I don't see stable performance,
>>>>>>> So how about increasing TSQ pacing shift then?
>>>>>> I can test this. But changing default TCP value is much more than a
>>>>>> virtio-net specific thing.
>>>>> Well same logic as wifi applies. Unpredictable latencies related
>>>>> to radio in one case, to host scheduler in the other.
>>>>>
>>>>>>>> it was
>>>>>>>> probably because we batch the used ring updating so tx interrupt may come
>>>>>>>> randomly. We probably need to implement time bounded coalescing mechanism
>>>>>>>> which could be configured from userspace.
>>>>>>> I don't think it's reasonable to expect userspace to be that smart ...
>>>>>>> Why do we need time bounded? used ring is always updated when ring
>>>>>>> becomes empty.
>>>>>> We don't add used entries right away, which means BQL may not see the
>>>>>> consumed packet in time. And the delay varies based on the workload, since
>>>>>> we count packets, not bytes or time, before doing the batched update.
>>>>>>
>>>>>> Thanks
>>>>> Sorry I still don't get it.
>>>>> When nothing is outstanding then we do update the used.
>>>>> So if BQL stops userspace from sending packets then
>>>>> we get an interrupt and packets start flowing again.
>>>> Yes, but how about the cases of multiple flows. That's where I see unstable
>>>> results.
>>>>
>>>>
>>>>> It might be suboptimal, we might need to tune it but I doubt running
>>>>> timers is a solution, timer interrupts cause VM exits.
>>>> Probably not a timer but a time counter (or event byte counter) in vhost to
>>>> add used and signal guest if it exceeds a value instead of waiting the
>>>> number of packets.
>>>>
>>>>
>>>> Thanks
>>> Well we already have VHOST_NET_WEIGHT - is it too big then?
>>
>> I'm not sure, it might be too big.
>>
>>
>>> And maybe we should expose the "MORE" flag in the descriptor -
>>> do you think that will help?
>>>
>> I don't know. But how a "more" flag can help here?
>>
>> Thanks
> It sounds like we should be a bit more aggressive in updating used ring.
> But if we just do it naively we will harm performance for sure as that
> is how we are doing batching right now.


I agree, but the problem is balancing PPS and throughput. More 
batching helps PPS but may damage TCP throughput.


>   Instead we could make guest
> control batching using the more flag - if that's not set we write out
> the used ring.


It's under the control of the guest, so I'm afraid we still need some more 
guards (e.g. time/byte counters) on the host.

Thanks


>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-07  3:51                   ` Jason Wang
@ 2019-01-07  4:01                     ` Michael S. Tsirkin
  2019-01-07  6:31                       ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2019-01-07  4:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Mon, Jan 07, 2019 at 11:51:55AM +0800, Jason Wang wrote:
> 
> On 2019/1/7 上午11:17, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
> > > On 2019/1/2 下午9:59, Michael S. Tsirkin wrote:
> > > > On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
> > > > > On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
> > > > > > > On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
> > > > > > > > > > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > > > > > > > > > similar to wifi.  It's worth considering whether something similar to
> > > > > > > > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > > > > > > > beneficial.
> > > > > > > > > I've played with a similar patch several days ago. The tricky part is the mode
> > > > > > > > > switching between napi and no napi. We should make sure that when a packet is
> > > > > > > > > sent and tracked by BQL, it is consumed by BQL as well. I did it by
> > > > > > > > > tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> > > > > > > > > status. Patch attached.
> > > > > > > > > 
> > > > > > > > > But when testing with vhost-net, I didn't see a very stable performance,
> > > > > > > > So how about increasing TSQ pacing shift then?
> > > > > > > I can test this. But changing default TCP value is much more than a
> > > > > > > virtio-net specific thing.
> > > > > > Well same logic as wifi applies. Unpredictable latencies related
> > > > > > to radio in one case, to host scheduler in the other.
> > > > > > 
> > > > > > > > > it was
> > > > > > > > > probably because we batch the used ring updating so tx interrupt may come
> > > > > > > > > randomly. We probably need to implement time bounded coalescing mechanism
> > > > > > > > > which could be configured from userspace.
> > > > > > > > I don't think it's reasonable to expect userspace to be that smart ...
> > > > > > > > Why do we need time bounded? used ring is always updated when ring
> > > > > > > > becomes empty.
> > > > > > > We don't add used entries right away, which means BQL may not see the
> > > > > > > consumed packet in time. And the delay varies based on the workload, since
> > > > > > > we count packets, not bytes or time, before doing the batched update.
> > > > > > > 
> > > > > > > Thanks
> > > > > > Sorry I still don't get it.
> > > > > > When nothing is outstanding then we do update the used.
> > > > > > So if BQL stops userspace from sending packets then
> > > > > > we get an interrupt and packets start flowing again.
> > > > > Yes, but how about the cases of multiple flows. That's where I see unstable
> > > > > results.
> > > > > 
> > > > > 
> > > > > > It might be suboptimal, we might need to tune it but I doubt running
> > > > > > timers is a solution, timer interrupts cause VM exits.
> > > > > Probably not a timer but a time counter (or event byte counter) in vhost to
> > > > > add used and signal guest if it exceeds a value instead of waiting the
> > > > > number of packets.
> > > > > 
> > > > > 
> > > > > Thanks
> > > > Well we already have VHOST_NET_WEIGHT - is it too big then?
> > > 
> > > I'm not sure, it might be too big.
> > > 
> > > 
> > > > And maybe we should expose the "MORE" flag in the descriptor -
> > > > do you think that will help?
> > > > 
> > > I don't know. But how a "more" flag can help here?
> > > 
> > > Thanks
> > It sounds like we should be a bit more aggressive in updating used ring.
> > But if we just do it naively we will harm performance for sure as that
> > is how we are doing batching right now.
> 
> 
> I agree but the problem is to balance the PPS and throughput. More batching
> helps for PPS but may damage TCP throughput.

That is what the more flag is supposed to be for, I think - it is only set
if there's a socket that actually needs the skb freed in order to go on.

> 
> >   Instead we could make guest
> > control batching using the more flag - if that's not set we write out
> > the used ring.
> 
> 
> It's under the control of guest, so I'm afraid we still need some more guard
> (e.g time/bytes counters) on host.
> 
> Thanks

The point is that if the guest does not care about the skb being freed, then
there is no rush on the host side to mark the buffer used.


> 
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-07  4:01                     ` Michael S. Tsirkin
@ 2019-01-07  6:31                       ` Jason Wang
  2019-01-07 14:19                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2019-01-07  6:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2019/1/7 12:01 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 11:51:55AM +0800, Jason Wang wrote:
>> On 2019/1/7 上午11:17, Michael S. Tsirkin wrote:
>>> On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
>>>> On 2019/1/2 下午9:59, Michael S. Tsirkin wrote:
>>>>> On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
>>>>>> On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
>>>>>>>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>>>>>>>> On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
>>>>>>>>>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>>>>>>>>>> similar to wifi.  It's worth considering whether something similar to
>>>>>>>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>>>>>>>> beneficial.
>>>>>>>>>> I've played with a similar patch several days ago. The tricky part is the mode
>>>>>>>>>> switching between napi and no napi. We should make sure that when a packet is
>>>>>>>>>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>>>>>>>>>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>>>>>>>>>> status. Patch attached.
>>>>>>>>>>
>>>>>>>>>> But when testing with vhost-net, I didn't see a very stable performance,
>>>>>>>>> So how about increasing TSQ pacing shift then?
>>>>>>>> I can test this. But changing default TCP value is much more than a
>>>>>>>> virtio-net specific thing.
>>>>>>> Well same logic as wifi applies. Unpredictable latencies related
>>>>>>> to radio in one case, to host scheduler in the other.
>>>>>>>
>>>>>>>>>> it was
>>>>>>>>>> probably because we batch the used ring updating so tx interrupt may come
>>>>>>>>>> randomly. We probably need to implement time bounded coalescing mechanism
>>>>>>>>>> which could be configured from userspace.
>>>>>>>>> I don't think it's reasonable to expect userspace to be that smart ...
>>>>>>>>> Why do we need time bounded? used ring is always updated when ring
>>>>>>>>> becomes empty.
>>>>>>>> We don't add used entries right away, which means BQL may not see the
>>>>>>>> consumed packet in time. And the delay varies based on the workload, since
>>>>>>>> we count packets, not bytes or time, before doing the batched update.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> Sorry I still don't get it.
>>>>>>> When nothing is outstanding then we do update the used.
>>>>>>> So if BQL stops userspace from sending packets then
>>>>>>> we get an interrupt and packets start flowing again.
>>>>>> Yes, but how about the cases of multiple flows. That's where I see unstable
>>>>>> results.
>>>>>>
>>>>>>
>>>>>>> It might be suboptimal, we might need to tune it but I doubt running
>>>>>>> timers is a solution, timer interrupts cause VM exits.
>>>>>> Probably not a timer but a time counter (or event byte counter) in vhost to
>>>>>> add used and signal guest if it exceeds a value instead of waiting the
>>>>>> number of packets.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>> Well we already have VHOST_NET_WEIGHT - is it too big then?
>>>> I'm not sure, it might be too big.
>>>>
>>>>
>>>>> And maybe we should expose the "MORE" flag in the descriptor -
>>>>> do you think that will help?
>>>>>
>>>> I don't know. But how a "more" flag can help here?
>>>>
>>>> Thanks
>>> It sounds like we should be a bit more aggressive in updating used ring.
>>> But if we just do it naively we will harm performance for sure as that
>>> is how we are doing batching right now.
>>
>> I agree but the problem is to balance the PPS and throughput. More batching
>> helps for PPS but may damage TCP throughput.
> That is what more flag is supposed to be I think - it is only set if
> there's a socket that actually needs the skb freed in order to go on.


I'm not quite sure I get it, but is this something similar to what you want?

https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027667.html

That patch enables tx interrupts for TCP packets - and you want to add used 
entries more aggressively for those sockets?


Thanks


>>>    Instead we could make guest
>>> control batching using the more flag - if that's not set we write out
>>> the used ring.
>>
>> It's under the control of guest, so I'm afraid we still need some more guard
>> (e.g time/bytes counters) on host.
>>
>> Thanks
> Point is if guest does not care about the skb being freed, then there is no
> rush host side to mark buffer used.
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-07  6:31                       ` Jason Wang
@ 2019-01-07 14:19                         ` Michael S. Tsirkin
  2019-01-08 10:06                           ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2019-01-07 14:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev

On Mon, Jan 07, 2019 at 02:31:47PM +0800, Jason Wang wrote:
> 
> On 2019/1/7 下午12:01, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 11:51:55AM +0800, Jason Wang wrote:
> > > On 2019/1/7 上午11:17, Michael S. Tsirkin wrote:
> > > > On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
> > > > > On 2019/1/2 下午9:59, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
> > > > > > > On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
> > > > > > > > > > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
> > > > > > > > > > > > When use_napi is set, let's enable BQLs.  Note: some of the issues are
> > > > > > > > > > > > similar to wifi.  It's worth considering whether something similar to
> > > > > > > > > > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > > > > > > > > > beneficial.
> > > > > > > > > > > I've played with a similar patch several days ago. The tricky part is the mode
> > > > > > > > > > > switching between napi and no napi. We should make sure that when a packet is
> > > > > > > > > > > sent and tracked by BQL, it is consumed by BQL as well. I did it by
> > > > > > > > > > > tracking it through skb->cb, and dealt with the freeze by resetting the BQL
> > > > > > > > > > > status. Patch attached.
> > > > > > > > > > > 
> > > > > > > > > > > But when testing with vhost-net, I didn't see a very stable performance,
> > > > > > > > > > So how about increasing TSQ pacing shift then?
> > > > > > > > > I can test this. But changing default TCP value is much more than a
> > > > > > > > > virtio-net specific thing.
> > > > > > > > Well same logic as wifi applies. Unpredictable latencies related
> > > > > > > > to radio in one case, to host scheduler in the other.
> > > > > > > > 
> > > > > > > > > > > it was
> > > > > > > > > > > probably because we batch the used ring updating so tx interrupt may come
> > > > > > > > > > > randomly. We probably need to implement time bounded coalescing mechanism
> > > > > > > > > > > which could be configured from userspace.
> > > > > > > > > > I don't think it's reasonable to expect userspace to be that smart ...
> > > > > > > > > > Why do we need time bounded? used ring is always updated when ring
> > > > > > > > > > becomes empty.
> > > > > > > > > We don't add used entries right away, which means BQL may not see the
> > > > > > > > > consumed packet in time. And the delay varies based on the workload, since
> > > > > > > > > we count packets, not bytes or time, before doing the batched update.
> > > > > > > > > 
> > > > > > > > > Thanks
> > > > > > > > Sorry I still don't get it.
> > > > > > > > When nothing is outstanding then we do update the used.
> > > > > > > > So if BQL stops userspace from sending packets then
> > > > > > > > we get an interrupt and packets start flowing again.
> > > > > > > Yes, but how about the cases of multiple flows. That's where I see unstable
> > > > > > > results.
> > > > > > > 
> > > > > > > 
> > > > > > > > It might be suboptimal, we might need to tune it but I doubt running
> > > > > > > > timers is a solution, timer interrupts cause VM exits.
> > > > > > > Probably not a timer but a time counter (or event byte counter) in vhost to
> > > > > > > add used and signal guest if it exceeds a value instead of waiting the
> > > > > > > number of packets.
> > > > > > > 
> > > > > > > 
> > > > > > > Thanks
> > > > > > Well we already have VHOST_NET_WEIGHT - is it too big then?
> > > > > I'm not sure, it might be too big.
> > > > > 
> > > > > 
> > > > > > And maybe we should expose the "MORE" flag in the descriptor -
> > > > > > do you think that will help?
> > > > > > 
> > > > > I don't know. But how a "more" flag can help here?
> > > > > 
> > > > > Thanks
> > > > It sounds like we should be a bit more aggressive in updating used ring.
> > > > But if we just do it naively we will harm performance for sure as that
> > > > is how we are doing batching right now.
> > > 
> > > I agree but the problem is to balance the PPS and throughput. More batching
> > > helps for PPS but may damage TCP throughput.
> > That is what more flag is supposed to be I think - it is only set if
> > there's a socket that actually needs the skb freed in order to go on.
> 
> 
> I'm not quite sure I get, but is this something similar to what you want?
> 
> https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027667.html
> 
> Which enables tx interrupt for TCP packets, and you want to add used more
> aggressively for those sockets?
> 
> 
> Thanks

That's the idea.
But then you said we can just play with event index
instead. I think the answer to why not do that is that it's tricky to do
without races.


We need to think about the exact semantics: e.g. I think it is better to
keep interrupts on and then say "I promise to send more buffers even
if you do not use any buffers, so using this one is not urgent", rather
than, as your patches do, keeping them off and then saying "this one is
urgent".

The reason is that "I promise to send more" is
more informative and can allow better batching for the
host.

> 
> > > >    Instead we could make guest
> > > > control batching using the more flag - if that's not set we write out
> > > > the used ring.
> > > 
> > > It's under the control of guest, so I'm afraid we still need some more guard
> > > (e.g time/bytes counters) on host.
> > > 
> > > Thanks
> > Point is if guest does not care about the skb being freed, then there is no
> > rush host side to mark buffer used.
> > 
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-07 14:19                         ` Michael S. Tsirkin
@ 2019-01-08 10:06                           ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2019-01-08 10:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2019/1/7 10:19 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 02:31:47PM +0800, Jason Wang wrote:
>> On 2019/1/7 下午12:01, Michael S. Tsirkin wrote:
>>> On Mon, Jan 07, 2019 at 11:51:55AM +0800, Jason Wang wrote:
>>>> On 2019/1/7 上午11:17, Michael S. Tsirkin wrote:
>>>>> On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
>>>>>> On 2019/1/2 下午9:59, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
>>>>>>>>> On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
>>>>>>>>>> On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>>>>>>>>>> On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>>>>>>>>>>>> similar to wifi.  It's worth considering whether something similar to
>>>>>>>>>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>>>>>>>>>> beneficial.
>>>>>>>>>>>> I've played with a similar patch several days ago. The tricky part is the mode
>>>>>>>>>>>> switching between napi and no napi. We should make sure that when a packet is
>>>>>>>>>>>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>>>>>>>>>>>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>>>>>>>>>>>> status. Patch attached.
>>>>>>>>>>>>
>>>>>>>>>>>> But when testing with vhost-net, I didn't see a very stable performance,
>>>>>>>>>>> So how about increasing TSQ pacing shift then?
>>>>>>>>>> I can test this. But changing default TCP value is much more than a
>>>>>>>>>> virtio-net specific thing.
>>>>>>>>> Well same logic as wifi applies. Unpredictable latencies related
>>>>>>>>> to radio in one case, to host scheduler in the other.
>>>>>>>>>
>>>>>>>>>>>> it was
>>>>>>>>>>>> probably because we batch the used ring updating so tx interrupt may come
>>>>>>>>>>>> randomly. We probably need to implement time bounded coalescing mechanism
>>>>>>>>>>>> which could be configured from userspace.
>>>>>>>>>>> I don't think it's reasonable to expect userspace to be that smart ...
>>>>>>>>>>> Why do we need time bounded? used ring is always updated when ring
>>>>>>>>>>> becomes empty.
>>>>>>>>>> We don't add used entries right away, which means BQL may not see the
>>>>>>>>>> consumed packet in time. And the delay varies based on the workload, since
>>>>>>>>>> we count packets, not bytes or time, before doing the batched update.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>> Sorry I still don't get it.
>>>>>>>>> When nothing is outstanding then we do update the used.
>>>>>>>>> So if BQL stops userspace from sending packets then
>>>>>>>>> we get an interrupt and packets start flowing again.
>>>>>>>> Yes, but how about the cases of multiple flows. That's where I see unstable
>>>>>>>> results.
>>>>>>>>
>>>>>>>>
>>>>>>>>> It might be suboptimal, we might need to tune it but I doubt running
>>>>>>>>> timers is a solution, timer interrupts cause VM exits.
>>>>>>>> Probably not a timer but a time counter (or event byte counter) in vhost to
>>>>>>>> add used and signal guest if it exceeds a value instead of waiting the
>>>>>>>> number of packets.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> Well we already have VHOST_NET_WEIGHT - is it too big then?
>>>>>> I'm not sure, it might be too big.
>>>>>>
>>>>>>
>>>>>>> And maybe we should expose the "MORE" flag in the descriptor -
>>>>>>> do you think that will help?
>>>>>>>
>>>>>> I don't know. But how a "more" flag can help here?
>>>>>>
>>>>>> Thanks
>>>>> It sounds like we should be a bit more aggressive in updating used ring.
>>>>> But if we just do it naively we will harm performance for sure as that
>>>>> is how we are doing batching right now.
>>>> I agree but the problem is to balance the PPS and throughput. More batching
>>>> helps for PPS but may damage TCP throughput.
>>> That is what more flag is supposed to be I think - it is only set if
>>> there's a socket that actually needs the skb freed in order to go on.
>>
>> I'm not quite sure I get, but is this something similar to what you want?
>>
>> https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027667.html
>>
>> Which enables tx interrupt for TCP packets, and you want to add used more
>> aggressively for those sockets?
>>
>>
>> Thanks
> That's the idea.
> But then you said we can just play with event index
> instead. I think the answer to why not do that is that it's tricky to do
> without races.


We didn't do batched used ring updates at that time. We can check whether 
or not the guest is asking for an interrupt and add the used entry 
immediately. Actually, I've played with a patch to do this. It helps a 
little but damages the PPS, probably because we need more userspace 
memory accesses.


>
>
> We need to think about the exact semantics: e.g. I think it is better to
> keep interrupts on and then saying "I promise sending more buffers even
> if you do not use any buffers so using this one is not urgent" rather
> than as your patches do keeping them off and then saying "this one is
> urgent".
>
> The reason being is that "I promise to send more" is
> more informative and can allow better batching for the
> host.


Just to make sure I understand, you mean setting the batch flag for e.g. 
non-TCP sockets?

Thanks


>
>>>>>     Instead we could make guest
>>>>> control batching using the more flag - if that's not set we write out
>>>>> the used ring.
>>>> It's under the control of guest, so I'm afraid we still need some more guard
>>>> (e.g time/bytes counters) on host.
>>>>
>>>> Thanks
>>> Point is if guest does not care about the skb being freed, then there is no
>>> rush host side to mark buffer used.
>>>
>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/2] virtio-net: bql support
  2019-01-02 13:54             ` Michael S. Tsirkin
@ 2019-01-17 13:09               ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2019-01-17 13:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, maxime.coquelin, tiwei.bie, wexu, jfreimann,
	David S. Miller, virtualization, netdev


On 2019/1/2 9:54 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 02, 2019 at 11:30:11AM +0800, Jason Wang wrote:
>> On 2018/12/31 上午2:48, Michael S. Tsirkin wrote:
>>> On Thu, Dec 27, 2018 at 06:04:53PM +0800, Jason Wang wrote:
>>>> On 2018/12/26 下午11:22, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>>>> On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
>>>>>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>>>>>> similar to wifi.  It's worth considering whether something similar to
>>>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>>>> beneficial.
>>>>>> I've played with a similar patch several days ago. The tricky part is the mode
>>>>>> switching between napi and no napi. We should make sure that when a packet is
>>>>>> sent and tracked by BQL, it is consumed by BQL as well.
>>>>> I just went over the patch again and I don't understand this comment.
>>>>> This patch only enabled BQL with tx napi.
>>>>>
>>>>> Thus there's no mode switching.
>>>>>
>>>>> What did I miss?
>>>> Consider the case:
>>>>
>>>>
>>>> TX NAPI is disabled:
>>>>
>>>> send N packets
>>>>
>>>> turn TX NAPI on:
>>>>
>>>> get tx interrupt
>>>>
>>>> BQL try to consume those packets when lead WARN for dql.
>>>>
>>>>
>>>> Thanks
>>> Can one really switch tx napi on and off? How?
>>> While root can change the napi_tx module parameter, I don't think
>>> that has any effect outside device probe time. What did I miss?
>>>
>>>
>>>
>> We support switch the mode through ethtool recently. See
>>
>> commit 0c465be183c7c57a26446df6ea96d8676b865f92
>> Author: Jason Wang <jasowang@redhat.com>
>> Date:   Tue Oct 9 10:06:26 2018 +0800
>>
>>      virtio_net: ethtool tx napi configuration
>>
>>      Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
>>      Interrupt moderation is currently not supported, so these accept and
>>      display the default settings of 0 usec and 1 frame.
>>
>>      Toggle tx napi through setting tx-frames. So as to not interfere
>>      with possible future interrupt moderation, value 1 means tx napi while
>>      value 0 means not.
>>
>>      Only allow the switching when device is down for simplicity.
>>
>>      Link: https://patchwork.ozlabs.org/patch/948149/
>>      Suggested-by: Jason Wang <jasowang@redhat.com>
>>      Signed-off-by: Willem de Bruijn <willemb@google.com>
>>      Signed-off-by: Jason Wang <jasowang@redhat.com>
>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> Thanks
>
> It's disabled when device is up - isn't that enough?


Consider the case:

1) tx napi is disabled

2) send packets

3) tx napi is enabled through ethtool

4) get tx interrupt

5) BQL may start to consume packets that were sent while tx napi was 
disabled, which will trigger a BUG or WARN in dql

Thanks



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-01-17 13:09 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-05 22:54 [PATCH RFC 0/2] virtio-net: interrupt related improvements Michael S. Tsirkin
2018-12-05 22:54 ` [PATCH RFC 1/2] virtio-net: bql support Michael S. Tsirkin
2018-12-06  8:17   ` Jason Wang
2018-12-06  8:31     ` Jason Wang
2018-12-26 15:15     ` Michael S. Tsirkin
2018-12-27  9:56       ` Jason Wang
2018-12-26 15:19     ` Michael S. Tsirkin
2018-12-27 10:00       ` Jason Wang
2018-12-30 18:45         ` Michael S. Tsirkin
2019-01-02  3:28           ` Jason Wang
2019-01-02 13:59             ` Michael S. Tsirkin
2019-01-07  2:14               ` Jason Wang
2019-01-07  3:17                 ` Michael S. Tsirkin
2019-01-07  3:51                   ` Jason Wang
2019-01-07  4:01                     ` Michael S. Tsirkin
2019-01-07  6:31                       ` Jason Wang
2019-01-07 14:19                         ` Michael S. Tsirkin
2019-01-08 10:06                           ` Jason Wang
2018-12-26 15:22     ` Michael S. Tsirkin
2018-12-27 10:04       ` Jason Wang
2018-12-30 18:48         ` Michael S. Tsirkin
2019-01-02  3:30           ` Jason Wang
2019-01-02 13:54             ` Michael S. Tsirkin
2019-01-17 13:09               ` Jason Wang
2018-12-05 22:54 ` [PATCH RFC 2/2] virtio_net: bulk free tx skbs Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).