* [PATCH v2 net 0/3] net: bql: better deal with GSO
@ 2018-10-31 15:39 Eric Dumazet
  2018-10-31 15:39 ` [PATCH v2 net 1/3] net: bql: add __netdev_tx_sent_queue() Eric Dumazet
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Eric Dumazet @ 2018-10-31 15:39 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tariq Toukan, Willem de Bruijn, Eric Dumazet, Eric Dumazet

While BQL bulk dequeue works well for TSO packets, it is
not very efficient as soon as GSO is involved.

On a GSO-only workload (UDP or TCP), this patch series
can save about 8% of CPU cycles on a 40Gbit mlx4 NIC,
by keeping optimal batching and avoiding expensive
doorbells, qdisc requeues and reschedules.

This patch series:

- adds __netdev_tx_sent_queue() so that drivers
  can implement efficient BQL and xmit_more support;

- implements a workaround in dev_hard_start_xmit()
  for drivers not using __netdev_tx_sent_queue();

- changes mlx4 to use __netdev_tx_sent_queue().

v2: addressed feedback from Tariq and Willem;
    added __netdev_tx_sent_queue() (Willem's suggestion)


Eric Dumazet (3):
  net: bql: add __netdev_tx_sent_queue()
  net: do not abort bulk send on BQL status
  net/mlx4_en: use __netdev_tx_sent_queue()

 drivers/net/ethernet/mellanox/mlx4/en_tx.c |  6 ++++--
 include/linux/netdevice.h                  | 20 ++++++++++++++++++++
 net/core/dev.c                             |  2 +-
 3 files changed, 25 insertions(+), 3 deletions(-)

-- 
2.19.1.930.g4563a0d9d0-goog

* [PATCH v2 net 1/3] net: bql: add __netdev_tx_sent_queue()
  2018-10-31 15:39 [PATCH v2 net 0/3] net: bql: better deal with GSO Eric Dumazet
@ 2018-10-31 15:39 ` Eric Dumazet
  2018-10-31 15:39 ` [PATCH v2 net 2/3] net: do not abort bulk send on BQL status Eric Dumazet
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2018-10-31 15:39 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tariq Toukan, Willem de Bruijn, Eric Dumazet, Eric Dumazet

When qdisc_run() uses the BQL budget to bulk-dequeue a batch
of packets, GSO can later transform this list into another list
of skbs, and each skb is handed to the device ndo_start_xmit(),
one at a time, with skb->xmit_more set for all but the last skb.

The problem is that, very often, the BQL limit is hit in the middle
of the packet train, forcing dev_hard_start_xmit() to stop the
bulk send and requeue the end of the list.

BQL's role is to avoid head-of-line blocking, making sure
a qdisc can deliver high-priority packets before low-priority ones.

But requeued packets can never be bypassed by fresh
packets in the qdisc.

Aborting the bulk send also increases TX softirqs, and the cache
lines made hot by skb_segment() are wasted.

Note that for TSO packets, we never split a packet in the middle
because the BQL limit is hit.

Drivers should be able to update the BQL counters without
flipping or even caring about the BQL status, as long as the
current skb has xmit_more set.

Upper layers remain ultimately responsible for not starting another
packet train once the BQL limit is hit.

A code template in a driver might look like the following:

	send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);
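
A slightly fuller, hypothetical ndo_start_xmit() excerpt using the
helper might look like the sketch below; the foo_* names are made up
for illustration only and are not taken from any real driver:

	/* Hypothetical sketch only -- the foo_* helpers are illustrative. */
	static netdev_tx_t foo_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		struct netdev_queue *txq;
		unsigned int nr_bytes = skb->len;
		bool send_doorbell;

		txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

		foo_post_descriptors(dev, skb);	/* fill the hardware TX ring */

		/* Account bytes to BQL; only the last skb of a batch
		 * (xmit_more == false) may flip __QUEUE_STATE_STACK_XOFF.
		 */
		send_doorbell = __netdev_tx_sent_queue(txq, nr_bytes,
						       skb->xmit_more);
		if (send_doorbell)
			foo_ring_doorbell(dev);	/* kick the NIC */

		return NETDEV_TX_OK;
	}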

Note that using __netdev_tx_sent_queue() is not mandatory,
since the following patch changes dev_hard_start_xmit()
to not care about the BQL status.

But it is highly recommended, so that the full benefits of xmit_more
can be reached (fewer doorbells sent, and fewer atomic operations as well).

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc1d9ed33b3192e9406b17c3107b3235b28ff1b9..857f8abf7b91bc79731873fc8f68e31f6bff4d03 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3190,6 +3190,26 @@ static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
 #endif
 }
 
+/* Variant of netdev_tx_sent_queue() for drivers that are aware
+ * that they should not test BQL status themselves.
+ * We do want to change __QUEUE_STATE_STACK_XOFF only for the last
+ * skb of a batch.
+ * Returns true if the doorbell must be used to kick the NIC.
+ */
+static inline bool __netdev_tx_sent_queue(struct netdev_queue *dev_queue,
+					  unsigned int bytes,
+					  bool xmit_more)
+{
+	if (xmit_more) {
+#ifdef CONFIG_BQL
+		dql_queued(&dev_queue->dql, bytes);
+#endif
+		return netif_tx_queue_stopped(dev_queue);
+	}
+	netdev_tx_sent_queue(dev_queue, bytes);
+	return true;
+}
+
 /**
  * 	netdev_sent_queue - report the number of bytes queued to hardware
  * 	@dev: network device
-- 
2.19.1.930.g4563a0d9d0-goog

* [PATCH v2 net 2/3] net: do not abort bulk send on BQL status
  2018-10-31 15:39 [PATCH v2 net 0/3] net: bql: better deal with GSO Eric Dumazet
  2018-10-31 15:39 ` [PATCH v2 net 1/3] net: bql: add __netdev_tx_sent_queue() Eric Dumazet
@ 2018-10-31 15:39 ` Eric Dumazet
  2018-10-31 15:39 ` [PATCH v2 net 3/3] net/mlx4_en: use __netdev_tx_sent_queue() Eric Dumazet
  2018-11-03 22:40 ` [PATCH v2 net 0/3] net: bql: better deal with GSO David Miller
  3 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2018-10-31 15:39 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tariq Toukan, Willem de Bruijn, Eric Dumazet, Eric Dumazet

Before calling dev_hard_start_xmit(), upper layers try
to build an optimal skb list based on the BQL budget.

The problem is that GSO packets can end up consuming more than
the BQL budget.

Breaking the loop is not useful, since requeued packets
are ahead of any packets still in the qdisc.

It is also more expensive, since the next TX completion will
push these packets out later, when the skbs are no longer
in CPU caches.

It also differs from the behavior for TSO packets, which can
exceed the BQL limit by a large amount.

Note that drivers should use __netdev_tx_sent_queue()
in order to get optimal xmit_more support, and to avoid
useless atomic operations, as shown in the following patch.
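
For reference, the difference between the two queue-state helpers
boils down to which bits they test; the following is a simplified
paraphrase of include/linux/netdevice.h, not a verbatim copy:

	/* Simplified/paraphrased from include/linux/netdevice.h */
	static inline bool netif_tx_queue_stopped(const struct netdev_queue *dev_queue)
	{
		/* Only the driver-owned flag (queue stopped by the driver). */
		return test_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
	}

	static inline bool netif_xmit_stopped(const struct netdev_queue *dev_queue)
	{
		/* Driver flag OR the BQL (stack) flag. */
		return dev_queue->state & QUEUE_STATE_ANY_XOFF;
	}

So switching the check from netif_xmit_stopped() to
netif_tx_queue_stopped() means a BQL-induced __QUEUE_STATE_STACK_XOFF
no longer aborts the bulk send; only a queue stopped by the driver does.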

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 77d43ae2a7bbe1267f8430d5c35637d1984f463c..0ffcbdd55fa9ee545c807f2ed3fc178830e3075a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3272,7 +3272,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *first, struct net_device *de
 		}
 
 		skb = next;
-		if (netif_xmit_stopped(txq) && skb) {
+		if (netif_tx_queue_stopped(txq) && skb) {
 			rc = NETDEV_TX_BUSY;
 			break;
 		}
-- 
2.19.1.930.g4563a0d9d0-goog

* [PATCH v2 net 3/3] net/mlx4_en: use __netdev_tx_sent_queue()
  2018-10-31 15:39 [PATCH v2 net 0/3] net: bql: better deal with GSO Eric Dumazet
  2018-10-31 15:39 ` [PATCH v2 net 1/3] net: bql: add __netdev_tx_sent_queue() Eric Dumazet
  2018-10-31 15:39 ` [PATCH v2 net 2/3] net: do not abort bulk send on BQL status Eric Dumazet
@ 2018-10-31 15:39 ` Eric Dumazet
  2018-11-01  9:19   ` Tariq Toukan
  2018-11-03 22:40 ` [PATCH v2 net 0/3] net: bql: better deal with GSO David Miller
  3 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2018-10-31 15:39 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tariq Toukan, Willem de Bruijn, Eric Dumazet, Eric Dumazet

The doorbell only depends on xmit_more and netif_tx_queue_stopped().

Using __netdev_tx_sent_queue() avoids touching the BQL stop flag,
and is more generic.

This patch increases performance on GSO workloads by keeping
doorbells to the minimum required.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 1857ee0f0871d48285a6d3711f7c3e9a1e08a05f..6f5153afcab4dfc331c099da854c54f1b9500887 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -1006,7 +1006,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		ring->packets++;
 	}
 	ring->bytes += tx_info->nr_bytes;
-	netdev_tx_sent_queue(ring->tx_queue, tx_info->nr_bytes);
 	AVG_PERF_COUNTER(priv->pstats.tx_pktsz_avg, skb->len);
 
 	if (tx_info->inl)
@@ -1044,7 +1043,10 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_tx_stop_queue(ring->tx_queue);
 		ring->queue_stopped++;
 	}
-	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
+
+	send_doorbell = __netdev_tx_sent_queue(ring->tx_queue,
+					       tx_info->nr_bytes,
+					       skb->xmit_more);
 
 	real_size = (real_size / 16) & 0x3f;
 
-- 
2.19.1.930.g4563a0d9d0-goog

* Re: [PATCH v2 net 3/3] net/mlx4_en: use __netdev_tx_sent_queue()
  2018-10-31 15:39 ` [PATCH v2 net 3/3] net/mlx4_en: use __netdev_tx_sent_queue() Eric Dumazet
@ 2018-11-01  9:19   ` Tariq Toukan
  0 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2018-11-01  9:19 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: netdev, Tariq Toukan, Willem de Bruijn, Eric Dumazet



On 31/10/2018 5:39 PM, Eric Dumazet wrote:
> The doorbell only depends on xmit_more and netif_tx_queue_stopped().
> 
> Using __netdev_tx_sent_queue() avoids touching the BQL stop flag,
> and is more generic.
> 
> This patch increases performance on GSO workloads by keeping
> doorbells to the minimum required.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_tx.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 1857ee0f0871d48285a6d3711f7c3e9a1e08a05f..6f5153afcab4dfc331c099da854c54f1b9500887 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -1006,7 +1006,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>   		ring->packets++;
>   	}
>   	ring->bytes += tx_info->nr_bytes;
> -	netdev_tx_sent_queue(ring->tx_queue, tx_info->nr_bytes);
>   	AVG_PERF_COUNTER(priv->pstats.tx_pktsz_avg, skb->len);
>   
>   	if (tx_info->inl)
> @@ -1044,7 +1043,10 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>   		netif_tx_stop_queue(ring->tx_queue);
>   		ring->queue_stopped++;
>   	}
> -	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
> +
> +	send_doorbell = __netdev_tx_sent_queue(ring->tx_queue,
> +					       tx_info->nr_bytes,
> +					       skb->xmit_more);
>   
>   	real_size = (real_size / 16) & 0x3f;
>   
> 

Reviewed-by: Tariq Toukan <tariqt@mellanox.com>

Looks good to me.
Thanks.


* Re: [PATCH v2 net 0/3] net: bql: better deal with GSO
  2018-10-31 15:39 [PATCH v2 net 0/3] net: bql: better deal with GSO Eric Dumazet
                   ` (2 preceding siblings ...)
  2018-10-31 15:39 ` [PATCH v2 net 3/3] net/mlx4_en: use __netdev_tx_sent_queue() Eric Dumazet
@ 2018-11-03 22:40 ` David Miller
  2018-11-03 23:00   ` Eric Dumazet
  2018-11-06 18:23   ` Edward Cree
  3 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2018-11-03 22:40 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, tariqt, willemb, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 31 Oct 2018 08:39:11 -0700

> While BQL bulk dequeue works well for TSO packets, it is
> not very efficient as soon as GSO is involved.
> 
> On a GSO-only workload (UDP or TCP), this patch series
> can save about 8% of CPU cycles on a 40Gbit mlx4 NIC,
> by keeping optimal batching and avoiding expensive
> doorbells, qdisc requeues and reschedules.
> 
> This patch series:
> 
> - adds __netdev_tx_sent_queue() so that drivers
>   can implement efficient BQL and xmit_more support;
> 
> - implements a workaround in dev_hard_start_xmit()
>   for drivers not using __netdev_tx_sent_queue();
> 
> - changes mlx4 to use __netdev_tx_sent_queue().
> 
> v2: addressed feedback from Tariq and Willem;
>     added __netdev_tx_sent_queue() (Willem's suggestion)

Series applied, but I wonder how many other commonly used drivers we
should update the same way mlx4 is here?

* Re: [PATCH v2 net 0/3] net: bql: better deal with GSO
  2018-11-03 22:40 ` [PATCH v2 net 0/3] net: bql: better deal with GSO David Miller
@ 2018-11-03 23:00   ` Eric Dumazet
  2018-11-04  0:26     ` David Miller
  2018-11-06 18:23   ` Edward Cree
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2018-11-03 23:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tariq Toukan, Willem de Bruijn, Eric Dumazet

On Sat, Nov 3, 2018 at 3:40 PM David Miller <davem@davemloft.net> wrote:

> Series applied, but I wonder how many other commonly used drivers we
> should update the same way mlx4 is here?

I can provide patches but can not test them, so probably we should get
Tested-by tags before applying them.
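
For most drivers the conversion would follow the same pattern as the
mlx4 patch above; a purely illustrative sketch (the ring/nr_bytes
names vary per driver):

	/* Before: separate BQL accounting and doorbell decision */
	netdev_tx_sent_queue(ring->tx_queue, nr_bytes);
	/* ...later... */
	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);

	/* After: one helper does both, and only the last skb of a
	 * batch may touch the BQL stop flag.
	 */
	send_doorbell = __netdev_tx_sent_queue(ring->tx_queue, nr_bytes,
					       skb->xmit_more);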

Thanks.

* Re: [PATCH v2 net 0/3] net: bql: better deal with GSO
  2018-11-03 23:00   ` Eric Dumazet
@ 2018-11-04  0:26     ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2018-11-04  0:26 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, tariqt, willemb, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Sat, 3 Nov 2018 16:00:39 -0700

> On Sat, Nov 3, 2018 at 3:40 PM David Miller <davem@davemloft.net> wrote:
> 
>> Series applied, but I wonder how many other commonly used drivers we
>> should update the same way mlx4 is here?
> 
> I can provide patches but can not test them, so probably we should get
> Tested-by tags before applying them.

Ok, agreed.

* Re: [PATCH v2 net 0/3] net: bql: better deal with GSO
  2018-11-03 22:40 ` [PATCH v2 net 0/3] net: bql: better deal with GSO David Miller
  2018-11-03 23:00   ` Eric Dumazet
@ 2018-11-06 18:23   ` Edward Cree
  2018-11-06 18:41     ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Edward Cree @ 2018-11-06 18:23 UTC (permalink / raw)
  To: David Miller; +Cc: edumazet, netdev, eric.dumazet

On 03/11/18 22:40, David Miller wrote:
> Series applied, but I wonder how many other commonly used drivers we
> should update the same way mlx4 is here?

Hi David,

I'm doing a patch to update sfc to use this new helper.  Is this 'net'
 material, or should I wait for 'net-next' to open back up?

-Ed

* Re: [PATCH v2 net 0/3] net: bql: better deal with GSO
  2018-11-06 18:23   ` Edward Cree
@ 2018-11-06 18:41     ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2018-11-06 18:41 UTC (permalink / raw)
  To: ecree; +Cc: edumazet, netdev, eric.dumazet

From: Edward Cree <ecree@solarflare.com>
Date: Tue, 6 Nov 2018 18:23:42 +0000

> I'm doing a patch to update sfc to use this new helper.  Is this 'net'
>  material, or should I wait for 'net-next' to open back up?

As the driver maintainer, I guess I can leave that judgment up to you,
at least to a certain extent.

Thread overview: 10+ messages
2018-10-31 15:39 [PATCH v2 net 0/3] net: bql: better deal with GSO Eric Dumazet
2018-10-31 15:39 ` [PATCH v2 net 1/3] net: bql: add __netdev_tx_sent_queue() Eric Dumazet
2018-10-31 15:39 ` [PATCH v2 net 2/3] net: do not abort bulk send on BQL status Eric Dumazet
2018-10-31 15:39 ` [PATCH v2 net 3/3] net/mlx4_en: use __netdev_tx_sent_queue() Eric Dumazet
2018-11-01  9:19   ` Tariq Toukan
2018-11-03 22:40 ` [PATCH v2 net 0/3] net: bql: better deal with GSO David Miller
2018-11-03 23:00   ` Eric Dumazet
2018-11-04  0:26     ` David Miller
2018-11-06 18:23   ` Edward Cree
2018-11-06 18:41     ` David Miller
