* [PATCH net-next 0/3] mlx4: better BIG-TCP support
@ 2022-12-06  5:50 Eric Dumazet
  2022-12-06  5:50 ` [PATCH net-next 1/3] net/mlx4: rename two constants Eric Dumazet
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Eric Dumazet @ 2022-12-06  5:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Tariq Toukan, Wei Wang, netdev, eric.dumazet, Eric Dumazet

mlx4 uses a bounce buffer in TX whenever the tx descriptors
wrap around the right edge of the ring.

The size of this bounce buffer was hard-coded; after this series it can be
increased if/when needed.

Eric Dumazet (3):
  net/mlx4: rename two constants
  net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
  net/mlx4: small optimization in mlx4_en_xmit()

 drivers/net/ethernet/mellanox/mlx4/en_tx.c   | 18 ++++++++++--------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 18 +++++++++++++-----
 2 files changed, 23 insertions(+), 13 deletions(-)

-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 1/3] net/mlx4: rename two constants
  2022-12-06  5:50 [PATCH net-next 0/3] mlx4: better BIG-TCP support Eric Dumazet
@ 2022-12-06  5:50 ` Eric Dumazet
  2022-12-06  5:50 ` [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS Eric Dumazet
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2022-12-06  5:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Tariq Toukan, Wei Wang, netdev, eric.dumazet, Eric Dumazet

MAX_DESC_SIZE is really the size of the bounce buffer used
when reaching the right side of the TX ring buffer, so rename
it to MLX4_TX_BOUNCE_BUFFER_SIZE.

MAX_DESC_TXBBS gets an MLX4_ prefix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   | 10 ++++++----
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  4 ++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 43a4102e9c091758b33aa7377dcb82cab7c43a94..8372aeb392a28cf36a454e1b8a4783bc2b2056eb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -65,7 +65,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->size = size;
 	ring->size_mask = size - 1;
 	ring->sp_stride = stride;
-	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
+	ring->full_size = ring->size - HEADROOM - MLX4_MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
 	ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
@@ -77,9 +77,11 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
 		 ring->tx_info, tmp);
 
-	ring->bounce_buf = kmalloc_node(MAX_DESC_SIZE, GFP_KERNEL, node);
+	ring->bounce_buf = kmalloc_node(MLX4_TX_BOUNCE_BUFFER_SIZE,
+					GFP_KERNEL, node);
 	if (!ring->bounce_buf) {
-		ring->bounce_buf = kmalloc(MAX_DESC_SIZE, GFP_KERNEL);
+		ring->bounce_buf = kmalloc(MLX4_TX_BOUNCE_BUFFER_SIZE,
+					   GFP_KERNEL);
 		if (!ring->bounce_buf) {
 			err = -ENOMEM;
 			goto err_info;
@@ -909,7 +911,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Align descriptor to TXBB size */
 	desc_size = ALIGN(real_size, TXBB_SIZE);
 	nr_txbb = desc_size >> LOG_TXBB_SIZE;
-	if (unlikely(nr_txbb > MAX_DESC_TXBBS)) {
+	if (unlikely(nr_txbb > MLX4_MAX_DESC_TXBBS)) {
 		if (netif_msg_tx_err(priv))
 			en_warn(priv, "Oversized header or SG list\n");
 		goto tx_drop_count;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index e132ff4c82f2d33045f6c9aeecaaa409a41e0b0d..7cc288db2a64f75ffe64882e3c25b90715e68855 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -90,8 +90,8 @@
 #define MLX4_EN_FILTER_EXPIRY_QUOTA 60
 
 /* Typical TSO descriptor with 16 gather entries is 352 bytes... */
-#define MAX_DESC_SIZE		512
-#define MAX_DESC_TXBBS		(MAX_DESC_SIZE / TXBB_SIZE)
+#define MLX4_TX_BOUNCE_BUFFER_SIZE 512
+#define MLX4_MAX_DESC_TXBBS	   (MLX4_TX_BOUNCE_BUFFER_SIZE / TXBB_SIZE)
 
 /*
  * OS related constants and tunables
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
  2022-12-06  5:50 [PATCH net-next 0/3] mlx4: better BIG-TCP support Eric Dumazet
  2022-12-06  5:50 ` [PATCH net-next 1/3] net/mlx4: rename two constants Eric Dumazet
@ 2022-12-06  5:50 ` Eric Dumazet
  2022-12-07 12:40   ` Tariq Toukan
  2022-12-06  5:50 ` [PATCH net-next 3/3] net/mlx4: small optimization in mlx4_en_xmit() Eric Dumazet
  2022-12-07  7:49 ` [PATCH net-next 0/3] mlx4: better BIG-TCP support Leon Romanovsky
  3 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2022-12-06  5:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Tariq Toukan, Wei Wang, netdev, eric.dumazet, Eric Dumazet

Google production kernel has increased MAX_SKB_FRAGS to 45
for BIG-TCP rollout.

Unfortunately the mlx4 TX bounce buffer is not big enough when
an skb has 45 page fragments.

This can happen often with TCP TX zero copy, as one frag usually
holds 4096 bytes of payload (order-0 page).

Tested:
 Kernel built with MAX_SKB_FRAGS=45
 ip link set dev eth0 gso_max_size 185000
 netperf -t TCP_SENDFILE

I made sure that "ethtool -G eth0 tx 64" was properly working,
ring->full_size being set to 16.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Wei Wang <weiwan@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 7cc288db2a64f75ffe64882e3c25b90715e68855..120b8c361e91d443f83f100a1afabcabc776a92a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -89,8 +89,18 @@
 #define MLX4_EN_FILTER_HASH_SHIFT 4
 #define MLX4_EN_FILTER_EXPIRY_QUOTA 60
 
-/* Typical TSO descriptor with 16 gather entries is 352 bytes... */
-#define MLX4_TX_BOUNCE_BUFFER_SIZE 512
+#define CTRL_SIZE	sizeof(struct mlx4_wqe_ctrl_seg)
+#define DS_SIZE		sizeof(struct mlx4_wqe_data_seg)
+
+/* Maximal size of the bounce buffer:
+ * 256 bytes for LSO headers.
+ * CTRL_SIZE for control desc.
+ * DS_SIZE if skb->head contains some payload.
+ * MAX_SKB_FRAGS frags.
+ */
+#define MLX4_TX_BOUNCE_BUFFER_SIZE (256 + CTRL_SIZE + DS_SIZE +		\
+				    MAX_SKB_FRAGS * DS_SIZE)
+
 #define MLX4_MAX_DESC_TXBBS	   (MLX4_TX_BOUNCE_BUFFER_SIZE / TXBB_SIZE)
 
 /*
@@ -217,9 +227,7 @@ struct mlx4_en_tx_info {
 
 
 #define MLX4_EN_BIT_DESC_OWN	0x80000000
-#define CTRL_SIZE	sizeof(struct mlx4_wqe_ctrl_seg)
 #define MLX4_EN_MEMTYPE_PAD	0x100
-#define DS_SIZE		sizeof(struct mlx4_wqe_data_seg)
 
 
 struct mlx4_en_tx_desc {
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH net-next 3/3] net/mlx4: small optimization in mlx4_en_xmit()
  2022-12-06  5:50 [PATCH net-next 0/3] mlx4: better BIG-TCP support Eric Dumazet
  2022-12-06  5:50 ` [PATCH net-next 1/3] net/mlx4: rename two constants Eric Dumazet
  2022-12-06  5:50 ` [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS Eric Dumazet
@ 2022-12-06  5:50 ` Eric Dumazet
  2022-12-07  7:49 ` [PATCH net-next 0/3] mlx4: better BIG-TCP support Leon Romanovsky
  3 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2022-12-06  5:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Tariq Toukan, Wei Wang, netdev, eric.dumazet, Eric Dumazet

The test against MLX4_MAX_DESC_TXBBS only matters if the TX
bounce buffer is going to be used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Wei Wang <weiwan@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 8372aeb392a28cf36a454e1b8a4783bc2b2056eb..c5758637b7bed67021a9f3e9c5283033f68639a3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -911,11 +911,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Align descriptor to TXBB size */
 	desc_size = ALIGN(real_size, TXBB_SIZE);
 	nr_txbb = desc_size >> LOG_TXBB_SIZE;
-	if (unlikely(nr_txbb > MLX4_MAX_DESC_TXBBS)) {
-		if (netif_msg_tx_err(priv))
-			en_warn(priv, "Oversized header or SG list\n");
-		goto tx_drop_count;
-	}
 
 	bf_ok = ring->bf_enabled;
 	if (skb_vlan_tag_present(skb)) {
@@ -943,6 +938,11 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (likely(index + nr_txbb <= ring->size))
 		tx_desc = ring->buf + (index << LOG_TXBB_SIZE);
 	else {
+		if (unlikely(nr_txbb > MLX4_MAX_DESC_TXBBS)) {
+			if (netif_msg_tx_err(priv))
+				en_warn(priv, "Oversized header or SG list\n");
+			goto tx_drop_count;
+		}
 		tx_desc = (struct mlx4_en_tx_desc *) ring->bounce_buf;
 		bounce = true;
 		bf_ok = false;
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* Re: [PATCH net-next 0/3] mlx4: better BIG-TCP support
  2022-12-06  5:50 [PATCH net-next 0/3] mlx4: better BIG-TCP support Eric Dumazet
                   ` (2 preceding siblings ...)
  2022-12-06  5:50 ` [PATCH net-next 3/3] net/mlx4: small optimization in mlx4_en_xmit() Eric Dumazet
@ 2022-12-07  7:49 ` Leon Romanovsky
  3 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2022-12-07  7:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Tariq Toukan,
	Wei Wang, netdev, eric.dumazet

On Tue, Dec 06, 2022 at 05:50:56AM +0000, Eric Dumazet wrote:
> mlx4 uses a bounce buffer in TX whenever the tx descriptors
> wrap around the right edge of the ring.
> 
> Size of this bounce buffer was hard coded and can be
> increased if/when needed.
> 
> Eric Dumazet (3):
>   net/mlx4: rename two constants
>   net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
>   net/mlx4: small optimization in mlx4_en_xmit()
> 
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c   | 18 ++++++++++--------
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 18 +++++++++++++-----
>  2 files changed, 23 insertions(+), 13 deletions(-)
> 

Thanks,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>


* Re: [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
  2022-12-06  5:50 ` [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS Eric Dumazet
@ 2022-12-07 12:40   ` Tariq Toukan
  2022-12-07 12:53     ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Tariq Toukan @ 2022-12-07 12:40 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Tariq Toukan, Wei Wang, netdev, eric.dumazet



On 12/6/2022 7:50 AM, Eric Dumazet wrote:
> Google production kernel has increased MAX_SKB_FRAGS to 45
> for BIG-TCP rollout.
> 
> Unfortunately mlx4 TX bounce buffer is not big enough whenever
> an skb has up to 45 page fragments.
> 
> This can happen often with TCP TX zero copy, as one frag usually
> holds 4096 bytes of payload (order-0 page).
> 
> Tested:
>   Kernel built with MAX_SKB_FRAGS=45
>   ip link set dev eth0 gso_max_size 185000
>   netperf -t TCP_SENDFILE
> 
> I made sure that "ethtool -G eth0 tx 64" was properly working,
> ring->full_size being set to 16.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Wei Wang <weiwan@google.com>
> Cc: Tariq Toukan <tariqt@nvidia.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 7cc288db2a64f75ffe64882e3c25b90715e68855..120b8c361e91d443f83f100a1afabcabc776a92a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -89,8 +89,18 @@
>   #define MLX4_EN_FILTER_HASH_SHIFT 4
>   #define MLX4_EN_FILTER_EXPIRY_QUOTA 60
>   
> -/* Typical TSO descriptor with 16 gather entries is 352 bytes... */
> -#define MLX4_TX_BOUNCE_BUFFER_SIZE 512
> +#define CTRL_SIZE	sizeof(struct mlx4_wqe_ctrl_seg)
> +#define DS_SIZE		sizeof(struct mlx4_wqe_data_seg)
> +
> +/* Maximal size of the bounce buffer:
> + * 256 bytes for LSO headers.
> + * CTRL_SIZE for control desc.
> + * DS_SIZE if skb->head contains some payload.
> + * MAX_SKB_FRAGS frags.
> + */
> +#define MLX4_TX_BOUNCE_BUFFER_SIZE (256 + CTRL_SIZE + DS_SIZE +		\
> +				    MAX_SKB_FRAGS * DS_SIZE)
> +
>   #define MLX4_MAX_DESC_TXBBS	   (MLX4_TX_BOUNCE_BUFFER_SIZE / TXBB_SIZE)
>  

Now that MLX4_TX_BOUNCE_BUFFER_SIZE might not be a multiple of TXBB_SIZE,
a simple integer division won't work to calculate the max number of TXBBs.
A roundup is needed.

>   /*
> @@ -217,9 +227,7 @@ struct mlx4_en_tx_info {
>   
>   
>   #define MLX4_EN_BIT_DESC_OWN	0x80000000
> -#define CTRL_SIZE	sizeof(struct mlx4_wqe_ctrl_seg)
>   #define MLX4_EN_MEMTYPE_PAD	0x100
> -#define DS_SIZE		sizeof(struct mlx4_wqe_data_seg)
>   
>   
>   struct mlx4_en_tx_desc {


* Re: [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
  2022-12-07 12:40   ` Tariq Toukan
@ 2022-12-07 12:53     ` Eric Dumazet
  2022-12-07 13:06       ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2022-12-07 12:53 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Tariq Toukan,
	Wei Wang, netdev, eric.dumazet

On Wed, Dec 7, 2022 at 1:40 PM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
>
>
> On 12/6/2022 7:50 AM, Eric Dumazet wrote:
> > Google production kernel has increased MAX_SKB_FRAGS to 45
> > for BIG-TCP rollout.
> >
> > Unfortunately mlx4 TX bounce buffer is not big enough whenever
> > an skb has up to 45 page fragments.
> >
> > This can happen often with TCP TX zero copy, as one frag usually
> > holds 4096 bytes of payload (order-0 page).
> >
> > Tested:
> >   Kernel built with MAX_SKB_FRAGS=45
> >   ip link set dev eth0 gso_max_size 185000
> >   netperf -t TCP_SENDFILE
> >
> > I made sure that "ethtool -G eth0 tx 64" was properly working,
> > ring->full_size being set to 16.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Wei Wang <weiwan@google.com>
> > Cc: Tariq Toukan <tariqt@nvidia.com>
> > ---
> >   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 16 ++++++++++++----
> >   1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > index 7cc288db2a64f75ffe64882e3c25b90715e68855..120b8c361e91d443f83f100a1afabcabc776a92a 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > @@ -89,8 +89,18 @@
> >   #define MLX4_EN_FILTER_HASH_SHIFT 4
> >   #define MLX4_EN_FILTER_EXPIRY_QUOTA 60
> >
> > -/* Typical TSO descriptor with 16 gather entries is 352 bytes... */
> > -#define MLX4_TX_BOUNCE_BUFFER_SIZE 512
> > +#define CTRL_SIZE    sizeof(struct mlx4_wqe_ctrl_seg)
> > +#define DS_SIZE              sizeof(struct mlx4_wqe_data_seg)
> > +
> > +/* Maximal size of the bounce buffer:
> > + * 256 bytes for LSO headers.
> > + * CTRL_SIZE for control desc.
> > + * DS_SIZE if skb->head contains some payload.
> > + * MAX_SKB_FRAGS frags.
> > + */
> > +#define MLX4_TX_BOUNCE_BUFFER_SIZE (256 + CTRL_SIZE + DS_SIZE +              \
> > +                                 MAX_SKB_FRAGS * DS_SIZE)
> > +
> >   #define MLX4_MAX_DESC_TXBBS    (MLX4_TX_BOUNCE_BUFFER_SIZE / TXBB_SIZE)
> >
>
> Now as MLX4_TX_BOUNCE_BUFFER_SIZE might not be a multiple of TXBB_SIZE,
> simple integer division won't work to calculate the max num of TXBBs.
> Roundup is needed.

I do not see why a roundup is needed. This seems like obfuscation to me.

A divide by TXBB_SIZE always "works".

A round up is already done in mlx4_en_xmit()

/* Align descriptor to TXBB size */
desc_size = ALIGN(real_size, TXBB_SIZE);
nr_txbb = desc_size >> LOG_TXBB_SIZE;

Then the check is :

if (unlikely(nr_txbb > MLX4_MAX_DESC_TXBBS)) {
   if (netif_msg_tx_err(priv))
       en_warn(priv, "Oversized header or SG list\n");
   goto tx_drop_count;
}

If we allocate X extra bytes (in case MLX4_TX_BOUNCE_BUFFER_SIZE %
TXBB_SIZE == X),
we are not going to use them anyway.


* Re: [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
  2022-12-07 12:53     ` Eric Dumazet
@ 2022-12-07 13:06       ` Eric Dumazet
  2022-12-07 15:14         ` Tariq Toukan
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2022-12-07 13:06 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Tariq Toukan,
	Wei Wang, netdev, eric.dumazet

On Wed, Dec 7, 2022 at 1:53 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Dec 7, 2022 at 1:40 PM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
> >
> >
> >
> > On 12/6/2022 7:50 AM, Eric Dumazet wrote:
> > > Google production kernel has increased MAX_SKB_FRAGS to 45
> > > for BIG-TCP rollout.
> > >
> > > Unfortunately mlx4 TX bounce buffer is not big enough whenever
> > > an skb has up to 45 page fragments.
> > >
> > > This can happen often with TCP TX zero copy, as one frag usually
> > > holds 4096 bytes of payload (order-0 page).
> > >
> > > Tested:
> > >   Kernel built with MAX_SKB_FRAGS=45
> > >   ip link set dev eth0 gso_max_size 185000
> > >   netperf -t TCP_SENDFILE
> > >
> > > I made sure that "ethtool -G eth0 tx 64" was properly working,
> > > ring->full_size being set to 16.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Reported-by: Wei Wang <weiwan@google.com>
> > > Cc: Tariq Toukan <tariqt@nvidia.com>
> > > ---
> > >   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 16 ++++++++++++----
> > >   1 file changed, 12 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > > index 7cc288db2a64f75ffe64882e3c25b90715e68855..120b8c361e91d443f83f100a1afabcabc776a92a 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > > +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > > @@ -89,8 +89,18 @@
> > >   #define MLX4_EN_FILTER_HASH_SHIFT 4
> > >   #define MLX4_EN_FILTER_EXPIRY_QUOTA 60
> > >
> > > -/* Typical TSO descriptor with 16 gather entries is 352 bytes... */
> > > -#define MLX4_TX_BOUNCE_BUFFER_SIZE 512
> > > +#define CTRL_SIZE    sizeof(struct mlx4_wqe_ctrl_seg)
> > > +#define DS_SIZE              sizeof(struct mlx4_wqe_data_seg)
> > > +
> > > +/* Maximal size of the bounce buffer:
> > > + * 256 bytes for LSO headers.
> > > + * CTRL_SIZE for control desc.
> > > + * DS_SIZE if skb->head contains some payload.
> > > + * MAX_SKB_FRAGS frags.
> > > + */
> > > +#define MLX4_TX_BOUNCE_BUFFER_SIZE (256 + CTRL_SIZE + DS_SIZE +              \
> > > +                                 MAX_SKB_FRAGS * DS_SIZE)
> > > +
> > >   #define MLX4_MAX_DESC_TXBBS    (MLX4_TX_BOUNCE_BUFFER_SIZE / TXBB_SIZE)
> > >
> >
> > Now as MLX4_TX_BOUNCE_BUFFER_SIZE might not be a multiple of TXBB_SIZE,
> > simple integer division won't work to calculate the max num of TXBBs.
> > Roundup is needed.
>
> I do not see why a roundup is needed. This seems like obfuscation to me.
>
> A divide by TXBB_SIZE always "works".
>
> A round up is already done in mlx4_en_xmit()
>
> /* Align descriptor to TXBB size */
> desc_size = ALIGN(real_size, TXBB_SIZE);
> nr_txbb = desc_size >> LOG_TXBB_SIZE;
>
> Then the check is :
>
> if (unlikely(nr_txbb > MLX4_MAX_DESC_TXBBS)) {
>    if (netif_msg_tx_err(priv))
>        en_warn(priv, "Oversized header or SG list\n");
>    goto tx_drop_count;
> }
>
> If we allocate X extra bytes (in case MLX4_TX_BOUNCE_BUFFER_SIZE %
> TXBB_SIZE == X),
> we are not going to use them anyway.

I guess you are worried about not having exactly 256 bytes for the headers?

Currently, the amount of space for headers is 208 bytes.

If MAX_SKB_FRAGS is 17,  MLX4_TX_BOUNCE_BUFFER_SIZE would be 0x230
after my patch,
so the same usable space as before the patch.


* Re: [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
  2022-12-07 13:06       ` Eric Dumazet
@ 2022-12-07 15:14         ` Tariq Toukan
  2022-12-07 15:41           ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Tariq Toukan @ 2022-12-07 15:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Tariq Toukan,
	Wei Wang, netdev, eric.dumazet



On 12/7/2022 3:06 PM, Eric Dumazet wrote:
> On Wed, Dec 7, 2022 at 1:53 PM Eric Dumazet <edumazet@google.com> wrote:
>>
>> On Wed, Dec 7, 2022 at 1:40 PM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>>>
>>>
>>>
>>> On 12/6/2022 7:50 AM, Eric Dumazet wrote:
>>>> Google production kernel has increased MAX_SKB_FRAGS to 45
>>>> for BIG-TCP rollout.
>>>>
>>>> Unfortunately mlx4 TX bounce buffer is not big enough whenever
>>>> an skb has up to 45 page fragments.
>>>>
>>>> This can happen often with TCP TX zero copy, as one frag usually
>>>> holds 4096 bytes of payload (order-0 page).
>>>>
>>>> Tested:
>>>>    Kernel built with MAX_SKB_FRAGS=45
>>>>    ip link set dev eth0 gso_max_size 185000
>>>>    netperf -t TCP_SENDFILE
>>>>
>>>> I made sure that "ethtool -G eth0 tx 64" was properly working,
>>>> ring->full_size being set to 16.
>>>>
>>>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>> Reported-by: Wei Wang <weiwan@google.com>
>>>> Cc: Tariq Toukan <tariqt@nvidia.com>
>>>> ---
>>>>    drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 16 ++++++++++++----
>>>>    1 file changed, 12 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>>>> index 7cc288db2a64f75ffe64882e3c25b90715e68855..120b8c361e91d443f83f100a1afabcabc776a92a 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>>>> @@ -89,8 +89,18 @@
>>>>    #define MLX4_EN_FILTER_HASH_SHIFT 4
>>>>    #define MLX4_EN_FILTER_EXPIRY_QUOTA 60
>>>>
>>>> -/* Typical TSO descriptor with 16 gather entries is 352 bytes... */
>>>> -#define MLX4_TX_BOUNCE_BUFFER_SIZE 512
>>>> +#define CTRL_SIZE    sizeof(struct mlx4_wqe_ctrl_seg)
>>>> +#define DS_SIZE              sizeof(struct mlx4_wqe_data_seg)
>>>> +
>>>> +/* Maximal size of the bounce buffer:
>>>> + * 256 bytes for LSO headers.
>>>> + * CTRL_SIZE for control desc.
>>>> + * DS_SIZE if skb->head contains some payload.
>>>> + * MAX_SKB_FRAGS frags.
>>>> + */
>>>> +#define MLX4_TX_BOUNCE_BUFFER_SIZE (256 + CTRL_SIZE + DS_SIZE +              \
>>>> +                                 MAX_SKB_FRAGS * DS_SIZE)
>>>> +
>>>>    #define MLX4_MAX_DESC_TXBBS    (MLX4_TX_BOUNCE_BUFFER_SIZE / TXBB_SIZE)
>>>>
>>>
>>> Now as MLX4_TX_BOUNCE_BUFFER_SIZE might not be a multiple of TXBB_SIZE,
>>> simple integer division won't work to calculate the max num of TXBBs.
>>> Roundup is needed.
>>
>> I do not see why a roundup is needed. This seems like obfuscation to me.
>>
>> A divide by TXBB_SIZE always "works".
>>
>> A round up is already done in mlx4_en_xmit()
>>
>> /* Align descriptor to TXBB size */
>> desc_size = ALIGN(real_size, TXBB_SIZE);
>> nr_txbb = desc_size >> LOG_TXBB_SIZE;
>>
>> Then the check is :
>>
>> if (unlikely(nr_txbb > MLX4_MAX_DESC_TXBBS)) {
>>     if (netif_msg_tx_err(priv))
>>         en_warn(priv, "Oversized header or SG list\n");
>>     goto tx_drop_count;
>> }
>>
>> If we allocate X extra bytes (in case MLX4_TX_BOUNCE_BUFFER_SIZE %
>> TXBB_SIZE == X),
>> we are not going to use them anyway.

Now MLX4_MAX_DESC_TXBBS gives a stricter limit than the allocated
size MLX4_TX_BOUNCE_BUFFER_SIZE.

> 
> I guess you are worried about not having exactly 256 bytes for the headers ?
> 
> Currently, the amount of space for headers is  208 bytes.
> 
> If MAX_SKB_FRAGS is 17,  MLX4_TX_BOUNCE_BUFFER_SIZE would be 0x230
> after my patch,
> so the same usable space as before the patch.

So what you're saying is, if all the elements of 
MLX4_TX_BOUNCE_BUFFER_SIZE co-exist together for a TX descriptor, then 
the actual "headers" part can go only up to 208 (similar to today), not 
the whole 256 (as the new define documentation says).

This keeps the current behavior, but makes the code a bit more confusing.

IMO it is cleaner to have MLX4_TX_BOUNCE_BUFFER_SIZE explicitly defined 
as a multiple of TXBB_SIZE in the first place. This way, both the 
allocation size and the desc size limit will be in perfect sync, without 
having assumptions on the amount X lost in the division.

How about the below, to keep today's values for the defines?

#define MLX4_TX_BOUNCE_BUFFER_SIZE \
	ALIGN(208 + CTRL_SIZE + DS_SIZE + \
	      MAX_SKB_FRAGS * DS_SIZE, TXBB_SIZE)


* Re: [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
  2022-12-07 15:14         ` Tariq Toukan
@ 2022-12-07 15:41           ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2022-12-07 15:41 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Tariq Toukan,
	Wei Wang, netdev, eric.dumazet

On Wed, Dec 7, 2022 at 4:14 PM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>

> So what you're saying is, if all the elements of
> MLX4_TX_BOUNCE_BUFFER_SIZE co-exist together for a TX descriptor, then
> the actual "headers" part can go only up to 208 (similar to today), not
> the whole 256 (as the new define documentation says).
>
> This keeps the current behavior, but makes the code a bit more confusing.
>
> IMO it is cleaner to have MLX4_TX_BOUNCE_BUFFER_SIZE explicitly defined
> as a multiple of TXBB_SIZE in the first place. This way, both the
> allocation size and the desc size limit will be in perfect sync, without
> having assumptions on the amount X lost in the division.
>
> How about the below, to keep today's values for the defines?
>
> #define MLX4_TX_BOUNCE_BUFFER_SIZE \
>         ALIGN(208 + CTRL_SIZE + DS_SIZE + \
>               MAX_SKB_FRAGS * DS_SIZE, TXBB_SIZE)

I already sent a v2, with:

+#define MLX4_TX_BOUNCE_BUFFER_SIZE \
+       ALIGN(256 + CTRL_SIZE + DS_SIZE + MAX_SKB_FRAGS * DS_SIZE, TXBB_SIZE)
+

Please take a look, thanks.

