netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] net: make MAX_SKB_FRAGS configurable
@ 2022-02-10 17:55 Eric Dumazet
  2022-02-10 17:55 ` [PATCH net-next 1/4] af_packet: do not assume MAX_SKB_FRAGS is unsigned long Eric Dumazet
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Eric Dumazet @ 2022-02-10 17:55 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Increasing MAX_SKB_FRAGS was a patch in BIG TCP v1 submission.

It appears this might take time to validate that all drivers
are ready for this change.

I have removed from BIG TCP series this patch, and made
a separate series.

MAX_SKB_FRAGS is now configurable (from 17 to 45),
and the default is 17.

Eric Dumazet (4):
  af_packet: do not assume MAX_SKB_FRAGS is unsigned long
  scsi: iscsi: do not assume MAX_SKB_FRAGS is unsigned long
  net: mvpp2: get rid of hard coded assumptions
  net: introduce a config option to tweak MAX_SKB_FRAGS

 drivers/net/ethernet/marvell/mvpp2/mvpp2.h |  6 +++---
 drivers/scsi/cxgbi/libcxgbi.h              |  2 +-
 include/linux/skbuff.h                     | 14 ++------------
 net/Kconfig                                | 12 ++++++++++++
 net/packet/af_packet.c                     |  4 ++--
 5 files changed, 20 insertions(+), 18 deletions(-)

-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net-next 1/4] af_packet: do not assume MAX_SKB_FRAGS is unsigned long
  2022-02-10 17:55 [PATCH net-next 0/4] net: make MAX_SKB_FRAGS configurable Eric Dumazet
@ 2022-02-10 17:55 ` Eric Dumazet
  2022-02-10 17:55 ` [PATCH net-next 2/4] scsi: iscsi: " Eric Dumazet
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2022-02-10 17:55 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

MAX_SKB_FRAGS is small, there is no point forcing it to be
unsigned long.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ab87f22cc7ecde517ba4cd0b3804a28c3cccfc85..96ae70135bd94c4527e61621af35d6e8659ab9f7 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2606,8 +2606,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		nr_frags = skb_shinfo(skb)->nr_frags;
 
 		if (unlikely(nr_frags >= MAX_SKB_FRAGS)) {
-			pr_err("Packet exceed the number of skb frags(%lu)\n",
-			       MAX_SKB_FRAGS);
+			pr_err("Packet exceed the number of skb frags(%d)\n",
+			       (int)MAX_SKB_FRAGS);
 			return -EFAULT;
 		}
 
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next 2/4] scsi: iscsi: do not assume MAX_SKB_FRAGS is unsigned long
  2022-02-10 17:55 [PATCH net-next 0/4] net: make MAX_SKB_FRAGS configurable Eric Dumazet
  2022-02-10 17:55 ` [PATCH net-next 1/4] af_packet: do not assume MAX_SKB_FRAGS is unsigned long Eric Dumazet
@ 2022-02-10 17:55 ` Eric Dumazet
  2022-02-10 17:55 ` [PATCH net-next 3/4] net: mvpp2: get rid of hard coded assumptions Eric Dumazet
  2022-02-10 17:55 ` [PATCH net-next 4/4] net: introduce a config option to tweak MAX_SKB_FRAGS Eric Dumazet
  3 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2022-02-10 17:55 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

We want to add a CONFIG_MAX_SKB_FRAGS option in the future,
and this will change MAX_SKB_FRAGS to an int type.

cxgbi_sock_tx_queue_up() has one pr_err() using %lu to
print SKB_WR_LIST_SIZE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/scsi/cxgbi/libcxgbi.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h
index 3687b5c0cf905827aa4ba70fbc21d42c15e7b3f7..ccf7ea612e1281bd3a3deb9fe5246549e257cc0d 100644
--- a/drivers/scsi/cxgbi/libcxgbi.h
+++ b/drivers/scsi/cxgbi/libcxgbi.h
@@ -382,7 +382,7 @@ static inline struct sk_buff *alloc_wr(int wrlen, int dlen, gfp_t gfp)
  * length of the gather list represented by an skb into the # of necessary WRs.
  * The extra two fragments are for iscsi bhs and payload padding.
  */
-#define SKB_WR_LIST_SIZE	 (MAX_SKB_FRAGS + 2)
+#define SKB_WR_LIST_SIZE	 ((unsigned long)MAX_SKB_FRAGS + 2)
 
 static inline void cxgbi_sock_reset_wr_list(struct cxgbi_sock *csk)
 {
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next 3/4] net: mvpp2: get rid of hard coded assumptions
  2022-02-10 17:55 [PATCH net-next 0/4] net: make MAX_SKB_FRAGS configurable Eric Dumazet
  2022-02-10 17:55 ` [PATCH net-next 1/4] af_packet: do not assume MAX_SKB_FRAGS is unsigned long Eric Dumazet
  2022-02-10 17:55 ` [PATCH net-next 2/4] scsi: iscsi: " Eric Dumazet
@ 2022-02-10 17:55 ` Eric Dumazet
  2022-02-10 17:55 ` [PATCH net-next 4/4] net: introduce a config option to tweak MAX_SKB_FRAGS Eric Dumazet
  3 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2022-02-10 17:55 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

A driver should not assume anything about sizeof(struct skb_shared_info),
or L1_CACHE_BYTES value.

Commit 704e624f7b3e ("net: mvvp2: fix short frame size on s390")
tried to fix this issue for s390, but it seems
MVPP2_BM_SHORT_FRAME_SIZE, MVPP2_BM_LONG_FRAME_SIZE and
MVPP2_BM_JUMBO_FRAME_SIZE should be precise.

We want to be able to tweak MAX_SKB_FRAGS in the future,
without breaking the build.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index ad73a488fc5fb6de4ddbf980355e31944b980e08..3dc0132a1fd569f7e75bfbef586c65163f0466c7 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -938,9 +938,9 @@ enum mvpp22_ptp_packet_format {
 #define MVPP2_BM_COOKIE_POOL_OFFS	8
 #define MVPP2_BM_COOKIE_CPU_OFFS	24
 
-#define MVPP2_BM_SHORT_FRAME_SIZE	736	/* frame size 128 */
-#define MVPP2_BM_LONG_FRAME_SIZE	2240	/* frame size 1664 */
-#define MVPP2_BM_JUMBO_FRAME_SIZE	10432	/* frame size 9856 */
+#define MVPP2_BM_SHORT_FRAME_SIZE	(128 + MVPP2_SKB_HEADROOM + MVPP2_SKB_SHINFO_SIZE)	/* frame size 128 */
+#define MVPP2_BM_LONG_FRAME_SIZE	(1664 + MVPP2_SKB_HEADROOM + MVPP2_SKB_SHINFO_SIZE)	/* frame size 1664 */
+#define MVPP2_BM_JUMBO_FRAME_SIZE	(9856 + MVPP2_SKB_HEADROOM + MVPP2_SKB_SHINFO_SIZE) /* frame size 9856 */
 /* BM short pool packet size
  * These value assure that for SWF the total number
  * of bytes allocated for each buffer will be 512
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next 4/4] net: introduce a config option to tweak MAX_SKB_FRAGS
  2022-02-10 17:55 [PATCH net-next 0/4] net: make MAX_SKB_FRAGS configurable Eric Dumazet
                   ` (2 preceding siblings ...)
  2022-02-10 17:55 ` [PATCH net-next 3/4] net: mvpp2: get rid of hard coded assumptions Eric Dumazet
@ 2022-02-10 17:55 ` Eric Dumazet
  2022-02-11 22:16   ` Jakub Kicinski
  3 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2022-02-10 17:55 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Currently, MAX_SKB_FRAGS value is 17.

For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
attempts order-3 allocations, stuffing 32768 bytes per frag.

But with zero copy, we use order-0 pages.

For BIG TCP to show its full potential, we add a config option
to be able to fit up to 45 segments per skb.

This is also needed for BIG TCP rx zerocopy, as zerocopy currently
does not support skbs with frag list.

We have used MAX_SKB_FRAGS=45 value for years at Google before
we deployed 4K MTU, with no adverse effect.

Back then, goal was to be able to receive full size (64KB) GRO
packets without the frag_list overhead.

By default we keep the old/legacy value of 17 until we get
more coverage for the updated values.

Sizes of struct skb_shared_info on 64bit arches

MAX_SKB_FRAGS | sizeof(struct skb_shared_info)
----------------------------------------------
         17     320
         21     320+64  = 384
         25     320+128 = 448
         29     320+192 = 512
         33     320+256 = 576
         37     320+320 = 640
         41     320+384 = 704
         45     320+448 = 768

This inflation might cause problems for drivers assuming they could pack
both the incoming packet and skb_shared_info in half a page, using build_skb().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h | 14 ++------------
 net/Kconfig            | 12 ++++++++++++
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a5adbf6b51e86f955b7f4fcd4a65e38adce97601..6bba71532415019d33cd98e172b5469fa7a5c1bd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -349,18 +349,8 @@ enum skb_drop_reason {
 	SKB_DROP_REASON_MAX,
 };
 
-/* To allow 64K frame to be packed as single skb without frag_list we
- * require 64K/PAGE_SIZE pages plus 1 additional page to allow for
- * buffers which do not start on a page boundary.
- *
- * Since GRO uses frags we allocate at least 16 regardless of page
- * size.
- */
-#if (65536/PAGE_SIZE + 1) < 16
-#define MAX_SKB_FRAGS 16UL
-#else
-#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
-#endif
+#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
+
 extern int sysctl_max_skb_frags;
 
 /* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
diff --git a/net/Kconfig b/net/Kconfig
index 8a1f9d0287de3c32040eee03b60114c6e6d150bc..7b96047911ee78bf61e9a290ad430261e4fc91c8 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -253,6 +253,18 @@ config PCPU_DEV_REFCNT
 	  network device refcount are using per cpu variables if this option is set.
 	  This can be forced to N to detect underflows (with a performance drop).
 
+config MAX_SKB_FRAGS
+	int "Maximum number of fragments per skb_shared_info"
+	range 17 45
+	default 17
+	help
+	  Having more fragments per skb_shared_info can help GRO efficiency.
+	  This helps BIG TCP workloads, but might expose bugs in some
+	  legacy drivers.
+	  This also increases memory overhead of small packets,
+	  and in drivers using build_skb().
+	  If unsure, say 17.
+
 config RPS
 	bool
 	depends on SMP && SYSFS
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next 4/4] net: introduce a config option to tweak MAX_SKB_FRAGS
  2022-02-10 17:55 ` [PATCH net-next 4/4] net: introduce a config option to tweak MAX_SKB_FRAGS Eric Dumazet
@ 2022-02-11 22:16   ` Jakub Kicinski
  2022-02-12  2:41     ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-02-11 22:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet, Coco Li

On Thu, 10 Feb 2022 09:55:57 -0800 Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Currently, MAX_SKB_FRAGS value is 17.
> 
> For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
> attempts order-3 allocations, stuffing 32768 bytes per frag.
> 
> But with zero copy, we use order-0 pages.

If I read this right BIG TCP works but for zc cases, without this patch,
but there's little point to applying this patch without BIG TCP.

Shouldn't the BIG TCP work go in first and then we'll worry about how
many frags can each skb carry?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next 4/4] net: introduce a config option to tweak MAX_SKB_FRAGS
  2022-02-11 22:16   ` Jakub Kicinski
@ 2022-02-12  2:41     ` Eric Dumazet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2022-02-12  2:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Eric Dumazet, David S . Miller, netdev, Coco Li

On Fri, Feb 11, 2022 at 2:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Feb 2022 09:55:57 -0800 Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Currently, MAX_SKB_FRAGS value is 17.
> >
> > For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
> > attempts order-3 allocations, stuffing 32768 bytes per frag.
> >
> > But with zero copy, we use order-0 pages.
>
> If I read this right BIG TCP works but for zc cases, without this patch,
> but there's little point to applying this patch without BIG TCP.
>
> Shouldn't the BIG TCP work go in first and then we'll worry about how
> many frags can each skb carry?

This is orthogonal really.

My guess is that most people do not use TCP RX zerocopy, apart from Google ?

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-02-12  2:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-10 17:55 [PATCH net-next 0/4] net: make MAX_SKB_FRAGS configurable Eric Dumazet
2022-02-10 17:55 ` [PATCH net-next 1/4] af_packet: do not assume MAX_SKB_FRAGS is unsigned long Eric Dumazet
2022-02-10 17:55 ` [PATCH net-next 2/4] scsi: iscsi: " Eric Dumazet
2022-02-10 17:55 ` [PATCH net-next 3/4] net: mvpp2: get rid of hard coded assumptions Eric Dumazet
2022-02-10 17:55 ` [PATCH net-next 4/4] net: introduce a config option to tweak MAX_SKB_FRAGS Eric Dumazet
2022-02-11 22:16   ` Jakub Kicinski
2022-02-12  2:41     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).