netdev.vger.kernel.org archive mirror
* [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
@ 2019-10-09 22:06 Josh Hunt
  2019-10-09 22:06 ` [PATCH 1/3] igb: Add " Josh Hunt
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Josh Hunt @ 2019-10-09 22:06 UTC (permalink / raw)
  To: netdev, willemb, intel-wired-lan; +Cc: Josh Hunt

Alexander Duyck posted a series in 2018 proposing to add UDP segmentation
offload support to ixgbe and ixgbevf, but those patches were never
accepted:

https://lore.kernel.org/netdev/20180504003556.4769.11407.stgit@localhost.localdomain/

This series is a repost of his ixgbe patch along with a similar
change to the igb and i40e drivers. Testing with the udpgso_bench_tx
benchmark shows noticeably lower CPU use at comparable or higher
throughput with these changes applied, for every send size above a
single 1472-byte segment.

All numbers below were collected with:
udpgso_bench_tx -C 1 -4 -D 172.25.43.133 -z -l 30 -u -S 0 -s $pkt_size
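
Every $pkt_size used below is a whole multiple of 1472 bytes, the UDP
payload that fits a 1500-byte MTU over IPv4 (1500 - 20 - 8). The sketch
below shows how many wire segments one send call would turn into,
assuming the segment size is that 1472-byte value. The helper names
udp_mss and segments_per_send are hypothetical and are not part of
udpgso_bench_tx.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: IPv4 without options over a 1500-byte MTU. */
enum { MTU = 1500, IPV4_HDR_LEN = 20, UDP_HDR_LEN = 8 };

/* Largest UDP payload that fits in one unfragmented frame: 1472 bytes. */
static size_t udp_mss(void)
{
	return MTU - IPV4_HDR_LEN - UDP_HDR_LEN;
}

/* Hypothetical helper: number of segments a single send of send_size
 * bytes is split into when each segment carries at most gso_size
 * payload bytes.
 */
static size_t segments_per_send(size_t send_size, size_t gso_size)
{
	assert(gso_size > 0);
	return (send_size + gso_size - 1) / gso_size;	/* round up */
}
```

Under these assumptions a 1472-byte send is a single segment, so there
is nothing to offload, which is consistent with the 1472 rows showing
almost no difference. A 61824-byte send is 42 segments.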

igb::

SW GSO (ethtool -K eth0 tx-udp-segmentation off):
$pkt_size	kB/s(sar)	MB/s	Calls/s	Msg/s	CPU	MB2CPU
========================================================================
1472		120143.64	113	81263	81263	83.55	1.35
2944		120160.09	114	40638	40638	62.88	1.81
5888		120160.64	114	20319	20319	43.59	2.61
11776		120160.76	114	10160	10160	37.52	3.03
23552		120159.25	114	5080	5080	34.75	3.28
47104		120160.55	114	2540	2540	32.83	3.47
61824		120160.56	114	1935	1935	32.09	3.55

HW GSO offload (ethtool -K eth0 tx-udp-segmentation on):
$pkt_size	kB/s(sar)	MB/s	Calls/s	Msg/s	CPU	MB2CPU
========================================================================
1472		120144.65	113	81264	81264	83.03	1.36
2944		120161.56	114	40638	40638	41	2.78
5888		120160.23	114	20319	20319	23.76	4.79
11776		120161.16	114	10160	10160	15.82	7.20
23552		120156.45	114	5079	5079	12.8	8.90
47104		120159.33	114	2540	2540	8.82	12.92
61824		120158.43	114	1935	1935	8.24	13.83

ixgbe::
SW GSO:
$pkt_size	kB/s(sar)	MB/s	Calls/s	Msg/s	CPU	MB2CPU
========================================================================
1472		1070565.90	1015	724112	724112	100	10.15
2944		1201579.19	1140	406342	406342	95.69	11.91
5888		1201217.55	1140	203185	203185	55.38	20.58
11776		1201613.49	1140	101588	101588	42.15	27.04
23552		1201631.32	1140	50795	50795	35.97	31.69
47104		1201626.38	1140	25397	25397	33.51	34.01
61824		1201625.52	1140	19350	19350	32.83	34.72

HW GSO Offload:
$pkt_size	kB/s(sar)	MB/s	Calls/s	Msg/s	CPU	MB2CPU
========================================================================
1472		1058681.25	1004	715954	715954	100	10.04
2944		1201730.86	1134	404254	404254	61.28	18.50
5888		1201776.61	1131	201608	201608	30.25	37.38
11776		1201795.90	1130	100676	100676	16.63	67.94
23552		1201807.90	1129	50304	50304	10.07	112.11
47104		1201748.35	1128	25143	25143	6.8	165.88
61824		1200770.45	1128	19140	19140	5.38	209.66

i40e::
SW GSO:
$pkt_size	kB/s(sar)	MB/s	Calls/s	Msg/s	CPU	MB2CPU
========================================================================
1472		650122.83	616	439362	439362	100	6.16
2944		943993.53	895	319042	319042	100	8.95
5888		1199751.90	1138	202857	202857	82.51	13.79
11776		1200288.08	1139	101477	101477	64.34	17.70
23552		1201596.56	1140	50793	50793	59.74	19.08
47104		1201597.98	1140	25396	25396	56.31	20.24
61824		1201610.43	1140	19350	19350	55.48	20.54

HW GSO offload:
$pkt_size	kB/s(sar)	MB/s	Calls/s	Msg/s	CPU	MB2CPU
========================================================================
1472		657424.83	623	444653	444653	100	6.23
2944		1201242.87	1139	406226	406226	91.45	12.45
5888		1201739.95	1140	203199	203199	57.46	19.83
11776		1201557.36	1140	101584	101584	36.83	30.95
23552		1201525.17	1140	50790	50790	23.86	47.77
47104		1201514.54	1140	25394	25394	17.45	65.32
61824		1201478.91	1140	19348	19348	14.79	77.07
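
The MB2CPU column is not defined in this posting. The values are
consistent with MB/s divided by the CPU percentage (for example
113 / 83.55 is about 1.35). A minimal sketch under that assumption,
using the hypothetical helper names mb_per_cpu and offload_gain:

```c
#include <assert.h>

/* Assumed definition: throughput in MB/s per percent of CPU used. */
static double mb_per_cpu(double mb_per_sec, double cpu_percent)
{
	assert(cpu_percent > 0.0);
	return mb_per_sec / cpu_percent;
}

/* Ratio of HW-offload efficiency to SW-GSO efficiency for one table row. */
static double offload_gain(double sw_mb, double sw_cpu,
			   double hw_mb, double hw_cpu)
{
	return mb_per_cpu(hw_mb, hw_cpu) / mb_per_cpu(sw_mb, sw_cpu);
}
```

For the ixgbe 61824-byte row, offload_gain(1140, 32.83, 1128, 5.38)
comes out at roughly 6, matching 209.66 / 34.72 from the tables.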

I was not sure how to properly attribute Alexander on the ixgbe patch, so
please adjust this as necessary.

Thanks!

Josh Hunt (3):
  igb: Add UDP segmentation offload support
  ixgbe: Add UDP segmentation offload support
  i40e: Add UDP segmentation offload support

 drivers/net/ethernet/intel/i40e/i40e_main.c   |  1 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 12 +++++++++---
 drivers/net/ethernet/intel/igb/e1000_82575.h  |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c     | 23 +++++++++++++++++------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 +++++++++++++++++++-----
 5 files changed, 47 insertions(+), 14 deletions(-)

-- 
2.7.4



* [PATCH 1/3] igb: Add UDP segmentation offload support
  2019-10-09 22:06 [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support Josh Hunt
@ 2019-10-09 22:06 ` Josh Hunt
  2019-10-09 22:06 ` [PATCH 2/3] ixgbe: " Josh Hunt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Josh Hunt @ 2019-10-09 22:06 UTC (permalink / raw)
  To: netdev, willemb, intel-wired-lan; +Cc: Josh Hunt, Alexander Duyck

Based on a series from Alexander Duyck, this change adds UDP segmentation
offload support to the igb driver.

CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Josh Hunt <johunt@akamai.com>
---
 drivers/net/ethernet/intel/igb/e1000_82575.h |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c    | 23 +++++++++++++++++------
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.h b/drivers/net/ethernet/intel/igb/e1000_82575.h
index 6ad775b1a4c5..63ec253ac788 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.h
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.h
@@ -127,6 +127,7 @@ struct e1000_adv_tx_context_desc {
 };
 
 #define E1000_ADVTXD_MACLEN_SHIFT    9  /* Adv ctxt desc mac len shift */
+#define E1000_ADVTXD_TUCMD_L4T_UDP 0x00000000  /* L4 Packet TYPE of UDP */
 #define E1000_ADVTXD_TUCMD_IPV4    0x00000400  /* IP Packet Type: 1=IPv4 */
 #define E1000_ADVTXD_TUCMD_L4T_TCP 0x00000800  /* L4 Packet TYPE of TCP */
 #define E1000_ADVTXD_TUCMD_L4T_SCTP 0x00001000 /* L4 packet TYPE of SCTP */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 105b0624081a..5eabfac5a18d 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2516,6 +2516,7 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev,
 	if (unlikely(mac_hdr_len > IGB_MAX_MAC_HDR_LEN))
 		return features & ~(NETIF_F_HW_CSUM |
 				    NETIF_F_SCTP_CRC |
+				    NETIF_F_GSO_UDP_L4 |
 				    NETIF_F_HW_VLAN_CTAG_TX |
 				    NETIF_F_TSO |
 				    NETIF_F_TSO6);
@@ -2524,6 +2525,7 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev,
 	if (unlikely(network_hdr_len >  IGB_MAX_NETWORK_HDR_LEN))
 		return features & ~(NETIF_F_HW_CSUM |
 				    NETIF_F_SCTP_CRC |
+				    NETIF_F_GSO_UDP_L4 |
 				    NETIF_F_TSO |
 				    NETIF_F_TSO6);
 
@@ -3120,7 +3122,7 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 			    NETIF_F_HW_CSUM;
 
 	if (hw->mac.type >= e1000_82576)
-		netdev->features |= NETIF_F_SCTP_CRC;
+		netdev->features |= NETIF_F_SCTP_CRC | NETIF_F_GSO_UDP_L4;
 
 	if (hw->mac.type >= e1000_i350)
 		netdev->features |= NETIF_F_HW_TC;
@@ -5694,6 +5696,7 @@ static int igb_tso(struct igb_ring *tx_ring,
 	} ip;
 	union {
 		struct tcphdr *tcp;
+		struct udphdr *udp;
 		unsigned char *hdr;
 	} l4;
 	u32 paylen, l4_offset;
@@ -5713,7 +5716,8 @@ static int igb_tso(struct igb_ring *tx_ring,
 	l4.hdr = skb_checksum_start(skb);
 
 	/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
-	type_tucmd = E1000_ADVTXD_TUCMD_L4T_TCP;
+	type_tucmd = (skb->csum_offset == offsetof(struct tcphdr, check)) ?
+		      E1000_ADVTXD_TUCMD_L4T_TCP : E1000_ADVTXD_TUCMD_L4T_UDP;
 
 	/* initialize outer IP header fields */
 	if (ip.v4->version == 4) {
@@ -5741,12 +5745,19 @@ static int igb_tso(struct igb_ring *tx_ring,
 	/* determine offset of inner transport header */
 	l4_offset = l4.hdr - skb->data;
 
-	/* compute length of segmentation header */
-	*hdr_len = (l4.tcp->doff * 4) + l4_offset;
-
 	/* remove payload length from inner checksum */
 	paylen = skb->len - l4_offset;
-	csum_replace_by_diff(&l4.tcp->check, htonl(paylen));
+	if (type_tucmd & E1000_ADVTXD_TUCMD_L4T_TCP) {
+		/* compute length of segmentation header */
+		*hdr_len = (l4.tcp->doff * 4) + l4_offset;
+		csum_replace_by_diff(&l4.tcp->check,
+				     (__force __wsum)htonl(paylen));
+	} else {
+		/* compute length of segmentation header */
+		*hdr_len = sizeof(*l4.udp) + l4_offset;
+		csum_replace_by_diff(&l4.udp->check,
+				     (__force __wsum)htonl(paylen));
+	}
 
 	/* update gso size and bytecount with header size */
 	first->gso_segs = skb_shinfo(skb)->gso_segs;
-- 
2.7.4
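
The hdr_len change in igb_tso() above can be illustrated outside the
kernel. This is only a sketch: fake_tcphdr and fake_udphdr are
hypothetical stand-ins that model just the fields needed here, and
tso_hdr_len is not a driver function. A TCP header length comes from
its data-offset field, while a UDP header is always 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the one TCP field used here; not the kernel's struct tcphdr. */
struct fake_tcphdr {
	uint8_t doff;	/* header length in 32-bit words (5..15) */
};

/* Stand-in with the same field layout and size as a UDP header (8 bytes). */
struct fake_udphdr {
	uint16_t source, dest, len, check;
};

/* Bytes from the start of the frame to the end of the L4 header. */
static uint32_t tso_hdr_len(int is_tcp, const struct fake_tcphdr *tcp,
			    uint32_t l4_offset)
{
	if (is_tcp) {
		assert(tcp != NULL && tcp->doff >= 5);
		return (uint32_t)tcp->doff * 4 + l4_offset;
	}
	return (uint32_t)sizeof(struct fake_udphdr) + l4_offset;
}
```

With a 34-byte l4_offset (14-byte Ethernet header plus 20-byte IPv4
header), an option-less TCP header gives 54 and UDP gives 42.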



* [PATCH 2/3] ixgbe: Add UDP segmentation offload support
  2019-10-09 22:06 [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support Josh Hunt
  2019-10-09 22:06 ` [PATCH 1/3] igb: Add " Josh Hunt
@ 2019-10-09 22:06 ` Josh Hunt
  2019-10-10  1:06   ` Josh Hunt
  2019-10-09 22:06 ` [PATCH 3/3] i40e: " Josh Hunt
  2019-10-09 22:44 ` [PATCH 0/3] igb, ixgbe, i40e " Alexander Duyck
  3 siblings, 1 reply; 14+ messages in thread
From: Josh Hunt @ 2019-10-09 22:06 UTC (permalink / raw)
  To: netdev, willemb, intel-wired-lan; +Cc: Josh Hunt, Alexander Duyck

Repost from a series by Alexander Duyck to add UDP segmentation offload
support to the ixgbe driver:
https://lore.kernel.org/netdev/20180504003916.4769.66271.stgit@localhost.localdomain/

CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Josh Hunt <johunt@akamai.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1ce2397306b9..2b01d264e5ce 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7946,6 +7946,7 @@ static int ixgbe_tso(struct ixgbe_ring *tx_ring,
 	} ip;
 	union {
 		struct tcphdr *tcp;
+		struct udphdr *udp;
 		unsigned char *hdr;
 	} l4;
 	u32 paylen, l4_offset;
@@ -7969,6 +7970,9 @@ static int ixgbe_tso(struct ixgbe_ring *tx_ring,
 	l4.hdr = skb_checksum_start(skb);
 
 	/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
+	type_tucmd = (skb->csum_offset == offsetof(struct tcphdr, check)) ?
+		      IXGBE_ADVTXD_TUCMD_L4T_TCP : IXGBE_ADVTXD_TUCMD_L4T_UDP;
+
 	type_tucmd = IXGBE_ADVTXD_TUCMD_L4T_TCP;
 
 	/* initialize outer IP header fields */
@@ -7999,12 +8003,20 @@ static int ixgbe_tso(struct ixgbe_ring *tx_ring,
 	/* determine offset of inner transport header */
 	l4_offset = l4.hdr - skb->data;
 
-	/* compute length of segmentation header */
-	*hdr_len = (l4.tcp->doff * 4) + l4_offset;
-
 	/* remove payload length from inner checksum */
 	paylen = skb->len - l4_offset;
-	csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
+
+	if (type_tucmd & IXGBE_ADVTXD_TUCMD_L4T_TCP) {
+		/* compute length of segmentation header */
+		*hdr_len = (l4.tcp->doff * 4) + l4_offset;
+		csum_replace_by_diff(&l4.tcp->check,
+				     (__force __wsum)htonl(paylen));
+	} else {
+		/* compute length of segmentation header */
+		*hdr_len = sizeof(*l4.udp) + l4_offset;
+		csum_replace_by_diff(&l4.udp->check,
+				     (__force __wsum)htonl(paylen));
+	}
 
 	/* update gso size and bytecount with header size */
 	first->gso_segs = skb_shinfo(skb)->gso_segs;
@@ -10190,6 +10202,7 @@ ixgbe_features_check(struct sk_buff *skb, struct net_device *dev,
 	if (unlikely(mac_hdr_len > IXGBE_MAX_MAC_HDR_LEN))
 		return features & ~(NETIF_F_HW_CSUM |
 				    NETIF_F_SCTP_CRC |
+				    NETIF_F_GSO_UDP_L4 |
 				    NETIF_F_HW_VLAN_CTAG_TX |
 				    NETIF_F_TSO |
 				    NETIF_F_TSO6);
@@ -10198,6 +10211,7 @@ ixgbe_features_check(struct sk_buff *skb, struct net_device *dev,
 	if (unlikely(network_hdr_len >  IXGBE_MAX_NETWORK_HDR_LEN))
 		return features & ~(NETIF_F_HW_CSUM |
 				    NETIF_F_SCTP_CRC |
+				    NETIF_F_GSO_UDP_L4 |
 				    NETIF_F_TSO |
 				    NETIF_F_TSO6);
 
@@ -10907,7 +10921,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 			    IXGBE_GSO_PARTIAL_FEATURES;
 
 	if (hw->mac.type >= ixgbe_mac_82599EB)
-		netdev->features |= NETIF_F_SCTP_CRC;
+		netdev->features |= NETIF_F_SCTP_CRC | NETIF_F_GSO_UDP_L4;
 
 #ifdef CONFIG_IXGBE_IPSEC
 #define IXGBE_ESP_FEATURES	(NETIF_F_HW_ESP | \
-- 
2.7.4



* [PATCH 3/3] i40e: Add UDP segmentation offload support
  2019-10-09 22:06 [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support Josh Hunt
  2019-10-09 22:06 ` [PATCH 1/3] igb: Add " Josh Hunt
  2019-10-09 22:06 ` [PATCH 2/3] ixgbe: " Josh Hunt
@ 2019-10-09 22:06 ` Josh Hunt
  2019-10-10  0:39   ` Samudrala, Sridhar
  2019-10-09 22:44 ` [PATCH 0/3] igb, ixgbe, i40e " Alexander Duyck
  3 siblings, 1 reply; 14+ messages in thread
From: Josh Hunt @ 2019-10-09 22:06 UTC (permalink / raw)
  To: netdev, willemb, intel-wired-lan; +Cc: Josh Hunt, Alexander Duyck

Based on a series from Alexander Duyck, this change adds UDP segmentation
offload support to the i40e driver.

CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Josh Hunt <johunt@akamai.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c |  1 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6031223eafab..56f8c52cbba1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12911,6 +12911,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 			  NETIF_F_GSO_IPXIP6		|
 			  NETIF_F_GSO_UDP_TUNNEL	|
 			  NETIF_F_GSO_UDP_TUNNEL_CSUM	|
+			  NETIF_F_GSO_UDP_L4		|
 			  NETIF_F_SCTP_CRC		|
 			  NETIF_F_RXHASH		|
 			  NETIF_F_RXCSUM		|
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e3f29dc8b290..0b32f04a6255 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2960,10 +2960,16 @@ static int i40e_tso(struct i40e_tx_buffer *first, u8 *hdr_len,
 
 	/* remove payload length from inner checksum */
 	paylen = skb->len - l4_offset;
-	csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
 
-	/* compute length of segmentation header */
-	*hdr_len = (l4.tcp->doff * 4) + l4_offset;
+	if (skb->csum_offset == offsetof(struct tcphdr, check)) {
+		csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
+		/* compute length of segmentation header */
+		*hdr_len = (l4.tcp->doff * 4) + l4_offset;
+	} else {
+		csum_replace_by_diff(&l4.udp->check, (__force __wsum)htonl(paylen));
+		/* compute length of segmentation header */
+		*hdr_len = sizeof(*l4.udp) + l4_offset;
+	}
 
 	/* pull values out of skb_shinfo */
 	gso_size = skb_shinfo(skb)->gso_size;
-- 
2.7.4



* Re: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
  2019-10-09 22:06 [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support Josh Hunt
                   ` (2 preceding siblings ...)
  2019-10-09 22:06 ` [PATCH 3/3] i40e: " Josh Hunt
@ 2019-10-09 22:44 ` Alexander Duyck
  2019-10-10 21:17   ` Josh Hunt
  3 siblings, 1 reply; 14+ messages in thread
From: Alexander Duyck @ 2019-10-09 22:44 UTC (permalink / raw)
  To: Josh Hunt; +Cc: Netdev, Willem de Bruijn, intel-wired-lan

On Wed, Oct 9, 2019 at 3:08 PM Josh Hunt <johunt@akamai.com> wrote:
>
> Alexander Duyck posted a series in 2018 proposing adding UDP segmentation
> offload support to ixgbe and ixgbevf, but those patches were never
> accepted:
>
> https://lore.kernel.org/netdev/20180504003556.4769.11407.stgit@localhost.localdomain/
>
> This series is a repost of his ixgbe patch along with a similar
> change to the igb and i40e drivers. Testing using the udpgso_bench_tx
> benchmark shows a noticeable performance improvement with these changes
> applied.
>
> All #s below were run with:
> udpgso_bench_tx -C 1 -4 -D 172.25.43.133 -z -l 30 -u -S 0 -s $pkt_size
>
> igb::
>
> SW GSO (ethtool -K eth0 tx-udp-segmentation off):
> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> ========================================================================
> 1472            120143.64       113     81263   81263   83.55   1.35
> 2944            120160.09       114     40638   40638   62.88   1.81
> 5888            120160.64       114     20319   20319   43.59   2.61
> 11776           120160.76       114     10160   10160   37.52   3.03
> 23552           120159.25       114     5080    5080    34.75   3.28
> 47104           120160.55       114     2540    2540    32.83   3.47
> 61824           120160.56       114     1935    1935    32.09   3.55
>
> HW GSO offload (ethtool -K eth0 tx-udp-segmentation on):
> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> ========================================================================
> 1472            120144.65       113     81264   81264   83.03   1.36
> 2944            120161.56       114     40638   40638   41      2.78
> 5888            120160.23       114     20319   20319   23.76   4.79
> 11776           120161.16       114     10160   10160   15.82   7.20
> 23552           120156.45       114     5079    5079    12.8    8.90
> 47104           120159.33       114     2540    2540    8.82    12.92
> 61824           120158.43       114     1935    1935    8.24    13.83
>
> ixgbe::
> SW GSO:
> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> ========================================================================
> 1472            1070565.90      1015    724112  724112  100     10.15
> 2944            1201579.19      1140    406342  406342  95.69   11.91
> 5888            1201217.55      1140    203185  203185  55.38   20.58
> 11776           1201613.49      1140    101588  101588  42.15   27.04
> 23552           1201631.32      1140    50795   50795   35.97   31.69
> 47104           1201626.38      1140    25397   25397   33.51   34.01
> 61824           1201625.52      1140    19350   19350   32.83   34.72
>
> HW GSO Offload:
> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> ========================================================================
> 1472            1058681.25      1004    715954  715954  100     10.04
> 2944            1201730.86      1134    404254  404254  61.28   18.50
> 5888            1201776.61      1131    201608  201608  30.25   37.38
> 11776           1201795.90      1130    100676  100676  16.63   67.94
> 23552           1201807.90      1129    50304   50304   10.07   112.11
> 47104           1201748.35      1128    25143   25143   6.8     165.88
> 61824           1200770.45      1128    19140   19140   5.38    209.66
>
> i40e::
> SW GSO:
> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> ========================================================================
> 1472            650122.83       616     439362  439362  100     6.16
> 2944            943993.53       895     319042  319042  100     8.95
> 5888            1199751.90      1138    202857  202857  82.51   13.79
> 11776           1200288.08      1139    101477  101477  64.34   17.70
> 23552           1201596.56      1140    50793   50793   59.74   19.08
> 47104           1201597.98      1140    25396   25396   56.31   20.24
> 61824           1201610.43      1140    19350   19350   55.48   20.54
>
> HW GSO offload:
> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> ========================================================================
> 1472            657424.83       623     444653  444653  100     6.23
> 2944            1201242.87      1139    406226  406226  91.45   12.45
> 5888            1201739.95      1140    203199  203199  57.46   19.83
> 11776           1201557.36      1140    101584  101584  36.83   30.95
> 23552           1201525.17      1140    50790   50790   23.86   47.77
> 47104           1201514.54      1140    25394   25394   17.45   65.32
> 61824           1201478.91      1140    19348   19348   14.79   77.07
>
> I was not sure how to proper attribute Alexander on the ixgbe patch so
> please adjust this as necessary.

For the ixgbe patch I would be good with:
Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>

The big hurdle for this will be validation. I know that there are some
parts such as the 82598 in the case of the ixgbe driver or 82575 in
the case of igb that didn't support the feature, and I wasn't sure
about the parts supported by i40e either.  From what I can tell the
x710 datasheet seems to indicate that it is supported, and you were
able to get it working with your patch based on the numbers above. So
that just leaves validation of the x722 and making sure there isn't
anything firmware-wise on the i40e parts that may cause any issues.

Thanks.

- Alex


* Re: [PATCH 3/3] i40e: Add UDP segmentation offload support
  2019-10-09 22:06 ` [PATCH 3/3] i40e: " Josh Hunt
@ 2019-10-10  0:39   ` Samudrala, Sridhar
  2019-10-10  0:54     ` Josh Hunt
  0 siblings, 1 reply; 14+ messages in thread
From: Samudrala, Sridhar @ 2019-10-10  0:39 UTC (permalink / raw)
  To: Josh Hunt, netdev, willemb, intel-wired-lan; +Cc: Alexander Duyck



On 10/9/2019 3:06 PM, Josh Hunt wrote:
> Based on a series from Alexander Duyck this change adds UDP segmentation
> offload support to the i40e driver.
> 
> CC: Alexander Duyck <alexander.h.duyck@intel.com>
> CC: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Josh Hunt <johunt@akamai.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_main.c |  1 +
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c | 12 +++++++++---
>   2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 6031223eafab..56f8c52cbba1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -12911,6 +12911,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
>   			  NETIF_F_GSO_IPXIP6		|
>   			  NETIF_F_GSO_UDP_TUNNEL	|
>   			  NETIF_F_GSO_UDP_TUNNEL_CSUM	|
> +			  NETIF_F_GSO_UDP_L4		|
>   			  NETIF_F_SCTP_CRC		|
>   			  NETIF_F_RXHASH		|
>   			  NETIF_F_RXCSUM		|
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index e3f29dc8b290..0b32f04a6255 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2960,10 +2960,16 @@ static int i40e_tso(struct i40e_tx_buffer *first, u8 *hdr_len,
>   
>   	/* remove payload length from inner checksum */
>   	paylen = skb->len - l4_offset;
> -	csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
>   
> -	/* compute length of segmentation header */
> -	*hdr_len = (l4.tcp->doff * 4) + l4_offset;
> +	if (skb->csum_offset == offsetof(struct tcphdr, check)) {

Isn't it more relevant to check for gso_type rather than base this on 
the csum_offset?


> +		csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
> +		/* compute length of segmentation header */
> +		*hdr_len = (l4.tcp->doff * 4) + l4_offset;
> +	} else {
> +		csum_replace_by_diff(&l4.udp->check, (__force __wsum)htonl(paylen));
> +		/* compute length of segmentation header */
> +		*hdr_len = sizeof(*l4.udp) + l4_offset;
> +	}
>   
>   	/* pull values out of skb_shinfo */
>   	gso_size = skb_shinfo(skb)->gso_size;
> 


* Re: [PATCH 3/3] i40e: Add UDP segmentation offload support
  2019-10-10  0:39   ` Samudrala, Sridhar
@ 2019-10-10  0:54     ` Josh Hunt
  0 siblings, 0 replies; 14+ messages in thread
From: Josh Hunt @ 2019-10-10  0:54 UTC (permalink / raw)
  To: Samudrala, Sridhar, netdev, willemb, intel-wired-lan; +Cc: Alexander Duyck

On 10/9/19 5:39 PM, Samudrala, Sridhar wrote:
> 
> 
> On 10/9/2019 3:06 PM, Josh Hunt wrote:
>> Based on a series from Alexander Duyck this change adds UDP segmentation
>> offload support to the i40e driver.
>>
>> CC: Alexander Duyck <alexander.h.duyck@intel.com>
>> CC: Willem de Bruijn <willemb@google.com>
>> Signed-off-by: Josh Hunt <johunt@akamai.com>
>> ---
>>   drivers/net/ethernet/intel/i40e/i40e_main.c |  1 +
>>   drivers/net/ethernet/intel/i40e/i40e_txrx.c | 12 +++++++++---
>>   2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
>> b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> index 6031223eafab..56f8c52cbba1 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> @@ -12911,6 +12911,7 @@ static int i40e_config_netdev(struct i40e_vsi 
>> *vsi)
>>                 NETIF_F_GSO_IPXIP6        |
>>                 NETIF_F_GSO_UDP_TUNNEL    |
>>                 NETIF_F_GSO_UDP_TUNNEL_CSUM    |
>> +              NETIF_F_GSO_UDP_L4        |
>>                 NETIF_F_SCTP_CRC        |
>>                 NETIF_F_RXHASH        |
>>                 NETIF_F_RXCSUM        |
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
>> b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> index e3f29dc8b290..0b32f04a6255 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> @@ -2960,10 +2960,16 @@ static int i40e_tso(struct i40e_tx_buffer 
>> *first, u8 *hdr_len,
>>       /* remove payload length from inner checksum */
>>       paylen = skb->len - l4_offset;
>> -    csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
>> -    /* compute length of segmentation header */
>> -    *hdr_len = (l4.tcp->doff * 4) + l4_offset;
>> +    if (skb->csum_offset == offsetof(struct tcphdr, check)) {
> 
> Isn't it more relevant to check for gso_type rather than base this on 
> the csum_offset?
Thanks Sridhar for the review. Yeah I think you're right. I will change 
this on all 3 patches.

Josh
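
One way the gso_type check suggested above might look, sketched in
userspace. The flag values and the struct are stand-ins for illustration
only. In the driver the test would presumably be made against
skb_shinfo(skb)->gso_type using the kernel's SKB_GSO_UDP_L4 constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; the real SKB_GSO_* values are in the kernel. */
enum {
	FAKE_GSO_TCPV4  = 1 << 0,
	FAKE_GSO_TCPV6  = 1 << 1,
	FAKE_GSO_UDP_L4 = 1 << 2,
};

/* Minimal stand-in for the part of skb_shared_info used here. */
struct fake_shinfo {
	unsigned int gso_type;
};

/* True when the packet was queued for UDP (not TCP) segmentation. */
static bool is_udp_gso(const struct fake_shinfo *si)
{
	assert(si != NULL);
	return (si->gso_type & FAKE_GSO_UDP_L4) != 0;
}
```

The appeal of this form is that gso_type names the segmentation
protocol directly, whereas csum_offset only implies it through the
position of the checksum field.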


* Re: [PATCH 2/3] ixgbe: Add UDP segmentation offload support
  2019-10-09 22:06 ` [PATCH 2/3] ixgbe: " Josh Hunt
@ 2019-10-10  1:06   ` Josh Hunt
  0 siblings, 0 replies; 14+ messages in thread
From: Josh Hunt @ 2019-10-10  1:06 UTC (permalink / raw)
  To: netdev, willemb, intel-wired-lan; +Cc: Alexander Duyck

On 10/9/19 3:06 PM, Josh Hunt wrote:
> Repost from a series by Alexander Duyck to add UDP segmentation offload
> support to the igb driver:
> https://lore.kernel.org/netdev/20180504003916.4769.66271.stgit@localhost.localdomain/
> 
> CC: Alexander Duyck <alexander.h.duyck@intel.com>
> CC: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Josh Hunt <johunt@akamai.com>
> ---
>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 +++++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 1ce2397306b9..2b01d264e5ce 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -7946,6 +7946,7 @@ static int ixgbe_tso(struct ixgbe_ring *tx_ring,
>   	} ip;
>   	union {
>   		struct tcphdr *tcp;
> +		struct udphdr *udp;
>   		unsigned char *hdr;
>   	} l4;
>   	u32 paylen, l4_offset;
> @@ -7969,6 +7970,9 @@ static int ixgbe_tso(struct ixgbe_ring *tx_ring,
>   	l4.hdr = skb_checksum_start(skb);
>   
>   	/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
> +	type_tucmd = (skb->csum_offset == offsetof(struct tcphdr, check)) ?
> +		      IXGBE_ADVTXD_TUCMD_L4T_TCP : IXGBE_ADVTXD_TUCMD_L4T_UDP;
> +
>   	type_tucmd = IXGBE_ADVTXD_TUCMD_L4T_TCP;

Copy/paste bug ^^. Will fix in v2.

Josh
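
To spell out why the leftover assignment matters: it overwrites the
conditional result, so the UDP type is never selected. Below is a
userspace sketch of the intended selection. The FAKE_TUCMD_* constants
are stand-ins that mirror the values shown for igb in patch 1/3 (the
ixgbe values are assumed to match), select_tucmd is a hypothetical name,
and 16 and 6 are the byte offsets of the checksum field in the TCP and
UDP headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins; assumed to match the driver's IXGBE_ADVTXD_TUCMD_L4T_* values. */
#define FAKE_TUCMD_L4T_UDP 0x00000000u
#define FAKE_TUCMD_L4T_TCP 0x00000800u

/* Byte offset of the checksum field within each L4 header. */
enum { TCP_CSUM_OFFSET = 16, UDP_CSUM_OFFSET = 6 };

/* Choose the L4 type once, with no second assignment overwriting it. */
static uint32_t select_tucmd(uint16_t csum_offset)
{
	assert(csum_offset == TCP_CSUM_OFFSET ||
	       csum_offset == UDP_CSUM_OFFSET);
	return csum_offset == TCP_CSUM_OFFSET ? FAKE_TUCMD_L4T_TCP
					      : FAKE_TUCMD_L4T_UDP;
}
```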


* Re: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
  2019-10-09 22:44 ` [PATCH 0/3] igb, ixgbe, i40e " Alexander Duyck
@ 2019-10-10 21:17   ` Josh Hunt
  2019-10-10 21:32     ` Alexander Duyck
  0 siblings, 1 reply; 14+ messages in thread
From: Josh Hunt @ 2019-10-10 21:17 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Netdev, Willem de Bruijn, intel-wired-lan

On 10/9/19 3:44 PM, Alexander Duyck wrote:
> On Wed, Oct 9, 2019 at 3:08 PM Josh Hunt <johunt@akamai.com> wrote:
>>
>> Alexander Duyck posted a series in 2018 proposing adding UDP segmentation
>> offload support to ixgbe and ixgbevf, but those patches were never
>> accepted:
>>
>> https://lore.kernel.org/netdev/20180504003556.4769.11407.stgit@localhost.localdomain/
>>
>> This series is a repost of his ixgbe patch along with a similar
>> change to the igb and i40e drivers. Testing using the udpgso_bench_tx
>> benchmark shows a noticeable performance improvement with these changes
>> applied.
>>
>> All #s below were run with:
>> udpgso_bench_tx -C 1 -4 -D 172.25.43.133 -z -l 30 -u -S 0 -s $pkt_size
>>
>> igb::
>>
>> SW GSO (ethtool -K eth0 tx-udp-segmentation off):
>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>> ========================================================================
>> 1472            120143.64       113     81263   81263   83.55   1.35
>> 2944            120160.09       114     40638   40638   62.88   1.81
>> 5888            120160.64       114     20319   20319   43.59   2.61
>> 11776           120160.76       114     10160   10160   37.52   3.03
>> 23552           120159.25       114     5080    5080    34.75   3.28
>> 47104           120160.55       114     2540    2540    32.83   3.47
>> 61824           120160.56       114     1935    1935    32.09   3.55
>>
>> HW GSO offload (ethtool -K eth0 tx-udp-segmentation on):
>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>> ========================================================================
>> 1472            120144.65       113     81264   81264   83.03   1.36
>> 2944            120161.56       114     40638   40638   41      2.78
>> 5888            120160.23       114     20319   20319   23.76   4.79
>> 11776           120161.16       114     10160   10160   15.82   7.20
>> 23552           120156.45       114     5079    5079    12.8    8.90
>> 47104           120159.33       114     2540    2540    8.82    12.92
>> 61824           120158.43       114     1935    1935    8.24    13.83
>>
>> ixgbe::
>> SW GSO:
>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>> ========================================================================
>> 1472            1070565.90      1015    724112  724112  100     10.15
>> 2944            1201579.19      1140    406342  406342  95.69   11.91
>> 5888            1201217.55      1140    203185  203185  55.38   20.58
>> 11776           1201613.49      1140    101588  101588  42.15   27.04
>> 23552           1201631.32      1140    50795   50795   35.97   31.69
>> 47104           1201626.38      1140    25397   25397   33.51   34.01
>> 61824           1201625.52      1140    19350   19350   32.83   34.72
>>
>> HW GSO Offload:
>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>> ========================================================================
>> 1472            1058681.25      1004    715954  715954  100     10.04
>> 2944            1201730.86      1134    404254  404254  61.28   18.50
>> 5888            1201776.61      1131    201608  201608  30.25   37.38
>> 11776           1201795.90      1130    100676  100676  16.63   67.94
>> 23552           1201807.90      1129    50304   50304   10.07   112.11
>> 47104           1201748.35      1128    25143   25143   6.8     165.88
>> 61824           1200770.45      1128    19140   19140   5.38    209.66
>>
>> i40e::
>> SW GSO:
>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>> ========================================================================
>> 1472            650122.83       616     439362  439362  100     6.16
>> 2944            943993.53       895     319042  319042  100     8.95
>> 5888            1199751.90      1138    202857  202857  82.51   13.79
>> 11776           1200288.08      1139    101477  101477  64.34   17.70
>> 23552           1201596.56      1140    50793   50793   59.74   19.08
>> 47104           1201597.98      1140    25396   25396   56.31   20.24
>> 61824           1201610.43      1140    19350   19350   55.48   20.54
>>
>> HW GSO offload:
>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>> ========================================================================
>> 1472            657424.83       623     444653  444653  100     6.23
>> 2944            1201242.87      1139    406226  406226  91.45   12.45
>> 5888            1201739.95      1140    203199  203199  57.46   19.83
>> 11776           1201557.36      1140    101584  101584  36.83   30.95
>> 23552           1201525.17      1140    50790   50790   23.86   47.77
>> 47104           1201514.54      1140    25394   25394   17.45   65.32
>> 61824           1201478.91      1140    19348   19348   14.79   77.07
>>
>> I was not sure how to properly attribute Alexander on the ixgbe patch so
>> please adjust this as necessary.
> 
> For the ixgbe patch I would be good with:
> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> The big hurdle for this will be validation. I know that there are some
> parts such as the 82598 in the case of the ixgbe driver or 82575 in
> the case of igb that didn't support the feature, and I wasn't sure
> about the parts supported by i40e either.  From what I can tell the
> x710 datasheet seems to indicate that it is supported, and you were
> able to get it working with your patch based on the numbers above. So
> that just leaves validation of the x722 and making sure there isn't
> anything firmware-wise on the i40e parts that may cause any issues.

Thanks for the feedback, Alex.

For validation, I will look around and see if we have any of the above 
chips in our testbeds. The above #s are from i210, 82599ES, and x710 
respectively. I'm happy to share my wrapper script for the gso selftest 
if others have the missing chipsets and can verify.
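
For anyone comparing their own runs against the tables above: the MB2CPU
column appears to be the MB/s column divided by the CPU column (for example,
114 / 32.09 gives the 3.55 in the last igb SW GSO row). A minimal sketch
under that assumption; mb2cpu is a hypothetical helper and is not part of
udpgso_bench_tx or the wrapper script:

```shell
#!/bin/sh
# Hypothetical helper: throughput per unit of CPU, assuming
# MB2CPU = MB/s divided by CPU utilization (in percent).
# $1 = MB/s, $2 = CPU utilization
mb2cpu() {
    awk -v mbps="$1" -v cpu="$2" 'BEGIN { printf "%.2f\n", mbps / cpu }'
}
```

Calling it as "mb2cpu 114 32.09" reproduces the igb figure; a higher value
means more data moved per unit of CPU spent.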

Thanks!
Josh


* Re: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
  2019-10-10 21:17   ` Josh Hunt
@ 2019-10-10 21:32     ` Alexander Duyck
  2019-10-11  0:07       ` Josh Hunt
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Duyck @ 2019-10-10 21:32 UTC (permalink / raw)
  To: Josh Hunt; +Cc: Netdev, Willem de Bruijn, intel-wired-lan, Brown, Aaron F

On Thu, Oct 10, 2019 at 2:17 PM Josh Hunt <johunt@akamai.com> wrote:
>
> On 10/9/19 3:44 PM, Alexander Duyck wrote:
> > On Wed, Oct 9, 2019 at 3:08 PM Josh Hunt <johunt@akamai.com> wrote:
> >>
> >> Alexander Duyck posted a series in 2018 proposing adding UDP segmentation
> >> offload support to ixgbe and ixgbevf, but those patches were never
> >> accepted:
> >>
> >> https://lore.kernel.org/netdev/20180504003556.4769.11407.stgit@localhost.localdomain/
> >>
> >> This series is a repost of his ixgbe patch along with a similar
> >> change to the igb and i40e drivers. Testing using the udpgso_bench_tx
> >> benchmark shows a noticeable performance improvement with these changes
> >> applied.
> >>
> >> All #s below were run with:
> >> udpgso_bench_tx -C 1 -4 -D 172.25.43.133 -z -l 30 -u -S 0 -s $pkt_size
> >>
> >> igb::
> >>
> >> SW GSO (ethtool -K eth0 tx-udp-segmentation off):
> >> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >> ========================================================================
> >> 1472            120143.64       113     81263   81263   83.55   1.35
> >> 2944            120160.09       114     40638   40638   62.88   1.81
> >> 5888            120160.64       114     20319   20319   43.59   2.61
> >> 11776           120160.76       114     10160   10160   37.52   3.03
> >> 23552           120159.25       114     5080    5080    34.75   3.28
> >> 47104           120160.55       114     2540    2540    32.83   3.47
> >> 61824           120160.56       114     1935    1935    32.09   3.55
> >>
> >> HW GSO offload (ethtool -K eth0 tx-udp-segmentation on):
> >> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >> ========================================================================
> >> 1472            120144.65       113     81264   81264   83.03   1.36
> >> 2944            120161.56       114     40638   40638   41      2.78
> >> 5888            120160.23       114     20319   20319   23.76   4.79
> >> 11776           120161.16       114     10160   10160   15.82   7.20
> >> 23552           120156.45       114     5079    5079    12.8    8.90
> >> 47104           120159.33       114     2540    2540    8.82    12.92
> >> 61824           120158.43       114     1935    1935    8.24    13.83
> >>
> >> ixgbe::
> >> SW GSO:
> >> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >> ========================================================================
> >> 1472            1070565.90      1015    724112  724112  100     10.15
> >> 2944            1201579.19      1140    406342  406342  95.69   11.91
> >> 5888            1201217.55      1140    203185  203185  55.38   20.58
> >> 11776           1201613.49      1140    101588  101588  42.15   27.04
> >> 23552           1201631.32      1140    50795   50795   35.97   31.69
> >> 47104           1201626.38      1140    25397   25397   33.51   34.01
> >> 61824           1201625.52      1140    19350   19350   32.83   34.72
> >>
> >> HW GSO Offload:
> >> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >> ========================================================================
> >> 1472            1058681.25      1004    715954  715954  100     10.04
> >> 2944            1201730.86      1134    404254  404254  61.28   18.50
> >> 5888            1201776.61      1131    201608  201608  30.25   37.38
> >> 11776           1201795.90      1130    100676  100676  16.63   67.94
> >> 23552           1201807.90      1129    50304   50304   10.07   112.11
> >> 47104           1201748.35      1128    25143   25143   6.8     165.88
> >> 61824           1200770.45      1128    19140   19140   5.38    209.66
> >>
> >> i40e::
> >> SW GSO:
> >> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >> ========================================================================
> >> 1472            650122.83       616     439362  439362  100     6.16
> >> 2944            943993.53       895     319042  319042  100     8.95
> >> 5888            1199751.90      1138    202857  202857  82.51   13.79
> >> 11776           1200288.08      1139    101477  101477  64.34   17.70
> >> 23552           1201596.56      1140    50793   50793   59.74   19.08
> >> 47104           1201597.98      1140    25396   25396   56.31   20.24
> >> 61824           1201610.43      1140    19350   19350   55.48   20.54
> >>
> >> HW GSO offload:
> >> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >> ========================================================================
> >> 1472            657424.83       623     444653  444653  100     6.23
> >> 2944            1201242.87      1139    406226  406226  91.45   12.45
> >> 5888            1201739.95      1140    203199  203199  57.46   19.83
> >> 11776           1201557.36      1140    101584  101584  36.83   30.95
> >> 23552           1201525.17      1140    50790   50790   23.86   47.77
> >> 47104           1201514.54      1140    25394   25394   17.45   65.32
> >> 61824           1201478.91      1140    19348   19348   14.79   77.07
> >>
> >> I was not sure how to properly attribute Alexander on the ixgbe patch so
> >> please adjust this as necessary.
> >
> > For the ixgbe patch I would be good with:
> > Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > The big hurdle for this will be validation. I know that there are some
> > parts such as the 82598 in the case of the ixgbe driver or 82575 in
> > the case of igb that didn't support the feature, and I wasn't sure
> > about the parts supported by i40e either.  From what I can tell the
> > x710 datasheet seems to indicate that it is supported, and you were
> > able to get it working with your patch based on the numbers above. So
> > that just leaves validation of the x722 and making sure there isn't
> > anything firmware-wise on the i40e parts that may cause any issues.
>
> Thanks for the feedback, Alex.
>
> For validation, I will look around and see if we have any of the above
> chips in our testbeds. The above #s are from i210, 82599ES, and x710
> respectively. I'm happy to share my wrapper script for the gso selftest
> if others have the missing chipsets and can verify.
>
> Thanks!
> Josh

If you could share your test scripts that would be great. I believe
the networking division will have access to more hardware so if you
could include Aaron, who I added to the Cc, in your reply with the
script that would be great as I am sure he can forward it on to
whoever ends up having to ultimately test this patch set.

I'll keep an eye out for v2 of your patch set and review it when it is
available.

Thanks.

- Alex


* Re: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
  2019-10-10 21:32     ` Alexander Duyck
@ 2019-10-11  0:07       ` Josh Hunt
  2019-10-11  0:21         ` Brown, Aaron F
  0 siblings, 1 reply; 14+ messages in thread
From: Josh Hunt @ 2019-10-11  0:07 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Netdev, Willem de Bruijn, intel-wired-lan, Brown, Aaron F

[-- Attachment #1: Type: text/plain, Size: 7727 bytes --]

On 10/10/19 2:32 PM, Alexander Duyck wrote:
> On Thu, Oct 10, 2019 at 2:17 PM Josh Hunt <johunt@akamai.com> wrote:
>>
>> On 10/9/19 3:44 PM, Alexander Duyck wrote:
>>> On Wed, Oct 9, 2019 at 3:08 PM Josh Hunt <johunt@akamai.com> wrote:
>>>>
>>>> Alexander Duyck posted a series in 2018 proposing adding UDP segmentation
>>>> offload support to ixgbe and ixgbevf, but those patches were never
>>>> accepted:
>>>>
>>>> https://lore.kernel.org/netdev/20180504003556.4769.11407.stgit@localhost.localdomain/
>>>>
>>>> This series is a repost of his ixgbe patch along with a similar
>>>> change to the igb and i40e drivers. Testing using the udpgso_bench_tx
>>>> benchmark shows a noticeable performance improvement with these changes
>>>> applied.
>>>>
>>>> All #s below were run with:
>>>> udpgso_bench_tx -C 1 -4 -D 172.25.43.133 -z -l 30 -u -S 0 -s $pkt_size
>>>>
>>>> igb::
>>>>
>>>> SW GSO (ethtool -K eth0 tx-udp-segmentation off):
>>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>>>> ========================================================================
>>>> 1472            120143.64       113     81263   81263   83.55   1.35
>>>> 2944            120160.09       114     40638   40638   62.88   1.81
>>>> 5888            120160.64       114     20319   20319   43.59   2.61
>>>> 11776           120160.76       114     10160   10160   37.52   3.03
>>>> 23552           120159.25       114     5080    5080    34.75   3.28
>>>> 47104           120160.55       114     2540    2540    32.83   3.47
>>>> 61824           120160.56       114     1935    1935    32.09   3.55
>>>>
>>>> HW GSO offload (ethtool -K eth0 tx-udp-segmentation on):
>>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>>>> ========================================================================
>>>> 1472            120144.65       113     81264   81264   83.03   1.36
>>>> 2944            120161.56       114     40638   40638   41      2.78
>>>> 5888            120160.23       114     20319   20319   23.76   4.79
>>>> 11776           120161.16       114     10160   10160   15.82   7.20
>>>> 23552           120156.45       114     5079    5079    12.8    8.90
>>>> 47104           120159.33       114     2540    2540    8.82    12.92
>>>> 61824           120158.43       114     1935    1935    8.24    13.83
>>>>
>>>> ixgbe::
>>>> SW GSO:
>>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>>>> ========================================================================
>>>> 1472            1070565.90      1015    724112  724112  100     10.15
>>>> 2944            1201579.19      1140    406342  406342  95.69   11.91
>>>> 5888            1201217.55      1140    203185  203185  55.38   20.58
>>>> 11776           1201613.49      1140    101588  101588  42.15   27.04
>>>> 23552           1201631.32      1140    50795   50795   35.97   31.69
>>>> 47104           1201626.38      1140    25397   25397   33.51   34.01
>>>> 61824           1201625.52      1140    19350   19350   32.83   34.72
>>>>
>>>> HW GSO Offload:
>>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>>>> ========================================================================
>>>> 1472            1058681.25      1004    715954  715954  100     10.04
>>>> 2944            1201730.86      1134    404254  404254  61.28   18.50
>>>> 5888            1201776.61      1131    201608  201608  30.25   37.38
>>>> 11776           1201795.90      1130    100676  100676  16.63   67.94
>>>> 23552           1201807.90      1129    50304   50304   10.07   112.11
>>>> 47104           1201748.35      1128    25143   25143   6.8     165.88
>>>> 61824           1200770.45      1128    19140   19140   5.38    209.66
>>>>
>>>> i40e::
>>>> SW GSO:
>>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>>>> ========================================================================
>>>> 1472            650122.83       616     439362  439362  100     6.16
>>>> 2944            943993.53       895     319042  319042  100     8.95
>>>> 5888            1199751.90      1138    202857  202857  82.51   13.79
>>>> 11776           1200288.08      1139    101477  101477  64.34   17.70
>>>> 23552           1201596.56      1140    50793   50793   59.74   19.08
>>>> 47104           1201597.98      1140    25396   25396   56.31   20.24
>>>> 61824           1201610.43      1140    19350   19350   55.48   20.54
>>>>
>>>> HW GSO offload:
>>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
>>>> ========================================================================
>>>> 1472            657424.83       623     444653  444653  100     6.23
>>>> 2944            1201242.87      1139    406226  406226  91.45   12.45
>>>> 5888            1201739.95      1140    203199  203199  57.46   19.83
>>>> 11776           1201557.36      1140    101584  101584  36.83   30.95
>>>> 23552           1201525.17      1140    50790   50790   23.86   47.77
>>>> 47104           1201514.54      1140    25394   25394   17.45   65.32
>>>> 61824           1201478.91      1140    19348   19348   14.79   77.07
>>>>
>>>> I was not sure how to properly attribute Alexander on the ixgbe patch so
>>>> please adjust this as necessary.
>>>
>>> For the ixgbe patch I would be good with:
>>> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>>
>>> The big hurdle for this will be validation. I know that there are some
>>> parts such as the 82598 in the case of the ixgbe driver or 82575 in
>>> the case of igb that didn't support the feature, and I wasn't sure
>>> about the parts supported by i40e either.  From what I can tell the
>>> x710 datasheet seems to indicate that it is supported, and you were
>>> able to get it working with your patch based on the numbers above. So
>>> that just leaves validation of the x722 and making sure there isn't
>>> anything firmware-wise on the i40e parts that may cause any issues.
>>
>> Thanks for the feedback, Alex.
>>
>> For validation, I will look around and see if we have any of the above
>> chips in our testbeds. The above #s are from i210, 82599ES, and x710
>> respectively. I'm happy to share my wrapper script for the gso selftest
>> if others have the missing chipsets and can verify.
>>
>> Thanks!
>> Josh
> 
> If you could share your test scripts that would be great. I believe
> the networking division will have access to more hardware so if you
> could include Aaron, who I added to the Cc, in your reply with the
> script that would be great as I am sure he can forward it on to
> whoever ends up having to ultimately test this patch set.
> 
> I'll keep an eye out for v2 of your patch set and review it when it is
> available.
> 
> Thanks.
> 
> - Alex
> 

I've attached my benchmark wrapper script, udpgso_bench.sh. To run it,
you'll need to copy it, udpgso_bench_rx, and udpgso_bench_tx (built from
the kernel's selftests dir) to your DUT. It also requires a remote sink
machine able to receive traffic on UDP port 8000 (or some other configured
port). The script will copy over and start the sink process
(udpgso_bench_rx) on the remote box.

Here's some info on how to run it:

Usage: ./udpgso_bench.sh <interface name> <remote v4 IP> [extra benchmark options]

Example usage:
# ./udpgso_bench.sh eth0 172.25.43.133 -u

Beware it will make some configuration changes to your local machine. It 
will overwrite:
  * /proc/sys/net/core/{optmem_max,wmem_max,wmem_default}
  * qdisc setup for <int>
  * IRQ affinity and XPS configuration for <int>

Please let me know if you hit any problems with the script. It
originally had some Akamai-specific items in it, but I have (hopefully)
removed them all.
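
Since the script overwrites the three /proc/sys/net/core values listed
above, it may be worth saving them beforehand and restoring them afterwards.
A minimal sketch, assuming the values are plain single-line integers;
save_sysctls, restore_sysctls, SYSCTL_DIR and SYSCTL_KEYS are hypothetical
names and are not part of udpgso_bench.sh:

```shell
#!/bin/sh
# Hypothetical helpers, not part of udpgso_bench.sh.
# SYSCTL_DIR can be overridden (e.g. for a dry run against a scratch dir).
SYSCTL_DIR=${SYSCTL_DIR:-/proc/sys/net/core}
SYSCTL_KEYS="optmem_max wmem_max wmem_default"

# $1 = file to write "key value" lines to
save_sysctls() {
    : > "$1"
    for key in $SYSCTL_KEYS; do
        printf '%s %s\n' "$key" "$(cat "$SYSCTL_DIR/$key")" >> "$1"
    done
}

# $1 = file previously written by save_sysctls
restore_sysctls() {
    while read -r key value; do
        printf '%s\n' "$value" > "$SYSCTL_DIR/$key"
    done < "$1"
}
```

Run save_sysctls before the benchmark and restore_sysctls with the same file
afterwards (as root when pointing at /proc). This does not cover the qdisc,
IRQ affinity, or XPS changes, which would need to be recorded separately.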

Josh

[-- Attachment #2: udpgso_bench.sh --]
[-- Type: application/x-shellscript, Size: 6544 bytes --]


* RE: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
  2019-10-11  0:07       ` Josh Hunt
@ 2019-10-11  0:21         ` Brown, Aaron F
  2019-10-11  0:27           ` Josh Hunt
  0 siblings, 1 reply; 14+ messages in thread
From: Brown, Aaron F @ 2019-10-11  0:21 UTC (permalink / raw)
  To: Josh Hunt, Alexander Duyck, Bowers, AndrewX
  Cc: Netdev, Willem de Bruijn, intel-wired-lan

Adding Andrew as he is most likely going to be testing this patch.

Unfortunately my mail server flags attached scripts as potential threats and strips them out.  Can you resend it as a tar file?  I don't believe it's smart enough to open up a tar file and flag it as a script.

> -----Original Message-----
> From: Josh Hunt [mailto:johunt@akamai.com]
> Sent: Thursday, October 10, 2019 5:08 PM
> To: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Netdev <netdev@vger.kernel.org>; Willem de Bruijn
> <willemb@google.com>; intel-wired-lan <intel-wired-lan@lists.osuosl.org>;
> Brown, Aaron F <aaron.f.brown@intel.com>
> Subject: Re: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
> 
> On 10/10/19 2:32 PM, Alexander Duyck wrote:
> > On Thu, Oct 10, 2019 at 2:17 PM Josh Hunt <johunt@akamai.com> wrote:
> >>
> >> On 10/9/19 3:44 PM, Alexander Duyck wrote:
> >>> On Wed, Oct 9, 2019 at 3:08 PM Josh Hunt <johunt@akamai.com> wrote:
> >>>>
> >>>> Alexander Duyck posted a series in 2018 proposing adding UDP segmentation
> >>>> offload support to ixgbe and ixgbevf, but those patches were never
> >>>> accepted:
> >>>>
> >>>> https://lore.kernel.org/netdev/20180504003556.4769.11407.stgit@localhost.localdomain/
> >>>>
> >>>> This series is a repost of his ixgbe patch along with a similar
> >>>> change to the igb and i40e drivers. Testing using the udpgso_bench_tx
> >>>> benchmark shows a noticeable performance improvement with these changes
> >>>> applied.
> >>>>
> >>>> All #s below were run with:
> >>>> udpgso_bench_tx -C 1 -4 -D 172.25.43.133 -z -l 30 -u -S 0 -s $pkt_size
> >>>>
> >>>> igb::
> >>>>
> >>>> SW GSO (ethtool -K eth0 tx-udp-segmentation off):
> >>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >>>> ========================================================================
> >>>> 1472            120143.64       113     81263   81263   83.55   1.35
> >>>> 2944            120160.09       114     40638   40638   62.88   1.81
> >>>> 5888            120160.64       114     20319   20319   43.59   2.61
> >>>> 11776           120160.76       114     10160   10160   37.52   3.03
> >>>> 23552           120159.25       114     5080    5080    34.75   3.28
> >>>> 47104           120160.55       114     2540    2540    32.83   3.47
> >>>> 61824           120160.56       114     1935    1935    32.09   3.55
> >>>>
> >>>> HW GSO offload (ethtool -K eth0 tx-udp-segmentation on):
> >>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >>>> ========================================================================
> >>>> 1472            120144.65       113     81264   81264   83.03   1.36
> >>>> 2944            120161.56       114     40638   40638   41      2.78
> >>>> 5888            120160.23       114     20319   20319   23.76   4.79
> >>>> 11776           120161.16       114     10160   10160   15.82   7.20
> >>>> 23552           120156.45       114     5079    5079    12.8    8.90
> >>>> 47104           120159.33       114     2540    2540    8.82    12.92
> >>>> 61824           120158.43       114     1935    1935    8.24    13.83
> >>>>
> >>>> ixgbe::
> >>>> SW GSO:
> >>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >>>> ========================================================================
> >>>> 1472            1070565.90      1015    724112  724112  100     10.15
> >>>> 2944            1201579.19      1140    406342  406342  95.69   11.91
> >>>> 5888            1201217.55      1140    203185  203185  55.38   20.58
> >>>> 11776           1201613.49      1140    101588  101588  42.15   27.04
> >>>> 23552           1201631.32      1140    50795   50795   35.97   31.69
> >>>> 47104           1201626.38      1140    25397   25397   33.51   34.01
> >>>> 61824           1201625.52      1140    19350   19350   32.83   34.72
> >>>>
> >>>> HW GSO Offload:
> >>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >>>> ========================================================================
> >>>> 1472            1058681.25      1004    715954  715954  100     10.04
> >>>> 2944            1201730.86      1134    404254  404254  61.28   18.50
> >>>> 5888            1201776.61      1131    201608  201608  30.25   37.38
> >>>> 11776           1201795.90      1130    100676  100676  16.63   67.94
> >>>> 23552           1201807.90      1129    50304   50304   10.07   112.11
> >>>> 47104           1201748.35      1128    25143   25143   6.8     165.88
> >>>> 61824           1200770.45      1128    19140   19140   5.38    209.66
> >>>>
> >>>> i40e::
> >>>> SW GSO:
> >>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >>>> ========================================================================
> >>>> 1472            650122.83       616     439362  439362  100     6.16
> >>>> 2944            943993.53       895     319042  319042  100     8.95
> >>>> 5888            1199751.90      1138    202857  202857  82.51   13.79
> >>>> 11776           1200288.08      1139    101477  101477  64.34   17.70
> >>>> 23552           1201596.56      1140    50793   50793   59.74   19.08
> >>>> 47104           1201597.98      1140    25396   25396   56.31   20.24
> >>>> 61824           1201610.43      1140    19350   19350   55.48   20.54
> >>>>
> >>>> HW GSO offload:
> >>>> $pkt_size       kB/s(sar)       MB/s    Calls/s Msg/s   CPU     MB2CPU
> >>>> ========================================================================
> >>>> 1472            657424.83       623     444653  444653  100     6.23
> >>>> 2944            1201242.87      1139    406226  406226  91.45   12.45
> >>>> 5888            1201739.95      1140    203199  203199  57.46   19.83
> >>>> 11776           1201557.36      1140    101584  101584  36.83   30.95
> >>>> 23552           1201525.17      1140    50790   50790   23.86   47.77
> >>>> 47104           1201514.54      1140    25394   25394   17.45   65.32
> >>>> 61824           1201478.91      1140    19348   19348   14.79   77.07
> >>>>
> >>>> I was not sure how to properly attribute Alexander on the ixgbe patch so
> >>>> please adjust this as necessary.
> >>>
> >>> For the ixgbe patch I would be good with:
> >>> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >>>
> >>> The big hurdle for this will be validation. I know that there are some
> >>> parts such as the 82598 in the case of the ixgbe driver or 82575 in
> >>> the case of igb that didn't support the feature, and I wasn't sure
> >>> about the parts supported by i40e either.  From what I can tell the
> >>> x710 datasheet seems to indicate that it is supported, and you were
> >>> able to get it working with your patch based on the numbers above. So
> >>> that just leaves validation of the x722 and making sure there isn't
> >>> anything firmware-wise on the i40e parts that may cause any issues.
> >>
> >> Thanks for the feedback, Alex.
> >>
> >> For validation, I will look around and see if we have any of the above
> >> chips in our testbeds. The above #s are from i210, 82599ES, and x710
> >> respectively. I'm happy to share my wrapper script for the gso selftest
> >> if others have the missing chipsets and can verify.
> >>
> >> Thanks!
> >> Josh
> >
> > If you could share your test scripts that would be great. I believe
> > the networking division will have access to more hardware so if you
> > could include Aaron, who I added to the Cc, in your reply with the
> > script that would be great as I am sure he can forward it on to
> > whoever ends up having to ultimately test this patch set.
> >
> > I'll keep an eye out for v2 of your patch set and review it when it is
> > available.
> >
> > Thanks.
> >
> > - Alex
> >
> 
> I've attached my benchmark wrapper script udpgso_bench.sh. To run it
> you'll need to copy it, udpgso_bench_rx, and udpgso_bench_tx (built from
> kernel's selftests dir) to your DUT. It also requires a remote sink
> machine able to receive traffic on UDP 8000 (or some configured port.)
> The script will copy over and start the sink process (udpgso_bench_rx)
> on the remote box.
> 
> Here's some info on how to run it:
> 
> Usage: ./udpgso_bench.sh <interface name> <remote v4 IP> [extra benchmark options]
> 
> Example usage:
> # ./udpgso_bench.sh eth0 172.25.43.133 -u
> 
> Beware it will make some configuration changes to your local machine. It
> will overwrite:
>   * /proc/sys/net/core/{optmem_max,wmem_max,wmem_default}
>   * qdisc setup for <int>
>   * IRQ affinity and XPS configuration for <int>
> 
> Please let me know if you hit any problems with the script. It
> originally had some akamai-specific items in it, but I (hopefully) have
> removed them all.
> 
> Josh


* Re: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
  2019-10-11  0:21         ` Brown, Aaron F
@ 2019-10-11  0:27           ` Josh Hunt
  2019-10-11  0:29             ` Brown, Aaron F
  0 siblings, 1 reply; 14+ messages in thread
From: Josh Hunt @ 2019-10-11  0:27 UTC (permalink / raw)
  To: Brown, Aaron F, Alexander Duyck, Bowers, AndrewX
  Cc: Netdev, Willem de Bruijn, intel-wired-lan

On 10/10/19 5:21 PM, Brown, Aaron F wrote:
> Adding Andrew as he is most likely going to be testing this patch.
> 
> Unfortunately my mail server flags attached scripts as potential threats and strips them out.  Can you resend it as a tar file?  I don't believe it's smart enough to open up a tar file and flag it as a script.
> 

Hi Aaron

It looks like the netdev archive has the file. Can you try grabbing it 
from here?

https://lore.kernel.org/netdev/0e0e706c-4ce9-c27a-af55-339b4eb6d524@akamai.com/2-udpgso_bench.sh

If that doesn't work I can try your tar workaround.
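
For reference, the tar workaround would amount to something like the
following. This is only a sketch; pack_script and the archive name are
hypothetical and nothing here is taken from the attached script:

```shell
#!/bin/sh
# Hypothetical helper: wrap a script in a gzip-compressed tar archive so
# that it is not attached to the mail as a bare shell script.
# $1 = path to the script, $2 = archive to create
pack_script() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
```

For example, "pack_script ./udpgso_bench.sh udpgso_bench.tar.gz" produces an
archive that unpacks with "tar -xzf udpgso_bench.tar.gz".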

Thanks
Josh


* RE: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
  2019-10-11  0:27           ` Josh Hunt
@ 2019-10-11  0:29             ` Brown, Aaron F
  0 siblings, 0 replies; 14+ messages in thread
From: Brown, Aaron F @ 2019-10-11  0:29 UTC (permalink / raw)
  To: Josh Hunt, Alexander Duyck, Bowers, AndrewX
  Cc: Netdev, Willem de Bruijn, intel-wired-lan



> -----Original Message-----
> From: Josh Hunt [mailto:johunt@akamai.com]
> Sent: Thursday, October 10, 2019 5:28 PM
> To: Brown, Aaron F <aaron.f.brown@intel.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; Bowers, AndrewX
> <andrewx.bowers@intel.com>
> Cc: Netdev <netdev@vger.kernel.org>; Willem de Bruijn
> <willemb@google.com>; intel-wired-lan <intel-wired-lan@lists.osuosl.org>
> Subject: Re: [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support
> 
> On 10/10/19 5:21 PM, Brown, Aaron F wrote:
> > Adding Andrew as he is most likely going to be testing this patch.
> >
> > Unfortunately my mail server flags attached scripts as potential threats and strips them out.  Can you resend it as a tar file?  I don't believe it's smart enough to open up a tar file and flag it as a script.
> >
> 
> Hi Aaron
> 
> It looks like the netdev archive has the file. Can you try grabbing it
> from here?

Yes, I can.  Thanks.

> 
> https://lore.kernel.org/netdev/0e0e706c-4ce9-c27a-af55-339b4eb6d524@akamai.com/2-udpgso_bench.sh
> 
> If that doesn't work I can try your tar workaround.
> 
> Thanks
> Josh


end of thread, other threads:[~2019-10-11  0:29 UTC | newest]

Thread overview: 14+ messages
2019-10-09 22:06 [PATCH 0/3] igb, ixgbe, i40e UDP segmentation offload support Josh Hunt
2019-10-09 22:06 ` [PATCH 1/3] igb: Add " Josh Hunt
2019-10-09 22:06 ` [PATCH 2/3] ixgbe: " Josh Hunt
2019-10-10  1:06   ` Josh Hunt
2019-10-09 22:06 ` [PATCH 3/3] i40e: " Josh Hunt
2019-10-10  0:39   ` Samudrala, Sridhar
2019-10-10  0:54     ` Josh Hunt
2019-10-09 22:44 ` [PATCH 0/3] igb, ixgbe, i40e " Alexander Duyck
2019-10-10 21:17   ` Josh Hunt
2019-10-10 21:32     ` Alexander Duyck
2019-10-11  0:07       ` Josh Hunt
2019-10-11  0:21         ` Brown, Aaron F
2019-10-11  0:27           ` Josh Hunt
2019-10-11  0:29             ` Brown, Aaron F
