bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc
@ 2023-04-18 13:30 Jesper Dangaard Brouer
  2023-04-18 13:30 ` [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 13:30 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Implement both RX hash and RX timestamp XDP hints kfunc metadata
for driver igc.

First patch fix RX hashing for igc in general.

Last patch change test program xdp_hw_metadata to track more
timestamps, which helps us correlate the hardware RX timestamp
with something.

---
To maintainers: I'm uncertain which git tree this should be sent
against. This is primary NIC driver code (net-next), but it's
BPF/XDP related (bpf-next) via xdp_metadata_ops.

Jesper Dangaard Brouer (5):
      igc: enable and fix RX hash usage by netstack
      igc: add igc_xdp_buff wrapper for xdp_buff in driver
      igc: add XDP hints kfuncs for RX hash
      igc: add XDP hints kfuncs for RX timestamp
      selftests/bpf: xdp_hw_metadata track more timestamps


 drivers/net/ethernet/intel/igc/igc.h          |  35 ++++++
 drivers/net/ethernet/intel/igc/igc_main.c     | 116 ++++++++++++++++--
 .../selftests/bpf/progs/xdp_hw_metadata.c     |   4 +-
 tools/testing/selftests/bpf/xdp_hw_metadata.c |  47 ++++++-
 tools/testing/selftests/bpf/xdp_metadata.h    |   1 +
 5 files changed, 186 insertions(+), 17 deletions(-)

--


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack
  2023-04-18 13:30 [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
@ 2023-04-18 13:30 ` Jesper Dangaard Brouer
  2023-04-23 14:46   ` John Fastabend
  2023-04-18 13:30 ` [PATCH bpf-next V2 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 13:30 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372
("igc: Add transmit and receive fastpath and interrupt handlers"), the
hardware wasn't configured to provide RSS hash, thus it made sense to not
enable net_device NETIF_F_RXHASH feature bit.

The NIC hardware was configured to enable RSS hash info in v5.2 via commit
2121c2712f82 ("igc: Add multiple receive queues control supporting"), but
forgot to set the NETIF_F_RXHASH feature bit.

The original implementation of igc_rx_hash() didn't extract the associated
pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
this patch are about extracting the RSS Type from the hardware and mapping
this to enum pkt_hash_types. This was based on Foxville i225 software user
manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).

For UDP it's worth noting that RSS (type) hashing have been disabled both for
IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
because hardware RSS doesn't handle fragmented pkts well when enabled (can
cause out-of-order). This results in PKT_HASH_TYPE_L3 for UDP packets, and
hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
the effect that netstack will do a software based hash calc calling into
flow_dissect, but only when code calls skb_get_hash(), which doesn't
necessary happen for local delivery.

For QA verification testing I wrote a small bpftrace prog:
 [0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt

Fixes: 2121c2712f82 ("igc: Add multiple receive queues control supporting")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |   28 ++++++++++++++++++++++++++
 drivers/net/ethernet/intel/igc/igc_main.c |   31 +++++++++++++++++++++++++----
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 34aebf00a512..f7f9e217e7b4 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -13,6 +13,7 @@
 #include <linux/ptp_clock_kernel.h>
 #include <linux/timecounter.h>
 #include <linux/net_tstamp.h>
+#include <linux/bitfield.h>
 
 #include "igc_hw.h"
 
@@ -311,6 +312,33 @@ extern char igc_driver_name[];
 #define IGC_MRQC_RSS_FIELD_IPV4_UDP	0x00400000
 #define IGC_MRQC_RSS_FIELD_IPV6_UDP	0x00800000
 
+/* RX-desc Write-Back format RSS Type's */
+enum igc_rss_type_num {
+	IGC_RSS_TYPE_NO_HASH		= 0,
+	IGC_RSS_TYPE_HASH_TCP_IPV4	= 1,
+	IGC_RSS_TYPE_HASH_IPV4		= 2,
+	IGC_RSS_TYPE_HASH_TCP_IPV6	= 3,
+	IGC_RSS_TYPE_HASH_IPV6_EX	= 4,
+	IGC_RSS_TYPE_HASH_IPV6		= 5,
+	IGC_RSS_TYPE_HASH_TCP_IPV6_EX	= 6,
+	IGC_RSS_TYPE_HASH_UDP_IPV4	= 7,
+	IGC_RSS_TYPE_HASH_UDP_IPV6	= 8,
+	IGC_RSS_TYPE_HASH_UDP_IPV6_EX	= 9,
+	IGC_RSS_TYPE_MAX		= 10,
+};
+#define IGC_RSS_TYPE_MAX_TABLE		16
+#define IGC_RSS_TYPE_MASK		GENMASK(3,0) /* 4-bits (3:0) = mask 0x0F */
+
+/* igc_rss_type - Rx descriptor RSS type field */
+static inline u32 igc_rss_type(const union igc_adv_rx_desc *rx_desc)
+{
+	/* RSS Type 4-bits (3:0) number: 0-9 (above 9 is reserved)
+	 * Accessing the same bits via u16 (wb.lower.lo_dword.hs_rss.pkt_info)
+	 * is slightly slower than via u32 (wb.lower.lo_dword.data)
+	 */
+	return le32_get_bits(rx_desc->wb.lower.lo_dword.data, IGC_RSS_TYPE_MASK);
+}
+
 /* Interrupt defines */
 #define IGC_START_ITR			648 /* ~6000 ints/sec */
 #define IGC_4K_ITR			980
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 1c4676882082..bfa9768d447f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1690,14 +1690,36 @@ static void igc_rx_checksum(struct igc_ring *ring,
 		   le32_to_cpu(rx_desc->wb.upper.status_error));
 }
 
+/* Mapping HW RSS Type to enum pkt_hash_types */
+static const enum pkt_hash_types igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
+	[IGC_RSS_TYPE_NO_HASH]		= PKT_HASH_TYPE_L2,
+	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_IPV4]	= PKT_HASH_TYPE_L3,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_IPV6_EX]	= PKT_HASH_TYPE_L3,
+	[IGC_RSS_TYPE_HASH_IPV6]	= PKT_HASH_TYPE_L3,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = PKT_HASH_TYPE_L4,
+	[10] = PKT_HASH_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
+	[11] = PKT_HASH_TYPE_NONE, /* keep array sized for SW bit-mask   */
+	[12] = PKT_HASH_TYPE_NONE, /* to handle future HW revisons       */
+	[13] = PKT_HASH_TYPE_NONE,
+	[14] = PKT_HASH_TYPE_NONE,
+	[15] = PKT_HASH_TYPE_NONE,
+};
+
 static inline void igc_rx_hash(struct igc_ring *ring,
 			       union igc_adv_rx_desc *rx_desc,
 			       struct sk_buff *skb)
 {
-	if (ring->netdev->features & NETIF_F_RXHASH)
-		skb_set_hash(skb,
-			     le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
-			     PKT_HASH_TYPE_L3);
+	if (ring->netdev->features & NETIF_F_RXHASH) {
+		u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
+		u32 rss_type = igc_rss_type(rx_desc);
+
+		skb_set_hash(skb, rss_hash, igc_rss_type_table[rss_type]);
+	}
 }
 
 static void igc_rx_vlan(struct igc_ring *rx_ring,
@@ -6554,6 +6576,7 @@ static int igc_probe(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_TSO;
 	netdev->features |= NETIF_F_TSO6;
 	netdev->features |= NETIF_F_TSO_ECN;
+	netdev->features |= NETIF_F_RXHASH;
 	netdev->features |= NETIF_F_RXCSUM;
 	netdev->features |= NETIF_F_HW_CSUM;
 	netdev->features |= NETIF_F_SCTP_CRC;



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next V2 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver
  2023-04-18 13:30 [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
  2023-04-18 13:30 ` [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
@ 2023-04-18 13:30 ` Jesper Dangaard Brouer
  2023-04-18 13:30 ` [PATCH bpf-next V2 3/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 13:30 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Driver specific metadata data for XDP-hints kfuncs are propagated via tail
extending the struct xdp_buff with a locally scoped driver struct.

Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
xdp_buff_xsk struct contains a CB area (24 bytes) that can be used for
extending the locally scoped driver into. The XSK_CHECK_PRIV_TYPE define
catch size violations build time.

The changes needed for AF_XDP zero-copy in igc_clean_rx_irq_zc()
is done in next patch, because the member rx_desc isn't available
at this point.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |    5 +++++
 drivers/net/ethernet/intel/igc/igc_main.c |   16 +++++++++-------
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index f7f9e217e7b4..76a5115aefc8 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -499,6 +499,11 @@ struct igc_rx_buffer {
 	};
 };
 
+/* context wrapper around xdp_buff to provide access to descriptor metadata */
+struct igc_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 struct igc_q_vector {
 	struct igc_adapter *adapter;    /* backlink */
 	void __iomem *itr_register;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index bfa9768d447f..6a4c7cd706bd 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2236,6 +2236,8 @@ static bool igc_alloc_rx_buffers_zc(struct igc_ring *ring, u16 count)
 	if (!count)
 		return ok;
 
+	XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
+
 	desc = IGC_RX_DESC(ring, i);
 	bi = &ring->rx_buffer_info[i];
 	i -= ring->count;
@@ -2520,8 +2522,8 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		union igc_adv_rx_desc *rx_desc;
 		struct igc_rx_buffer *rx_buffer;
 		unsigned int size, truesize;
+		struct igc_xdp_buff ctx;
 		ktime_t timestamp = 0;
-		struct xdp_buff xdp;
 		int pkt_offset = 0;
 		void *pktbuf;
 
@@ -2555,13 +2557,13 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		}
 
 		if (!skb) {
-			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
-			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
+			xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq);
+			xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring),
 					 igc_rx_offset(rx_ring) + pkt_offset,
 					 size, true);
-			xdp_buff_clear_frags_flag(&xdp);
+			xdp_buff_clear_frags_flag(&ctx.xdp);
 
-			skb = igc_xdp_run_prog(adapter, &xdp);
+			skb = igc_xdp_run_prog(adapter, &ctx.xdp);
 		}
 
 		if (IS_ERR(skb)) {
@@ -2583,9 +2585,9 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		} else if (skb)
 			igc_add_rx_frag(rx_ring, rx_buffer, skb, size);
 		else if (ring_uses_build_skb(rx_ring))
-			skb = igc_build_skb(rx_ring, rx_buffer, &xdp);
+			skb = igc_build_skb(rx_ring, rx_buffer, &ctx.xdp);
 		else
-			skb = igc_construct_skb(rx_ring, rx_buffer, &xdp,
+			skb = igc_construct_skb(rx_ring, rx_buffer, &ctx.xdp,
 						timestamp);
 
 		/* exit if we failed to retrieve a buffer */



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next V2 3/5] igc: add XDP hints kfuncs for RX hash
  2023-04-18 13:30 [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
  2023-04-18 13:30 ` [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
  2023-04-18 13:30 ` [PATCH bpf-next V2 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
@ 2023-04-18 13:30 ` Jesper Dangaard Brouer
  2023-04-18 13:30 ` [PATCH bpf-next V2 4/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 13:30 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

This implements XDP hints kfunc for RX-hash (xmo_rx_hash).
The HW rss hash type is handled via mapping table.

This igc driver (default config) does L3 hashing for UDP packets
(excludes UDP src/dest ports in hash calc).  Meaning RSS hash type is
L3 based.  Tested that the igc_rss_type_num for UDP is either
IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6.

This patch also updates AF_XDP zero-copy function igc_clean_rx_irq_zc()
to use the xdp_buff wrapper struct igc_xdp_buff.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |    1 +
 drivers/net/ethernet/intel/igc/igc_main.c |   53 +++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 76a5115aefc8..c609a2e648f8 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -502,6 +502,7 @@ struct igc_rx_buffer {
 /* context wrapper around xdp_buff to provide access to descriptor metadata */
 struct igc_xdp_buff {
 	struct xdp_buff xdp;
+	union igc_adv_rx_desc *rx_desc;
 };
 
 struct igc_q_vector {
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 6a4c7cd706bd..9cb43c0eab73 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2562,6 +2562,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 					 igc_rx_offset(rx_ring) + pkt_offset,
 					 size, true);
 			xdp_buff_clear_frags_flag(&ctx.xdp);
+			ctx.rx_desc = rx_desc;
 
 			skb = igc_xdp_run_prog(adapter, &ctx.xdp);
 		}
@@ -2688,6 +2689,15 @@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
 	napi_gro_receive(&q_vector->napi, skb);
 }
 
+static struct igc_xdp_buff *xsk_buff_to_igc_ctx(struct xdp_buff *xdp)
+{
+	/* xdp_buff pointer used by ZC code path is alloc as xdp_buff_xsk. The
+	 * igc_xdp_buff shares its layout with xdp_buff_xsk and private
+	 * igc_xdp_buff fields fall into xdp_buff_xsk->cb
+	 */
+       return (struct igc_xdp_buff *)xdp;
+}
+
 static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
 {
 	struct igc_adapter *adapter = q_vector->adapter;
@@ -2706,6 +2716,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
 	while (likely(total_packets < budget)) {
 		union igc_adv_rx_desc *desc;
 		struct igc_rx_buffer *bi;
+		struct igc_xdp_buff *ctx;
 		ktime_t timestamp = 0;
 		unsigned int size;
 		int res;
@@ -2723,6 +2734,9 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
 
 		bi = &ring->rx_buffer_info[ntc];
 
+		ctx = xsk_buff_to_igc_ctx(bi->xdp);
+		ctx->rx_desc = desc;
+
 		if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
 							bi->xdp->data);
@@ -6478,6 +6492,44 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
 	return value;
 }
 
+/* Mapping HW RSS Type to enum xdp_rss_hash_type */
+static enum xdp_rss_hash_type igc_xdp_rss_type[IGC_RSS_TYPE_MAX_TABLE] = {
+	[IGC_RSS_TYPE_NO_HASH]		= XDP_RSS_TYPE_L2,
+	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= XDP_RSS_TYPE_L4_IPV4_TCP,
+	[IGC_RSS_TYPE_HASH_IPV4]	= XDP_RSS_TYPE_L3_IPV4,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= XDP_RSS_TYPE_L4_IPV6_TCP,
+	[IGC_RSS_TYPE_HASH_IPV6_EX]	= XDP_RSS_TYPE_L3_IPV6_EX,
+	[IGC_RSS_TYPE_HASH_IPV6]	= XDP_RSS_TYPE_L3_IPV6,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX,
+	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= XDP_RSS_TYPE_L4_IPV4_UDP,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= XDP_RSS_TYPE_L4_IPV6_UDP,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX,
+	[10] = XDP_RSS_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
+	[11] = XDP_RSS_TYPE_NONE, /* keep array sized for SW bit-mask   */
+	[12] = XDP_RSS_TYPE_NONE, /* to handle future HW revisons       */
+	[13] = XDP_RSS_TYPE_NONE,
+	[14] = XDP_RSS_TYPE_NONE,
+	[15] = XDP_RSS_TYPE_NONE,
+};
+
+static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
+			   enum xdp_rss_hash_type *rss_type)
+{
+	const struct igc_xdp_buff *ctx = (void *)_ctx;
+
+	if (!(ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
+		return -ENODATA;
+
+	*hash = le32_to_cpu(ctx->rx_desc->wb.lower.hi_dword.rss);
+	*rss_type = igc_xdp_rss_type[igc_rss_type(ctx->rx_desc)];
+
+	return 0;
+}
+
+static const struct xdp_metadata_ops igc_xdp_metadata_ops = {
+	.xmo_rx_hash			= igc_xdp_rx_hash,
+};
+
 /**
  * igc_probe - Device Initialization Routine
  * @pdev: PCI device information struct
@@ -6551,6 +6603,7 @@ static int igc_probe(struct pci_dev *pdev,
 	hw->hw_addr = adapter->io_addr;
 
 	netdev->netdev_ops = &igc_netdev_ops;
+	netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
 	igc_ethtool_set_ops(netdev);
 	netdev->watchdog_timeo = 5 * HZ;
 



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next V2 4/5] igc: add XDP hints kfuncs for RX timestamp
  2023-04-18 13:30 [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2023-04-18 13:30 ` [PATCH bpf-next V2 3/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
@ 2023-04-18 13:30 ` Jesper Dangaard Brouer
  2023-04-18 13:31 ` [PATCH bpf-next V2 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 13:30 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

The NIC hardware RX timestamping mechanism adds an optional tailored
header before the MAC header containing packet reception time. Optional
depending on RX descriptor TSIP status bit (IGC_RXDADV_STAT_TSIP). In
case this bit is set driver does offset adjustments to packet data start
and extracts the timestamp.

The timestamp need to be extracted before invoking the XDP bpf_prog,
because this area just before the packet is also accessible by XDP via
data_meta context pointer (and helper bpf_xdp_adjust_meta). Thus, an XDP
bpf_prog can potentially overwrite this and corrupt data that we want to
extract with the new kfunc for reading the timestamp.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |    1 +
 drivers/net/ethernet/intel/igc/igc_main.c |   16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index c609a2e648f8..18d4af934d8c 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -503,6 +503,7 @@ struct igc_rx_buffer {
 struct igc_xdp_buff {
 	struct xdp_buff xdp;
 	union igc_adv_rx_desc *rx_desc;
+	ktime_t rx_ts; /* data indication bit IGC_RXDADV_STAT_TSIP */
 };
 
 struct igc_q_vector {
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 9cb43c0eab73..38d113b48111 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2552,6 +2552,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		if (igc_test_staterr(rx_desc, IGC_RXDADV_STAT_TSIP)) {
 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
 							pktbuf);
+			ctx.rx_ts = timestamp;
 			pkt_offset = IGC_TS_HDR_LEN;
 			size -= IGC_TS_HDR_LEN;
 		}
@@ -2740,6 +2741,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
 		if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
 							bi->xdp->data);
+			ctx->rx_ts = timestamp;
 
 			bi->xdp->data += IGC_TS_HDR_LEN;
 
@@ -6526,8 +6528,22 @@ static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
 	return 0;
 }
 
+static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
+{
+	const struct igc_xdp_buff *ctx = (void *)_ctx;
+
+	if (igc_test_staterr(ctx->rx_desc, IGC_RXDADV_STAT_TSIP)) {
+		*timestamp = ctx->rx_ts;
+
+		return 0;
+	}
+
+	return -ENODATA;
+}
+
 static const struct xdp_metadata_ops igc_xdp_metadata_ops = {
 	.xmo_rx_hash			= igc_xdp_rx_hash,
+	.xmo_rx_timestamp		= igc_xdp_rx_timestamp,
 };
 
 /**



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next V2 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-18 13:30 [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2023-04-18 13:30 ` [PATCH bpf-next V2 4/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
@ 2023-04-18 13:31 ` Jesper Dangaard Brouer
  2023-04-18 16:36   ` Stanislav Fomichev
  2023-04-18 14:53 ` [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Song, Yoong Siang
  2023-04-27 17:00 ` patchwork-bot+netdevbpf
  6 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 13:31 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

To correlate the hardware RX timestamp with something, add tracking of
two software timestamps both clock source CLOCK_TAI (see description in
man clock_gettime(2)).

XDP metadata is extended with xdp_timestamp for capturing when XDP
received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
could not find a BPF helper for getting CLOCK_REALTIME, which would have
been preferred. In userspace when AF_XDP sees the packet another
software timestamp is recorded via clock_gettime() also clock source
CLOCK_TAI.

Example output shortly after loading igc driver:

  poll: 1 (0) skip=1 fail=0 redir=2
  xsk_ring_cons__peek: 1
  0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
  rx_hash: 0x82A96531 with RSS type:0x1
  rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
  XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
  AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
  0x12557a8: complete idx=9 addr=9000

The first observation is that the 37 sec difference between RX HW vs XDP
timestamps, which indicate hardware is likely clock source
CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
with a 37 sec offset.

The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
userspace wakeup time. On this hardware it was caused by CPU idle sleep
states, which can be reduced by tuning /dev/cpu_dma_latency.

View current requested/allowed latency bound via:
  hexdump --format '"%d\n"' /dev/cpu_dma_latency

More explanation of the output and how this can be used to identify
clock drift for the HW clock can be seen here[1]:

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../testing/selftests/bpf/progs/xdp_hw_metadata.c  |    4 +-
 tools/testing/selftests/bpf/xdp_hw_metadata.c      |   47 ++++++++++++++++++--
 tools/testing/selftests/bpf/xdp_metadata.h         |    1 
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index e1c787815e44..b2dfd7066c6e 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -77,7 +77,9 @@ int rx(struct xdp_md *ctx)
 	}
 
 	err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
-	if (err)
+	if (!err)
+		meta->xdp_timestamp = bpf_ktime_get_tai_ns();
+	else
 		meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
 
 	err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 987cf0db5ebc..613321eb84c1 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -27,6 +27,7 @@
 #include <sys/mman.h>
 #include <net/if.h>
 #include <poll.h>
+#include <time.h>
 
 #include "xdp_metadata.h"
 
@@ -134,18 +135,52 @@ static void refill_rx(struct xsk *xsk, __u64 addr)
 	}
 }
 
-static void verify_xdp_metadata(void *data)
+#define NANOSEC_PER_SEC 1000000000 /* 10^9 */
+static __u64 gettime(clockid_t clock_id)
+{
+	struct timespec t;
+	int res;
+
+	/* See man clock_gettime(2) for type of clock_id's */
+	res = clock_gettime(clock_id, &t);
+
+	if (res < 0)
+		error(res, errno, "Error with clock_gettime()");
+
+	return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec;
+}
+
+static void verify_xdp_metadata(void *data, clockid_t clock_id)
 {
 	struct xdp_meta *meta;
 
 	meta = data - sizeof(*meta);
 
-	printf("rx_timestamp: %llu\n", meta->rx_timestamp);
 	if (meta->rx_hash_err < 0)
 		printf("No rx_hash err=%d\n", meta->rx_hash_err);
 	else
 		printf("rx_hash: 0x%X with RSS type:0x%X\n",
 		       meta->rx_hash, meta->rx_hash_type);
+
+	printf("rx_timestamp:  %llu (sec:%0.4f)\n", meta->rx_timestamp,
+	       (double)meta->rx_timestamp / NANOSEC_PER_SEC);
+	if (meta->rx_timestamp) {
+		__u64 usr_clock = gettime(clock_id);
+		__u64 xdp_clock = meta->xdp_timestamp;
+		__s64 delta_X = xdp_clock - meta->rx_timestamp;
+		__s64 delta_X2U = usr_clock - xdp_clock;
+
+		printf("XDP RX-time:   %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+		       xdp_clock, (double)xdp_clock / NANOSEC_PER_SEC,
+		       (double)delta_X / NANOSEC_PER_SEC,
+		       (double)delta_X / 1000);
+
+		printf("AF_XDP time:   %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+		       usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
+		       (double)delta_X2U / NANOSEC_PER_SEC,
+		       (double)delta_X2U / 1000);
+	}
+
 }
 
 static void verify_skb_metadata(int fd)
@@ -193,7 +228,7 @@ static void verify_skb_metadata(int fd)
 	printf("skb hwtstamp is not found!\n");
 }
 
-static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
+static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id)
 {
 	const struct xdp_desc *rx_desc;
 	struct pollfd fds[rxq + 1];
@@ -243,7 +278,8 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
 			addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
 			printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
 			       xsk, idx, rx_desc->addr, addr, comp_addr);
-			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
+			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr),
+					    clock_id);
 			xsk_ring_cons__release(&xsk->rx, 1);
 			refill_rx(xsk, comp_addr);
 		}
@@ -370,6 +406,7 @@ static void timestamping_enable(int fd, int val)
 
 int main(int argc, char *argv[])
 {
+	clockid_t clock_id = CLOCK_TAI;
 	int server_fd = -1;
 	int ret;
 	int i;
@@ -443,7 +480,7 @@ int main(int argc, char *argv[])
 		error(1, -ret, "bpf_xdp_attach");
 
 	signal(SIGINT, handle_signal);
-	ret = verify_metadata(rx_xsk, rxq, server_fd);
+	ret = verify_metadata(rx_xsk, rxq, server_fd, clock_id);
 	close(server_fd);
 	cleanup();
 	if (ret)
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 0c4624dc6f2f..938a729bd307 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -11,6 +11,7 @@
 
 struct xdp_meta {
 	__u64 rx_timestamp;
+	__u64 xdp_timestamp;
 	__u32 rx_hash;
 	union {
 		__u32 rx_hash_type;



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc
  2023-04-18 13:30 [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
                   ` (4 preceding siblings ...)
  2023-04-18 13:31 ` [PATCH bpf-next V2 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
@ 2023-04-18 14:53 ` Song, Yoong Siang
  2023-04-21 14:52   ` [xdp-hints] " Daniel Borkmann
  2023-04-27 17:00 ` patchwork-bot+netdevbpf
  6 siblings, 1 reply; 18+ messages in thread
From: Song, Yoong Siang @ 2023-04-18 14:53 UTC (permalink / raw)
  To: Brouer, Jesper, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
	Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
	Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem

On Tuesday, April 18, 2023 9:31 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>Implement both RX hash and RX timestamp XDP hints kfunc metadata for driver
>igc.
>
>First patch fix RX hashing for igc in general.
>
>Last patch change test program xdp_hw_metadata to track more timestamps,
>which helps us correlate the hardware RX timestamp with something.
>
>---
>To maintainers: I'm uncertain which git tree this should be sent against. This is
>primary NIC driver code (net-next), but it's BPF/XDP related (bpf-next) via
>xdp_metadata_ops.
>
>Jesper Dangaard Brouer (5):
>      igc: enable and fix RX hash usage by netstack
>      igc: add igc_xdp_buff wrapper for xdp_buff in driver
>      igc: add XDP hints kfuncs for RX hash
>      igc: add XDP hints kfuncs for RX timestamp
>      selftests/bpf: xdp_hw_metadata track more timestamps
>
>
> drivers/net/ethernet/intel/igc/igc.h          |  35 ++++++
> drivers/net/ethernet/intel/igc/igc_main.c     | 116 ++++++++++++++++--
> .../selftests/bpf/progs/xdp_hw_metadata.c     |   4 +-
> tools/testing/selftests/bpf/xdp_hw_metadata.c |  47 ++++++-
> tools/testing/selftests/bpf/xdp_metadata.h    |   1 +
> 5 files changed, 186 insertions(+), 17 deletions(-)
>
>--

This patchset lgtm.
Thanks for the changes.

Thanks & Regards
Siang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next V2 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-18 13:31 ` [PATCH bpf-next V2 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
@ 2023-04-18 16:36   ` Stanislav Fomichev
  2023-04-19 16:41     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2023-04-18 16:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, Toke Høiland-Jørgensen, netdev, martin.lau, ast,
	daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
	yoong.siang.song, intel-wired-lan, pabeni, jesse.brandeburg,
	kuba, edumazet, john.fastabend, hawk, davem

On 04/18, Jesper Dangaard Brouer wrote:
> To correlate the hardware RX timestamp with something, add tracking of
> two software timestamps both clock source CLOCK_TAI (see description in
> man clock_gettime(2)).
> 
> XDP metadata is extended with xdp_timestamp for capturing when XDP
> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
> could not find a BPF helper for getting CLOCK_REALTIME, which would have
> been preferred. In userspace when AF_XDP sees the packet another
> software timestamp is recorded via clock_gettime() also clock source
> CLOCK_TAI.
> 
> Example output shortly after loading igc driver:
> 
>   poll: 1 (0) skip=1 fail=0 redir=2
>   xsk_ring_cons__peek: 1
>   0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>   rx_hash: 0x82A96531 with RSS type:0x1
>   rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>   XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>   AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>   0x12557a8: complete idx=9 addr=9000
> 
> The first observation is that the 37 sec difference between RX HW vs XDP
> timestamps, which indicate hardware is likely clock source
> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
> with a 37 sec offset.
> 
> The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
> userspace wakeup time. On this hardware it was caused by CPU idle sleep
> states, which can be reduced by tuning /dev/cpu_dma_latency.
> 
> View current requested/allowed latency bound via:
>   hexdump --format '"%d\n"' /dev/cpu_dma_latency
> 
> More explanation of the output and how this can be used to identify
> clock drift for the HW clock can be seen here[1]:
> 
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Stanislav Fomichev <sdf@google.com>

> ---
>  .../testing/selftests/bpf/progs/xdp_hw_metadata.c  |    4 +-
>  tools/testing/selftests/bpf/xdp_hw_metadata.c      |   47 ++++++++++++++++++--
>  tools/testing/selftests/bpf/xdp_metadata.h         |    1 
>  3 files changed, 46 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> index e1c787815e44..b2dfd7066c6e 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> @@ -77,7 +77,9 @@ int rx(struct xdp_md *ctx)
>  	}
>  
>  	err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
> -	if (err)

[..]

> +	if (!err)
> +		meta->xdp_timestamp = bpf_ktime_get_tai_ns();

nit: why not set it unconditionally?

> +	else
>  		meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
>  
>  	err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> index 987cf0db5ebc..613321eb84c1 100644
> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> @@ -27,6 +27,7 @@
>  #include <sys/mman.h>
>  #include <net/if.h>
>  #include <poll.h>
> +#include <time.h>
>  
>  #include "xdp_metadata.h"
>  
> @@ -134,18 +135,52 @@ static void refill_rx(struct xsk *xsk, __u64 addr)
>  	}
>  }
>  
> -static void verify_xdp_metadata(void *data)
> +#define NANOSEC_PER_SEC 1000000000 /* 10^9 */
> +static __u64 gettime(clockid_t clock_id)
> +{
> +	struct timespec t;
> +	int res;
> +
> +	/* See man clock_gettime(2) for type of clock_id's */
> +	res = clock_gettime(clock_id, &t);
> +
> +	if (res < 0)
> +		error(res, errno, "Error with clock_gettime()");
> +
> +	return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec;
> +}
> +
> +static void verify_xdp_metadata(void *data, clockid_t clock_id)
>  {
>  	struct xdp_meta *meta;
>  
>  	meta = data - sizeof(*meta);
>  
> -	printf("rx_timestamp: %llu\n", meta->rx_timestamp);
>  	if (meta->rx_hash_err < 0)
>  		printf("No rx_hash err=%d\n", meta->rx_hash_err);
>  	else
>  		printf("rx_hash: 0x%X with RSS type:0x%X\n",
>  		       meta->rx_hash, meta->rx_hash_type);
> +
> +	printf("rx_timestamp:  %llu (sec:%0.4f)\n", meta->rx_timestamp,
> +	       (double)meta->rx_timestamp / NANOSEC_PER_SEC);
> +	if (meta->rx_timestamp) {
> +		__u64 usr_clock = gettime(clock_id);
> +		__u64 xdp_clock = meta->xdp_timestamp;
> +		__s64 delta_X = xdp_clock - meta->rx_timestamp;
> +		__s64 delta_X2U = usr_clock - xdp_clock;
> +
> +		printf("XDP RX-time:   %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
> +		       xdp_clock, (double)xdp_clock / NANOSEC_PER_SEC,
> +		       (double)delta_X / NANOSEC_PER_SEC,
> +		       (double)delta_X / 1000);
> +
> +		printf("AF_XDP time:   %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
> +		       usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
> +		       (double)delta_X2U / NANOSEC_PER_SEC,
> +		       (double)delta_X2U / 1000);
> +	}
> +
>  }
>  
>  static void verify_skb_metadata(int fd)
> @@ -193,7 +228,7 @@ static void verify_skb_metadata(int fd)
>  	printf("skb hwtstamp is not found!\n");
>  }
>  
> -static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
> +static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id)
>  {
>  	const struct xdp_desc *rx_desc;
>  	struct pollfd fds[rxq + 1];
> @@ -243,7 +278,8 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
>  			addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
>  			printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
>  			       xsk, idx, rx_desc->addr, addr, comp_addr);
> -			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
> +			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr),
> +					    clock_id);
>  			xsk_ring_cons__release(&xsk->rx, 1);
>  			refill_rx(xsk, comp_addr);
>  		}
> @@ -370,6 +406,7 @@ static void timestamping_enable(int fd, int val)
>  
>  int main(int argc, char *argv[])
>  {
> +	clockid_t clock_id = CLOCK_TAI;
>  	int server_fd = -1;
>  	int ret;
>  	int i;
> @@ -443,7 +480,7 @@ int main(int argc, char *argv[])
>  		error(1, -ret, "bpf_xdp_attach");
>  
>  	signal(SIGINT, handle_signal);
> -	ret = verify_metadata(rx_xsk, rxq, server_fd);
> +	ret = verify_metadata(rx_xsk, rxq, server_fd, clock_id);
>  	close(server_fd);
>  	cleanup();
>  	if (ret)
> diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
> index 0c4624dc6f2f..938a729bd307 100644
> --- a/tools/testing/selftests/bpf/xdp_metadata.h
> +++ b/tools/testing/selftests/bpf/xdp_metadata.h
> @@ -11,6 +11,7 @@
>  
>  struct xdp_meta {
>  	__u64 rx_timestamp;
> +	__u64 xdp_timestamp;
>  	__u32 rx_hash;
>  	union {
>  		__u32 rx_hash_type;
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next V2 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-18 16:36   ` Stanislav Fomichev
@ 2023-04-19 16:41     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-19 16:41 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: brouer, bpf, Toke Høiland-Jørgensen, netdev,
	martin.lau, ast, daniel, alexandr.lobakin, larysa.zaremba,
	xdp-hints, yoong.siang.song, intel-wired-lan, pabeni,
	jesse.brandeburg, kuba, edumazet, john.fastabend, hawk, davem



On 18/04/2023 18.36, Stanislav Fomichev wrote:
> On 04/18, Jesper Dangaard Brouer wrote:
>> To correlate the hardware RX timestamp with something, add tracking of
>> two software timestamps both clock source CLOCK_TAI (see description in
>> man clock_gettime(2)).
>>
>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
>> could not find a BPF helper for getting CLOCK_REALTIME, which would have
>> been preferred. In userspace when AF_XDP sees the packet another
>> software timestamp is recorded via clock_gettime() also clock source
>> CLOCK_TAI.
>>
[...]

>> More explanation of the output and how this can be used to identify
>> clock drift for the HW clock can be seen here[1]:
>>
>> [1]https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
>>
>> Signed-off-by: Jesper Dangaard Brouer<brouer@redhat.com>
> Acked-by: Stanislav Fomichev<sdf@google.com>
> 
>> ---
>>   .../testing/selftests/bpf/progs/xdp_hw_metadata.c  |    4 +-
>>   tools/testing/selftests/bpf/xdp_hw_metadata.c      |   47 ++++++++++++++++++--
>>   tools/testing/selftests/bpf/xdp_metadata.h         |    1
>>   3 files changed, 46 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
>> index e1c787815e44..b2dfd7066c6e 100644
>> --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
>> +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
>> @@ -77,7 +77,9 @@ int rx(struct xdp_md *ctx)
>>   	}
>>   
>>   	err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
>> -	if (err)
> [..]
> 
>> +	if (!err)
>> +		meta->xdp_timestamp = bpf_ktime_get_tai_ns();
> nit: why not set it unconditionally?

Because userspace application doesn't use it for anything, when
meta->rx_timestamp is zero.

--Jesper


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc
  2023-04-18 14:53 ` [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Song, Yoong Siang
@ 2023-04-21 14:52   ` Daniel Borkmann
  2023-04-24  2:14     ` Song, Yoong Siang
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Borkmann @ 2023-04-21 14:52 UTC (permalink / raw)
  To: Song, Yoong Siang, Brouer, Jesper, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: netdev, martin.lau, ast, Lobakin, Aleksander, Zaremba, Larysa,
	xdp-hints, intel-wired-lan, pabeni, Brandeburg, Jesse, kuba,
	edumazet, john.fastabend, hawk, davem

On 4/18/23 4:53 PM, Song, Yoong Siang wrote:
> On Tuesday, April 18, 2023 9:31 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>> Implement both RX hash and RX timestamp XDP hints kfunc metadata for driver
>> igc.
>>
>> First patch fix RX hashing for igc in general.
>>
>> Last patch change test program xdp_hw_metadata to track more timestamps,
>> which helps us correlate the hardware RX timestamp with something.
>>
>> ---
>> To maintainers: I'm uncertain which git tree this should be sent against. This is
>> primary NIC driver code (net-next), but it's BPF/XDP related (bpf-next) via
>> xdp_metadata_ops.
>>
>> Jesper Dangaard Brouer (5):
>>       igc: enable and fix RX hash usage by netstack
>>       igc: add igc_xdp_buff wrapper for xdp_buff in driver
>>       igc: add XDP hints kfuncs for RX hash
>>       igc: add XDP hints kfuncs for RX timestamp
>>       selftests/bpf: xdp_hw_metadata track more timestamps
>>
>>
>> drivers/net/ethernet/intel/igc/igc.h          |  35 ++++++
>> drivers/net/ethernet/intel/igc/igc_main.c     | 116 ++++++++++++++++--
>> .../selftests/bpf/progs/xdp_hw_metadata.c     |   4 +-
>> tools/testing/selftests/bpf/xdp_hw_metadata.c |  47 ++++++-
>> tools/testing/selftests/bpf/xdp_metadata.h    |   1 +
>> 5 files changed, 186 insertions(+), 17 deletions(-)
>>
>> --
> 
> This patchset lgtm.
> Thanks for the changes.

Siang, can I take this into the patches as your:

Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>

?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack
  2023-04-18 13:30 ` [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
@ 2023-04-23 14:46   ` John Fastabend
  2023-04-24 14:20     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 18+ messages in thread
From: John Fastabend @ 2023-04-23 14:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Jesper Dangaard Brouer wrote:
> When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372
> ("igc: Add transmit and receive fastpath and interrupt handlers"), the
> hardware wasn't configured to provide RSS hash, thus it made sense to not
> enable net_device NETIF_F_RXHASH feature bit.
> 
> The NIC hardware was configured to enable RSS hash info in v5.2 via commit
> 2121c2712f82 ("igc: Add multiple receive queues control supporting"), but
> forgot to set the NETIF_F_RXHASH feature bit.
> 
> The original implementation of igc_rx_hash() didn't extract the associated
> pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
> this patch are about extracting the RSS Type from the hardware and mapping
> this to enum pkt_hash_types. This was based on Foxville i225 software user
> manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).
> 
> For UDP it's worth noting that RSS (type) hashing have been disabled both for
> IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
> because hardware RSS doesn't handle fragmented pkts well when enabled (can
> cause out-of-order). This results in PKT_HASH_TYPE_L3 for UDP packets, and
> hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
> the effect that netstack will do a software based hash calc calling into
> flow_dissect, but only when code calls skb_get_hash(), which doesn't
> necessary happen for local delivery.
> 
> For QA verification testing I wrote a small bpftrace prog:
>  [0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt
> 
> Fixes: 2121c2712f82 ("igc: Add multiple receive queues control supporting")
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/intel/igc/igc.h      |   28 ++++++++++++++++++++++++++
>  drivers/net/ethernet/intel/igc/igc_main.c |   31 +++++++++++++++++++++++++----
>  2 files changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index 34aebf00a512..f7f9e217e7b4 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -13,6 +13,7 @@
>  #include <linux/ptp_clock_kernel.h>
>  #include <linux/timecounter.h>
>  #include <linux/net_tstamp.h>
> +#include <linux/bitfield.h>
>  
>  #include "igc_hw.h"
>  
> @@ -311,6 +312,33 @@ extern char igc_driver_name[];
>  #define IGC_MRQC_RSS_FIELD_IPV4_UDP	0x00400000
>  #define IGC_MRQC_RSS_FIELD_IPV6_UDP	0x00800000
>  
> +/* RX-desc Write-Back format RSS Type's */
> +enum igc_rss_type_num {
> +	IGC_RSS_TYPE_NO_HASH		= 0,
> +	IGC_RSS_TYPE_HASH_TCP_IPV4	= 1,
> +	IGC_RSS_TYPE_HASH_IPV4		= 2,
> +	IGC_RSS_TYPE_HASH_TCP_IPV6	= 3,
> +	IGC_RSS_TYPE_HASH_IPV6_EX	= 4,
> +	IGC_RSS_TYPE_HASH_IPV6		= 5,
> +	IGC_RSS_TYPE_HASH_TCP_IPV6_EX	= 6,
> +	IGC_RSS_TYPE_HASH_UDP_IPV4	= 7,
> +	IGC_RSS_TYPE_HASH_UDP_IPV6	= 8,
> +	IGC_RSS_TYPE_HASH_UDP_IPV6_EX	= 9,
> +	IGC_RSS_TYPE_MAX		= 10,
> +};
> +#define IGC_RSS_TYPE_MAX_TABLE		16
> +#define IGC_RSS_TYPE_MASK		GENMASK(3,0) /* 4-bits (3:0) = mask 0x0F */
> +
> +/* igc_rss_type - Rx descriptor RSS type field */
> +static inline u32 igc_rss_type(const union igc_adv_rx_desc *rx_desc)
> +{
> +	/* RSS Type 4-bits (3:0) number: 0-9 (above 9 is reserved)
> +	 * Accessing the same bits via u16 (wb.lower.lo_dword.hs_rss.pkt_info)
> +	 * is slightly slower than via u32 (wb.lower.lo_dword.data)
> +	 */
> +	return le32_get_bits(rx_desc->wb.lower.lo_dword.data, IGC_RSS_TYPE_MASK);
> +}
> +
>  /* Interrupt defines */
>  #define IGC_START_ITR			648 /* ~6000 ints/sec */
>  #define IGC_4K_ITR			980
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 1c4676882082..bfa9768d447f 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -1690,14 +1690,36 @@ static void igc_rx_checksum(struct igc_ring *ring,
>  		   le32_to_cpu(rx_desc->wb.upper.status_error));
>  }
>  
> +/* Mapping HW RSS Type to enum pkt_hash_types */
> +static const enum pkt_hash_types igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
> +	[IGC_RSS_TYPE_NO_HASH]		= PKT_HASH_TYPE_L2,
> +	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= PKT_HASH_TYPE_L4,
> +	[IGC_RSS_TYPE_HASH_IPV4]	= PKT_HASH_TYPE_L3,
> +	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= PKT_HASH_TYPE_L4,
> +	[IGC_RSS_TYPE_HASH_IPV6_EX]	= PKT_HASH_TYPE_L3,
> +	[IGC_RSS_TYPE_HASH_IPV6]	= PKT_HASH_TYPE_L3,
> +	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = PKT_HASH_TYPE_L4,
> +	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= PKT_HASH_TYPE_L4,
> +	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= PKT_HASH_TYPE_L4,
> +	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = PKT_HASH_TYPE_L4,
> +	[10] = PKT_HASH_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
> +	[11] = PKT_HASH_TYPE_NONE, /* keep array sized for SW bit-mask   */
> +	[12] = PKT_HASH_TYPE_NONE, /* to handle future HW revisons       */
> +	[13] = PKT_HASH_TYPE_NONE,
> +	[14] = PKT_HASH_TYPE_NONE,
> +	[15] = PKT_HASH_TYPE_NONE,
> +};
> +
>  static inline void igc_rx_hash(struct igc_ring *ring,
>  			       union igc_adv_rx_desc *rx_desc,
>  			       struct sk_buff *skb)
>  {
> -	if (ring->netdev->features & NETIF_F_RXHASH)
> -		skb_set_hash(skb,
> -			     le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
> -			     PKT_HASH_TYPE_L3);
> +	if (ring->netdev->features & NETIF_F_RXHASH) {
> +		u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
> +		u32 rss_type = igc_rss_type(rx_desc);
> +
> +		skb_set_hash(skb, rss_hash, igc_rss_type_table[rss_type]);

Just curious why not copy the logic from the other driver fms10k, ice, ect.

	skb_set_hash(skb, le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
		     (IXGBE_RSS_L4_TYPES_MASK & (1ul << rss_type)) ?
		     PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);

avoiding the table logic. Do the driver folks care?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [xdp-hints] Re: [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc
  2023-04-21 14:52   ` [xdp-hints] " Daniel Borkmann
@ 2023-04-24  2:14     ` Song, Yoong Siang
  0 siblings, 0 replies; 18+ messages in thread
From: Song, Yoong Siang @ 2023-04-24  2:14 UTC (permalink / raw)
  To: Daniel Borkmann, Brouer, Jesper, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: netdev, martin.lau, ast, Lobakin, Aleksander, Zaremba, Larysa,
	xdp-hints, intel-wired-lan, pabeni, Brandeburg, Jesse, kuba,
	edumazet, john.fastabend, hawk, davem

On Friday, April 21, 2023 10:53 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>On 4/18/23 4:53 PM, Song, Yoong Siang wrote:
>> On Tuesday, April 18, 2023 9:31 PM, Jesper Dangaard Brouer
><brouer@redhat.com> wrote:
>>> Implement both RX hash and RX timestamp XDP hints kfunc metadata for
>>> driver igc.
>>>
>>> First patch fix RX hashing for igc in general.
>>>
>>> Last patch change test program xdp_hw_metadata to track more
>>> timestamps, which helps us correlate the hardware RX timestamp with
>something.
>>>
>>> ---
>>> To maintainers: I'm uncertain which git tree this should be sent
>>> against. This is primary NIC driver code (net-next), but it's BPF/XDP
>>> related (bpf-next) via xdp_metadata_ops.
>>>
>>> Jesper Dangaard Brouer (5):
>>>       igc: enable and fix RX hash usage by netstack
>>>       igc: add igc_xdp_buff wrapper for xdp_buff in driver
>>>       igc: add XDP hints kfuncs for RX hash
>>>       igc: add XDP hints kfuncs for RX timestamp
>>>       selftests/bpf: xdp_hw_metadata track more timestamps
>>>
>>>
>>> drivers/net/ethernet/intel/igc/igc.h          |  35 ++++++
>>> drivers/net/ethernet/intel/igc/igc_main.c     | 116 ++++++++++++++++--
>>> .../selftests/bpf/progs/xdp_hw_metadata.c     |   4 +-
>>> tools/testing/selftests/bpf/xdp_hw_metadata.c |  47 ++++++-
>>> tools/testing/selftests/bpf/xdp_metadata.h    |   1 +
>>> 5 files changed, 186 insertions(+), 17 deletions(-)
>>>
>>> --
>>
>> This patchset lgtm.
>> Thanks for the changes.
>
>Siang, can I take this into the patches as your:
>
>Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
>
>?

Sure.

Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>

Thanks & Regards
Siang

>
>Thanks,
>Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack
  2023-04-23 14:46   ` John Fastabend
@ 2023-04-24 14:20     ` Jesper Dangaard Brouer
  2023-04-24 19:17       ` John Fastabend
  0 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-24 14:20 UTC (permalink / raw)
  To: John Fastabend, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen, Tony Nguyen
  Cc: brouer, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, yoong.siang.song, intel-wired-lan,
	pabeni, jesse.brandeburg, kuba, edumazet, hawk, davem



On 23/04/2023 16.46, John Fastabend wrote:
> Jesper Dangaard Brouer wrote:
>> When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372
>> ("igc: Add transmit and receive fastpath and interrupt handlers"), the
>> hardware wasn't configured to provide RSS hash, thus it made sense to not
>> enable net_device NETIF_F_RXHASH feature bit.
>>
>> The NIC hardware was configured to enable RSS hash info in v5.2 via commit
>> 2121c2712f82 ("igc: Add multiple receive queues control supporting"), but
>> forgot to set the NETIF_F_RXHASH feature bit.
>>
>> The original implementation of igc_rx_hash() didn't extract the associated
>> pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
>> this patch are about extracting the RSS Type from the hardware and mapping
>> this to enum pkt_hash_types. This was based on Foxville i225 software user
>> manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).
>>
>> For UDP it's worth noting that RSS (type) hashing have been disabled both for
>> IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
>> because hardware RSS doesn't handle fragmented pkts well when enabled (can
>> cause out-of-order). This results in PKT_HASH_TYPE_L3 for UDP packets, and
>> hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
>> the effect that netstack will do a software based hash calc calling into
>> flow_dissect, but only when code calls skb_get_hash(), which doesn't
>> necessary happen for local delivery.
>>
>> For QA verification testing I wrote a small bpftrace prog:
>>   [0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt
>>
>> Fixes: 2121c2712f82 ("igc: Add multiple receive queues control supporting")
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>>   drivers/net/ethernet/intel/igc/igc.h      |   28 ++++++++++++++++++++++++++
>>   drivers/net/ethernet/intel/igc/igc_main.c |   31 +++++++++++++++++++++++++----
>>   2 files changed, 55 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
>> index 34aebf00a512..f7f9e217e7b4 100644
>> --- a/drivers/net/ethernet/intel/igc/igc.h
>> +++ b/drivers/net/ethernet/intel/igc/igc.h
>> @@ -13,6 +13,7 @@
>>   #include <linux/ptp_clock_kernel.h>
>>   #include <linux/timecounter.h>
>>   #include <linux/net_tstamp.h>
>> +#include <linux/bitfield.h>
>>   
>>   #include "igc_hw.h"
>>   
>> @@ -311,6 +312,33 @@ extern char igc_driver_name[];
>>   #define IGC_MRQC_RSS_FIELD_IPV4_UDP	0x00400000
>>   #define IGC_MRQC_RSS_FIELD_IPV6_UDP	0x00800000
>>   
>> +/* RX-desc Write-Back format RSS Type's */
>> +enum igc_rss_type_num {
>> +	IGC_RSS_TYPE_NO_HASH		= 0,
>> +	IGC_RSS_TYPE_HASH_TCP_IPV4	= 1,
>> +	IGC_RSS_TYPE_HASH_IPV4		= 2,
>> +	IGC_RSS_TYPE_HASH_TCP_IPV6	= 3,
>> +	IGC_RSS_TYPE_HASH_IPV6_EX	= 4,
>> +	IGC_RSS_TYPE_HASH_IPV6		= 5,
>> +	IGC_RSS_TYPE_HASH_TCP_IPV6_EX	= 6,
>> +	IGC_RSS_TYPE_HASH_UDP_IPV4	= 7,
>> +	IGC_RSS_TYPE_HASH_UDP_IPV6	= 8,
>> +	IGC_RSS_TYPE_HASH_UDP_IPV6_EX	= 9,
>> +	IGC_RSS_TYPE_MAX		= 10,
>> +};
>> +#define IGC_RSS_TYPE_MAX_TABLE		16
>> +#define IGC_RSS_TYPE_MASK		GENMASK(3,0) /* 4-bits (3:0) = mask 0x0F */
>> +
>> +/* igc_rss_type - Rx descriptor RSS type field */
>> +static inline u32 igc_rss_type(const union igc_adv_rx_desc *rx_desc)
>> +{
>> +	/* RSS Type 4-bits (3:0) number: 0-9 (above 9 is reserved)
>> +	 * Accessing the same bits via u16 (wb.lower.lo_dword.hs_rss.pkt_info)
>> +	 * is slightly slower than via u32 (wb.lower.lo_dword.data)
>> +	 */
>> +	return le32_get_bits(rx_desc->wb.lower.lo_dword.data, IGC_RSS_TYPE_MASK);
>> +}
>> +
>>   /* Interrupt defines */
>>   #define IGC_START_ITR			648 /* ~6000 ints/sec */
>>   #define IGC_4K_ITR			980
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 1c4676882082..bfa9768d447f 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -1690,14 +1690,36 @@ static void igc_rx_checksum(struct igc_ring *ring,
>>   		   le32_to_cpu(rx_desc->wb.upper.status_error));
>>   }
>>   
>> +/* Mapping HW RSS Type to enum pkt_hash_types */
>> +static const enum pkt_hash_types igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
>> +	[IGC_RSS_TYPE_NO_HASH]		= PKT_HASH_TYPE_L2,
>> +	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= PKT_HASH_TYPE_L4,
>> +	[IGC_RSS_TYPE_HASH_IPV4]	= PKT_HASH_TYPE_L3,
>> +	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= PKT_HASH_TYPE_L4,
>> +	[IGC_RSS_TYPE_HASH_IPV6_EX]	= PKT_HASH_TYPE_L3,
>> +	[IGC_RSS_TYPE_HASH_IPV6]	= PKT_HASH_TYPE_L3,
>> +	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = PKT_HASH_TYPE_L4,
>> +	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= PKT_HASH_TYPE_L4,
>> +	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= PKT_HASH_TYPE_L4,
>> +	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = PKT_HASH_TYPE_L4,
>> +	[10] = PKT_HASH_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
>> +	[11] = PKT_HASH_TYPE_NONE, /* keep array sized for SW bit-mask   */
>> +	[12] = PKT_HASH_TYPE_NONE, /* to handle future HW revisons       */
>> +	[13] = PKT_HASH_TYPE_NONE,
>> +	[14] = PKT_HASH_TYPE_NONE,
>> +	[15] = PKT_HASH_TYPE_NONE,
>> +};
>> +
>>   static inline void igc_rx_hash(struct igc_ring *ring,
>>   			       union igc_adv_rx_desc *rx_desc,
>>   			       struct sk_buff *skb)
>>   {
>> -	if (ring->netdev->features & NETIF_F_RXHASH)
>> -		skb_set_hash(skb,
>> -			     le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
>> -			     PKT_HASH_TYPE_L3);
>> +	if (ring->netdev->features & NETIF_F_RXHASH) {
>> +		u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
>> +		u32 rss_type = igc_rss_type(rx_desc);
>> +
>> +		skb_set_hash(skb, rss_hash, igc_rss_type_table[rss_type]);
> 
> Just curious why not copy the logic from the other driver fms10k, ice, ect.
> 
> 	skb_set_hash(skb, le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
> 		     (IXGBE_RSS_L4_TYPES_MASK & (1ul << rss_type)) ?
> 		     PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);

Detail: This code mis-categorize (e.g. ARP) PKT_HASH_TYPE_L2 as
PKT_HASH_TYPE_L3, but as core reduces this further to one SKB bit, it
doesn't really matter.

> avoiding the table logic. Do the driver folks care?

The define IXGBE_RSS_L4_TYPES_MASK becomes the "table" logic as a 1-bit
true/false table.  It is a more compact table, let me know if this is
preferred.

Yes, it is really upto driver maintainer people to decide, what code is
preferred ?

--Jesper


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack
  2023-04-24 14:20     ` Jesper Dangaard Brouer
@ 2023-04-24 19:17       ` John Fastabend
  2023-04-25  8:43         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 18+ messages in thread
From: John Fastabend @ 2023-04-24 19:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, John Fastabend, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen, Tony Nguyen
  Cc: brouer, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, yoong.siang.song, intel-wired-lan,
	pabeni, jesse.brandeburg, kuba, edumazet, hawk, davem

Jesper Dangaard Brouer wrote:
> 
> 
> On 23/04/2023 16.46, John Fastabend wrote:
> > Jesper Dangaard Brouer wrote:
> >> When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372
> >> ("igc: Add transmit and receive fastpath and interrupt handlers"), the
> >> hardware wasn't configured to provide RSS hash, thus it made sense to not
> >> enable net_device NETIF_F_RXHASH feature bit.
> >>
> >> The NIC hardware was configured to enable RSS hash info in v5.2 via commit
> >> 2121c2712f82 ("igc: Add multiple receive queues control supporting"), but
> >> forgot to set the NETIF_F_RXHASH feature bit.
> >>
> >> The original implementation of igc_rx_hash() didn't extract the associated
> >> pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
> >> this patch are about extracting the RSS Type from the hardware and mapping
> >> this to enum pkt_hash_types. This was based on Foxville i225 software user
> >> manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).
> >>
> >> For UDP it's worth noting that RSS (type) hashing have been disabled both for
> >> IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
> >> because hardware RSS doesn't handle fragmented pkts well when enabled (can
> >> cause out-of-order). This results in PKT_HASH_TYPE_L3 for UDP packets, and
> >> hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
> >> the effect that netstack will do a software based hash calc calling into
> >> flow_dissect, but only when code calls skb_get_hash(), which doesn't
> >> necessary happen for local delivery.
> >>
> >> For QA verification testing I wrote a small bpftrace prog:
> >>   [0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt
> >>
> >> Fixes: 2121c2712f82 ("igc: Add multiple receive queues control supporting")
> >> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> >> ---
> >>   drivers/net/ethernet/intel/igc/igc.h      |   28 ++++++++++++++++++++++++++
> >>   drivers/net/ethernet/intel/igc/igc_main.c |   31 +++++++++++++++++++++++++----
> >>   2 files changed, 55 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> >> index 34aebf00a512..f7f9e217e7b4 100644
> >> --- a/drivers/net/ethernet/intel/igc/igc.h
> >> +++ b/drivers/net/ethernet/intel/igc/igc.h
> >> @@ -13,6 +13,7 @@
> >>   #include <linux/ptp_clock_kernel.h>
> >>   #include <linux/timecounter.h>
> >>   #include <linux/net_tstamp.h>
> >> +#include <linux/bitfield.h>
> >>   
> >>   #include "igc_hw.h"
> >>   
> >> @@ -311,6 +312,33 @@ extern char igc_driver_name[];
> >>   #define IGC_MRQC_RSS_FIELD_IPV4_UDP	0x00400000
> >>   #define IGC_MRQC_RSS_FIELD_IPV6_UDP	0x00800000
> >>   
> >> +/* RX-desc Write-Back format RSS Type's */
> >> +enum igc_rss_type_num {
> >> +	IGC_RSS_TYPE_NO_HASH		= 0,
> >> +	IGC_RSS_TYPE_HASH_TCP_IPV4	= 1,
> >> +	IGC_RSS_TYPE_HASH_IPV4		= 2,
> >> +	IGC_RSS_TYPE_HASH_TCP_IPV6	= 3,
> >> +	IGC_RSS_TYPE_HASH_IPV6_EX	= 4,
> >> +	IGC_RSS_TYPE_HASH_IPV6		= 5,
> >> +	IGC_RSS_TYPE_HASH_TCP_IPV6_EX	= 6,
> >> +	IGC_RSS_TYPE_HASH_UDP_IPV4	= 7,
> >> +	IGC_RSS_TYPE_HASH_UDP_IPV6	= 8,
> >> +	IGC_RSS_TYPE_HASH_UDP_IPV6_EX	= 9,
> >> +	IGC_RSS_TYPE_MAX		= 10,
> >> +};
> >> +#define IGC_RSS_TYPE_MAX_TABLE		16
> >> +#define IGC_RSS_TYPE_MASK		GENMASK(3,0) /* 4-bits (3:0) = mask 0x0F */
> >> +
> >> +/* igc_rss_type - Rx descriptor RSS type field */
> >> +static inline u32 igc_rss_type(const union igc_adv_rx_desc *rx_desc)
> >> +{
> >> +	/* RSS Type 4-bits (3:0) number: 0-9 (above 9 is reserved)
> >> +	 * Accessing the same bits via u16 (wb.lower.lo_dword.hs_rss.pkt_info)
> >> +	 * is slightly slower than via u32 (wb.lower.lo_dword.data)
> >> +	 */
> >> +	return le32_get_bits(rx_desc->wb.lower.lo_dword.data, IGC_RSS_TYPE_MASK);
> >> +}
> >> +
> >>   /* Interrupt defines */
> >>   #define IGC_START_ITR			648 /* ~6000 ints/sec */
> >>   #define IGC_4K_ITR			980
> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> >> index 1c4676882082..bfa9768d447f 100644
> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> >> @@ -1690,14 +1690,36 @@ static void igc_rx_checksum(struct igc_ring *ring,
> >>   		   le32_to_cpu(rx_desc->wb.upper.status_error));
> >>   }
> >>   
> >> +/* Mapping HW RSS Type to enum pkt_hash_types */
> >> +static const enum pkt_hash_types igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
> >> +	[IGC_RSS_TYPE_NO_HASH]		= PKT_HASH_TYPE_L2,
> >> +	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= PKT_HASH_TYPE_L4,
> >> +	[IGC_RSS_TYPE_HASH_IPV4]	= PKT_HASH_TYPE_L3,
> >> +	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= PKT_HASH_TYPE_L4,
> >> +	[IGC_RSS_TYPE_HASH_IPV6_EX]	= PKT_HASH_TYPE_L3,
> >> +	[IGC_RSS_TYPE_HASH_IPV6]	= PKT_HASH_TYPE_L3,
> >> +	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = PKT_HASH_TYPE_L4,
> >> +	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= PKT_HASH_TYPE_L4,
> >> +	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= PKT_HASH_TYPE_L4,
> >> +	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = PKT_HASH_TYPE_L4,
> >> +	[10] = PKT_HASH_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
> >> +	[11] = PKT_HASH_TYPE_NONE, /* keep array sized for SW bit-mask   */
> >> +	[12] = PKT_HASH_TYPE_NONE, /* to handle future HW revisons       */
> >> +	[13] = PKT_HASH_TYPE_NONE,
> >> +	[14] = PKT_HASH_TYPE_NONE,
> >> +	[15] = PKT_HASH_TYPE_NONE,
> >> +};
> >> +
> >>   static inline void igc_rx_hash(struct igc_ring *ring,
> >>   			       union igc_adv_rx_desc *rx_desc,
> >>   			       struct sk_buff *skb)
> >>   {
> >> -	if (ring->netdev->features & NETIF_F_RXHASH)
> >> -		skb_set_hash(skb,
> >> -			     le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
> >> -			     PKT_HASH_TYPE_L3);
> >> +	if (ring->netdev->features & NETIF_F_RXHASH) {
> >> +		u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
> >> +		u32 rss_type = igc_rss_type(rx_desc);
> >> +
> >> +		skb_set_hash(skb, rss_hash, igc_rss_type_table[rss_type]);
> > 
> > Just curious why not copy the logic from the other driver fms10k, ice, ect.
> > 
> > 	skb_set_hash(skb, le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
> > 		     (IXGBE_RSS_L4_TYPES_MASK & (1ul << rss_type)) ?
> > 		     PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
> 
> Detail: This code mis-categorize (e.g. ARP) PKT_HASH_TYPE_L2 as
> PKT_HASH_TYPE_L3, but as core reduces this further to one SKB bit, it
> doesn't really matter.
> 
> > avoiding the table logic. Do the driver folks care?
> 
> The define IXGBE_RSS_L4_TYPES_MASK becomes the "table" logic as a 1-bit
> true/false table.  It is a more compact table, let me know if this is
> preferred.
> 
> Yes, it is really upto driver maintainer people to decide, what code is
> preferred ?

Yeah doesn't matter much to me either way. I was just looking at code
compared to ice driver while reviewing.

> 
> --Jesper
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack
  2023-04-24 19:17       ` John Fastabend
@ 2023-04-25  8:43         ` Jesper Dangaard Brouer
  2023-04-27 17:00           ` Daniel Borkmann
  0 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-25  8:43 UTC (permalink / raw)
  To: davem, bpf, daniel
  Cc: brouer, netdev, martin.lau, ast, alexandr.lobakin,
	larysa.zaremba, xdp-hints, John Fastabend, Tony Nguyen,
	yoong.siang.song, intel-wired-lan, pabeni, jesse.brandeburg,
	Stanislav Fomichev, kuba, edumazet, hawk,
	Toke Høiland-Jørgensen


On 24/04/2023 21.17, John Fastabend wrote:
>>> Just curious why not copy the logic from the other driver fms10k, ice, ect.
>>>
>>> 	skb_set_hash(skb, le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
>>> 		     (IXGBE_RSS_L4_TYPES_MASK & (1ul << rss_type)) ?
>>> 		     PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
>> Detail: This code mis-categorize (e.g. ARP) PKT_HASH_TYPE_L2 as
>> PKT_HASH_TYPE_L3, but as core reduces this further to one SKB bit, it
>> doesn't really matter.
>>
>>> avoiding the table logic. Do the driver folks care?
>> The define IXGBE_RSS_L4_TYPES_MASK becomes the "table" logic as a 1-bit
>> true/false table.  It is a more compact table, let me know if this is
>> preferred.
>>
>> Yes, it is really upto driver maintainer people to decide, what code is
>> preferred ?
 >
> Yeah doesn't matter much to me either way. I was just looking at code
> compared to ice driver while reviewing.

My preference is to apply this patchset. We/I can easily followup and
change this to use the more compact approach later (if someone prefers).

I know net-next is "closed", but this patchset was posted prior to the
close.  Plus, a number of companies are waiting for the XDP-hint for HW
RX timestamp.  The support for driver stmmac is already in net-next
(commit e3f9c3e34840 ("net: stmmac: add Rx HWTS metadata to XDP receive
pkt")). Thus, it would be a help if both igc+stmmac changes land in same
kernel version, as both drivers are being evaluated by these companies.

Pretty please,
--Jesper


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc
  2023-04-18 13:30 [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
                   ` (5 preceding siblings ...)
  2023-04-18 14:53 ` [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Song, Yoong Siang
@ 2023-04-27 17:00 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 18+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-04-27 17:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, sdf, toke, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Hello:

This series was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Tue, 18 Apr 2023 15:30:37 +0200 you wrote:
> Implement both RX hash and RX timestamp XDP hints kfunc metadata
> for driver igc.
> 
> First patch fix RX hashing for igc in general.
> 
> Last patch change test program xdp_hw_metadata to track more
> timestamps, which helps us correlate the hardware RX timestamp
> with something.
> 
> [...]

Here is the summary with links:
  - [bpf-next,V2,1/5] igc: enable and fix RX hash usage by netstack
    https://git.kernel.org/bpf/bpf-next/c/84214ab4689f
  - [bpf-next,V2,2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver
    https://git.kernel.org/bpf/bpf-next/c/73b7123de0cf
  - [bpf-next,V2,3/5] igc: add XDP hints kfuncs for RX hash
    https://git.kernel.org/bpf/bpf-next/c/8416814fffa9
  - [bpf-next,V2,4/5] igc: add XDP hints kfuncs for RX timestamp
    https://git.kernel.org/bpf/bpf-next/c/d677266755c6
  - [bpf-next,V2,5/5] selftests/bpf: xdp_hw_metadata track more timestamps
    https://git.kernel.org/bpf/bpf-next/c/bb323478767d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack
  2023-04-25  8:43         ` Jesper Dangaard Brouer
@ 2023-04-27 17:00           ` Daniel Borkmann
  2023-04-28 10:13             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Borkmann @ 2023-04-27 17:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, davem, bpf
  Cc: brouer, netdev, martin.lau, ast, alexandr.lobakin,
	larysa.zaremba, xdp-hints, John Fastabend, Tony Nguyen,
	yoong.siang.song, intel-wired-lan, pabeni, jesse.brandeburg,
	Stanislav Fomichev, kuba, edumazet, hawk,
	Toke Høiland-Jørgensen

On 4/25/23 10:43 AM, Jesper Dangaard Brouer wrote:
> On 24/04/2023 21.17, John Fastabend wrote:
>>>> Just curious why not copy the logic from the other driver fms10k, ice, ect.
>>>>
>>>>     skb_set_hash(skb, le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
>>>>              (IXGBE_RSS_L4_TYPES_MASK & (1ul << rss_type)) ?
>>>>              PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
>>> Detail: This code mis-categorize (e.g. ARP) PKT_HASH_TYPE_L2 as
>>> PKT_HASH_TYPE_L3, but as core reduces this further to one SKB bit, it
>>> doesn't really matter.
>>>
>>>> avoiding the table logic. Do the driver folks care?
>>> The define IXGBE_RSS_L4_TYPES_MASK becomes the "table" logic as a 1-bit
>>> true/false table.  It is a more compact table, let me know if this is
>>> preferred.
>>>
>>> Yes, it is really upto driver maintainer people to decide, what code is
>>> preferred ?
>  >
>> Yeah doesn't matter much to me either way. I was just looking at code
>> compared to ice driver while reviewing.
> 
> My preference is to apply this patchset. We/I can easily followup and
> change this to use the more compact approach later (if someone prefers).

Consistency might help imo and would avoid questions/confusion on /why/
doing it differently for igc vs some of the others.

> I know net-next is "closed", but this patchset was posted prior to the
> close.  Plus, a number of companies are waiting for the XDP-hint for HW
> RX timestamp.  The support for driver stmmac is already in net-next
> (commit e3f9c3e34840 ("net: stmmac: add Rx HWTS metadata to XDP receive
> pkt")). Thus, it would be a help if both igc+stmmac changes land in same
> kernel version, as both drivers are being evaluated by these companies.

Given merge window is open now and net-next closed, it's too late to land
(unless Dave/Jakub thinks otherwise given it touches also driver bits).
I've applied the series to bpf-next right now.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack
  2023-04-27 17:00           ` Daniel Borkmann
@ 2023-04-28 10:13             ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-28 10:13 UTC (permalink / raw)
  To: Daniel Borkmann, Jesper Dangaard Brouer, davem, bpf
  Cc: brouer, netdev, martin.lau, ast, alexandr.lobakin,
	larysa.zaremba, xdp-hints, John Fastabend, Tony Nguyen,
	yoong.siang.song, intel-wired-lan, pabeni, jesse.brandeburg,
	Stanislav Fomichev, kuba, edumazet, hawk,
	Toke Høiland-Jørgensen


On 27/04/2023 19.00, Daniel Borkmann wrote:
> On 4/25/23 10:43 AM, Jesper Dangaard Brouer wrote:
>> On 24/04/2023 21.17, John Fastabend wrote:
>>>>> Just curious why not copy the logic from the other driver fms10k, 
>>>>> ice, ect.
>>>>>
>>>>>     skb_set_hash(skb, le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
>>>>>              (IXGBE_RSS_L4_TYPES_MASK & (1ul << rss_type)) ?
>>>>>              PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
>>>> Detail: This code mis-categorize (e.g. ARP) PKT_HASH_TYPE_L2 as
>>>> PKT_HASH_TYPE_L3, but as core reduces this further to one SKB bit, it
>>>> doesn't really matter.
>>>>
>>>>> avoiding the table logic. Do the driver folks care?
>>>> The define IXGBE_RSS_L4_TYPES_MASK becomes the "table" logic as a 1-bit
>>>> true/false table.  It is a more compact table, let me know if this is
>>>> preferred.
>>>>
>>>> Yes, it is really upto driver maintainer people to decide, what code is
>>>> preferred ?
>>  >
>>> Yeah doesn't matter much to me either way. I was just looking at code
>>> compared to ice driver while reviewing.
>>
>> My preference is to apply this patchset. We/I can easily followup and
>> change this to use the more compact approach later (if someone prefers).
> 
> Consistency might help imo and would avoid questions/confusion on /why/
> doing it differently for igc vs some of the others.
>

Well, drivers often do things differently, so that not something new. I
found the other approach less readable (and theoretically wrong for the
L2 case).  For igc this approach makes it easier to read (IMHO I'm
biased of cause) and easier to compare with kfunc metadata hint type
(that doesn't have RSS type information loss).

>> I know net-next is "closed", but this patchset was posted prior to the
>> close.  Plus, a number of companies are waiting for the XDP-hint for HW
>> RX timestamp.  The support for driver stmmac is already in net-next
>> (commit e3f9c3e34840 ("net: stmmac: add Rx HWTS metadata to XDP receive
>> pkt")). Thus, it would be a help if both igc+stmmac changes land in same
>> kernel version, as both drivers are being evaluated by these companies.
> 
> Given merge window is open now and net-next closed, it's too late to land
> (unless Dave/Jakub thinks otherwise given it touches also driver bits).
> I've applied the series to bpf-next right now.

It's not a big deal that it didn't reached net-next, end-users will just
have to wait for another kernel release to see this feature, or backport
the feature themselves.

Thanks for applying it.

--Jesper


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-04-28 10:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-18 13:30 [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
2023-04-18 13:30 ` [PATCH bpf-next V2 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
2023-04-23 14:46   ` John Fastabend
2023-04-24 14:20     ` Jesper Dangaard Brouer
2023-04-24 19:17       ` John Fastabend
2023-04-25  8:43         ` Jesper Dangaard Brouer
2023-04-27 17:00           ` Daniel Borkmann
2023-04-28 10:13             ` Jesper Dangaard Brouer
2023-04-18 13:30 ` [PATCH bpf-next V2 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
2023-04-18 13:30 ` [PATCH bpf-next V2 3/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
2023-04-18 13:30 ` [PATCH bpf-next V2 4/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
2023-04-18 13:31 ` [PATCH bpf-next V2 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
2023-04-18 16:36   ` Stanislav Fomichev
2023-04-19 16:41     ` Jesper Dangaard Brouer
2023-04-18 14:53 ` [PATCH bpf-next V2 0/5] XDP-hints: XDP kfunc metadata for driver igc Song, Yoong Siang
2023-04-21 14:52   ` [xdp-hints] " Daniel Borkmann
2023-04-24  2:14     ` Song, Yoong Siang
2023-04-27 17:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).