bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v2 0/8] xdp: hints via kfuncs
@ 2022-11-21 18:25 Stanislav Fomichev
  2022-11-21 18:25 ` [PATCH bpf-next v2 1/8] bpf: Document XDP RX metadata Stanislav Fomichev
                   ` (8 more replies)
  0 siblings, 9 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Please see the first patch in the series for the overall
design and use-cases.

Changes since v1:

- Drop xdp->skb metadata path (Jakub)

  No consensus yet on exposing xdp_skb_metadata in UAPI. Exploring
  whether everyone would be OK with a kfunc to access that part.
  Will follow up separately.

- Drop kfunc unrolling (Alexei)

  Starting with simple code to resolve per-device ndo kfuncs.
  We can always go back to unrolling and keep the same kfuncs
  interface in the future.

- Add rx hash metadata (Toke)

  Not adding the rest (csum/hash_type/etc.) yet; I'd like us to agree on
  the framework first.

- Use dev_get_by_index and add proper refcnt (Toke)

Changes since last RFC:

- drop ice/bnxt example implementation (Alexander)

  -ENOHARDWARE to test

- fix/test mlx4 implementation

  Confirmed that I get reasonable looking timestamp.
  The last patch in the series is the small xsk program that can
  be used to dump incoming metadata.

- bpf_push64/bpf_pop64 (Alexei)

  x86_64+arm64(untested)+disassembler

- struct xdp_to_skb_metadata -> struct xdp_skb_metadata (Toke)

  s/xdp_to_skb/xdp_skb/

- Documentation/bpf/xdp-rx-metadata.rst

  Documents functionality, assumptions and limitations.

- bpf_xdp_metadata_export_to_skb returns true/false (Martin)

  Plus xdp_md->skb_metadata field to access it.

- BPF_F_XDP_HAS_METADATA flag (Toke/Martin)

  Drop magic, use the flag instead.

- drop __randomize_layout

  Not sure it's possible to sanely expose it via UAPI. Because every
  .o potentially gets its own randomized layout, test_progs
  refuses to link.

- remove __net_timestamp in veth driver (John/Jesper)

  Instead, we call ktime_get from the kfunc; that's enough for the selftests.

Future work on RX side:

- Support more devices besides veth and mlx4
- Support more metadata besides the RX timestamp
- Convert skb_metadata_set() callers to xdp_convert_skb_metadata()
  which handles extra xdp_skb_metadata

Prior art (to record pros/cons for different approaches):

- Stable UAPI approach:
  https://lore.kernel.org/bpf/20220628194812.1453059-1-alexandr.lobakin@intel.com/
- Metadata+BTF_ID approach:
  https://lore.kernel.org/bpf/166256538687.1434226.15760041133601409770.stgit@firesoul/
- v1:
  https://lore.kernel.org/bpf/20221115030210.3159213-1-sdf@google.com/T/#t
- kfuncs v2 RFC:
  https://lore.kernel.org/bpf/20221027200019.4106375-1-sdf@google.com/
- kfuncs v1 RFC:
  https://lore.kernel.org/bpf/20221104032532.1615099-1-sdf@google.com/

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org

Stanislav Fomichev (8):
  bpf: Document XDP RX metadata
  bpf: XDP metadata RX kfuncs
  veth: Introduce veth_xdp_buff wrapper for xdp_buff
  veth: Support RX XDP metadata
  selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  mlx4: Support RX XDP metadata
  selftests/bpf: Simple program to dump XDP RX metadata

 Documentation/bpf/xdp-rx-metadata.rst         |  90 ++++
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  10 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  78 +++-
 drivers/net/veth.c                            |  88 ++--
 include/linux/bpf.h                           |   1 +
 include/linux/mlx4/device.h                   |   7 +
 include/linux/netdevice.h                     |   5 +
 include/net/xdp.h                             |  20 +
 include/uapi/linux/bpf.h                      |   5 +
 kernel/bpf/core.c                             |   1 +
 kernel/bpf/syscall.c                          |  17 +-
 kernel/bpf/verifier.c                         |  33 ++
 net/core/dev.c                                |   5 +
 net/core/xdp.c                                |  52 +++
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   8 +-
 .../selftests/bpf/prog_tests/xdp_metadata.c   | 365 ++++++++++++++++
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  93 ++++
 .../selftests/bpf/progs/xdp_metadata.c        |  57 +++
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 405 ++++++++++++++++++
 tools/testing/selftests/bpf/xdp_metadata.h    |   7 +
 22 files changed, 1311 insertions(+), 42 deletions(-)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_metadata.h

-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH bpf-next v2 1/8] bpf: Document XDP RX metadata
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
@ 2022-11-21 18:25 ` Stanislav Fomichev
  2022-11-21 18:25 ` [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs Stanislav Fomichev
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Document all current use-cases and assumptions.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst

diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
new file mode 100644
index 000000000000..498eae718275
--- /dev/null
+++ b/Documentation/bpf/xdp-rx-metadata.rst
@@ -0,0 +1,90 @@
+===============
+XDP RX Metadata
+===============
+
+XDP programs support creating and passing custom metadata via
+``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
+entities:
+
+1. ``AF_XDP`` consumer.
+2. Kernel core stack via ``XDP_PASS``.
+3. Another device via ``bpf_redirect_map``.
+4. Other BPF programs via ``bpf_tail_call``.
+
+General Design
+==============
+
+XDP has access to a set of kfuncs to manipulate the metadata. Every
+device driver implements these kfuncs. The set of kfuncs is
+declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
+
+Currently, the following kfuncs are supported. In the future, as more
+metadata is supported, this set will grow:
+
+- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
+  indicate whether the device supports RX timestamps
+- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
+- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
+  indicate whether the device supports RX hash
+- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash
+
+Within the XDP frame, the metadata layout is as follows::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+             ^                 ^
+             |                 |
+   xdp_buff->data_meta   xdp_buff->data
+
+AF_XDP
+======
+
+The ``AF_XDP`` use-case implies that there is a contract between the BPF
+program that redirects XDP frames into the ``XSK`` and the final consumer.
+Thus the BPF program manually allocates a fixed number of
+bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
+of kfuncs to populate it. The user-space ``XSK`` consumer looks
+at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
+
+Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+                               ^
+                               |
+                        rx_desc->address
+
+XDP_PASS
+========
+
+This is the path where the packets processed by the XDP program are passed
+into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
+Currently, every driver has custom kernel code to parse the descriptors and
+populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
+In the future, we'd like to support a case where the XDP program can override
+some of that metadata.
+
+The plan of record is to make this path similar to ``bpf_redirect_map``
+so the program can control which metadata is passed to the skb layer.
+
+bpf_redirect_map
+================
+
+``bpf_redirect_map`` can redirect the frame to a different device.
+In this case we don't know ahead of time whether that final consumer
+will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
+Additionally, the final consumer doesn't have access to the original
+hardware descriptor and can't access any of the original metadata.
+
+For this use-case, only custom metadata is currently supported. If
+the frame is eventually passed to the kernel, the skb created from such
+a frame won't have any skb metadata. The ``XSK`` consumer will only
+have access to the custom metadata.
+
+bpf_tail_call
+=============
+
+No special handling here. The tail-called program operates on the same
+context as the original one.
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
  2022-11-21 18:25 ` [PATCH bpf-next v2 1/8] bpf: Document XDP RX metadata Stanislav Fomichev
@ 2022-11-21 18:25 ` Stanislav Fomichev
  2022-11-23  6:34   ` Martin KaFai Lau
                     ` (3 more replies)
  2022-11-21 18:25 ` [PATCH bpf-next v2 3/8] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
                   ` (6 subsequent siblings)
  8 siblings, 4 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

There is an ndo handler per kfunc; the verifier replaces a call to the
generic kfunc with a call to the per-device one.

For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
implements all possible metadata kfuncs. Not all devices have to
implement them. If a kfunc is not supported by the target device,
the default implementation is called instead.

Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
we treat prog_ifindex as the target device for kfunc resolution.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h            |  1 +
 include/linux/netdevice.h      |  5 ++++
 include/net/xdp.h              | 20 +++++++++++++
 include/uapi/linux/bpf.h       |  5 ++++
 kernel/bpf/core.c              |  1 +
 kernel/bpf/syscall.c           | 17 ++++++++++-
 kernel/bpf/verifier.c          | 33 +++++++++++++++++++++
 net/core/dev.c                 |  5 ++++
 net/core/xdp.c                 | 52 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  5 ++++
 10 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c9eafa67f2a2..01d62355d068 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1302,6 +1302,7 @@ struct bpf_prog_aux {
 		struct work_struct work;
 		struct rcu_head	rcu;
 	};
+	struct net_device *xdp_netdev; /* xdp metadata kfuncs */
 };
 
 struct bpf_prog {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddc59ef98500..2878e4869dc8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -74,6 +74,7 @@ struct udp_tunnel_nic_info;
 struct udp_tunnel_nic;
 struct bpf_prog;
 struct xdp_buff;
+struct xdp_md;
 
 void synchronize_net(void);
 void netdev_set_default_ethtool_ops(struct net_device *dev,
@@ -1604,6 +1605,10 @@ struct net_device_ops {
 	ktime_t			(*ndo_get_tstamp)(struct net_device *dev,
 						  const struct skb_shared_hwtstamps *hwtstamps,
 						  bool cycles);
+	bool			(*ndo_xdp_rx_timestamp_supported)(const struct xdp_md *ctx);
+	u64			(*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx);
+	bool			(*ndo_xdp_rx_hash_supported)(const struct xdp_md *ctx);
+	u32			(*ndo_xdp_rx_hash)(const struct xdp_md *ctx);
 };
 
 /**
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 55dbc68bfffc..348aefd467ed 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -7,6 +7,7 @@
 #define __LINUX_NET_XDP_H__
 
 #include <linux/skbuff.h> /* skb_shared_info */
+#include <linux/btf_ids.h> /* btf_id_set8 */
 
 /**
  * DOC: XDP RX-queue information
@@ -409,4 +410,23 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
 
 #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
 
+#define XDP_METADATA_KFUNC_xxx	\
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \
+			   bpf_xdp_metadata_rx_timestamp_supported) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
+			   bpf_xdp_metadata_rx_timestamp) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED, \
+			   bpf_xdp_metadata_rx_hash_supported) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
+			   bpf_xdp_metadata_rx_hash) \
+
+enum {
+#define XDP_METADATA_KFUNC(name, str) name,
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+MAX_XDP_METADATA_KFUNC,
+};
+
+u32 xdp_metadata_kfunc_id(int id);
+
 #endif /* __LINUX_NET_XDP_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ab86145df760..55eda6f0d39b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1156,6 +1156,11 @@ enum bpf_link_type {
  */
 #define BPF_F_XDP_HAS_FRAGS	(1U << 5)
 
+/* If BPF_F_XDP_HAS_METADATA is used in BPF_PROG_LOAD command, the loaded
+ * program becomes device-bound but can access its XDP metadata.
+ */
+#define BPF_F_XDP_HAS_METADATA	(1U << 6)
+
 /* link_create.kprobe_multi.flags used in LINK_CREATE command for
  * BPF_TRACE_KPROBE_MULTI attach type to create return probe.
  */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2e57fc839a5c..32cb07a8939c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2576,6 +2576,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 	} else {
 		bpf_jit_free(aux->prog);
 	}
+	dev_put(aux->xdp_netdev);
 }
 
 void bpf_prog_free(struct bpf_prog *fp)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..ece7f9234b2d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2491,7 +2491,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 				 BPF_F_TEST_STATE_FREQ |
 				 BPF_F_SLEEPABLE |
 				 BPF_F_TEST_RND_HI32 |
-				 BPF_F_XDP_HAS_FRAGS))
+				 BPF_F_XDP_HAS_FRAGS |
+				 BPF_F_XDP_HAS_METADATA))
 		return -EINVAL;
 
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
@@ -2579,6 +2580,20 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
 
+	if (attr->prog_flags & BPF_F_XDP_HAS_METADATA) {
+		/* Reuse prog_ifindex to bind to the device
+		 * for XDP metadata kfuncs.
+		 */
+		prog->aux->offload_requested = false;
+
+		prog->aux->xdp_netdev = dev_get_by_index(current->nsproxy->net_ns,
+							 attr->prog_ifindex);
+		if (!prog->aux->xdp_netdev) {
+			err = -EINVAL;
+			goto free_prog;
+		}
+	}
+
 	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
 		goto free_prog;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9528a066cfa5..315876fa9d30 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15171,6 +15171,25 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 	return err;
 }
 
+static int fixup_xdp_kfunc_call(struct bpf_verifier_env *env, u32 func_id)
+{
+	struct bpf_prog_aux *aux = env->prog->aux;
+	void *resolved = NULL;
+
+	if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
+		resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp_supported;
+	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
+		resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp;
+	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
+		resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash_supported;
+	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
+		resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash;
+
+	if (resolved)
+		return BPF_CALL_IMM(resolved);
+	return 0;
+}
+
 static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
 {
@@ -15181,6 +15200,15 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return -EINVAL;
 	}
 
+	if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
+		int imm = fixup_xdp_kfunc_call(env, insn->imm);
+
+		if (imm) {
+			insn->imm = imm;
+			return 0;
+		}
+	}
+
 	/* insn->imm has the btf func_id. Replace it with
 	 * an address (relative to __bpf_base_call).
 	 */
@@ -15359,6 +15387,11 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 		if (insn->src_reg == BPF_PSEUDO_CALL)
 			continue;
 		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
+			if (bpf_prog_is_dev_bound(env->prog->aux)) {
+				verbose(env, "no metadata kfuncs offload\n");
+				return -EINVAL;
+			}
+
 			ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt);
 			if (ret)
 				return ret;
diff --git a/net/core/dev.c b/net/core/dev.c
index 117e830cabb0..b4021b7575a2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9248,6 +9248,11 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 			NL_SET_ERR_MSG(extack, "BPF_XDP_CPUMAP programs can not be attached to a device");
 			return -EINVAL;
 		}
+		if (new_prog->aux->xdp_netdev &&
+		    new_prog->aux->xdp_netdev->netdev_ops != dev->netdev_ops) {
+			NL_SET_ERR_MSG(extack, "Cannot attach to a different target device");
+			return -EINVAL;
+		}
 	}
 
 	/* don't call drivers if the effective program didn't change */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 844c9d99dc0e..e43f7d4ef4cf 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
  */
 #include <linux/bpf.h>
+#include <linux/btf_ids.h>
 #include <linux/filter.h>
 #include <linux/types.h>
 #include <linux/mm.h>
@@ -709,3 +710,54 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
 
 	return nxdpf;
 }
+
+noinline bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx)
+{
+	return false;
+}
+
+noinline u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx)
+{
+	return 0;
+}
+
+noinline bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx)
+{
+	return false;
+}
+
+noinline u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx)
+{
+	return 0;
+}
+
+#ifdef CONFIG_DEBUG_INFO_BTF
+BTF_SET8_START(xdp_metadata_kfunc_ids)
+#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+BTF_SET8_END(xdp_metadata_kfunc_ids)
+
+static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &xdp_metadata_kfunc_ids,
+};
+
+u32 xdp_metadata_kfunc_id(int id)
+{
+	return xdp_metadata_kfunc_ids.pairs[id].id;
+}
+EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
+
+static int __init xdp_metadata_init(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
+}
+late_initcall(xdp_metadata_init);
+#else /* CONFIG_DEBUG_INFO_BTF */
+u32 xdp_metadata_kfunc_id(int id)
+{
+	return -1;
+}
+EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
+#endif /* CONFIG_DEBUG_INFO_BTF */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6580448e9f77..6b01ac70f564 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1156,6 +1156,11 @@ enum bpf_link_type {
  */
 #define BPF_F_XDP_HAS_FRAGS	(1U << 5)
 
+/* If BPF_F_XDP_HAS_METADATA is used in BPF_PROG_LOAD command, the loaded
+ * program becomes device-bound but can access its XDP metadata.
+ */
+#define BPF_F_XDP_HAS_METADATA	(1U << 6)
+
 /* link_create.kprobe_multi.flags used in LINK_CREATE command for
  * BPF_TRACE_KPROBE_MULTI attach type to create return probe.
  */
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH bpf-next v2 3/8] veth: Introduce veth_xdp_buff wrapper for xdp_buff
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
  2022-11-21 18:25 ` [PATCH bpf-next v2 1/8] bpf: Document XDP RX metadata Stanislav Fomichev
  2022-11-21 18:25 ` [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs Stanislav Fomichev
@ 2022-11-21 18:25 ` Stanislav Fomichev
  2022-11-21 18:25 ` [PATCH bpf-next v2 4/8] veth: Support RX XDP metadata Stanislav Fomichev
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

No functional changes. Boilerplate to allow stuffing more data after xdp_buff.

Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/veth.c | 56 +++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 2a4592780141..bbabc592d431 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -116,6 +116,10 @@ static struct {
 	{ "peer_ifindex" },
 };
 
+struct veth_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 static int veth_get_link_ksettings(struct net_device *dev,
 				   struct ethtool_link_ksettings *cmd)
 {
@@ -592,23 +596,24 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (likely(xdp_prog)) {
-		struct xdp_buff xdp;
+		struct veth_xdp_buff vxbuf;
+		struct xdp_buff *xdp = &vxbuf.xdp;
 		u32 act;
 
-		xdp_convert_frame_to_buff(frame, &xdp);
-		xdp.rxq = &rq->xdp_rxq;
+		xdp_convert_frame_to_buff(frame, xdp);
+		xdp->rxq = &rq->xdp_rxq;
 
-		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+		act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 		switch (act) {
 		case XDP_PASS:
-			if (xdp_update_frame_from_buff(&xdp, frame))
+			if (xdp_update_frame_from_buff(xdp, frame))
 				goto err_xdp;
 			break;
 		case XDP_TX:
 			orig_frame = *frame;
-			xdp.rxq->mem = frame->mem;
-			if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
+			xdp->rxq->mem = frame->mem;
+			if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 				trace_xdp_exception(rq->dev, xdp_prog, act);
 				frame = &orig_frame;
 				stats->rx_drops++;
@@ -619,8 +624,8 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			orig_frame = *frame;
-			xdp.rxq->mem = frame->mem;
-			if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
+			xdp->rxq->mem = frame->mem;
+			if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 				frame = &orig_frame;
 				stats->rx_drops++;
 				goto err_xdp;
@@ -801,7 +806,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 {
 	void *orig_data, *orig_data_end;
 	struct bpf_prog *xdp_prog;
-	struct xdp_buff xdp;
+	struct veth_xdp_buff vxbuf;
+	struct xdp_buff *xdp = &vxbuf.xdp;
 	u32 act, metalen;
 	int off;
 
@@ -815,22 +821,22 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	}
 
 	__skb_push(skb, skb->data - skb_mac_header(skb));
-	if (veth_convert_skb_to_xdp_buff(rq, &xdp, &skb))
+	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
 		goto drop;
 
-	orig_data = xdp.data;
-	orig_data_end = xdp.data_end;
+	orig_data = xdp->data;
+	orig_data_end = xdp->data_end;
 
-	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	switch (act) {
 	case XDP_PASS:
 		break;
 	case XDP_TX:
-		veth_xdp_get(&xdp);
+		veth_xdp_get(xdp);
 		consume_skb(skb);
-		xdp.rxq->mem = rq->xdp_mem;
-		if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
+		xdp->rxq->mem = rq->xdp_mem;
+		if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 			trace_xdp_exception(rq->dev, xdp_prog, act);
 			stats->rx_drops++;
 			goto err_xdp;
@@ -839,10 +845,10 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		rcu_read_unlock();
 		goto xdp_xmit;
 	case XDP_REDIRECT:
-		veth_xdp_get(&xdp);
+		veth_xdp_get(xdp);
 		consume_skb(skb);
-		xdp.rxq->mem = rq->xdp_mem;
-		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
+		xdp->rxq->mem = rq->xdp_mem;
+		if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 			stats->rx_drops++;
 			goto err_xdp;
 		}
@@ -862,7 +868,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	/* check if bpf_xdp_adjust_head was used */
-	off = orig_data - xdp.data;
+	off = orig_data - xdp->data;
 	if (off > 0)
 		__skb_push(skb, off);
 	else if (off < 0)
@@ -871,21 +877,21 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	skb_reset_mac_header(skb);
 
 	/* check if bpf_xdp_adjust_tail was used */
-	off = xdp.data_end - orig_data_end;
+	off = xdp->data_end - orig_data_end;
 	if (off != 0)
 		__skb_put(skb, off); /* positive on grow, negative on shrink */
 
 	/* XDP frag metadata (e.g. nr_frags) are updated in eBPF helpers
 	 * (e.g. bpf_xdp_adjust_tail), we need to update data_len here.
 	 */
-	if (xdp_buff_has_frags(&xdp))
+	if (xdp_buff_has_frags(xdp))
 		skb->data_len = skb_shinfo(skb)->xdp_frags_size;
 	else
 		skb->data_len = 0;
 
 	skb->protocol = eth_type_trans(skb, rq->dev);
 
-	metalen = xdp.data - xdp.data_meta;
+	metalen = xdp->data - xdp->data_meta;
 	if (metalen)
 		skb_metadata_set(skb, metalen);
 out:
@@ -898,7 +904,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	return NULL;
 err_xdp:
 	rcu_read_unlock();
-	xdp_return_buff(&xdp);
+	xdp_return_buff(xdp);
 xdp_xmit:
 	return NULL;
 }
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH bpf-next v2 4/8] veth: Support RX XDP metadata
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2022-11-21 18:25 ` [PATCH bpf-next v2 3/8] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
@ 2022-11-21 18:25 ` Stanislav Fomichev
  2022-11-21 18:25 ` [PATCH bpf-next v2 5/8] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

The goal is to enable end-to-end testing of the metadata for AF_XDP.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/veth.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index bbabc592d431..fdbca2aee33a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -118,6 +118,7 @@ static struct {
 
 struct veth_xdp_buff {
 	struct xdp_buff xdp;
+	struct sk_buff *skb;
 };
 
 static int veth_get_link_ksettings(struct net_device *dev,
@@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 
 		xdp_convert_frame_to_buff(frame, xdp);
 		xdp->rxq = &rq->xdp_rxq;
+		vxbuf.skb = NULL;
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
 
@@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
 		goto drop;
+	vxbuf.skb = skb;
 
 	orig_data = xdp->data;
 	orig_data_end = xdp->data_end;
@@ -1665,6 +1668,30 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
+static bool veth_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
+{
+	return true;
+}
+
+static u64 veth_xdp_rx_timestamp(const struct xdp_md *ctx)
+{
+	return ktime_get_mono_fast_ns();
+}
+
+static bool veth_xdp_rx_hash_supported(const struct xdp_md *ctx)
+{
+	return true;
+}
+
+static u32 veth_xdp_rx_hash(const struct xdp_md *ctx)
+{
+	struct veth_xdp_buff *_ctx = (void *)ctx;
+
+	if (_ctx->skb)
+		return skb_get_hash(_ctx->skb);
+	return 0;
+}
+
 static const struct net_device_ops veth_netdev_ops = {
 	.ndo_init            = veth_dev_init,
 	.ndo_open            = veth_open,
@@ -1684,6 +1711,11 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_bpf		= veth_xdp,
 	.ndo_xdp_xmit		= veth_ndo_xdp_xmit,
 	.ndo_get_peer_dev	= veth_peer_dev,
+
+	.ndo_xdp_rx_timestamp_supported = veth_xdp_rx_timestamp_supported,
+	.ndo_xdp_rx_timestamp	= veth_xdp_rx_timestamp,
+	.ndo_xdp_rx_hash_supported = veth_xdp_rx_hash_supported,
+	.ndo_xdp_rx_hash	= veth_xdp_rx_hash,
 };
 
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH bpf-next v2 5/8] selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
                   ` (3 preceding siblings ...)
  2022-11-21 18:25 ` [PATCH bpf-next v2 4/8] veth: Support RX XDP metadata Stanislav Fomichev
@ 2022-11-21 18:25 ` Stanislav Fomichev
  2022-11-29 10:06   ` Anton Protopopov
  2022-11-21 18:25 ` [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

- create new netns
- create veth pair (veTX+veRX)
- set up AF_XDP sockets for both interfaces
- attach bpf to veRX
- send packet via veTX
- verify the packet has expected metadata at veRX

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/Makefile          |   2 +-
 .../selftests/bpf/prog_tests/xdp_metadata.c   | 365 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_metadata.c        |  57 +++
 tools/testing/selftests/bpf/xdp_metadata.h    |   7 +
 4 files changed, 430 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_metadata.h

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 6a0f043dc410..4eed22fa3681 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -527,7 +527,7 @@ TRUNNER_BPF_PROGS_DIR := progs
 TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c	\
 			 network_helpers.c testing_helpers.c		\
 			 btf_helpers.c flow_dissector_load.h		\
-			 cap_helpers.c
+			 cap_helpers.c xsk.c
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
 		       $(OUTPUT)/liburandom_read.so			\
 		       $(OUTPUT)/xdp_synproxy				\
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
new file mode 100644
index 000000000000..01035ff7d783
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -0,0 +1,365 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "xdp_metadata.skel.h"
+#include "xdp_metadata.h"
+#include "xsk.h"
+
+#include <bpf/btf.h>
+#include <linux/errqueue.h>
+#include <linux/if_link.h>
+#include <linux/net_tstamp.h>
+#include <linux/udp.h>
+#include <sys/mman.h>
+#include <net/if.h>
+#include <poll.h>
+
+#define TX_NAME "veTX"
+#define RX_NAME "veRX"
+
+#define UDP_PAYLOAD_BYTES 4
+
+#define AF_XDP_SOURCE_PORT 1234
+#define AF_XDP_CONSUMER_PORT 8080
+
+#define UMEM_NUM 16
+#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM)
+#define XDP_FLAGS XDP_FLAGS_DRV_MODE
+#define QUEUE_ID 0
+
+#define TX_ADDR "10.0.0.1"
+#define RX_ADDR "10.0.0.2"
+#define PREFIX_LEN "8"
+#define FAMILY AF_INET
+
+#define SYS(cmd) ({ \
+	if (!ASSERT_OK(system(cmd), (cmd))) \
+		goto out; \
+})
+
+struct xsk {
+	void *umem_area;
+	struct xsk_umem *umem;
+	struct xsk_ring_prod fill;
+	struct xsk_ring_cons comp;
+	struct xsk_ring_prod tx;
+	struct xsk_ring_cons rx;
+	struct xsk_socket *socket;
+};
+
+static int open_xsk(const char *ifname, struct xsk *xsk)
+{
+	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	const struct xsk_socket_config socket_config = {
+		.rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
+		.xdp_flags = XDP_FLAGS,
+		.bind_flags = XDP_COPY,
+	};
+	const struct xsk_umem_config umem_config = {
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
+	};
+	__u32 idx;
+	u64 addr;
+	int ret;
+	int i;
+
+	xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (!ASSERT_NEQ(xsk->umem_area, MAP_FAILED, "mmap"))
+		return -1;
+
+	ret = xsk_umem__create(&xsk->umem,
+			       xsk->umem_area, UMEM_SIZE,
+			       &xsk->fill,
+			       &xsk->comp,
+			       &umem_config);
+	if (!ASSERT_OK(ret, "xsk_umem__create"))
+		return ret;
+
+	ret = xsk_socket__create(&xsk->socket, ifname, QUEUE_ID,
+				 xsk->umem,
+				 &xsk->rx,
+				 &xsk->tx,
+				 &socket_config);
+	if (!ASSERT_OK(ret, "xsk_socket__create"))
+		return ret;
+
+	/* First half of umem is for TX. This way address matches 1-to-1
+	 * to the completion queue index.
+	 */
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = i * UMEM_FRAME_SIZE;
+		printf("%p: tx_desc[%d] -> %lx\n", xsk, i, addr);
+	}
+
+	/* Second half of umem is for RX. */
+
+	ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx);
+	if (!ASSERT_EQ(UMEM_NUM / 2, ret, "xsk_ring_prod__reserve"))
+		return ret;
+	if (!ASSERT_EQ(idx, 0, "fill idx != 0"))
+		return -1;
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE;
+		printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, i) = addr;
+	}
+	xsk_ring_prod__submit(&xsk->fill, ret);
+
+	return 0;
+}
+
+static void close_xsk(struct xsk *xsk)
+{
+	if (xsk->umem)
+		xsk_umem__delete(xsk->umem);
+	if (xsk->socket)
+		xsk_socket__delete(xsk->socket);
+	munmap(xsk->umem_area, UMEM_SIZE);
+}
+
+static void ip_csum(struct iphdr *iph)
+{
+	__u32 sum = 0;
+	__u16 *p;
+	int i;
+
+	iph->check = 0;
+	p = (void *)iph;
+	for (i = 0; i < sizeof(*iph) / sizeof(*p); i++)
+		sum += p[i];
+
+	while (sum >> 16)
+		sum = (sum & 0xffff) + (sum >> 16);
+
+	iph->check = ~sum;
+}
+
+static int generate_packet(struct xsk *xsk, __u16 dst_port)
+{
+	struct xdp_desc *tx_desc;
+	struct udphdr *udph;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	void *data;
+	__u32 idx;
+	int ret;
+
+	ret = xsk_ring_prod__reserve(&xsk->tx, 1, &idx);
+	if (!ASSERT_EQ(ret, 1, "xsk_ring_prod__reserve"))
+		return -1;
+
+	tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx);
+	tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE;
+	printf("%p: tx_desc[%u]->addr=%llx\n", xsk, idx, tx_desc->addr);
+	data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr);
+
+	eth = data;
+	iph = (void *)(eth + 1);
+	udph = (void *)(iph + 1);
+
+	memcpy(eth->h_dest, "\x00\x00\x00\x00\x00\x02", ETH_ALEN);
+	memcpy(eth->h_source, "\x00\x00\x00\x00\x00\x01", ETH_ALEN);
+	eth->h_proto = htons(ETH_P_IP);
+
+	iph->version = 0x4;
+	iph->ihl = 0x5;
+	iph->tos = 0x9;
+	iph->tot_len = htons(sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES);
+	iph->id = 0;
+	iph->frag_off = 0;
+	iph->ttl = 0;
+	iph->protocol = IPPROTO_UDP;
+	ASSERT_EQ(inet_pton(FAMILY, TX_ADDR, &iph->saddr), 1, "inet_pton(TX_ADDR)");
+	ASSERT_EQ(inet_pton(FAMILY, RX_ADDR, &iph->daddr), 1, "inet_pton(RX_ADDR)");
+	ip_csum(iph);
+
+	udph->source = htons(AF_XDP_SOURCE_PORT);
+	udph->dest = htons(dst_port);
+	udph->len = htons(sizeof(*udph) + UDP_PAYLOAD_BYTES);
+	udph->check = 0;
+
+	memset(udph + 1, 0xAA, UDP_PAYLOAD_BYTES);
+
+	tx_desc->len = sizeof(*eth) + sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES;
+	xsk_ring_prod__submit(&xsk->tx, 1);
+
+	ret = sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0);
+	if (!ASSERT_GE(ret, 0, "sendto"))
+		return ret;
+
+	return 0;
+}
+
+static void complete_tx(struct xsk *xsk)
+{
+	__u32 idx;
+	__u64 addr;
+
+	if (ASSERT_EQ(xsk_ring_cons__peek(&xsk->comp, 1, &idx), 1, "xsk_ring_cons__peek")) {
+		addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx);
+
+		printf("%p: complete tx idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static void refill_rx(struct xsk *xsk, __u64 addr)
+{
+	__u32 idx;
+
+	if (ASSERT_EQ(xsk_ring_prod__reserve(&xsk->fill, 1, &idx), 1, "xsk_ring_prod__reserve")) {
+		printf("%p: refill idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static int verify_xsk_metadata(struct xsk *xsk)
+{
+	const struct xdp_desc *rx_desc;
+	struct pollfd fds = {};
+	struct xdp_meta *meta;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	__u64 comp_addr;
+	void *data;
+	__u64 addr;
+	__u32 idx;
+	int ret;
+
+	ret = recvfrom(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, NULL);
+	if (!ASSERT_EQ(ret, 0, "recvfrom"))
+		return -1;
+
+	fds.fd = xsk_socket__fd(xsk->socket);
+	fds.events = POLLIN;
+
+	ret = poll(&fds, 1, 1000);
+	if (!ASSERT_GT(ret, 0, "poll"))
+		return -1;
+
+	ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx);
+	if (!ASSERT_EQ(ret, 1, "xsk_ring_cons__peek"))
+		return -2;
+
+	rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx);
+	comp_addr = xsk_umem__extract_addr(rx_desc->addr);
+	addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
+	printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
+	       xsk, idx, rx_desc->addr, addr, comp_addr);
+	data = xsk_umem__get_data(xsk->umem_area, addr);
+
+	/* Make sure we got the packet offset correctly. */
+
+	eth = data;
+	ASSERT_EQ(eth->h_proto, htons(ETH_P_IP), "eth->h_proto");
+	iph = (void *)(eth + 1);
+	ASSERT_EQ((int)iph->version, 4, "iph->version");
+
+	/* custom metadata */
+
+	meta = data - sizeof(struct xdp_meta);
+
+	if (!ASSERT_NEQ(meta->rx_timestamp, 0, "rx_timestamp"))
+		return -1;
+
+	if (!ASSERT_NEQ(meta->rx_hash, 0, "rx_hash"))
+		return -1;
+
+	xsk_ring_cons__release(&xsk->rx, 1);
+	refill_rx(xsk, comp_addr);
+
+	return 0;
+}
+
+void test_xdp_metadata(void)
+{
+	struct xdp_metadata *bpf_obj = NULL;
+	struct nstoken *tok = NULL;
+	__u32 queue_id = QUEUE_ID;
+	struct bpf_program *prog;
+	struct xsk tx_xsk = {};
+	struct xsk rx_xsk = {};
+	int rx_ifindex;
+	int sock_fd;
+	int ret;
+
+	/* Set up a new networking namespace with a veth pair. */
+
+	SYS("ip netns add xdp_metadata");
+	tok = open_netns("xdp_metadata");
+	SYS("ip link add numtxqueues 1 numrxqueues 1 " TX_NAME
+	    " type veth peer " RX_NAME " numtxqueues 1 numrxqueues 1");
+	SYS("ip link set dev " TX_NAME " address 00:00:00:00:00:01");
+	SYS("ip link set dev " RX_NAME " address 00:00:00:00:00:02");
+	SYS("ip link set dev " TX_NAME " up");
+	SYS("ip link set dev " RX_NAME " up");
+	SYS("ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
+	SYS("ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+
+	rx_ifindex = if_nametoindex(RX_NAME);
+
+	/* Set up separate AF_XDP sockets for the TX and RX interfaces. */
+
+	ret = open_xsk(TX_NAME, &tx_xsk);
+	if (!ASSERT_OK(ret, "open_xsk(TX_NAME)"))
+		goto out;
+
+	ret = open_xsk(RX_NAME, &rx_xsk);
+	if (!ASSERT_OK(ret, "open_xsk(RX_NAME)"))
+		goto out;
+
+	/* Attach BPF program to RX interface. */
+
+	bpf_obj = xdp_metadata__open();
+	if (!ASSERT_OK_PTR(bpf_obj, "open skeleton"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx");
+	bpf_program__set_ifindex(prog, rx_ifindex);
+	bpf_program__set_flags(prog, BPF_F_XDP_HAS_METADATA);
+
+	if (!ASSERT_OK(xdp_metadata__load(bpf_obj), "load skeleton"))
+		goto out;
+
+	ret = bpf_xdp_attach(rx_ifindex,
+			     bpf_program__fd(bpf_obj->progs.rx),
+			     XDP_FLAGS, NULL);
+	if (!ASSERT_GE(ret, 0, "bpf_xdp_attach"))
+		goto out;
+
+	sock_fd = xsk_socket__fd(rx_xsk.socket);
+	ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0);
+	if (!ASSERT_GE(ret, 0, "bpf_map_update_elem"))
+		goto out;
+
+	/* Send packet destined to RX AF_XDP socket. */
+	if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0,
+		       "generate AF_XDP_CONSUMER_PORT"))
+		goto out;
+
+	/* Verify AF_XDP RX packet has proper metadata. */
+	if (!ASSERT_GE(verify_xsk_metadata(&rx_xsk), 0,
+		       "verify_xsk_metadata"))
+		goto out;
+
+	complete_tx(&tx_xsk);
+
+out:
+	close_xsk(&rx_xsk);
+	close_xsk(&tx_xsk);
+	if (bpf_obj)
+		xdp_metadata__destroy(bpf_obj);
+	system("ip netns del xdp_metadata");
+	if (tok)
+		close_netns(tok);
+}
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
new file mode 100644
index 000000000000..1b19a8d86efe
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+
+#ifndef ETH_P_IP
+#define ETH_P_IP 0x0800
+#endif
+
+#include "xdp_metadata.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 4);
+	__type(key, __u32);
+	__type(value, __u32);
+} xsk SEC(".maps");
+
+extern bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) __ksym;
+extern __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) __ksym;
+extern bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx) __ksym;
+extern __u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx) __ksym;
+
+SEC("xdp")
+int rx(struct xdp_md *ctx)
+{
+	void *data, *data_meta;
+	struct xdp_meta *meta;
+	int ret;
+
+	/* Reserve enough for all custom metadata. */
+
+	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	if (ret != 0)
+		return XDP_DROP;
+
+	data = (void *)(long)ctx->data;
+	data_meta = (void *)(long)ctx->data_meta;
+
+	if (data_meta + sizeof(struct xdp_meta) > data)
+		return XDP_DROP;
+
+	meta = data_meta;
+
+	/* Export metadata. */
+
+	if (bpf_xdp_metadata_rx_timestamp_supported(ctx))
+		meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
+
+	if (bpf_xdp_metadata_rx_hash_supported(ctx))
+		meta->rx_hash = bpf_xdp_metadata_rx_hash(ctx);
+
+	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
new file mode 100644
index 000000000000..c4892d122b7f
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#pragma once
+
+struct xdp_meta {
+	__u64 rx_timestamp;
+	__u32 rx_hash;
+};
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
                   ` (4 preceding siblings ...)
  2022-11-21 18:25 ` [PATCH bpf-next v2 5/8] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
@ 2022-11-21 18:25 ` Stanislav Fomichev
  2022-11-22 13:49   ` Tariq Toukan
  2022-11-23 14:33   ` [xdp-hints] " Toke Høiland-Jørgensen
  2022-11-21 18:25 ` [PATCH bpf-next v2 7/8] mlx4: Support RX XDP metadata Stanislav Fomichev
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

No functional changes. Boilerplate to allow stuffing more data after xdp_buff.

Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 8f762fc170b3..467356633172 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -661,17 +661,21 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
 #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
 #endif
 
+struct mlx4_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int factor = priv->cqe_factor;
 	struct mlx4_en_rx_ring *ring;
+	struct mlx4_xdp_buff mxbuf;
 	struct bpf_prog *xdp_prog;
 	int cq_ring = cq->ring;
 	bool doorbell_pending;
 	bool xdp_redir_flush;
 	struct mlx4_cqe *cqe;
-	struct xdp_buff xdp;
 	int polled = 0;
 	int index;
 
@@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	ring = priv->rx_ring[cq_ring];
 
 	xdp_prog = rcu_dereference_bh(ring->xdp_prog);
-	xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
+	xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
 	doorbell_pending = false;
 	xdp_redir_flush = false;
 
@@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						priv->frag_info[0].frag_size,
 						DMA_FROM_DEVICE);
 
-			xdp_prepare_buff(&xdp, va - frags[0].page_offset,
+			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
 					 frags[0].page_offset, length, false);
-			orig_data = xdp.data;
+			orig_data = mxbuf.xdp.data;
 
-			act = bpf_prog_run_xdp(xdp_prog, &xdp);
+			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
 
-			length = xdp.data_end - xdp.data;
-			if (xdp.data != orig_data) {
-				frags[0].page_offset = xdp.data -
-					xdp.data_hard_start;
-				va = xdp.data;
+			length = mxbuf.xdp.data_end - mxbuf.xdp.data;
+			if (mxbuf.xdp.data != orig_data) {
+				frags[0].page_offset = mxbuf.xdp.data -
+					mxbuf.xdp.data_hard_start;
+				va = mxbuf.xdp.data;
 			}
 
 			switch (act) {
 			case XDP_PASS:
 				break;
 			case XDP_REDIRECT:
-				if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) {
+				if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
 					ring->xdp_redirect++;
 					xdp_redir_flush = true;
 					frags[0].page = NULL;
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH bpf-next v2 7/8] mlx4: Support RX XDP metadata
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
                   ` (5 preceding siblings ...)
  2022-11-21 18:25 ` [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
@ 2022-11-21 18:25 ` Stanislav Fomichev
  2022-11-22 13:50   ` Tariq Toukan
  2022-11-21 18:25 ` [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
  2022-11-23 14:46 ` [PATCH bpf-next 1/2] xdp: Add drv_priv pointer to struct xdp_buff Toke Høiland-Jørgensen
  8 siblings, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

RX timestamp and hash for now. Tested using the prog from the next
patch.

Also enable xdp metadata support; it's not clear why it was disabled,
there is enough headroom.

Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../net/ethernet/mellanox/mlx4/en_netdev.c    | 10 ++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 48 ++++++++++++++++++-
 include/linux/mlx4/device.h                   |  7 +++
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 8800d3f1f55c..1cb63746a851 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2855,6 +2855,11 @@ static const struct net_device_ops mlx4_netdev_ops = {
 	.ndo_features_check	= mlx4_en_features_check,
 	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
 	.ndo_bpf		= mlx4_xdp,
+
+	.ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
+	.ndo_xdp_rx_timestamp	= mlx4_xdp_rx_timestamp,
+	.ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
+	.ndo_xdp_rx_hash	= mlx4_xdp_rx_hash,
 };
 
 static const struct net_device_ops mlx4_netdev_ops_master = {
@@ -2887,6 +2892,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
 	.ndo_features_check	= mlx4_en_features_check,
 	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
 	.ndo_bpf		= mlx4_xdp,
+
+	.ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
+	.ndo_xdp_rx_timestamp	= mlx4_xdp_rx_timestamp,
+	.ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
+	.ndo_xdp_rx_hash	= mlx4_xdp_rx_hash,
 };
 
 struct mlx4_en_bond {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 467356633172..fd14d59f6cbf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -663,8 +663,50 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
 
 struct mlx4_xdp_buff {
 	struct xdp_buff xdp;
+	struct mlx4_cqe *cqe;
+	struct mlx4_en_dev *mdev;
+	struct mlx4_en_rx_ring *ring;
+	struct net_device *dev;
 };
 
+bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
+{
+	struct mlx4_xdp_buff *_ctx = (void *)ctx;
+
+	return _ctx->ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL;
+}
+
+u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx)
+{
+	struct mlx4_xdp_buff *_ctx = (void *)ctx;
+	unsigned int seq;
+	u64 timestamp;
+	u64 nsec;
+
+	timestamp = mlx4_en_get_cqe_ts(_ctx->cqe);
+
+	do {
+		seq = read_seqbegin(&_ctx->mdev->clock_lock);
+		nsec = timecounter_cyc2time(&_ctx->mdev->clock, timestamp);
+	} while (read_seqretry(&_ctx->mdev->clock_lock, seq));
+
+	return ns_to_ktime(nsec);
+}
+
+bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx)
+{
+	struct mlx4_xdp_buff *_ctx = (void *)ctx;
+
+	return _ctx->dev->features & NETIF_F_RXHASH;
+}
+
+u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx)
+{
+	struct mlx4_xdp_buff *_ctx = (void *)ctx;
+
+	return be32_to_cpu(_ctx->cqe->immed_rss_invalid);
+}
+
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -781,8 +823,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						DMA_FROM_DEVICE);
 
 			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
-					 frags[0].page_offset, length, false);
+					 frags[0].page_offset, length, true);
 			orig_data = mxbuf.xdp.data;
+			mxbuf.cqe = cqe;
+			mxbuf.mdev = priv->mdev;
+			mxbuf.ring = ring;
+			mxbuf.dev = dev;
 
 			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
 
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 6646634a0b9d..d5904da1d490 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1585,4 +1585,11 @@ static inline int mlx4_get_num_reserved_uar(struct mlx4_dev *dev)
 	/* The first 128 UARs are used for EQ doorbells */
 	return (128 >> (PAGE_SHIFT - dev->uar_page_shift));
 }
+
+struct xdp_md;
+bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx);
+u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx);
+bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx);
+u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx);
+
 #endif /* MLX4_DEVICE_H */
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
                   ` (6 preceding siblings ...)
  2022-11-21 18:25 ` [PATCH bpf-next v2 7/8] mlx4: Support RX XDP metadata Stanislav Fomichev
@ 2022-11-21 18:25 ` Stanislav Fomichev
  2022-11-23 14:26   ` [xdp-hints] " Toke Høiland-Jørgensen
  2022-11-23 14:46 ` [PATCH bpf-next 1/2] xdp: Add drv_priv pointer to struct xdp_buff Toke Høiland-Jørgensen
  8 siblings, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-21 18:25 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

To be used for verification of driver implementations. Note that
the skb path is gone from the series, but I'm still keeping the
implementation for any possible future work.

$ xdp_hw_metadata <ifname>

On the other machine:

$ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP
$ echo -n skb | nc -u -q1 <target> 9092 # for skb

Sample output:

  # xdp
  xsk_ring_cons__peek: 1
  0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
  rx_timestamp_supported: 1
  rx_timestamp: 1667850075063948829
  0x19f9090: complete idx=8 addr=8000

  # skb
  found skb hwtstamp = 1668314052.854274681

Decoding:
  # xdp
  rx_timestamp=1667850075.063948829

  $ date -d @1667850075
  Mon Nov  7 11:41:15 AM PST 2022
  $ date
  Mon Nov  7 11:42:05 AM PST 2022

  # skb
  $ date -d @1668314052
  Sat Nov 12 08:34:12 PM PST 2022
  $ date
  Sat Nov 12 08:37:06 PM PST 2022

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   6 +-
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  93 ++++
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 405 ++++++++++++++++++
 4 files changed, 504 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 07d2d0a8c5cb..01e3baeefd4f 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -46,3 +46,4 @@ test_cpp
 xskxceiver
 xdp_redirect_multi
 xdp_synproxy
+xdp_hw_metadata
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 4eed22fa3681..189b39b0e5d0 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -83,7 +83,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \
 TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
 	test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
-	xskxceiver xdp_redirect_multi xdp_synproxy veristat
+	xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata
 
 TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file
 TEST_GEN_FILES += liburandom_read.so
@@ -241,6 +241,9 @@ $(OUTPUT)/test_maps: $(TESTING_HELPERS)
 $(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS)
 $(OUTPUT)/xsk.o: $(BPFOBJ)
 $(OUTPUT)/xskxceiver: $(OUTPUT)/xsk.o
+$(OUTPUT)/xdp_hw_metadata: $(OUTPUT)/xsk.o $(OUTPUT)/xdp_hw_metadata.skel.h
+$(OUTPUT)/xdp_hw_metadata: $(OUTPUT)/network_helpers.o
+$(OUTPUT)/xdp_hw_metadata: LDFLAGS += -static
 
 BPFTOOL ?= $(DEFAULT_BPFTOOL)
 $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile)    \
@@ -383,6 +386,7 @@ linked_maps.skel.h-deps := linked_maps1.bpf.o linked_maps2.bpf.o
 test_subskeleton.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o test_subskeleton.bpf.o
 test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o
 test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o
+xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o
 
 LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
 
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
new file mode 100644
index 000000000000..0ae409094883
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/in.h>
+#include <linux/udp.h>
+#include <stdbool.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "xdp_metadata.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 256);
+	__type(key, __u32);
+	__type(value, __u32);
+} xsk SEC(".maps");
+
+extern bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) __ksym;
+extern __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) __ksym;
+extern bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx) __ksym;
+extern __u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx) __ksym;
+
+SEC("xdp")
+int rx(struct xdp_md *ctx)
+{
+	void *data, *data_meta, *data_end;
+	struct ipv6hdr *ip6h = NULL;
+	struct ethhdr *eth = NULL;
+	struct udphdr *udp = NULL;
+	struct iphdr *iph = NULL;
+	struct xdp_meta *meta;
+	int ret;
+
+	data = (void *)(long)ctx->data;
+	data_end = (void *)(long)ctx->data_end;
+	eth = data;
+	if (eth + 1 < data_end) {
+		if (eth->h_proto == bpf_htons(ETH_P_IP)) {
+			iph = (void *)(eth + 1);
+			if (iph + 1 < data_end && iph->protocol == IPPROTO_UDP)
+				udp = (void *)(iph + 1);
+		}
+		if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
+			ip6h = (void *)(eth + 1);
+			if (ip6h + 1 < data_end && ip6h->nexthdr == IPPROTO_UDP)
+				udp = (void *)(ip6h + 1);
+		}
+		if (udp && udp + 1 > data_end)
+			udp = NULL;
+	}
+
+	if (!udp)
+		return XDP_PASS;
+
+	if (udp->dest != bpf_htons(9091))
+		return XDP_PASS;
+
+	bpf_printk("forwarding UDP:9091 to AF_XDP");
+
+	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	if (ret != 0) {
+		bpf_printk("bpf_xdp_adjust_meta returned %d", ret);
+		return XDP_PASS;
+	}
+
+	data = (void *)(long)ctx->data;
+	data_meta = (void *)(long)ctx->data_meta;
+	meta = data_meta;
+
+	if (meta + 1 > data) {
+		bpf_printk("bpf_xdp_adjust_meta doesn't appear to work");
+		return XDP_PASS;
+	}
+
+	if (bpf_xdp_metadata_rx_timestamp_supported(ctx)) {
+		meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
+		bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
+	}
+
+	if (bpf_xdp_metadata_rx_hash_supported(ctx)) {
+		meta->rx_hash = bpf_xdp_metadata_rx_hash(ctx);
+		bpf_printk("populated rx_hash with %u", meta->rx_hash);
+	}
+
+	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
new file mode 100644
index 000000000000..7823a35a1ef7
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -0,0 +1,405 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Reference program for verifying XDP metadata on real HW. Functional test
+ * only; doesn't test performance.
+ *
+ * RX:
+ * - UDP 9091 packets are diverted into AF_XDP
+ * - Metadata verified:
+ *   - rx_timestamp
+ *   - rx_hash
+ *
+ * TX:
+ * - TBD
+ */
+
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "xdp_hw_metadata.skel.h"
+#include "xsk.h"
+
+#include <error.h>
+#include <linux/errqueue.h>
+#include <linux/if_link.h>
+#include <linux/net_tstamp.h>
+#include <linux/udp.h>
+#include <linux/sockios.h>
+#include <sys/mman.h>
+#include <net/if.h>
+#include <poll.h>
+
+#include "xdp_metadata.h"
+
+#define UMEM_NUM 16
+#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM)
+#define XDP_FLAGS (XDP_FLAGS_DRV_MODE | XDP_FLAGS_REPLACE)
+
+struct xsk {
+	void *umem_area;
+	struct xsk_umem *umem;
+	struct xsk_ring_prod fill;
+	struct xsk_ring_cons comp;
+	struct xsk_ring_prod tx;
+	struct xsk_ring_cons rx;
+	struct xsk_socket *socket;
+};
+
+struct xdp_hw_metadata *bpf_obj;
+struct xsk *rx_xsk;
+const char *ifname;
+int ifindex;
+int rxq;
+
+void test__fail(void) { /* for network_helpers.c */ }
+
+static int open_xsk(const char *ifname, struct xsk *xsk, __u32 queue_id)
+{
+	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	const struct xsk_socket_config socket_config = {
+		.rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
+		.xdp_flags = XDP_FLAGS,
+		.bind_flags = XDP_COPY,
+	};
+	const struct xsk_umem_config umem_config = {
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
+	};
+	__u32 idx;
+	u64 addr;
+	int ret;
+	int i;
+
+	xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (xsk->umem_area == MAP_FAILED)
+		return -ENOMEM;
+
+	ret = xsk_umem__create(&xsk->umem,
+			       xsk->umem_area, UMEM_SIZE,
+			       &xsk->fill,
+			       &xsk->comp,
+			       &umem_config);
+	if (ret)
+		return ret;
+
+	ret = xsk_socket__create(&xsk->socket, ifname, queue_id,
+				 xsk->umem,
+				 &xsk->rx,
+				 &xsk->tx,
+				 &socket_config);
+	if (ret)
+		return ret;
+
+	/* First half of umem is for TX. This way address matches 1-to-1
+	 * to the completion queue index.
+	 */
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = i * UMEM_FRAME_SIZE;
+		printf("%p: tx_desc[%d] -> %lx\n", xsk, i, addr);
+	}
+
+	/* Second half of umem is for RX. */
+
+	ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx);
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE;
+		printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, i) = addr;
+	}
+	xsk_ring_prod__submit(&xsk->fill, ret);
+
+	return 0;
+}
+
+static void close_xsk(struct xsk *xsk)
+{
+	if (xsk->umem)
+		xsk_umem__delete(xsk->umem);
+	if (xsk->socket)
+		xsk_socket__delete(xsk->socket);
+	munmap(xsk->umem_area, UMEM_SIZE);
+}
+
+static void refill_rx(struct xsk *xsk, __u64 addr)
+{
+	__u32 idx;
+
+	if (xsk_ring_prod__reserve(&xsk->fill, 1, &idx) == 1) {
+		printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static void verify_xdp_metadata(void *data)
+{
+	struct xdp_meta *meta;
+
+	meta = data - sizeof(*meta);
+
+	printf("rx_timestamp: %llu\n", meta->rx_timestamp);
+	printf("rx_hash: %u\n", meta->rx_hash);
+}
+
+static void verify_skb_metadata(int fd)
+{
+	char cmsg_buf[1024];
+	char packet_buf[128];
+
+	struct scm_timestamping *ts;
+	struct iovec packet_iov;
+	struct cmsghdr *cmsg;
+	struct msghdr hdr;
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_iov = &packet_iov;
+	hdr.msg_iovlen = 1;
+	packet_iov.iov_base = packet_buf;
+	packet_iov.iov_len = sizeof(packet_buf);
+
+	hdr.msg_control = cmsg_buf;
+	hdr.msg_controllen = sizeof(cmsg_buf);
+
+	if (recvmsg(fd, &hdr, 0) < 0)
+		error(-1, errno, "recvmsg");
+
+	for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != NULL;
+	     cmsg = CMSG_NXTHDR(&hdr, cmsg)) {
+
+		if (cmsg->cmsg_level != SOL_SOCKET)
+			continue;
+
+		switch (cmsg->cmsg_type) {
+		case SCM_TIMESTAMPING:
+			ts = (struct scm_timestamping *)CMSG_DATA(cmsg);
+			if (ts->ts[2].tv_sec || ts->ts[2].tv_nsec) {
+				printf("found skb hwtstamp = %lu.%lu\n",
+				       ts->ts[2].tv_sec, ts->ts[2].tv_nsec);
+				return;
+			}
+			break;
+		default:
+			break;
+		}
+	}
+
+	printf("skb hwtstamp is not found!\n");
+}
+
+static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
+{
+	const struct xdp_desc *rx_desc;
+	struct pollfd fds[rxq + 1];
+	__u64 comp_addr;
+	__u64 addr;
+	__u32 idx;
+	int ret;
+	int i;
+
+	for (i = 0; i < rxq; i++) {
+		fds[i].fd = xsk_socket__fd(rx_xsk[i].socket);
+		fds[i].events = POLLIN;
+		fds[i].revents = 0;
+	}
+
+	fds[rxq].fd = server_fd;
+	fds[rxq].events = POLLIN;
+	fds[rxq].revents = 0;
+
+	while (true) {
+		errno = 0;
+		ret = poll(fds, rxq + 1, 1000);
+		printf("poll: %d (%d)\n", ret, errno);
+		if (ret < 0)
+			break;
+		if (ret == 0)
+			continue;
+
+		if (fds[rxq].revents)
+			verify_skb_metadata(server_fd);
+
+		for (i = 0; i < rxq; i++) {
+			if (fds[i].revents == 0)
+				continue;
+
+			struct xsk *xsk = &rx_xsk[i];
+
+			ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx);
+			printf("xsk_ring_cons__peek: %d\n", ret);
+			if (ret != 1)
+				continue;
+
+			rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx);
+			comp_addr = xsk_umem__extract_addr(rx_desc->addr);
+			addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
+			printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
+			       xsk, idx, rx_desc->addr, addr, comp_addr);
+			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
+			xsk_ring_cons__release(&xsk->rx, 1);
+			refill_rx(xsk, comp_addr);
+		}
+	}
+
+	return 0;
+}
+
+struct ethtool_channels {
+	__u32	cmd;
+	__u32	max_rx;
+	__u32	max_tx;
+	__u32	max_other;
+	__u32	max_combined;
+	__u32	rx_count;
+	__u32	tx_count;
+	__u32	other_count;
+	__u32	combined_count;
+};
+
+#define ETHTOOL_GCHANNELS	0x0000003c /* Get no of channels */
+
+static int rxq_num(const char *ifname)
+{
+	struct ethtool_channels ch = {
+		.cmd = ETHTOOL_GCHANNELS,
+	};
+
+	struct ifreq ifr = {
+		.ifr_data = (void *)&ch,
+	};
+	int fd, ret;
+
+	strcpy(ifr.ifr_name, ifname);
+
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (fd < 0)
+		error(-1, errno, "socket");
+
+	ret = ioctl(fd, SIOCETHTOOL, &ifr);
+	if (ret < 0)
+		error(-1, errno, "ioctl(SIOCETHTOOL)");
+
+	close(fd);
+
+	return ch.rx_count;
+}
+
+static void cleanup(void)
+{
+	LIBBPF_OPTS(bpf_xdp_attach_opts, opts);
+	int ret;
+	int i;
+
+	if (bpf_obj) {
+		opts.old_prog_fd = bpf_program__fd(bpf_obj->progs.rx);
+		if (opts.old_prog_fd >= 0) {
+			printf("detaching bpf program....\n");
+			ret = bpf_xdp_detach(ifindex, XDP_FLAGS, &opts);
+			if (ret)
+				printf("failed to detach XDP program: %d\n", ret);
+		}
+	}
+
+	for (i = 0; i < rxq; i++)
+		close_xsk(&rx_xsk[i]);
+
+	if (bpf_obj)
+		xdp_hw_metadata__destroy(bpf_obj);
+}
+
+static void handle_signal(int sig)
+{
+	/* interrupting poll() is all we need */
+}
+
+static void timestamping_enable(int fd, int val)
+{
+	int ret;
+
+	ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
+	if (ret < 0)
+		error(-1, errno, "setsockopt(SO_TIMESTAMPING)");
+}
+
+int main(int argc, char *argv[])
+{
+	int server_fd = -1;
+	int ret;
+	int i;
+
+	struct bpf_program *prog;
+
+	if (argc != 2) {
+		fprintf(stderr, "pass device name\n");
+		return -1;
+	}
+
+	ifname = argv[1];
+	ifindex = if_nametoindex(ifname);
+	rxq = rxq_num(ifname);
+
+	printf("rxq: %d\n", rxq);
+
+	rx_xsk = malloc(sizeof(struct xsk) * rxq);
+	if (!rx_xsk)
+		error(-1, ENOMEM, "malloc");
+
+	for (i = 0; i < rxq; i++) {
+		printf("open_xsk(%s, %p, %d)\n", ifname, &rx_xsk[i], i);
+		ret = open_xsk(ifname, &rx_xsk[i], i);
+		if (ret)
+			error(-1, -ret, "open_xsk");
+
+		printf("xsk_socket__fd() -> %d\n", xsk_socket__fd(rx_xsk[i].socket));
+	}
+
+	printf("open bpf program...\n");
+	bpf_obj = xdp_hw_metadata__open();
+	if (libbpf_get_error(bpf_obj))
+		error(-1, libbpf_get_error(bpf_obj), "xdp_hw_metadata__open");
+
+	prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx");
+	bpf_program__set_ifindex(prog, ifindex);
+	bpf_program__set_flags(prog, BPF_F_XDP_HAS_METADATA);
+
+	printf("load bpf program...\n");
+	ret = xdp_hw_metadata__load(bpf_obj);
+	if (ret)
+		error(-1, -ret, "xdp_hw_metadata__load");
+
+	printf("prepare skb endpoint...\n");
+	server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, 1000);
+	if (server_fd < 0)
+		error(-1, errno, "start_server");
+	timestamping_enable(server_fd,
+			    SOF_TIMESTAMPING_SOFTWARE |
+			    SOF_TIMESTAMPING_RAW_HARDWARE);
+
+	printf("prepare xsk map...\n");
+	for (i = 0; i < rxq; i++) {
+		int sock_fd = xsk_socket__fd(rx_xsk[i].socket);
+		__u32 queue_id = i;
+
+		printf("map[%d] = %d\n", queue_id, sock_fd);
+		ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0);
+		if (ret)
+			error(-1, -ret, "bpf_map_update_elem");
+	}
+
+	printf("attach bpf program...\n");
+	ret = bpf_xdp_attach(ifindex,
+			     bpf_program__fd(bpf_obj->progs.rx),
+			     XDP_FLAGS, NULL);
+	if (ret)
+		error(-1, -ret, "bpf_xdp_attach");
+
+	signal(SIGINT, handle_signal);
+	ret = verify_metadata(rx_xsk, rxq, server_fd);
+	close(server_fd);
+	cleanup();
+	if (ret)
+		error(-1, -ret, "verify_metadata");
+}
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-21 18:25 ` [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
@ 2022-11-22 13:49   ` Tariq Toukan
  2022-11-22 18:08     ` Stanislav Fomichev
  2022-11-23 14:33   ` [xdp-hints] " Toke Høiland-Jørgensen
  1 sibling, 1 reply; 50+ messages in thread
From: Tariq Toukan @ 2022-11-22 13:49 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev



On 11/21/2022 8:25 PM, Stanislav Fomichev wrote:
> No functional changes. Boilerplate to allow stuffing more data after xdp_buff.
> 
> Cc: Tariq Toukan <tariqt@nvidia.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
>   1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 8f762fc170b3..467356633172 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -661,17 +661,21 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
>   #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
>   #endif
>   
> +struct mlx4_xdp_buff {
> +	struct xdp_buff xdp;
> +};
> +
>   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
>   {
>   	struct mlx4_en_priv *priv = netdev_priv(dev);
>   	int factor = priv->cqe_factor;
>   	struct mlx4_en_rx_ring *ring;
> > +	struct mlx4_xdp_buff mxbuf;

as it doesn't go through an init function (only mxbuf.xdp does), better 
init to zero.

>   	struct bpf_prog *xdp_prog;
>   	int cq_ring = cq->ring;
>   	bool doorbell_pending;
>   	bool xdp_redir_flush;
>   	struct mlx4_cqe *cqe;
> -	struct xdp_buff xdp;
>   	int polled = 0;
>   	int index;
>   
> @@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   	ring = priv->rx_ring[cq_ring];
>   
>   	xdp_prog = rcu_dereference_bh(ring->xdp_prog);
> -	xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
> +	xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
>   	doorbell_pending = false;
>   	xdp_redir_flush = false;
>   
> @@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   						priv->frag_info[0].frag_size,
>   						DMA_FROM_DEVICE);
>   
> -			xdp_prepare_buff(&xdp, va - frags[0].page_offset,
> +			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
>   					 frags[0].page_offset, length, false);
> -			orig_data = xdp.data;
> +			orig_data = mxbuf.xdp.data;
>   
> -			act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
>   
> -			length = xdp.data_end - xdp.data;
> -			if (xdp.data != orig_data) {
> -				frags[0].page_offset = xdp.data -
> -					xdp.data_hard_start;
> -				va = xdp.data;
> +			length = mxbuf.xdp.data_end - mxbuf.xdp.data;
> +			if (mxbuf.xdp.data != orig_data) {
> +				frags[0].page_offset = mxbuf.xdp.data -
> +					mxbuf.xdp.data_hard_start;
> +				va = mxbuf.xdp.data;
>   			}
>   
>   			switch (act) {
>   			case XDP_PASS:
>   				break;
>   			case XDP_REDIRECT:
> -				if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) {
> +				if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
>   					ring->xdp_redirect++;
>   					xdp_redir_flush = true;
>   					frags[0].page = NULL;

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 7/8] mxl4: Support RX XDP metadata
  2022-11-21 18:25 ` [PATCH bpf-next v2 7/8] mxl4: Support RX XDP metadata Stanislav Fomichev
@ 2022-11-22 13:50   ` Tariq Toukan
  2022-11-22 18:08     ` Stanislav Fomichev
  0 siblings, 1 reply; 50+ messages in thread
From: Tariq Toukan @ 2022-11-22 13:50 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev



On 11/21/2022 8:25 PM, Stanislav Fomichev wrote:
> RX timestamp and hash for now. Tested using the prog from the next
> patch.
> 
> Also enabling xdp metadata support; don't see why it's disabled,
> there is enough headroom..
> 
> Cc: Tariq Toukan <tariqt@nvidia.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   .../net/ethernet/mellanox/mlx4/en_netdev.c    | 10 ++++
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 48 ++++++++++++++++++-
>   include/linux/mlx4/device.h                   |  7 +++
>   3 files changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 8800d3f1f55c..1cb63746a851 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2855,6 +2855,11 @@ static const struct net_device_ops mlx4_netdev_ops = {
>   	.ndo_features_check	= mlx4_en_features_check,
>   	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
>   	.ndo_bpf		= mlx4_xdp,
> +
> +	.ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
> +	.ndo_xdp_rx_timestamp	= mlx4_xdp_rx_timestamp,
> +	.ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
> +	.ndo_xdp_rx_hash	= mlx4_xdp_rx_hash,
>   };
>   
>   static const struct net_device_ops mlx4_netdev_ops_master = {
> @@ -2887,6 +2892,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
>   	.ndo_features_check	= mlx4_en_features_check,
>   	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
>   	.ndo_bpf		= mlx4_xdp,
> +
> +	.ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
> +	.ndo_xdp_rx_timestamp	= mlx4_xdp_rx_timestamp,
> +	.ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
> +	.ndo_xdp_rx_hash	= mlx4_xdp_rx_hash,
>   };
>   
>   struct mlx4_en_bond {
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 467356633172..fd14d59f6cbf 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -663,8 +663,50 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
>   
>   struct mlx4_xdp_buff {
>   	struct xdp_buff xdp;
> +	struct mlx4_cqe *cqe;
> +	struct mlx4_en_dev *mdev;
> +	struct mlx4_en_rx_ring *ring;
> +	struct net_device *dev;
>   };
>   
> +bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
> +{
> +	struct mlx4_xdp_buff *_ctx = (void *)ctx;
> +
> +	return _ctx->ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL;
> +}
> +
> +u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx)
> +{
> +	struct mlx4_xdp_buff *_ctx = (void *)ctx;
> +	unsigned int seq;
> +	u64 timestamp;
> +	u64 nsec;
> +
> +	timestamp = mlx4_en_get_cqe_ts(_ctx->cqe);
> +
> +	do {
> +		seq = read_seqbegin(&_ctx->mdev->clock_lock);
> +		nsec = timecounter_cyc2time(&_ctx->mdev->clock, timestamp);
> +	} while (read_seqretry(&_ctx->mdev->clock_lock, seq));
> +

This is open-code version of mlx4_en_fill_hwtstamps.
Better use the existing function.

> +	return ns_to_ktime(nsec);
> +}
> +
> +bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx)
> +{
> +	struct mlx4_xdp_buff *_ctx = (void *)ctx;
> +
> +	return _ctx->dev->features & NETIF_F_RXHASH;
> +}
> +
> +u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx)
> +{
> +	struct mlx4_xdp_buff *_ctx = (void *)ctx;
> +
> +	return be32_to_cpu(_ctx->cqe->immed_rss_invalid);
> +}
> +
>   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
>   {
>   	struct mlx4_en_priv *priv = netdev_priv(dev);
> @@ -781,8 +823,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   						DMA_FROM_DEVICE);
>   
>   			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
> -					 frags[0].page_offset, length, false);
> +					 frags[0].page_offset, length, true);
>   			orig_data = mxbuf.xdp.data;
> +			mxbuf.cqe = cqe;
> +			mxbuf.mdev = priv->mdev;
> +			mxbuf.ring = ring;
> +			mxbuf.dev = dev;
>   
>   			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
>   
> diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
> index 6646634a0b9d..d5904da1d490 100644
> --- a/include/linux/mlx4/device.h
> +++ b/include/linux/mlx4/device.h
> @@ -1585,4 +1585,11 @@ static inline int mlx4_get_num_reserved_uar(struct mlx4_dev *dev)
>   	/* The first 128 UARs are used for EQ doorbells */
>   	return (128 >> (PAGE_SHIFT - dev->uar_page_shift));
>   }
> +
> +struct xdp_md;
> +bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx);
> +u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx);
> +bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx);
> +u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx);
> +
>   #endif /* MLX4_DEVICE_H */


* Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-22 13:49   ` Tariq Toukan
@ 2022-11-22 18:08     ` Stanislav Fomichev
  0 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-22 18:08 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Tue, Nov 22, 2022 at 5:49 AM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
>
>
> On 11/21/2022 8:25 PM, Stanislav Fomichev wrote:
> > No functional changes. Boilerplate to allow stuffing more data after xdp_buff.
> >
> > Cc: Tariq Toukan <tariqt@nvidia.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
> >   1 file changed, 15 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index 8f762fc170b3..467356633172 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -661,17 +661,21 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
> >   #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
> >   #endif
> >
> > +struct mlx4_xdp_buff {
> > +     struct xdp_buff xdp;
> > +};
> > +
> >   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
> >   {
> >       struct mlx4_en_priv *priv = netdev_priv(dev);
> >       int factor = priv->cqe_factor;
> >       struct mlx4_en_rx_ring *ring;
> > +     struct mlx4_xdp_buff mxbuf;
>
> as it doesn't go through an init function (only mxbuf.xdp does), better
> init to zero.

SG, will do, thanks!

> >       struct bpf_prog *xdp_prog;
> >       int cq_ring = cq->ring;
> >       bool doorbell_pending;
> >       bool xdp_redir_flush;
> >       struct mlx4_cqe *cqe;
> > -     struct xdp_buff xdp;
> >       int polled = 0;
> >       int index;
> >
> > @@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >       ring = priv->rx_ring[cq_ring];
> >
> >       xdp_prog = rcu_dereference_bh(ring->xdp_prog);
> > -     xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
> > +     xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
> >       doorbell_pending = false;
> >       xdp_redir_flush = false;
> >
> > @@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >                                               priv->frag_info[0].frag_size,
> >                                               DMA_FROM_DEVICE);
> >
> > -                     xdp_prepare_buff(&xdp, va - frags[0].page_offset,
> > +                     xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
> >                                        frags[0].page_offset, length, false);
> > -                     orig_data = xdp.data;
> > +                     orig_data = mxbuf.xdp.data;
> >
> > -                     act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > +                     act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
> >
> > -                     length = xdp.data_end - xdp.data;
> > -                     if (xdp.data != orig_data) {
> > -                             frags[0].page_offset = xdp.data -
> > -                                     xdp.data_hard_start;
> > -                             va = xdp.data;
> > +                     length = mxbuf.xdp.data_end - mxbuf.xdp.data;
> > +                     if (mxbuf.xdp.data != orig_data) {
> > +                             frags[0].page_offset = mxbuf.xdp.data -
> > +                                     mxbuf.xdp.data_hard_start;
> > +                             va = mxbuf.xdp.data;
> >                       }
> >
> >                       switch (act) {
> >                       case XDP_PASS:
> >                               break;
> >                       case XDP_REDIRECT:
> > -                             if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) {
> > +                             if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
> >                                       ring->xdp_redirect++;
> >                                       xdp_redir_flush = true;
> >                                       frags[0].page = NULL;


* Re: [PATCH bpf-next v2 7/8] mxl4: Support RX XDP metadata
  2022-11-22 13:50   ` Tariq Toukan
@ 2022-11-22 18:08     ` Stanislav Fomichev
  0 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-22 18:08 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Tue, Nov 22, 2022 at 5:50 AM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
>
>
> On 11/21/2022 8:25 PM, Stanislav Fomichev wrote:
> > RX timestamp and hash for now. Tested using the prog from the next
> > patch.
> >
> > Also enabling xdp metadata support; don't see why it's disabled,
> > there is enough headroom..
> >
> > Cc: Tariq Toukan <tariqt@nvidia.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >   .../net/ethernet/mellanox/mlx4/en_netdev.c    | 10 ++++
> >   drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 48 ++++++++++++++++++-
> >   include/linux/mlx4/device.h                   |  7 +++
> >   3 files changed, 64 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > index 8800d3f1f55c..1cb63746a851 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > @@ -2855,6 +2855,11 @@ static const struct net_device_ops mlx4_netdev_ops = {
> >       .ndo_features_check     = mlx4_en_features_check,
> >       .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
> >       .ndo_bpf                = mlx4_xdp,
> > +
> > +     .ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
> > +     .ndo_xdp_rx_timestamp   = mlx4_xdp_rx_timestamp,
> > +     .ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
> > +     .ndo_xdp_rx_hash        = mlx4_xdp_rx_hash,
> >   };
> >
> >   static const struct net_device_ops mlx4_netdev_ops_master = {
> > @@ -2887,6 +2892,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
> >       .ndo_features_check     = mlx4_en_features_check,
> >       .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
> >       .ndo_bpf                = mlx4_xdp,
> > +
> > +     .ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
> > +     .ndo_xdp_rx_timestamp   = mlx4_xdp_rx_timestamp,
> > +     .ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
> > +     .ndo_xdp_rx_hash        = mlx4_xdp_rx_hash,
> >   };
> >
> >   struct mlx4_en_bond {
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index 467356633172..fd14d59f6cbf 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -663,8 +663,50 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
> >
> >   struct mlx4_xdp_buff {
> >       struct xdp_buff xdp;
> > +     struct mlx4_cqe *cqe;
> > +     struct mlx4_en_dev *mdev;
> > +     struct mlx4_en_rx_ring *ring;
> > +     struct net_device *dev;
> >   };
> >
> > +bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
> > +{
> > +     struct mlx4_xdp_buff *_ctx = (void *)ctx;
> > +
> > +     return _ctx->ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL;
> > +}
> > +
> > +u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx)
> > +{
> > +     struct mlx4_xdp_buff *_ctx = (void *)ctx;
> > +     unsigned int seq;
> > +     u64 timestamp;
> > +     u64 nsec;
> > +
> > +     timestamp = mlx4_en_get_cqe_ts(_ctx->cqe);
> > +
> > +     do {
> > +             seq = read_seqbegin(&_ctx->mdev->clock_lock);
> > +             nsec = timecounter_cyc2time(&_ctx->mdev->clock, timestamp);
> > +     } while (read_seqretry(&_ctx->mdev->clock_lock, seq));
> > +
>
> This is open-code version of mlx4_en_fill_hwtstamps.
> Better use the existing function.

That one assumes the skb_shared_hwtstamps argument :-(
Should I try to separate the common parts into some new helper function instead?

Or maybe I can just change mlx4_en_fill_hwtstamps to the following?

u64 mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev, u64 timestamp)
{
   ...
   return ns_to_ktime(nsec);
}

And replace existing callers with:

skb_hwtstamps(skb)->hwtstamp = mlx4_en_fill_hwtstamps(priv->mdev, timestamp).

?


> > +     return ns_to_ktime(nsec);
> > +}
> > +
> > +bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx)
> > +{
> > +     struct mlx4_xdp_buff *_ctx = (void *)ctx;
> > +
> > +     return _ctx->dev->features & NETIF_F_RXHASH;
> > +}
> > +
> > +u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx)
> > +{
> > +     struct mlx4_xdp_buff *_ctx = (void *)ctx;
> > +
> > +     return be32_to_cpu(_ctx->cqe->immed_rss_invalid);
> > +}
> > +
> >   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
> >   {
> >       struct mlx4_en_priv *priv = netdev_priv(dev);
> > @@ -781,8 +823,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >                                               DMA_FROM_DEVICE);
> >
> >                       xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
> > -                                      frags[0].page_offset, length, false);
> > +                                      frags[0].page_offset, length, true);
> >                       orig_data = mxbuf.xdp.data;
> > +                     mxbuf.cqe = cqe;
> > +                     mxbuf.mdev = priv->mdev;
> > +                     mxbuf.ring = ring;
> > +                     mxbuf.dev = dev;
> >
> >                       act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
> >
> > diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
> > index 6646634a0b9d..d5904da1d490 100644
> > --- a/include/linux/mlx4/device.h
> > +++ b/include/linux/mlx4/device.h
> > @@ -1585,4 +1585,11 @@ static inline int mlx4_get_num_reserved_uar(struct mlx4_dev *dev)
> >       /* The first 128 UARs are used for EQ doorbells */
> >       return (128 >> (PAGE_SHIFT - dev->uar_page_shift));
> >   }
> > +
> > +struct xdp_md;
> > +bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx);
> > +u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx);
> > +bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx);
> > +u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx);
> > +
> >   #endif /* MLX4_DEVICE_H */

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-21 18:25 ` [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs Stanislav Fomichev
@ 2022-11-23  6:34   ` Martin KaFai Lau
  2022-11-23 18:43     ` Stanislav Fomichev
  2022-11-23 14:24   ` [xdp-hints] " Toke Høiland-Jørgensen
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 50+ messages in thread
From: Martin KaFai Lau @ 2022-11-23  6:34 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On 11/21/22 10:25 AM, Stanislav Fomichev wrote:
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2576,6 +2576,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
>   	} else {
>   		bpf_jit_free(aux->prog);
>   	}
> +	dev_put(aux->xdp_netdev);

I think dev_put also needs to be done on the unregister_netdevice event.
Otherwise, a loaded bpf prog may hold the dev for a long time. Maybe there are
ideas in offload.c.
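Roughly, a NETDEV_UNREGISTER handler could drop the reference early; here is a userspace mock (all types here are stubs — the real handler would be hooked via register_netdevice_notifier()):

```c
#include <assert.h>
#include <stddef.h>

struct net_device_stub { int refcnt; };
struct bpf_prog_aux_stub { struct net_device_stub *xdp_netdev; };

static void dev_put_stub(struct net_device_stub *dev)
{
	if (dev)
		dev->refcnt--;
}

/* what a NETDEV_UNREGISTER handler would do: drop the reference as soon
 * as the device goes away, instead of waiting for
 * bpf_prog_free_deferred() */
static void on_netdev_unregister(struct bpf_prog_aux_stub *aux,
				 struct net_device_stub *going_away)
{
	if (aux->xdp_netdev == going_away) {
		dev_put_stub(aux->xdp_netdev);
		aux->xdp_netdev = NULL;	/* avoid a double put at prog free */
	}
}
```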

[ ... ]

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 35972afb6850..ece7f9234b2d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2491,7 +2491,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
>   				 BPF_F_TEST_STATE_FREQ |
>   				 BPF_F_SLEEPABLE |
>   				 BPF_F_TEST_RND_HI32 |
> -				 BPF_F_XDP_HAS_FRAGS))
> +				 BPF_F_XDP_HAS_FRAGS |
> +				 BPF_F_XDP_HAS_METADATA))
>   		return -EINVAL;
>   
>   	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> @@ -2579,6 +2580,20 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
>   	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
>   	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
>   
> +	if (attr->prog_flags & BPF_F_XDP_HAS_METADATA) {
> +		/* Reuse prog_ifindex to bind to the device
> +		 * for XDP metadata kfuncs.
> +		 */
> +		prog->aux->offload_requested = false;
> +
> +		prog->aux->xdp_netdev = dev_get_by_index(current->nsproxy->net_ns,
> +							 attr->prog_ifindex);
> +		if (!prog->aux->xdp_netdev) {
> +			err = -EINVAL;
> +			goto free_prog;
> +		}
> +	}



* Re: [xdp-hints] [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-21 18:25 ` [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs Stanislav Fomichev
  2022-11-23  6:34   ` Martin KaFai Lau
@ 2022-11-23 14:24   ` Toke Høiland-Jørgensen
  2022-11-23 18:43     ` Stanislav Fomichev
  2022-11-25 17:53   ` Toke Høiland-Jørgensen
  2022-11-30 17:24   ` Larysa Zaremba
  3 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-23 14:24 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

>  static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
>  {
> @@ -15181,6 +15200,15 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  		return -EINVAL;
>  	}
>  
> +	if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
> +		int imm = fixup_xdp_kfunc_call(env, insn->imm);
> +
> +		if (imm) {
> +			insn->imm = imm;
> +			return 0;

This needs to also set *cnt = 0 before returning; otherwise the verifier
can do some really weird instruction rewriting that leads to the JIT
barfing on invalid instructions (as I just found out while trying to
test this).
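
Something like this (userspace mock of the fixed branch; the stub resolver
stands in for fixup_xdp_kfunc_call()):

```c
#include <assert.h>

struct insn { int imm; };

/* stub: nonzero return means the per-device kfunc was resolved */
static int fixup_xdp_kfunc_call_stub(int imm)
{
	return imm == 42 ? 100 : 0;
}

static int fixup_kfunc_call_sketch(struct insn *insn, int *cnt)
{
	int imm = fixup_xdp_kfunc_call_stub(insn->imm);

	if (imm) {
		insn->imm = imm;
		*cnt = 0;	/* no replacement insns emitted: don't let the
				 * caller splice in stale insn_buf contents */
		return 0;
	}
	return -1;
}
```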

-Toke



* Re: [xdp-hints] [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata
  2022-11-21 18:25 ` [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
@ 2022-11-23 14:26   ` Toke Høiland-Jørgensen
  2022-11-23 18:29     ` Stanislav Fomichev
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-23 14:26 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> +static int rxq_num(const char *ifname)
> +{
> +	struct ethtool_channels ch = {
> +		.cmd = ETHTOOL_GCHANNELS,
> +	};
> +
> +	struct ifreq ifr = {
> +		.ifr_data = (void *)&ch,
> +	};
> +	strcpy(ifr.ifr_name, ifname);
> +	int fd, ret;
> +
> +	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
> +	if (fd < 0)
> +		error(-1, errno, "socket");
> +
> +	ret = ioctl(fd, SIOCETHTOOL, &ifr);
> +	if (ret < 0)
> +		error(-1, errno, "socket");
> +
> +	close(fd);
> +
> +	return ch.rx_count;
> +}

mlx5 uses 'combined' channels, so this returns 0. Changing it to just:

return ch.rx_count ?: ch.combined_count; 

works though :)
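
I.e., the fallback logic is just (trivial userspace sketch; effective_rxq_num
is a made-up helper name):

```c
#include <assert.h>

/* prefer dedicated rx channels; fall back to combined channels for
 * NICs (like mlx5) that only report combined queues */
static unsigned int effective_rxq_num(unsigned int rx_count,
				      unsigned int combined_count)
{
	return rx_count ? rx_count : combined_count;
}
```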

-Toke



* Re: [xdp-hints] [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-21 18:25 ` [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
  2022-11-22 13:49   ` Tariq Toukan
@ 2022-11-23 14:33   ` Toke Høiland-Jørgensen
  2022-11-23 18:26     ` Stanislav Fomichev
  1 sibling, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-23 14:33 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Stanislav Fomichev <sdf@google.com> writes:

> No functional changes. Boilerplate to allow stuffing more data after xdp_buff.
>
> Cc: Tariq Toukan <tariqt@nvidia.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
>  1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 8f762fc170b3..467356633172 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -661,17 +661,21 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
>  #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
>  #endif
>  
> +struct mlx4_xdp_buff {
> +	struct xdp_buff xdp;
> +};

This embedding trick works for drivers that put xdp_buff on the stack,
but mlx5 supports XSK zerocopy, which uses the xsk_buff_pool for
allocating them. This makes it a bit awkward to do the same thing there;
and since it's probably going to be fairly common to do something like
this, how about we just add a 'void *drv_priv' pointer to struct
xdp_buff that the drivers can use? The xdp_buff already takes up a full
cache line anyway, so any data stuffed after it will spill over to a new
one; so I don't think there's much difference performance-wise.

I'll send my patch to add support to mlx5 (using the drv_priv pointer
approach) separately.

-Toke



* [PATCH bpf-next 1/2] xdp: Add drv_priv pointer to struct xdp_buff
  2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
                   ` (7 preceding siblings ...)
  2022-11-21 18:25 ` [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
@ 2022-11-23 14:46 ` Toke Høiland-Jørgensen
  2022-11-23 14:46   ` [PATCH bpf-next 2/2] mlx5: Support XDP RX metadata Toke Høiland-Jørgensen
  8 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-23 14:46 UTC (permalink / raw)
  To: bpf
  Cc: Toke Høiland-Jørgensen, John Fastabend, David Ahern,
	Martin KaFai Lau, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, Stanislav Fomichev, xdp-hints,
	netdev

This allows drivers to add more context data to the xdp_buff object, which
they can use for metadata kfunc implementations.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/xdp.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 348aefd467ed..27c54ad3c8e2 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -84,6 +84,7 @@ struct xdp_buff {
 	struct xdp_txq_info *txq;
 	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
 	u32 flags; /* supported values defined in xdp_buff_flags */
+	void *drv_priv;
 };
 
 static __always_inline bool xdp_buff_has_frags(struct xdp_buff *xdp)
-- 
2.38.1



* [PATCH bpf-next 2/2] mlx5: Support XDP RX metadata
  2022-11-23 14:46 ` [PATCH bpf-next 1/2] xdp: Add drv_priv pointer to struct xdp_buff Toke Høiland-Jørgensen
@ 2022-11-23 14:46   ` Toke Høiland-Jørgensen
  2022-11-23 22:29     ` [xdp-hints] " Saeed Mahameed
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-23 14:46 UTC (permalink / raw)
  To: bpf
  Cc: Toke Høiland-Jørgensen, John Fastabend, David Ahern,
	Martin KaFai Lau, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, Stanislav Fomichev, xdp-hints,
	netdev

Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
pointer to the mlx5e_skb_from* functions so it can be retrieved from the
XDP ctx to do this.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
This goes on top of Stanislav's series, obviously. Verified that it works using
the xdp_hw_metadata utility; going to do some benchmarking and follow up with the
results, but figured I'd send this out straight away in case others wanted to
play with it.

Stanislav, feel free to fold it into the next version of your series if you
want!

-Toke


 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  7 +++-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  | 32 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  | 10 ++++++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  3 ++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  4 +++
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 19 +++++------
 7 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ff5b302531d5..960404027f0b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -629,7 +629,7 @@ typedef struct sk_buff *
 			       u16 cqe_bcnt, u32 head_offset, u32 page_idx);
 typedef struct sk_buff *
 (*mlx5e_fp_skb_from_cqe)(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			 u32 cqe_bcnt);
+			 struct mlx5_cqe64 *cqe, u32 cqe_bcnt);
 typedef bool (*mlx5e_fp_post_rx_wqes)(struct mlx5e_rq *rq);
 typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq*, u16);
 typedef void (*mlx5e_fp_shampo_dealloc_hd)(struct mlx5e_rq*, u16, u16, bool);
@@ -1035,6 +1035,11 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
 			   u16 vid);
 void mlx5e_timestamp_init(struct mlx5e_priv *priv);
 
+static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
+{
+	return config->rx_filter == HWTSTAMP_FILTER_ALL;
+}
+
 struct mlx5e_xsk_param;
 
 struct mlx5e_rq_param;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 20507ef2f956..604c8cdfde02 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -156,6 +156,38 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
 	return true;
 }
 
+bool mlx5e_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
+{
+	const struct xdp_buff *xdp = (void *)ctx;
+	struct mlx5_xdp_ctx *mctx = xdp->drv_priv;
+
+	return mlx5e_rx_hw_stamp(mctx->rq->tstamp);
+}
+
+u64 mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx)
+{
+	const struct xdp_buff *xdp = (void *)ctx;
+	struct mlx5_xdp_ctx *mctx = xdp->drv_priv;
+
+	return mlx5e_cqe_ts_to_ns(mctx->rq->ptp_cyc2time,
+				  mctx->rq->clock, get_cqe_ts(mctx->cqe));
+}
+
+bool mlx5e_xdp_rx_hash_supported(const struct xdp_md *ctx)
+{
+	const struct xdp_buff *xdp = (void *)ctx;
+
+	return xdp->rxq->dev->features & NETIF_F_RXHASH;
+}
+
+u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx)
+{
+	const struct xdp_buff *xdp = (void *)ctx;
+	struct mlx5_xdp_ctx *mctx = xdp->drv_priv;
+
+	return be32_to_cpu(mctx->cqe->rss_hash_result);
+}
+
 /* returns true if packet was consumed by xdp */
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
 		      struct bpf_prog *prog, struct xdp_buff *xdp)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index bc2d9034af5b..07d80d0446ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -44,6 +44,11 @@
 	(MLX5E_XDP_INLINE_WQE_MAX_DS_CNT * MLX5_SEND_WQE_DS - \
 	 sizeof(struct mlx5_wqe_inline_seg))
 
+struct mlx5_xdp_ctx {
+	struct mlx5_cqe64 *cqe;
+	struct mlx5e_rq *rq;
+};
+
 struct mlx5e_xsk_param;
 int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk);
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
@@ -56,6 +61,11 @@ void mlx5e_xdp_rx_poll_complete(struct mlx5e_rq *rq);
 int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 		   u32 flags);
 
+bool mlx5e_xdp_rx_hash_supported(const struct xdp_md *ctx);
+u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx);
+bool mlx5e_xdp_rx_timestamp_supported(const struct xdp_md *ctx);
+u64 mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx);
+
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq,
 							  struct mlx5e_xmit_data *xdptxd,
 							  struct skb_shared_info *sinfo,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index c91b54d9ff27..c6715cb23d45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -283,8 +283,10 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
+					      struct mlx5_cqe64 *cqe,
 					      u32 cqe_bcnt)
 {
+	struct mlx5_xdp_ctx mlctx = { .cqe = cqe, .rq = rq };
 	struct xdp_buff *xdp = wi->au->xsk;
 	struct bpf_prog *prog;
 
@@ -298,6 +300,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	xsk_buff_set_size(xdp, cqe_bcnt);
 	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
 	net_prefetch(xdp->data);
+	xdp->drv_priv = &mlctx;
 
 	prog = rcu_dereference(rq->xdp_prog);
 	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, xdp)))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index 087c943bd8e9..9198f137f48f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -18,6 +18,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    u32 page_idx);
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
-					      u32 cqe_bcnt);
+                                              struct mlx5_cqe64 *cqe,
+                                              u32 cqe_bcnt);
 
 #endif /* __MLX5_EN_XSK_RX_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 14bd86e368d5..015bfe891458 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4890,6 +4890,10 @@ const struct net_device_ops mlx5e_netdev_ops = {
 	.ndo_tx_timeout          = mlx5e_tx_timeout,
 	.ndo_bpf		 = mlx5e_xdp,
 	.ndo_xdp_xmit            = mlx5e_xdp_xmit,
+	.ndo_xdp_rx_timestamp_supported = mlx5e_xdp_rx_timestamp_supported,
+	.ndo_xdp_rx_timestamp    = mlx5e_xdp_rx_timestamp,
+	.ndo_xdp_rx_hash_supported = mlx5e_xdp_rx_hash_supported,
+	.ndo_xdp_rx_hash         = mlx5e_xdp_rx_hash,
 	.ndo_xsk_wakeup          = mlx5e_xsk_wakeup,
 #ifdef CONFIG_MLX5_EN_ARFS
 	.ndo_rx_flow_steer	 = mlx5e_rx_flow_steer,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b1ea0b995d9c..1d6600441e74 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -76,11 +76,6 @@ const struct mlx5e_rx_handlers mlx5e_rx_handlers_nic = {
 	.handle_rx_cqe_mpwqe_shampo = mlx5e_handle_rx_cqe_mpwrq_shampo,
 };
 
-static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
-{
-	return config->rx_filter == HWTSTAMP_FILTER_ALL;
-}
-
 static inline void mlx5e_read_cqe_slot(struct mlx5_cqwq *wq,
 				       u32 cqcc, void *data)
 {
@@ -1573,7 +1568,7 @@ static void mlx5e_fill_xdp_buff(struct mlx5e_rq *rq, void *va, u16 headroom,
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			  u32 cqe_bcnt)
+			  struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
 	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
@@ -1595,7 +1590,8 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 
 	prog = rcu_dereference(rq->xdp_prog);
 	if (prog) {
-		struct xdp_buff xdp;
+		struct mlx5_xdp_ctx mlctx = { .cqe = cqe, .rq = rq };
+		struct xdp_buff xdp = { .drv_priv = &mlctx };
 
 		net_prefetchw(va); /* xdp_frame data area */
 		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &xdp);
@@ -1619,16 +1615,17 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			     u32 cqe_bcnt)
+			     struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
 	struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
+	struct mlx5_xdp_ctx mlctx = { .cqe = cqe, .rq = rq };
+	struct xdp_buff xdp = { .drv_priv = &mlctx };
 	struct mlx5e_wqe_frag_info *head_wi = wi;
 	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
 	struct skb_shared_info *sinfo;
 	u32 frag_consumed_bytes;
 	struct bpf_prog *prog;
-	struct xdp_buff xdp;
 	struct sk_buff *skb;
 	dma_addr_t addr;
 	u32 truesize;
@@ -1766,7 +1763,7 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
 			      mlx5e_xsk_skb_from_cqe_linear,
-			      rq, wi, cqe_bcnt);
+			      rq, wi, cqe, cqe_bcnt);
 	if (!skb) {
 		/* probably for XDP */
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
@@ -2575,7 +2572,7 @@ static void mlx5e_trap_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe
 		goto free_wqe;
 	}
 
-	skb = mlx5e_skb_from_cqe_nonlinear(rq, wi, cqe_bcnt);
+	skb = mlx5e_skb_from_cqe_nonlinear(rq, wi, cqe, cqe_bcnt);
 	if (!skb)
 		goto free_wqe;
 
-- 
2.38.1



* Re: [xdp-hints] [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-23 14:33   ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-11-23 18:26     ` Stanislav Fomichev
  2022-11-23 19:14       ` Jakub Kicinski
  0 siblings, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-23 18:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Wed, Nov 23, 2022 at 6:33 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > No functional changes. Boilerplate to allow stuffing more data after xdp_buff.
> >
> > Cc: Tariq Toukan <tariqt@nvidia.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
> >  1 file changed, 15 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index 8f762fc170b3..467356633172 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -661,17 +661,21 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
> >  #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
> >  #endif
> >
> > +struct mlx4_xdp_buff {
> > +     struct xdp_buff xdp;
> > +};
>
> This embedding trick works for drivers that put xdp_buff on the stack,
> but mlx5 supports XSK zerocopy, which uses the xsk_buff_pool for
> allocating them. This makes it a bit awkward to do the same thing there;
> and since it's probably going to be fairly common to do something like
> this, how about we just add a 'void *drv_priv' pointer to struct
> xdp_buff that the drivers can use? The xdp_buff already takes up a full
> cache line anyway, so any data stuffed after it will spill over to a new
> one; so I don't think there's much difference performance-wise.

I guess the alternative is to extend xsk_buff_pool with some new
argument for xdp_buff tailroom (so it can kmalloc(sizeof(xdp_buff) +
xdp_buff_tailroom))?
But that seems messy because there is no way of knowing what the target
device's tailroom is, so it would have to be a user setting :-/
I initially started with a priv pointer in xdp_buff, so it seems fine
to go back to that. I'll probably convert veth/mlx4 to the same approach
as well to avoid having different approaches in different places..

> I'll send my patch to add support to mlx5 (using the drv_priv pointer
> approach) separately.

Saw them, thanks! Will include them in v3+.


* Re: [xdp-hints] [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata
  2022-11-23 14:26   ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-11-23 18:29     ` Stanislav Fomichev
  2022-11-23 19:17       ` Jakub Kicinski
  0 siblings, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-23 18:29 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Wed, Nov 23, 2022 at 6:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > +static int rxq_num(const char *ifname)
> > +{
> > +     struct ethtool_channels ch = {
> > +             .cmd = ETHTOOL_GCHANNELS,
> > +     };
> > +
> > +     struct ifreq ifr = {
> > +             .ifr_data = (void *)&ch,
> > +     };
> > +     strcpy(ifr.ifr_name, ifname);
> > +     int fd, ret;
> > +
> > +     fd = socket(AF_UNIX, SOCK_DGRAM, 0);
> > +     if (fd < 0)
> > +             error(-1, errno, "socket");
> > +
> > +     ret = ioctl(fd, SIOCETHTOOL, &ifr);
> > +     if (ret < 0)
> > +             error(-1, errno, "ioctl");
> > +
> > +     close(fd);
> > +
> > +     return ch.rx_count;
> > +}
>
> mlx5 uses 'combined' channels, so this returns 0. Changing it to just:
>
> return ch.rx_count ?: ch.combined_count;
>
> works though :)

Perfect, will do the same :-) Thank you for running and testing!

> -Toke
>


* Re: [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-23  6:34   ` Martin KaFai Lau
@ 2022-11-23 18:43     ` Stanislav Fomichev
  0 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-23 18:43 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On Tue, Nov 22, 2022 at 10:34 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 11/21/22 10:25 AM, Stanislav Fomichev wrote:
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2576,6 +2576,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
> >       } else {
> >               bpf_jit_free(aux->prog);
> >       }
> > +     dev_put(aux->xdp_netdev);
>
> I think dev_put needs to be done during unregister_netdevice event also.
> Otherwise, a loaded bpf prog may hold the dev for a long time.  May be there is
> ideas in offload.c.

Let me try to play with a veth pair to make sure the proper cleanup triggers.
I see your point that we now seemingly have to detach/unload the
program to trigger netdev cleanup..

> [ ... ]
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 35972afb6850..ece7f9234b2d 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2491,7 +2491,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
> >                                BPF_F_TEST_STATE_FREQ |
> >                                BPF_F_SLEEPABLE |
> >                                BPF_F_TEST_RND_HI32 |
> > -                              BPF_F_XDP_HAS_FRAGS))
> > +                              BPF_F_XDP_HAS_FRAGS |
> > +                              BPF_F_XDP_HAS_METADATA))
> >               return -EINVAL;
> >
> >       if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> > @@ -2579,6 +2580,20 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
> >       prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
> >       prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
> >
> > +     if (attr->prog_flags & BPF_F_XDP_HAS_METADATA) {
> > +             /* Reuse prog_ifindex to bind to the device
> > +              * for XDP metadata kfuncs.
> > +              */
> > +             prog->aux->offload_requested = false;
> > +
> > +             prog->aux->xdp_netdev = dev_get_by_index(current->nsproxy->net_ns,
> > +                                                      attr->prog_ifindex);
> > +             if (!prog->aux->xdp_netdev) {
> > +                     err = -EINVAL;
> > +                     goto free_prog;
> > +             }
> > +     }
>


* Re: [xdp-hints] [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-23 14:24   ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-11-23 18:43     ` Stanislav Fomichev
  0 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-23 18:43 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Wed, Nov 23, 2022 at 6:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> >  static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                           struct bpf_insn *insn_buf, int insn_idx, int *cnt)
> >  {
> > @@ -15181,6 +15200,15 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >               return -EINVAL;
> >       }
> >
> > +     if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
> > +             int imm = fixup_xdp_kfunc_call(env, insn->imm);
> > +
> > +             if (imm) {
> > +                     insn->imm = imm;
> > +                     return 0;
>
> This needs to also set *cnt = 0 before returning; otherwise the verifier
> can do some really weird instruction rewriting that leads to the JIT
> barfing on invalid instructions (as I just found out while trying to
> test this).

Oops, that was me not paying enough attention during the merge..
Yonghong actually did some kfunc unrolling, yay :-)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-23 18:26     ` Stanislav Fomichev
@ 2022-11-23 19:14       ` Jakub Kicinski
  2022-11-23 19:52         ` sdf
  0 siblings, 1 reply; 50+ messages in thread
From: Jakub Kicinski @ 2022-11-23 19:14 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Toke Høiland-Jørgensen, bpf, ast, daniel, andrii,
	martin.lau, song, yhs, john.fastabend, kpsingh, haoluo, jolsa,
	Tariq Toukan, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Wed, 23 Nov 2022 10:26:41 -0800 Stanislav Fomichev wrote:
> > This embedding trick works for drivers that put xdp_buff on the stack,
> > but mlx5 supports XSK zerocopy, which uses the xsk_buff_pool for
> > allocating them. This makes it a bit awkward to do the same thing there;
> > and since it's probably going to be fairly common to do something like
> > this, how about we just add a 'void *drv_priv' pointer to struct
> > xdp_buff that the drivers can use? The xdp_buff already takes up a full
> > cache line anyway, so any data stuffed after it will spill over to a new
> > one; so I don't think there's much difference performance-wise.  
> 
> I guess the alternative is to extend xsk_buff_pool with some new
> argument for xdp_buff tailroom? (so it can kmalloc(sizeof(xdp_buff) +
> xdp_buff_tailroom))
> But it seems messy because there is no way of knowing what the target
> device's tailroom is, so it has to be a user setting :-/
> I've started with a priv pointer in xdp_buff initially, it seems fine
> to go back. I'll probably convert veth/mlx4 to the same mode as well
> to avoid having different approaches in different places..

Can we not do this please? Add 16B of "private driver space" after
the xdp_buff in xdp_buff_xsk (we have 16B to a full cacheline); the
drivers decide how they use it. Drivers can do BUILD_BUG_ON() for their
expected size and cast that to whatever struct they want. This is how
various offloads work; a variable-size tailroom would be over-design
IMO.

And this way the non-XSK paths can keep their normal typing.

> > I'll send my patch to add support to mlx5 (using the drv_priv pointer
> > approach) separately.  
> 
> Saw them, thanks! Will include them in v3+.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata
  2022-11-23 18:29     ` Stanislav Fomichev
@ 2022-11-23 19:17       ` Jakub Kicinski
  2022-11-23 19:54         ` Stanislav Fomichev
  0 siblings, 1 reply; 50+ messages in thread
From: Jakub Kicinski @ 2022-11-23 19:17 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Toke Høiland-Jørgensen, bpf, ast, daniel, andrii,
	martin.lau, song, yhs, john.fastabend, kpsingh, haoluo, jolsa,
	David Ahern, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Wed, 23 Nov 2022 10:29:23 -0800 Stanislav Fomichev wrote:
> > return ch.rx_count ?: ch.combined_count;
> >
> > works though :)  
> 
> Perfect, will do the same :-) Thank you for running and testing!

The correct value is ch.rx_count + ch.combined_count

We've been over this many times, I thought it was coded up in libbpf
but I don't see it now :S

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-23 19:14       ` Jakub Kicinski
@ 2022-11-23 19:52         ` sdf
  2022-11-23 21:54           ` Maciej Fijalkowski
  2022-11-23 21:55           ` [xdp-hints] " Toke Høiland-Jørgensen
  0 siblings, 2 replies; 50+ messages in thread
From: sdf @ 2022-11-23 19:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, bpf, ast, daniel, andrii,
	martin.lau, song, yhs, john.fastabend, kpsingh, haoluo, jolsa,
	Tariq Toukan, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On 11/23, Jakub Kicinski wrote:
> On Wed, 23 Nov 2022 10:26:41 -0800 Stanislav Fomichev wrote:
> > > This embedding trick works for drivers that put xdp_buff on the stack,
> > > but mlx5 supports XSK zerocopy, which uses the xsk_buff_pool for
> > > allocating them. This makes it a bit awkward to do the same thing there;
> > > and since it's probably going to be fairly common to do something like
> > > this, how about we just add a 'void *drv_priv' pointer to struct
> > > xdp_buff that the drivers can use? The xdp_buff already takes up a full
> > > cache line anyway, so any data stuffed after it will spill over to a new
> > > one; so I don't think there's much difference performance-wise.
> >
> > I guess the alternative is to extend xsk_buff_pool with some new
> > argument for xdp_buff tailroom? (so it can kmalloc(sizeof(xdp_buff) +
> > xdp_buff_tailroom))
> > But it seems messy because there is no way of knowing what the target
> > device's tailroom is, so it has to be a user setting :-/
> > I've started with a priv pointer in xdp_buff initially, it seems fine
> > to go back. I'll probably convert veth/mlx4 to the same mode as well
> > to avoid having different approaches in different places..

> Can we not do this please? Add 16B of "private driver space" after
> the xdp_buff in xdp_buff_xsk (we have 16B to full cacheline), the
> drivers decide how they use it. Drivers can do BUILD_BUG_ON() for their
> expected size and cast that to whatever struct they want. This is how
> various offloads work, the variable size tailroom would be an over
> design IMO.

> And this way non XSK paths can keep its normal typing.

Good idea, prototyped below; lmk if that's not what you had in mind.

struct xdp_buff_xsk {
	struct xdp_buff            xdp;                  /*     0    56 */
	u8                         cb[16];               /*    56    16 */
	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
	dma_addr_t                 dma;                  /*    72     8 */
	dma_addr_t                 frame_dma;            /*    80     8 */
	struct xsk_buff_pool *     pool;                 /*    88     8 */
	u64                        orig_addr;            /*    96     8 */
	struct list_head           free_list_node;       /*   104    16 */

	/* size: 120, cachelines: 2, members: 7 */
	/* last cacheline: 56 bytes */
};

Toke, I can try to merge this into your patch + keep your SoB (or feel free
to try this and retest yourself, whatever works).

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index bc2d9034af5b..837bf103b871 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -44,6 +44,11 @@
  	(MLX5E_XDP_INLINE_WQE_MAX_DS_CNT * MLX5_SEND_WQE_DS - \
  	 sizeof(struct mlx5_wqe_inline_seg))

+struct mlx5_xdp_cb {
+	struct mlx5_cqe64 *cqe;
+	struct mlx5e_rq *rq;
+};
+
  struct mlx5e_xsk_param;
  int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk);
  bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index c91b54d9ff27..84d23b2da7ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -5,6 +5,7 @@
  #include "en/xdp.h"
  #include <net/xdp_sock_drv.h>
  #include <linux/filter.h>
+#include <linux/build_bug.h>

  /* RX data path */

@@ -286,8 +287,14 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
  					      u32 cqe_bcnt)
  {
  	struct xdp_buff *xdp = wi->au->xsk;
+	struct mlx5_xdp_cb *cb;
  	struct bpf_prog *prog;

+	BUILD_BUG_ON(sizeof(struct mlx5_xdp_cb) > XSKB_CB_SIZE);
+	cb = xp_get_cb(xdp);
+	cb->cqe = NULL /*cqe*/;
+	cb->rq = rq;
+
  	/* wi->offset is not used in this function, because xdp->data and the
  	 * DMA address point directly to the necessary place. Furthermore, the
  	 * XSK allocator allocates frames per packet, instead of pages, so
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index f787c3f524b0..b298590429e7 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -19,8 +19,11 @@ struct xdp_sock;
  struct device;
  struct page;

+#define XSKB_CB_SIZE 16
+
  struct xdp_buff_xsk {
  	struct xdp_buff xdp;
+	u8 cb[XSKB_CB_SIZE]; /* Private area for the drivers to use. */
  	dma_addr_t dma;
  	dma_addr_t frame_dma;
  	struct xsk_buff_pool *pool;
@@ -143,6 +146,11 @@ static inline dma_addr_t xp_get_frame_dma(struct xdp_buff_xsk *xskb)
  	return xskb->frame_dma;
  }

+static inline void *xp_get_cb(struct xdp_buff *xdp)
+{
+	return (void *)xdp + offsetof(struct xdp_buff_xsk, cb);
+}
+
  void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb);
  static inline void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb)
  {

> > > I'll send my patch to add support to mlx5 (using the drv_priv pointer
> > > approach) separately.
> >
> > Saw them, thanks! Will include them in v3+.

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata
  2022-11-23 19:17       ` Jakub Kicinski
@ 2022-11-23 19:54         ` Stanislav Fomichev
  0 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-23 19:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, bpf, ast, daniel, andrii,
	martin.lau, song, yhs, john.fastabend, kpsingh, haoluo, jolsa,
	David Ahern, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Wed, Nov 23, 2022 at 11:17 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 23 Nov 2022 10:29:23 -0800 Stanislav Fomichev wrote:
> > > return ch.rx_count ?: ch.combined_count;
> > >
> > > works though :)
> >
> > Perfect, will do the same :-) Thank you for running and testing!
>
> The correct value is ch.rx_count + ch.combined_count
>
> We've been over this many times, I thought it was coded up in libbpf
> but I don't see it now :S

Yeah, can't find it. It also doesn't exist in libxdp/libxsk. Will apply
your fix, thank you!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-23 19:52         ` sdf
@ 2022-11-23 21:54           ` Maciej Fijalkowski
  2022-11-23 21:55           ` [xdp-hints] " Toke Høiland-Jørgensen
  1 sibling, 0 replies; 50+ messages in thread
From: Maciej Fijalkowski @ 2022-11-23 21:54 UTC (permalink / raw)
  To: sdf
  Cc: Jakub Kicinski, Toke Høiland-Jørgensen, bpf, ast,
	daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh,
	haoluo, jolsa, Tariq Toukan, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Wed, Nov 23, 2022 at 11:52:12AM -0800, sdf@google.com wrote:
> On 11/23, Jakub Kicinski wrote:
> > On Wed, 23 Nov 2022 10:26:41 -0800 Stanislav Fomichev wrote:
> > > > This embedding trick works for drivers that put xdp_buff on the stack,
> > > > but mlx5 supports XSK zerocopy, which uses the xsk_buff_pool for
> > > > allocating them. This makes it a bit awkward to do the same thing there;
> > > > and since it's probably going to be fairly common to do something like
> > > > this, how about we just add a 'void *drv_priv' pointer to struct
> > > > xdp_buff that the drivers can use? The xdp_buff already takes up a full
> > > > cache line anyway, so any data stuffed after it will spill over to a new
> > > > one; so I don't think there's much difference performance-wise.
> > >
> > > I guess the alternative is to extend xsk_buff_pool with some new
> > > argument for xdp_buff tailroom? (so it can kmalloc(sizeof(xdp_buff) +
> > > xdp_buff_tailroom))
> > > But it seems messy because there is no way of knowing what the target
> > > device's tailroom is, so it has to be a user setting :-/
> > > I've started with a priv pointer in xdp_buff initially, it seems fine
> > > to go back. I'll probably convert veth/mlx4 to the same mode as well
> > > to avoid having different approaches in different places..
> 
> > Can we not do this please? Add 16B of "private driver space" after
> > the xdp_buff in xdp_buff_xsk (we have 16B to full cacheline), the

It is time to jump on the hints train, I guess :D

We have 8 bytes left in the cacheline that xdp_buff occupies - pahole
output below shows that cb spans through two cachelines. Did you mean
something else though?

> > drivers decide how they use it. Drivers can do BUILD_BUG_ON() for their
> > expected size and cast that to whatever struct they want. This is how
> > various offloads work, the variable size tailroom would be an over
> > design IMO.
> 
> > And this way non XSK paths can keep its normal typing.
> 
> Good idea, prototyped below, lmk if it that's not what you had in mind.
> 
> struct xdp_buff_xsk {
> 	struct xdp_buff            xdp;                  /*     0    56 */
> 	u8                         cb[16];               /*    56    16 */
> 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> 	dma_addr_t                 dma;                  /*    72     8 */
> 	dma_addr_t                 frame_dma;            /*    80     8 */
> 	struct xsk_buff_pool *     pool;                 /*    88     8 */
> 	u64                        orig_addr;            /*    96     8 */
> 	struct list_head           free_list_node;       /*   104    16 */
> 
> 	/* size: 120, cachelines: 2, members: 7 */
> 	/* last cacheline: 56 bytes */
> };
> 
> Toke, I can try to merge this into your patch + keep your SoB (or feel free
> to try this and retest yourself, whatever works).
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> index bc2d9034af5b..837bf103b871 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> @@ -44,6 +44,11 @@
>  	(MLX5E_XDP_INLINE_WQE_MAX_DS_CNT * MLX5_SEND_WQE_DS - \
>  	 sizeof(struct mlx5_wqe_inline_seg))
> 
> +struct mlx5_xdp_cb {
> +	struct mlx5_cqe64 *cqe;
> +	struct mlx5e_rq *rq;
> +};
> +
>  struct mlx5e_xsk_param;
>  int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk);
>  bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> index c91b54d9ff27..84d23b2da7ce 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> @@ -5,6 +5,7 @@
>  #include "en/xdp.h"
>  #include <net/xdp_sock_drv.h>
>  #include <linux/filter.h>
> +#include <linux/build_bug.h>
> 
>  /* RX data path */
> 
> @@ -286,8 +287,14 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
>  					      u32 cqe_bcnt)
>  {
>  	struct xdp_buff *xdp = wi->au->xsk;
> +	struct mlx5_xdp_cb *cb;
>  	struct bpf_prog *prog;
> 
> +	BUILD_BUG_ON(sizeof(struct mlx5_xdp_cb) > XSKB_CB_SIZE);
> +	cb = xp_get_cb(xdp);
> +	cb->cqe = NULL /*cqe*/;
> +	cb->rq = rq;

I believe that these could be set once at setup time within a pool;
take a look at xsk_pool_set_rxq_info(). This will save us cycles, since
we skip the assignments for each processed xdp_buff.

AF_XDP ZC performance comes in a major part from the fact that thanks to
xsk_buff_pool we have less work to do per each processed buffer.

> +
>  	/* wi->offset is not used in this function, because xdp->data and the
>  	 * DMA address point directly to the necessary place. Furthermore, the
>  	 * XSK allocator allocates frames per packet, instead of pages, so
> diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> index f787c3f524b0..b298590429e7 100644
> --- a/include/net/xsk_buff_pool.h
> +++ b/include/net/xsk_buff_pool.h
> @@ -19,8 +19,11 @@ struct xdp_sock;
>  struct device;
>  struct page;
> 
> +#define XSKB_CB_SIZE 16
> +
>  struct xdp_buff_xsk {
>  	struct xdp_buff xdp;
> +	u8 cb[XSKB_CB_SIZE]; /* Private area for the drivers to use. */
>  	dma_addr_t dma;
>  	dma_addr_t frame_dma;
>  	struct xsk_buff_pool *pool;
> @@ -143,6 +146,11 @@ static inline dma_addr_t xp_get_frame_dma(struct xdp_buff_xsk *xskb)
>  	return xskb->frame_dma;
>  }
> 
> +static inline void *xp_get_cb(struct xdp_buff *xdp)
> +{
> +	return (void *)xdp + offsetof(struct xdp_buff_xsk, cb);
> +}

This should have a wrapper in include/net/xdp_sock_drv.h that drivers will
call.

Generally I think this should fly but I'm not sure about cb being 16
bytes.

> +
>  void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb);
>  static inline void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb)
>  {
> 
> > > > I'll send my patch to add support to mlx5 (using the drv_priv pointer
> > > > approach) separately.
> > >
> > > Saw them, thanks! Will include them in v3+.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-23 19:52         ` sdf
  2022-11-23 21:54           ` Maciej Fijalkowski
@ 2022-11-23 21:55           ` Toke Høiland-Jørgensen
  2022-11-24  1:47             ` Jakub Kicinski
  1 sibling, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-23 21:55 UTC (permalink / raw)
  To: sdf, Jakub Kicinski
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

sdf@google.com writes:

> On 11/23, Jakub Kicinski wrote:
>> On Wed, 23 Nov 2022 10:26:41 -0800 Stanislav Fomichev wrote:
>> > > This embedding trick works for drivers that put xdp_buff on the stack,
>> > > but mlx5 supports XSK zerocopy, which uses the xsk_buff_pool for
>> > > allocating them. This makes it a bit awkward to do the same thing there;
>> > > and since it's probably going to be fairly common to do something like
>> > > this, how about we just add a 'void *drv_priv' pointer to struct
>> > > xdp_buff that the drivers can use? The xdp_buff already takes up a full
>> > > cache line anyway, so any data stuffed after it will spill over to a new
>> > > one; so I don't think there's much difference performance-wise.
>> >
>> > I guess the alternative is to extend xsk_buff_pool with some new
>> > argument for xdp_buff tailroom? (so it can kmalloc(sizeof(xdp_buff) +
>> > xdp_buff_tailroom))
>> > But it seems messy because there is no way of knowing what the target
>> > device's tailroom is, so it has to be a user setting :-/
>> > I've started with a priv pointer in xdp_buff initially, it seems fine
>> > to go back. I'll probably convert veth/mlx4 to the same mode as well
>> > to avoid having different approaches in different places..
>
>> Can we not do this please? Add 16B of "private driver space" after
>> the xdp_buff in xdp_buff_xsk (we have 16B to full cacheline), the
>> drivers decide how they use it. Drivers can do BUILD_BUG_ON() for their
>> expected size and cast that to whatever struct they want. This is how
>> various offloads work, the variable size tailroom would be an over
>> design IMO.
>
>> And this way non XSK paths can keep its normal typing.
>
> Good idea, prototyped below, lmk if it that's not what you had in mind.
>
> struct xdp_buff_xsk {
> 	struct xdp_buff            xdp;                  /*     0    56 */
> 	u8                         cb[16];               /*    56    16 */
> 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */

As pahole helpfully says here, xdp_buff is actually only 8 bytes from
being a full cache line. I thought about adding a 'cb' field like this
to xdp_buff itself, but figured that since there's only room for a
single pointer, why not just add that and let the driver point it to
where it wants to store the extra context data?

I am not suggesting to make anything variable-size; the 'void *drv_priv'
is just a normal pointer. There's no changes to any typing; not sure
where you got that from, Jakub?

Also, the priv pointer approach works for both XSK and on-stack
allocations, unlike this approach (see below).

> 	dma_addr_t                 dma;                  /*    72     8 */
> 	dma_addr_t                 frame_dma;            /*    80     8 */
> 	struct xsk_buff_pool *     pool;                 /*    88     8 */
> 	u64                        orig_addr;            /*    96     8 */
> 	struct list_head           free_list_node;       /*   104    16 */
>
> 	/* size: 120, cachelines: 2, members: 7 */
> 	/* last cacheline: 56 bytes */
> };
>
> Toke, I can try to merge this into your patch + keep your SoB (or feel free
> to try this and retest yourself, whatever works).
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> index bc2d9034af5b..837bf103b871 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> @@ -44,6 +44,11 @@
>   	(MLX5E_XDP_INLINE_WQE_MAX_DS_CNT * MLX5_SEND_WQE_DS - \
>   	 sizeof(struct mlx5_wqe_inline_seg))
>
> +struct mlx5_xdp_cb {
> +	struct mlx5_cqe64 *cqe;
> +	struct mlx5e_rq *rq;
> +};
> +
>   struct mlx5e_xsk_param;
>   int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk);
>   bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> index c91b54d9ff27..84d23b2da7ce 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> @@ -5,6 +5,7 @@
>   #include "en/xdp.h"
>   #include <net/xdp_sock_drv.h>
>   #include <linux/filter.h>
> +#include <linux/build_bug.h>
>
>   /* RX data path */
>
> @@ -286,8 +287,14 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
>   					      u32 cqe_bcnt)
>   {
>   	struct xdp_buff *xdp = wi->au->xsk;
> +	struct mlx5_xdp_cb *cb;
>   	struct bpf_prog *prog;
>
> +	BUILD_BUG_ON(sizeof(struct mlx5_xdp_cb) > XSKB_CB_SIZE);
> +	cb = xp_get_cb(xdp);
> +	cb->cqe = NULL /*cqe*/;
> +	cb->rq = rq;

So this works fine for the XSK path, but for the regular XDP path, mlx5
*does* indeed put the xdp_buff on the stack. So reusing code there
would carry an implicit assumption that both memory layout and size match
between the two paths. I'm not sure that's better than just having a
pointer inside the xdp_buff and pointing it wherever makes sense for
that driver (as my patch did)?

-Toke


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next 2/2] mlx5: Support XDP RX metadata
  2022-11-23 14:46   ` [PATCH bpf-next 2/2] mlx5: Support XDP RX metadata Toke Høiland-Jørgensen
@ 2022-11-23 22:29     ` Saeed Mahameed
  2022-11-23 22:44       ` Stanislav Fomichev
  0 siblings, 1 reply; 50+ messages in thread
From: Saeed Mahameed @ 2022-11-23 22:29 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, John Fastabend, David Ahern, Martin KaFai Lau,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, Stanislav Fomichev, xdp-hints, netdev

On 23 Nov 15:46, Toke Høiland-Jørgensen wrote:
>Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>XDP ctx to do this.
>
>Cc: John Fastabend <john.fastabend@gmail.com>
>Cc: David Ahern <dsahern@gmail.com>
>Cc: Martin KaFai Lau <martin.lau@linux.dev>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Willem de Bruijn <willemb@google.com>
>Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>Cc: Maryam Tahhan <mtahhan@redhat.com>
>Cc: Stanislav Fomichev <sdf@google.com>
>Cc: xdp-hints@xdp-project.net
>Cc: netdev@vger.kernel.org
>Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>---
>This goes on top of Stanislav's series, obviously. Verified that it works using
>the xdp_hw_metadata utility; going to do some benchmarking and follow up with the
>results, but figured I'd send this out straight away in case others wanted to
>play with it.
>
>Stanislav, feel free to fold it into the next version of your series if you
>want!
>

[...]

> #endif /* __MLX5_EN_XSK_RX_H__ */
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>index 14bd86e368d5..015bfe891458 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>@@ -4890,6 +4890,10 @@ const struct net_device_ops mlx5e_netdev_ops = {
> 	.ndo_tx_timeout          = mlx5e_tx_timeout,
> 	.ndo_bpf		 = mlx5e_xdp,
> 	.ndo_xdp_xmit            = mlx5e_xdp_xmit,
>+	.ndo_xdp_rx_timestamp_supported = mlx5e_xdp_rx_timestamp_supported,
>+	.ndo_xdp_rx_timestamp    = mlx5e_xdp_rx_timestamp,
>+	.ndo_xdp_rx_hash_supported = mlx5e_xdp_rx_hash_supported,
>+	.ndo_xdp_rx_hash         = mlx5e_xdp_rx_hash,

I hope I am not late to the party,
but I already expressed my feelings regarding using kfuncs for XDP hints
at LPC and at netdevconf.

I think it's wrong to use indirect calls, and for many use cases the
overhead will be higher than just calculating the metadata on the spot.

So you will need two indirect calls per packet per hint..
Some would argue that on some systems calculating the hash would be much
faster, and one major reason to have the hints is to accelerate XDP edge
and security programs with the HW-provided hints.

What happened to just asking the driver to place the data at a specific
location in the headroom?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next 2/2] mlx5: Support XDP RX metadata
  2022-11-23 22:29     ` [xdp-hints] " Saeed Mahameed
@ 2022-11-23 22:44       ` Stanislav Fomichev
  0 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-23 22:44 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Toke Høiland-Jørgensen, bpf, John Fastabend,
	David Ahern, Martin KaFai Lau, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Wed, Nov 23, 2022 at 2:29 PM Saeed Mahameed <saeed@kernel.org> wrote:
>
> On 23 Nov 15:46, Toke Høiland-Jørgensen wrote:
> >Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> >pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> >XDP ctx to do this.
> >
> >Cc: John Fastabend <john.fastabend@gmail.com>
> >Cc: David Ahern <dsahern@gmail.com>
> >Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >Cc: Jakub Kicinski <kuba@kernel.org>
> >Cc: Willem de Bruijn <willemb@google.com>
> >Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> >Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> >Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> >Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> >Cc: Maryam Tahhan <mtahhan@redhat.com>
> >Cc: Stanislav Fomichev <sdf@google.com>
> >Cc: xdp-hints@xdp-project.net
> >Cc: netdev@vger.kernel.org
> >Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >---
> >This goes on top of Stanislav's series, obviously. Verified that it works using
> >the xdp_hw_metadata utility; going to do some benchmarking and follow up with the
> >results, but figured I'd send this out straight away in case others wanted to
> >play with it.
> >
> >Stanislav, feel free to fold it into the next version of your series if you
> >want!
> >
>
> [...]
>
> > #endif /* __MLX5_EN_XSK_RX_H__ */
> >diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >index 14bd86e368d5..015bfe891458 100644
> >--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >@@ -4890,6 +4890,10 @@ const struct net_device_ops mlx5e_netdev_ops = {
> >       .ndo_tx_timeout          = mlx5e_tx_timeout,
> >       .ndo_bpf                 = mlx5e_xdp,
> >       .ndo_xdp_xmit            = mlx5e_xdp_xmit,
> >+      .ndo_xdp_rx_timestamp_supported = mlx5e_xdp_rx_timestamp_supported,
> >+      .ndo_xdp_rx_timestamp    = mlx5e_xdp_rx_timestamp,
> >+      .ndo_xdp_rx_hash_supported = mlx5e_xdp_rx_hash_supported,
> >+      .ndo_xdp_rx_hash         = mlx5e_xdp_rx_hash,
>
> I hope i am not late to the party.
> but I already expressed my feelings regarding using kfunc for xdp hints,
> @LPC and @netdevconf.
>
> I think it's wrong to use indirect calls, and for many usecases the
> overhead will be higher than just calculating the metadata on the spot.
>
> so you will need two indirect calls per packet per hint..
> some would argue on some systems calculating the hash would be much faster.
> and one major reason to have the hints is to accelerate xdp edge and
> security programs with the hw provided hints.
>
> what happened with just asking the driver to place the data in a specific
> location on the headroom?

Take a look at [0]; we are resolving the indirect calls there. We can
also always go back to unrolling those calls, as was done initially in [1].

0: https://lore.kernel.org/bpf/20221121182552.2152891-3-sdf@google.com/
1: https://lore.kernel.org/bpf/20221115030210.3159213-4-sdf@google.com/

The kfunc approach seems more flexible than an all-or-nothing approach
with the driver pre-filling all the metadata.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-23 21:55           ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-11-24  1:47             ` Jakub Kicinski
  2022-11-24 14:39               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Jakub Kicinski @ 2022-11-24  1:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: sdf, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, Tariq Toukan,
	David Ahern, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Wed, 23 Nov 2022 22:55:21 +0100 Toke Høiland-Jørgensen wrote:
> > Good idea, prototyped below, lmk if it that's not what you had in mind.
> >
> > struct xdp_buff_xsk {
> > 	struct xdp_buff            xdp;                  /*     0    56 */
> > 	u8                         cb[16];               /*    56    16 */
> > 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */  
> 
> As pahole helpfully says here, xdp_buff is actually only 8 bytes from
> being a full cache line. I thought about adding a 'cb' field like this
> to xdp_buff itself, but figured that since there's only room for a
> single pointer, why not just add that and let the driver point it to
> where it wants to store the extra context data?

What if the driver wants to store multiple pointers or an integer or
whatever else? The single pointer seems quite arbitrary and not
strictly necessary.

> I am not suggesting to make anything variable-size; the 'void *drv_priv'
> is just a normal pointer. There's no changes to any typing; not sure
> where you got that from, Jakub?

Often the descriptor pointer is in the same stack frame as the xdp_buff
(or close enough). The point of adding the wrapping structure is to be
able to move the descriptor pointer into a known place and then there's
no extra store copying the descriptor pointer from one place on the
stack to another.
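
In plain C, the wrapping-struct layout being described looks roughly
like this (simplified, illustrative stand-in types, not the real kernel
structs): the xdp_buff is embedded as the first member, so the driver
context sits at a known offset and can be recovered from the xdp_buff
pointer alone, with no extra store copying the descriptor pointer
around the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures under discussion. */
struct xdp_buff { void *data; };
struct mlx4_cqe { unsigned int ts; };

/* Wrapping struct: xdp_buff embedded as the first member, driver
 * context right after it, at a known, fixed offset. */
struct mlx4_xdp_buff {
	struct xdp_buff xdp; /* must stay the first member */
	struct mlx4_cqe *cqe;
};

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Given only the xdp_buff pointer (what a kfunc receives), the driver
 * context is recovered by a constant-offset subtraction. */
static struct mlx4_cqe *mxbuf_cqe(struct xdp_buff *xdp)
{
	return container_of(xdp, struct mlx4_xdp_buff, xdp)->cqe;
}

static struct mlx4_cqe g_cqe = { .ts = 7 };
static struct mlx4_xdp_buff g_mxbuf = { .cqe = &g_cqe };
```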

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-24  1:47             ` Jakub Kicinski
@ 2022-11-24 14:39               ` Toke Høiland-Jørgensen
  2022-11-24 15:17                 ` Maciej Fijalkowski
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-24 14:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: sdf, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, Tariq Toukan,
	David Ahern, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 23 Nov 2022 22:55:21 +0100 Toke Høiland-Jørgensen wrote:
>> > Good idea, prototyped below, lmk if it that's not what you had in mind.
>> >
>> > struct xdp_buff_xsk {
>> > 	struct xdp_buff            xdp;                  /*     0    56 */
>> > 	u8                         cb[16];               /*    56    16 */
>> > 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */  
>> 
>> As pahole helpfully says here, xdp_buff is actually only 8 bytes from
>> being a full cache line. I thought about adding a 'cb' field like this
>> to xdp_buff itself, but figured that since there's only room for a
>> single pointer, why not just add that and let the driver point it to
>> where it wants to store the extra context data?
>
> What if the driver wants to store multiple pointers or an integer or
> whatever else? The single pointer seems quite arbitrary and not
> strictly necessary.

Well, then you allocate a separate struct and point to that? Like I did
in mlx5:


+	struct mlx5_xdp_ctx mlctx = { .cqe = cqe, .rq = rq };
+	struct xdp_buff xdp = { .drv_priv = &mlctx };

but yeah, this does give an extra pointer deref on access. I'm not
really opposed to the cb field either, I just think it's a bit odd to
put it in struct xdp_buff_xsk; that basically requires the driver to
keep the layouts in sync.

Instead, why not put a cb field into xdp_buff itself so it can be used
for both the XSK and the non-XSK paths? Then the driver can just
typecast the xdp_buff into its own struct that has whatever data it
wants in place of the cb field?
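
Roughly like this sketch (hypothetical layouts, not the actual kernel
structs): the driver declares a struct with the same leading members as
xdp_buff and typed fields laid over the cb area, so a plain cast
recovers the context on both paths.

```c
#include <assert.h>
#include <stddef.h>

struct mlx4_cqe;        /* opaque, for illustration */
struct mlx4_en_rx_ring; /* opaque, for illustration */

/* Hypothetical xdp_buff with an opaque scratch area, as proposed. */
struct xdp_buff {
	void *data;
	void *data_end;
	char cb[16]; /* driver-owned scratch space */
};

/* Driver's view: identical layout up to cb, with typed fields in
 * place of the scratch area. */
struct mlx4_xdp_buff {
	void *data;
	void *data_end;
	struct mlx4_cqe *cqe;         /* overlays cb[0..7] on 64-bit */
	struct mlx4_en_rx_ring *ring; /* overlays cb[8..15] */
};

_Static_assert(sizeof(struct mlx4_xdp_buff) <= sizeof(struct xdp_buff),
	       "driver view must fit within xdp_buff");

/* A plain cast is all that is needed to recover the driver context. */
static struct mlx4_xdp_buff *to_mxbuf(struct xdp_buff *xdp)
{
	return (struct mlx4_xdp_buff *)xdp;
}

static struct xdp_buff g_xdp;
```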

-Toke


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-24 14:39               ` Toke Høiland-Jørgensen
@ 2022-11-24 15:17                 ` Maciej Fijalkowski
  2022-11-24 16:11                   ` Maciej Fijalkowski
  0 siblings, 1 reply; 50+ messages in thread
From: Maciej Fijalkowski @ 2022-11-24 15:17 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, sdf, bpf, ast, daniel, andrii, martin.lau, song,
	yhs, john.fastabend, kpsingh, haoluo, jolsa, Tariq Toukan,
	David Ahern, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Thu, Nov 24, 2022 at 03:39:20PM +0100, Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> 
> > On Wed, 23 Nov 2022 22:55:21 +0100 Toke Høiland-Jørgensen wrote:
> >> > Good idea, prototyped below, lmk if it that's not what you had in mind.
> >> >
> >> > struct xdp_buff_xsk {
> >> > 	struct xdp_buff            xdp;                  /*     0    56 */
> >> > 	u8                         cb[16];               /*    56    16 */
> >> > 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */  
> >> 
> >> As pahole helpfully says here, xdp_buff is actually only 8 bytes from
> >> being a full cache line. I thought about adding a 'cb' field like this
> >> to xdp_buff itself, but figured that since there's only room for a
> >> single pointer, why not just add that and let the driver point it to
> >> where it wants to store the extra context data?
> >
> > What if the driver wants to store multiple pointers or an integer or
> > whatever else? The single pointer seems quite arbitrary and not
> > strictly necessary.
> 
> Well, then you allocate a separate struct and point to that? Like I did
> in mlx5:
> 
> 
> +	struct mlx5_xdp_ctx mlctx = { .cqe = cqe, .rq = rq };
> +	struct xdp_buff xdp = { .drv_priv = &mlctx };
> 
> but yeah, this does give an extra pointer deref on access. I'm not
> really opposed to the cb field either, I just think it's a bit odd to
> put it in struct xdp_buff_xsk; that basically requires the driver to
> keep the layouts in sync.
> 
> Instead, why not but a cb field into xdp_buff itself so it can be used
> for both the XSK and the non-XSK paths? Then the driver can just
> typecast the xdp_buff into its own struct that has whatever data it
> wants in place of the cb field?

Why can't you simply have a pointer to xdp_buff in the driver-specific
xdp_buff container? For the XDP path it would point to a stack-based
xdp_buff (or whatever other memory backs it up - I am about to push a
change that makes the ice driver embed xdp_buff within the struct that
represents the Rx ring), and for ZC it would point to the xdp_buff you
get from the xsk_buff_pool. This would satisfy both sides, I believe,
and would let us keep the same container struct.

struct mlx4_xdp_buff {
	struct xdp_buff *xdp;
	struct mlx4_cqe *cqe;
	struct mlx4_en_dev *mdev;
	struct mlx4_en_rx_ring *ring;
	struct net_device *dev;
};

(...)

	struct mlx4_xdp_buff mxbuf;
	struct xdp_buff xdp;

	mxbuf.xdp = &xdp;
	xdp_init_buff(mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);

Also these additional things

+			mxbuf.cqe = cqe;
+			mxbuf.mdev = priv->mdev;
+			mxbuf.ring = ring;
+			mxbuf.dev = dev;

could be assigned once at setup time or, in the worst case, once per
NAPI poll. So maybe mlx4_xdp_buff shouldn't be stack-based?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-24 15:17                 ` Maciej Fijalkowski
@ 2022-11-24 16:11                   ` Maciej Fijalkowski
  2022-11-25  0:36                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Maciej Fijalkowski @ 2022-11-24 16:11 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, sdf, bpf, ast, daniel, andrii, martin.lau, song,
	yhs, john.fastabend, kpsingh, haoluo, jolsa, Tariq Toukan,
	David Ahern, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Thu, Nov 24, 2022 at 04:17:01PM +0100, Maciej Fijalkowski wrote:
> On Thu, Nov 24, 2022 at 03:39:20PM +0100, Toke Høiland-Jørgensen wrote:
> > Jakub Kicinski <kuba@kernel.org> writes:
> > 
> > > On Wed, 23 Nov 2022 22:55:21 +0100 Toke Høiland-Jørgensen wrote:
> > >> > Good idea, prototyped below, lmk if it that's not what you had in mind.
> > >> >
> > >> > struct xdp_buff_xsk {
> > >> > 	struct xdp_buff            xdp;                  /*     0    56 */
> > >> > 	u8                         cb[16];               /*    56    16 */
> > >> > 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */  
> > >> 
> > >> As pahole helpfully says here, xdp_buff is actually only 8 bytes from
> > >> being a full cache line. I thought about adding a 'cb' field like this
> > >> to xdp_buff itself, but figured that since there's only room for a
> > >> single pointer, why not just add that and let the driver point it to
> > >> where it wants to store the extra context data?
> > >
> > > What if the driver wants to store multiple pointers or an integer or
> > > whatever else? The single pointer seems quite arbitrary and not
> > > strictly necessary.
> > 
> > Well, then you allocate a separate struct and point to that? Like I did
> > in mlx5:
> > 
> > 
> > +	struct mlx5_xdp_ctx mlctx = { .cqe = cqe, .rq = rq };
> > +	struct xdp_buff xdp = { .drv_priv = &mlctx };
> > 
> > but yeah, this does give an extra pointer deref on access. I'm not
> > really opposed to the cb field either, I just think it's a bit odd to
> > put it in struct xdp_buff_xsk; that basically requires the driver to
> > keep the layouts in sync.
> > 
> > Instead, why not but a cb field into xdp_buff itself so it can be used
> > for both the XSK and the non-XSK paths? Then the driver can just
> > typecast the xdp_buff into its own struct that has whatever data it
> > wants in place of the cb field?
> 
> Why can't you simply have a pointer to xdp_buff in driver specific
> xdp_buff container which would point to xdp_buff that is stack based (or
> whatever else memory that will back it up - I am about to push a change
> that makes ice driver embed xdp_buff within a struct that represents Rx
> ring) for XDP path and for ZC the pointer to xdp_buff that you get from
> xsk_buff_pool ? This would satisfy both sides I believe and would let us
> keep the same container struct.
> 
> struct mlx4_xdp_buff {
> 	struct xdp_buff *xdp;
> 	struct mlx4_cqe *cqe;
> 	struct mlx4_en_dev *mdev;
> 	struct mlx4_en_rx_ring *ring;
> 	struct net_device *dev;
> };

Nah, this won't work from the kfunc POV; there is probably no way to
retrieve the mlx4_xdp_buff based on the xdp_buff ptr that needs to be
used as an arg.

Sorry, I'll think more about it. In the meantime, let's hear more
voices on whether we should keep Stan's original approach + modify
xdp_buff_xsk or go with Toke's proposal.

> 
> (...)
> 
> 	struct mlx4_xdp_buff mxbuf;
> 	struct xdp_buff xdp;
> 
> 	mxbuf.xdp = &xdp;
> 	xdp_init_buff(mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
> 
> Also these additional things
> 
> +			mxbuf.cqe = cqe;
> +			mxbuf.mdev = priv->mdev;
> +			mxbuf.ring = ring;
> +			mxbuf.dev = dev;
> 
> could be assigned once at a setup time or in worse case once per NAPI. So
> maybe mlx4_xdp_buff shouldn't be stack based?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-24 16:11                   ` Maciej Fijalkowski
@ 2022-11-25  0:36                     ` Toke Høiland-Jørgensen
  2022-11-28 21:58                       ` Stanislav Fomichev
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-25  0:36 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Jakub Kicinski, sdf, bpf, ast, daniel, andrii, martin.lau, song,
	yhs, john.fastabend, kpsingh, haoluo, jolsa, Tariq Toukan,
	David Ahern, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:

> On Thu, Nov 24, 2022 at 04:17:01PM +0100, Maciej Fijalkowski wrote:
>> On Thu, Nov 24, 2022 at 03:39:20PM +0100, Toke Høiland-Jørgensen wrote:
>> > Jakub Kicinski <kuba@kernel.org> writes:
>> > 
>> > > On Wed, 23 Nov 2022 22:55:21 +0100 Toke Høiland-Jørgensen wrote:
>> > >> > Good idea, prototyped below, lmk if it that's not what you had in mind.
>> > >> >
>> > >> > struct xdp_buff_xsk {
>> > >> > 	struct xdp_buff            xdp;                  /*     0    56 */
>> > >> > 	u8                         cb[16];               /*    56    16 */
>> > >> > 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */  
>> > >> 
>> > >> As pahole helpfully says here, xdp_buff is actually only 8 bytes from
>> > >> being a full cache line. I thought about adding a 'cb' field like this
>> > >> to xdp_buff itself, but figured that since there's only room for a
>> > >> single pointer, why not just add that and let the driver point it to
>> > >> where it wants to store the extra context data?
>> > >
>> > > What if the driver wants to store multiple pointers or an integer or
>> > > whatever else? The single pointer seems quite arbitrary and not
>> > > strictly necessary.
>> > 
>> > Well, then you allocate a separate struct and point to that? Like I did
>> > in mlx5:
>> > 
>> > 
>> > +	struct mlx5_xdp_ctx mlctx = { .cqe = cqe, .rq = rq };
>> > +	struct xdp_buff xdp = { .drv_priv = &mlctx };
>> > 
>> > but yeah, this does give an extra pointer deref on access. I'm not
>> > really opposed to the cb field either, I just think it's a bit odd to
>> > put it in struct xdp_buff_xsk; that basically requires the driver to
>> > keep the layouts in sync.
>> > 
>> > Instead, why not but a cb field into xdp_buff itself so it can be used
>> > for both the XSK and the non-XSK paths? Then the driver can just
>> > typecast the xdp_buff into its own struct that has whatever data it
>> > wants in place of the cb field?
>> 
>> Why can't you simply have a pointer to xdp_buff in driver specific
>> xdp_buff container which would point to xdp_buff that is stack based (or
>> whatever else memory that will back it up - I am about to push a change
>> that makes ice driver embed xdp_buff within a struct that represents Rx
>> ring) for XDP path and for ZC the pointer to xdp_buff that you get from
>> xsk_buff_pool ? This would satisfy both sides I believe and would let us
>> keep the same container struct.
>> 
>> struct mlx4_xdp_buff {
>> 	struct xdp_buff *xdp;
>> 	struct mlx4_cqe *cqe;
>> 	struct mlx4_en_dev *mdev;
>> 	struct mlx4_en_rx_ring *ring;
>> 	struct net_device *dev;
>> };
>
> Nah this won't work from kfunc POV, probably no way to retrieve the
> mlx4_xdp_buff based on xdp_buff ptr that needs to be used as an arg.
>
> Sorry I'll think more about it, in the meantime let's hear more voices
> whether we should keep Stan's original approach + modify xdp_buff_xsk or
> go with Toke's proposal.

OK, so I played around with the mlx5 code a bit more, and I think the
"wrapping struct + cb area" can be made to work without too many ugly
casts; I'll send an updated version of the mlx5 patches with this
incorporated tomorrow, after I've run some tests...

-Toke


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-21 18:25 ` [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs Stanislav Fomichev
  2022-11-23  6:34   ` Martin KaFai Lau
  2022-11-23 14:24   ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-11-25 17:53   ` Toke Høiland-Jørgensen
  2022-11-28 18:53     ` Stanislav Fomichev
  2022-11-30 17:24   ` Larysa Zaremba
  3 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-25 17:53 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> There is an ndo handler per kfunc, the verifier replaces a call to the
> generic kfunc with a call to the per-device one.
>
> For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> implements all possible metadata kfuncs. Not all devices have to
> implement them. If kfunc is not supported by the target device,
> the default implementation is called instead.

BTW, this "the default implementation is called instead" bit is not
included in this version... :)

[...]

> +#ifdef CONFIG_DEBUG_INFO_BTF
> +BTF_SET8_START(xdp_metadata_kfunc_ids)
> +#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
> +XDP_METADATA_KFUNC_xxx
> +#undef XDP_METADATA_KFUNC
> +BTF_SET8_END(xdp_metadata_kfunc_ids)
> +
> +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> +	.owner = THIS_MODULE,
> +	.set   = &xdp_metadata_kfunc_ids,
> +};
> +
> +u32 xdp_metadata_kfunc_id(int id)
> +{
> +	return xdp_metadata_kfunc_ids.pairs[id].id;
> +}
> +EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);

So I was getting some really weird values when testing (always getting a
timestamp value of '1'), and it turns out to be because this way of
looking up the ID doesn't work: The set is always sorted by the BTF ID,
not the order it was defined. Which meant that the mapping code got the
functions mixed up, and would call a different one instead (so the
timestamp value I was getting was really the return value of
rx_hash_enabled()).

I fixed it by building a secondary lookup table as below; feel free to
incorporate that (or if you can come up with a better way, go ahead!).
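
To make the failure mode concrete, here is a userspace sketch (made-up
BTF ID values, and the kfunc names are only illustrative): indexing the
sorted set positionally by definition order returns a different
function's ID, while matching by name recovers the right one.

```c
#include <assert.h>
#include <string.h>

/* Definition order of the kfuncs -- the index the mapping code uses. */
enum { KF_EXPORT_TO_SKB, KF_RX_TIMESTAMP, KF_RX_HASH, MAX_KF };

static const char *kf_names[MAX_KF] = {
	"bpf_xdp_metadata_export_to_skb",
	"bpf_xdp_metadata_rx_timestamp",
	"bpf_xdp_metadata_rx_hash",
};

/* What BTF_SET8 actually stores: entries sorted by BTF ID, which need
 * not match definition order (IDs are made up for illustration). */
static const struct { unsigned int id; const char *name; }
sorted_set[MAX_KF] = {
	{ 100, "bpf_xdp_metadata_rx_hash" },
	{ 205, "bpf_xdp_metadata_export_to_skb" },
	{ 310, "bpf_xdp_metadata_rx_timestamp" },
};

/* Name-based lookup -- equivalent to building a secondary table keyed
 * by definition order. */
static unsigned int kfunc_id(int idx)
{
	for (int j = 0; j < MAX_KF; j++)
		if (strcmp(sorted_set[j].name, kf_names[idx]) == 0)
			return sorted_set[j].id;
	return 0;
}
```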

-Toke

diff --git a/net/core/xdp.c b/net/core/xdp.c
index e43f7d4ef4cf..dc0a9644dacc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -738,6 +738,15 @@ XDP_METADATA_KFUNC_xxx
 #undef XDP_METADATA_KFUNC
 BTF_SET8_END(xdp_metadata_kfunc_ids)
 
+static struct xdp_metadata_kfunc_map {
+       const char *fname;
+       u32 btf_id;
+} xdp_metadata_kfunc_lookup_map[MAX_XDP_METADATA_KFUNC] = {
+#define XDP_METADATA_KFUNC(name, str) { .fname = __stringify(str) },
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+};
+
 static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
        .owner = THIS_MODULE,
        .set   = &xdp_metadata_kfunc_ids,
@@ -745,13 +754,41 @@ static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
 
 u32 xdp_metadata_kfunc_id(int id)
 {
-       return xdp_metadata_kfunc_ids.pairs[id].id;
+       return xdp_metadata_kfunc_lookup_map[id].btf_id;
 }
 EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
 
 static int __init xdp_metadata_init(void)
 {
-       return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
+       const struct btf *btf;
+       int i, j, ret;
+
+       ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
+       if (ret)
+               return ret;
+
+       btf = bpf_get_btf_vmlinux();
+
+       for (i = 0; i < MAX_XDP_METADATA_KFUNC; i++) {
+               u32 btf_id = xdp_metadata_kfunc_ids.pairs[i].id;
+               const struct btf_type *t;
+               const char *name;
+
+               t = btf_type_by_id(btf, btf_id);
+               if (WARN_ON_ONCE(!t || !t->name_off))
+                       continue;
+
+               name = btf_name_by_offset(btf, t->name_off);
+
+               for (j = 0; j < MAX_XDP_METADATA_KFUNC; j++) {
+                       if (!strcmp(name, xdp_metadata_kfunc_lookup_map[j].fname)) {
+                               xdp_metadata_kfunc_lookup_map[j].btf_id = btf_id;
+                               break;
+                       }
+               }
+       }
+
+       return 0;
 }
 late_initcall(xdp_metadata_init);
 #else /* CONFIG_DEBUG_INFO_BTF */


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-25 17:53   ` Toke Høiland-Jørgensen
@ 2022-11-28 18:53     ` Stanislav Fomichev
  2022-11-28 19:21       ` Stanislav Fomichev
  2022-11-28 22:10       ` [xdp-hints] " Toke Høiland-Jørgensen
  0 siblings, 2 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-28 18:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev


On Fri, Nov 25, 2022 at 9:53 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > There is an ndo handler per kfunc, the verifier replaces a call to the
> > generic kfunc with a call to the per-device one.
> >
> > For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> > implements all possible metadata kfuncs. Not all devices have to
> > implement them. If kfunc is not supported by the target device,
> > the default implementation is called instead.
>
> BTW, this "the default implementation is called instead" bit is not
> included in this version... :)

fixup_xdp_kfunc_call should return 0 when the device doesn't have the
kfunc defined, and the call should then fall back to the default kfunc
implementation, right?
Or am I missing something?
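
Roughly this resolution logic, sketched in plain C (illustrative names
and types only, not the actual verifier fixup or ndo code):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, not the real kernel types. */
struct xdp_metadata_ops {
	long (*xmo_rx_timestamp)(void *ctx); /* NULL if unsupported */
};

/* Generic kfunc body: what runs when the device has no override. */
static long default_rx_timestamp(void *ctx)
{
	(void)ctx;
	return 0; /* "not supported" */
}

/* Mimics the fixup step: if the device implements the kfunc, calls are
 * redirected to its handler; otherwise (fixup returns 0) the default
 * implementation is left in place. */
static long (*resolve_rx_timestamp(const struct xdp_metadata_ops *ops))(void *)
{
	if (ops && ops->xmo_rx_timestamp)
		return ops->xmo_rx_timestamp;
	return default_rx_timestamp;
}

static long mlx4_rx_timestamp(void *ctx)
{
	(void)ctx;
	return 1669; /* pretend HW timestamp */
}

static const struct xdp_metadata_ops mlx4_ops = {
	.xmo_rx_timestamp = mlx4_rx_timestamp,
};
static const struct xdp_metadata_ops plain_ops = { 0 };
```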

> [...]
>
> > +#ifdef CONFIG_DEBUG_INFO_BTF
> > +BTF_SET8_START(xdp_metadata_kfunc_ids)
> > +#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
> > +XDP_METADATA_KFUNC_xxx
> > +#undef XDP_METADATA_KFUNC
> > +BTF_SET8_END(xdp_metadata_kfunc_ids)
> > +
> > +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> > +     .owner = THIS_MODULE,
> > +     .set   = &xdp_metadata_kfunc_ids,
> > +};
> > +
> > +u32 xdp_metadata_kfunc_id(int id)
> > +{
> > +     return xdp_metadata_kfunc_ids.pairs[id].id;
> > +}
> > +EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
>
> So I was getting some really weird values when testing (always getting a
> timestamp value of '1'), and it turns out to be because this way of
> looking up the ID doesn't work: The set is always sorted by the BTF ID,
> not the order it was defined. Which meant that the mapping code got the
> functions mixed up, and would call a different one instead (so the
> timestamp value I was getting was really the return value of
> rx_hash_enabled()).
>
> I fixed it by building a secondary lookup table as below; feel free to
> incorporate that (or if you can come up with a better way, go ahead!).

Interesting, will take a closer look. I took this pattern from
BTF_SOCK_TYPE_xxx, which means that 'sorting by btf-id' is something
BTF_SET8_START specific...
But if it's sorted, it's probably easier to do a bsearch over this
table than to build another one?

> -Toke
>
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index e43f7d4ef4cf..dc0a9644dacc 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -738,6 +738,15 @@ XDP_METADATA_KFUNC_xxx
>  #undef XDP_METADATA_KFUNC
>  BTF_SET8_END(xdp_metadata_kfunc_ids)
>
> +static struct xdp_metadata_kfunc_map {
> +       const char *fname;
> +       u32 btf_id;
> +} xdp_metadata_kfunc_lookup_map[MAX_XDP_METADATA_KFUNC] = {
> +#define XDP_METADATA_KFUNC(name, str) { .fname = __stringify(str) },
> +XDP_METADATA_KFUNC_xxx
> +#undef XDP_METADATA_KFUNC
> +};
> +
>  static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
>         .owner = THIS_MODULE,
>         .set   = &xdp_metadata_kfunc_ids,
> @@ -745,13 +754,41 @@ static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
>
>  u32 xdp_metadata_kfunc_id(int id)
>  {
> -       return xdp_metadata_kfunc_ids.pairs[id].id;
> +       return xdp_metadata_kfunc_lookup_map[id].btf_id;
>  }
>  EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
>
>  static int __init xdp_metadata_init(void)
>  {
> -       return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
> +       const struct btf *btf;
> +       int i, j, ret;
> +
> +       ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
> +       if (ret)
> +               return ret;
> +
> +       btf = bpf_get_btf_vmlinux();
> +
> +       for (i = 0; i < MAX_XDP_METADATA_KFUNC; i++) {
> +               u32 btf_id = xdp_metadata_kfunc_ids.pairs[i].id;
> +               const struct btf_type *t;
> +               const char *name;
> +
> +               t = btf_type_by_id(btf, btf_id);
> +               if (WARN_ON_ONCE(!t || !t->name_off))
> +                       continue;
> +
> +               name = btf_name_by_offset(btf, t->name_off);
> +
> +               for (j = 0; j < MAX_XDP_METADATA_KFUNC; j++) {
> +                       if (!strcmp(name, xdp_metadata_kfunc_lookup_map[j].fname)) {
> +                               xdp_metadata_kfunc_lookup_map[j].btf_id = btf_id;
> +                               break;
> +                       }
> +               }
> +       }
> +
> +       return 0;
>  }
>  late_initcall(xdp_metadata_init);
>  #else /* CONFIG_DEBUG_INFO_BTF */
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-28 18:53     ` Stanislav Fomichev
@ 2022-11-28 19:21       ` Stanislav Fomichev
  2022-11-28 22:25         ` Toke Høiland-Jørgensen
  2022-11-28 22:10       ` [xdp-hints] " Toke Høiland-Jørgensen
  1 sibling, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-28 19:21 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Mon, Nov 28, 2022 at 10:53 AM Stanislav Fomichev <sdf@google.com> wrote:
>
>  s
>
> On Fri, Nov 25, 2022 at 9:53 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Stanislav Fomichev <sdf@google.com> writes:
> >
> > > There is an ndo handler per kfunc, the verifier replaces a call to the
> > > generic kfunc with a call to the per-device one.
> > >
> > > For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> > > implements all possible metadata kfuncs. Not all devices have to
> > > implement them. If kfunc is not supported by the target device,
> > > the default implementation is called instead.
> >
> > BTW, this "the default implementation is called instead" bit is not
> > included in this version... :)
>
> fixup_xdp_kfunc_call should return 0 when the device doesn't have a
> kfunc defined and should fallback to the default kfunc implementation,
> right?
> Or am I missing something?
>
> > [...]
> >
> > > +#ifdef CONFIG_DEBUG_INFO_BTF
> > > +BTF_SET8_START(xdp_metadata_kfunc_ids)
> > > +#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
> > > +XDP_METADATA_KFUNC_xxx
> > > +#undef XDP_METADATA_KFUNC
> > > +BTF_SET8_END(xdp_metadata_kfunc_ids)
> > > +
> > > +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> > > +     .owner = THIS_MODULE,
> > > +     .set   = &xdp_metadata_kfunc_ids,
> > > +};
> > > +
> > > +u32 xdp_metadata_kfunc_id(int id)
> > > +{
> > > +     return xdp_metadata_kfunc_ids.pairs[id].id;
> > > +}
> > > +EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
> >
> > So I was getting some really weird values when testing (always getting a
> > timestamp value of '1'), and it turns out to be because this way of
> > looking up the ID doesn't work: The set is always sorted by the BTF ID,
> > not the order it was defined. Which meant that the mapping code got the
> > functions mixed up, and would call a different one instead (so the
> > timestamp value I was getting was really the return value of
> > rx_hash_enabled()).
> >
> > I fixed it by building a secondary lookup table as below; feel free to
> > incorporate that (or if you can come up with a better way, go ahead!).
>
> Interesting, will take a closer look. I took this pattern from
> BTF_SOCK_TYPE_xxx, which means that 'sorting by btf-id' is something
> BTF_SET8_START specific...
> But if it's sorted, probably easier to do a bsearch over this table
> than to build another one?

Ah, I see, there is no place to store an index :-( Maybe the following
is easier still?

diff --git a/net/core/xdp.c b/net/core/xdp.c
index e43f7d4ef4cf..8240805bfdb7 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -743,9 +743,15 @@ static const struct btf_kfunc_id_set
xdp_metadata_kfunc_set = {
        .set   = &xdp_metadata_kfunc_ids,
 };

+BTF_ID_LIST(xdp_metadata_kfunc_ids_unsorted)
+#define XDP_METADATA_KFUNC(name, str) BTF_ID(func, str)
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+
 u32 xdp_metadata_kfunc_id(int id)
 {
-       return xdp_metadata_kfunc_ids.pairs[id].id;
+       /* xdp_metadata_kfunc_ids is sorted and can't be used */
+       return xdp_metadata_kfunc_ids_unsorted[id];
 }
 EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);



> > -Toke
> >
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index e43f7d4ef4cf..dc0a9644dacc 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -738,6 +738,15 @@ XDP_METADATA_KFUNC_xxx
> >  #undef XDP_METADATA_KFUNC
> >  BTF_SET8_END(xdp_metadata_kfunc_ids)
> >
> > +static struct xdp_metadata_kfunc_map {
> > +       const char *fname;
> > +       u32 btf_id;
> > +} xdp_metadata_kfunc_lookup_map[MAX_XDP_METADATA_KFUNC] = {
> > +#define XDP_METADATA_KFUNC(name, str) { .fname = __stringify(str) },
> > +XDP_METADATA_KFUNC_xxx
> > +#undef XDP_METADATA_KFUNC
> > +};
> > +
> >  static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> >         .owner = THIS_MODULE,
> >         .set   = &xdp_metadata_kfunc_ids,
> > @@ -745,13 +754,41 @@ static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> >
> >  u32 xdp_metadata_kfunc_id(int id)
> >  {
> > -       return xdp_metadata_kfunc_ids.pairs[id].id;
> > +       return xdp_metadata_kfunc_lookup_map[id].btf_id;
> >  }
> >  EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
> >
> >  static int __init xdp_metadata_init(void)
> >  {
> > -       return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
> > +       const struct btf *btf;
> > +       int i, j, ret;
> > +
> > +       ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
> > +       if (ret)
> > +               return ret;
> > +
> > +       btf = bpf_get_btf_vmlinux();
> > +
> > +       for (i = 0; i < MAX_XDP_METADATA_KFUNC; i++) {
> > +               u32 btf_id = xdp_metadata_kfunc_ids.pairs[i].id;
> > +               const struct btf_type *t;
> > +               const char *name;
> > +
> > +               t = btf_type_by_id(btf, btf_id);
> > +               if (WARN_ON_ONCE(!t || !t->name_off))
> > +                       continue;
> > +
> > +               name = btf_name_by_offset(btf, t->name_off);
> > +
> > +               for (j = 0; j < MAX_XDP_METADATA_KFUNC; j++) {
> > +                       if (!strcmp(name, xdp_metadata_kfunc_lookup_map[j].fname)) {
> > +                               xdp_metadata_kfunc_lookup_map[j].btf_id = btf_id;
> > +                               break;
> > +                       }
> > +               }
> > +       }
> > +
> > +       return 0;
> >  }
> >  late_initcall(xdp_metadata_init);
> >  #else /* CONFIG_DEBUG_INFO_BTF */
> >

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-25  0:36                     ` Toke Høiland-Jørgensen
@ 2022-11-28 21:58                       ` Stanislav Fomichev
  2022-11-28 22:11                         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-28 21:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Maciej Fijalkowski, Jakub Kicinski, bpf, ast, daniel, andrii,
	martin.lau, song, yhs, john.fastabend, kpsingh, haoluo, jolsa,
	Tariq Toukan, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Thu, Nov 24, 2022 at 4:36 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
>
> > On Thu, Nov 24, 2022 at 04:17:01PM +0100, Maciej Fijalkowski wrote:
> >> On Thu, Nov 24, 2022 at 03:39:20PM +0100, Toke Høiland-Jørgensen wrote:
> >> > Jakub Kicinski <kuba@kernel.org> writes:
> >> >
> >> > > On Wed, 23 Nov 2022 22:55:21 +0100 Toke Høiland-Jørgensen wrote:
> >> > >> > Good idea, prototyped below, lmk if it that's not what you had in mind.
> >> > >> >
> >> > >> > struct xdp_buff_xsk {
> >> > >> >       struct xdp_buff            xdp;                  /*     0    56 */
> >> > >> >       u8                         cb[16];               /*    56    16 */
> >> > >> >       /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> >> > >>
> >> > >> As pahole helpfully says here, xdp_buff is actually only 8 bytes from
> >> > >> being a full cache line. I thought about adding a 'cb' field like this
> >> > >> to xdp_buff itself, but figured that since there's only room for a
> >> > >> single pointer, why not just add that and let the driver point it to
> >> > >> where it wants to store the extra context data?
> >> > >
> >> > > What if the driver wants to store multiple pointers or an integer or
> >> > > whatever else? The single pointer seems quite arbitrary and not
> >> > > strictly necessary.
> >> >
> >> > Well, then you allocate a separate struct and point to that? Like I did
> >> > in mlx5:
> >> >
> >> >
> >> > +  struct mlx5_xdp_ctx mlctx = { .cqe = cqe, .rq = rq };
> >> > +  struct xdp_buff xdp = { .drv_priv = &mlctx };
> >> >
> >> > but yeah, this does give an extra pointer deref on access. I'm not
> >> > really opposed to the cb field either, I just think it's a bit odd to
> >> > put it in struct xdp_buff_xsk; that basically requires the driver to
> >> > keep the layouts in sync.
> >> >
> >> > Instead, why not put a cb field into xdp_buff itself so it can be used
> >> > for both the XSK and the non-XSK paths? Then the driver can just
> >> > typecast the xdp_buff into its own struct that has whatever data it
> >> > wants in place of the cb field?

Agreed, maybe having an explicit cb field in the xdp_buff is a nice
compromise (assuming, over time, most devices will use it).

> >> Why can't you simply have a pointer to xdp_buff in driver specific
> >> xdp_buff container which would point to xdp_buff that is stack based (or
> >> whatever else memory that will back it up - I am about to push a change
> >> that makes ice driver embed xdp_buff within a struct that represents Rx
> >> ring) for XDP path and for ZC the pointer to xdp_buff that you get from
> >> xsk_buff_pool ? This would satisfy both sides I believe and would let us
> >> keep the same container struct.
> >>
> >> struct mlx4_xdp_buff {
> >>      struct xdp_buff *xdp;
> >>      struct mlx4_cqe *cqe;
> >>      struct mlx4_en_dev *mdev;
> >>      struct mlx4_en_rx_ring *ring;
> >>      struct net_device *dev;
> >> };
> >
> > Nah this won't work from kfunc POV, probably no way to retrieve the
> > mlx4_xdp_buff based on xdp_buff ptr that needs to be used as an arg.
> >
> > Sorry I'll think more about it, in the meantime let's hear more voices
> > whether we should keep Stan's original approach + modify xdp_buff_xsk or
> > go with Toke's proposal.
>
> OK, so I played around with the mlx5 code a bit more, and I think the
> "wrapping struct + cb area" can be made to work without too many ugly
> casts; I'll send an updated version of the mlx5 patches with this
> incorporated tomorrow, after I've run some tests...

I'll probably send a v3 sometime tomorrow (PST), so maybe wait for me
to make sure we are working on the same base?
Or LMK if you prefer to do it differently..

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-28 18:53     ` Stanislav Fomichev
  2022-11-28 19:21       ` Stanislav Fomichev
@ 2022-11-28 22:10       ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-28 22:10 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> On Fri, Nov 25, 2022 at 9:53 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > There is an ndo handler per kfunc, the verifier replaces a call to the
>> > generic kfunc with a call to the per-device one.
>> >
>> > For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
>> > implements all possible metadata kfuncs. Not all devices have to
>> > implement them. If a kfunc is not supported by the target device,
>> > the default implementation is called instead.
>>
>> BTW, this "the default implementation is called instead" bit is not
>> included in this version... :)
>
> fixup_xdp_kfunc_call should return 0 when the device doesn't have a
> kfunc defined, and the call should then fall back to the default kfunc
> implementation, right?
> Or am I missing something?

Ohh, right. Maybe add a comment stating this (as I obviously missed it :))

-Toke


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-11-28 21:58                       ` Stanislav Fomichev
@ 2022-11-28 22:11                         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-28 22:11 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Maciej Fijalkowski, Jakub Kicinski, bpf, ast, daniel, andrii,
	martin.lau, song, yhs, john.fastabend, kpsingh, haoluo, jolsa,
	Tariq Toukan, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

Stanislav Fomichev <sdf@google.com> writes:

>> >> Why can't you simply have a pointer to xdp_buff in driver specific
>> >> xdp_buff container which would point to xdp_buff that is stack based (or
>> >> whatever else memory that will back it up - I am about to push a change
>> >> that makes ice driver embed xdp_buff within a struct that represents Rx
>> >> ring) for XDP path and for ZC the pointer to xdp_buff that you get from
>> >> xsk_buff_pool ? This would satisfy both sides I believe and would let us
>> >> keep the same container struct.
>> >>
>> >> struct mlx4_xdp_buff {
>> >>      struct xdp_buff *xdp;
>> >>      struct mlx4_cqe *cqe;
>> >>      struct mlx4_en_dev *mdev;
>> >>      struct mlx4_en_rx_ring *ring;
>> >>      struct net_device *dev;
>> >> };
>> >
>> > Nah this won't work from kfunc POV, probably no way to retrieve the
>> > mlx4_xdp_buff based on xdp_buff ptr that needs to be used as an arg.
>> >
>> > Sorry I'll think more about it, in the meantime let's hear more voices
>> > whether we should keep Stan's original approach + modify xdp_buff_xsk or
>> > go with Toke's proposal.
>>
>> OK, so I played around with the mlx5 code a bit more, and I think the
>> "wrapping struct + cb area" can be made to work without too many ugly
>> casts; I'll send an updated version of the mlx5 patches with this
>> incorporated tomorrow, after I've run some tests...
>
> I'll probably send a v3 sometime tomorrow (PST), so maybe wait for me
> to make sure we are working on the same base?
> Or LMK if you prefer to do it differently..

OK, I'll send you my mlx5 patches off-list so you can just incorporate
those. Got stuck on some annoying build issues for the perf testing, so
will defer that until your next version, then :)

-Toke


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-28 19:21       ` Stanislav Fomichev
@ 2022-11-28 22:25         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-28 22:25 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> On Mon, Nov 28, 2022 at 10:53 AM Stanislav Fomichev <sdf@google.com> wrote:
>>
>> On Fri, Nov 25, 2022 at 9:53 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >
>> > Stanislav Fomichev <sdf@google.com> writes:
>> >
>> > > There is an ndo handler per kfunc, the verifier replaces a call to the
>> > > generic kfunc with a call to the per-device one.
>> > >
>> > > For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
>> > > implements all possible metadata kfuncs. Not all devices have to
>> > > implement them. If a kfunc is not supported by the target device,
>> > > the default implementation is called instead.
>> >
>> > BTW, this "the default implementation is called instead" bit is not
>> > included in this version... :)
>>
>> fixup_xdp_kfunc_call should return 0 when the device doesn't have a
>> kfunc defined, and the call should then fall back to the default kfunc
>> implementation, right?
>> Or am I missing something?
>>
>> > [...]
>> >
>> > > +#ifdef CONFIG_DEBUG_INFO_BTF
>> > > +BTF_SET8_START(xdp_metadata_kfunc_ids)
>> > > +#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
>> > > +XDP_METADATA_KFUNC_xxx
>> > > +#undef XDP_METADATA_KFUNC
>> > > +BTF_SET8_END(xdp_metadata_kfunc_ids)
>> > > +
>> > > +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
>> > > +     .owner = THIS_MODULE,
>> > > +     .set   = &xdp_metadata_kfunc_ids,
>> > > +};
>> > > +
>> > > +u32 xdp_metadata_kfunc_id(int id)
>> > > +{
>> > > +     return xdp_metadata_kfunc_ids.pairs[id].id;
>> > > +}
>> > > +EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
>> >
>> > So I was getting some really weird values when testing (always getting a
>> > timestamp value of '1'), and it turns out to be because this way of
>> > looking up the ID doesn't work: The set is always sorted by the BTF ID,
>> > not the order it was defined. Which meant that the mapping code got the
>> > functions mixed up, and would call a different one instead (so the
>> > timestamp value I was getting was really the return value of
>> > rx_hash_enabled()).
>> >
>> > I fixed it by building a secondary lookup table as below; feel free to
>> > incorporate that (or if you can come up with a better way, go ahead!).
>>
>> Interesting, will take a closer look. I took this pattern from
>> BTF_SOCK_TYPE_xxx, which means that 'sorting by btf-id' is something
>> BTF_SET8_START specific...
>> But if it's sorted, probably easier to do a bsearch over this table
>> than to build another one?
>
> Ah, I see, there is no place to store an index :-( Maybe the following
> is easier still?
>
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index e43f7d4ef4cf..8240805bfdb7 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -743,9 +743,15 @@ static const struct btf_kfunc_id_set
> xdp_metadata_kfunc_set = {
>         .set   = &xdp_metadata_kfunc_ids,
>  };
>
> +BTF_ID_LIST(xdp_metadata_kfunc_ids_unsorted)
> +#define XDP_METADATA_KFUNC(name, str) BTF_ID(func, str)
> +XDP_METADATA_KFUNC_xxx
> +#undef XDP_METADATA_KFUNC
> +
>  u32 xdp_metadata_kfunc_id(int id)
>  {
> -       return xdp_metadata_kfunc_ids.pairs[id].id;
> +       /* xdp_metadata_kfunc_ids is sorted by BTF ID, so it can't be indexed by enum */
> +       return xdp_metadata_kfunc_ids_unsorted[id];
>  }
>  EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);

Right, as long as having that extra list isn't problematic (does it make
things show up twice somewhere or something like that? not really sure
how that works), that is certainly simpler than what I came up with :)

-Toke


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 5/8] selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  2022-11-21 18:25 ` [PATCH bpf-next v2 5/8] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
@ 2022-11-29 10:06   ` Anton Protopopov
  2022-11-29 18:52     ` Stanislav Fomichev
  0 siblings, 1 reply; 50+ messages in thread
From: Anton Protopopov @ 2022-11-29 10:06 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On 22/11/21 10:25, Stanislav Fomichev wrote:
>
> [...]
>
> +
> +	if (bpf_xdp_metadata_rx_timestamp_supported(ctx))
> +		meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
> +
> +	if (bpf_xdp_metadata_rx_hash_supported(ctx))
> +		meta->rx_hash = bpf_xdp_metadata_rx_hash(ctx);

Is there a case when F_supported and F are not called in a sequence? If not,
then you can join them:

	bool (*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);

so that a calling XDP program does one indirect call instead of two per
field:

	if (bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp)) {
		/* ... couldn't get the timestamp */
	}

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 5/8] selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  2022-11-29 10:06   ` Anton Protopopov
@ 2022-11-29 18:52     ` Stanislav Fomichev
  0 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-29 18:52 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Tue, Nov 29, 2022 at 2:06 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
> On 22/11/21 10:25, Stanislav Fomichev wrote:
> >
> > [...]
> >
> > +
> > +     if (bpf_xdp_metadata_rx_timestamp_supported(ctx))
> > +             meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
> > +
> > +     if (bpf_xdp_metadata_rx_hash_supported(ctx))
> > +             meta->rx_hash = bpf_xdp_metadata_rx_hash(ctx);
>
> Is there a case when F_supported and F are not called in a sequence? If not,
> then you can join them:
>
>         bool (*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
>
> so that a calling XDP program does one indirect call instead of two for one
> field
>
>         if (bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp)) {
>                 /* ... couldn't get the timestamp */
>         }

The purpose of the original bpf_xdp_metadata_rx_hash_supported was to
allow unrolling and let the verifier drop the resulting dead branches.
Since there is still a chance we might eventually unroll some of
these, maybe it makes sense to keep it as is?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-21 18:25 ` [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs Stanislav Fomichev
                     ` (2 preceding siblings ...)
  2022-11-25 17:53   ` Toke Høiland-Jørgensen
@ 2022-11-30 17:24   ` Larysa Zaremba
  2022-11-30 19:06     ` Stanislav Fomichev
  3 siblings, 1 reply; 50+ messages in thread
From: Larysa Zaremba @ 2022-11-30 17:24 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Mon, Nov 21, 2022 at 10:25:46AM -0800, Stanislav Fomichev wrote:

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9528a066cfa5..315876fa9d30 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15171,6 +15171,25 @@ static int fixup_call_args(struct bpf_verifier_env *env)
>  	return err;
>  }
>  
> +static int fixup_xdp_kfunc_call(struct bpf_verifier_env *env, u32 func_id)
> +{
> +	struct bpf_prog_aux *aux = env->prog->aux;
> +	void *resolved = NULL;

First I would like to say I really like the kfunc hints implementation.

I am currently trying to test possible performance benefits of the unrolled
version in the ice driver. I was working on top of the RFC v2,
when I noticed a problem that also persists in this newer version.

For debugging purposes, I have put the following logs at this point in the code.

printk(KERN_ERR "func_id=%u\n", func_id);
printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=%u\n",
       xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED));
printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP=%u\n",
       xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP));
printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=%u\n",
       xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED));
printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH=%u\n",
       xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH));

Loading a program that uses bpf_xdp_metadata_rx_timestamp_supported()
and bpf_xdp_metadata_rx_timestamp() resulted in the following messages:

[  412.611888] func_id=108131
[  412.611891] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
[  412.611892] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
[  412.611892] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
[  412.611893] XDP_METADATA_KFUNC_RX_HASH=108131
[  412.611894] func_id=108130
[  412.611894] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
[  412.611895] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
[  412.611895] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
[  412.611895] XDP_METADATA_KFUNC_RX_HASH=108131

As you can see, I've got IDs 108131 and 108130 in the program,
while 108126 and 108128 would be expected.
It's hard to proceed with the implementation when IDs cannot be reliably
compared.

Furthermore, the dumped vmlinux BTF shows the IDs in exactly the reverse
order:

[108126] FUNC 'bpf_xdp_metadata_rx_hash' type_id=108125 linkage=static
[108128] FUNC 'bpf_xdp_metadata_rx_hash_supported' type_id=108127 linkage=static
[108130] FUNC 'bpf_xdp_metadata_rx_timestamp' type_id=108129 linkage=static
[108131] FUNC 'bpf_xdp_metadata_rx_timestamp_supported' type_id=108127 linkage=static

> +
> +	if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
> +		resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp_supported;
> +	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> +		resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp;
> +	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
> +		resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash_supported;
> +	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> +		resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash;
> +
> +	if (resolved)
> +		return BPF_CALL_IMM(resolved);
> +	return 0;
> +}
> +

My working tree (based on this version) is available on GitHub [0]. The situation
is also described in the last commit message.
It would be great if you could check whether this behaviour can be reproduced
on your setup.

[0] https://github.com/walking-machine/linux/tree/hints-v2

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-30 17:24   ` Larysa Zaremba
@ 2022-11-30 19:06     ` Stanislav Fomichev
  2022-11-30 20:17       ` Stanislav Fomichev
  0 siblings, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-30 19:06 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Wed, Nov 30, 2022 at 9:38 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote:
>
> On Mon, Nov 21, 2022 at 10:25:46AM -0800, Stanislav Fomichev wrote:
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 9528a066cfa5..315876fa9d30 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -15171,6 +15171,25 @@ static int fixup_call_args(struct bpf_verifier_env *env)
> >       return err;
> >  }
> >
> > +static int fixup_xdp_kfunc_call(struct bpf_verifier_env *env, u32 func_id)
> > +{
> > +     struct bpf_prog_aux *aux = env->prog->aux;
> > +     void *resolved = NULL;
>
> First I would like to say I really like the kfunc hints implementation.
>
> I am currently trying to test possible performance benefits of the unrolled
> version in the ice driver. I was working on top of the RFC v2,
> when I noticed a problem that also persists in this newer version.
>
> For debugging purposes, I have put the following logs in this place in code.
>
> printk(KERN_ERR "func_id=%u\n", func_id);
> printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=%u\n",
>        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED));
> printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP=%u\n",
>        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP));
> printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=%u\n",
>        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED));
> printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH=%u\n",
>        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH));
>
> Loading the program, which uses bpf_xdp_metadata_rx_timestamp_supported()
> and bpf_xdp_metadata_rx_timestamp(), has resulted in such messages:
>
> [  412.611888] func_id=108131
> [  412.611891] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
> [  412.611892] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
> [  412.611892] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
> [  412.611893] XDP_METADATA_KFUNC_RX_HASH=108131
> [  412.611894] func_id=108130
> [  412.611894] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
> [  412.611895] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
> [  412.611895] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
> [  412.611895] XDP_METADATA_KFUNC_RX_HASH=108131
>
> As you can see, I've got IDs 108131 and 108130 in the program,
> while 108126 and 108128 would be expected.
> It's hard to proceed with the implementation when IDs cannot be reliably
> compared.

Thanks for the report!
Toke has reported a similar issue in [0], have you tried his patch?
I've also tried to address it in v3 [1], could you retry on top of it?
I'll try to insert your printk in my local build to see what happens
with btf ids on my side. Will get back to you..

0: https://lore.kernel.org/bpf/87mt8e2a69.fsf@toke.dk/
1: https://lore.kernel.org/bpf/20221129193452.3448944-3-sdf@google.com/T/#u

> Furthermore, the dumped vmlinux BTF shows the IDs in exactly the reverse
> order:
>
> [108126] FUNC 'bpf_xdp_metadata_rx_hash' type_id=108125 linkage=static
> [108128] FUNC 'bpf_xdp_metadata_rx_hash_supported' type_id=108127 linkage=static
> [108130] FUNC 'bpf_xdp_metadata_rx_timestamp' type_id=108129 linkage=static
> [108131] FUNC 'bpf_xdp_metadata_rx_timestamp_supported' type_id=108127 linkage=static
>
> > +
> > +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
> > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp_supported;
> > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp;
> > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
> > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash_supported;
> > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash;
> > +
> > +     if (resolved)
> > +             return BPF_CALL_IMM(resolved);
> > +     return 0;
> > +}
> > +
>
> My working tree (based on this version) is available on github [0]. Situation
> is also described in the last commit message.
> > It would be great if you could check whether this behaviour can be reproduced
> on your setup.
>
> [0] https://github.com/walking-machine/linux/tree/hints-v2

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-30 19:06     ` Stanislav Fomichev
@ 2022-11-30 20:17       ` Stanislav Fomichev
  2022-12-01 13:52         ` Larysa Zaremba
  0 siblings, 1 reply; 50+ messages in thread
From: Stanislav Fomichev @ 2022-11-30 20:17 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Wed, Nov 30, 2022 at 11:06 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Wed, Nov 30, 2022 at 9:38 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote:
> >
> > On Mon, Nov 21, 2022 at 10:25:46AM -0800, Stanislav Fomichev wrote:
> >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 9528a066cfa5..315876fa9d30 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -15171,6 +15171,25 @@ static int fixup_call_args(struct bpf_verifier_env *env)
> > >       return err;
> > >  }
> > >
> > > +static int fixup_xdp_kfunc_call(struct bpf_verifier_env *env, u32 func_id)
> > > +{
> > > +     struct bpf_prog_aux *aux = env->prog->aux;
> > > +     void *resolved = NULL;
> >
> > First I would like to say I really like the kfunc hints implementation.
> >
> > I am currently trying to test possible performance benefits of the unrolled
> > version in the ice driver. I was working on top of the RFC v2,
> > when I noticed a problem that also persists in this newer version.
> >
> > For debugging purposes, I have put the following logs in this place in code.
> >
> > printk(KERN_ERR "func_id=%u\n", func_id);
> > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=%u\n",
> >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED));
> > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP=%u\n",
> >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP));
> > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=%u\n",
> >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED));
> > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH=%u\n",
> >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH));
> >
> > Loading the program, which uses bpf_xdp_metadata_rx_timestamp_supported()
> > and bpf_xdp_metadata_rx_timestamp(), has resulted in such messages:
> >
> > [  412.611888] func_id=108131
> > [  412.611891] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
> > [  412.611892] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
> > [  412.611892] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
> > [  412.611893] XDP_METADATA_KFUNC_RX_HASH=108131
> > [  412.611894] func_id=108130
> > [  412.611894] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
> > [  412.611895] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
> > [  412.611895] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
> > [  412.611895] XDP_METADATA_KFUNC_RX_HASH=108131
> >
> > As you can see, I've got IDs 108131 and 108130 in the program,
> > while 108126 and 108128 would be expected.
> > It's hard to proceed with the implementation when IDs cannot be reliably
> > compared.
>
> Thanks for the report!
> Toke has reported a similar issue in [0], have you tried his patch?
> I've also tried to address it in v3 [1], could you retry on top of it?
> I'll try to insert your printk in my local build to see what happens
> with btf ids on my side. Will get back to you..
>
> 0: https://lore.kernel.org/bpf/87mt8e2a69.fsf@toke.dk/
> 1: https://lore.kernel.org/bpf/20221129193452.3448944-3-sdf@google.com/T/#u

Nope, even if I go back to v2, I still can't reproduce locally.
Somehow in my setup they are sorted properly :-/
Would appreciate it if you can test the v3 patch and confirm whether
it's fixed on your side or not.

> > Furthermore, the dumped vmlinux BTF shows the IDs in exactly the reverse
> > order:
> >
> > [108126] FUNC 'bpf_xdp_metadata_rx_hash' type_id=108125 linkage=static
> > [108128] FUNC 'bpf_xdp_metadata_rx_hash_supported' type_id=108127 linkage=static
> > [108130] FUNC 'bpf_xdp_metadata_rx_timestamp' type_id=108129 linkage=static
> > [108131] FUNC 'bpf_xdp_metadata_rx_timestamp_supported' type_id=108127 linkage=static
> >
> > > +
> > > +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
> > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp_supported;
> > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp;
> > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
> > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash_supported;
> > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash;
> > > +
> > > +     if (resolved)
> > > +             return BPF_CALL_IMM(resolved);
> > > +     return 0;
> > > +}
> > > +
> >
> > My working tree (based on this version) is available on github [0]. Situation
> > is also described in the last commit message.
> > It would be great if you could check whether this behaviour can be reproduced
> > on your setup.
> >
> > [0] https://github.com/walking-machine/linux/tree/hints-v2

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-11-30 20:17       ` Stanislav Fomichev
@ 2022-12-01 13:52         ` Larysa Zaremba
  2022-12-01 17:14           ` Stanislav Fomichev
  0 siblings, 1 reply; 50+ messages in thread
From: Larysa Zaremba @ 2022-12-01 13:52 UTC (permalink / raw)
  To: Stanislav Fomichev, toke
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Wed, Nov 30, 2022 at 12:17:39PM -0800, Stanislav Fomichev wrote:
> On Wed, Nov 30, 2022 at 11:06 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Wed, Nov 30, 2022 at 9:38 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote:
> > >
> > > On Mon, Nov 21, 2022 at 10:25:46AM -0800, Stanislav Fomichev wrote:
> > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 9528a066cfa5..315876fa9d30 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -15171,6 +15171,25 @@ static int fixup_call_args(struct bpf_verifier_env *env)
> > > >       return err;
> > > >  }
> > > >
> > > > +static int fixup_xdp_kfunc_call(struct bpf_verifier_env *env, u32 func_id)
> > > > +{
> > > > +     struct bpf_prog_aux *aux = env->prog->aux;
> > > > +     void *resolved = NULL;
> > >
> > > First I would like to say I really like the kfunc hints implementation.
> > >
> > > I am currently trying to test possible performance benefits of the unrolled
> > > version in the ice driver. I was working on top of the RFC v2,
> > > when I noticed a problem that also persists in this newer version.
> > >
> > > For debugging purposes, I have put the following logs in this place in code.
> > >
> > > printk(KERN_ERR "func_id=%u\n", func_id);
> > > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=%u\n",
> > >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED));
> > > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP=%u\n",
> > >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP));
> > > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=%u\n",
> > >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED));
> > > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH=%u\n",
> > >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH));
> > >
> > > Loading the program, which uses bpf_xdp_metadata_rx_timestamp_supported()
> > > and bpf_xdp_metadata_rx_timestamp(), has resulted in such messages:
> > >
> > > [  412.611888] func_id=108131
> > > [  412.611891] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
> > > [  412.611892] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
> > > [  412.611892] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
> > > [  412.611893] XDP_METADATA_KFUNC_RX_HASH=108131
> > > [  412.611894] func_id=108130
> > > [  412.611894] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
> > > [  412.611895] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
> > > [  412.611895] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
> > > [  412.611895] XDP_METADATA_KFUNC_RX_HASH=108131
> > >
> > > As you can see, I've got IDs 108131 and 108130 in the program,
> > > while 108126 and 108128 would be more reasonable.
> > > It's hard to proceed with the implementation when the IDs cannot be
> > > reliably compared.
> >
> > Thanks for the report!
> > Toke has reported a similar issue in [0], have you tried his patch?
> > I've also tried to address it in v3 [1], could you retry on top of it?
> > I'll try to insert your printk in my local build to see what happens
> > with btf ids on my side. Will get back to you..
> >
> > 0: https://lore.kernel.org/bpf/87mt8e2a69.fsf@toke.dk/
> > 1: https://lore.kernel.org/bpf/20221129193452.3448944-3-sdf@google.com/T/#u
> 
> Nope, even if I go back to v2, I still can't reproduce locally.
> Somehow in my setup they are sorted properly :-/
> Would appreciate it if you could test the v3 patch and confirm whether
> it's fixed on your side or not.
>

I've tested v3 and it looks like the issue was resolved.
Thanks a lot!
 
> > > Furthermore, the dumped vmlinux BTF shows the IDs are in exactly the
> > > reverse order:
> > >
> > > [108126] FUNC 'bpf_xdp_metadata_rx_hash' type_id=108125 linkage=static
> > > [108128] FUNC 'bpf_xdp_metadata_rx_hash_supported' type_id=108127 linkage=static
> > > [108130] FUNC 'bpf_xdp_metadata_rx_timestamp' type_id=108129 linkage=static
> > > [108131] FUNC 'bpf_xdp_metadata_rx_timestamp_supported' type_id=108127 linkage=static
> > >
> > > > +
> > > > +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
> > > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp_supported;
> > > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> > > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp;
> > > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
> > > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash_supported;
> > > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> > > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash;
> > > > +
> > > > +     if (resolved)
> > > > +             return BPF_CALL_IMM(resolved);
> > > > +     return 0;
> > > > +}
> > > > +
> > >
> > > My working tree (based on this version) is available on github [0]. The situation
> > > is also described in the last commit message.
> > > It would be great if you could check whether this behaviour can be reproduced
> > > on your setup.
> > >
> > > [0] https://github.com/walking-machine/linux/tree/hints-v2

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs
  2022-12-01 13:52         ` Larysa Zaremba
@ 2022-12-01 17:14           ` Stanislav Fomichev
  0 siblings, 0 replies; 50+ messages in thread
From: Stanislav Fomichev @ 2022-12-01 17:14 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: toke, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Thu, Dec 1, 2022 at 6:08 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote:
>
> On Wed, Nov 30, 2022 at 12:17:39PM -0800, Stanislav Fomichev wrote:
> > On Wed, Nov 30, 2022 at 11:06 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > On Wed, Nov 30, 2022 at 9:38 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote:
> > > >
> > > > On Mon, Nov 21, 2022 at 10:25:46AM -0800, Stanislav Fomichev wrote:
> > > >
> > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > index 9528a066cfa5..315876fa9d30 100644
> > > > > --- a/kernel/bpf/verifier.c
> > > > > +++ b/kernel/bpf/verifier.c
> > > > > @@ -15171,6 +15171,25 @@ static int fixup_call_args(struct bpf_verifier_env *env)
> > > > >       return err;
> > > > >  }
> > > > >
> > > > > +static int fixup_xdp_kfunc_call(struct bpf_verifier_env *env, u32 func_id)
> > > > > +{
> > > > > +     struct bpf_prog_aux *aux = env->prog->aux;
> > > > > +     void *resolved = NULL;
> > > >
> > > > First I would like to say I really like the kfunc hints implementation.
> > > >
> > > > I am currently trying to test possible performance benefits of the unrolled
> > > > version in the ice driver. I was working on top of the RFC v2,
> > > > when I noticed a problem that also persists in this newer version.
> > > >
> > > > For debugging purposes, I have put the following logs at this place in the code.
> > > >
> > > > printk(KERN_ERR "func_id=%u\n", func_id);
> > > > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=%u\n",
> > > >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED));
> > > > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_TIMESTAMP=%u\n",
> > > >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP));
> > > > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=%u\n",
> > > >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED));
> > > > printk(KERN_ERR "XDP_METADATA_KFUNC_RX_HASH=%u\n",
> > > >        xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH));
> > > >
> > > > Loading the program, which uses bpf_xdp_metadata_rx_timestamp_supported()
> > > > and bpf_xdp_metadata_rx_timestamp(), has resulted in such messages:
> > > >
> > > > [  412.611888] func_id=108131
> > > > [  412.611891] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
> > > > [  412.611892] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
> > > > [  412.611892] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
> > > > [  412.611893] XDP_METADATA_KFUNC_RX_HASH=108131
> > > > [  412.611894] func_id=108130
> > > > [  412.611894] XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED=108126
> > > > [  412.611895] XDP_METADATA_KFUNC_RX_TIMESTAMP=108128
> > > > [  412.611895] XDP_METADATA_KFUNC_RX_HASH_SUPPORTED=108130
> > > > [  412.611895] XDP_METADATA_KFUNC_RX_HASH=108131
> > > >
> > > > As you can see, I've got IDs 108131 and 108130 in the program,
> > > > while 108126 and 108128 would be more reasonable.
> > > > It's hard to proceed with the implementation when the IDs cannot be
> > > > reliably compared.
> > >
> > > Thanks for the report!
> > > Toke has reported a similar issue in [0], have you tried his patch?
> > > I've also tried to address it in v3 [1], could you retry on top of it?
> > > I'll try to insert your printk in my local build to see what happens
> > > with btf ids on my side. Will get back to you..
> > >
> > > 0: https://lore.kernel.org/bpf/87mt8e2a69.fsf@toke.dk/
> > > 1: https://lore.kernel.org/bpf/20221129193452.3448944-3-sdf@google.com/T/#u
> >
> > Nope, even if I go back to v2, I still can't reproduce locally.
> > Somehow in my setup they are sorted properly :-/
> > Would appreciate it if you could test the v3 patch and confirm whether
> > it's fixed on your side or not.
> >
>
> I've tested v3 and it looks like the issue was resolved.
> Thanks a lot!

Great, thank you for verifying!

> > > > Furthermore, the dumped vmlinux BTF shows the IDs are in exactly the
> > > > reverse order:
> > > >
> > > > [108126] FUNC 'bpf_xdp_metadata_rx_hash' type_id=108125 linkage=static
> > > > [108128] FUNC 'bpf_xdp_metadata_rx_hash_supported' type_id=108127 linkage=static
> > > > [108130] FUNC 'bpf_xdp_metadata_rx_timestamp' type_id=108129 linkage=static
> > > > [108131] FUNC 'bpf_xdp_metadata_rx_timestamp_supported' type_id=108127 linkage=static
> > > >
> > > > > +
> > > > > +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
> > > > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp_supported;
> > > > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> > > > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_timestamp;
> > > > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
> > > > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash_supported;
> > > > > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> > > > > +             resolved = aux->xdp_netdev->netdev_ops->ndo_xdp_rx_hash;
> > > > > +
> > > > > +     if (resolved)
> > > > > +             return BPF_CALL_IMM(resolved);
> > > > > +     return 0;
> > > > > +}
> > > > > +
> > > >
> > > > My working tree (based on this version) is available on github [0]. The situation
> > > > is also described in the last commit message.
> > > > It would be great if you could check whether this behaviour can be reproduced
> > > > on your setup.
> > > >
> > > > [0] https://github.com/walking-machine/linux/tree/hints-v2

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-12-01 17:14 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-21 18:25 [PATCH bpf-next v2 0/8] xdp: hints via kfuncs Stanislav Fomichev
2022-11-21 18:25 ` [PATCH bpf-next v2 1/8] bpf: Document XDP RX metadata Stanislav Fomichev
2022-11-21 18:25 ` [PATCH bpf-next v2 2/8] bpf: XDP metadata RX kfuncs Stanislav Fomichev
2022-11-23  6:34   ` Martin KaFai Lau
2022-11-23 18:43     ` Stanislav Fomichev
2022-11-23 14:24   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-23 18:43     ` Stanislav Fomichev
2022-11-25 17:53   ` Toke Høiland-Jørgensen
2022-11-28 18:53     ` Stanislav Fomichev
2022-11-28 19:21       ` Stanislav Fomichev
2022-11-28 22:25         ` Toke Høiland-Jørgensen
2022-11-28 22:10       ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-30 17:24   ` Larysa Zaremba
2022-11-30 19:06     ` Stanislav Fomichev
2022-11-30 20:17       ` Stanislav Fomichev
2022-12-01 13:52         ` Larysa Zaremba
2022-12-01 17:14           ` Stanislav Fomichev
2022-11-21 18:25 ` [PATCH bpf-next v2 3/8] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-21 18:25 ` [PATCH bpf-next v2 4/8] veth: Support RX XDP metadata Stanislav Fomichev
2022-11-21 18:25 ` [PATCH bpf-next v2 5/8] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-11-29 10:06   ` Anton Protopopov
2022-11-29 18:52     ` Stanislav Fomichev
2022-11-21 18:25 ` [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-22 13:49   ` Tariq Toukan
2022-11-22 18:08     ` Stanislav Fomichev
2022-11-23 14:33   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-23 18:26     ` Stanislav Fomichev
2022-11-23 19:14       ` Jakub Kicinski
2022-11-23 19:52         ` sdf
2022-11-23 21:54           ` Maciej Fijalkowski
2022-11-23 21:55           ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-24  1:47             ` Jakub Kicinski
2022-11-24 14:39               ` Toke Høiland-Jørgensen
2022-11-24 15:17                 ` Maciej Fijalkowski
2022-11-24 16:11                   ` Maciej Fijalkowski
2022-11-25  0:36                     ` Toke Høiland-Jørgensen
2022-11-28 21:58                       ` Stanislav Fomichev
2022-11-28 22:11                         ` Toke Høiland-Jørgensen
2022-11-21 18:25 ` [PATCH bpf-next v2 7/8] mxl4: Support RX XDP metadata Stanislav Fomichev
2022-11-22 13:50   ` Tariq Toukan
2022-11-22 18:08     ` Stanislav Fomichev
2022-11-21 18:25 ` [PATCH bpf-next v2 8/8] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
2022-11-23 14:26   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-23 18:29     ` Stanislav Fomichev
2022-11-23 19:17       ` Jakub Kicinski
2022-11-23 19:54         ` Stanislav Fomichev
2022-11-23 14:46 ` [PATCH bpf-next 1/2] xdp: Add drv_priv pointer to struct xdp_buff Toke Høiland-Jørgensen
2022-11-23 14:46   ` [PATCH bpf-next 2/2] mlx5: Support XDP RX metadata Toke Høiland-Jørgensen
2022-11-23 22:29     ` [xdp-hints] " Saeed Mahameed
2022-11-23 22:44       ` Stanislav Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).