All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v4 00/15] xdp: hints via kfuncs
@ 2022-12-13  2:35 Stanislav Fomichev
  2022-12-13  2:35 ` [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata Stanislav Fomichev
                   ` (14 more replies)
  0 siblings, 15 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Please see the first patch in the series for the overall
design and use-cases.

See the following email from Toke for the per-packet metadata overhead:
https://lore.kernel.org/bpf/20221206024554.3826186-1-sdf@google.com/T/#m49d48ea08d525ec88360c7d14c4d34fb0e45e798

Recent changes:

- Drop _supported kfuncs, return status from the existing ones,
  return the actual payload via arguments (Jakub)

- Use 'device-bound' instead of 'offloaded' in existing error message (Jakub)

- Move offload init into late_initcall (Jakub)

- Separate xdp_metadata_ops to host netdev kfunc pointers (Jakub)

- Remove forward declarations (Jakub)

- Rename more offload routines to dev_bound (Jakub)
  bpf_offload_resolve_kfunc -> bpf_dev_bound_resolve_kfunc
  bpf_offload_bound_netdev_unregister -> bpf_dev_bound_netdev_unregister
  bpf_prog_offload_init -> bpf_prog_dev_bound_init
  bpf_prog_offload_destroy -> bpf_prog_dev_bound_destroy
  maybe_remove_bound_netdev -> bpf_dev_bound_try_remove_netdev

- Move bpf_prog_is_dev_bound check into bpf_prog_map_compatible (Toke)

- Prohibit metadata kfuncs unless device-bound (Toke)

- Adjust selftest to exercise freplace + include the path (Toke)

- Take rtnl in bpf_prog_offload_destroy to avoid the race (Martin)

- BPF_F_XDP_HAS_METADATA -> BPF_F_XDP_DEV_BOUND_ONLY (Martin)

- Prohibit only metadata kfuncs, not all (Alexei/Martin)

- Try to fix xdp_hw_metadata.c build issue on CI (Alexei)

  Wasn't able to reproduce it locally, so trying my best guess...

- mlx4 -> mlx4_en (Tariq)

  Plus other issues like using net/mlx4 prefix and using mlx4_en_xdp_buff
  instead of mlx4_xdp_buff. Applied those same patterns to mlx5.

- Separate device-bound changes into separate patch to make it easier to
  review

Prior art (to record pros/cons for different approaches):

- Stable UAPI approach:
  https://lore.kernel.org/bpf/20220628194812.1453059-1-alexandr.lobakin@intel.com/
- Metadata+BTF_ID appoach:
  https://lore.kernel.org/bpf/166256538687.1434226.15760041133601409770.stgit@firesoul/
- v3:
  https://lore.kernel.org/bpf/20221206024554.3826186-1-sdf@google.com/
- v2:
  https://lore.kernel.org/bpf/20221121182552.2152891-1-sdf@google.com/
- v1:
  https://lore.kernel.org/bpf/20221115030210.3159213-1-sdf@google.com/
- kfuncs v2 RFC:
  https://lore.kernel.org/bpf/20221027200019.4106375-1-sdf@google.com/
- kfuncs v1 RFC:
  https://lore.kernel.org/bpf/20221104032532.1615099-1-sdf@google.com/

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org

Stanislav Fomichev (11):
  bpf: Document XDP RX metadata
  bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
  bpf: Introduce device-bound XDP programs
  selftests/bpf: Update expected test_offload.py messages
  bpf: XDP metadata RX kfuncs
  veth: Introduce veth_xdp_buff wrapper for xdp_buff
  veth: Support RX XDP metadata
  selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  net/mlx4_en: Introduce wrapper for xdp_buff
  net/mlx4_en: Support RX XDP metadata
  selftests/bpf: Simple program to dump XDP RX metadata

Toke Høiland-Jørgensen (4):
  bpf: Support consuming XDP HW metadata from fext programs
  xsk: Add cb area to struct xdp_buff_xsk
  net/mlx5e: Introduce wrapper for xdp_buff
  net/mlx5e: Support RX XDP metadata

 Documentation/bpf/xdp-rx-metadata.rst         |  90 ++++
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |  13 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |   6 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  63 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  26 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |  11 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  35 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   6 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  99 +++--
 drivers/net/netdevsim/bpf.c                   |   4 -
 drivers/net/veth.c                            |  80 ++--
 include/linux/bpf.h                           |  39 +-
 include/linux/netdevice.h                     |   7 +
 include/net/xdp.h                             |  25 ++
 include/net/xsk_buff_pool.h                   |   5 +
 include/uapi/linux/bpf.h                      |   5 +
 kernel/bpf/core.c                             |  11 +-
 kernel/bpf/offload.c                          | 360 +++++++++------
 kernel/bpf/syscall.c                          |  35 +-
 kernel/bpf/verifier.c                         |  56 ++-
 net/core/dev.c                                |   9 +-
 net/core/filter.c                             |   2 +-
 net/core/xdp.c                                |  44 ++
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   8 +-
 .../selftests/bpf/prog_tests/xdp_metadata.c   | 412 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  81 ++++
 .../selftests/bpf/progs/xdp_metadata.c        |  56 +++
 .../selftests/bpf/progs/xdp_metadata2.c       |  23 +
 tools/testing/selftests/bpf/test_offload.py   |  10 +-
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 405 +++++++++++++++++
 tools/testing/selftests/bpf/xdp_metadata.h    |  15 +
 36 files changed, 1780 insertions(+), 285 deletions(-)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata2.c
 create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_metadata.h

-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-13 16:37   ` David Vernet
                     ` (2 more replies)
  2022-12-13  2:35 ` [PATCH bpf-next v4 02/15] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
                   ` (13 subsequent siblings)
  14 siblings, 3 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Document all current use-cases and assumptions.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst

diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
new file mode 100644
index 000000000000..498eae718275
--- /dev/null
+++ b/Documentation/bpf/xdp-rx-metadata.rst
@@ -0,0 +1,90 @@
+===============
+XDP RX Metadata
+===============
+
+XDP programs support creating and passing custom metadata via
+``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
+entities:
+
+1. ``AF_XDP`` consumer.
+2. Kernel core stack via ``XDP_PASS``.
+3. Another device via ``bpf_redirect_map``.
+4. Other BPF programs via ``bpf_tail_call``.
+
+General Design
+==============
+
+XDP has access to a set of kfuncs to manipulate the metadata. Every
+device driver implements these kfuncs. The set of kfuncs is
+declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
+
+Currently, the following kfuncs are supported. In the future, as more
+metadata is supported, this set will grow:
+
+- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
+  indicate whether the device supports RX timestamps
+- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
+- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
+  indicate whether the device supports RX hash
+- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash
+
+Within the XDP frame, the metadata layout is as follows::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+             ^                 ^
+             |                 |
+   xdp_buff->data_meta   xdp_buff->data
+
+AF_XDP
+======
+
+``AF_XDP`` use-case implies that there is a contract between the BPF program
+that redirects XDP frames into the ``XSK`` and the final consumer.
+Thus the BPF program manually allocates a fixed number of
+bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
+of kfuncs to populate it. User-space ``XSK`` consumer, looks
+at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
+
+Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+                               ^
+                               |
+                        rx_desc->address
+
+XDP_PASS
+========
+
+This is the path where the packets processed by the XDP program are passed
+into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
+Currently, every driver has a custom kernel code to parse the descriptors and
+populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
+In the future, we'd like to support a case where XDP program can override
+some of that metadata.
+
+The plan of record is to make this path similar to ``bpf_redirect_map``
+so the program can control which metadata is passed to the skb layer.
+
+bpf_redirect_map
+================
+
+``bpf_redirect_map`` can redirect the frame to a different device.
+In this case we don't know ahead of time whether that final consumer
+will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
+Additionally, the final consumer doesn't have access to the original
+hardware descriptor and can't access any of the original metadata.
+
+For this use-case, only custom metadata is currently supported. If
+the frame is eventually passed to the kernel, the skb created from such
+a frame won't have any skb metadata. The ``XSK`` consumer will only
+have access to the custom metadata.
+
+bpf_tail_call
+=============
+
+No special handling here. Tail-called program operates on the same context
+as the original one.
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 02/15] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
  2022-12-13  2:35 ` [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-13  2:35 ` [PATCH bpf-next v4 03/15] bpf: Introduce device-bound XDP programs Stanislav Fomichev
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Jakub Kicinski, David Ahern,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

BPF offloading infra will be reused to implement
bound-but-not-offloaded bpf programs. Rename existing
helpers for clarity. No functional changes.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h   |  8 ++++----
 kernel/bpf/core.c     |  4 ++--
 kernel/bpf/offload.c  |  4 ++--
 kernel/bpf/syscall.c  | 22 +++++++++++-----------
 kernel/bpf/verifier.c | 18 +++++++++---------
 net/core/dev.c        |  4 ++--
 net/core/filter.c     |  2 +-
 7 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3de24cfb7a3d..f8d3a93703f3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2481,12 +2481,12 @@ void unpriv_ebpf_notify(int new_state);
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
 int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
 
-static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux)
+static inline bool bpf_prog_is_offloaded(const struct bpf_prog_aux *aux)
 {
 	return aux->offload_requested;
 }
 
-static inline bool bpf_map_is_dev_bound(struct bpf_map *map)
+static inline bool bpf_map_is_offloaded(struct bpf_map *map)
 {
 	return unlikely(map->ops == &bpf_map_offload_ops);
 }
@@ -2513,12 +2513,12 @@ static inline int bpf_prog_offload_init(struct bpf_prog *prog,
 	return -EOPNOTSUPP;
 }
 
-static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux)
+static inline bool bpf_prog_is_offloaded(struct bpf_prog_aux *aux)
 {
 	return false;
 }
 
-static inline bool bpf_map_is_dev_bound(struct bpf_map *map)
+static inline bool bpf_map_is_offloaded(struct bpf_map *map)
 {
 	return false;
 }
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2e57fc839a5c..641ab412ad7e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2183,7 +2183,7 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 	 * valid program, which in this case would simply not
 	 * be JITed, but falls back to the interpreter.
 	 */
-	if (!bpf_prog_is_dev_bound(fp->aux)) {
+	if (!bpf_prog_is_offloaded(fp->aux)) {
 		*err = bpf_prog_alloc_jited_linfo(fp);
 		if (*err)
 			return fp;
@@ -2554,7 +2554,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 #endif
 	bpf_free_used_maps(aux);
 	bpf_free_used_btfs(aux);
-	if (bpf_prog_is_dev_bound(aux))
+	if (bpf_prog_is_offloaded(aux))
 		bpf_prog_offload_destroy(aux->prog);
 #ifdef CONFIG_PERF_EVENTS
 	if (aux->prog->has_callchain_buf)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 13e4efc971e6..f5769a8ecbee 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -549,7 +549,7 @@ static bool __bpf_offload_dev_match(struct bpf_prog *prog,
 	struct bpf_offload_netdev *ondev1, *ondev2;
 	struct bpf_prog_offload *offload;
 
-	if (!bpf_prog_is_dev_bound(prog->aux))
+	if (!bpf_prog_is_offloaded(prog->aux))
 		return false;
 
 	offload = prog->aux->offload;
@@ -581,7 +581,7 @@ bool bpf_offload_prog_map_match(struct bpf_prog *prog, struct bpf_map *map)
 	struct bpf_offloaded_map *offmap;
 	bool ret;
 
-	if (!bpf_map_is_dev_bound(map))
+	if (!bpf_map_is_offloaded(map))
 		return bpf_map_offload_neutral(map);
 	offmap = map_to_offmap(map);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..13bc96035116 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -181,7 +181,7 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
 	int err;
 
 	/* Need to create a kthread, thus must support schedule */
-	if (bpf_map_is_dev_bound(map)) {
+	if (bpf_map_is_offloaded(map)) {
 		return bpf_map_offload_update_elem(map, key, value, flags);
 	} else if (map->map_type == BPF_MAP_TYPE_CPUMAP ||
 		   map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
@@ -238,7 +238,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	void *ptr;
 	int err;
 
-	if (bpf_map_is_dev_bound(map))
+	if (bpf_map_is_offloaded(map))
 		return bpf_map_offload_lookup_elem(map, key, value);
 
 	bpf_disable_instrumentation();
@@ -1483,7 +1483,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 		goto err_put;
 	}
 
-	if (bpf_map_is_dev_bound(map)) {
+	if (bpf_map_is_offloaded(map)) {
 		err = bpf_map_offload_delete_elem(map, key);
 		goto out;
 	} else if (IS_FD_PROG_ARRAY(map) ||
@@ -1547,7 +1547,7 @@ static int map_get_next_key(union bpf_attr *attr)
 	if (!next_key)
 		goto free_key;
 
-	if (bpf_map_is_dev_bound(map)) {
+	if (bpf_map_is_offloaded(map)) {
 		err = bpf_map_offload_get_next_key(map, key, next_key);
 		goto out;
 	}
@@ -1605,7 +1605,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 				   map->key_size))
 			break;
 
-		if (bpf_map_is_dev_bound(map)) {
+		if (bpf_map_is_offloaded(map)) {
 			err = bpf_map_offload_delete_elem(map, key);
 			break;
 		}
@@ -1851,7 +1851,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 		   map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
 		   map->map_type == BPF_MAP_TYPE_LRU_HASH ||
 		   map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
-		if (!bpf_map_is_dev_bound(map)) {
+		if (!bpf_map_is_offloaded(map)) {
 			bpf_disable_instrumentation();
 			rcu_read_lock();
 			err = map->ops->map_lookup_and_delete_elem(map, key, value, attr->flags);
@@ -1944,7 +1944,7 @@ static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
 	if (!ops)
 		return -EINVAL;
 
-	if (!bpf_prog_is_dev_bound(prog->aux))
+	if (!bpf_prog_is_offloaded(prog->aux))
 		prog->aux->ops = ops;
 	else
 		prog->aux->ops = &bpf_offload_prog_ops;
@@ -2255,7 +2255,7 @@ bool bpf_prog_get_ok(struct bpf_prog *prog,
 
 	if (prog->type != *attach_type)
 		return false;
-	if (bpf_prog_is_dev_bound(prog->aux) && !attach_drv)
+	if (bpf_prog_is_offloaded(prog->aux) && !attach_drv)
 		return false;
 
 	return true;
@@ -2598,7 +2598,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	atomic64_set(&prog->aux->refcnt, 1);
 	prog->gpl_compatible = is_gpl ? 1 : 0;
 
-	if (bpf_prog_is_dev_bound(prog->aux)) {
+	if (bpf_prog_is_offloaded(prog->aux)) {
 		err = bpf_prog_offload_init(prog, attr);
 		if (err)
 			goto free_prog_sec;
@@ -3997,7 +3997,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 			return -EFAULT;
 	}
 
-	if (bpf_prog_is_dev_bound(prog->aux)) {
+	if (bpf_prog_is_offloaded(prog->aux)) {
 		err = bpf_prog_offload_info_fill(&info, prog);
 		if (err)
 			return err;
@@ -4225,7 +4225,7 @@ static int bpf_map_get_info_by_fd(struct file *file,
 	}
 	info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
 
-	if (bpf_map_is_dev_bound(map)) {
+	if (bpf_map_is_offloaded(map)) {
 		err = bpf_map_offload_info_fill(&info, map);
 		if (err)
 			return err;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a5255a0dcbb6..203d8cfeda70 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13806,7 +13806,7 @@ static int do_check(struct bpf_verifier_env *env)
 			env->prev_log_len = env->log.len_used;
 		}
 
-		if (bpf_prog_is_dev_bound(env->prog->aux)) {
+		if (bpf_prog_is_offloaded(env->prog->aux)) {
 			err = bpf_prog_offload_verify_insn(env, env->insn_idx,
 							   env->prev_insn_idx);
 			if (err)
@@ -14286,7 +14286,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		}
 	}
 
-	if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
+	if ((bpf_prog_is_offloaded(prog->aux) || bpf_map_is_offloaded(map)) &&
 	    !bpf_offload_prog_map_match(prog, map)) {
 		verbose(env, "offload device mismatch between prog and map\n");
 		return -EINVAL;
@@ -14767,7 +14767,7 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
 	unsigned int orig_prog_len = env->prog->len;
 	int err;
 
-	if (bpf_prog_is_dev_bound(env->prog->aux))
+	if (bpf_prog_is_offloaded(env->prog->aux))
 		bpf_prog_offload_remove_insns(env, off, cnt);
 
 	err = bpf_remove_insns(env->prog, off, cnt);
@@ -14848,7 +14848,7 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
 		else
 			continue;
 
-		if (bpf_prog_is_dev_bound(env->prog->aux))
+		if (bpf_prog_is_offloaded(env->prog->aux))
 			bpf_prog_offload_replace_insn(env, i, &ja);
 
 		memcpy(insn, &ja, sizeof(ja));
@@ -15035,7 +15035,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 	}
 
-	if (bpf_prog_is_dev_bound(env->prog->aux))
+	if (bpf_prog_is_offloaded(env->prog->aux))
 		return 0;
 
 	insn = env->prog->insnsi + delta;
@@ -15435,7 +15435,7 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 	int err = 0;
 
 	if (env->prog->jit_requested &&
-	    !bpf_prog_is_dev_bound(env->prog->aux)) {
+	    !bpf_prog_is_offloaded(env->prog->aux)) {
 		err = jit_subprogs(env);
 		if (err == 0)
 			return 0;
@@ -16922,7 +16922,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	if (ret < 0)
 		goto skip_full_check;
 
-	if (bpf_prog_is_dev_bound(env->prog->aux)) {
+	if (bpf_prog_is_offloaded(env->prog->aux)) {
 		ret = bpf_prog_offload_verifier_prep(env->prog);
 		if (ret)
 			goto skip_full_check;
@@ -16935,7 +16935,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	ret = do_check_subprogs(env);
 	ret = ret ?: do_check_main(env);
 
-	if (ret == 0 && bpf_prog_is_dev_bound(env->prog->aux))
+	if (ret == 0 && bpf_prog_is_offloaded(env->prog->aux))
 		ret = bpf_prog_offload_finalize(env);
 
 skip_full_check:
@@ -16970,7 +16970,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	/* do 32-bit optimization after insn patching has done so those patched
 	 * insns could be handled correctly.
 	 */
-	if (ret == 0 && !bpf_prog_is_dev_bound(env->prog->aux)) {
+	if (ret == 0 && !bpf_prog_is_offloaded(env->prog->aux)) {
 		ret = opt_subreg_zext_lo32_rnd_hi32(env, attr);
 		env->prog->aux->verifier_zext = bpf_jit_needs_zext() ? !ret
 								     : false;
diff --git a/net/core/dev.c b/net/core/dev.c
index 7627c475d991..5d51999cba30 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9224,8 +9224,8 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 			NL_SET_ERR_MSG(extack, "Native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
-		if (!offload && bpf_prog_is_dev_bound(new_prog->aux)) {
-			NL_SET_ERR_MSG(extack, "Using device-bound program without HW_MODE flag is not supported");
+		if (!offload && bpf_prog_is_offloaded(new_prog->aux)) {
+			NL_SET_ERR_MSG(extack, "Using offloaded program without HW_MODE flag is not supported");
 			return -EINVAL;
 		}
 		if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
diff --git a/net/core/filter.c b/net/core/filter.c
index 929358677183..ac1fa0ef6f35 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8728,7 +8728,7 @@ static bool xdp_is_valid_access(int off, int size,
 	}
 
 	if (type == BPF_WRITE) {
-		if (bpf_prog_is_dev_bound(prog->aux)) {
+		if (bpf_prog_is_offloaded(prog->aux)) {
 			switch (off) {
 			case offsetof(struct xdp_md, rx_queue_index):
 				return __is_valid_xdp_access(off, size);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 03/15] bpf: Introduce device-bound XDP programs
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
  2022-12-13  2:35 ` [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata Stanislav Fomichev
  2022-12-13  2:35 ` [PATCH bpf-next v4 02/15] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-13 23:25   ` Martin KaFai Lau
  2022-12-13  2:35 ` [PATCH bpf-next v4 04/15] selftests/bpf: Update expected test_offload.py messages Stanislav Fomichev
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way
to associate a netdev with a BPF program at load time.

Some existing 'offloaded' routines are renamed to 'dev_bound' for
consistency with the rest.

Also moved a bunch of code around to avoid forward declarations.

netdevsim checks are dropped in favor of generic check in dev_xdp_attach.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/netdevsim/bpf.c    |   4 -
 include/linux/bpf.h            |  24 ++-
 include/uapi/linux/bpf.h       |   5 +
 kernel/bpf/core.c              |   4 +-
 kernel/bpf/offload.c           | 293 ++++++++++++++++++++-------------
 kernel/bpf/syscall.c           |   9 +-
 net/core/dev.c                 |   5 +
 tools/include/uapi/linux/bpf.h |   5 +
 8 files changed, 218 insertions(+), 131 deletions(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 50854265864d..f60eb97e3a62 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -315,10 +315,6 @@ nsim_setup_prog_hw_checks(struct netdevsim *ns, struct netdev_bpf *bpf)
 		NSIM_EA(bpf->extack, "xdpoffload of non-bound program");
 		return -EINVAL;
 	}
-	if (!bpf_offload_dev_match(bpf->prog, ns->netdev)) {
-		NSIM_EA(bpf->extack, "program bound to different dev");
-		return -EINVAL;
-	}
 
 	state = bpf->prog->aux->offload->dev_priv;
 	if (WARN_ON(strcmp(state->state, "xlated"))) {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f8d3a93703f3..ca22e8b8bd82 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1261,7 +1261,8 @@ struct bpf_prog_aux {
 	enum bpf_prog_type saved_dst_prog_type;
 	enum bpf_attach_type saved_dst_attach_type;
 	bool verifier_zext; /* Zero extensions has been inserted by verifier. */
-	bool offload_requested;
+	bool dev_bound; /* Program is bound to the netdev. */
+	bool offload_requested; /* Program is bound and offloaded to the netdev. */
 	bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
 	bool func_proto_unreliable;
 	bool sleepable;
@@ -2451,7 +2452,7 @@ void __bpf_free_used_maps(struct bpf_prog_aux *aux,
 bool bpf_prog_get_ok(struct bpf_prog *, enum bpf_prog_type *, bool);
 
 int bpf_prog_offload_compile(struct bpf_prog *prog);
-void bpf_prog_offload_destroy(struct bpf_prog *prog);
+void bpf_prog_dev_bound_destroy(struct bpf_prog *prog);
 int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
 			       struct bpf_prog *prog);
 
@@ -2479,7 +2480,13 @@ bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
 void unpriv_ebpf_notify(int new_state);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
-int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
+int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr);
+void bpf_dev_bound_netdev_unregister(struct net_device *dev);
+
+static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux)
+{
+	return aux->dev_bound;
+}
 
 static inline bool bpf_prog_is_offloaded(const struct bpf_prog_aux *aux)
 {
@@ -2507,12 +2514,21 @@ void sock_map_unhash(struct sock *sk);
 void sock_map_destroy(struct sock *sk);
 void sock_map_close(struct sock *sk, long timeout);
 #else
-static inline int bpf_prog_offload_init(struct bpf_prog *prog,
+static inline int bpf_prog_dev_bound_init(struct bpf_prog *prog,
 					union bpf_attr *attr)
 {
 	return -EOPNOTSUPP;
 }
 
+static inline void bpf_dev_bound_netdev_unregister(struct net_device *dev)
+{
+}
+
+static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux)
+{
+	return false;
+}
+
 static inline bool bpf_prog_is_offloaded(struct bpf_prog_aux *aux)
 {
 	return false;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 464ca3f01fe7..fa28603a48e7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1156,6 +1156,11 @@ enum bpf_link_type {
  */
 #define BPF_F_XDP_HAS_FRAGS	(1U << 5)
 
+/* If BPF_F_XDP_DEV_BOUND_ONLY is used in BPF_PROG_LOAD command, the loaded
+ * program becomes device-bound but can access XDP metadata.
+ */
+#define BPF_F_XDP_DEV_BOUND_ONLY	(1U << 6)
+
 /* link_create.kprobe_multi.flags used in LINK_CREATE command for
  * BPF_TRACE_KPROBE_MULTI attach type to create return probe.
  */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 641ab412ad7e..d434a994ee04 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2554,8 +2554,8 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 #endif
 	bpf_free_used_maps(aux);
 	bpf_free_used_btfs(aux);
-	if (bpf_prog_is_offloaded(aux))
-		bpf_prog_offload_destroy(aux->prog);
+	if (bpf_prog_is_dev_bound(aux))
+		bpf_prog_dev_bound_destroy(aux->prog);
 #ifdef CONFIG_PERF_EVENTS
 	if (aux->prog->has_callchain_buf)
 		put_callchain_buffers();
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index f5769a8ecbee..f714c941f8ea 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -41,7 +41,7 @@ struct bpf_offload_dev {
 struct bpf_offload_netdev {
 	struct rhash_head l;
 	struct net_device *netdev;
-	struct bpf_offload_dev *offdev;
+	struct bpf_offload_dev *offdev; /* NULL when bound-only */
 	struct list_head progs;
 	struct list_head maps;
 	struct list_head offdev_netdevs;
@@ -56,7 +56,6 @@ static const struct rhashtable_params offdevs_params = {
 };
 
 static struct rhashtable offdevs;
-static bool offdevs_inited;
 
 static int bpf_dev_offload_check(struct net_device *netdev)
 {
@@ -72,12 +71,124 @@ bpf_offload_find_netdev(struct net_device *netdev)
 {
 	lockdep_assert_held(&bpf_devs_lock);
 
-	if (!offdevs_inited)
-		return NULL;
 	return rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
 }
 
-int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
+static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
+					     struct net_device *netdev)
+{
+	struct bpf_offload_netdev *ondev;
+	int err;
+
+	ondev = kzalloc(sizeof(*ondev), GFP_KERNEL);
+	if (!ondev)
+		return -ENOMEM;
+
+	ondev->netdev = netdev;
+	ondev->offdev = offdev;
+	INIT_LIST_HEAD(&ondev->progs);
+	INIT_LIST_HEAD(&ondev->maps);
+
+	err = rhashtable_insert_fast(&offdevs, &ondev->l, offdevs_params);
+	if (err) {
+		netdev_warn(netdev, "failed to register for BPF offload\n");
+		goto err_unlock_free;
+	}
+
+	if (offdev)
+		list_add(&ondev->offdev_netdevs, &offdev->netdevs);
+	return 0;
+
+err_unlock_free:
+	up_write(&bpf_devs_lock);
+	kfree(ondev);
+	return err;
+}
+
+static void __bpf_prog_dev_bound_destroy(struct bpf_prog *prog)
+{
+	struct bpf_prog_offload *offload = prog->aux->offload;
+
+	if (offload->dev_state)
+		offload->offdev->ops->destroy(prog);
+
+	/* Make sure BPF_PROG_GET_NEXT_ID can't find this dead program */
+	bpf_prog_free_id(prog, true);
+
+	list_del_init(&offload->offloads);
+	kfree(offload);
+	prog->aux->offload = NULL;
+}
+
+static int bpf_map_offload_ndo(struct bpf_offloaded_map *offmap,
+			       enum bpf_netdev_command cmd)
+{
+	struct netdev_bpf data = {};
+	struct net_device *netdev;
+
+	ASSERT_RTNL();
+
+	data.command = cmd;
+	data.offmap = offmap;
+	/* Caller must make sure netdev is valid */
+	netdev = offmap->netdev;
+
+	return netdev->netdev_ops->ndo_bpf(netdev, &data);
+}
+
+static void __bpf_map_offload_destroy(struct bpf_offloaded_map *offmap)
+{
+	WARN_ON(bpf_map_offload_ndo(offmap, BPF_OFFLOAD_MAP_FREE));
+	/* Make sure BPF_MAP_GET_NEXT_ID can't find this dead map */
+	bpf_map_free_id(&offmap->map, true);
+	list_del_init(&offmap->offloads);
+	offmap->netdev = NULL;
+}
+
+static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
+						struct net_device *netdev)
+{
+	struct bpf_offload_netdev *ondev, *altdev = NULL;
+	struct bpf_offloaded_map *offmap, *mtmp;
+	struct bpf_prog_offload *offload, *ptmp;
+
+	ASSERT_RTNL();
+
+	ondev = rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
+	if (WARN_ON(!ondev))
+		return;
+
+	WARN_ON(rhashtable_remove_fast(&offdevs, &ondev->l, offdevs_params));
+
+	/* Try to move the objects to another netdev of the device */
+	if (offdev) {
+		list_del(&ondev->offdev_netdevs);
+		altdev = list_first_entry_or_null(&offdev->netdevs,
+						  struct bpf_offload_netdev,
+						  offdev_netdevs);
+	}
+
+	if (altdev) {
+		list_for_each_entry(offload, &ondev->progs, offloads)
+			offload->netdev = altdev->netdev;
+		list_splice_init(&ondev->progs, &altdev->progs);
+
+		list_for_each_entry(offmap, &ondev->maps, offloads)
+			offmap->netdev = altdev->netdev;
+		list_splice_init(&ondev->maps, &altdev->maps);
+	} else {
+		list_for_each_entry_safe(offload, ptmp, &ondev->progs, offloads)
+			__bpf_prog_dev_bound_destroy(offload->prog);
+		list_for_each_entry_safe(offmap, mtmp, &ondev->maps, offloads)
+			__bpf_map_offload_destroy(offmap);
+	}
+
+	WARN_ON(!list_empty(&ondev->progs));
+	WARN_ON(!list_empty(&ondev->maps));
+	kfree(ondev);
+}
+
+int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr)
 {
 	struct bpf_offload_netdev *ondev;
 	struct bpf_prog_offload *offload;
@@ -87,7 +198,7 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
 	    attr->prog_type != BPF_PROG_TYPE_XDP)
 		return -EINVAL;
 
-	if (attr->prog_flags)
+	if (attr->prog_flags & ~BPF_F_XDP_DEV_BOUND_ONLY)
 		return -EINVAL;
 
 	offload = kzalloc(sizeof(*offload), GFP_USER);
@@ -102,11 +213,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
 	if (err)
 		goto err_maybe_put;
 
+	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_DEV_BOUND_ONLY);
+
 	down_write(&bpf_devs_lock);
 	ondev = bpf_offload_find_netdev(offload->netdev);
 	if (!ondev) {
-		err = -EINVAL;
-		goto err_unlock;
+		if (!bpf_prog_is_offloaded(prog->aux)) {
+			/* When only binding to the device, explicitly
+			 * create an entry in the hashtable. See related
+			 * bpf_dev_bound_try_remove_netdev.
+			 */
+			err = __bpf_offload_dev_netdev_register(NULL, offload->netdev);
+			if (err)
+				goto err_unlock;
+			ondev = bpf_offload_find_netdev(offload->netdev);
+		}
+		if (!ondev) {
+			err = -EINVAL;
+			goto err_unlock;
+		}
 	}
 	offload->offdev = ondev->offdev;
 	prog->aux->offload = offload;
@@ -209,27 +334,28 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
 	up_read(&bpf_devs_lock);
 }
 
-static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
+static void bpf_dev_bound_try_remove_netdev(struct net_device *dev)
 {
-	struct bpf_prog_offload *offload = prog->aux->offload;
-
-	if (offload->dev_state)
-		offload->offdev->ops->destroy(prog);
+	struct bpf_offload_netdev *ondev;
 
-	/* Make sure BPF_PROG_GET_NEXT_ID can't find this dead program */
-	bpf_prog_free_id(prog, true);
+	if (!dev)
+		return;
 
-	list_del_init(&offload->offloads);
-	kfree(offload);
-	prog->aux->offload = NULL;
+	ondev = bpf_offload_find_netdev(dev);
+	if (ondev && !ondev->offdev && list_empty(&ondev->progs))
+		__bpf_offload_dev_netdev_unregister(NULL, dev);
 }
 
-void bpf_prog_offload_destroy(struct bpf_prog *prog)
+void bpf_prog_dev_bound_destroy(struct bpf_prog *prog)
 {
+	rtnl_lock();
 	down_write(&bpf_devs_lock);
-	if (prog->aux->offload)
-		__bpf_prog_offload_destroy(prog);
+	if (prog->aux->offload) {
+		bpf_dev_bound_try_remove_netdev(prog->aux->offload->netdev);
+		__bpf_prog_dev_bound_destroy(prog);
+	}
 	up_write(&bpf_devs_lock);
+	rtnl_unlock();
 }
 
 static int bpf_prog_offload_translate(struct bpf_prog *prog)
@@ -343,22 +469,6 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
 const struct bpf_prog_ops bpf_offload_prog_ops = {
 };
 
-static int bpf_map_offload_ndo(struct bpf_offloaded_map *offmap,
-			       enum bpf_netdev_command cmd)
-{
-	struct netdev_bpf data = {};
-	struct net_device *netdev;
-
-	ASSERT_RTNL();
-
-	data.command = cmd;
-	data.offmap = offmap;
-	/* Caller must make sure netdev is valid */
-	netdev = offmap->netdev;
-
-	return netdev->netdev_ops->ndo_bpf(netdev, &data);
-}
-
 struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 {
 	struct net *net = current->nsproxy->net_ns;
@@ -408,15 +518,6 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
-static void __bpf_map_offload_destroy(struct bpf_offloaded_map *offmap)
-{
-	WARN_ON(bpf_map_offload_ndo(offmap, BPF_OFFLOAD_MAP_FREE));
-	/* Make sure BPF_MAP_GET_NEXT_ID can't find this dead map */
-	bpf_map_free_id(&offmap->map, true);
-	list_del_init(&offmap->offloads);
-	offmap->netdev = NULL;
-}
-
 void bpf_map_offload_map_free(struct bpf_map *map)
 {
 	struct bpf_offloaded_map *offmap = map_to_offmap(map);
@@ -549,7 +650,7 @@ static bool __bpf_offload_dev_match(struct bpf_prog *prog,
 	struct bpf_offload_netdev *ondev1, *ondev2;
 	struct bpf_prog_offload *offload;
 
-	if (!bpf_prog_is_offloaded(prog->aux))
+	if (!bpf_prog_is_dev_bound(prog->aux))
 		return false;
 
 	offload = prog->aux->offload;
@@ -595,32 +696,11 @@ bool bpf_offload_prog_map_match(struct bpf_prog *prog, struct bpf_map *map)
 int bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
 				    struct net_device *netdev)
 {
-	struct bpf_offload_netdev *ondev;
 	int err;
 
-	ondev = kzalloc(sizeof(*ondev), GFP_KERNEL);
-	if (!ondev)
-		return -ENOMEM;
-
-	ondev->netdev = netdev;
-	ondev->offdev = offdev;
-	INIT_LIST_HEAD(&ondev->progs);
-	INIT_LIST_HEAD(&ondev->maps);
-
 	down_write(&bpf_devs_lock);
-	err = rhashtable_insert_fast(&offdevs, &ondev->l, offdevs_params);
-	if (err) {
-		netdev_warn(netdev, "failed to register for BPF offload\n");
-		goto err_unlock_free;
-	}
-
-	list_add(&ondev->offdev_netdevs, &offdev->netdevs);
-	up_write(&bpf_devs_lock);
-	return 0;
-
-err_unlock_free:
+	err = __bpf_offload_dev_netdev_register(offdev, netdev);
 	up_write(&bpf_devs_lock);
-	kfree(ondev);
 	return err;
 }
 EXPORT_SYMBOL_GPL(bpf_offload_dev_netdev_register);
@@ -628,43 +708,8 @@ EXPORT_SYMBOL_GPL(bpf_offload_dev_netdev_register);
 void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
 				       struct net_device *netdev)
 {
-	struct bpf_offload_netdev *ondev, *altdev;
-	struct bpf_offloaded_map *offmap, *mtmp;
-	struct bpf_prog_offload *offload, *ptmp;
-
-	ASSERT_RTNL();
-
 	down_write(&bpf_devs_lock);
-	ondev = rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
-	if (WARN_ON(!ondev))
-		goto unlock;
-
-	WARN_ON(rhashtable_remove_fast(&offdevs, &ondev->l, offdevs_params));
-	list_del(&ondev->offdev_netdevs);
-
-	/* Try to move the objects to another netdev of the device */
-	altdev = list_first_entry_or_null(&offdev->netdevs,
-					  struct bpf_offload_netdev,
-					  offdev_netdevs);
-	if (altdev) {
-		list_for_each_entry(offload, &ondev->progs, offloads)
-			offload->netdev = altdev->netdev;
-		list_splice_init(&ondev->progs, &altdev->progs);
-
-		list_for_each_entry(offmap, &ondev->maps, offloads)
-			offmap->netdev = altdev->netdev;
-		list_splice_init(&ondev->maps, &altdev->maps);
-	} else {
-		list_for_each_entry_safe(offload, ptmp, &ondev->progs, offloads)
-			__bpf_prog_offload_destroy(offload->prog);
-		list_for_each_entry_safe(offmap, mtmp, &ondev->maps, offloads)
-			__bpf_map_offload_destroy(offmap);
-	}
-
-	WARN_ON(!list_empty(&ondev->progs));
-	WARN_ON(!list_empty(&ondev->maps));
-	kfree(ondev);
-unlock:
+	__bpf_offload_dev_netdev_unregister(offdev, netdev);
 	up_write(&bpf_devs_lock);
 }
 EXPORT_SYMBOL_GPL(bpf_offload_dev_netdev_unregister);
@@ -673,18 +718,6 @@ struct bpf_offload_dev *
 bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
 {
 	struct bpf_offload_dev *offdev;
-	int err;
-
-	down_write(&bpf_devs_lock);
-	if (!offdevs_inited) {
-		err = rhashtable_init(&offdevs, &offdevs_params);
-		if (err) {
-			up_write(&bpf_devs_lock);
-			return ERR_PTR(err);
-		}
-		offdevs_inited = true;
-	}
-	up_write(&bpf_devs_lock);
 
 	offdev = kzalloc(sizeof(*offdev), GFP_KERNEL);
 	if (!offdev)
@@ -710,3 +743,29 @@ void *bpf_offload_dev_priv(struct bpf_offload_dev *offdev)
 	return offdev->priv;
 }
 EXPORT_SYMBOL_GPL(bpf_offload_dev_priv);
+
+void bpf_dev_bound_netdev_unregister(struct net_device *dev)
+{
+	struct bpf_offload_netdev *ondev;
+
+	ASSERT_RTNL();
+
+	down_write(&bpf_devs_lock);
+	ondev = bpf_offload_find_netdev(dev);
+	if (ondev && !ondev->offdev)
+		__bpf_offload_dev_netdev_unregister(NULL, ondev->netdev);
+	up_write(&bpf_devs_lock);
+}
+
+static int __init bpf_offload_init(void)
+{
+	int err;
+
+	down_write(&bpf_devs_lock);
+	err = rhashtable_init(&offdevs, &offdevs_params);
+	up_write(&bpf_devs_lock);
+
+	return err;
+}
+
+late_initcall(bpf_offload_init);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 13bc96035116..11c558be4992 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2491,7 +2491,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 				 BPF_F_TEST_STATE_FREQ |
 				 BPF_F_SLEEPABLE |
 				 BPF_F_TEST_RND_HI32 |
-				 BPF_F_XDP_HAS_FRAGS))
+				 BPF_F_XDP_HAS_FRAGS |
+				 BPF_F_XDP_DEV_BOUND_ONLY))
 		return -EINVAL;
 
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
@@ -2575,7 +2576,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	prog->aux->attach_btf = attach_btf;
 	prog->aux->attach_btf_id = attr->attach_btf_id;
 	prog->aux->dst_prog = dst_prog;
-	prog->aux->offload_requested = !!attr->prog_ifindex;
+	prog->aux->dev_bound = !!attr->prog_ifindex;
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
 
@@ -2598,8 +2599,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	atomic64_set(&prog->aux->refcnt, 1);
 	prog->gpl_compatible = is_gpl ? 1 : 0;
 
-	if (bpf_prog_is_offloaded(prog->aux)) {
-		err = bpf_prog_offload_init(prog, attr);
+	if (bpf_prog_is_dev_bound(prog->aux)) {
+		err = bpf_prog_dev_bound_init(prog, attr);
 		if (err)
 			goto free_prog_sec;
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index 5d51999cba30..194f8116aad4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9228,6 +9228,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 			NL_SET_ERR_MSG(extack, "Using offloaded program without HW_MODE flag is not supported");
 			return -EINVAL;
 		}
+		if (bpf_prog_is_dev_bound(new_prog->aux) && !bpf_offload_dev_match(new_prog, dev)) {
+			NL_SET_ERR_MSG(extack, "Program bound to different device");
+			return -EINVAL;
+		}
 		if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
 			NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
 			return -EINVAL;
@@ -10813,6 +10817,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
 		/* Shutdown queueing discipline. */
 		dev_shutdown(dev);
 
+		bpf_dev_bound_netdev_unregister(dev);
 		dev_xdp_uninstall(dev);
 
 		netdev_offload_xstats_disable_all(dev);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 464ca3f01fe7..fa28603a48e7 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1156,6 +1156,11 @@ enum bpf_link_type {
  */
 #define BPF_F_XDP_HAS_FRAGS	(1U << 5)
 
+/* If BPF_F_XDP_DEV_BOUND_ONLY is used in BPF_PROG_LOAD command, the loaded
+ * program becomes device-bound but can access XDP metadata.
+ */
+#define BPF_F_XDP_DEV_BOUND_ONLY	(1U << 6)
+
 /* link_create.kprobe_multi.flags used in LINK_CREATE command for
  * BPF_TRACE_KPROBE_MULTI attach type to create return probe.
  */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 04/15] selftests/bpf: Update expected test_offload.py messages
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2022-12-13  2:35 ` [PATCH bpf-next v4 03/15] bpf: Introduce device-bound XDP programs Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-13  2:35 ` [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs Stanislav Fomichev
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Generic check has a different error message, update the selftest.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/test_offload.py | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index 7cb1bc05e5cf..40cba8d368d9 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -1039,7 +1039,7 @@ netns = []
     offload = bpf_pinned("/sys/fs/bpf/offload")
     ret, _, err = sim.set_xdp(offload, "drv", fail=False, include_stderr=True)
     fail(ret == 0, "attached offloaded XDP program to drv")
-    check_extack(err, "Using device-bound program without HW_MODE flag is not supported.", args)
+    check_extack(err, "Using offloaded program without HW_MODE flag is not supported.", args)
     rm("/sys/fs/bpf/offload")
     sim.wait_for_flush()
 
@@ -1088,12 +1088,12 @@ netns = []
     ret, _, err = sim.set_xdp(pinned, "offload",
                               fail=False, include_stderr=True)
     fail(ret == 0, "Pinned program loaded for a different device accepted")
-    check_extack_nsim(err, "program bound to different dev.", args)
+    check_extack(err, "Program bound to different device.", args)
     simdev2.remove()
     ret, _, err = sim.set_xdp(pinned, "offload",
                               fail=False, include_stderr=True)
     fail(ret == 0, "Pinned program loaded for a removed device accepted")
-    check_extack_nsim(err, "xdpoffload of non-bound program.", args)
+    check_extack(err, "Program bound to different device.", args)
     rm(pin_file)
     bpftool_prog_list_wait(expected=0)
 
@@ -1334,12 +1334,12 @@ netns = []
     ret, _, err = simA.set_xdp(progB, "offload", force=True, JSON=False,
                                fail=False, include_stderr=True)
     fail(ret == 0, "cross-ASIC program allowed")
-    check_extack_nsim(err, "program bound to different dev.", args)
+    check_extack(err, "Program bound to different device.", args)
     for d in simdevB.nsims:
         ret, _, err = d.set_xdp(progA, "offload", force=True, JSON=False,
                                 fail=False, include_stderr=True)
         fail(ret == 0, "cross-ASIC program allowed")
-        check_extack_nsim(err, "program bound to different dev.", args)
+        check_extack(err, "Program bound to different device.", args)
 
     start_test("Test multi-dev ASIC cross-dev map reuse...")
 
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (3 preceding siblings ...)
  2022-12-13  2:35 ` [PATCH bpf-next v4 04/15] selftests/bpf: Update expected test_offload.py messages Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-13 17:00   ` David Vernet
                     ` (2 more replies)
  2022-12-13  2:35 ` [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs Stanislav Fomichev
                   ` (9 subsequent siblings)
  14 siblings, 3 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible
XDP metatada kfuncs. Not all devices have to implement them. If kfunc is not
supported by the target device, the default implementation is called instead.
The verifier, at load time, replaces a call to the generic kfunc with a call
to the per-device one. Per-device kfunc pointers are stored in separate
struct xdp_metadata_ops.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h       |  2 ++
 include/linux/netdevice.h |  7 +++++++
 include/net/xdp.h         | 25 ++++++++++++++++++++++
 kernel/bpf/core.c         |  7 +++++++
 kernel/bpf/offload.c      | 23 ++++++++++++++++++++
 kernel/bpf/verifier.c     | 29 +++++++++++++++++++++++++-
 net/core/xdp.c            | 44 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ca22e8b8bd82..de6279725f41 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2477,6 +2477,8 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
 				       struct net_device *netdev);
 bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
 
+void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
+
 void unpriv_ebpf_notify(int new_state);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5aa35c58c342..63786091c60d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -74,6 +74,7 @@ struct udp_tunnel_nic_info;
 struct udp_tunnel_nic;
 struct bpf_prog;
 struct xdp_buff;
+struct xdp_md;
 
 void synchronize_net(void);
 void netdev_set_default_ethtool_ops(struct net_device *dev,
@@ -1613,6 +1614,11 @@ struct net_device_ops {
 						  bool cycles);
 };
 
+struct xdp_metadata_ops {
+	int	(*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
+	int	(*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash);
+};
+
 /**
  * enum netdev_priv_flags - &struct net_device priv_flags
  *
@@ -2044,6 +2050,7 @@ struct net_device {
 	unsigned int		flags;
 	unsigned long long	priv_flags;
 	const struct net_device_ops *netdev_ops;
+	const struct xdp_metadata_ops *xdp_metadata_ops;
 	int			ifindex;
 	unsigned short		gflags;
 	unsigned short		hard_header_len;
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 55dbc68bfffc..152c3a9c1127 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -409,4 +409,29 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
 
 #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
 
+#define XDP_METADATA_KFUNC_xxx	\
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
+			   bpf_xdp_metadata_rx_timestamp) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
+			   bpf_xdp_metadata_rx_hash) \
+
+enum {
+#define XDP_METADATA_KFUNC(name, str) name,
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+MAX_XDP_METADATA_KFUNC,
+};
+
+struct xdp_md;
+int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp);
+int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash);
+
+#ifdef CONFIG_NET
+u32 xdp_metadata_kfunc_id(int id);
+bool xdp_is_metadata_kfunc_id(u32 btf_id);
+#else
+static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
+static inline bool xdp_is_metadata_kfunc_id(u32 btf_id) { return false; }
+#endif
+
 #endif /* __LINUX_NET_XDP_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index d434a994ee04..c3e501e3e39c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2097,6 +2097,13 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
 	if (fp->kprobe_override)
 		return false;
 
+	/* When tail-calling from a non-dev-bound program to a dev-bound one,
+	 * XDP metadata helpers should be disabled. Until it's implemented,
+	 * prohibit adding dev-bound programs to tail-call maps.
+	 */
+	if (bpf_prog_is_dev_bound(fp->aux))
+		return false;
+
 	spin_lock(&map->owner.lock);
 	if (!map->owner.type) {
 		/* There's no owner yet where we could check for
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index f714c941f8ea..3b6c9023f24d 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -757,6 +757,29 @@ void bpf_dev_bound_netdev_unregister(struct net_device *dev)
 	up_write(&bpf_devs_lock);
 }
 
+void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
+{
+	const struct xdp_metadata_ops *ops;
+	void *p = NULL;
+
+	down_read(&bpf_devs_lock);
+	if (!prog->aux->offload || !prog->aux->offload->netdev)
+		goto out;
+
+	ops = prog->aux->offload->netdev->xdp_metadata_ops;
+	if (!ops)
+		goto out;
+
+	if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
+		p = ops->xmo_rx_timestamp;
+	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
+		p = ops->xmo_rx_hash;
+out:
+	up_read(&bpf_devs_lock);
+
+	return p;
+}
+
 static int __init bpf_offload_init(void)
 {
 	int err;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 203d8cfeda70..e61fe0472b9b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15479,12 +15479,35 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
 {
 	const struct bpf_kfunc_desc *desc;
+	void *xdp_kfunc;
 
 	if (!insn->imm) {
 		verbose(env, "invalid kernel function call not eliminated in verifier pass\n");
 		return -EINVAL;
 	}
 
+	*cnt = 0;
+
+	if (xdp_is_metadata_kfunc_id(insn->imm)) {
+		if (!bpf_prog_is_dev_bound(env->prog->aux)) {
+			verbose(env, "metadata kfuncs require device-bound program\n");
+			return -EINVAL;
+		}
+
+		if (bpf_prog_is_offloaded(env->prog->aux)) {
+			verbose(env, "metadata kfuncs can't be offloaded\n");
+			return -EINVAL;
+		}
+
+		xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm);
+		if (xdp_kfunc) {
+			insn->imm = BPF_CALL_IMM(xdp_kfunc);
+			return 0;
+		}
+
+		/* fallback to default kfunc when not supported by netdev */
+	}
+
 	/* insn->imm has the btf func_id. Replace it with
 	 * an address (relative to __bpf_call_base).
 	 */
@@ -15495,7 +15518,6 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return -EFAULT;
 	}
 
-	*cnt = 0;
 	insn->imm = desc->imm;
 	if (insn->off)
 		return 0;
@@ -16502,6 +16524,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 	if (tgt_prog) {
 		struct bpf_prog_aux *aux = tgt_prog->aux;
 
+		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
+			bpf_log(log, "Replacing device-bound programs not supported\n");
+			return -EINVAL;
+		}
+
 		for (i = 0; i < aux->func_info_cnt; i++)
 			if (aux->func_info[i].type_id == btf_id) {
 				subprog = i;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 844c9d99dc0e..b0d4080249d7 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
  */
 #include <linux/bpf.h>
+#include <linux/btf_ids.h>
 #include <linux/filter.h>
 #include <linux/types.h>
 #include <linux/mm.h>
@@ -709,3 +710,46 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
 
 	return nxdpf;
 }
+
+noinline int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
+{
+	return -EOPNOTSUPP;
+}
+
+noinline int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
+{
+	return -EOPNOTSUPP;
+}
+
+BTF_SET8_START(xdp_metadata_kfunc_ids)
+#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+BTF_SET8_END(xdp_metadata_kfunc_ids)
+
+static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &xdp_metadata_kfunc_ids,
+};
+
+BTF_ID_LIST(xdp_metadata_kfunc_ids_unsorted)
+#define XDP_METADATA_KFUNC(name, str) BTF_ID(func, str)
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+
+u32 xdp_metadata_kfunc_id(int id)
+{
+	/* xdp_metadata_kfunc_ids is sorted and can't be used */
+	return xdp_metadata_kfunc_ids_unsorted[id];
+}
+
+bool xdp_is_metadata_kfunc_id(u32 btf_id)
+{
+	return btf_id_set8_contains(&xdp_metadata_kfunc_ids, btf_id);
+}
+
+static int __init xdp_metadata_init(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
+}
+late_initcall(xdp_metadata_init);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (4 preceding siblings ...)
  2022-12-13  2:35 ` [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-14  1:45   ` Martin KaFai Lau
  2022-12-13  2:35 ` [PATCH bpf-next v4 07/15] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Toke Høiland-Jørgensen

From: Toke Høiland-Jørgensen <toke@redhat.com>

Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
programs that consume HW metadata, implement support for propagating the
offload information. The extension program doesn't need to set a flag or
ifindex, it these will just be propagated from the target by the verifier.
We need to create a separate offload object for the extension program,
though, since it can be reattached to a different program later (which
means we can't just inhering the offload information from the target).

An additional check is added on attach that the new target is compatible
with the offload information in the extension prog.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h   |  7 ++++++
 kernel/bpf/offload.c  | 52 ++++++++++++++++++++++++++++---------------
 kernel/bpf/syscall.c  |  8 +++++++
 kernel/bpf/verifier.c | 19 +++++++++++-----
 4 files changed, 62 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index de6279725f41..ed288d18bf8d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2482,6 +2482,7 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
 void unpriv_ebpf_notify(int new_state);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
+int __bpf_prog_dev_bound_init(struct bpf_prog *prog, struct net_device *netdev);
 int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr);
 void bpf_dev_bound_netdev_unregister(struct net_device *dev);
 
@@ -2516,6 +2517,12 @@ void sock_map_unhash(struct sock *sk);
 void sock_map_destroy(struct sock *sk);
 void sock_map_close(struct sock *sk, long timeout);
 #else
+static inline int __bpf_prog_dev_bound_init(struct bpf_prog *prog,
+					    struct net_device *netdev)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int bpf_prog_dev_bound_init(struct bpf_prog *prog,
 					union bpf_attr *attr)
 {
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 3b6c9023f24d..0646dea1b0e1 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -188,17 +188,13 @@ static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
 	kfree(ondev);
 }
 
-int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr)
+int __bpf_prog_dev_bound_init(struct bpf_prog *prog, struct net_device *netdev)
 {
 	struct bpf_offload_netdev *ondev;
 	struct bpf_prog_offload *offload;
 	int err;
 
-	if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
-	    attr->prog_type != BPF_PROG_TYPE_XDP)
-		return -EINVAL;
-
-	if (attr->prog_flags & ~BPF_F_XDP_DEV_BOUND_ONLY)
+	if (!netdev)
 		return -EINVAL;
 
 	offload = kzalloc(sizeof(*offload), GFP_USER);
@@ -206,14 +202,7 @@ int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr)
 		return -ENOMEM;
 
 	offload->prog = prog;
-
-	offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
-					   attr->prog_ifindex);
-	err = bpf_dev_offload_check(offload->netdev);
-	if (err)
-		goto err_maybe_put;
-
-	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_DEV_BOUND_ONLY);
+	offload->netdev = netdev;
 
 	down_write(&bpf_devs_lock);
 	ondev = bpf_offload_find_netdev(offload->netdev);
@@ -236,19 +225,46 @@ int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr)
 	offload->offdev = ondev->offdev;
 	prog->aux->offload = offload;
 	list_add_tail(&offload->offloads, &ondev->progs);
-	dev_put(offload->netdev);
 	up_write(&bpf_devs_lock);
 
 	return 0;
 err_unlock:
 	up_write(&bpf_devs_lock);
-err_maybe_put:
-	if (offload->netdev)
-		dev_put(offload->netdev);
 	kfree(offload);
 	return err;
 }
 
+int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr)
+{
+	struct net_device *netdev;
+	int err;
+
+	if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
+	    attr->prog_type != BPF_PROG_TYPE_XDP)
+		return -EINVAL;
+
+	if (attr->prog_flags & ~BPF_F_XDP_DEV_BOUND_ONLY)
+		return -EINVAL;
+
+	netdev = dev_get_by_index(current->nsproxy->net_ns, attr->prog_ifindex);
+	if (!netdev)
+		return -EINVAL;
+
+	err = bpf_dev_offload_check(netdev);
+	if (err)
+		goto out;
+
+	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_DEV_BOUND_ONLY);
+
+	err = __bpf_prog_dev_bound_init(prog, netdev);
+	if (err)
+		goto out;
+
+out:
+	dev_put(netdev);
+	return err;
+}
+
 int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
 {
 	struct bpf_prog_offload *offload;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 11c558be4992..8686475f0dbe 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 			goto out_put_prog;
 		}
 
+		if (bpf_prog_is_dev_bound(prog->aux) &&
+		    (bpf_prog_is_offloaded(tgt_prog->aux) ||
+		     !bpf_prog_is_dev_bound(tgt_prog->aux) ||
+		     !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
+			err = -EINVAL;
+			goto out_put_prog;
+		}
+
 		key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
 	}
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e61fe0472b9b..5c6d6d61e57a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16524,11 +16524,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 	if (tgt_prog) {
 		struct bpf_prog_aux *aux = tgt_prog->aux;
 
-		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
-			bpf_log(log, "Replacing device-bound programs not supported\n");
-			return -EINVAL;
-		}
-
 		for (i = 0; i < aux->func_info_cnt; i++)
 			if (aux->func_info[i].type_id == btf_id) {
 				subprog = i;
@@ -16789,10 +16784,22 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	if (tgt_prog && prog->type == BPF_PROG_TYPE_EXT) {
 		/* to make freplace equivalent to their targets, they need to
 		 * inherit env->ops and expected_attach_type for the rest of the
-		 * verification
+		 * verification; we also need to propagate the prog offload data
+		 * for resolving kfuncs.
 		 */
 		env->ops = bpf_verifier_ops[tgt_prog->type];
 		prog->expected_attach_type = tgt_prog->expected_attach_type;
+
+		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
+			if (bpf_prog_is_offloaded(tgt_prog->aux))
+				return -EINVAL;
+
+			prog->aux->dev_bound = true;
+			ret = __bpf_prog_dev_bound_init(prog,
+							tgt_prog->aux->offload->netdev);
+			if (ret)
+				return ret;
+		}
 	}
 
 	/* store info about the attachment target that will be used later */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 07/15] veth: Introduce veth_xdp_buff wrapper for xdp_buff
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (5 preceding siblings ...)
  2022-12-13  2:35 ` [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-13  2:35 ` [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata Stanislav Fomichev
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

No functional changes. Boilerplate to allow stuffing more data after xdp_buff.

Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/veth.c | 56 +++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index ac7c0653695f..04ffd8cb2945 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -116,6 +116,10 @@ static struct {
 	{ "peer_ifindex" },
 };
 
+struct veth_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 static int veth_get_link_ksettings(struct net_device *dev,
 				   struct ethtool_link_ksettings *cmd)
 {
@@ -592,23 +596,24 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (likely(xdp_prog)) {
-		struct xdp_buff xdp;
+		struct veth_xdp_buff vxbuf;
+		struct xdp_buff *xdp = &vxbuf.xdp;
 		u32 act;
 
-		xdp_convert_frame_to_buff(frame, &xdp);
-		xdp.rxq = &rq->xdp_rxq;
+		xdp_convert_frame_to_buff(frame, xdp);
+		xdp->rxq = &rq->xdp_rxq;
 
-		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+		act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 		switch (act) {
 		case XDP_PASS:
-			if (xdp_update_frame_from_buff(&xdp, frame))
+			if (xdp_update_frame_from_buff(xdp, frame))
 				goto err_xdp;
 			break;
 		case XDP_TX:
 			orig_frame = *frame;
-			xdp.rxq->mem = frame->mem;
-			if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
+			xdp->rxq->mem = frame->mem;
+			if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 				trace_xdp_exception(rq->dev, xdp_prog, act);
 				frame = &orig_frame;
 				stats->rx_drops++;
@@ -619,8 +624,8 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			orig_frame = *frame;
-			xdp.rxq->mem = frame->mem;
-			if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
+			xdp->rxq->mem = frame->mem;
+			if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 				frame = &orig_frame;
 				stats->rx_drops++;
 				goto err_xdp;
@@ -801,7 +806,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 {
 	void *orig_data, *orig_data_end;
 	struct bpf_prog *xdp_prog;
-	struct xdp_buff xdp;
+	struct veth_xdp_buff vxbuf;
+	struct xdp_buff *xdp = &vxbuf.xdp;
 	u32 act, metalen;
 	int off;
 
@@ -815,22 +821,22 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	}
 
 	__skb_push(skb, skb->data - skb_mac_header(skb));
-	if (veth_convert_skb_to_xdp_buff(rq, &xdp, &skb))
+	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
 		goto drop;
 
-	orig_data = xdp.data;
-	orig_data_end = xdp.data_end;
+	orig_data = xdp->data;
+	orig_data_end = xdp->data_end;
 
-	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	switch (act) {
 	case XDP_PASS:
 		break;
 	case XDP_TX:
-		veth_xdp_get(&xdp);
+		veth_xdp_get(xdp);
 		consume_skb(skb);
-		xdp.rxq->mem = rq->xdp_mem;
-		if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
+		xdp->rxq->mem = rq->xdp_mem;
+		if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 			trace_xdp_exception(rq->dev, xdp_prog, act);
 			stats->rx_drops++;
 			goto err_xdp;
@@ -839,10 +845,10 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		rcu_read_unlock();
 		goto xdp_xmit;
 	case XDP_REDIRECT:
-		veth_xdp_get(&xdp);
+		veth_xdp_get(xdp);
 		consume_skb(skb);
-		xdp.rxq->mem = rq->xdp_mem;
-		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
+		xdp->rxq->mem = rq->xdp_mem;
+		if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 			stats->rx_drops++;
 			goto err_xdp;
 		}
@@ -862,7 +868,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	/* check if bpf_xdp_adjust_head was used */
-	off = orig_data - xdp.data;
+	off = orig_data - xdp->data;
 	if (off > 0)
 		__skb_push(skb, off);
 	else if (off < 0)
@@ -871,21 +877,21 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	skb_reset_mac_header(skb);
 
 	/* check if bpf_xdp_adjust_tail was used */
-	off = xdp.data_end - orig_data_end;
+	off = xdp->data_end - orig_data_end;
 	if (off != 0)
 		__skb_put(skb, off); /* positive on grow, negative on shrink */
 
 	/* XDP frag metadata (e.g. nr_frags) are updated in eBPF helpers
 	 * (e.g. bpf_xdp_adjust_tail), we need to update data_len here.
 	 */
-	if (xdp_buff_has_frags(&xdp))
+	if (xdp_buff_has_frags(xdp))
 		skb->data_len = skb_shinfo(skb)->xdp_frags_size;
 	else
 		skb->data_len = 0;
 
 	skb->protocol = eth_type_trans(skb, rq->dev);
 
-	metalen = xdp.data - xdp.data_meta;
+	metalen = xdp->data - xdp->data_meta;
 	if (metalen)
 		skb_metadata_set(skb, metalen);
 out:
@@ -898,7 +904,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	return NULL;
 err_xdp:
 	rcu_read_unlock();
-	xdp_return_buff(&xdp);
+	xdp_return_buff(xdp);
 xdp_xmit:
 	return NULL;
 }
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (6 preceding siblings ...)
  2022-12-13  2:35 ` [PATCH bpf-next v4 07/15] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-13 15:55   ` Jesper Dangaard Brouer
  2022-12-13  2:35 ` [PATCH bpf-next v4 09/15] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

The goal is to enable end-to-end testing of the metadata for AF_XDP.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/veth.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 04ffd8cb2945..d5491e7a2798 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -118,6 +118,7 @@ static struct {
 
 struct veth_xdp_buff {
 	struct xdp_buff xdp;
+	struct sk_buff *skb;
 };
 
 static int veth_get_link_ksettings(struct net_device *dev,
@@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 
 		xdp_convert_frame_to_buff(frame, xdp);
 		xdp->rxq = &rq->xdp_rxq;
+		vxbuf.skb = NULL;
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
 
@@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
 		goto drop;
+	vxbuf.skb = skb;
 
 	orig_data = xdp->data;
 	orig_data_end = xdp->data_end;
@@ -1601,6 +1604,21 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
+static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
+{
+	*timestamp = ktime_get_mono_fast_ns();
+	return 0;
+}
+
+static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
+{
+	struct veth_xdp_buff *_ctx = (void *)ctx;
+
+	if (_ctx->skb)
+		*hash = skb_get_hash(_ctx->skb);
+	return 0;
+}
+
 static const struct net_device_ops veth_netdev_ops = {
 	.ndo_init            = veth_dev_init,
 	.ndo_open            = veth_open,
@@ -1622,6 +1640,11 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_get_peer_dev	= veth_peer_dev,
 };
 
+static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
+	.xmo_rx_timestamp		= veth_xdp_rx_timestamp,
+	.xmo_rx_hash			= veth_xdp_rx_hash,
+};
+
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
 		       NETIF_F_RXCSUM | NETIF_F_SCTP_CRC | NETIF_F_HIGHDMA | \
 		       NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ENCAP_ALL | \
@@ -1638,6 +1661,7 @@ static void veth_setup(struct net_device *dev)
 	dev->priv_flags |= IFF_PHONY_HEADROOM;
 
 	dev->netdev_ops = &veth_netdev_ops;
+	dev->xdp_metadata_ops = &veth_xdp_metadata_ops;
 	dev->ethtool_ops = &veth_ethtool_ops;
 	dev->features |= NETIF_F_LLTX;
 	dev->features |= VETH_FEATURES;
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 09/15] selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (7 preceding siblings ...)
  2022-12-13  2:35 ` [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-13  2:35 ` Stanislav Fomichev
  2022-12-13  2:36 ` [PATCH bpf-next v4 10/15] net/mlx4_en: Introduce wrapper for xdp_buff Stanislav Fomichev
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:35 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

- create new netns
- create veth pair (veTX+veRX)
- setup AF_XDP socket for both interfaces
- attach bpf to veRX
- send packet via veTX
- verify the packet has expected metadata at veRX

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/Makefile          |   2 +-
 .../selftests/bpf/prog_tests/xdp_metadata.c   | 412 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_metadata.c        |  56 +++
 .../selftests/bpf/progs/xdp_metadata2.c       |  23 +
 tools/testing/selftests/bpf/xdp_metadata.h    |  15 +
 5 files changed, 507 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata2.c
 create mode 100644 tools/testing/selftests/bpf/xdp_metadata.h

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index c22c43bbee19..e6cbc04a7920 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -527,7 +527,7 @@ TRUNNER_BPF_PROGS_DIR := progs
 TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c	\
 			 network_helpers.c testing_helpers.c		\
 			 btf_helpers.c flow_dissector_load.h		\
-			 cap_helpers.c test_loader.c
+			 cap_helpers.c test_loader.c xsk.c
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
 		       $(OUTPUT)/liburandom_read.so			\
 		       $(OUTPUT)/xdp_synproxy				\
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
new file mode 100644
index 000000000000..55f4c8cfba66
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -0,0 +1,412 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "xdp_metadata.skel.h"
+#include "xdp_metadata2.skel.h"
+#include "xdp_metadata.h"
+#include "xsk.h"
+
+#include <bpf/btf.h>
+#include <linux/errqueue.h>
+#include <linux/if_link.h>
+#include <linux/net_tstamp.h>
+#include <linux/udp.h>
+#include <sys/mman.h>
+#include <net/if.h>
+#include <poll.h>
+
+#define TX_NAME "veTX"
+#define RX_NAME "veRX"
+
+#define UDP_PAYLOAD_BYTES 4
+
+#define AF_XDP_SOURCE_PORT 1234
+#define AF_XDP_CONSUMER_PORT 8080
+
+#define UMEM_NUM 16
+#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM)
+#define XDP_FLAGS XDP_FLAGS_DRV_MODE
+#define QUEUE_ID 0
+
+#define TX_ADDR "10.0.0.1"
+#define RX_ADDR "10.0.0.2"
+#define PREFIX_LEN "8"
+#define FAMILY AF_INET
+
+#define SYS(cmd) ({ \
+	if (!ASSERT_OK(system(cmd), (cmd))) \
+		goto out; \
+})
+
+struct xsk {
+	void *umem_area;
+	struct xsk_umem *umem;
+	struct xsk_ring_prod fill;
+	struct xsk_ring_cons comp;
+	struct xsk_ring_prod tx;
+	struct xsk_ring_cons rx;
+	struct xsk_socket *socket;
+};
+
+static int open_xsk(const char *ifname, struct xsk *xsk)
+{
+	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	const struct xsk_socket_config socket_config = {
+		.rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
+		.xdp_flags = XDP_FLAGS,
+		.bind_flags = XDP_COPY,
+	};
+	const struct xsk_umem_config umem_config = {
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
+	};
+	__u32 idx;
+	u64 addr;
+	int ret;
+	int i;
+
+	xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (!ASSERT_NEQ(xsk->umem_area, MAP_FAILED, "mmap"))
+		return -1;
+
+	ret = xsk_umem__create(&xsk->umem,
+			       xsk->umem_area, UMEM_SIZE,
+			       &xsk->fill,
+			       &xsk->comp,
+			       &umem_config);
+	if (!ASSERT_OK(ret, "xsk_umem__create"))
+		return ret;
+
+	ret = xsk_socket__create(&xsk->socket, ifname, QUEUE_ID,
+				 xsk->umem,
+				 &xsk->rx,
+				 &xsk->tx,
+				 &socket_config);
+	if (!ASSERT_OK(ret, "xsk_socket__create"))
+		return ret;
+
+	/* First half of umem is for TX. This way address matches 1-to-1
+	 * to the completion queue index.
+	 */
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = i * UMEM_FRAME_SIZE;
+		printf("%p: tx_desc[%d] -> %lx\n", xsk, i, addr);
+	}
+
+	/* Second half of umem is for RX. */
+
+	ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx);
+	if (!ASSERT_EQ(UMEM_NUM / 2, ret, "xsk_ring_prod__reserve"))
+		return ret;
+	if (!ASSERT_EQ(idx, 0, "fill idx != 0"))
+		return -1;
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE;
+		printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, i) = addr;
+	}
+	xsk_ring_prod__submit(&xsk->fill, ret);
+
+	return 0;
+}
+
+static void close_xsk(struct xsk *xsk)
+{
+	if (xsk->umem)
+		xsk_umem__delete(xsk->umem);
+	if (xsk->socket)
+		xsk_socket__delete(xsk->socket);
+	munmap(xsk->umem, UMEM_SIZE);
+}
+
+static void ip_csum(struct iphdr *iph)
+{
+	__u32 sum = 0;
+	__u16 *p;
+	int i;
+
+	iph->check = 0;
+	p = (void *)iph;
+	for (i = 0; i < sizeof(*iph) / sizeof(*p); i++)
+		sum += p[i];
+
+	while (sum >> 16)
+		sum = (sum & 0xffff) + (sum >> 16);
+
+	iph->check = ~sum;
+}
+
+static int generate_packet(struct xsk *xsk, __u16 dst_port)
+{
+	struct xdp_desc *tx_desc;
+	struct udphdr *udph;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	void *data;
+	__u32 idx;
+	int ret;
+
+	ret = xsk_ring_prod__reserve(&xsk->tx, 1, &idx);
+	if (!ASSERT_EQ(ret, 1, "xsk_ring_prod__reserve"))
+		return -1;
+
+	tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx);
+	tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE;
+	printf("%p: tx_desc[%u]->addr=%llx\n", xsk, idx, tx_desc->addr);
+	data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr);
+
+	eth = data;
+	iph = (void *)(eth + 1);
+	udph = (void *)(iph + 1);
+
+	memcpy(eth->h_dest, "\x00\x00\x00\x00\x00\x02", ETH_ALEN);
+	memcpy(eth->h_source, "\x00\x00\x00\x00\x00\x01", ETH_ALEN);
+	eth->h_proto = htons(ETH_P_IP);
+
+	iph->version = 0x4;
+	iph->ihl = 0x5;
+	iph->tos = 0x9;
+	iph->tot_len = htons(sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES);
+	iph->id = 0;
+	iph->frag_off = 0;
+	iph->ttl = 0;
+	iph->protocol = IPPROTO_UDP;
+	ASSERT_EQ(inet_pton(FAMILY, TX_ADDR, &iph->saddr), 1, "inet_pton(TX_ADDR)");
+	ASSERT_EQ(inet_pton(FAMILY, RX_ADDR, &iph->daddr), 1, "inet_pton(RX_ADDR)");
+	ip_csum(iph);
+
+	udph->source = htons(AF_XDP_SOURCE_PORT);
+	udph->dest = htons(dst_port);
+	udph->len = htons(sizeof(*udph) + UDP_PAYLOAD_BYTES);
+	udph->check = 0;
+
+	memset(udph + 1, 0xAA, UDP_PAYLOAD_BYTES);
+
+	tx_desc->len = sizeof(*eth) + sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES;
+	xsk_ring_prod__submit(&xsk->tx, 1);
+
+	ret = sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0);
+	if (!ASSERT_GE(ret, 0, "sendto"))
+		return ret;
+
+	return 0;
+}
+
+static void complete_tx(struct xsk *xsk)
+{
+	__u32 idx;
+	__u64 addr;
+
+	if (ASSERT_EQ(xsk_ring_cons__peek(&xsk->comp, 1, &idx), 1, "xsk_ring_cons__peek")) {
+		addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx);
+
+		printf("%p: refill idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static void refill_rx(struct xsk *xsk, __u64 addr)
+{
+	__u32 idx;
+
+	if (ASSERT_EQ(xsk_ring_prod__reserve(&xsk->fill, 1, &idx), 1, "xsk_ring_prod__reserve")) {
+		printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static int verify_xsk_metadata(struct xsk *xsk)
+{
+	const struct xdp_desc *rx_desc;
+	struct pollfd fds = {};
+	struct xdp_meta *meta;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	__u64 comp_addr;
+	void *data;
+	__u64 addr;
+	__u32 idx;
+	int ret;
+
+	ret = recvfrom(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, NULL);
+	if (!ASSERT_EQ(ret, 0, "recvfrom"))
+		return -1;
+
+	fds.fd = xsk_socket__fd(xsk->socket);
+	fds.events = POLLIN;
+
+	ret = poll(&fds, 1, 1000);
+	if (!ASSERT_GT(ret, 0, "poll"))
+		return -1;
+
+	ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx);
+	if (!ASSERT_EQ(ret, 1, "xsk_ring_cons__peek"))
+		return -2;
+
+	rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx);
+	comp_addr = xsk_umem__extract_addr(rx_desc->addr);
+	addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
+	printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
+	       xsk, idx, rx_desc->addr, addr, comp_addr);
+	data = xsk_umem__get_data(xsk->umem_area, addr);
+
+	/* Make sure we got the packet offset correctly. */
+
+	eth = data;
+	ASSERT_EQ(eth->h_proto, htons(ETH_P_IP), "eth->h_proto");
+	iph = (void *)(eth + 1);
+	ASSERT_EQ((int)iph->version, 4, "iph->version");
+
+	/* custom metadata */
+
+	meta = data - sizeof(struct xdp_meta);
+
+	if (!ASSERT_NEQ(meta->rx_timestamp, 0, "rx_timestamp"))
+		return -1;
+
+	if (!ASSERT_NEQ(meta->rx_hash, 0, "rx_hash"))
+		return -1;
+
+	xsk_ring_cons__release(&xsk->rx, 1);
+	refill_rx(xsk, comp_addr);
+
+	return 0;
+}
+
+void test_xdp_metadata(void)
+{
+	struct xdp_metadata2 *bpf_obj2 = NULL;
+	struct xdp_metadata *bpf_obj = NULL;
+	struct bpf_program *new_prog, *prog;
+	struct nstoken *tok = NULL;
+	__u32 queue_id = QUEUE_ID;
+	struct bpf_map *prog_arr;
+	struct xsk tx_xsk = {};
+	struct xsk rx_xsk = {};
+	__u32 val, key = 0;
+	int retries = 10;
+	int rx_ifindex;
+	int sock_fd;
+	int ret;
+
+	/* Setup new networking namespace, with a veth pair. */
+
+	SYS("ip netns add xdp_metadata");
+	tok = open_netns("xdp_metadata");
+	SYS("ip link add numtxqueues 1 numrxqueues 1 " TX_NAME
+	    " type veth peer " RX_NAME " numtxqueues 1 numrxqueues 1");
+	SYS("ip link set dev " TX_NAME " address 00:00:00:00:00:01");
+	SYS("ip link set dev " RX_NAME " address 00:00:00:00:00:02");
+	SYS("ip link set dev " TX_NAME " up");
+	SYS("ip link set dev " RX_NAME " up");
+	SYS("ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
+	SYS("ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+
+	rx_ifindex = if_nametoindex(RX_NAME);
+
+	/* Setup separate AF_XDP for TX and RX interfaces. */
+
+	ret = open_xsk(TX_NAME, &tx_xsk);
+	if (!ASSERT_OK(ret, "open_xsk(TX_NAME)"))
+		goto out;
+
+	ret = open_xsk(RX_NAME, &rx_xsk);
+	if (!ASSERT_OK(ret, "open_xsk(RX_NAME)"))
+		goto out;
+
+	bpf_obj = xdp_metadata__open();
+	if (!ASSERT_OK_PTR(bpf_obj, "open skeleton"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx");
+	bpf_program__set_ifindex(prog, rx_ifindex);
+	bpf_program__set_flags(prog, BPF_F_XDP_DEV_BOUND_ONLY);
+
+	if (!ASSERT_OK(xdp_metadata__load(bpf_obj), "load skeleton"))
+		goto out;
+
+	/* Make sure we can't add dev-bound programs to prog maps. */
+	prog_arr = bpf_object__find_map_by_name(bpf_obj->obj, "prog_arr");
+	if (!ASSERT_OK_PTR(prog_arr, "no prog_arr map"))
+		goto out;
+
+	val = bpf_program__fd(prog);
+	if (!ASSERT_ERR(bpf_map__update_elem(prog_arr, &key, sizeof(key),
+					     &val, sizeof(val), BPF_ANY),
+			"update prog_arr"))
+		goto out;
+
+	/* Attach BPF program to RX interface. */
+
+	ret = bpf_xdp_attach(rx_ifindex,
+			     bpf_program__fd(bpf_obj->progs.rx),
+			     XDP_FLAGS, NULL);
+	if (!ASSERT_GE(ret, 0, "bpf_xdp_attach"))
+		goto out;
+
+	sock_fd = xsk_socket__fd(rx_xsk.socket);
+	ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0);
+	if (!ASSERT_GE(ret, 0, "bpf_map_update_elem"))
+		goto out;
+
+	/* Send packet destined to RX AF_XDP socket. */
+	if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0,
+		       "generate AF_XDP_CONSUMER_PORT"))
+		goto out;
+
+	/* Verify AF_XDP RX packet has proper metadata. */
+	if (!ASSERT_GE(verify_xsk_metadata(&rx_xsk), 0,
+		       "verify_xsk_metadata"))
+		goto out;
+
+	complete_tx(&tx_xsk);
+
+	/* Make sure freplace correctly picks up original bound device
+	 * and doesn't crash.
+	 */
+
+	bpf_obj2 = xdp_metadata2__open();
+	if (!ASSERT_OK_PTR(bpf_obj2, "open skeleton"))
+		goto out;
+
+	new_prog = bpf_object__find_program_by_name(bpf_obj2->obj, "freplace_rx");
+	bpf_program__set_attach_target(new_prog, bpf_program__fd(prog), "rx");
+
+	if (!ASSERT_OK(xdp_metadata2__load(bpf_obj2), "load freplace skeleton"))
+		goto out;
+
+	if (!ASSERT_OK(xdp_metadata2__attach(bpf_obj2), "attach freplace"))
+		goto out;
+
+	/* Send packet to trigger . */
+	if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0,
+		       "generate freplace packet"))
+		goto out;
+
+	while (!retries--) {
+		if (bpf_obj2->bss->called)
+			break;
+		usleep(10);
+	}
+	ASSERT_GT(bpf_obj2->bss->called, 0, "not called");
+
+out:
+	close_xsk(&rx_xsk);
+	close_xsk(&tx_xsk);
+	if (bpf_obj2)
+		xdp_metadata2__destroy(bpf_obj2);
+	if (bpf_obj)
+		xdp_metadata__destroy(bpf_obj);
+	system("ip netns del xdp_metadata");
+	if (tok)
+		close_netns(tok);
+}
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
new file mode 100644
index 000000000000..2f5eb39ea4bc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include "xdp_metadata.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 4);
+	__type(key, __u32);
+	__type(value, __u32);
+} xsk SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u32);
+} prog_arr SEC(".maps");
+
+extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
+					 __u64 *timestamp) __ksym;
+extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx,
+				    __u32 *hash) __ksym;
+
+SEC("xdp")
+int rx(struct xdp_md *ctx)
+{
+	void *data, *data_meta;
+	struct xdp_meta *meta;
+	int ret;
+
+	/* Reserve enough for all custom metadata. */
+
+	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	if (ret != 0)
+		return XDP_DROP;
+
+	data = (void *)(long)ctx->data;
+	data_meta = (void *)(long)ctx->data_meta;
+
+	if (data_meta + sizeof(struct xdp_meta) > data)
+		return XDP_DROP;
+
+	meta = data_meta;
+
+	/* Export metadata. */
+
+	bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
+	bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash);
+
+	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata2.c b/tools/testing/selftests/bpf/progs/xdp_metadata2.c
new file mode 100644
index 000000000000..cf69d05451c3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata2.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include "xdp_metadata.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx,
+				    __u32 *hash) __ksym;
+
+int called;
+
+SEC("freplace/rx")
+int freplace_rx(struct xdp_md *ctx)
+{
+	u32 hash = 0;
+	/* Call _any_ metadata function to make sure we don't crash. */
+	bpf_xdp_metadata_rx_hash(ctx, &hash);
+	called++;
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
new file mode 100644
index 000000000000..f6780fbb0a21
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#pragma once
+
+#ifndef ETH_P_IP
+#define ETH_P_IP 0x0800
+#endif
+
+#ifndef ETH_P_IPV6
+#define ETH_P_IPV6 0x86DD
+#endif
+
+struct xdp_meta {
+	__u64 rx_timestamp;
+	__u32 rx_hash;
+};
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 10/15] net/mlx4_en: Introduce wrapper for xdp_buff
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (8 preceding siblings ...)
  2022-12-13  2:35 ` [PATCH bpf-next v4 09/15] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
@ 2022-12-13  2:36 ` Stanislav Fomichev
  2022-12-13  8:56   ` Tariq Toukan
  2022-12-13  2:36 ` [PATCH bpf-next v4 11/15] net/mlx4_en: Support RX XDP metadata Stanislav Fomichev
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:36 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

No functional changes. Boilerplate to allow stuffing more data after xdp_buff.

Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 8f762fc170b3..014a80af2813 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -661,9 +661,14 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
 #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
 #endif
 
+struct mlx4_en_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_xdp_buff mxbuf = {};
 	int factor = priv->cqe_factor;
 	struct mlx4_en_rx_ring *ring;
 	struct bpf_prog *xdp_prog;
@@ -671,7 +676,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	bool doorbell_pending;
 	bool xdp_redir_flush;
 	struct mlx4_cqe *cqe;
-	struct xdp_buff xdp;
 	int polled = 0;
 	int index;
 
@@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	ring = priv->rx_ring[cq_ring];
 
 	xdp_prog = rcu_dereference_bh(ring->xdp_prog);
-	xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
+	xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
 	doorbell_pending = false;
 	xdp_redir_flush = false;
 
@@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						priv->frag_info[0].frag_size,
 						DMA_FROM_DEVICE);
 
-			xdp_prepare_buff(&xdp, va - frags[0].page_offset,
+			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
 					 frags[0].page_offset, length, false);
-			orig_data = xdp.data;
+			orig_data = mxbuf.xdp.data;
 
-			act = bpf_prog_run_xdp(xdp_prog, &xdp);
+			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
 
-			length = xdp.data_end - xdp.data;
-			if (xdp.data != orig_data) {
-				frags[0].page_offset = xdp.data -
-					xdp.data_hard_start;
-				va = xdp.data;
+			length = mxbuf.xdp.data_end - mxbuf.xdp.data;
+			if (mxbuf.xdp.data != orig_data) {
+				frags[0].page_offset = mxbuf.xdp.data -
+					mxbuf.xdp.data_hard_start;
+				va = mxbuf.xdp.data;
 			}
 
 			switch (act) {
 			case XDP_PASS:
 				break;
 			case XDP_REDIRECT:
-				if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) {
+				if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
 					ring->xdp_redirect++;
 					xdp_redir_flush = true;
 					frags[0].page = NULL;
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 11/15] net/mlx4_en: Support RX XDP metadata
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (9 preceding siblings ...)
  2022-12-13  2:36 ` [PATCH bpf-next v4 10/15] net/mlx4_en: Introduce wrapper for xdp_buff Stanislav Fomichev
@ 2022-12-13  2:36 ` Stanislav Fomichev
  2022-12-13  8:56   ` Tariq Toukan
  2022-12-13  2:36 ` [PATCH bpf-next v4 12/15] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:36 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

RX timestamp and hash for now. Tested using the prog from the next
patch.

Also enabling xdp metadata support; don't see why it's disabled,
there is enough headroom..

Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_clock.c | 13 +++++---
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  6 ++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 33 ++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |  5 +++
 4 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 98b5ffb4d729..9e3b76182088 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -58,9 +58,7 @@ u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
 	return hi | lo;
 }
 
-void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
-			    struct skb_shared_hwtstamps *hwts,
-			    u64 timestamp)
+u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp)
 {
 	unsigned int seq;
 	u64 nsec;
@@ -70,8 +68,15 @@ void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
 		nsec = timecounter_cyc2time(&mdev->clock, timestamp);
 	} while (read_seqretry(&mdev->clock_lock, seq));
 
+	return ns_to_ktime(nsec);
+}
+
+void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
+			    struct skb_shared_hwtstamps *hwts,
+			    u64 timestamp)
+{
 	memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
-	hwts->hwtstamp = ns_to_ktime(nsec);
+	hwts->hwtstamp = mlx4_en_get_hwtstamp(mdev, timestamp);
 }
 
 /**
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 8800d3f1f55c..af4c4858f397 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2889,6 +2889,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
 	.ndo_bpf		= mlx4_xdp,
 };
 
+static const struct xdp_metadata_ops mlx4_xdp_metadata_ops = {
+	.xmo_rx_timestamp		= mlx4_en_xdp_rx_timestamp,
+	.xmo_rx_hash			= mlx4_en_xdp_rx_hash,
+};
+
 struct mlx4_en_bond {
 	struct work_struct work;
 	struct mlx4_en_priv *priv;
@@ -3310,6 +3315,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 		dev->netdev_ops = &mlx4_netdev_ops_master;
 	else
 		dev->netdev_ops = &mlx4_netdev_ops;
+	dev->xdp_metadata_ops = &mlx4_xdp_metadata_ops;
 	dev->watchdog_timeo = MLX4_EN_WATCHDOG_TIMEOUT;
 	netif_set_real_num_tx_queues(dev, priv->tx_ring_num[TX]);
 	netif_set_real_num_rx_queues(dev, priv->rx_ring_num);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 014a80af2813..0869d4fff17b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -663,8 +663,35 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
 
 struct mlx4_en_xdp_buff {
 	struct xdp_buff xdp;
+	struct mlx4_cqe *cqe;
+	struct mlx4_en_dev *mdev;
+	struct mlx4_en_rx_ring *ring;
+	struct net_device *dev;
 };
 
+int mlx4_en_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
+{
+	struct mlx4_en_xdp_buff *_ctx = (void *)ctx;
+
+	if (unlikely(_ctx->ring->hwtstamp_rx_filter != HWTSTAMP_FILTER_ALL))
+		return -EOPNOTSUPP;
+
+	*timestamp = mlx4_en_get_hwtstamp(_ctx->mdev,
+					  mlx4_en_get_cqe_ts(_ctx->cqe));
+	return 0;
+}
+
+int mlx4_en_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
+{
+	struct mlx4_en_xdp_buff *_ctx = (void *)ctx;
+
+	if (unlikely(!(_ctx->dev->features & NETIF_F_RXHASH)))
+		return -EOPNOTSUPP;
+
+	*hash = be32_to_cpu(_ctx->cqe->immed_rss_invalid);
+	return 0;
+}
+
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -781,8 +808,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						DMA_FROM_DEVICE);
 
 			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
-					 frags[0].page_offset, length, false);
+					 frags[0].page_offset, length, true);
 			orig_data = mxbuf.xdp.data;
+			mxbuf.cqe = cqe;
+			mxbuf.mdev = priv->mdev;
+			mxbuf.ring = ring;
+			mxbuf.dev = dev;
 
 			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index e132ff4c82f2..2f8ef0b30083 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -788,10 +788,15 @@ void mlx4_en_update_pfc_stats_bitmap(struct mlx4_dev *dev,
 int mlx4_en_netdev_event(struct notifier_block *this,
 			 unsigned long event, void *ptr);
 
+struct xdp_md;
+int mlx4_en_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp);
+int mlx4_en_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash);
+
 /*
  * Functions for time stamping
  */
 u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe);
+u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp);
 void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
 			    struct skb_shared_hwtstamps *hwts,
 			    u64 timestamp);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 12/15] xsk: Add cb area to struct xdp_buff_xsk
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (10 preceding siblings ...)
  2022-12-13  2:36 ` [PATCH bpf-next v4 11/15] net/mlx4_en: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-13  2:36 ` Stanislav Fomichev
  2022-12-13  2:36 ` [PATCH bpf-next v4 13/15] net/mlx5e: Introduce wrapper for xdp_buff Stanislav Fomichev
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:36 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Toke Høiland-Jørgensen,
	David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

From: Toke Høiland-Jørgensen <toke@redhat.com>

Add an area after the xdp_buff in struct xdp_buff_xsk that drivers can use
to stash extra information to use in metadata kfuncs. The maximum size of
24 bytes means the full xdp_buff_xsk structure will take up exactly two
cache lines (with the cb field spanning both). Also add a macro drivers can
use to check their own wrapping structs against the available size.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/net/xsk_buff_pool.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index f787c3f524b0..3e952e569418 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -19,8 +19,11 @@ struct xdp_sock;
 struct device;
 struct page;
 
+#define XSK_PRIV_MAX 24
+
 struct xdp_buff_xsk {
 	struct xdp_buff xdp;
+	u8 cb[XSK_PRIV_MAX];
 	dma_addr_t dma;
 	dma_addr_t frame_dma;
 	struct xsk_buff_pool *pool;
@@ -28,6 +31,8 @@ struct xdp_buff_xsk {
 	struct list_head free_list_node;
 };
 
+#define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb))
+
 struct xsk_dma_map {
 	dma_addr_t *dma_pages;
 	struct device *dev;
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 13/15] net/mlx5e: Introduce wrapper for xdp_buff
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (11 preceding siblings ...)
  2022-12-13  2:36 ` [PATCH bpf-next v4 12/15] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
@ 2022-12-13  2:36 ` Stanislav Fomichev
  2022-12-13  2:36 ` [PATCH bpf-next v4 14/15] net/mlx5e: Support RX XDP metadata Stanislav Fomichev
  2022-12-13  2:36 ` [PATCH bpf-next v4 15/15] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
  14 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:36 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Toke Høiland-Jørgensen,
	Saeed Mahameed, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

From: Toke Høiland-Jørgensen <toke@redhat.com>

Preparation for implementing HW metadata kfuncs. No functional change.

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |  6 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 25 +++++----
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 56 +++++++++----------
 5 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ff5b302531d5..be86a599df97 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -469,6 +469,7 @@ struct mlx5e_txqsq {
 union mlx5e_alloc_unit {
 	struct page *page;
 	struct xdp_buff *xsk;
+	struct mlx5e_xdp_buff *mxbuf;
 };
 
 /* XDP packets can be transmitted in different ways. On completion, we need to
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 20507ef2f956..31bb6806bf5d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -158,8 +158,9 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
 
 /* returns true if packet was consumed by xdp */
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
-		      struct bpf_prog *prog, struct xdp_buff *xdp)
+		      struct bpf_prog *prog, struct mlx5e_xdp_buff *mxbuf)
 {
+	struct xdp_buff *xdp = &mxbuf->xdp;
 	u32 act;
 	int err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index bc2d9034af5b..389818bf6833 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -44,10 +44,14 @@
 	(MLX5E_XDP_INLINE_WQE_MAX_DS_CNT * MLX5_SEND_WQE_DS - \
 	 sizeof(struct mlx5_wqe_inline_seg))
 
+struct mlx5e_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 struct mlx5e_xsk_param;
 int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk);
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
-		      struct bpf_prog *prog, struct xdp_buff *xdp);
+		      struct bpf_prog *prog, struct mlx5e_xdp_buff *mlctx);
 void mlx5e_xdp_mpwqe_complete(struct mlx5e_xdpsq *sq);
 bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq);
 void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index c91b54d9ff27..9cff82d764e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -22,6 +22,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		goto err;
 
 	BUILD_BUG_ON(sizeof(wi->alloc_units[0]) != sizeof(wi->alloc_units[0].xsk));
+	XSK_CHECK_PRIV_TYPE(struct mlx5e_xdp_buff);
 	batch = xsk_buff_alloc_batch(rq->xsk_pool, (struct xdp_buff **)wi->alloc_units,
 				     rq->mpwqe.pages_per_wqe);
 
@@ -233,7 +234,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    u32 head_offset,
 						    u32 page_idx)
 {
-	struct xdp_buff *xdp = wi->alloc_units[page_idx].xsk;
+	struct mlx5e_xdp_buff *mxbuf = wi->alloc_units[page_idx].mxbuf;
 	struct bpf_prog *prog;
 
 	/* Check packet size. Note LRO doesn't use linear SKB */
@@ -249,9 +250,9 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(head_offset);
 
-	xsk_buff_set_size(xdp, cqe_bcnt);
-	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
-	net_prefetch(xdp->data);
+	xsk_buff_set_size(&mxbuf->xdp, cqe_bcnt);
+	xsk_buff_dma_sync_for_cpu(&mxbuf->xdp, rq->xsk_pool);
+	net_prefetch(mxbuf->xdp.data);
 
 	/* Possible flows:
 	 * - XDP_REDIRECT to XSKMAP:
@@ -269,7 +270,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	 */
 
 	prog = rcu_dereference(rq->xdp_prog);
-	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, xdp))) {
+	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, mxbuf))) {
 		if (likely(__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)))
 			__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
 		return NULL; /* page/packet was consumed by XDP */
@@ -278,14 +279,14 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	/* XDP_PASS: copy the data from the UMEM to a new SKB and reuse the
 	 * frame. On SKB allocation failure, NULL is returned.
 	 */
-	return mlx5e_xsk_construct_skb(rq, xdp);
+	return mlx5e_xsk_construct_skb(rq, &mxbuf->xdp);
 }
 
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
 					      u32 cqe_bcnt)
 {
-	struct xdp_buff *xdp = wi->au->xsk;
+	struct mlx5e_xdp_buff *mxbuf = wi->au->mxbuf;
 	struct bpf_prog *prog;
 
 	/* wi->offset is not used in this function, because xdp->data and the
@@ -295,17 +296,17 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(wi->offset);
 
-	xsk_buff_set_size(xdp, cqe_bcnt);
-	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
-	net_prefetch(xdp->data);
+	xsk_buff_set_size(&mxbuf->xdp, cqe_bcnt);
+	xsk_buff_dma_sync_for_cpu(&mxbuf->xdp, rq->xsk_pool);
+	net_prefetch(mxbuf->xdp.data);
 
 	prog = rcu_dereference(rq->xdp_prog);
-	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, xdp)))
+	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, mxbuf)))
 		return NULL; /* page/packet was consumed by XDP */
 
 	/* XDP_PASS: copy the data from the UMEM to a new SKB. The frame reuse
 	 * will be handled by mlx5e_free_rx_wqe.
 	 * On SKB allocation failure, NULL is returned.
 	 */
-	return mlx5e_xsk_construct_skb(rq, xdp);
+	return mlx5e_xsk_construct_skb(rq, &mxbuf->xdp);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b1ea0b995d9c..9785b58ff3bc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1565,10 +1565,10 @@ struct sk_buff *mlx5e_build_linear_skb(struct mlx5e_rq *rq, void *va,
 }
 
 static void mlx5e_fill_xdp_buff(struct mlx5e_rq *rq, void *va, u16 headroom,
-				u32 len, struct xdp_buff *xdp)
+				u32 len, struct mlx5e_xdp_buff *mxbuf)
 {
-	xdp_init_buff(xdp, rq->buff.frame0_sz, &rq->xdp_rxq);
-	xdp_prepare_buff(xdp, va, headroom, len, true);
+	xdp_init_buff(&mxbuf->xdp, rq->buff.frame0_sz, &rq->xdp_rxq);
+	xdp_prepare_buff(&mxbuf->xdp, va, headroom, len, true);
 }
 
 static struct sk_buff *
@@ -1595,16 +1595,16 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 
 	prog = rcu_dereference(rq->xdp_prog);
 	if (prog) {
-		struct xdp_buff xdp;
+		struct mlx5e_xdp_buff mxbuf;
 
 		net_prefetchw(va); /* xdp_frame data area */
-		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &xdp);
-		if (mlx5e_xdp_handle(rq, au->page, prog, &xdp))
+		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &mxbuf);
+		if (mlx5e_xdp_handle(rq, au->page, prog, &mxbuf))
 			return NULL; /* page/packet was consumed by XDP */
 
-		rx_headroom = xdp.data - xdp.data_hard_start;
-		metasize = xdp.data - xdp.data_meta;
-		cqe_bcnt = xdp.data_end - xdp.data;
+		rx_headroom = mxbuf.xdp.data - mxbuf.xdp.data_hard_start;
+		metasize = mxbuf.xdp.data - mxbuf.xdp.data_meta;
+		cqe_bcnt = mxbuf.xdp.data_end - mxbuf.xdp.data;
 	}
 	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt, metasize);
@@ -1626,9 +1626,9 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
 	struct skb_shared_info *sinfo;
+	struct mlx5e_xdp_buff mxbuf;
 	u32 frag_consumed_bytes;
 	struct bpf_prog *prog;
-	struct xdp_buff xdp;
 	struct sk_buff *skb;
 	dma_addr_t addr;
 	u32 truesize;
@@ -1643,8 +1643,8 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	net_prefetchw(va); /* xdp_frame data area */
 	net_prefetch(va + rx_headroom);
 
-	mlx5e_fill_xdp_buff(rq, va, rx_headroom, frag_consumed_bytes, &xdp);
-	sinfo = xdp_get_shared_info_from_buff(&xdp);
+	mlx5e_fill_xdp_buff(rq, va, rx_headroom, frag_consumed_bytes, &mxbuf);
+	sinfo = xdp_get_shared_info_from_buff(&mxbuf.xdp);
 	truesize = 0;
 
 	cqe_bcnt -= frag_consumed_bytes;
@@ -1662,13 +1662,13 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		dma_sync_single_for_cpu(rq->pdev, addr + wi->offset,
 					frag_consumed_bytes, rq->buff.map_dir);
 
-		if (!xdp_buff_has_frags(&xdp)) {
+		if (!xdp_buff_has_frags(&mxbuf.xdp)) {
 			/* Init on the first fragment to avoid cold cache access
 			 * when possible.
 			 */
 			sinfo->nr_frags = 0;
 			sinfo->xdp_frags_size = 0;
-			xdp_buff_set_frags_flag(&xdp);
+			xdp_buff_set_frags_flag(&mxbuf.xdp);
 		}
 
 		frag = &sinfo->frags[sinfo->nr_frags++];
@@ -1677,7 +1677,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		skb_frag_size_set(frag, frag_consumed_bytes);
 
 		if (page_is_pfmemalloc(au->page))
-			xdp_buff_set_frag_pfmemalloc(&xdp);
+			xdp_buff_set_frag_pfmemalloc(&mxbuf.xdp);
 
 		sinfo->xdp_frags_size += frag_consumed_bytes;
 		truesize += frag_info->frag_stride;
@@ -1690,7 +1690,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	au = head_wi->au;
 
 	prog = rcu_dereference(rq->xdp_prog);
-	if (prog && mlx5e_xdp_handle(rq, au->page, prog, &xdp)) {
+	if (prog && mlx5e_xdp_handle(rq, au->page, prog, &mxbuf)) {
 		if (test_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
 			int i;
 
@@ -1700,22 +1700,22 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		return NULL; /* page/packet was consumed by XDP */
 	}
 
-	skb = mlx5e_build_linear_skb(rq, xdp.data_hard_start, rq->buff.frame0_sz,
-				     xdp.data - xdp.data_hard_start,
-				     xdp.data_end - xdp.data,
-				     xdp.data - xdp.data_meta);
+	skb = mlx5e_build_linear_skb(rq, mxbuf.xdp.data_hard_start, rq->buff.frame0_sz,
+				     mxbuf.xdp.data - mxbuf.xdp.data_hard_start,
+				     mxbuf.xdp.data_end - mxbuf.xdp.data,
+				     mxbuf.xdp.data - mxbuf.xdp.data_meta);
 	if (unlikely(!skb))
 		return NULL;
 
 	page_ref_inc(au->page);
 
-	if (unlikely(xdp_buff_has_frags(&xdp))) {
+	if (unlikely(xdp_buff_has_frags(&mxbuf.xdp))) {
 		int i;
 
 		/* sinfo->nr_frags is reset by build_skb, calculate again. */
 		xdp_update_skb_shared_info(skb, wi - head_wi - 1,
 					   sinfo->xdp_frags_size, truesize,
-					   xdp_buff_is_frag_pfmemalloc(&xdp));
+					   xdp_buff_is_frag_pfmemalloc(&mxbuf.xdp));
 
 		for (i = 0; i < sinfo->nr_frags; i++) {
 			skb_frag_t *frag = &sinfo->frags[i];
@@ -1996,19 +1996,19 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 
 	prog = rcu_dereference(rq->xdp_prog);
 	if (prog) {
-		struct xdp_buff xdp;
+		struct mlx5e_xdp_buff mxbuf;
 
 		net_prefetchw(va); /* xdp_frame data area */
-		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &xdp);
-		if (mlx5e_xdp_handle(rq, au->page, prog, &xdp)) {
+		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &mxbuf);
+		if (mlx5e_xdp_handle(rq, au->page, prog, &mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
 				__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
 			return NULL; /* page/packet was consumed by XDP */
 		}
 
-		rx_headroom = xdp.data - xdp.data_hard_start;
-		metasize = xdp.data - xdp.data_meta;
-		cqe_bcnt = xdp.data_end - xdp.data;
+		rx_headroom = mxbuf.xdp.data - mxbuf.xdp.data_hard_start;
+		metasize = mxbuf.xdp.data - mxbuf.xdp.data_meta;
+		cqe_bcnt = mxbuf.xdp.data_end - mxbuf.xdp.data;
 	}
 	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt, metasize);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 14/15] net/mlx5e: Support RX XDP metadata
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (12 preceding siblings ...)
  2022-12-13  2:36 ` [PATCH bpf-next v4 13/15] net/mlx5e: Introduce wrapper for xdp_buff Stanislav Fomichev
@ 2022-12-13  2:36 ` Stanislav Fomichev
  2022-12-13  2:36 ` [PATCH bpf-next v4 15/15] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
  14 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:36 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Toke Høiland-Jørgensen,
	Saeed Mahameed, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

From: Toke Høiland-Jørgensen <toke@redhat.com>

Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
pointer to the mlx5e_skb_from* functions so it can be retrieved from the
XDP ctx to do this.

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 10 +++-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  | 23 +++++++++
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |  5 ++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 10 ++++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |  2 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  6 +++
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 51 ++++++++++---------
 7 files changed, 81 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index be86a599df97..b3f7a3e5926c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -627,10 +627,11 @@ struct mlx5e_rq;
 typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq*, struct mlx5_cqe64*);
 typedef struct sk_buff *
 (*mlx5e_fp_skb_from_cqe_mpwrq)(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-			       u16 cqe_bcnt, u32 head_offset, u32 page_idx);
+			       struct mlx5_cqe64 *cqe, u16 cqe_bcnt,
+			       u32 head_offset, u32 page_idx);
 typedef struct sk_buff *
 (*mlx5e_fp_skb_from_cqe)(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			 u32 cqe_bcnt);
+			 struct mlx5_cqe64 *cqe, u32 cqe_bcnt);
 typedef bool (*mlx5e_fp_post_rx_wqes)(struct mlx5e_rq *rq);
 typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq*, u16);
 typedef void (*mlx5e_fp_shampo_dealloc_hd)(struct mlx5e_rq*, u16, u16, bool);
@@ -1036,6 +1037,11 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
 			   u16 vid);
 void mlx5e_timestamp_init(struct mlx5e_priv *priv);
 
+static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
+{
+	return config->rx_filter == HWTSTAMP_FILTER_ALL;
+}
+
 struct mlx5e_xsk_param;
 
 struct mlx5e_rq_param;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 31bb6806bf5d..d10d31e12ba2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -156,6 +156,29 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
 	return true;
 }
 
+int mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
+{
+	const struct mlx5e_xdp_buff *_ctx = (void *)ctx;
+
+	if (unlikely(!mlx5e_rx_hw_stamp(_ctx->rq->tstamp)))
+		return -EOPNOTSUPP;
+
+	*timestamp =  mlx5e_cqe_ts_to_ns(_ctx->rq->ptp_cyc2time,
+					 _ctx->rq->clock, get_cqe_ts(_ctx->cqe));
+	return 0;
+}
+
+int mlx5e_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
+{
+	const struct mlx5e_xdp_buff *_ctx = (void *)ctx;
+
+	if (unlikely(!(_ctx->xdp.rxq->dev->features & NETIF_F_RXHASH)))
+		return -EOPNOTSUPP;
+
+	*hash = be32_to_cpu(_ctx->cqe->rss_hash_result);
+	return 0;
+}
+
 /* returns true if packet was consumed by xdp */
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
 		      struct bpf_prog *prog, struct mlx5e_xdp_buff *mxbuf)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index 389818bf6833..cb568c62aba0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -46,6 +46,8 @@
 
 struct mlx5e_xdp_buff {
 	struct xdp_buff xdp;
+	struct mlx5_cqe64 *cqe;
+	struct mlx5e_rq *rq;
 };
 
 struct mlx5e_xsk_param;
@@ -60,6 +62,9 @@ void mlx5e_xdp_rx_poll_complete(struct mlx5e_rq *rq);
 int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 		   u32 flags);
 
+int mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp);
+int mlx5e_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash);
+
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq,
 							  struct mlx5e_xmit_data *xdptxd,
 							  struct skb_shared_info *sinfo,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 9cff82d764e3..8bf3029abd3c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -49,6 +49,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 			umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
 				.ptag = cpu_to_be64(addr | MLX5_EN_WR),
 			};
+			wi->alloc_units[i].mxbuf->rq = rq;
 		}
 	} else if (unlikely(rq->mpwqe.umr_mode == MLX5E_MPWRQ_UMR_MODE_UNALIGNED)) {
 		for (i = 0; i < batch; i++) {
@@ -58,6 +59,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 				.key = rq->mkey_be,
 				.va = cpu_to_be64(addr),
 			};
+			wi->alloc_units[i].mxbuf->rq = rq;
 		}
 	} else if (likely(rq->mpwqe.umr_mode == MLX5E_MPWRQ_UMR_MODE_TRIPLE)) {
 		u32 mapping_size = 1 << (rq->mpwqe.page_shift - 2);
@@ -81,6 +83,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 				.key = rq->mkey_be,
 				.va = cpu_to_be64(rq->wqe_overflow.addr),
 			};
+			wi->alloc_units[i].mxbuf->rq = rq;
 		}
 	} else {
 		__be32 pad_size = cpu_to_be32((1 << rq->mpwqe.page_shift) -
@@ -100,6 +103,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 				.va = cpu_to_be64(rq->wqe_overflow.addr),
 				.bcount = pad_size,
 			};
+			wi->alloc_units[i].mxbuf->rq = rq;
 		}
 	}
 
@@ -230,6 +234,7 @@ static struct sk_buff *mlx5e_xsk_construct_skb(struct mlx5e_rq *rq, struct xdp_b
 
 struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    struct mlx5e_mpw_info *wi,
+						    struct mlx5_cqe64 *cqe,
 						    u16 cqe_bcnt,
 						    u32 head_offset,
 						    u32 page_idx)
@@ -250,6 +255,8 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(head_offset);
 
+	/* mxbuf->rq is set on allocation, but cqe is per-packet so set it here */
+	mxbuf->cqe = cqe;
 	xsk_buff_set_size(&mxbuf->xdp, cqe_bcnt);
 	xsk_buff_dma_sync_for_cpu(&mxbuf->xdp, rq->xsk_pool);
 	net_prefetch(mxbuf->xdp.data);
@@ -284,6 +291,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
+					      struct mlx5_cqe64 *cqe,
 					      u32 cqe_bcnt)
 {
 	struct mlx5e_xdp_buff *mxbuf = wi->au->mxbuf;
@@ -296,6 +304,8 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(wi->offset);
 
+	/* mxbuf->rq is set on allocation, but cqe is per-packet so set it here */
+	mxbuf->cqe = cqe;
 	xsk_buff_set_size(&mxbuf->xdp, cqe_bcnt);
 	xsk_buff_dma_sync_for_cpu(&mxbuf->xdp, rq->xsk_pool);
 	net_prefetch(mxbuf->xdp.data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index 087c943bd8e9..cefc0ef6105d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -13,11 +13,13 @@ int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    struct mlx5e_mpw_info *wi,
+						    struct mlx5_cqe64 *cqe,
 						    u16 cqe_bcnt,
 						    u32 head_offset,
 						    u32 page_idx);
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
+					      struct mlx5_cqe64 *cqe,
 					      u32 cqe_bcnt);
 
 #endif /* __MLX5_EN_XSK_RX_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 217c8a478977..6395ab141273 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4913,6 +4913,11 @@ const struct net_device_ops mlx5e_netdev_ops = {
 #endif
 };
 
+static const struct xdp_metadata_ops mlx5_xdp_metadata_ops = {
+	.xmo_rx_timestamp		= mlx5e_xdp_rx_timestamp,
+	.xmo_rx_hash			= mlx5e_xdp_rx_hash,
+};
+
 static u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout)
 {
 	int i;
@@ -5053,6 +5058,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	SET_NETDEV_DEV(netdev, mdev->device);
 
 	netdev->netdev_ops = &mlx5e_netdev_ops;
+	netdev->xdp_metadata_ops = &mlx5_xdp_metadata_ops;
 
 	mlx5e_dcbnl_build_netdev(netdev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 9785b58ff3bc..cbdb9e6199a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -62,10 +62,12 @@
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-				u16 cqe_bcnt, u32 head_offset, u32 page_idx);
+				struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
+				u32 page_idx);
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-				   u16 cqe_bcnt, u32 head_offset, u32 page_idx);
+				   struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
+				   u32 page_idx);
 static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
@@ -76,11 +78,6 @@ const struct mlx5e_rx_handlers mlx5e_rx_handlers_nic = {
 	.handle_rx_cqe_mpwqe_shampo = mlx5e_handle_rx_cqe_mpwrq_shampo,
 };
 
-static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
-{
-	return config->rx_filter == HWTSTAMP_FILTER_ALL;
-}
-
 static inline void mlx5e_read_cqe_slot(struct mlx5_cqwq *wq,
 				       u32 cqcc, void *data)
 {
@@ -1564,16 +1561,19 @@ struct sk_buff *mlx5e_build_linear_skb(struct mlx5e_rq *rq, void *va,
 	return skb;
 }
 
-static void mlx5e_fill_xdp_buff(struct mlx5e_rq *rq, void *va, u16 headroom,
-				u32 len, struct mlx5e_xdp_buff *mxbuf)
+static void mlx5e_fill_xdp_buff(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
+				void *va, u16 headroom, u32 len,
+				struct mlx5e_xdp_buff *mxbuf)
 {
 	xdp_init_buff(&mxbuf->xdp, rq->buff.frame0_sz, &rq->xdp_rxq);
 	xdp_prepare_buff(&mxbuf->xdp, va, headroom, len, true);
+	mxbuf->cqe = cqe;
+	mxbuf->rq = rq;
 }
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			  u32 cqe_bcnt)
+			  struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
 	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
@@ -1598,7 +1598,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 		struct mlx5e_xdp_buff mxbuf;
 
 		net_prefetchw(va); /* xdp_frame data area */
-		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &mxbuf);
+		mlx5e_fill_xdp_buff(rq, cqe, va, rx_headroom, cqe_bcnt, &mxbuf);
 		if (mlx5e_xdp_handle(rq, au->page, prog, &mxbuf))
 			return NULL; /* page/packet was consumed by XDP */
 
@@ -1619,7 +1619,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			     u32 cqe_bcnt)
+			     struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
 	struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
 	struct mlx5e_wqe_frag_info *head_wi = wi;
@@ -1643,7 +1643,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	net_prefetchw(va); /* xdp_frame data area */
 	net_prefetch(va + rx_headroom);
 
-	mlx5e_fill_xdp_buff(rq, va, rx_headroom, frag_consumed_bytes, &mxbuf);
+	mlx5e_fill_xdp_buff(rq, cqe, va, rx_headroom, frag_consumed_bytes, &mxbuf);
 	sinfo = xdp_get_shared_info_from_buff(&mxbuf.xdp);
 	truesize = 0;
 
@@ -1766,7 +1766,7 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
 			      mlx5e_xsk_skb_from_cqe_linear,
-			      rq, wi, cqe_bcnt);
+			      rq, wi, cqe, cqe_bcnt);
 	if (!skb) {
 		/* probably for XDP */
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
@@ -1819,7 +1819,7 @@ static void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
-			      rq, wi, cqe_bcnt);
+			      rq, wi, cqe, cqe_bcnt);
 	if (!skb) {
 		/* probably for XDP */
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
@@ -1878,7 +1878,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq, struct mlx5_cqe64
 	skb = INDIRECT_CALL_2(rq->mpwqe.skb_from_cqe_mpwrq,
 			      mlx5e_skb_from_cqe_mpwrq_linear,
 			      mlx5e_skb_from_cqe_mpwrq_nonlinear,
-			      rq, wi, cqe_bcnt, head_offset, page_idx);
+			      rq, wi, cqe, cqe_bcnt, head_offset, page_idx);
 	if (!skb)
 		goto mpwrq_cqe_out;
 
@@ -1929,7 +1929,8 @@ mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-				   u16 cqe_bcnt, u32 head_offset, u32 page_idx)
+				   struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
+				   u32 page_idx)
 {
 	union mlx5e_alloc_unit *au = &wi->alloc_units[page_idx];
 	u16 headlen = min_t(u16, MLX5E_RX_MAX_HEAD, cqe_bcnt);
@@ -1968,7 +1969,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-				u16 cqe_bcnt, u32 head_offset, u32 page_idx)
+				struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
+				u32 page_idx)
 {
 	union mlx5e_alloc_unit *au = &wi->alloc_units[page_idx];
 	u16 rx_headroom = rq->buff.headroom;
@@ -1999,7 +2001,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		struct mlx5e_xdp_buff mxbuf;
 
 		net_prefetchw(va); /* xdp_frame data area */
-		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &mxbuf);
+		mlx5e_fill_xdp_buff(rq, cqe, va, rx_headroom, cqe_bcnt, &mxbuf);
 		if (mlx5e_xdp_handle(rq, au->page, prog, &mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
 				__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
@@ -2163,8 +2165,8 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 		if (likely(head_size))
 			*skb = mlx5e_skb_from_cqe_shampo(rq, wi, cqe, header_index);
 		else
-			*skb = mlx5e_skb_from_cqe_mpwrq_nonlinear(rq, wi, cqe_bcnt, data_offset,
-								  page_idx);
+			*skb = mlx5e_skb_from_cqe_mpwrq_nonlinear(rq, wi, cqe, cqe_bcnt,
+								  data_offset, page_idx);
 		if (unlikely(!*skb))
 			goto free_hd_entry;
 
@@ -2238,7 +2240,8 @@ static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cq
 			      mlx5e_skb_from_cqe_mpwrq_linear,
 			      mlx5e_skb_from_cqe_mpwrq_nonlinear,
 			      mlx5e_xsk_skb_from_cqe_mpwrq_linear,
-			      rq, wi, cqe_bcnt, head_offset, page_idx);
+			      rq, wi, cqe, cqe_bcnt, head_offset,
+			      page_idx);
 	if (!skb)
 		goto mpwrq_cqe_out;
 
@@ -2483,7 +2486,7 @@ static void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
-			      rq, wi, cqe_bcnt);
+			      rq, wi, cqe, cqe_bcnt);
 	if (!skb)
 		goto wq_free_wqe;
 
@@ -2575,7 +2578,7 @@ static void mlx5e_trap_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe
 		goto free_wqe;
 	}
 
-	skb = mlx5e_skb_from_cqe_nonlinear(rq, wi, cqe_bcnt);
+	skb = mlx5e_skb_from_cqe_nonlinear(rq, wi, cqe, cqe_bcnt);
 	if (!skb)
 		goto free_wqe;
 
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v4 15/15] selftests/bpf: Simple program to dump XDP RX metadata
  2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
                   ` (13 preceding siblings ...)
  2022-12-13  2:36 ` [PATCH bpf-next v4 14/15] net/mlx5e: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-13  2:36 ` Stanislav Fomichev
  14 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13  2:36 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

To be used for verification of driver implementations. Note that
the skb path is gone from the series, but I'm still keeping the
implementation for any possible future work.

$ xdp_hw_metadata <ifname>

On the other machine:

$ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP
$ echo -n skb | nc -u -q1 <target> 9092 # for skb

Sample output:

  # xdp
  xsk_ring_cons__peek: 1
  0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
  rx_timestamp_supported: 1
  rx_timestamp: 1667850075063948829
  0x19f9090: complete idx=8 addr=8000

  # skb
  found skb hwtstamp = 1668314052.854274681

Decoding:
  # xdp
  rx_timestamp=1667850075.063948829

  $ date -d @1667850075
  Mon Nov  7 11:41:15 AM PST 2022
  $ date
  Mon Nov  7 11:42:05 AM PST 2022

  # skb
  $ date -d @1668314052
  Sat Nov 12 08:34:12 PM PST 2022
  $ date
  Sat Nov 12 08:37:06 PM PST 2022

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   6 +-
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  81 ++++
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 405 ++++++++++++++++++
 4 files changed, 492 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 07d2d0a8c5cb..01e3baeefd4f 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -46,3 +46,4 @@ test_cpp
 xskxceiver
 xdp_redirect_multi
 xdp_synproxy
+xdp_hw_metadata
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e6cbc04a7920..b7d5d3aa554e 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -83,7 +83,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \
 TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
 	test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
-	xskxceiver xdp_redirect_multi xdp_synproxy veristat
+	xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata
 
 TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file
 TEST_GEN_FILES += liburandom_read.so
@@ -241,6 +241,9 @@ $(OUTPUT)/test_maps: $(TESTING_HELPERS)
 $(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS)
 $(OUTPUT)/xsk.o: $(BPFOBJ)
 $(OUTPUT)/xskxceiver: $(OUTPUT)/xsk.o
+$(OUTPUT)/xdp_hw_metadata: $(OUTPUT)/xsk.o $(OUTPUT)/xdp_hw_metadata.skel.h
+$(OUTPUT)/xdp_hw_metadata: $(OUTPUT)/network_helpers.o
+$(OUTPUT)/xdp_hw_metadata: LDFLAGS += -static
 
 BPFTOOL ?= $(DEFAULT_BPFTOOL)
 $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile)    \
@@ -383,6 +386,7 @@ linked_maps.skel.h-deps := linked_maps1.bpf.o linked_maps2.bpf.o
 test_subskeleton.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o test_subskeleton.bpf.o
 test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o
 test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o
+xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o
 
 LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
 
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
new file mode 100644
index 000000000000..25b8178735ee
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include "xdp_metadata.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 256);
+	__type(key, __u32);
+	__type(value, __u32);
+} xsk SEC(".maps");
+
+extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
+					 __u64 *timestamp) __ksym;
+extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx,
+				    __u32 *hash) __ksym;
+
+SEC("xdp")
+int rx(struct xdp_md *ctx)
+{
+	void *data, *data_meta, *data_end;
+	struct ipv6hdr *ip6h = NULL;
+	struct ethhdr *eth = NULL;
+	struct udphdr *udp = NULL;
+	struct iphdr *iph = NULL;
+	struct xdp_meta *meta;
+	int ret;
+
+	data = (void *)(long)ctx->data;
+	data_end = (void *)(long)ctx->data_end;
+	eth = data;
+	if (eth + 1 < data_end) {
+		if (eth->h_proto == bpf_htons(ETH_P_IP)) {
+			iph = (void *)(eth + 1);
+			if (iph + 1 < data_end && iph->protocol == IPPROTO_UDP)
+				udp = (void *)(iph + 1);
+		}
+		if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
+			ip6h = (void *)(eth + 1);
+			if (ip6h + 1 < data_end && ip6h->nexthdr == IPPROTO_UDP)
+				udp = (void *)(ip6h + 1);
+		}
+		if (udp && udp + 1 > data_end)
+			udp = NULL;
+	}
+
+	if (!udp)
+		return XDP_PASS;
+
+	if (udp->dest != bpf_htons(9091))
+		return XDP_PASS;
+
+	bpf_printk("forwarding UDP:9091 to AF_XDP");
+
+	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	if (ret != 0) {
+		bpf_printk("bpf_xdp_adjust_meta returned %d", ret);
+		return XDP_PASS;
+	}
+
+	data = (void *)(long)ctx->data;
+	data_meta = (void *)(long)ctx->data_meta;
+	meta = data_meta;
+
+	if (meta + 1 > data) {
+		bpf_printk("bpf_xdp_adjust_meta doesn't appear to work");
+		return XDP_PASS;
+	}
+
+	if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp))
+		bpf_printk("populated rx_timestamp with %u", meta->rx_timestamp);
+
+	if (!bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash))
+		bpf_printk("populated rx_hash with %u", meta->rx_hash);
+
+	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
new file mode 100644
index 000000000000..b8b7600dc275
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -0,0 +1,405 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Reference program for verifying XDP metadata on real HW. Functional test
+ * only, doesn't test the performance.
+ *
+ * RX:
+ * - UDP 9091 packets are diverted into AF_XDP
+ * - Metadata verified:
+ *   - rx_timestamp
+ *   - rx_hash
+ *
+ * TX:
+ * - TBD
+ */
+
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "xdp_hw_metadata.skel.h"
+#include "xsk.h"
+
+#include <error.h>
+#include <linux/errqueue.h>
+#include <linux/if_link.h>
+#include <linux/net_tstamp.h>
+#include <linux/udp.h>
+#include <linux/sockios.h>
+#include <sys/mman.h>
+#include <net/if.h>
+#include <poll.h>
+
+#include "xdp_metadata.h"
+
+#define UMEM_NUM 16
+#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM)
+#define XDP_FLAGS (XDP_FLAGS_DRV_MODE | XDP_FLAGS_REPLACE)
+
+struct xsk {
+	void *umem_area;
+	struct xsk_umem *umem;
+	struct xsk_ring_prod fill;
+	struct xsk_ring_cons comp;
+	struct xsk_ring_prod tx;
+	struct xsk_ring_cons rx;
+	struct xsk_socket *socket;
+};
+
+struct xdp_hw_metadata *bpf_obj;
+struct xsk *rx_xsk;
+const char *ifname;
+int ifindex;
+int rxq;
+
+void test__fail(void) { /* for network_helpers.c */ }
+
+static int open_xsk(const char *ifname, struct xsk *xsk, __u32 queue_id)
+{
+	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	const struct xsk_socket_config socket_config = {
+		.rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
+		.xdp_flags = XDP_FLAGS,
+		.bind_flags = XDP_COPY,
+	};
+	const struct xsk_umem_config umem_config = {
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
+	};
+	__u32 idx;
+	u64 addr;
+	int ret;
+	int i;
+
+	xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (xsk->umem_area == MAP_FAILED)
+		return -ENOMEM;
+
+	ret = xsk_umem__create(&xsk->umem,
+			       xsk->umem_area, UMEM_SIZE,
+			       &xsk->fill,
+			       &xsk->comp,
+			       &umem_config);
+	if (ret)
+		return ret;
+
+	ret = xsk_socket__create(&xsk->socket, ifname, queue_id,
+				 xsk->umem,
+				 &xsk->rx,
+				 &xsk->tx,
+				 &socket_config);
+	if (ret)
+		return ret;
+
+	/* First half of umem is for TX. This way address matches 1-to-1
+	 * to the completion queue index.
+	 */
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = i * UMEM_FRAME_SIZE;
+		printf("%p: tx_desc[%d] -> %lx\n", xsk, i, addr);
+	}
+
+	/* Second half of umem is for RX. */
+
+	ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx);
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE;
+		printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, i) = addr;
+	}
+	xsk_ring_prod__submit(&xsk->fill, ret);
+
+	return 0;
+}
+
+static void close_xsk(struct xsk *xsk)
+{
+	if (xsk->umem)
+		xsk_umem__delete(xsk->umem);
+	if (xsk->socket)
+		xsk_socket__delete(xsk->socket);
+	munmap(xsk->umem, UMEM_SIZE);
+}
+
+static void refill_rx(struct xsk *xsk, __u64 addr)
+{
+	__u32 idx;
+
+	if (xsk_ring_prod__reserve(&xsk->fill, 1, &idx) == 1) {
+		printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static void verify_xdp_metadata(void *data)
+{
+	struct xdp_meta *meta;
+
+	meta = data - sizeof(*meta);
+
+	printf("rx_timestamp: %llu\n", meta->rx_timestamp);
+	printf("rx_hash: %u\n", meta->rx_hash);
+}
+
+static void verify_skb_metadata(int fd)
+{
+	char cmsg_buf[1024];
+	char packet_buf[128];
+
+	struct scm_timestamping *ts;
+	struct iovec packet_iov;
+	struct cmsghdr *cmsg;
+	struct msghdr hdr;
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_iov = &packet_iov;
+	hdr.msg_iovlen = 1;
+	packet_iov.iov_base = packet_buf;
+	packet_iov.iov_len = sizeof(packet_buf);
+
+	hdr.msg_control = cmsg_buf;
+	hdr.msg_controllen = sizeof(cmsg_buf);
+
+	if (recvmsg(fd, &hdr, 0) < 0)
+		error(-1, errno, "recvmsg");
+
+	for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != NULL;
+	     cmsg = CMSG_NXTHDR(&hdr, cmsg)) {
+
+		if (cmsg->cmsg_level != SOL_SOCKET)
+			continue;
+
+		switch (cmsg->cmsg_type) {
+		case SCM_TIMESTAMPING:
+			ts = (struct scm_timestamping *)CMSG_DATA(cmsg);
+			if (ts->ts[2].tv_sec || ts->ts[2].tv_nsec) {
+				printf("found skb hwtstamp = %lu.%lu\n",
+				       ts->ts[2].tv_sec, ts->ts[2].tv_nsec);
+				return;
+			}
+			break;
+		default:
+			break;
+		}
+	}
+
+	printf("skb hwtstamp is not found!\n");
+}
+
+static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
+{
+	const struct xdp_desc *rx_desc;
+	struct pollfd fds[rxq + 1];
+	__u64 comp_addr;
+	__u64 addr;
+	__u32 idx;
+	int ret;
+	int i;
+
+	for (i = 0; i < rxq; i++) {
+		fds[i].fd = xsk_socket__fd(rx_xsk[i].socket);
+		fds[i].events = POLLIN;
+		fds[i].revents = 0;
+	}
+
+	fds[rxq].fd = server_fd;
+	fds[rxq].events = POLLIN;
+	fds[rxq].revents = 0;
+
+	while (true) {
+		errno = 0;
+		ret = poll(fds, rxq + 1, 1000);
+		printf("poll: %d (%d)\n", ret, errno);
+		if (ret < 0)
+			break;
+		if (ret == 0)
+			continue;
+
+		if (fds[rxq].revents)
+			verify_skb_metadata(server_fd);
+
+		for (i = 0; i < rxq; i++) {
+			if (fds[i].revents == 0)
+				continue;
+
+			struct xsk *xsk = &rx_xsk[i];
+
+			ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx);
+			printf("xsk_ring_cons__peek: %d\n", ret);
+			if (ret != 1)
+				continue;
+
+			rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx);
+			comp_addr = xsk_umem__extract_addr(rx_desc->addr);
+			addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
+			printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
+			       xsk, idx, rx_desc->addr, addr, comp_addr);
+			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
+			xsk_ring_cons__release(&xsk->rx, 1);
+			refill_rx(xsk, comp_addr);
+		}
+	}
+
+	return 0;
+}
+
+struct ethtool_channels {
+	__u32	cmd;
+	__u32	max_rx;
+	__u32	max_tx;
+	__u32	max_other;
+	__u32	max_combined;
+	__u32	rx_count;
+	__u32	tx_count;
+	__u32	other_count;
+	__u32	combined_count;
+};
+
+#define ETHTOOL_GCHANNELS	0x0000003c /* Get no of channels */
+
+static int rxq_num(const char *ifname)
+{
+	struct ethtool_channels ch = {
+		.cmd = ETHTOOL_GCHANNELS,
+	};
+
+	struct ifreq ifr = {
+		.ifr_data = (void *)&ch,
+	};
+	strcpy(ifr.ifr_name, ifname);
+	int fd, ret;
+
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (fd < 0)
+		error(-1, errno, "socket");
+
+	ret = ioctl(fd, SIOCETHTOOL, &ifr);
+	if (ret < 0)
+		error(-1, errno, "socket");
+
+	close(fd);
+
+	return ch.rx_count + ch.combined_count;
+}
+
+static void cleanup(void)
+{
+	LIBBPF_OPTS(bpf_xdp_attach_opts, opts);
+	int ret;
+	int i;
+
+	if (bpf_obj) {
+		opts.old_prog_fd = bpf_program__fd(bpf_obj->progs.rx);
+		if (opts.old_prog_fd >= 0) {
+			printf("detaching bpf program....\n");
+			ret = bpf_xdp_detach(ifindex, XDP_FLAGS, &opts);
+			if (ret)
+				printf("failed to detach XDP program: %d\n", ret);
+		}
+	}
+
+	for (i = 0; i < rxq; i++)
+		close_xsk(&rx_xsk[i]);
+
+	if (bpf_obj)
+		xdp_hw_metadata__destroy(bpf_obj);
+}
+
+static void handle_signal(int sig)
+{
+	/* interrupting poll() is all we need */
+}
+
+static void timestamping_enable(int fd, int val)
+{
+	int ret;
+
+	ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
+	if (ret < 0)
+		error(-1, errno, "setsockopt(SO_TIMESTAMPING)");
+}
+
+int main(int argc, char *argv[])
+{
+	int server_fd = -1;
+	int ret;
+	int i;
+
+	struct bpf_program *prog;
+
+	if (argc != 2) {
+		fprintf(stderr, "pass device name\n");
+		return -1;
+	}
+
+	ifname = argv[1];
+	ifindex = if_nametoindex(ifname);
+	rxq = rxq_num(ifname);
+
+	printf("rxq: %d\n", rxq);
+
+	rx_xsk = malloc(sizeof(struct xsk) * rxq);
+	if (!rx_xsk)
+		error(-1, ENOMEM, "malloc");
+
+	for (i = 0; i < rxq; i++) {
+		printf("open_xsk(%s, %p, %d)\n", ifname, &rx_xsk[i], i);
+		ret = open_xsk(ifname, &rx_xsk[i], i);
+		if (ret)
+			error(-1, -ret, "open_xsk");
+
+		printf("xsk_socket__fd() -> %d\n", xsk_socket__fd(rx_xsk[i].socket));
+	}
+
+	printf("open bpf program...\n");
+	bpf_obj = xdp_hw_metadata__open();
+	if (libbpf_get_error(bpf_obj))
+		error(-1, libbpf_get_error(bpf_obj), "xdp_hw_metadata__open");
+
+	prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx");
+	bpf_program__set_ifindex(prog, ifindex);
+	bpf_program__set_flags(prog, BPF_F_XDP_DEV_BOUND_ONLY);
+
+	printf("load bpf program...\n");
+	ret = xdp_hw_metadata__load(bpf_obj);
+	if (ret)
+		error(-1, -ret, "xdp_hw_metadata__load");
+
+	printf("prepare skb endpoint...\n");
+	server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, 1000);
+	if (server_fd < 0)
+		error(-1, errno, "start_server");
+	timestamping_enable(server_fd,
+			    SOF_TIMESTAMPING_SOFTWARE |
+			    SOF_TIMESTAMPING_RAW_HARDWARE);
+
+	printf("prepare xsk map...\n");
+	for (i = 0; i < rxq; i++) {
+		int sock_fd = xsk_socket__fd(rx_xsk[i].socket);
+		__u32 queue_id = i;
+
+		printf("map[%d] = %d\n", queue_id, sock_fd);
+		ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0);
+		if (ret)
+			error(-1, -ret, "bpf_map_update_elem");
+	}
+
+	printf("attach bpf program...\n");
+	ret = bpf_xdp_attach(ifindex,
+			     bpf_program__fd(bpf_obj->progs.rx),
+			     XDP_FLAGS, NULL);
+	if (ret)
+		error(-1, -ret, "bpf_xdp_attach");
+
+	signal(SIGINT, handle_signal);
+	ret = verify_metadata(rx_xsk, rxq, server_fd);
+	close(server_fd);
+	cleanup();
+	if (ret)
+		error(-1, -ret, "verify_metadata");
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 10/15] net/mlx4_en: Introduce wrapper for xdp_buff
  2022-12-13  2:36 ` [PATCH bpf-next v4 10/15] net/mlx4_en: Introduce wrapper for xdp_buff Stanislav Fomichev
@ 2022-12-13  8:56   ` Tariq Toukan
  0 siblings, 0 replies; 46+ messages in thread
From: Tariq Toukan @ 2022-12-13  8:56 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev



On 12/13/2022 4:36 AM, Stanislav Fomichev wrote:
> No functional changes. Boilerplate to allow stuffing more data after xdp_buff.
> 
> Cc: Tariq Toukan <tariqt@nvidia.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
>   1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 8f762fc170b3..014a80af2813 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -661,9 +661,14 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
>   #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
>   #endif
>   
> +struct mlx4_en_xdp_buff {
> +	struct xdp_buff xdp;
> +};
> +
>   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
>   {
>   	struct mlx4_en_priv *priv = netdev_priv(dev);
> +	struct mlx4_en_xdp_buff mxbuf = {};
>   	int factor = priv->cqe_factor;
>   	struct mlx4_en_rx_ring *ring;
>   	struct bpf_prog *xdp_prog;
> @@ -671,7 +676,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   	bool doorbell_pending;
>   	bool xdp_redir_flush;
>   	struct mlx4_cqe *cqe;
> -	struct xdp_buff xdp;
>   	int polled = 0;
>   	int index;
>   
> @@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   	ring = priv->rx_ring[cq_ring];
>   
>   	xdp_prog = rcu_dereference_bh(ring->xdp_prog);
> -	xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
> +	xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
>   	doorbell_pending = false;
>   	xdp_redir_flush = false;
>   
> @@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   						priv->frag_info[0].frag_size,
>   						DMA_FROM_DEVICE);
>   
> -			xdp_prepare_buff(&xdp, va - frags[0].page_offset,
> +			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
>   					 frags[0].page_offset, length, false);
> -			orig_data = xdp.data;
> +			orig_data = mxbuf.xdp.data;
>   
> -			act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
>   
> -			length = xdp.data_end - xdp.data;
> -			if (xdp.data != orig_data) {
> -				frags[0].page_offset = xdp.data -
> -					xdp.data_hard_start;
> -				va = xdp.data;
> +			length = mxbuf.xdp.data_end - mxbuf.xdp.data;
> +			if (mxbuf.xdp.data != orig_data) {
> +				frags[0].page_offset = mxbuf.xdp.data -
> +					mxbuf.xdp.data_hard_start;
> +				va = mxbuf.xdp.data;
>   			}
>   
>   			switch (act) {
>   			case XDP_PASS:
>   				break;
>   			case XDP_REDIRECT:
> -				if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) {
> +				if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
>   					ring->xdp_redirect++;
>   					xdp_redir_flush = true;
>   					frags[0].page = NULL;

Thanks for your patches.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 11/15] net/mlx4_en: Support RX XDP metadata
  2022-12-13  2:36 ` [PATCH bpf-next v4 11/15] net/mlx4_en: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-13  8:56   ` Tariq Toukan
  0 siblings, 0 replies; 46+ messages in thread
From: Tariq Toukan @ 2022-12-13  8:56 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev



On 12/13/2022 4:36 AM, Stanislav Fomichev wrote:
> RX timestamp and hash for now. Tested using the prog from the next
> patch.
> 
> Also enabling xdp metadata support; don't see why it's disabled,
> there is enough headroom..
> 
> Cc: Tariq Toukan <tariqt@nvidia.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_clock.c | 13 +++++---
>   .../net/ethernet/mellanox/mlx4/en_netdev.c    |  6 ++++
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 33 ++++++++++++++++++-
>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |  5 +++
>   4 files changed, 52 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> index 98b5ffb4d729..9e3b76182088 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> @@ -58,9 +58,7 @@ u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
>   	return hi | lo;
>   }
>   
> -void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> -			    struct skb_shared_hwtstamps *hwts,
> -			    u64 timestamp)
> +u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp)
>   {
>   	unsigned int seq;
>   	u64 nsec;
> @@ -70,8 +68,15 @@ void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
>   		nsec = timecounter_cyc2time(&mdev->clock, timestamp);
>   	} while (read_seqretry(&mdev->clock_lock, seq));
>   
> +	return ns_to_ktime(nsec);
> +}
> +
> +void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> +			    struct skb_shared_hwtstamps *hwts,
> +			    u64 timestamp)
> +{
>   	memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
> -	hwts->hwtstamp = ns_to_ktime(nsec);
> +	hwts->hwtstamp = mlx4_en_get_hwtstamp(mdev, timestamp);
>   }
>   
>   /**
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 8800d3f1f55c..af4c4858f397 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2889,6 +2889,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
>   	.ndo_bpf		= mlx4_xdp,
>   };
>   
> +static const struct xdp_metadata_ops mlx4_xdp_metadata_ops = {
> +	.xmo_rx_timestamp		= mlx4_en_xdp_rx_timestamp,
> +	.xmo_rx_hash			= mlx4_en_xdp_rx_hash,
> +};
> +
>   struct mlx4_en_bond {
>   	struct work_struct work;
>   	struct mlx4_en_priv *priv;
> @@ -3310,6 +3315,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
>   		dev->netdev_ops = &mlx4_netdev_ops_master;
>   	else
>   		dev->netdev_ops = &mlx4_netdev_ops;
> +	dev->xdp_metadata_ops = &mlx4_xdp_metadata_ops;
>   	dev->watchdog_timeo = MLX4_EN_WATCHDOG_TIMEOUT;
>   	netif_set_real_num_tx_queues(dev, priv->tx_ring_num[TX]);
>   	netif_set_real_num_rx_queues(dev, priv->rx_ring_num);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 014a80af2813..0869d4fff17b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -663,8 +663,35 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
>   
>   struct mlx4_en_xdp_buff {
>   	struct xdp_buff xdp;
> +	struct mlx4_cqe *cqe;
> +	struct mlx4_en_dev *mdev;
> +	struct mlx4_en_rx_ring *ring;
> +	struct net_device *dev;
>   };
>   
> +int mlx4_en_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> +{
> +	struct mlx4_en_xdp_buff *_ctx = (void *)ctx;
> +
> +	if (unlikely(_ctx->ring->hwtstamp_rx_filter != HWTSTAMP_FILTER_ALL))
> +		return -EOPNOTSUPP;
> +
> +	*timestamp = mlx4_en_get_hwtstamp(_ctx->mdev,
> +					  mlx4_en_get_cqe_ts(_ctx->cqe));
> +	return 0;
> +}
> +
> +int mlx4_en_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
> +{
> +	struct mlx4_en_xdp_buff *_ctx = (void *)ctx;
> +
> +	if (unlikely(!(_ctx->dev->features & NETIF_F_RXHASH)))
> +		return -EOPNOTSUPP;
> +
> +	*hash = be32_to_cpu(_ctx->cqe->immed_rss_invalid);
> +	return 0;
> +}
> +
>   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
>   {
>   	struct mlx4_en_priv *priv = netdev_priv(dev);
> @@ -781,8 +808,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   						DMA_FROM_DEVICE);
>   
>   			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
> -					 frags[0].page_offset, length, false);
> +					 frags[0].page_offset, length, true);
>   			orig_data = mxbuf.xdp.data;
> +			mxbuf.cqe = cqe;
> +			mxbuf.mdev = priv->mdev;
> +			mxbuf.ring = ring;
> +			mxbuf.dev = dev;
>   
>   			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index e132ff4c82f2..2f8ef0b30083 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -788,10 +788,15 @@ void mlx4_en_update_pfc_stats_bitmap(struct mlx4_dev *dev,
>   int mlx4_en_netdev_event(struct notifier_block *this,
>   			 unsigned long event, void *ptr);
>   
> +struct xdp_md;
> +int mlx4_en_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp);
> +int mlx4_en_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash);
> +
>   /*
>    * Functions for time stamping
>    */
>   u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe);
> +u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp);
>   void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
>   			    struct skb_shared_hwtstamps *hwts,
>   			    u64 timestamp);


Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata
  2022-12-13  2:35 ` [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-13 15:55   ` Jesper Dangaard Brouer
  2022-12-13 20:42     ` Stanislav Fomichev
  0 siblings, 1 reply; 46+ messages in thread
From: Jesper Dangaard Brouer @ 2022-12-13 15:55 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: brouer, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev


On 13/12/2022 03.35, Stanislav Fomichev wrote:
> The goal is to enable end-to-end testing of the metadata for AF_XDP.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   drivers/net/veth.c | 24 ++++++++++++++++++++++++
>   1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 04ffd8cb2945..d5491e7a2798 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -118,6 +118,7 @@ static struct {
>   
>   struct veth_xdp_buff {
>   	struct xdp_buff xdp;
> +	struct sk_buff *skb;
>   };
>   
>   static int veth_get_link_ksettings(struct net_device *dev,
> @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
>   
>   		xdp_convert_frame_to_buff(frame, xdp);
>   		xdp->rxq = &rq->xdp_rxq;
> +		vxbuf.skb = NULL;
>   
>   		act = bpf_prog_run_xdp(xdp_prog, xdp);
>   
> @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>   	__skb_push(skb, skb->data - skb_mac_header(skb));
>   	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>   		goto drop;
> +	vxbuf.skb = skb;
>   
>   	orig_data = xdp->data;
>   	orig_data_end = xdp->data_end;
> @@ -1601,6 +1604,21 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   	}
>   }
>   
> +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> +{
> +	*timestamp = ktime_get_mono_fast_ns();

This should be reading the hardware timestamp in the SKB.

Details: This hardware timestamp in the SKB is located in
skb_shared_info area, which is also available for xdp_frame (currently
used for multi-buffer purposes).  Thus, when adding xdp-hints "store"
functionality, it would be natural to store the HW TS in the same place.
Making the veth skb/xdp_frame code paths able to share code.

> +	return 0;
> +}
> +
> +static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
> +{
> +	struct veth_xdp_buff *_ctx = (void *)ctx;
> +
> +	if (_ctx->skb)
> +		*hash = skb_get_hash(_ctx->skb);
> +	return 0;
> +}
> +
>   static const struct net_device_ops veth_netdev_ops = {
>   	.ndo_init            = veth_dev_init,
>   	.ndo_open            = veth_open,
> @@ -1622,6 +1640,11 @@ static const struct net_device_ops veth_netdev_ops = {
>   	.ndo_get_peer_dev	= veth_peer_dev,
>   };
>   
> +static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
> +	.xmo_rx_timestamp		= veth_xdp_rx_timestamp,
> +	.xmo_rx_hash			= veth_xdp_rx_hash,
> +};
> +
>   #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
>   		       NETIF_F_RXCSUM | NETIF_F_SCTP_CRC | NETIF_F_HIGHDMA | \
>   		       NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ENCAP_ALL | \
> @@ -1638,6 +1661,7 @@ static void veth_setup(struct net_device *dev)
>   	dev->priv_flags |= IFF_PHONY_HEADROOM;
>   
>   	dev->netdev_ops = &veth_netdev_ops;
> +	dev->xdp_metadata_ops = &veth_xdp_metadata_ops;
>   	dev->ethtool_ops = &veth_ethtool_ops;
>   	dev->features |= NETIF_F_LLTX;
>   	dev->features |= VETH_FEATURES;


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-13  2:35 ` [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata Stanislav Fomichev
@ 2022-12-13 16:37   ` David Vernet
  2022-12-13 20:42     ` Stanislav Fomichev
  2022-12-14 23:46   ` [xdp-hints] " Toke Høiland-Jørgensen
  2022-12-17  4:20   ` kernel test robot
  2 siblings, 1 reply; 46+ messages in thread
From: David Vernet @ 2022-12-13 16:37 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
> Document all current use-cases and assumptions.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>  1 file changed, 90 insertions(+)
>  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
> 
> diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> new file mode 100644
> index 000000000000..498eae718275
> --- /dev/null
> +++ b/Documentation/bpf/xdp-rx-metadata.rst

I think you need to add this to Documentation/bpf/index.rst. Or even
better, maybe it's time to add an xdp/ subdirectory and put all docs
there? Don't want to block your patchset from bikeshedding on this
point, so for now it's fine to just put it in
Documentation/bpf/index.rst until we figure that out.

> @@ -0,0 +1,90 @@
> +===============
> +XDP RX Metadata
> +===============
> +
> +XDP programs support creating and passing custom metadata via
> +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> +entities:

Can you add a couple of sentences to this intro section that explains
what metadata is at a high level?

> +
> +1. ``AF_XDP`` consumer.
> +2. Kernel core stack via ``XDP_PASS``.
> +3. Another device via ``bpf_redirect_map``.
> +4. Other BPF programs via ``bpf_tail_call``.
> +
> +General Design
> +==============
> +
> +XDP has access to a set of kfuncs to manipulate the metadata. Every

"...to manipulate the metadata in an XDP frame." ?

> +device driver implements these kfuncs. The set of kfuncs is

"Every device driver implements these kfuncs" can you be a bit more
specific about which types of device drivers will implement these?

> +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.

Why is it suffixed with _xxx?

> +
> +Currently, the following kfuncs are supported. In the future, as more
> +metadata is supported, this set will grow:
> +
> +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> +  indicate whether the device supports RX timestamps
> +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp

s/returns packet/returns a packet's

> +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> +  indicate whether the device supports RX hash

I don't see bpf_xdp_metadata_rx_timestamp_supported() or
bpf_xdp_metadata_rx_hash_supported() being added in your patch set. Can
you remove these entries until they're actually implemented?

> +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash

We should probably also add a note that these kfuncs currently just
return -EOPNOTSUPP.

Finally, should we add either some example code showing how to use these
kfuncs, or at the very least some links to their selftests so readers
have example code they can refer to?

> +
> +Within the XDP frame, the metadata layout is as follows::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +             ^                 ^
> +             |                 |
> +   xdp_buff->data_meta   xdp_buff->data
> +
> +AF_XDP
> +======
> +
> +``AF_XDP`` use-case implies that there is a contract between the BPF program
> +that redirects XDP frames into the ``XSK`` and the final consumer.

Can you fully spell out what XSK stands for the first time it's used?
Something like "...that redirects XDP frames into the ``AF_XDP`` socket
(``XSK``) and the final consumer." Applies anywhere else you think
appropriate as well.

> +Thus the BPF program manually allocates a fixed number of
> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> +of kfuncs to populate it. User-space ``XSK`` consumer, looks

s/User-space/The user-space

Also, it feels like it might read better without the comma, and by
doing something like s/looks at/computes. Wdyt?

> +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
> +
> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +                               ^
> +                               |
> +                        rx_desc->address
> +
> +XDP_PASS
> +========
> +
> +This is the path where the packets processed by the XDP program are passed
> +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.

s/creates ``skb``/creates the ``skb``

> +Currently, every driver has a custom kernel code to parse the descriptors and
> +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
> +In the future, we'd like to support a case where XDP program can override

s/where XDP program/where an XDP program

> +some of that metadata.
> +
> +The plan of record is to make this path similar to ``bpf_redirect_map``
> +so the program can control which metadata is passed to the skb layer.
> +
> +bpf_redirect_map
> +================
> +
> +``bpf_redirect_map`` can redirect the frame to a different device.
> +In this case we don't know ahead of time whether that final consumer
> +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
> +Additionally, the final consumer doesn't have access to the original
> +hardware descriptor and can't access any of the original metadata.
> +
> +For this use-case, only custom metadata is currently supported. If
> +the frame is eventually passed to the kernel, the skb created from such
> +a frame won't have any skb metadata. The ``XSK`` consumer will only
> +have access to the custom metadata.
> +
> +bpf_tail_call
> +=============
> +
> +No special handling here. Tail-called program operates on the same context

s/Tail-called program/A tail-called program

> +as the original one.
> -- 
> 2.39.0.rc1.256.g54fd8350bd-goog
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs
  2022-12-13  2:35 ` [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs Stanislav Fomichev
@ 2022-12-13 17:00   ` David Vernet
  2022-12-13 20:42     ` Stanislav Fomichev
  2022-12-14  1:53   ` Martin KaFai Lau
  2022-12-14 10:54   ` [xdp-hints] " Toke Høiland-Jørgensen
  2 siblings, 1 reply; 46+ messages in thread
From: David Vernet @ 2022-12-13 17:00 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Mon, Dec 12, 2022 at 06:35:55PM -0800, Stanislav Fomichev wrote:
> Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible
> XDP metatada kfuncs. Not all devices have to implement them. If kfunc is not
> supported by the target device, the default implementation is called instead.
> The verifier, at load time, replaces a call to the generic kfunc with a call
> to the per-device one. Per-device kfunc pointers are stored in separate
> struct xdp_metadata_ops.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/bpf.h       |  2 ++
>  include/linux/netdevice.h |  7 +++++++
>  include/net/xdp.h         | 25 ++++++++++++++++++++++
>  kernel/bpf/core.c         |  7 +++++++
>  kernel/bpf/offload.c      | 23 ++++++++++++++++++++
>  kernel/bpf/verifier.c     | 29 +++++++++++++++++++++++++-
>  net/core/xdp.c            | 44 +++++++++++++++++++++++++++++++++++++++
>  7 files changed, 136 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index ca22e8b8bd82..de6279725f41 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2477,6 +2477,8 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
>  				       struct net_device *netdev);
>  bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
>  
> +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
> +
>  void unpriv_ebpf_notify(int new_state);
>  
>  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5aa35c58c342..63786091c60d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -74,6 +74,7 @@ struct udp_tunnel_nic_info;
>  struct udp_tunnel_nic;
>  struct bpf_prog;
>  struct xdp_buff;
> +struct xdp_md;
>  
>  void synchronize_net(void);
>  void netdev_set_default_ethtool_ops(struct net_device *dev,
> @@ -1613,6 +1614,11 @@ struct net_device_ops {
>  						  bool cycles);
>  };
>  
> +struct xdp_metadata_ops {
> +	int	(*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
> +	int	(*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash);
> +};
> +
>  /**
>   * enum netdev_priv_flags - &struct net_device priv_flags
>   *
> @@ -2044,6 +2050,7 @@ struct net_device {
>  	unsigned int		flags;
>  	unsigned long long	priv_flags;
>  	const struct net_device_ops *netdev_ops;
> +	const struct xdp_metadata_ops *xdp_metadata_ops;
>  	int			ifindex;
>  	unsigned short		gflags;
>  	unsigned short		hard_header_len;
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 55dbc68bfffc..152c3a9c1127 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -409,4 +409,29 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
>  
>  #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
>  
> +#define XDP_METADATA_KFUNC_xxx	\
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
> +			   bpf_xdp_metadata_rx_timestamp) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
> +			   bpf_xdp_metadata_rx_hash) \
> +
> +enum {
> +#define XDP_METADATA_KFUNC(name, str) name,
> +XDP_METADATA_KFUNC_xxx
> +#undef XDP_METADATA_KFUNC
> +MAX_XDP_METADATA_KFUNC,
> +};
> +
> +struct xdp_md;
> +int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp);
> +int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash);

We don't usually export function signatures like this for kfuncs as
nobody in the main kernel should be linking against it. See [0].

[0]: https://docs.kernel.org/bpf/kfuncs.html#creating-a-wrapper-kfunc

> +
> +#ifdef CONFIG_NET
> +u32 xdp_metadata_kfunc_id(int id);
> +bool xdp_is_metadata_kfunc_id(u32 btf_id);
> +#else
> +static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
> +static inline bool xdp_is_metadata_kfunc_id(u32 btf_id) { return false; }
> +#endif
> +
>  #endif /* __LINUX_NET_XDP_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index d434a994ee04..c3e501e3e39c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2097,6 +2097,13 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
>  	if (fp->kprobe_override)
>  		return false;
>  
> +	/* When tail-calling from a non-dev-bound program to a dev-bound one,
> +	 * XDP metadata helpers should be disabled. Until it's implemented,
> +	 * prohibit adding dev-bound programs to tail-call maps.
> +	 */
> +	if (bpf_prog_is_dev_bound(fp->aux))
> +		return false;
> +
>  	spin_lock(&map->owner.lock);
>  	if (!map->owner.type) {
>  		/* There's no owner yet where we could check for
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index f714c941f8ea..3b6c9023f24d 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -757,6 +757,29 @@ void bpf_dev_bound_netdev_unregister(struct net_device *dev)
>  	up_write(&bpf_devs_lock);
>  }
>  
> +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> +{
> +	const struct xdp_metadata_ops *ops;
> +	void *p = NULL;
> +
> +	down_read(&bpf_devs_lock);
> +	if (!prog->aux->offload || !prog->aux->offload->netdev)
> +		goto out;
> +
> +	ops = prog->aux->offload->netdev->xdp_metadata_ops;
> +	if (!ops)
> +		goto out;
> +
> +	if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> +		p = ops->xmo_rx_timestamp;
> +	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> +		p = ops->xmo_rx_hash;
> +out:
> +	up_read(&bpf_devs_lock);
> +
> +	return p;
> +}
> +
>  static int __init bpf_offload_init(void)
>  {
>  	int err;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 203d8cfeda70..e61fe0472b9b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15479,12 +15479,35 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
>  {
>  	const struct bpf_kfunc_desc *desc;
> +	void *xdp_kfunc;
>  
>  	if (!insn->imm) {
>  		verbose(env, "invalid kernel function call not eliminated in verifier pass\n");
>  		return -EINVAL;
>  	}
>  
> +	*cnt = 0;
> +
> +	if (xdp_is_metadata_kfunc_id(insn->imm)) {
> +		if (!bpf_prog_is_dev_bound(env->prog->aux)) {
> +			verbose(env, "metadata kfuncs require device-bound program\n");
> +			return -EINVAL;
> +		}
> +
> +		if (bpf_prog_is_offloaded(env->prog->aux)) {
> +			verbose(env, "metadata kfuncs can't be offloaded\n");
> +			return -EINVAL;
> +		}
> +
> +		xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm);
> +		if (xdp_kfunc) {
> +			insn->imm = BPF_CALL_IMM(xdp_kfunc);
> +			return 0;
> +		}

Per another comment, should these xdp kfuncs use special_kfunc_list, or
some other variant that lives in verifier.c? I'll admit that I'm not
quite following why you wouldn't need to do the find_kfunc_desc() call
below, so apologies if I'm just totally off here.

> +
> +		/* fallback to default kfunc when not supported by netdev */
> +	}
> +
>  	/* insn->imm has the btf func_id. Replace it with
>  	 * an address (relative to __bpf_call_base).
>  	 */
> @@ -15495,7 +15518,6 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  		return -EFAULT;
>  	}
>  
> -	*cnt = 0;
>  	insn->imm = desc->imm;
>  	if (insn->off)
>  		return 0;
> @@ -16502,6 +16524,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>  	if (tgt_prog) {
>  		struct bpf_prog_aux *aux = tgt_prog->aux;
>  
> +		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> +			bpf_log(log, "Replacing device-bound programs not supported\n");
> +			return -EINVAL;
> +		}
> +
>  		for (i = 0; i < aux->func_info_cnt; i++)
>  			if (aux->func_info[i].type_id == btf_id) {
>  				subprog = i;
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 844c9d99dc0e..b0d4080249d7 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -4,6 +4,7 @@
>   * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
>   */
>  #include <linux/bpf.h>
> +#include <linux/btf_ids.h>
>  #include <linux/filter.h>
>  #include <linux/types.h>
>  #include <linux/mm.h>
> @@ -709,3 +710,46 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
>  
>  	return nxdpf;
>  }
> +
> +noinline int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +noinline int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
> +{
> +	return -EOPNOTSUPP;
> +}

I don't _think_ noinline should be necessary here given that the
function is global, though tbh I'm not sure if leaving it off will break
LTO. We currently don't use any attributes like this on other kfuncs
(e.g. [1]), but maybe we should?

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/helpers.c#n2034

> +
> +BTF_SET8_START(xdp_metadata_kfunc_ids)
> +#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)

IMO 'str' isn't the right parameter name here given that it's the actual
symbol and is not a string. What about _func or _symbol instead? Also
IMO 'name' is a bit misleading -- I'd go with something like '_enum'. I
wish there were a way for the preprocessor to auto-uppercase so you
could just define a single field that was used both for defining the
enum and for defining the symbol name.

> +XDP_METADATA_KFUNC_xxx
> +#undef XDP_METADATA_KFUNC
> +BTF_SET8_END(xdp_metadata_kfunc_ids)
> +
> +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> +	.owner = THIS_MODULE,
> +	.set   = &xdp_metadata_kfunc_ids,
> +};
> +
> +BTF_ID_LIST(xdp_metadata_kfunc_ids_unsorted)
> +#define XDP_METADATA_KFUNC(name, str) BTF_ID(func, str)
> +XDP_METADATA_KFUNC_xxx
> +#undef XDP_METADATA_KFUNC
> +
> +u32 xdp_metadata_kfunc_id(int id)
> +{
> +	/* xdp_metadata_kfunc_ids is sorted and can't be used */
> +	return xdp_metadata_kfunc_ids_unsorted[id];
> +}
> +
> +bool xdp_is_metadata_kfunc_id(u32 btf_id)
> +{
> +	return btf_id_set8_contains(&xdp_metadata_kfunc_ids, btf_id);
> +}

The verifier already has a notion of "special kfuncs" via a
special_kfunc_list that exists in verifier.c. Maybe we should be using
that given that is only used in the verifier anyways? OTOH, it's nice
that all of the complexity of e.g. accounting for #ifdef CONFIG_NET is
contained here, so I also like your approach. It just seems like a
divergence from how things are being done for other kfuncs so I figured
it was worth discussing.

> +
> +static int __init xdp_metadata_init(void)
> +{
> +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
> +}
> +late_initcall(xdp_metadata_init);
> -- 
> 2.39.0.rc1.256.g54fd8350bd-goog
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs
  2022-12-13 17:00   ` David Vernet
@ 2022-12-13 20:42     ` Stanislav Fomichev
  2022-12-13 21:45       ` David Vernet
  0 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13 20:42 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Tue, Dec 13, 2022 at 9:00 AM David Vernet <void@manifault.com> wrote:
>
> On Mon, Dec 12, 2022 at 06:35:55PM -0800, Stanislav Fomichev wrote:
> > Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible
> > XDP metatada kfuncs. Not all devices have to implement them. If kfunc is not
> > supported by the target device, the default implementation is called instead.
> > The verifier, at load time, replaces a call to the generic kfunc with a call
> > to the per-device one. Per-device kfunc pointers are stored in separate
> > struct xdp_metadata_ops.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/linux/bpf.h       |  2 ++
> >  include/linux/netdevice.h |  7 +++++++
> >  include/net/xdp.h         | 25 ++++++++++++++++++++++
> >  kernel/bpf/core.c         |  7 +++++++
> >  kernel/bpf/offload.c      | 23 ++++++++++++++++++++
> >  kernel/bpf/verifier.c     | 29 +++++++++++++++++++++++++-
> >  net/core/xdp.c            | 44 +++++++++++++++++++++++++++++++++++++++
> >  7 files changed, 136 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index ca22e8b8bd82..de6279725f41 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2477,6 +2477,8 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
> >                                      struct net_device *netdev);
> >  bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
> >
> > +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
> > +
> >  void unpriv_ebpf_notify(int new_state);
> >
> >  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 5aa35c58c342..63786091c60d 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -74,6 +74,7 @@ struct udp_tunnel_nic_info;
> >  struct udp_tunnel_nic;
> >  struct bpf_prog;
> >  struct xdp_buff;
> > +struct xdp_md;
> >
> >  void synchronize_net(void);
> >  void netdev_set_default_ethtool_ops(struct net_device *dev,
> > @@ -1613,6 +1614,11 @@ struct net_device_ops {
> >                                                 bool cycles);
> >  };
> >
> > +struct xdp_metadata_ops {
> > +     int     (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
> > +     int     (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash);
> > +};
> > +
> >  /**
> >   * enum netdev_priv_flags - &struct net_device priv_flags
> >   *
> > @@ -2044,6 +2050,7 @@ struct net_device {
> >       unsigned int            flags;
> >       unsigned long long      priv_flags;
> >       const struct net_device_ops *netdev_ops;
> > +     const struct xdp_metadata_ops *xdp_metadata_ops;
> >       int                     ifindex;
> >       unsigned short          gflags;
> >       unsigned short          hard_header_len;
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 55dbc68bfffc..152c3a9c1127 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -409,4 +409,29 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> >
> >  #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
> >
> > +#define XDP_METADATA_KFUNC_xxx       \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
> > +                        bpf_xdp_metadata_rx_timestamp) \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
> > +                        bpf_xdp_metadata_rx_hash) \
> > +
> > +enum {
> > +#define XDP_METADATA_KFUNC(name, str) name,
> > +XDP_METADATA_KFUNC_xxx
> > +#undef XDP_METADATA_KFUNC
> > +MAX_XDP_METADATA_KFUNC,
> > +};
> > +
> > +struct xdp_md;
> > +int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp);
> > +int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash);
>
> We don't usually export function signatures like this for kfuncs as
> nobody in the main kernel should be linking against it. See [0].
>
> [0]: https://docs.kernel.org/bpf/kfuncs.html#creating-a-wrapper-kfunc

Oh, thanks, that's very helpful. As you might have guessed, I've added
those signatures to make the compiler happy :-(

> > +
> > +#ifdef CONFIG_NET
> > +u32 xdp_metadata_kfunc_id(int id);
> > +bool xdp_is_metadata_kfunc_id(u32 btf_id);
> > +#else
> > +static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
> > +static inline bool xdp_is_metadata_kfunc_id(u32 btf_id) { return false; }
> > +#endif
> > +
> >  #endif /* __LINUX_NET_XDP_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index d434a994ee04..c3e501e3e39c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2097,6 +2097,13 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
> >       if (fp->kprobe_override)
> >               return false;
> >
> > +     /* When tail-calling from a non-dev-bound program to a dev-bound one,
> > +      * XDP metadata helpers should be disabled. Until it's implemented,
> > +      * prohibit adding dev-bound programs to tail-call maps.
> > +      */
> > +     if (bpf_prog_is_dev_bound(fp->aux))
> > +             return false;
> > +
> >       spin_lock(&map->owner.lock);
> >       if (!map->owner.type) {
> >               /* There's no owner yet where we could check for
> > diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> > index f714c941f8ea..3b6c9023f24d 100644
> > --- a/kernel/bpf/offload.c
> > +++ b/kernel/bpf/offload.c
> > @@ -757,6 +757,29 @@ void bpf_dev_bound_netdev_unregister(struct net_device *dev)
> >       up_write(&bpf_devs_lock);
> >  }
> >
> > +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> > +{
> > +     const struct xdp_metadata_ops *ops;
> > +     void *p = NULL;
> > +
> > +     down_read(&bpf_devs_lock);
> > +     if (!prog->aux->offload || !prog->aux->offload->netdev)
> > +             goto out;
> > +
> > +     ops = prog->aux->offload->netdev->xdp_metadata_ops;
> > +     if (!ops)
> > +             goto out;
> > +
> > +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> > +             p = ops->xmo_rx_timestamp;
> > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> > +             p = ops->xmo_rx_hash;
> > +out:
> > +     up_read(&bpf_devs_lock);
> > +
> > +     return p;
> > +}
> > +
> >  static int __init bpf_offload_init(void)
> >  {
> >       int err;
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 203d8cfeda70..e61fe0472b9b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -15479,12 +15479,35 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                           struct bpf_insn *insn_buf, int insn_idx, int *cnt)
> >  {
> >       const struct bpf_kfunc_desc *desc;
> > +     void *xdp_kfunc;
> >
> >       if (!insn->imm) {
> >               verbose(env, "invalid kernel function call not eliminated in verifier pass\n");
> >               return -EINVAL;
> >       }
> >
> > +     *cnt = 0;
> > +
> > +     if (xdp_is_metadata_kfunc_id(insn->imm)) {
> > +             if (!bpf_prog_is_dev_bound(env->prog->aux)) {
> > +                     verbose(env, "metadata kfuncs require device-bound program\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (bpf_prog_is_offloaded(env->prog->aux)) {
> > +                     verbose(env, "metadata kfuncs can't be offloaded\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm);
> > +             if (xdp_kfunc) {
> > +                     insn->imm = BPF_CALL_IMM(xdp_kfunc);
> > +                     return 0;
> > +             }
>
> Per another comment, should these xdp kfuncs use special_kfunc_list, or
> some other variant that lives in verifier.c? I'll admit that I'm not
> quite following why you wouldn't need to do the find_kfunc_desc() call
> below, so apologies if I'm just totally off here.

Here I'm trying to short-circuit that generic verifier handling and do
kfunc resolving myself, so not sure. Will comment about
special_kfunc_list below.

> > +
> > +             /* fallback to default kfunc when not supported by netdev */
> > +     }
> > +
> >       /* insn->imm has the btf func_id. Replace it with
> >        * an address (relative to __bpf_call_base).
> >        */
> > @@ -15495,7 +15518,6 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >               return -EFAULT;
> >       }
> >
> > -     *cnt = 0;
> >       insn->imm = desc->imm;
> >       if (insn->off)
> >               return 0;
> > @@ -16502,6 +16524,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> >       if (tgt_prog) {
> >               struct bpf_prog_aux *aux = tgt_prog->aux;
> >
> > +             if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> > +                     bpf_log(log, "Replacing device-bound programs not supported\n");
> > +                     return -EINVAL;
> > +             }
> > +
> >               for (i = 0; i < aux->func_info_cnt; i++)
> >                       if (aux->func_info[i].type_id == btf_id) {
> >                               subprog = i;
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index 844c9d99dc0e..b0d4080249d7 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -4,6 +4,7 @@
> >   * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
> >   */
> >  #include <linux/bpf.h>
> > +#include <linux/btf_ids.h>
> >  #include <linux/filter.h>
> >  #include <linux/types.h>
> >  #include <linux/mm.h>
> > @@ -709,3 +710,46 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
> >
> >       return nxdpf;
> >  }
> > +
> > +noinline int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> > +{
> > +     return -EOPNOTSUPP;
> > +}
> > +
> > +noinline int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
> > +{
> > +     return -EOPNOTSUPP;
> > +}
>
> I don't _think_ noinline should be necessary here given that the
> function is global, though tbh I'm not sure if leaving it off will break
> LTO. We currently don't use any attributes like this on other kfuncs
> (e.g. [1]), but maybe we should?
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/helpers.c#n2034

Hm, I guess since I'm not really directly calling these anywhere,
there is no chance they are going to be inlined? Will try to drop and
see what happens..

> > +
> > +BTF_SET8_START(xdp_metadata_kfunc_ids)
> > +#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
>
> IMO 'str' isn't the right parameter name here given that it's the actual
> symbol and is not a string. What about _func or _symbol instead? Also
> IMO 'name' is a bit misleading -- I'd go with something like '_enum'. I
> wish there were a way for the preprocessor to auto-uppercase so you
> could just define a single field that was used both for defining the
> enum and for defining the symbol name.

How about I do the following:

enum {
#define XDP_METADATA_KFUNC(name, _) name,
XDP_METADATA_KFUNC_xxx
#undef XDP_METADATA_KFUNC
MAX_XDP_METADATA_KFUNC,
};

And then this in the .c file:

BTF_SET8_START(xdp_metadata_kfunc_ids)
#define XDP_METADATA_KFUNC(_, name) BTF_ID_FLAGS(func, name, 0)
XDP_METADATA_KFUNC_xxx
#undef XDP_METADATA_KFUNC
BTF_SET8_END(xdp_metadata_kfunc_ids)

Should be a bit more clear what and where I use? Otherwise, using
_func might seem a bit confusing in:
#define XDP_METADATA_KFUNC(_enum, _func) BTF_ID_FLAGS(func, _func, 0)

The "func, _func" part. Or maybe that's fine.. WDYT?

> > +XDP_METADATA_KFUNC_xxx
> > +#undef XDP_METADATA_KFUNC
> > +BTF_SET8_END(xdp_metadata_kfunc_ids)
> > +
> > +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> > +     .owner = THIS_MODULE,
> > +     .set   = &xdp_metadata_kfunc_ids,
> > +};
> > +
> > +BTF_ID_LIST(xdp_metadata_kfunc_ids_unsorted)
> > +#define XDP_METADATA_KFUNC(name, str) BTF_ID(func, str)
> > +XDP_METADATA_KFUNC_xxx
> > +#undef XDP_METADATA_KFUNC
> > +
> > +u32 xdp_metadata_kfunc_id(int id)
> > +{
> > +     /* xdp_metadata_kfunc_ids is sorted and can't be used */
> > +     return xdp_metadata_kfunc_ids_unsorted[id];
> > +}
> > +
> > +bool xdp_is_metadata_kfunc_id(u32 btf_id)
> > +{
> > +     return btf_id_set8_contains(&xdp_metadata_kfunc_ids, btf_id);
> > +}
>
> The verifier already has a notion of "special kfuncs" via a
> special_kfunc_list that exists in verifier.c. Maybe we should be using
> that given that is only used in the verifier anyways? OTOH, it's nice
> that all of the complexity of e.g. accounting for #ifdef CONFIG_NET is
> contained here, so I also like your approach. It just seems like a
> divergence from how things are being done for other kfuncs so I figured
> it was worth discussing.

Yeah, idk, I've tried not to add more to the already huge verifier.c file :-(
If we were to put everything into verifier.c, I'd still need some
extra special_xdp_kfunc_list for those xdp kfuncs to be able to
distinguish them from the rest...
So yeah, not sure, I'd prefer to keep everything in xdp.c and not
pollute the more generic verifier.c, but I'm fine either way. LMK if
you feel strongly about it, can move.


> > +
> > +static int __init xdp_metadata_init(void)
> > +{
> > +     return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
> > +}
> > +late_initcall(xdp_metadata_init);
> > --
> > 2.39.0.rc1.256.g54fd8350bd-goog
> >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-13 16:37   ` David Vernet
@ 2022-12-13 20:42     ` Stanislav Fomichev
  2022-12-14 10:34       ` [xdp-hints] " Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13 20:42 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
>
> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
> > Document all current use-cases and assumptions.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
> >  1 file changed, 90 insertions(+)
> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
> >
> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> > new file mode 100644
> > index 000000000000..498eae718275
> > --- /dev/null
> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
>
> I think you need to add this to Documentation/bpf/index.rst. Or even
> better, maybe it's time to add an xdp/ subdirectory and put all docs
> there? Don't want to block your patchset from bikeshedding on this
> point, so for now it's fine to just put it in
> Documentation/bpf/index.rst until we figure that out.

Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
and reference form Documentation/networking/index.rst? Since it's more
relevant to networking than the core bpf?

> > @@ -0,0 +1,90 @@
> > +===============
> > +XDP RX Metadata
> > +===============
> > +
> > +XDP programs support creating and passing custom metadata via
> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> > +entities:
>
> Can you add a couple of sentences to this intro section that explains
> what metadata is at a high level?

I'm gonna copy-paste here what I'm adding, feel free to reply back if
still unclear. (so we don't have to wait another week to discuss the
changes)

XDP programs support creating and passing custom metadata via
``bpf_xdp_adjust_meta``. The metadata can contain some extra information
about the packet: timestamps, hash, vlan and tunneling information, etc.
This metadata can be consumed by the following entities:

> > +
> > +1. ``AF_XDP`` consumer.
> > +2. Kernel core stack via ``XDP_PASS``.
> > +3. Another device via ``bpf_redirect_map``.
> > +4. Other BPF programs via ``bpf_tail_call``.
> > +
> > +General Design
> > +==============
> > +
> > +XDP has access to a set of kfuncs to manipulate the metadata. Every
>
> "...to manipulate the metadata in an XDP frame." ?

SG!

> > +device driver implements these kfuncs. The set of kfuncs is
>
> "Every device driver implements these kfuncs" can you be a bit more
> specific about which types of device drivers will implement these?

How about the following?
Every device driver that wishes to expose additional packet metadata
can implement these kfuncs.

> > +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
>
> Why is it suffixed with _xxx?

I'm following the precedent of BTF_SOCK_TYPE_xxx and
BTF_TRACING_TYPE_xxx. LMK if you prefer a better name here.

> > +
> > +Currently, the following kfuncs are supported. In the future, as more
> > +metadata is supported, this set will grow:
> > +
> > +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> > +  indicate whether the device supports RX timestamps
> > +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
>
> s/returns packet/returns a packet's

ty!

> > +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> > +  indicate whether the device supports RX hash
>
> I don't see bpf_xdp_metadata_rx_timestamp_supported() or
> bpf_xdp_metadata_rx_hash_supported() being added in your patch set. Can
> you remove these entries until they're actually implemented?

Ooh, good catch, I've removed them for this round. Will remove from
the doc as well.

> > +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash
>
> We should probably also add a note that these kfuncs currently just
> return -EOPNOTSUPP.

The default ones return EOPNOTSUPP. Maybe I can clarify with the following?

Not all kfuncs have to be implemented by the device driver; when not
implemented, the default ones that return ``-EOPNOTSUPP`` will be
used.

> Finally, should we add either some example code showing how to use these
> kfuncs, or at the very least some links to their selftests so readers
> have example code they can refer to?

Good idea, will reference
tools/testing/selftests/bpf/progs/xdp_metadata.c and
tools/testing/selftests/bpf/prog_tests/xdp_metadata.c.

Example
=======
See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and
``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of
BPF program that handles XDP metadata.

> > +
> > +Within the XDP frame, the metadata layout is as follows::
> > +
> > +  +----------+-----------------+------+
> > +  | headroom | custom metadata | data |
> > +  +----------+-----------------+------+
> > +             ^                 ^
> > +             |                 |
> > +   xdp_buff->data_meta   xdp_buff->data
> > +
> > +AF_XDP
> > +======
> > +
> > +``AF_XDP`` use-case implies that there is a contract between the BPF program
> > +that redirects XDP frames into the ``XSK`` and the final consumer.
>
> Can you fully spell out what XSK stands for the first time it's used?
> Something like "...that redirects XDP frames into the ``AF_XDP`` socket
> (``XSK``) and the final consumer." Applies anywhere else you think
> appropriate as well.

SG!

> > +Thus the BPF program manually allocates a fixed number of
> > +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> > +of kfuncs to populate it. User-space ``XSK`` consumer, looks
>
> s/User-space/The user-space
>
> Also, it feels like it might read better without the comma, and by
> doing something like s/looks at/computes. Wdyt?

Ageed.

> > +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
> > +
> > +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> > +
> > +  +----------+-----------------+------+
> > +  | headroom | custom metadata | data |
> > +  +----------+-----------------+------+
> > +                               ^
> > +                               |
> > +                        rx_desc->address
> > +
> > +XDP_PASS
> > +========
> > +
> > +This is the path where the packets processed by the XDP program are passed
> > +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
>
> s/creates ``skb``/creates the ``skb``

Ack.

> > +Currently, every driver has a custom kernel code to parse the descriptors and
> > +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
> > +In the future, we'd like to support a case where XDP program can override
>
> s/where XDP program/where an XDP program

Same here, will fix, thanks!

> > +some of that metadata.
> > +
> > +The plan of record is to make this path similar to ``bpf_redirect_map``
> > +so the program can control which metadata is passed to the skb layer.
> > +
> > +bpf_redirect_map
> > +================
> > +
> > +``bpf_redirect_map`` can redirect the frame to a different device.
> > +In this case we don't know ahead of time whether that final consumer
> > +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
> > +Additionally, the final consumer doesn't have access to the original
> > +hardware descriptor and can't access any of the original metadata.
> > +
> > +For this use-case, only custom metadata is currently supported. If
> > +the frame is eventually passed to the kernel, the skb created from such
> > +a frame won't have any skb metadata. The ``XSK`` consumer will only
> > +have access to the custom metadata.
> > +
> > +bpf_tail_call
> > +=============
> > +
> > +No special handling here. Tail-called program operates on the same context
>
> s/Tail-called program/A tail-called program

Ty!


> > +as the original one.
> > --
> > 2.39.0.rc1.256.g54fd8350bd-goog
> >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata
  2022-12-13 15:55   ` Jesper Dangaard Brouer
@ 2022-12-13 20:42     ` Stanislav Fomichev
  2022-12-14  9:48       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-13 20:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, brouer, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Tue, Dec 13, 2022 at 7:55 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 13/12/2022 03.35, Stanislav Fomichev wrote:
> > The goal is to enable end-to-end testing of the metadata for AF_XDP.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >   drivers/net/veth.c | 24 ++++++++++++++++++++++++
> >   1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 04ffd8cb2945..d5491e7a2798 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -118,6 +118,7 @@ static struct {
> >
> >   struct veth_xdp_buff {
> >       struct xdp_buff xdp;
> > +     struct sk_buff *skb;
> >   };
> >
> >   static int veth_get_link_ksettings(struct net_device *dev,
> > @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
> >
> >               xdp_convert_frame_to_buff(frame, xdp);
> >               xdp->rxq = &rq->xdp_rxq;
> > +             vxbuf.skb = NULL;
> >
> >               act = bpf_prog_run_xdp(xdp_prog, xdp);
> >
> > @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >       __skb_push(skb, skb->data - skb_mac_header(skb));
> >       if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
> >               goto drop;
> > +     vxbuf.skb = skb;
> >
> >       orig_data = xdp->data;
> >       orig_data_end = xdp->data_end;
> > @@ -1601,6 +1604,21 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >       }
> >   }
> >
> > +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> > +{
> > +     *timestamp = ktime_get_mono_fast_ns();
>
> This should be reading the hardware timestamp in the SKB.
>
> Details: This hardware timestamp in the SKB is located in
> skb_shared_info area, which is also available for xdp_frame (currently
> used for multi-buffer purposes).  Thus, when adding xdp-hints "store"
> functionality, it would be natural to store the HW TS in the same place.
> Making the veth skb/xdp_frame code paths able to share code.

Does something like the following look acceptable as well?

*timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
if (!*timestamp)
        *timestamp = ktime_get_mono_fast_ns(); /* sw fallback */

Because I'd like to be able to test this path in the selftests. As
long as I get some number from veth_xdp_rx_timestamp, I can test it.
No amount of SOF_TIMESTAMPING_{SOFTWARE,RX_SOFTWARE,RAW_HARDWARE}
triggers non-zero hwtstamp for xsk receive path. Any suggestions?


> > +     return 0;
> > +}
> > +
> > +static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
> > +{
> > +     struct veth_xdp_buff *_ctx = (void *)ctx;
> > +
> > +     if (_ctx->skb)
> > +             *hash = skb_get_hash(_ctx->skb);
> > +     return 0;
> > +}
> > +
> >   static const struct net_device_ops veth_netdev_ops = {
> >       .ndo_init            = veth_dev_init,
> >       .ndo_open            = veth_open,
> > @@ -1622,6 +1640,11 @@ static const struct net_device_ops veth_netdev_ops = {
> >       .ndo_get_peer_dev       = veth_peer_dev,
> >   };
> >
> > +static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
> > +     .xmo_rx_timestamp               = veth_xdp_rx_timestamp,
> > +     .xmo_rx_hash                    = veth_xdp_rx_hash,
> > +};
> > +
> >   #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
> >                      NETIF_F_RXCSUM | NETIF_F_SCTP_CRC | NETIF_F_HIGHDMA | \
> >                      NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ENCAP_ALL | \
> > @@ -1638,6 +1661,7 @@ static void veth_setup(struct net_device *dev)
> >       dev->priv_flags |= IFF_PHONY_HEADROOM;
> >
> >       dev->netdev_ops = &veth_netdev_ops;
> > +     dev->xdp_metadata_ops = &veth_xdp_metadata_ops;
> >       dev->ethtool_ops = &veth_ethtool_ops;
> >       dev->features |= NETIF_F_LLTX;
> >       dev->features |= VETH_FEATURES;
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs
  2022-12-13 20:42     ` Stanislav Fomichev
@ 2022-12-13 21:45       ` David Vernet
  0 siblings, 0 replies; 46+ messages in thread
From: David Vernet @ 2022-12-13 21:45 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Tue, Dec 13, 2022 at 12:42:30PM -0800, Stanislav Fomichev wrote:

[...]

> > We don't usually export function signatures like this for kfuncs as
> > nobody in the main kernel should be linking against it. See [0].
> >
> > [0]: https://docs.kernel.org/bpf/kfuncs.html#creating-a-wrapper-kfunc
> 
> Oh, thanks, that's very helpful. As you might have guessed, I've added
> those signatures to make the compiler happy :-(

No problem, and yeah, it's a pain :-( It would be really nice if we
could do something like this:

#define __kfunc __attribute__((nowarn("Wmissing-protoypes")))

But that attribute doesn't exist.

> > > +     if (xdp_is_metadata_kfunc_id(insn->imm)) {
> > > +             if (!bpf_prog_is_dev_bound(env->prog->aux)) {
> > > +                     verbose(env, "metadata kfuncs require device-bound program\n");
> > > +                     return -EINVAL;
> > > +             }
> > > +
> > > +             if (bpf_prog_is_offloaded(env->prog->aux)) {
> > > +                     verbose(env, "metadata kfuncs can't be offloaded\n");
> > > +                     return -EINVAL;
> > > +             }
> > > +
> > > +             xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm);
> > > +             if (xdp_kfunc) {
> > > +                     insn->imm = BPF_CALL_IMM(xdp_kfunc);
> > > +                     return 0;
> > > +             }
> >
> > Per another comment, should these xdp kfuncs use special_kfunc_list, or
> > some other variant that lives in verifier.c? I'll admit that I'm not
> > quite following why you wouldn't need to do the find_kfunc_desc() call
> > below, so apologies if I'm just totally off here.
> 
> Here I'm trying to short-circuit that generic verifier handling and do
> kfunc resolving myself, so not sure. Will comment about
> special_kfunc_list below.

Understood -- if it's totally separate then do what you need to do. My
only "objection" is that it's a bit sad when we have divergent /
special-case handling in the verifier between all these different kfunc
/ helpers / etc, but I think it's inevitable until we do a larger
refactoring.  It's contained to fixup_kfunc_call() at least, so IMO it's
fine.

[...]

> 
> > > +
> > > +             /* fallback to default kfunc when not supported by netdev */
> > > +     }
> > > +
> > >       /* insn->imm has the btf func_id. Replace it with
> > >        * an address (relative to __bpf_call_base).
> > >        */
> > > @@ -15495,7 +15518,6 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> > >               return -EFAULT;
> > >       }
> > >
> > > -     *cnt = 0;
> > >       insn->imm = desc->imm;
> > >       if (insn->off)
> > >               return 0;
> > > @@ -16502,6 +16524,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> > >       if (tgt_prog) {
> > >               struct bpf_prog_aux *aux = tgt_prog->aux;
> > >
> > > +             if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> > > +                     bpf_log(log, "Replacing device-bound programs not supported\n");
> > > +                     return -EINVAL;
> > > +             }
> > > +
> > >               for (i = 0; i < aux->func_info_cnt; i++)
> > >                       if (aux->func_info[i].type_id == btf_id) {
> > >                               subprog = i;
> > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > index 844c9d99dc0e..b0d4080249d7 100644
> > > --- a/net/core/xdp.c
> > > +++ b/net/core/xdp.c
> > > @@ -4,6 +4,7 @@
> > >   * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
> > >   */
> > >  #include <linux/bpf.h>
> > > +#include <linux/btf_ids.h>
> > >  #include <linux/filter.h>
> > >  #include <linux/types.h>
> > >  #include <linux/mm.h>
> > > @@ -709,3 +710,46 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
> > >
> > >       return nxdpf;
> > >  }
> > > +
> > > +noinline int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> > > +{
> > > +     return -EOPNOTSUPP;
> > > +}
> > > +
> > > +noinline int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
> > > +{
> > > +     return -EOPNOTSUPP;
> > > +}
> >
> > I don't _think_ noinline should be necessary here given that the
> > function is global, though tbh I'm not sure if leaving it off will break
> > LTO. We currently don't use any attributes like this on other kfuncs
> > (e.g. [1]), but maybe we should?
> >
> > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/helpers.c#n2034
> 
> Hm, I guess since I'm not really directly calling these anywhere,
> there is no chance they are going to be inlined? Will try to drop and
> see what happens..

Yeah, if it's a global symbol I think you should be OK. Again, we need
to figure out the story for LTO though. Later on I think we should add a
__kfunc macro which handles this invisibly for all kfunc definitions.

> > > +
> > > +BTF_SET8_START(xdp_metadata_kfunc_ids)
> > > +#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
> >
> > IMO 'str' isn't the right parameter name here given that it's the actual
> > symbol and is not a string. What about _func or _symbol instead? Also
> > IMO 'name' is a bit misleading -- I'd go with something like '_enum'. I
> > wish there were a way for the preprocessor to auto-uppercase so you
> > could just define a single field that was used both for defining the
> > enum and for defining the symbol name.
> 
> How about I do the following:
> 
> enum {
> #define XDP_METADATA_KFUNC(name, _) name,
> XDP_METADATA_KFUNC_xxx
> #undef XDP_METADATA_KFUNC
> MAX_XDP_METADATA_KFUNC,
> };

Looks good!

> 
> And then this in the .c file:
> 
> BTF_SET8_START(xdp_metadata_kfunc_ids)
> #define XDP_METADATA_KFUNC(_, name) BTF_ID_FLAGS(func, name, 0)
> XDP_METADATA_KFUNC_xxx
> #undef XDP_METADATA_KFUNC
> BTF_SET8_END(xdp_metadata_kfunc_ids)

Here as well.

> 
> Should be a bit more clear what and where I use? Otherwise, using
> _func might seem a bit confusing in:
> #define XDP_METADATA_KFUNC(_enum, _func) BTF_ID_FLAGS(func, _func, 0)
> 
> The "func, _func" part. Or maybe that's fine.. WDYT?

LGTM, thanks!

> > > +XDP_METADATA_KFUNC_xxx
> > > +#undef XDP_METADATA_KFUNC
> > > +BTF_SET8_END(xdp_metadata_kfunc_ids)
> > > +
> > > +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> > > +     .owner = THIS_MODULE,
> > > +     .set   = &xdp_metadata_kfunc_ids,
> > > +};
> > > +
> > > +BTF_ID_LIST(xdp_metadata_kfunc_ids_unsorted)
> > > +#define XDP_METADATA_KFUNC(name, str) BTF_ID(func, str)
> > > +XDP_METADATA_KFUNC_xxx
> > > +#undef XDP_METADATA_KFUNC
> > > +
> > > +u32 xdp_metadata_kfunc_id(int id)
> > > +{
> > > +     /* xdp_metadata_kfunc_ids is sorted and can't be used */
> > > +     return xdp_metadata_kfunc_ids_unsorted[id];
> > > +}
> > > +
> > > +bool xdp_is_metadata_kfunc_id(u32 btf_id)
> > > +{
> > > +     return btf_id_set8_contains(&xdp_metadata_kfunc_ids, btf_id);
> > > +}
> >
> > The verifier already has a notion of "special kfuncs" via a
> > special_kfunc_list that exists in verifier.c. Maybe we should be using
> > that given that is only used in the verifier anyways? OTOH, it's nice
> > that all of the complexity of e.g. accounting for #ifdef CONFIG_NET is
> > contained here, so I also like your approach. It just seems like a
> > divergence from how things are being done for other kfuncs so I figured
> > it was worth discussing.
> 
> Yeah, idk, I've tried not to add more to the already huge verifier.c file :-(
> If we were to put everything into verifier.c, I'd still need some
> extra special_xdp_kfunc_list for those xdp kfuncs to be able to
> distinguish them from the rest...
> So yeah, not sure, I'd prefer to keep everything in xdp.c and not
> pollute the more generic verifier.c, but I'm fine either way. LMK if
> you feel strongly about it, can move.

IMO not polluting the already enormous verifier.c is definitely the
right thing to do -- especially for kfuncs like this which are going to
be defined throughout the kernel. So yeah, you can keep what you have.
And maybe at some point we should pull more logic out of verifier.c and
into the locations where the kfuncs are implemented as you're doing
here.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 03/15] bpf: Introduce device-bound XDP programs
  2022-12-13  2:35 ` [PATCH bpf-next v4 03/15] bpf: Introduce device-bound XDP programs Stanislav Fomichev
@ 2022-12-13 23:25   ` Martin KaFai Lau
  2022-12-14 18:42     ` Stanislav Fomichev
  0 siblings, 1 reply; 46+ messages in thread
From: Martin KaFai Lau @ 2022-12-13 23:25 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On 12/12/22 6:35 PM, Stanislav Fomichev wrote:
> New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way
> to associate a netdev with a BPF program at load time.
> 
> Some existing 'offloaded' routines are renamed to 'dev_bound' for
> consistency with the rest.
> 
> Also moved a bunch of code around to avoid forward declarations.

There are too many things in one patch.  It becomes quite hard to follow, eg. I 
have to go back-and-forth a few times within this patch to confirm what change 
is just a move.  Please put the "moved a bunch of code around to avoid forward 
declarations" in one individual patch and also the 
"late_initcall(bpf_offload_init)" change in another individual patch.

[ ... ]

> -int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> +static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
> +					     struct net_device *netdev)
> +{
> +	struct bpf_offload_netdev *ondev;
> +	int err;
> +
> +	ondev = kzalloc(sizeof(*ondev), GFP_KERNEL);
> +	if (!ondev)
> +		return -ENOMEM;
> +
> +	ondev->netdev = netdev;
> +	ondev->offdev = offdev;
> +	INIT_LIST_HEAD(&ondev->progs);
> +	INIT_LIST_HEAD(&ondev->maps);
> +
> +	err = rhashtable_insert_fast(&offdevs, &ondev->l, offdevs_params);
> +	if (err) {
> +		netdev_warn(netdev, "failed to register for BPF offload\n");
> +		goto err_unlock_free;
> +	}
> +
> +	if (offdev)
> +		list_add(&ondev->offdev_netdevs, &offdev->netdevs);
> +	return 0;
> +
> +err_unlock_free:
> +	up_write(&bpf_devs_lock);

No need to handle bpf_devs_lock in the "__" version of the register() helper? 
The goto label probably also needs another name, eg. "err_free".

> +	kfree(ondev);
> +	return err;
> +}
> +

[ ... ]

> +int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr)
>   {
>   	struct bpf_offload_netdev *ondev;
>   	struct bpf_prog_offload *offload;
> @@ -87,7 +198,7 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>   	    attr->prog_type != BPF_PROG_TYPE_XDP)
>   		return -EINVAL;
>   
> -	if (attr->prog_flags)
> +	if (attr->prog_flags & ~BPF_F_XDP_DEV_BOUND_ONLY)
>   		return -EINVAL;
>   
>   	offload = kzalloc(sizeof(*offload), GFP_USER);
> @@ -102,11 +213,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>   	if (err)
>   		goto err_maybe_put;
>   
> +	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_DEV_BOUND_ONLY);
> +
>   	down_write(&bpf_devs_lock);
>   	ondev = bpf_offload_find_netdev(offload->netdev);
>   	if (!ondev) {
> -		err = -EINVAL;
> -		goto err_unlock;
> +		if (!bpf_prog_is_offloaded(prog->aux)) {
> +			/* When only binding to the device, explicitly
> +			 * create an entry in the hashtable. See related
> +			 * bpf_dev_bound_try_remove_netdev.
> +			 */
> +			err = __bpf_offload_dev_netdev_register(NULL, offload->netdev);
> +			if (err)
> +				goto err_unlock;
> +			ondev = bpf_offload_find_netdev(offload->netdev);
> +		}
> +		if (!ondev) {

nit.  A bit confusing because the "ondev = bpf_offload_find_netdev(...)" above 
should not fail but "!ondev" is tested again here.  I think the intention is to 
fail on the 'bpf_prog_is_offloaded() == true' case. May be:

		if (bpf_prog_is_offloaded(prog->aux)) {
			err = -EINVAL;
			goto err_unlock;
		}
		/* When only binding to the device, explicitly
		 * ...
		 */
		err = __bpf_offload_dev_netdev_register(NULL, offload->netdev);
		if (err)
			goto err_unlock;
		ondev = bpf_offload_find_netdev(offload->netdev);

> +			err = -EINVAL;
> +			goto err_unlock;
> +		}
>   	}
>   	offload->offdev = ondev->offdev;
>   	prog->aux->offload = offload;
> @@ -209,27 +334,28 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
>   	up_read(&bpf_devs_lock);
>   }
>   
> -static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
> +static void bpf_dev_bound_try_remove_netdev(struct net_device *dev)
>   {
> -	struct bpf_prog_offload *offload = prog->aux->offload;
> -
> -	if (offload->dev_state)
> -		offload->offdev->ops->destroy(prog);
> +	struct bpf_offload_netdev *ondev;
>   
> -	/* Make sure BPF_PROG_GET_NEXT_ID can't find this dead program */
> -	bpf_prog_free_id(prog, true);
> +	if (!dev)
> +		return;
>   
> -	list_del_init(&offload->offloads);
> -	kfree(offload);
> -	prog->aux->offload = NULL;
> +	ondev = bpf_offload_find_netdev(dev);
> +	if (ondev && !ondev->offdev && list_empty(&ondev->progs))

hmm....list_empty(&ondev->progs) is tested here but will it be empty? ...

> +		__bpf_offload_dev_netdev_unregister(NULL, dev);
>   }
>   
> -void bpf_prog_offload_destroy(struct bpf_prog *prog)
> +void bpf_prog_dev_bound_destroy(struct bpf_prog *prog)
>   {
> +	rtnl_lock();
>   	down_write(&bpf_devs_lock);
> -	if (prog->aux->offload)
> -		__bpf_prog_offload_destroy(prog);
> +	if (prog->aux->offload) {
> +		bpf_dev_bound_try_remove_netdev(prog->aux->offload->netdev);

... the "prog" here is still linked to ondev->progs, right?
because __bpf_prog_dev_bound_destroy() is called later below.

nit. May be the bpf_dev_bound_try_remove_netdev() should be folded/merged back 
into bpf_prog_dev_bound_destroy() to make things more clear.

> +		__bpf_prog_dev_bound_destroy(prog); > +	}
>   	up_write(&bpf_devs_lock);
> +	rtnl_unlock();
>   }

[ ... ]

> +static int __init bpf_offload_init(void)
> +{
> +	int err;
> +
> +	down_write(&bpf_devs_lock);

lock is probably not needed.

> +	err = rhashtable_init(&offdevs, &offdevs_params);
> +	up_write(&bpf_devs_lock);
> +
> +	return err;
> +}
> +
> +late_initcall(bpf_offload_init);

[ ... ]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5d51999cba30..194f8116aad4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9228,6 +9228,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
>   			NL_SET_ERR_MSG(extack, "Using offloaded program without HW_MODE flag is not supported");
>   			return -EINVAL;
>   		}
> +		if (bpf_prog_is_dev_bound(new_prog->aux) && !bpf_offload_dev_match(new_prog, dev)) {
> +			NL_SET_ERR_MSG(extack, "Program bound to different device");
> +			return -EINVAL;
> +		}
>   		if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
>   			NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
>   			return -EINVAL;
> @@ -10813,6 +10817,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
>   		/* Shutdown queueing discipline. */
>   		dev_shutdown(dev);
>   
> +		bpf_dev_bound_netdev_unregister(dev);

Does it matter if bpf_dev_bound_netdev_unregister(dev) is called before 
dev_xdp_uninstall(dev)?  Asking because it seems more logic to unregister dev 
after detaching xdp progs.

>   		dev_xdp_uninstall(dev);
>   
>   		netdev_offload_xstats_disable_all(dev);



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs
  2022-12-13  2:35 ` [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs Stanislav Fomichev
@ 2022-12-14  1:45   ` Martin KaFai Lau
  2022-12-14 10:41     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Martin KaFai Lau @ 2022-12-14  1:45 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, Toke Høiland-Jørgensen, bpf

On 12/12/22 6:35 PM, Stanislav Fomichev wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
> programs that consume HW metadata, implement support for propagating the
> offload information. The extension program doesn't need to set a flag or
> ifindex, it these will just be propagated from the target by the verifier.

s/it/because/ ... these will just be propagated....

> We need to create a separate offload object for the extension program,
> though, since it can be reattached to a different program later (which
> means we can't just inhering the offload information from the target).

hmm.... inheriting?

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 11c558be4992..8686475f0dbe 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>   			goto out_put_prog;
>   		}
>   
> +		if (bpf_prog_is_dev_bound(prog->aux) &&


> +		    (bpf_prog_is_offloaded(tgt_prog->aux) ||
> +		     !bpf_prog_is_dev_bound(tgt_prog->aux) ||
> +		     !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {

hmm... tgt_prog->aux->offload does not look safe without taking bpf_devs_lock. 
offload could be NULL, no?

It probably needs a bpf_prog_dev_bound_match(prog, tgt_prog) which takes the lock.

> +			err = -EINVAL;
> +			goto out_put_prog;
> +		}
> +
>   		key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
>   	}
>   
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e61fe0472b9b..5c6d6d61e57a 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -16524,11 +16524,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>   	if (tgt_prog) {
>   		struct bpf_prog_aux *aux = tgt_prog->aux;
>   
> -		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> -			bpf_log(log, "Replacing device-bound programs not supported\n");
> -			return -EINVAL;
> -		}
> -
>   		for (i = 0; i < aux->func_info_cnt; i++)
>   			if (aux->func_info[i].type_id == btf_id) {
>   				subprog = i;
> @@ -16789,10 +16784,22 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
>   	if (tgt_prog && prog->type == BPF_PROG_TYPE_EXT) {
>   		/* to make freplace equivalent to their targets, they need to
>   		 * inherit env->ops and expected_attach_type for the rest of the
> -		 * verification
> +		 * verification; we also need to propagate the prog offload data
> +		 * for resolving kfuncs.
>   		 */
>   		env->ops = bpf_verifier_ops[tgt_prog->type];
>   		prog->expected_attach_type = tgt_prog->expected_attach_type;
> +
> +		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> +			if (bpf_prog_is_offloaded(tgt_prog->aux))
> +				return -EINVAL;
> +
> +			prog->aux->dev_bound = true;
> +			ret = __bpf_prog_dev_bound_init(prog,
> +							tgt_prog->aux->offload->netdev);

Same here for tgt_prog->aux->offload.  bpf_prog_dev_bound_init() will need to 
take an extra dst_prog arg, like bpf_prog_dev_bound_init(prog, attr, dst_prog). 
It should be called earlier in syscall.c.

> +			if (ret)
> +				return ret;
> +		}
>   	}
>   
>   	/* store info about the attachment target that will be used later */


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs
  2022-12-13  2:35 ` [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs Stanislav Fomichev
  2022-12-13 17:00   ` David Vernet
@ 2022-12-14  1:53   ` Martin KaFai Lau
  2022-12-14 18:43     ` Stanislav Fomichev
  2022-12-14 10:54   ` [xdp-hints] " Toke Høiland-Jørgensen
  2 siblings, 1 reply; 46+ messages in thread
From: Martin KaFai Lau @ 2022-12-14  1:53 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On 12/12/22 6:35 PM, Stanislav Fomichev wrote:
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index ca22e8b8bd82..de6279725f41 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2477,6 +2477,8 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
>   				       struct net_device *netdev);
>   bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
>   
> +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
> +

This probably requires an inline version for !CONFIG_NET.

> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index d434a994ee04..c3e501e3e39c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2097,6 +2097,13 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
>   	if (fp->kprobe_override)
>   		return false;
>   
> +	/* When tail-calling from a non-dev-bound program to a dev-bound one,
> +	 * XDP metadata helpers should be disabled. Until it's implemented,
> +	 * prohibit adding dev-bound programs to tail-call maps.
> +	 */
> +	if (bpf_prog_is_dev_bound(fp->aux))
> +		return false;
> +
>   	spin_lock(&map->owner.lock);
>   	if (!map->owner.type) {
>   		/* There's no owner yet where we could check for
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index f714c941f8ea..3b6c9023f24d 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -757,6 +757,29 @@ void bpf_dev_bound_netdev_unregister(struct net_device *dev)
>   	up_write(&bpf_devs_lock);
>   }
>   
> +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> +{
> +	const struct xdp_metadata_ops *ops;
> +	void *p = NULL;
> +
> +	down_read(&bpf_devs_lock);
> +	if (!prog->aux->offload || !prog->aux->offload->netdev)

This happens when netdev is unregistered in the middle of bpf_prog_load and the 
bpf_offload_dev_match() will eventually fail during dev_xdp_attach()? A comment 
will be useful.

> +		goto out;
> +
> +	ops = prog->aux->offload->netdev->xdp_metadata_ops;
> +	if (!ops)
> +		goto out;
> +
> +	if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> +		p = ops->xmo_rx_timestamp;
> +	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> +		p = ops->xmo_rx_hash;
> +out:
> +	up_read(&bpf_devs_lock);
> +
> +	return p;
> +}
> +
>   static int __init bpf_offload_init(void)
>   {
>   	int err;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 203d8cfeda70..e61fe0472b9b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15479,12 +15479,35 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
>   {
>   	const struct bpf_kfunc_desc *desc;
> +	void *xdp_kfunc;
>   
>   	if (!insn->imm) {
>   		verbose(env, "invalid kernel function call not eliminated in verifier pass\n");
>   		return -EINVAL;
>   	}
>   
> +	*cnt = 0;
> +
> +	if (xdp_is_metadata_kfunc_id(insn->imm)) {
> +		if (!bpf_prog_is_dev_bound(env->prog->aux)) {

The "xdp_is_metadata_kfunc_id() && (!bpf_prog_is_dev_bound() || 
bpf_prog_is_offloaded())" test should have been done much earlier in 
add_kfunc_call(). Then the later stage of the verifier does not have to keep 
worrying about it like here.

nit. may be rename xdp_is_metadata_kfunc_id() to bpf_dev_bound_kfunc_id() and 
hide the "!bpf_prog_is_dev_bound() || bpf_prog_is_offloaded()" test into 
bpf_dev_bound_kfunc_check(&env->log, env->prog).

The change in fixup_kfunc_call could then become:

	if (bpf_dev_bound_kfunc_id(insn->imm)) {
		xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm);
		/* ... */
	}

> +			verbose(env, "metadata kfuncs require device-bound program\n");
> +			return -EINVAL;
> +		}
> +
> +		if (bpf_prog_is_offloaded(env->prog->aux)) {
> +			verbose(env, "metadata kfuncs can't be offloaded\n");
> +			return -EINVAL;
> +		}
> +
> +		xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm);
> +		if (xdp_kfunc) {
> +			insn->imm = BPF_CALL_IMM(xdp_kfunc);
> +			return 0;
> +		}
> +
> +		/* fallback to default kfunc when not supported by netdev */
> +	}
> +



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata
  2022-12-13 20:42     ` Stanislav Fomichev
@ 2022-12-14  9:48       ` Jesper Dangaard Brouer
  2022-12-14 10:47         ` [xdp-hints] " Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Jesper Dangaard Brouer @ 2022-12-14  9:48 UTC (permalink / raw)
  To: Stanislav Fomichev, Jesper Dangaard Brouer
  Cc: brouer, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev



On 13/12/2022 21.42, Stanislav Fomichev wrote:
> On Tue, Dec 13, 2022 at 7:55 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>> On 13/12/2022 03.35, Stanislav Fomichev wrote:
>>> The goal is to enable end-to-end testing of the metadata for AF_XDP.
>>>
>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>> Cc: David Ahern <dsahern@gmail.com>
>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Cc: Willem de Bruijn <willemb@google.com>
>>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>>> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>>> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>>> Cc: Maryam Tahhan <mtahhan@redhat.com>
>>> Cc: xdp-hints@xdp-project.net
>>> Cc: netdev@vger.kernel.org
>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>> ---
>>>    drivers/net/veth.c | 24 ++++++++++++++++++++++++
>>>    1 file changed, 24 insertions(+)
>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index 04ffd8cb2945..d5491e7a2798 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -118,6 +118,7 @@ static struct {
>>>
>>>    struct veth_xdp_buff {
>>>        struct xdp_buff xdp;
>>> +     struct sk_buff *skb;
>>>    };
>>>
>>>    static int veth_get_link_ksettings(struct net_device *dev,
>>> @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
>>>
>>>                xdp_convert_frame_to_buff(frame, xdp);
>>>                xdp->rxq = &rq->xdp_rxq;
>>> +             vxbuf.skb = NULL;
>>>
>>>                act = bpf_prog_run_xdp(xdp_prog, xdp);
>>>
>>> @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>>>        __skb_push(skb, skb->data - skb_mac_header(skb));
>>>        if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>>>                goto drop;
>>> +     vxbuf.skb = skb;
>>>
>>>        orig_data = xdp->data;
>>>        orig_data_end = xdp->data_end;
>>> @@ -1601,6 +1604,21 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>>>        }
>>>    }
>>>
>>> +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
>>> +{
>>> +     *timestamp = ktime_get_mono_fast_ns();
>>
>> This should be reading the hardware timestamp in the SKB.
>>
>> Details: This hardware timestamp in the SKB is located in
>> skb_shared_info area, which is also available for xdp_frame (currently
>> used for multi-buffer purposes).  Thus, when adding xdp-hints "store"
>> functionality, it would be natural to store the HW TS in the same place.
>> Making the veth skb/xdp_frame code paths able to share code.
> 
> Does something like the following look acceptable as well?
> 
> *timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
> if (!*timestamp)
>          *timestamp = ktime_get_mono_fast_ns(); /* sw fallback */
> 

How can the BPF programmer tell the difference between getting hardware 
or software timestamp?

This will get really confusing when someone implements a tcpdump feature
(like/extend xdpdump) and some packets (e.g. PTP packets) have HW
timestamps and some don't.  The time sequence in the pcap will be strange.

> Because I'd like to be able to test this path in the selftests. As
> long as I get some number from veth_xdp_rx_timestamp, I can test it.
> No amount of SOF_TIMESTAMPING_{SOFTWARE,RX_SOFTWARE,RAW_HARDWARE}
> triggers non-zero hwtstamp for xsk receive path. Any suggestions?
> 

You could implement the "store" operation I mentioned before.
For testing you can store an arbitrary value in the timestamp and check
it later by reading it back.

I can see you have changed the API to send down a pointer. Thus, a
simple flag could implement the writing the provided timestamp.

Regarding flags for reading the timestamp.  Should we be able to specify
what clock type we are asking for?
Have you notice that tcpdump can ask for different types of
timestamps[1]. e.g. for hardware timestamps it is either
'adapter_unsynced' or 'adaptor'. (See semantic in [1])

  # tcpdump -ni eth1 -j adapter_unsynced --time-stamp-precision=nano

[1] https://www.tcpdump.org/manpages/pcap-tstamp.7.txt


Hint list which your NIC 'adaptor' supports via this cmdline:

  # tcpdump -i mlx5p2 --list-time-stamp-types

--Jesper


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-13 20:42     ` Stanislav Fomichev
@ 2022-12-14 10:34       ` Toke Høiland-Jørgensen
  2022-12-14 18:42         ` Stanislav Fomichev
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-14 10:34 UTC (permalink / raw)
  To: Stanislav Fomichev, David Vernet
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
>>
>> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
>> > Document all current use-cases and assumptions.
>> >
>> > Cc: John Fastabend <john.fastabend@gmail.com>
>> > Cc: David Ahern <dsahern@gmail.com>
>> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
>> > Cc: Jakub Kicinski <kuba@kernel.org>
>> > Cc: Willem de Bruijn <willemb@google.com>
>> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>> > Cc: Maryam Tahhan <mtahhan@redhat.com>
>> > Cc: xdp-hints@xdp-project.net
>> > Cc: netdev@vger.kernel.org
>> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> > ---
>> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>> >  1 file changed, 90 insertions(+)
>> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>> >
>> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
>> > new file mode 100644
>> > index 000000000000..498eae718275
>> > --- /dev/null
>> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
>>
>> I think you need to add this to Documentation/bpf/index.rst. Or even
>> better, maybe it's time to add an xdp/ subdirectory and put all docs
>> there? Don't want to block your patchset from bikeshedding on this
>> point, so for now it's fine to just put it in
>> Documentation/bpf/index.rst until we figure that out.
>
> Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
> and reference form Documentation/networking/index.rst? Since it's more
> relevant to networking than the core bpf?
>
>> > @@ -0,0 +1,90 @@
>> > +===============
>> > +XDP RX Metadata
>> > +===============
>> > +
>> > +XDP programs support creating and passing custom metadata via
>> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
>> > +entities:
>>
>> Can you add a couple of sentences to this intro section that explains
>> what metadata is at a high level?
>
> I'm gonna copy-paste here what I'm adding, feel free to reply back if
> still unclear. (so we don't have to wait another week to discuss the
> changes)
>
> XDP programs support creating and passing custom metadata via
> ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
> about the packet: timestamps, hash, vlan and tunneling information, etc.
> This metadata can be consumed by the following entities:

This is not really accurate, though? The metadata area itself can
contain whatever the XDP program wants it to, and I think you're
conflating the "old" usage for arbitrary storage with the driver-kfunc
metadata support.

I think we should clear separate the two: the metadata area is just a
place to store data (and is not consumed by the stack, except that
TC-BPF programs can access it), and the driver kfuncs are just a general
way to get data out of the drivers (and has nothing to do with the
metadata area, you can just get the data into stack variables).

While it would be good to have a documentation of the general metadata
area stuff somewhere, I don't think it necessarily have to be part of
this series, so maybe just stick to documenting the kfuncs?

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs
  2022-12-14  1:45   ` Martin KaFai Lau
@ 2022-12-14 10:41     ` Toke Høiland-Jørgensen
  2022-12-14 18:43       ` Stanislav Fomichev
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-14 10:41 UTC (permalink / raw)
  To: Martin KaFai Lau, Stanislav Fomichev
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, bpf

Martin KaFai Lau <martin.lau@linux.dev> writes:

> On 12/12/22 6:35 PM, Stanislav Fomichev wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> 
>> Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
>> programs that consume HW metadata, implement support for propagating the
>> offload information. The extension program doesn't need to set a flag or
>> ifindex, it these will just be propagated from the target by the verifier.
>
> s/it/because/ ... these will just be propagated....

Yeah, or just drop 'it' :)

>> We need to create a separate offload object for the extension program,
>> though, since it can be reattached to a different program later (which
>> means we can't just inhering the offload information from the target).
>
> hmm.... inheriting?

Think I meant to write "we can't just inherit"

>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 11c558be4992..8686475f0dbe 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>>   			goto out_put_prog;
>>   		}
>>   
>> +		if (bpf_prog_is_dev_bound(prog->aux) &&
>
>
>> +		    (bpf_prog_is_offloaded(tgt_prog->aux) ||
>> +		     !bpf_prog_is_dev_bound(tgt_prog->aux) ||
>> +		     !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
>
> hmm... tgt_prog->aux->offload does not look safe without taking bpf_devs_lock. 
> offload could be NULL, no?
>
> It probably needs a bpf_prog_dev_bound_match(prog, tgt_prog) which takes the lock.

Hmm, right, I was kinda expecting that this would not go away while
tgt_prog was alive, but I see now that's not the case due to the
unregister hook. So yeah, needs locking (same below) :)

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata
  2022-12-14  9:48       ` Jesper Dangaard Brouer
@ 2022-12-14 10:47         ` Toke Høiland-Jørgensen
  2022-12-14 18:09           ` Martin KaFai Lau
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-14 10:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Stanislav Fomichev, Jesper Dangaard Brouer
  Cc: brouer, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Jesper Dangaard Brouer <jbrouer@redhat.com> writes:

> On 13/12/2022 21.42, Stanislav Fomichev wrote:
>> On Tue, Dec 13, 2022 at 7:55 AM Jesper Dangaard Brouer
>> <jbrouer@redhat.com> wrote:
>>>
>>>
>>> On 13/12/2022 03.35, Stanislav Fomichev wrote:
>>>> The goal is to enable end-to-end testing of the metadata for AF_XDP.
>>>>
>>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>>> Cc: David Ahern <dsahern@gmail.com>
>>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>>> Cc: Willem de Bruijn <willemb@google.com>
>>>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>>>> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>>>> Cc: Maryam Tahhan <mtahhan@redhat.com>
>>>> Cc: xdp-hints@xdp-project.net
>>>> Cc: netdev@vger.kernel.org
>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>>> ---
>>>>    drivers/net/veth.c | 24 ++++++++++++++++++++++++
>>>>    1 file changed, 24 insertions(+)
>>>>
>>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>>> index 04ffd8cb2945..d5491e7a2798 100644
>>>> --- a/drivers/net/veth.c
>>>> +++ b/drivers/net/veth.c
>>>> @@ -118,6 +118,7 @@ static struct {
>>>>
>>>>    struct veth_xdp_buff {
>>>>        struct xdp_buff xdp;
>>>> +     struct sk_buff *skb;
>>>>    };
>>>>
>>>>    static int veth_get_link_ksettings(struct net_device *dev,
>>>> @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
>>>>
>>>>                xdp_convert_frame_to_buff(frame, xdp);
>>>>                xdp->rxq = &rq->xdp_rxq;
>>>> +             vxbuf.skb = NULL;
>>>>
>>>>                act = bpf_prog_run_xdp(xdp_prog, xdp);
>>>>
>>>> @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>>>>        __skb_push(skb, skb->data - skb_mac_header(skb));
>>>>        if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>>>>                goto drop;
>>>> +     vxbuf.skb = skb;
>>>>
>>>>        orig_data = xdp->data;
>>>>        orig_data_end = xdp->data_end;
>>>> @@ -1601,6 +1604,21 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>>>>        }
>>>>    }
>>>>
>>>> +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
>>>> +{
>>>> +     *timestamp = ktime_get_mono_fast_ns();
>>>
>>> This should be reading the hardware timestamp in the SKB.
>>>
>>> Details: This hardware timestamp in the SKB is located in
>>> skb_shared_info area, which is also available for xdp_frame (currently
>>> used for multi-buffer purposes).  Thus, when adding xdp-hints "store"
>>> functionality, it would be natural to store the HW TS in the same place.
>>> Making the veth skb/xdp_frame code paths able to share code.
>> 
>> Does something like the following look acceptable as well?
>> 
>> *timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
>> if (!*timestamp)
>>          *timestamp = ktime_get_mono_fast_ns(); /* sw fallback */
>> 
>
> How can the BPF programmer tell the difference between getting hardware 
> or software timestamp?
>
> This will get really confusing when someone implements a tcpdump feature
> (like/extend xdpdump) and some packets (e.g. PTP packets) have HW
> timestamps and some don't.  The time sequence in the pcap will be strange.
>
>> Because I'd like to be able to test this path in the selftests. As
>> long as I get some number from veth_xdp_rx_timestamp, I can test it.
>> No amount of SOF_TIMESTAMPING_{SOFTWARE,RX_SOFTWARE,RAW_HARDWARE}
>> triggers non-zero hwtstamp for xsk receive path. Any suggestions?
>> 
>
> You could implement the "store" operation I mentioned before.
> For testing you can store an arbitrary value in the timestamp and check
> it later by reading it back.
>
> I can see you have changed the API to send down a pointer. Thus, a
> simple flag could implement the writing the provided timestamp.
>
> Regarding flags for reading the timestamp.  Should we be able to specify
> what clock type we are asking for?
> Have you notice that tcpdump can ask for different types of
> timestamps[1]. e.g. for hardware timestamps it is either
> 'adapter_unsynced' or 'adaptor'. (See semantic in [1])
>
>   # tcpdump -ni eth1 -j adapter_unsynced --time-stamp-precision=nano

I don't think it makes sense for XDP to *ask* for a specific timestamp
(any individual packet probably only has one, right?). But we could add
a way to query which type of timestamp is available, either as a second
argument to the same function, or as a separate one. Same thing for the
hash, BTW (skb_set_hash() also takes a type argument).

I guess the easiest would be to just add a second parameter to both
getter functions for the type, but maybe there's a performance argument
for having it be a separate kfunc (at least for timestamp)?

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs
  2022-12-13  2:35 ` [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs Stanislav Fomichev
  2022-12-13 17:00   ` David Vernet
  2022-12-14  1:53   ` Martin KaFai Lau
@ 2022-12-14 10:54   ` Toke Høiland-Jørgensen
  2022-12-14 18:43     ` Stanislav Fomichev
  2 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-14 10:54 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

[..]

> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index d434a994ee04..c3e501e3e39c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2097,6 +2097,13 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
>  	if (fp->kprobe_override)
>  		return false;
>  
> +	/* When tail-calling from a non-dev-bound program to a dev-bound one,
> +	 * XDP metadata helpers should be disabled. Until it's implemented,
> +	 * prohibit adding dev-bound programs to tail-call maps.
> +	 */
> +	if (bpf_prog_is_dev_bound(fp->aux))
> +		return false;
> +

nit: the comment is slightly inaccurate as the program running in a
devmap/cpumap has nothing to do with tail calls. maybe replace it with:

"XDP programs inserted into maps are not guaranteed to run on a
particular netdev (and can run outside driver context entirely in the
case of devmap and cpumap). Until device checks are implemented,
prohibit adding dev-bound programs to program maps."

Also, there needs to be a check in bpf_prog_test_run_xdp() to reject
dev-bound programs there as well...

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata
  2022-12-14 10:47         ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-12-14 18:09           ` Martin KaFai Lau
  2022-12-14 18:44             ` Stanislav Fomichev
  0 siblings, 1 reply; 46+ messages in thread
From: Martin KaFai Lau @ 2022-12-14 18:09 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: brouer, bpf, ast, daniel, andrii, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On 12/14/22 2:47 AM, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
> 
>> On 13/12/2022 21.42, Stanislav Fomichev wrote:
>>> On Tue, Dec 13, 2022 at 7:55 AM Jesper Dangaard Brouer
>>> <jbrouer@redhat.com> wrote:
>>>>
>>>>
>>>> On 13/12/2022 03.35, Stanislav Fomichev wrote:
>>>>> The goal is to enable end-to-end testing of the metadata for AF_XDP.
>>>>>
>>>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>>>> Cc: David Ahern <dsahern@gmail.com>
>>>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>>>> Cc: Willem de Bruijn <willemb@google.com>
>>>>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>>>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>>>>> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>>>>> Cc: Maryam Tahhan <mtahhan@redhat.com>
>>>>> Cc: xdp-hints@xdp-project.net
>>>>> Cc: netdev@vger.kernel.org
>>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>>>> ---
>>>>>     drivers/net/veth.c | 24 ++++++++++++++++++++++++
>>>>>     1 file changed, 24 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>>>> index 04ffd8cb2945..d5491e7a2798 100644
>>>>> --- a/drivers/net/veth.c
>>>>> +++ b/drivers/net/veth.c
>>>>> @@ -118,6 +118,7 @@ static struct {
>>>>>
>>>>>     struct veth_xdp_buff {
>>>>>         struct xdp_buff xdp;
>>>>> +     struct sk_buff *skb;
>>>>>     };
>>>>>
>>>>>     static int veth_get_link_ksettings(struct net_device *dev,
>>>>> @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
>>>>>
>>>>>                 xdp_convert_frame_to_buff(frame, xdp);
>>>>>                 xdp->rxq = &rq->xdp_rxq;
>>>>> +             vxbuf.skb = NULL;
>>>>>
>>>>>                 act = bpf_prog_run_xdp(xdp_prog, xdp);
>>>>>
>>>>> @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>>>>>         __skb_push(skb, skb->data - skb_mac_header(skb));
>>>>>         if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>>>>>                 goto drop;
>>>>> +     vxbuf.skb = skb;
>>>>>
>>>>>         orig_data = xdp->data;
>>>>>         orig_data_end = xdp->data_end;
>>>>> @@ -1601,6 +1604,21 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>>>>>         }
>>>>>     }
>>>>>
>>>>> +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
>>>>> +{
>>>>> +     *timestamp = ktime_get_mono_fast_ns();
>>>>
>>>> This should be reading the hardware timestamp in the SKB.
>>>>
>>>> Details: This hardware timestamp in the SKB is located in
>>>> skb_shared_info area, which is also available for xdp_frame (currently
>>>> used for multi-buffer purposes).  Thus, when adding xdp-hints "store"
>>>> functionality, it would be natural to store the HW TS in the same place.
>>>> Making the veth skb/xdp_frame code paths able to share code.
>>>
>>> Does something like the following look acceptable as well?
>>>
>>> *timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;

If it is to test the kfunc and ensure veth_xdp_rx_timestamp is called, this 
alone should be enough. skb_hwtstamps(_ctx->skb)->hwtstamp should be 0 if 
hwtstamp is unavailable?  The test can initialize the 'u64 *timestamp' arg to 
non-zero first.  If it is not good enough, an fentry tracing can be done to 
veth_xdp_rx_timestamp to ensure it is called also.  There is also fmod_ret that 
could change the return value but the timestamp is not the return value though.

If the above is not enough, another direction of thought could be doing it 
through bpf_prog_test_run_xdp() but it will need a way to initialize the 
veth_xdp_buff.

That said, overall, I don't think it is a good idea to bend the 
veth_xdp_rx_timestamp kfunc too much only for testing purpose unless there is no 
other way out.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 03/15] bpf: Introduce device-bound XDP programs
  2022-12-13 23:25   ` Martin KaFai Lau
@ 2022-12-14 18:42     ` Stanislav Fomichev
  0 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-14 18:42 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On Tue, Dec 13, 2022 at 3:25 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/12/22 6:35 PM, Stanislav Fomichev wrote:
> > New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way
> > to associate a netdev with a BPF program at load time.
> >
> > Some existing 'offloaded' routines are renamed to 'dev_bound' for
> > consistency with the rest.
> >
> > Also moved a bunch of code around to avoid forward declarations.
>
> There are too many things in one patch.  It becomes quite hard to follow, eg. I
> have to go back-and-forth a few times within this patch to confirm what change
> is just a move.  Please put the "moved a bunch of code around to avoid forward
> declarations" in one individual patch and also the
> "late_initcall(bpf_offload_init)" change in another individual patch.

Ugh, sorry, good point will definitely split more :-(

> [ ... ]
>
> > -int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> > +static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
> > +                                          struct net_device *netdev)
> > +{
> > +     struct bpf_offload_netdev *ondev;
> > +     int err;
> > +
> > +     ondev = kzalloc(sizeof(*ondev), GFP_KERNEL);
> > +     if (!ondev)
> > +             return -ENOMEM;
> > +
> > +     ondev->netdev = netdev;
> > +     ondev->offdev = offdev;
> > +     INIT_LIST_HEAD(&ondev->progs);
> > +     INIT_LIST_HEAD(&ondev->maps);
> > +
> > +     err = rhashtable_insert_fast(&offdevs, &ondev->l, offdevs_params);
> > +     if (err) {
> > +             netdev_warn(netdev, "failed to register for BPF offload\n");
> > +             goto err_unlock_free;
> > +     }
> > +
> > +     if (offdev)
> > +             list_add(&ondev->offdev_netdevs, &offdev->netdevs);
> > +     return 0;
> > +
> > +err_unlock_free:
> > +     up_write(&bpf_devs_lock);
>
> No need to handle bpf_devs_lock in the "__" version of the register() helper?
> The goto label probably also needs another name, eg. "err_free".

Ah, not sure how I missed that, thanks!

> > +     kfree(ondev);
> > +     return err;
> > +}
> > +
>
> [ ... ]
>
> > +int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr)
> >   {
> >       struct bpf_offload_netdev *ondev;
> >       struct bpf_prog_offload *offload;
> > @@ -87,7 +198,7 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> >           attr->prog_type != BPF_PROG_TYPE_XDP)
> >               return -EINVAL;
> >
> > -     if (attr->prog_flags)
> > +     if (attr->prog_flags & ~BPF_F_XDP_DEV_BOUND_ONLY)
> >               return -EINVAL;
> >
> >       offload = kzalloc(sizeof(*offload), GFP_USER);
> > @@ -102,11 +213,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> >       if (err)
> >               goto err_maybe_put;
> >
> > +     prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_DEV_BOUND_ONLY);
> > +
> >       down_write(&bpf_devs_lock);
> >       ondev = bpf_offload_find_netdev(offload->netdev);
> >       if (!ondev) {
> > -             err = -EINVAL;
> > -             goto err_unlock;
> > +             if (!bpf_prog_is_offloaded(prog->aux)) {
> > +                     /* When only binding to the device, explicitly
> > +                      * create an entry in the hashtable. See related
> > +                      * bpf_dev_bound_try_remove_netdev.
> > +                      */
> > +                     err = __bpf_offload_dev_netdev_register(NULL, offload->netdev);
> > +                     if (err)
> > +                             goto err_unlock;
> > +                     ondev = bpf_offload_find_netdev(offload->netdev);
> > +             }
> > +             if (!ondev) {
>
> nit.  A bit confusing because the "ondev = bpf_offload_find_netdev(...)" above
> should not fail but "!ondev" is tested again here.  I think the intention is to
> fail on the 'bpf_prog_is_offloaded() == true' case. May be:
>
>                 if (bpf_prog_is_offloaded(prog->aux)) {
>                         err = -EINVAL;
>                         goto err_unlock;
>                 }
>                 /* When only binding to the device, explicitly
>                  * ...
>                  */
>                 err = __bpf_offload_dev_netdev_register(NULL, offload->netdev);
>                 if (err)
>                         goto err_unlock;
>                 ondev = bpf_offload_find_netdev(offload->netdev);
>

Yeah, that looks better, thx!

> > +                     err = -EINVAL;
> > +                     goto err_unlock;
> > +             }
> >       }
> >       offload->offdev = ondev->offdev;
> >       prog->aux->offload = offload;
> > @@ -209,27 +334,28 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
> >       up_read(&bpf_devs_lock);
> >   }
> >
> > -static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
> > +static void bpf_dev_bound_try_remove_netdev(struct net_device *dev)
> >   {
> > -     struct bpf_prog_offload *offload = prog->aux->offload;
> > -
> > -     if (offload->dev_state)
> > -             offload->offdev->ops->destroy(prog);
> > +     struct bpf_offload_netdev *ondev;
> >
> > -     /* Make sure BPF_PROG_GET_NEXT_ID can't find this dead program */
> > -     bpf_prog_free_id(prog, true);
> > +     if (!dev)
> > +             return;
> >
> > -     list_del_init(&offload->offloads);
> > -     kfree(offload);
> > -     prog->aux->offload = NULL;
> > +     ondev = bpf_offload_find_netdev(dev);
> > +     if (ondev && !ondev->offdev && list_empty(&ondev->progs))
>
> hmm....list_empty(&ondev->progs) is tested here but will it be empty? ...

Ugh, yeah, need to move that list_del_init(&offload->offloads) to
somewhere before bpf_dev_bound_try_remove_netdev.

> > +             __bpf_offload_dev_netdev_unregister(NULL, dev);
> >   }
> >
> > -void bpf_prog_offload_destroy(struct bpf_prog *prog)
> > +void bpf_prog_dev_bound_destroy(struct bpf_prog *prog)
> >   {
> > +     rtnl_lock();
> >       down_write(&bpf_devs_lock);
> > -     if (prog->aux->offload)
> > -             __bpf_prog_offload_destroy(prog);
> > +     if (prog->aux->offload) {
> > +             bpf_dev_bound_try_remove_netdev(prog->aux->offload->netdev);
>
> ... the "prog" here is still linked to ondev->progs, right?
> because __bpf_prog_dev_bound_destroy() is called later below.

Agreed, right.

> nit. May be the bpf_dev_bound_try_remove_netdev() should be folded/merged back
> into bpf_prog_dev_bound_destroy() to make things more clear.

Makes sense.

> > +             __bpf_prog_dev_bound_destroy(prog); > + }
> >       up_write(&bpf_devs_lock);
> > +     rtnl_unlock();
> >   }
>
> [ ... ]
>
> > +static int __init bpf_offload_init(void)
> > +{
> > +     int err;
> > +
> > +     down_write(&bpf_devs_lock);
>
> lock is probably not needed.

Sure, will drop.

> > +     err = rhashtable_init(&offdevs, &offdevs_params);
> > +     up_write(&bpf_devs_lock);
> > +
> > +     return err;
> > +}
> > +
> > +late_initcall(bpf_offload_init);
>
> [ ... ]
>
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 5d51999cba30..194f8116aad4 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -9228,6 +9228,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
> >                       NL_SET_ERR_MSG(extack, "Using offloaded program without HW_MODE flag is not supported");
> >                       return -EINVAL;
> >               }
> > +             if (bpf_prog_is_dev_bound(new_prog->aux) && !bpf_offload_dev_match(new_prog, dev)) {
> > +                     NL_SET_ERR_MSG(extack, "Program bound to different device");
> > +                     return -EINVAL;
> > +             }
> >               if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
> >                       NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
> >                       return -EINVAL;
> > @@ -10813,6 +10817,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
> >               /* Shutdown queueing discipline. */
> >               dev_shutdown(dev);
> >
> > +             bpf_dev_bound_netdev_unregister(dev);
>
> Does it matter if bpf_dev_bound_netdev_unregister(dev) is called before
> dev_xdp_uninstall(dev)?  Asking because it seems more logic to unregister dev
> after detaching xdp progs.

By running it first I was hoping to catch any possible issues. Agreed
that doing it after makes more sense, will move.

> >               dev_xdp_uninstall(dev);
> >
> >               netdev_offload_xstats_disable_all(dev);
>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-14 10:34       ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-12-14 18:42         ` Stanislav Fomichev
  2022-12-14 23:46           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-14 18:42 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David Vernet, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Wed, Dec 14, 2022 at 2:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
> >>
> >> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
> >> > Document all current use-cases and assumptions.
> >> >
> >> > Cc: John Fastabend <john.fastabend@gmail.com>
> >> > Cc: David Ahern <dsahern@gmail.com>
> >> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >> > Cc: Jakub Kicinski <kuba@kernel.org>
> >> > Cc: Willem de Bruijn <willemb@google.com>
> >> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> >> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> >> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> >> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> >> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> >> > Cc: xdp-hints@xdp-project.net
> >> > Cc: netdev@vger.kernel.org
> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >> > ---
> >> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
> >> >  1 file changed, 90 insertions(+)
> >> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
> >> >
> >> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> >> > new file mode 100644
> >> > index 000000000000..498eae718275
> >> > --- /dev/null
> >> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
> >>
> >> I think you need to add this to Documentation/bpf/index.rst. Or even
> >> better, maybe it's time to add an xdp/ subdirectory and put all docs
> >> there? Don't want to block your patchset from bikeshedding on this
> >> point, so for now it's fine to just put it in
> >> Documentation/bpf/index.rst until we figure that out.
> >
> > Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
> > and reference form Documentation/networking/index.rst? Since it's more
> > relevant to networking than the core bpf?
> >
> >> > @@ -0,0 +1,90 @@
> >> > +===============
> >> > +XDP RX Metadata
> >> > +===============
> >> > +
> >> > +XDP programs support creating and passing custom metadata via
> >> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> >> > +entities:
> >>
> >> Can you add a couple of sentences to this intro section that explains
> >> what metadata is at a high level?
> >
> > I'm gonna copy-paste here what I'm adding, feel free to reply back if
> > still unclear. (so we don't have to wait another week to discuss the
> > changes)
> >
> > XDP programs support creating and passing custom metadata via
> > ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
> > about the packet: timestamps, hash, vlan and tunneling information, etc.
> > This metadata can be consumed by the following entities:
>
> This is not really accurate, though? The metadata area itself can
> contain whatever the XDP program wants it to, and I think you're
> conflating the "old" usage for arbitrary storage with the driver-kfunc
> metadata support.
>
> I think we should clear separate the two: the metadata area is just a
> place to store data (and is not consumed by the stack, except that
> TC-BPF programs can access it), and the driver kfuncs are just a general
> way to get data out of the drivers (and has nothing to do with the
> metadata area, you can just get the data into stack variables).
>
> While it would be good to have a documentation of the general metadata
> area stuff somewhere, I don't think it necessarily have to be part of
> this series, so maybe just stick to documenting the kfuncs?

Maybe I can reword to something like below?

The metadata can be used to store some extra information about the
packet timestamps, hash, vlan and tunneling information, etc.

This way we are not actually defining what it is, but hinting about
how it's commonly used?

> -Toke
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs
  2022-12-14 10:41     ` Toke Høiland-Jørgensen
@ 2022-12-14 18:43       ` Stanislav Fomichev
  2022-12-14 22:19         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-14 18:43 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Martin KaFai Lau, ast, daniel, andrii, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, bpf

On Wed, Dec 14, 2022 at 2:41 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Martin KaFai Lau <martin.lau@linux.dev> writes:
>
> > On 12/12/22 6:35 PM, Stanislav Fomichev wrote:
> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >>
> >> Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
> >> programs that consume HW metadata, implement support for propagating the
> >> offload information. The extension program doesn't need to set a flag or
> >> ifindex, it these will just be propagated from the target by the verifier.
> >
> > s/it/because/ ... these will just be propagated....
>
> Yeah, or just drop 'it' :)
>
> >> We need to create a separate offload object for the extension program,
> >> though, since it can be reattached to a different program later (which
> >> means we can't just inhering the offload information from the target).
> >
> > hmm.... inheriting?
>
> Think I meant to write "we can't just inherit"
>
> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> index 11c558be4992..8686475f0dbe 100644
> >> --- a/kernel/bpf/syscall.c
> >> +++ b/kernel/bpf/syscall.c
> >> @@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
> >>                      goto out_put_prog;
> >>              }
> >>
> >> +            if (bpf_prog_is_dev_bound(prog->aux) &&
> >
> >
> >> +                (bpf_prog_is_offloaded(tgt_prog->aux) ||
> >> +                 !bpf_prog_is_dev_bound(tgt_prog->aux) ||
> >> +                 !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
> >
> > hmm... tgt_prog->aux->offload does not look safe without taking bpf_devs_lock.
> > offload could be NULL, no?
> >
> > It probably needs a bpf_prog_dev_bound_match(prog, tgt_prog) which takes the lock.
>
> Hmm, right, I was kinda expecting that this would not go away while
> tgt_prog was alive, but I see now that's not the case due to the
> unregister hook. So yeah, needs locking (same below) :)

Agreed, thanks! These seem easy enough to address on my side, so I'll
take care of them (and will keep your attribution).

> -Toke
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs
  2022-12-14 10:54   ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-12-14 18:43     ` Stanislav Fomichev
  0 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-14 18:43 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Wed, Dec 14, 2022 at 2:54 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> [..]
>
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index d434a994ee04..c3e501e3e39c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2097,6 +2097,13 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
> >       if (fp->kprobe_override)
> >               return false;
> >
> > +     /* When tail-calling from a non-dev-bound program to a dev-bound one,
> > +      * XDP metadata helpers should be disabled. Until it's implemented,
> > +      * prohibit adding dev-bound programs to tail-call maps.
> > +      */
> > +     if (bpf_prog_is_dev_bound(fp->aux))
> > +             return false;
> > +
>
> nit: the comment is slightly inaccurate as the program running in a
> devmap/cpumap has nothing to do with tail calls. maybe replace it with:
>
> "XDP programs inserted into maps are not guaranteed to run on a
> particular netdev (and can run outside driver context entirely in the
> case of devmap and cpumap). Until device checks are implemented,
> prohibit adding dev-bound programs to program maps."

SG.

> Also, there needs to be a check in bpf_prog_test_run_xdp() to reject
> dev-bound programs there as well...

Ah, totally forgot about this part, thanks for reminding!

> -Toke
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs
  2022-12-14  1:53   ` Martin KaFai Lau
@ 2022-12-14 18:43     ` Stanislav Fomichev
  0 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-14 18:43 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On Tue, Dec 13, 2022 at 5:54 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/12/22 6:35 PM, Stanislav Fomichev wrote:
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index ca22e8b8bd82..de6279725f41 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2477,6 +2477,8 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
> >                                      struct net_device *netdev);
> >   bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
> >
> > +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
> > +
>
> This probably requires an inline version for !CONFIG_NET.

Yeah, not sure why my confings didn't catch this :-(

> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index d434a994ee04..c3e501e3e39c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2097,6 +2097,13 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
> >       if (fp->kprobe_override)
> >               return false;
> >
> > +     /* When tail-calling from a non-dev-bound program to a dev-bound one,
> > +      * XDP metadata helpers should be disabled. Until it's implemented,
> > +      * prohibit adding dev-bound programs to tail-call maps.
> > +      */
> > +     if (bpf_prog_is_dev_bound(fp->aux))
> > +             return false;
> > +
> >       spin_lock(&map->owner.lock);
> >       if (!map->owner.type) {
> >               /* There's no owner yet where we could check for
> > diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> > index f714c941f8ea..3b6c9023f24d 100644
> > --- a/kernel/bpf/offload.c
> > +++ b/kernel/bpf/offload.c
> > @@ -757,6 +757,29 @@ void bpf_dev_bound_netdev_unregister(struct net_device *dev)
> >       up_write(&bpf_devs_lock);
> >   }
> >
> > +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> > +{
> > +     const struct xdp_metadata_ops *ops;
> > +     void *p = NULL;
> > +
> > +     down_read(&bpf_devs_lock);
> > +     if (!prog->aux->offload || !prog->aux->offload->netdev)
>
> This happens when netdev is unregistered in the middle of bpf_prog_load and the
> bpf_offload_dev_match() will eventually fail during dev_xdp_attach()? A comment
> will be useful.

Right, that's the expectation - we load/verify the prog but it's
essentially un-attach-able.
Will try to clarify here.

> > +             goto out;
> > +
> > +     ops = prog->aux->offload->netdev->xdp_metadata_ops;
> > +     if (!ops)
> > +             goto out;
> > +
> > +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> > +             p = ops->xmo_rx_timestamp;
> > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> > +             p = ops->xmo_rx_hash;
> > +out:
> > +     up_read(&bpf_devs_lock);
> > +
> > +     return p;
> > +}
> > +
> >   static int __init bpf_offload_init(void)
> >   {
> >       int err;
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 203d8cfeda70..e61fe0472b9b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -15479,12 +15479,35 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                           struct bpf_insn *insn_buf, int insn_idx, int *cnt)
> >   {
> >       const struct bpf_kfunc_desc *desc;
> > +     void *xdp_kfunc;
> >
> >       if (!insn->imm) {
> >               verbose(env, "invalid kernel function call not eliminated in verifier pass\n");
> >               return -EINVAL;
> >       }
> >
> > +     *cnt = 0;
> > +
> > +     if (xdp_is_metadata_kfunc_id(insn->imm)) {
> > +             if (!bpf_prog_is_dev_bound(env->prog->aux)) {
>
> The "xdp_is_metadata_kfunc_id() && (!bpf_prog_is_dev_bound() ||
> bpf_prog_is_offloaded())" test should have been done much earlier in
> add_kfunc_call(). Then the later stage of the verifier does not have to keep
> worrying about it like here.
>
> nit. may be rename xdp_is_metadata_kfunc_id() to bpf_dev_bound_kfunc_id() and
> hide the "!bpf_prog_is_dev_bound() || bpf_prog_is_offloaded()" test into
> bpf_dev_bound_kfunc_check(&env->log, env->prog).
>
> The change in fixup_kfunc_call could then become:
>
>         if (bpf_dev_bound_kfunc_id(insn->imm)) {
>                 xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm);
>                 /* ... */
>         }

Makes sense, ty!


> > +                     verbose(env, "metadata kfuncs require device-bound program\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (bpf_prog_is_offloaded(env->prog->aux)) {
> > +                     verbose(env, "metadata kfuncs can't be offloaded\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm);
> > +             if (xdp_kfunc) {
> > +                     insn->imm = BPF_CALL_IMM(xdp_kfunc);
> > +                     return 0;
> > +             }
> > +
> > +             /* fallback to default kfunc when not supported by netdev */
> > +     }
> > +
>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata
  2022-12-14 18:09           ` Martin KaFai Lau
@ 2022-12-14 18:44             ` Stanislav Fomichev
  0 siblings, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-14 18:44 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: brouer, bpf, ast, daniel, andrii, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Wed, Dec 14, 2022 at 10:09 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/14/22 2:47 AM, Toke Høiland-Jørgensen wrote:
> > Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
> >
> >> On 13/12/2022 21.42, Stanislav Fomichev wrote:
> >>> On Tue, Dec 13, 2022 at 7:55 AM Jesper Dangaard Brouer
> >>> <jbrouer@redhat.com> wrote:
> >>>>
> >>>>
> >>>> On 13/12/2022 03.35, Stanislav Fomichev wrote:
> >>>>> The goal is to enable end-to-end testing of the metadata for AF_XDP.
> >>>>>
> >>>>> Cc: John Fastabend <john.fastabend@gmail.com>
> >>>>> Cc: David Ahern <dsahern@gmail.com>
> >>>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >>>>> Cc: Jakub Kicinski <kuba@kernel.org>
> >>>>> Cc: Willem de Bruijn <willemb@google.com>
> >>>>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> >>>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> >>>>> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> >>>>> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> >>>>> Cc: Maryam Tahhan <mtahhan@redhat.com>
> >>>>> Cc: xdp-hints@xdp-project.net
> >>>>> Cc: netdev@vger.kernel.org
> >>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>>>> ---
> >>>>>     drivers/net/veth.c | 24 ++++++++++++++++++++++++
> >>>>>     1 file changed, 24 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> >>>>> index 04ffd8cb2945..d5491e7a2798 100644
> >>>>> --- a/drivers/net/veth.c
> >>>>> +++ b/drivers/net/veth.c
> >>>>> @@ -118,6 +118,7 @@ static struct {
> >>>>>
> >>>>>     struct veth_xdp_buff {
> >>>>>         struct xdp_buff xdp;
> >>>>> +     struct sk_buff *skb;
> >>>>>     };
> >>>>>
> >>>>>     static int veth_get_link_ksettings(struct net_device *dev,
> >>>>> @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
> >>>>>
> >>>>>                 xdp_convert_frame_to_buff(frame, xdp);
> >>>>>                 xdp->rxq = &rq->xdp_rxq;
> >>>>> +             vxbuf.skb = NULL;
> >>>>>
> >>>>>                 act = bpf_prog_run_xdp(xdp_prog, xdp);
> >>>>>
> >>>>> @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >>>>>         __skb_push(skb, skb->data - skb_mac_header(skb));
> >>>>>         if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
> >>>>>                 goto drop;
> >>>>> +     vxbuf.skb = skb;
> >>>>>
> >>>>>         orig_data = xdp->data;
> >>>>>         orig_data_end = xdp->data_end;
> >>>>> @@ -1601,6 +1604,21 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>> +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> >>>>> +{
> >>>>> +     *timestamp = ktime_get_mono_fast_ns();
> >>>>
> >>>> This should be reading the hardware timestamp in the SKB.
> >>>>
> >>>> Details: This hardware timestamp in the SKB is located in
> >>>> skb_shared_info area, which is also available for xdp_frame (currently
> >>>> used for multi-buffer purposes).  Thus, when adding xdp-hints "store"
> >>>> functionality, it would be natural to store the HW TS in the same place.
> >>>> Making the veth skb/xdp_frame code paths able to share code.
> >>>
> >>> Does something like the following look acceptable as well?
> >>>
> >>> *timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
>
> If it is to test the kfunc and ensure veth_xdp_rx_timestamp is called, this
> alone should be enough. skb_hwtstamps(_ctx->skb)->hwtstamp should be 0 if
> hwtstamp is unavailable?  The test can initialize the 'u64 *timestamp' arg to
> non-zero first.  If it is not good enough, an fentry tracing can be done to
> veth_xdp_rx_timestamp to ensure it is called also.  There is also fmod_ret that
> could change the return value but the timestamp is not the return value though.
>
> If the above is not enough, another direction of thought could be doing it
> through bpf_prog_test_run_xdp() but it will need a way to initialize the
> veth_xdp_buff.
>
> That said, overall, I don't think it is a good idea to bend the
> veth_xdp_rx_timestamp kfunc too much only for testing purpose unless there is no
> other way out.

Oh, good point about just making sure veth_xdp_rx_timestamp returns
timestamp=0. That should be enough, thanks!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs
  2022-12-14 18:43       ` Stanislav Fomichev
@ 2022-12-14 22:19         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-14 22:19 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Martin KaFai Lau, ast, daniel, andrii, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, bpf

Stanislav Fomichev <sdf@google.com> writes:

> On Wed, Dec 14, 2022 at 2:41 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Martin KaFai Lau <martin.lau@linux.dev> writes:
>>
>> > On 12/12/22 6:35 PM, Stanislav Fomichev wrote:
>> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >>
>> >> Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
>> >> programs that consume HW metadata, implement support for propagating the
>> >> offload information. The extension program doesn't need to set a flag or
>> >> ifindex, it these will just be propagated from the target by the verifier.
>> >
>> > s/it/because/ ... these will just be propagated....
>>
>> Yeah, or just drop 'it' :)
>>
>> >> We need to create a separate offload object for the extension program,
>> >> though, since it can be reattached to a different program later (which
>> >> means we can't just inhering the offload information from the target).
>> >
>> > hmm.... inheriting?
>>
>> Think I meant to write "we can't just inherit"
>>
>> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> >> index 11c558be4992..8686475f0dbe 100644
>> >> --- a/kernel/bpf/syscall.c
>> >> +++ b/kernel/bpf/syscall.c
>> >> @@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>> >>                      goto out_put_prog;
>> >>              }
>> >>
>> >> +            if (bpf_prog_is_dev_bound(prog->aux) &&
>> >
>> >
>> >> +                (bpf_prog_is_offloaded(tgt_prog->aux) ||
>> >> +                 !bpf_prog_is_dev_bound(tgt_prog->aux) ||
>> >> +                 !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
>> >
>> > hmm... tgt_prog->aux->offload does not look safe without taking bpf_devs_lock.
>> > offload could be NULL, no?
>> >
>> > It probably needs a bpf_prog_dev_bound_match(prog, tgt_prog) which takes the lock.
>>
>> Hmm, right, I was kinda expecting that this would not go away while
>> tgt_prog was alive, but I see now that's not the case due to the
>> unregister hook. So yeah, needs locking (same below) :)
>
> Agreed, thanks! These seem easy enough to address on my side, so I'll
> take care of them (and will keep your attribution).

Awesome, thanks!

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-13  2:35 ` [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata Stanislav Fomichev
  2022-12-13 16:37   ` David Vernet
@ 2022-12-14 23:46   ` Toke Høiland-Jørgensen
  2022-12-17  4:20   ` kernel test robot
  2 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-14 23:46 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> Document all current use-cases and assumptions.

Below is a set of slightly more constructive suggestions for how to edit
this so it's not confusing the metadata area description with the kfunc
list:

> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>  1 file changed, 90 insertions(+)
>  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>
> diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> new file mode 100644
> index 000000000000..498eae718275
> --- /dev/null
> +++ b/Documentation/bpf/xdp-rx-metadata.rst
> @@ -0,0 +1,90 @@
> +===============
> +XDP RX Metadata
> +===============
> +
> +XDP programs support creating and passing custom metadata via
> +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> +entities:
> +
> +1. ``AF_XDP`` consumer.
> +2. Kernel core stack via ``XDP_PASS``.
> +3. Another device via ``bpf_redirect_map``.
> +4. Other BPF programs via ``bpf_tail_call``.

I'd replace the above with a short introduction, like:

"This document describes how an XDP program can access hardware metadata
related to a packet using a set of helper functions, and how it can pass
that metadata on to other consumers."

> +General Design
> +==============
> +
> +XDP has access to a set of kfuncs to manipulate the metadata. Every
> +device driver implements these kfuncs. The set of kfuncs is
> +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
> +
> +Currently, the following kfuncs are supported. In the future, as more
> +metadata is supported, this set will grow:
> +
> +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> +  indicate whether the device supports RX timestamps
> +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
> +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> +  indicate whether the device supports RX hash
> +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash

Keep the above (with David's comments), then add a bit of extra text,
here:

"The XDP program can use these kfuncs to read the metadata into stack
variables for its own consumption. Or, to pass the metadata on to other
consumers, an XDP program can store it into the metadata area carried
ahead of the packet.

> +Within the XDP frame, the metadata layout is as follows::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +             ^                 ^
> +             |                 |
> +   xdp_buff->data_meta   xdp_buff->data

Add:

"The XDP program can store individual metadata items into this data_meta
area in whichever format it chooses. Later consumers of the metadata
will have to agree on the format by some out of band contract (like for
the AF_XDP use case, see below)."

> +AF_XDP
> +======
> +
> +``AF_XDP`` use-case implies that there is a contract between the BPF program
> +that redirects XDP frames into the ``XSK`` and the final consumer.
> +Thus the BPF program manually allocates a fixed number of
> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> +of kfuncs to populate it. User-space ``XSK`` consumer, looks
> +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
> +
> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +                               ^
> +                               |
> +                        rx_desc->address
> +
> +XDP_PASS
> +========
> +
> +This is the path where the packets processed by the XDP program are passed
> +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
> +Currently, every driver has a custom kernel code to parse the descriptors and
> +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.

Add: ", and the XDP metadata is not used by the kernel when building
skbs. However, TC-BPF programs can access the XDP metadata area using
the data_meta pointer."

> +In the future, we'd like to support a case where XDP program can override
> +some of that metadata.

s/some of that metadata/some of the metadata used for building skbs/.

> +The plan of record is to make this path similar to ``bpf_redirect_map``
> +so the program can control which metadata is passed to the skb layer.

I'm not sure we are quite agreed on this part, just drop for now (it's
sorta covered by the above)?

> +bpf_redirect_map
> +================
> +
> +``bpf_redirect_map`` can redirect the frame to a different device.
> +In this case we don't know ahead of time whether that final consumer
> +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
> +Additionally, the final consumer doesn't have access to the original
> +hardware descriptor and can't access any of the original metadata.

Replace this paragraph with: "``bpf_redirect_map`` can redirect the
frame to a different device. Some devices (like virtual ethernet links)
support running a second XDP program after the redirect. However, the
final consumer doesn't have access to the original hardware descriptor
and can't access any of the original metadata. The same applies to XDP
programs installed into devmaps and cpumaps."

> +For this use-case, only custom metadata is currently supported. If
> +the frame is eventually passed to the kernel, the skb created from such
> +a frame won't have any skb metadata. The ``XSK`` consumer will only
> +have access to the custom metadata.

Reword as:

"This means that for redirected packets only custom metadata is
currently supported, which has to be prepared by the initial XDP program
before redirect. If +the frame is eventually passed to the kernel, the
skb created from such a frame won't have any hardware metadata populated
in its skb. And if such a packet is later redirected into an ``XSK``,
that will also only have access to the custom metadata."

> +bpf_tail_call
> +=============
> +
> +No special handling here. Tail-called program operates on the same context
> +as the original one.

Replace this with a statement that it is in fact *not* supported in tail
maps :)

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-14 18:42         ` Stanislav Fomichev
@ 2022-12-14 23:46           ` Toke Høiland-Jørgensen
  2022-12-15  3:48             ` Stanislav Fomichev
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-14 23:46 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: David Vernet, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Stanislav Fomichev <sdf@google.com> writes:

> On Wed, Dec 14, 2022 at 2:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
>> >>
>> >> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
>> >> > Document all current use-cases and assumptions.
>> >> >
>> >> > Cc: John Fastabend <john.fastabend@gmail.com>
>> >> > Cc: David Ahern <dsahern@gmail.com>
>> >> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
>> >> > Cc: Jakub Kicinski <kuba@kernel.org>
>> >> > Cc: Willem de Bruijn <willemb@google.com>
>> >> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>> >> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>> >> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>> >> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>> >> > Cc: Maryam Tahhan <mtahhan@redhat.com>
>> >> > Cc: xdp-hints@xdp-project.net
>> >> > Cc: netdev@vger.kernel.org
>> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> >> > ---
>> >> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>> >> >  1 file changed, 90 insertions(+)
>> >> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>> >> >
>> >> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
>> >> > new file mode 100644
>> >> > index 000000000000..498eae718275
>> >> > --- /dev/null
>> >> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
>> >>
>> >> I think you need to add this to Documentation/bpf/index.rst. Or even
>> >> better, maybe it's time to add an xdp/ subdirectory and put all docs
>> >> there? Don't want to block your patchset from bikeshedding on this
>> >> point, so for now it's fine to just put it in
>> >> Documentation/bpf/index.rst until we figure that out.
>> >
>> > Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
>> > and reference form Documentation/networking/index.rst? Since it's more
>> > relevant to networking than the core bpf?
>> >
>> >> > @@ -0,0 +1,90 @@
>> >> > +===============
>> >> > +XDP RX Metadata
>> >> > +===============
>> >> > +
>> >> > +XDP programs support creating and passing custom metadata via
>> >> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
>> >> > +entities:
>> >>
>> >> Can you add a couple of sentences to this intro section that explains
>> >> what metadata is at a high level?
>> >
>> > I'm gonna copy-paste here what I'm adding, feel free to reply back if
>> > still unclear. (so we don't have to wait another week to discuss the
>> > changes)
>> >
>> > XDP programs support creating and passing custom metadata via
>> > ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
>> > about the packet: timestamps, hash, vlan and tunneling information, etc.
>> > This metadata can be consumed by the following entities:
>>
>> This is not really accurate, though? The metadata area itself can
>> contain whatever the XDP program wants it to, and I think you're
>> conflating the "old" usage for arbitrary storage with the driver-kfunc
>> metadata support.
>>
>> I think we should clear separate the two: the metadata area is just a
>> place to store data (and is not consumed by the stack, except that
>> TC-BPF programs can access it), and the driver kfuncs are just a general
>> way to get data out of the drivers (and has nothing to do with the
>> metadata area, you can just get the data into stack variables).
>>
>> While it would be good to have a documentation of the general metadata
>> area stuff somewhere, I don't think it necessarily have to be part of
>> this series, so maybe just stick to documenting the kfuncs?
>
> Maybe I can reword to something like below?
>
> The metadata can be used to store some extra information about the
> packet timestamps, hash, vlan and tunneling information, etc.
>
> This way we are not actually defining what it is, but hinting about
> how it's commonly used?

Sent another reply to the original patch with some comments that are
hopefully a bit more constructive :)

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-14 23:46           ` Toke Høiland-Jørgensen
@ 2022-12-15  3:48             ` Stanislav Fomichev
  2022-12-15 14:04               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2022-12-15  3:48 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David Vernet, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Wed, Dec 14, 2022 at 3:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > On Wed, Dec 14, 2022 at 2:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Stanislav Fomichev <sdf@google.com> writes:
> >>
> >> > On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
> >> >>
> >> >> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
> >> >> > Document all current use-cases and assumptions.
> >> >> >
> >> >> > Cc: John Fastabend <john.fastabend@gmail.com>
> >> >> > Cc: David Ahern <dsahern@gmail.com>
> >> >> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >> >> > Cc: Jakub Kicinski <kuba@kernel.org>
> >> >> > Cc: Willem de Bruijn <willemb@google.com>
> >> >> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> >> >> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> >> >> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> >> >> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> >> >> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> >> >> > Cc: xdp-hints@xdp-project.net
> >> >> > Cc: netdev@vger.kernel.org
> >> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >> >> > ---
> >> >> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
> >> >> >  1 file changed, 90 insertions(+)
> >> >> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
> >> >> >
> >> >> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> >> >> > new file mode 100644
> >> >> > index 000000000000..498eae718275
> >> >> > --- /dev/null
> >> >> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
> >> >>
> >> >> I think you need to add this to Documentation/bpf/index.rst. Or even
> >> >> better, maybe it's time to add an xdp/ subdirectory and put all docs
> >> >> there? Don't want to block your patchset from bikeshedding on this
> >> >> point, so for now it's fine to just put it in
> >> >> Documentation/bpf/index.rst until we figure that out.
> >> >
> >> > Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
> >> > and reference form Documentation/networking/index.rst? Since it's more
> >> > relevant to networking than the core bpf?
> >> >
> >> >> > @@ -0,0 +1,90 @@
> >> >> > +===============
> >> >> > +XDP RX Metadata
> >> >> > +===============
> >> >> > +
> >> >> > +XDP programs support creating and passing custom metadata via
> >> >> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> >> >> > +entities:
> >> >>
> >> >> Can you add a couple of sentences to this intro section that explains
> >> >> what metadata is at a high level?
> >> >
> >> > I'm gonna copy-paste here what I'm adding, feel free to reply back if
> >> > still unclear. (so we don't have to wait another week to discuss the
> >> > changes)
> >> >
> >> > XDP programs support creating and passing custom metadata via
> >> > ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
> >> > about the packet: timestamps, hash, vlan and tunneling information, etc.
> >> > This metadata can be consumed by the following entities:
> >>
> >> This is not really accurate, though? The metadata area itself can
> >> contain whatever the XDP program wants it to, and I think you're
> >> conflating the "old" usage for arbitrary storage with the driver-kfunc
> >> metadata support.
> >>
> >> I think we should clear separate the two: the metadata area is just a
> >> place to store data (and is not consumed by the stack, except that
> >> TC-BPF programs can access it), and the driver kfuncs are just a general
> >> way to get data out of the drivers (and has nothing to do with the
> >> metadata area, you can just get the data into stack variables).
> >>
> >> While it would be good to have a documentation of the general metadata
> >> area stuff somewhere, I don't think it necessarily have to be part of
> >> this series, so maybe just stick to documenting the kfuncs?
> >
> > Maybe I can reword to something like below?
> >
> > The metadata can be used to store some extra information about the
> > packet timestamps, hash, vlan and tunneling information, etc.
> >
> > This way we are not actually defining what it is, but hinting about
> > how it's commonly used?
>
> Sent another reply to the original patch with some comments that are
> hopefully a bit more constructive :)

Thanks, everything makes sense, will incorporate. I'll also try to
share the patches privately with you sometime tomorrow maybe; not
super comfortable sending them out with a bunch of changes on top of
your authorship without some kind of ack from you :-)

> -Toke
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-15  3:48             ` Stanislav Fomichev
@ 2022-12-15 14:04               ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-15 14:04 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: David Vernet, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Stanislav Fomichev <sdf@google.com> writes:

> On Wed, Dec 14, 2022 at 3:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On Wed, Dec 14, 2022 at 2:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@google.com> writes:
>> >>
>> >> > On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
>> >> >>
>> >> >> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
>> >> >> > Document all current use-cases and assumptions.
>> >> >> >
>> >> >> > Cc: John Fastabend <john.fastabend@gmail.com>
>> >> >> > Cc: David Ahern <dsahern@gmail.com>
>> >> >> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
>> >> >> > Cc: Jakub Kicinski <kuba@kernel.org>
>> >> >> > Cc: Willem de Bruijn <willemb@google.com>
>> >> >> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>> >> >> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>> >> >> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>> >> >> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>> >> >> > Cc: Maryam Tahhan <mtahhan@redhat.com>
>> >> >> > Cc: xdp-hints@xdp-project.net
>> >> >> > Cc: netdev@vger.kernel.org
>> >> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> >> >> > ---
>> >> >> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>> >> >> >  1 file changed, 90 insertions(+)
>> >> >> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>> >> >> >
>> >> >> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
>> >> >> > new file mode 100644
>> >> >> > index 000000000000..498eae718275
>> >> >> > --- /dev/null
>> >> >> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
>> >> >>
>> >> >> I think you need to add this to Documentation/bpf/index.rst. Or even
>> >> >> better, maybe it's time to add an xdp/ subdirectory and put all docs
>> >> >> there? Don't want to block your patchset from bikeshedding on this
>> >> >> point, so for now it's fine to just put it in
>> >> >> Documentation/bpf/index.rst until we figure that out.
>> >> >
>> >> > Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
>> >> > and reference form Documentation/networking/index.rst? Since it's more
>> >> > relevant to networking than the core bpf?
>> >> >
>> >> >> > @@ -0,0 +1,90 @@
>> >> >> > +===============
>> >> >> > +XDP RX Metadata
>> >> >> > +===============
>> >> >> > +
>> >> >> > +XDP programs support creating and passing custom metadata via
>> >> >> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
>> >> >> > +entities:
>> >> >>
>> >> >> Can you add a couple of sentences to this intro section that explains
>> >> >> what metadata is at a high level?
>> >> >
>> >> > I'm gonna copy-paste here what I'm adding, feel free to reply back if
>> >> > still unclear. (so we don't have to wait another week to discuss the
>> >> > changes)
>> >> >
>> >> > XDP programs support creating and passing custom metadata via
>> >> > ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
>> >> > about the packet: timestamps, hash, vlan and tunneling information, etc.
>> >> > This metadata can be consumed by the following entities:
>> >>
>> >> This is not really accurate, though? The metadata area itself can
>> >> contain whatever the XDP program wants it to, and I think you're
>> >> conflating the "old" usage for arbitrary storage with the driver-kfunc
>> >> metadata support.
>> >>
>> >> I think we should clear separate the two: the metadata area is just a
>> >> place to store data (and is not consumed by the stack, except that
>> >> TC-BPF programs can access it), and the driver kfuncs are just a general
>> >> way to get data out of the drivers (and has nothing to do with the
>> >> metadata area, you can just get the data into stack variables).
>> >>
>> >> While it would be good to have a documentation of the general metadata
>> >> area stuff somewhere, I don't think it necessarily have to be part of
>> >> this series, so maybe just stick to documenting the kfuncs?
>> >
>> > Maybe I can reword to something like below?
>> >
>> > The metadata can be used to store some extra information about the
>> > packet timestamps, hash, vlan and tunneling information, etc.
>> >
>> > This way we are not actually defining what it is, but hinting about
>> > how it's commonly used?
>>
>> Sent another reply to the original patch with some comments that are
>> hopefully a bit more constructive :)
>
> Thanks, everything makes sense, will incorporate. I'll also try to
> share the patches privately with you sometime tomorrow maybe; not
> super comfortable sending them out with a bunch of changes on top of
> your authorship without some kind of ack from you :-)

OK, sounds good. Tomorrow (Friday) is my last day before the holidays,
but happy to look things over before I sign off :)

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
  2022-12-13  2:35 ` [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata Stanislav Fomichev
  2022-12-13 16:37   ` David Vernet
  2022-12-14 23:46   ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-12-17  4:20   ` kernel test robot
  2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2022-12-17  4:20 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: oe-kbuild-all, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

[-- Attachment #1: Type: text/plain, Size: 1234 bytes --]

Hi Stanislav,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/xdp-hints-via-kfuncs/20221213-103902
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20221213023605.737383-2-sdf%40google.com
patch subject: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
reproduce:
        # https://github.com/intel-lab-lkp/linux/commit/7f9cb6dfe1965bc856540839d159aa81314c7ce6
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Stanislav-Fomichev/xdp-hints-via-kfuncs/20221213-103902
        git checkout 7f9cb6dfe1965bc856540839d159aa81314c7ce6
        make menuconfig
        # enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
        make htmldocs

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> Documentation/bpf/xdp-rx-metadata.rst: WARNING: document isn't included in any toctree

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 38906 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 6.1.0-rc8 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc-11 (Debian 11.3.0-8) 11.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=110300
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23900
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23900
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_COMPILE_TEST=y
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
# CONFIG_WATCH_QUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=100
# end of Timers subsystem

CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
# end of BPF subsystem

CONFIG_PREEMPT_NONE_BUILD=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_DYNAMIC is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# end of RCU Subsystem

# CONFIG_IKCONFIG is not set
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_CGROUPS is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_TIME_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_BOOT_CONFIG is not set
# CONFIG_INITRAMFS_PRESERVE_MTIME is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_LD_ORPHAN_WARN=y
CONFIG_SYSCTL=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
# CONFIG_EXPERT is not set
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_RSEQ=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# end of Kernel Performance Events And Counters

# CONFIG_PROFILING is not set
# end of General setup

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_NR_GPIO=1024
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
# CONFIG_SMP is not set
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_MPPARSE=y
# CONFIG_GOLDFISH is not set
# CONFIG_X86_CPU_RESCTRL is not set
# CONFIG_X86_EXTENDED_PLATFORM is not set
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
# CONFIG_HYPERVISOR_GUEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_HPET_TIMER=y
CONFIG_DMI=y
CONFIG_NR_CPUS_RANGE_BEGIN=1
CONFIG_NR_CPUS_RANGE_END=1
CONFIG_NR_CPUS_DEFAULT=1
CONFIG_NR_CPUS=1
CONFIG_UP_LATE_INIT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
# CONFIG_X86_MCE is not set

#
# Performance monitoring
#
# CONFIG_PERF_EVENTS_AMD_POWER is not set
# CONFIG_PERF_EVENTS_AMD_UNCORE is not set
# CONFIG_PERF_EVENTS_AMD_BRS is not set
# end of Performance monitoring

CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
# CONFIG_X86_IOPL_IOPERM is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
# CONFIG_X86_5LEVEL is not set
CONFIG_X86_DIRECT_GBPAGES=y
# CONFIG_AMD_MEM_ENCRYPT is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_X86_UMIP=y
CONFIG_CC_HAS_IBT=y
# CONFIG_X86_KERNEL_IBT is not set
# CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is not set
CONFIG_X86_INTEL_TSX_MODE_OFF=y
# CONFIG_X86_INTEL_TSX_MODE_ON is not set
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x1000000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_LEGACY_VSYSCALL_XONLY=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_MODIFY_LDT_SYSCALL=y
# CONFIG_STRICT_SIGALTSTACK_SIZE is not set
CONFIG_HAVE_LIVEPATCH=y
# end of Processor type and features

CONFIG_CC_HAS_SLS=y
CONFIG_CC_HAS_RETURN_THUNK=y
# CONFIG_SPECULATION_MITIGATIONS is not set
CONFIG_ARCH_HAS_ADD_PAGES=y
CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y

#
# Power management and ACPI options
#
# CONFIG_SUSPEND is not set
# CONFIG_PM is not set
CONFIG_ARCH_SUPPORTS_ACPI=y
# CONFIG_ACPI is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# end of CPU Frequency scaling

#
# CPU Idle
#
# CONFIG_CPU_IDLE is not set
# end of CPU Idle
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
CONFIG_ISA_DMA_API=y
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
# CONFIG_IA32_EMULATION is not set
# CONFIG_X86_X32_ABI is not set
# end of Binary Emulations

CONFIG_HAVE_KVM=y
# CONFIG_VIRTUALIZATION is not set
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_GENERIC_ENTRY=y
# CONFIG_JUMP_LABEL is not set
# CONFIG_STATIC_CALL_SELFTEST is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_RUST=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_MERGE_VMAS=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
# CONFIG_SECCOMP is not set
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
# CONFIG_STACKPROTECTOR is not set
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_LTO_NONE=y
CONFIG_ARCH_SUPPORTS_CFI_CLANG=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING_USER=y
CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PUD=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_HAVE_ARCH_HUGE_VMALLOC=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_OBJTOOL=y
CONFIG_HAVE_JUMP_LABEL_HACK=y
CONFIG_HAVE_NOINSTR_HACK=y
CONFIG_HAVE_NOINSTR_VALIDATION=y
CONFIG_HAVE_UACCESS_VALIDATION=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
# CONFIG_COMPAT_32BIT_TIME is not set
CONFIG_HAVE_ARCH_VMAP_STACK=y
# CONFIG_VMAP_STACK is not set
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_STATIC_CALL_INLINE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_PAGE_TABLE_CHECK=y
CONFIG_ARCH_HAS_ELFCORE_COMPAT=y
CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
CONFIG_DYNAMIC_SIGFRAME=y
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y

#
# GCOV-based kernel profiling
#
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# CONFIG_GCC_PLUGINS is not set
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_BLOCK=y
# CONFIG_BLOCK_LEGACY_AUTOLOAD is not set
# CONFIG_BLK_DEV_BSGLIB is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
# CONFIG_BLK_DEV_ZONED is not set
# CONFIG_BLK_WBT is not set
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
# end of Partition Types

#
# IO Schedulers
#
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
# CONFIG_IOSCHED_BFQ is not set
# end of IO Schedulers

CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y

#
# Executable file formats
#
# CONFIG_BINFMT_ELF is not set
# CONFIG_BINFMT_SCRIPT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
# CONFIG_SWAP is not set

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLAB_MERGE_DEFAULT is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
# CONFIG_SPARSEMEM_VMEMMAP is not set
CONFIG_HAVE_FAST_GUP=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
# CONFIG_COMPACTION is not set
# CONFIG_PAGE_REPORTING is not set
CONFIG_PHYS_ADDR_T_64BIT=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_WANTS_THP_SWAP=y
# CONFIG_TRANSPARENT_HUGEPAGE is not set
CONFIG_NEED_PER_CPU_KM=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
# CONFIG_CMA is not set
CONFIG_GENERIC_EARLY_IOREMAP=y
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ZONE_DMA=y
CONFIG_ZONE_DMA32=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_PERCPU_STATS is not set

#
# GUP_TEST needs to have DEBUG_FS enabled
#
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_SECRETMEM=y
# CONFIG_ANON_VMA_NAME is not set
# CONFIG_USERFAULTFD is not set
# CONFIG_LRU_GEN is not set

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

# CONFIG_NET is not set

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
# CONFIG_PCI is not set
# CONFIG_PCCARD is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
# CONFIG_DEVTMPFS is not set
# CONFIG_STANDALONE is not set
# CONFIG_PREVENT_FIRMWARE_BUILD is not set

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_FW_LOADER_COMPRESS is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_ARM_INTEGRATOR_LM is not set
# CONFIG_BT1_APB is not set
# CONFIG_BT1_AXI is not set
# CONFIG_HISILICON_LPC is not set
# CONFIG_INTEL_IXP4XX_EB is not set
# CONFIG_QCOM_EBI2 is not set
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# CONFIG_ARM_SCMI_PROTOCOL is not set
# end of ARM System Control and Management Interface Protocol

# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DMIID is not set
# CONFIG_DMI_SYSFS is not set
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_FW_CFG_SYSFS is not set
# CONFIG_SYSFB_SIMPLEFB is not set
# CONFIG_BCM47XX_NVRAM is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
# CONFIG_PARPORT is not set
# CONFIG_BLK_DEV is not set

#
# NVME Support
#
# CONFIG_NVME_FC is not set
# end of NVME Support

#
# Misc devices
#
# CONFIG_DUMMY_IRQ is not set
# CONFIG_ATMEL_SSC is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_QCOM_COINCELL is not set
# CONFIG_SRAM is not set
# CONFIG_XILINX_SDFEC is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# end of EEPROM support

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

#
# Altera FPGA firmware download module (requires I2C)
#
# CONFIG_ECHO is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_RAID_ATTRS is not set
# CONFIG_SCSI is not set
# end of SCSI device support

# CONFIG_ATA is not set
# CONFIG_MD is not set
# CONFIG_TARGET_CORE is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# end of IEEE 1394 (FireWire) support

# CONFIG_MACINTOSH_DRIVERS is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
# CONFIG_RMI4_CORE is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
# CONFIG_LDISC_AUTOLOAD is not set

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_AMBA_PL010 is not set
# CONFIG_SERIAL_MESON is not set
# CONFIG_SERIAL_CLPS711X is not set
# CONFIG_SERIAL_SAMSUNG is not set
# CONFIG_SERIAL_TEGRA is not set
# CONFIG_SERIAL_IMX is not set
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_SH_SCI is not set
# CONFIG_SERIAL_MSM is not set
# CONFIG_SERIAL_VT8500 is not set
# CONFIG_SERIAL_OMAP is not set
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_TIMBERDALE is not set
# CONFIG_SERIAL_BCM63XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_MXS_AUART is not set
# CONFIG_SERIAL_MPS2_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_ST_ASC is not set
# CONFIG_SERIAL_STM32 is not set
# CONFIG_SERIAL_OWL is not set
# CONFIG_SERIAL_RDA is not set
# CONFIG_SERIAL_LITEUART is not set
# CONFIG_SERIAL_SUNPLUS is not set
# end of Serial drivers

# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NULL_TTY is not set
# CONFIG_SERIAL_DEV_BUS is not set
# CONFIG_VIRTIO_CONSOLE is not set
# CONFIG_IPMI_HANDLER is not set
# CONFIG_ASPEED_KCS_IPMI_BMC is not set
# CONFIG_NPCM7XX_KCS_IPMI_BMC is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_MWAVE is not set
# CONFIG_DEVMEM is not set
# CONFIG_NVRAM is not set
# CONFIG_HANGCHECK_TIMER is not set
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
# CONFIG_RANDOM_TRUST_CPU is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set
# end of Character devices

#
# I2C support
#
# CONFIG_I2C is not set
# end of I2C support

# CONFIG_I3C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
# CONFIG_PPS is not set

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

# CONFIG_PINCTRL is not set
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_SUN4I_GPADC is not set
# CONFIG_MFD_AT91_USART is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_MFD_EXYNOS_LPASS is not set
# CONFIG_MFD_MXS_LRADC is not set
# CONFIG_MFD_MX25_TSADC is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_PM8XXX is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SUN6I_PRCM is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TQMX86 is not set
# CONFIG_MFD_STM32_LPTIMER is not set
# CONFIG_MFD_STM32_TIMERS is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
# CONFIG_RC_CORE is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_IMX_IPUV3_CORE is not set
# CONFIG_DRM is not set

#
# ARM devices
#
# end of ARM devices

#
# Frame buffer Devices
#
# CONFIG_FB is not set
# CONFIG_MMP_DISP is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
# CONFIG_LCD_CLASS_DEVICE is not set
# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
# end of Backlight & LCD device support

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
# end of Console display driver support
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
# CONFIG_SYNC_FILE is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
# CONFIG_VFIO is not set
# CONFIG_VIRT_DRIVERS is not set
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VHOST_MENU is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
# CONFIG_CHROME_PLATFORMS is not set
# CONFIG_MELLANOX_PLATFORM is not set
# CONFIG_OLPC_XO175 is not set
# CONFIG_SURFACE_PLATFORMS is not set
# CONFIG_X86_PLATFORM_DEVICES is not set
# CONFIG_COMMON_CLK is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# CONFIG_BCM2835_TIMER is not set
# CONFIG_BCM_KONA_TIMER is not set
# CONFIG_DAVINCI_TIMER is not set
# CONFIG_DIGICOLOR_TIMER is not set
# CONFIG_OMAP_DM_TIMER is not set
# CONFIG_DW_APB_TIMER is not set
# CONFIG_FTTMR010_TIMER is not set
# CONFIG_IXP4XX_TIMER is not set
# CONFIG_MESON6_TIMER is not set
# CONFIG_OWL_TIMER is not set
# CONFIG_RDA_TIMER is not set
# CONFIG_SUN4I_TIMER is not set
# CONFIG_TEGRA_TIMER is not set
# CONFIG_VT8500_TIMER is not set
# CONFIG_NPCM7XX_TIMER is not set
# CONFIG_ASM9260_TIMER is not set
# CONFIG_CLKSRC_DBX500_PRCMU is not set
# CONFIG_CLPS711X_TIMER is not set
# CONFIG_MXS_TIMER is not set
# CONFIG_NSPIRE_TIMER is not set
# CONFIG_INTEGRATOR_AP_TIMER is not set
# CONFIG_CLKSRC_PISTACHIO is not set
# CONFIG_CLKSRC_STM32_LP is not set
# CONFIG_ARMV7M_SYSTICK is not set
# CONFIG_ATMEL_PIT is not set
# CONFIG_ATMEL_ST is not set
# CONFIG_CLKSRC_SAMSUNG_PWM is not set
# CONFIG_FSL_FTM_TIMER is not set
# CONFIG_OXNAS_RPS_TIMER is not set
# CONFIG_MTK_TIMER is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_RENESAS_OSTM is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
# CONFIG_CLKSRC_PXA is not set
# CONFIG_TIMER_IMX_SYS_CTR is not set
# CONFIG_CLKSRC_ST_LPC is not set
# CONFIG_GXP_TIMER is not set
# CONFIG_MSC313E_TIMER is not set
# CONFIG_MICROCHIP_PIT64B is not set
# end of Clock Source drivers

# CONFIG_MAILBOX is not set
# CONFIG_IOMMU_SUPPORT is not set

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# CONFIG_MESON_CANVAS is not set
# CONFIG_MESON_CLK_MEASURE is not set
# CONFIG_MESON_GX_SOCINFO is not set
# CONFIG_MESON_MX_SOCINFO is not set
# end of Amlogic SoC drivers

#
# Apple SoC drivers
#
# CONFIG_APPLE_SART is not set
# end of Apple SoC drivers

#
# ASPEED SoC drivers
#
# CONFIG_ASPEED_LPC_CTRL is not set
# CONFIG_ASPEED_LPC_SNOOP is not set
# CONFIG_ASPEED_UART_ROUTING is not set
# CONFIG_ASPEED_P2A_CTRL is not set
# CONFIG_ASPEED_SOCINFO is not set
# end of ASPEED SoC drivers

# CONFIG_AT91_SOC_ID is not set
# CONFIG_AT91_SOC_SFR is not set

#
# Broadcom SoC drivers
#
# CONFIG_SOC_BCM63XX is not set
# CONFIG_SOC_BRCMSTB is not set
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# fujitsu SoC drivers
#
# end of fujitsu SoC drivers

#
# i.MX SoC drivers
#
# CONFIG_SOC_IMX8M is not set
# CONFIG_SOC_IMX9 is not set
# end of i.MX SoC drivers

#
# IXP4xx SoC drivers
#
# CONFIG_IXP4XX_QMGR is not set
# CONFIG_IXP4XX_NPE is not set
# end of IXP4xx SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# CONFIG_LITEX_SOC_CONTROLLER is not set
# end of Enable LiteX SoC Builder specific drivers

#
# MediaTek SoC drivers
#
# CONFIG_MTK_CMDQ is not set
# CONFIG_MTK_DEVAPC is not set
# CONFIG_MTK_INFRACFG is not set
# CONFIG_MTK_MMSYS is not set
# end of MediaTek SoC drivers

#
# Qualcomm SoC drivers
#
# CONFIG_QCOM_GENI_SE is not set
# CONFIG_QCOM_GSBI is not set
# CONFIG_QCOM_LLCC is not set
# CONFIG_QCOM_RPMH is not set
# CONFIG_QCOM_SPM is not set
# CONFIG_QCOM_ICC_BWMON is not set
# end of Qualcomm SoC drivers

# CONFIG_SOC_RENESAS is not set
# CONFIG_ROCKCHIP_GRF is not set
# CONFIG_SOC_SAMSUNG is not set
# CONFIG_SOC_TI is not set
# CONFIG_UX500_SOC_ID is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_PWM is not set

#
# IRQ chip support
#
# CONFIG_AL_FIC is not set
# CONFIG_RENESAS_INTC_IRQPIN is not set
# CONFIG_RENESAS_IRQC is not set
# CONFIG_RENESAS_RZA1_IRQC is not set
# CONFIG_RENESAS_RZG2L_IRQC is not set
# CONFIG_SL28CPLD_INTC is not set
# CONFIG_TS4800_IRQ is not set
# CONFIG_INGENIC_TCU_IRQ is not set
# CONFIG_IRQ_UNIPHIER_AIDET is not set
# CONFIG_MESON_IRQ_GPIO is not set
# CONFIG_IMX_IRQSTEER is not set
# CONFIG_IMX_INTMUX is not set
# CONFIG_EXYNOS_IRQ_COMBINER is not set
# CONFIG_MST_IRQ is not set
# CONFIG_MCHP_EIC is not set
# CONFIG_SUNPLUS_SP7021_INTC is not set
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_PISTACHIO_USB is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_PHY_BCM63XX_USBH is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

# CONFIG_PHY_HI6220_USB is not set
# CONFIG_PHY_HI3660_USB is not set
# CONFIG_PHY_HI3670_USB is not set
# CONFIG_PHY_HI3670_PCIE is not set
# CONFIG_PHY_HISTB_COMBPHY is not set
# CONFIG_PHY_HISI_INNO_USB2 is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_PXA_USB is not set
# CONFIG_PHY_MMP3_USB is not set
# CONFIG_PHY_MMP3_HSIC is not set
# CONFIG_PHY_MT7621_PCI is not set
# CONFIG_PHY_RALINK_USB is not set
# CONFIG_PHY_RCAR_GEN3_USB3 is not set
# CONFIG_PHY_ROCKCHIP_DPHY_RX0 is not set
# CONFIG_PHY_ROCKCHIP_PCIE is not set
# CONFIG_PHY_ROCKCHIP_SNPS_PCIE3 is not set
# CONFIG_PHY_EXYNOS_MIPI_VIDEO is not set
# CONFIG_PHY_SAMSUNG_USB2 is not set
# CONFIG_PHY_ST_SPEAR1310_MIPHY is not set
# CONFIG_PHY_ST_SPEAR1340_MIPHY is not set
# CONFIG_PHY_TEGRA194_P2U is not set
# CONFIG_PHY_DA8XX_USB is not set
# CONFIG_OMAP_CONTROL_PHY is not set
# CONFIG_TI_PIPE3 is not set
# CONFIG_PHY_INTEL_KEEMBAY_EMMC is not set
# CONFIG_PHY_INTEL_KEEMBAY_USB is not set
# CONFIG_PHY_INTEL_LGM_EMMC is not set
# CONFIG_PHY_XILINX_ZYNQMP is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# CONFIG_ARM_CCN is not set
# CONFIG_ARM_CMN is not set
# CONFIG_FSL_IMX8_DDR_PMU is not set
# CONFIG_XGENE_PMU is not set
# CONFIG_ARM_DMC620_PMU is not set
# CONFIG_MARVELL_CN10K_TAD_PMU is not set
# CONFIG_ALIBABA_UNCORE_DRW_PMU is not set
# CONFIG_MARVELL_CN10K_DDR_PMU is not set
# end of Performance monitor support

# CONFIG_RAS is not set

#
# Android
#
# CONFIG_ANDROID_BINDER_IPC is not set
# end of Android

# CONFIG_DAX is not set
# CONFIG_NVMEM is not set

#
# HW tracing support
#
# CONFIG_STM is not set
# CONFIG_INTEL_TH is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_TEE is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
# CONFIG_EXT4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_BTRFS_FS is not set
# CONFIG_NILFS2_FS is not set
# CONFIG_F2FS_FS is not set
CONFIG_EXPORTFS=y
# CONFIG_EXPORTFS_BLOCK_OPS is not set
CONFIG_FILE_LOCKING=y
# CONFIG_FS_ENCRYPTION is not set
# CONFIG_FS_VERITY is not set
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY_USER is not set
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_FUSE_FS is not set
# CONFIG_OVERLAY_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
# CONFIG_TMPFS is not set
# CONFIG_HUGETLBFS is not set
CONFIG_ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
# CONFIG_CONFIGFS_FS is not set
# end of Pseudo filesystems

# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NLS is not set
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
# CONFIG_FORTIFY_SOURCE is not set
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# end of Kernel hardening options
# end of Security options

# CONFIG_CRYPTO is not set

#
# Library routines
#
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
# CONFIG_CORDIC is not set
# CONFIG_PRIME_NUMBERS is not set
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# end of Crypto library routines

# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
# CONFIG_CRC_T10DIF is not set
# CONFIG_CRC64_ROCKSOFT is not set
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC64 is not set
# CONFIG_CRC4 is not set
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
# CONFIG_CRC8 is not set
# CONFIG_RANDOM32_SELFTEST is not set
# CONFIG_XZ_DEC is not set
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_SWIOTLB=y
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_IRQ_POLL is not set
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_COPY_MC=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_SBITMAP=y
# CONFIG_PARMAN is not set
# CONFIG_OBJAGG is not set
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
# CONFIG_PRINTK_TIME is not set
# CONFIG_PRINTK_CALLER is not set
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DYNAMIC_DEBUG_CORE is not set
# CONFIG_SYMBOLIC_ERRNAME is not set
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

# CONFIG_DEBUG_KERNEL is not set

#
# Compile-time checks and compiler options
#
CONFIG_AS_HAS_NON_CONST_LEB128=y
CONFIG_FRAME_WARN=2048
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_OBJTOOL=y
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
# CONFIG_MAGIC_SYSRQ is not set
# CONFIG_DEBUG_FS is not set
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# end of Networking Debugging

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_PAGE_TABLE_CHECK is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP=y
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
CONFIG_HAVE_ARCH_KMSAN=y
# end of Memory Debugging

#
# Debug Oops, Lockups and Hangs
#
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_WW_MUTEX_SELFTEST is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set

#
# Debug kernel data structures
#
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# end of Debug kernel data structures

#
# RCU Debugging
#
# end of RCU Debugging

CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_HAVE_RETHOOK=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_NO_PATCHABLE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_OBJTOOL_MCOUNT=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_HAVE_BUILDTIME_MCOUNT_SORT=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y

#
# x86 Debugging
#
# CONFIG_X86_VERBOSE_BOOTUP is not set
CONFIG_EARLY_PRINTK=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_UNWINDER_ORC=y
# CONFIG_UNWINDER_FRAME_POINTER is not set
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
# CONFIG_KUNIT is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
# CONFIG_RUNTIME_TESTING_MENU is not set
CONFIG_ARCH_USE_MEMTEST=y
# CONFIG_MEMTEST is not set
# end of Kernel Testing and Coverage

#
# Rust hacking
#
# end of Rust hacking

CONFIG_WARN_MISSING_DOCUMENTS=y
CONFIG_WARN_ABI_ERRORS=y
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2022-12-17  4:21 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-13  2:35 [PATCH bpf-next v4 00/15] xdp: hints via kfuncs Stanislav Fomichev
2022-12-13  2:35 ` [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata Stanislav Fomichev
2022-12-13 16:37   ` David Vernet
2022-12-13 20:42     ` Stanislav Fomichev
2022-12-14 10:34       ` [xdp-hints] " Toke Høiland-Jørgensen
2022-12-14 18:42         ` Stanislav Fomichev
2022-12-14 23:46           ` Toke Høiland-Jørgensen
2022-12-15  3:48             ` Stanislav Fomichev
2022-12-15 14:04               ` Toke Høiland-Jørgensen
2022-12-14 23:46   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-12-17  4:20   ` kernel test robot
2022-12-13  2:35 ` [PATCH bpf-next v4 02/15] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
2022-12-13  2:35 ` [PATCH bpf-next v4 03/15] bpf: Introduce device-bound XDP programs Stanislav Fomichev
2022-12-13 23:25   ` Martin KaFai Lau
2022-12-14 18:42     ` Stanislav Fomichev
2022-12-13  2:35 ` [PATCH bpf-next v4 04/15] selftests/bpf: Update expected test_offload.py messages Stanislav Fomichev
2022-12-13  2:35 ` [PATCH bpf-next v4 05/15] bpf: XDP metadata RX kfuncs Stanislav Fomichev
2022-12-13 17:00   ` David Vernet
2022-12-13 20:42     ` Stanislav Fomichev
2022-12-13 21:45       ` David Vernet
2022-12-14  1:53   ` Martin KaFai Lau
2022-12-14 18:43     ` Stanislav Fomichev
2022-12-14 10:54   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-12-14 18:43     ` Stanislav Fomichev
2022-12-13  2:35 ` [PATCH bpf-next v4 06/15] bpf: Support consuming XDP HW metadata from fext programs Stanislav Fomichev
2022-12-14  1:45   ` Martin KaFai Lau
2022-12-14 10:41     ` Toke Høiland-Jørgensen
2022-12-14 18:43       ` Stanislav Fomichev
2022-12-14 22:19         ` Toke Høiland-Jørgensen
2022-12-13  2:35 ` [PATCH bpf-next v4 07/15] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-12-13  2:35 ` [PATCH bpf-next v4 08/15] veth: Support RX XDP metadata Stanislav Fomichev
2022-12-13 15:55   ` Jesper Dangaard Brouer
2022-12-13 20:42     ` Stanislav Fomichev
2022-12-14  9:48       ` Jesper Dangaard Brouer
2022-12-14 10:47         ` [xdp-hints] " Toke Høiland-Jørgensen
2022-12-14 18:09           ` Martin KaFai Lau
2022-12-14 18:44             ` Stanislav Fomichev
2022-12-13  2:35 ` [PATCH bpf-next v4 09/15] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-12-13  2:36 ` [PATCH bpf-next v4 10/15] net/mlx4_en: Introduce wrapper for xdp_buff Stanislav Fomichev
2022-12-13  8:56   ` Tariq Toukan
2022-12-13  2:36 ` [PATCH bpf-next v4 11/15] net/mlx4_en: Support RX XDP metadata Stanislav Fomichev
2022-12-13  8:56   ` Tariq Toukan
2022-12-13  2:36 ` [PATCH bpf-next v4 12/15] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
2022-12-13  2:36 ` [PATCH bpf-next v4 13/15] net/mlx5e: Introduce wrapper for xdp_buff Stanislav Fomichev
2022-12-13  2:36 ` [PATCH bpf-next v4 14/15] net/mlx5e: Support RX XDP metadata Stanislav Fomichev
2022-12-13  2:36 ` [PATCH bpf-next v4 15/15] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.