bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v3 00/12] xdp: hints via kfuncs
@ 2022-12-06  2:45 Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 01/12] bpf: Document XDP RX metadata Stanislav Fomichev
                   ` (12 more replies)
  0 siblings, 13 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Please see the first patch in the series for the overall
design and use-cases.

Changes since v3:

- Rework prog->bound_netdev refcounting (Jakub/Martin)

  Now it's based on the offload.c framework. It mostly fits, except
  I had to automatically insert a HT entry for the netdev. In the
  offloaded case, the netdev is added via a call to
  bpf_offload_dev_netdev_register from the driver init path; with
  dev-bound programs, we have to manually add (and remove) the entry.

  As suggested by Toke, I'm also prohibiting putting dev-bound programs
  into prog-array maps, essentially prohibiting tail calls into them.
  I'm also disabling freplace of dev-bound programs. Both of those
  restrictions can be loosened up eventually.
  Note that we don't require maps to be dev-bound when the program is
  dev-bound.

  Confirmed with test_offload.py that the existing parts are still
  operational.

- Fix compile issues with CONFIG_NET=n and mlx5 driver (lkp@intel.com)

Changes since v2:

- Rework bpf_prog_aux->xdp_netdev refcnt (Martin)

  Switched to dropping the count early, after loading / verification is
  done. At attach time, the pointer value is used only for comparing
  the actual netdev at attach vs netdev at load.

  (Potentially a problem if the same slub slot is reused
  for another netdev later on?)

- Use correct RX queue number in xdp_hw_metadata (Toke / Jakub)

- Fix wrongly placed '*cnt=0' in fixup_kfunc_call after merge (Toke)

- Fix sorted BTF_SET8_START (Toke)

  Introduce old-school unsorted BTF_ID_LIST for lookup purposes.

- Zero-initialize mlx4_xdp_buff (Tariq)

- Separate common timestamp handling into mlx4_en_get_hwtstamp (Tariq)

- mlx5 patches (Toke)

  Note, I've renamed the following for consistency with the rest:
  - s/mlx5_xdp_ctx/mlx5_xdp_buff/
  - s/mctx/mxbuf/

Changes since v1:

- Drop xdp->skb metadata path (Jakub)

  No consensus yet on exposing xdp_skb_metadata in UAPI. Exploring
  whether everyone would be OK with a kfunc to access that part;
  will follow up separately.

- Drop kfunc unrolling (Alexei)

  Starting with simple code to resolve per-device ndo kfuncs.
  We can always go back to unrolling while keeping the same kfunc
  interface in the future.

- Add rx hash metadata (Toke)

  Not adding the rest (csum/hash_type/etc) yet; I'd like us to agree on
  the framework first.

- use dev_get_by_index and add proper refcnt (Toke)

Changes since last RFC:

- drop ice/bnxt example implementation (Alexander)

  -ENOHARDWARE to test

- fix/test mlx4 implementation

  Confirmed that I get a reasonable-looking timestamp.
  The last patch in the series is the small xsk program that can
  be used to dump incoming metadata.

- bpf_push64/bpf_pop64 (Alexei)

  x86_64+arm64(untested)+disassembler

- struct xdp_to_skb_metadata -> struct xdp_skb_metadata (Toke)

  s/xdp_to_skb/xdp_skb/

- Documentation/bpf/xdp-rx-metadata.rst

  Documents functionality, assumptions and limitations.

- bpf_xdp_metadata_export_to_skb returns true/false (Martin)

  Plus xdp_md->skb_metadata field to access it.

- BPF_F_XDP_HAS_METADATA flag (Toke/Martin)

  Drop magic, use the flag instead.

- drop __randomize_layout

  Not sure it's possible to sanely expose it via UAPI. Because every
  .o potentially gets its own randomized layout, test_progs
  refuses to link.

- remove __net_timestamp in veth driver (John/Jesper)

  Instead, calling ktime_get from the kfunc; enough for the selftests.

Future work on RX side:

- Support more devices besides veth and mlx4
- Support more metadata besides RX timestamp.
- Convert skb_metadata_set() callers to xdp_convert_skb_metadata(),
  which handles the extra xdp_skb_metadata

Prior art (to record pros/cons for different approaches):

- Stable UAPI approach:
  https://lore.kernel.org/bpf/20220628194812.1453059-1-alexandr.lobakin@intel.com/
- Metadata+BTF_ID approach:
  https://lore.kernel.org/bpf/166256538687.1434226.15760041133601409770.stgit@firesoul/
- v1:
  https://lore.kernel.org/bpf/20221115030210.3159213-1-sdf@google.com/T/#t
- kfuncs v2 RFC:
  https://lore.kernel.org/bpf/20221027200019.4106375-1-sdf@google.com/
- kfuncs v1 RFC:
  https://lore.kernel.org/bpf/20221104032532.1615099-1-sdf@google.com/

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org

Stanislav Fomichev (9):
  bpf: Document XDP RX metadata
  bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
  bpf: XDP metadata RX kfuncs
  veth: Introduce veth_xdp_buff wrapper for xdp_buff
  veth: Support RX XDP metadata
  selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  mlx4: Support RX XDP metadata
  selftests/bpf: Simple program to dump XDP RX metadata

Toke Høiland-Jørgensen (3):
  xsk: Add cb area to struct xdp_buff_xsk
  mlx5: Introduce mlx5_xdp_buff wrapper for xdp_buff
  mlx5: Support RX XDP metadata

 Documentation/bpf/xdp-rx-metadata.rst         |  90 ++++
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |  13 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  10 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  68 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  32 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |  13 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  35 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   4 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  98 ++---
 drivers/net/veth.c                            |  88 ++--
 include/linux/bpf.h                           |  26 +-
 include/linux/mlx4/device.h                   |   7 +
 include/linux/netdevice.h                     |   5 +
 include/net/xdp.h                             |  29 ++
 include/net/xsk_buff_pool.h                   |   5 +
 include/uapi/linux/bpf.h                      |   5 +
 kernel/bpf/arraymap.c                         |  17 +-
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/offload.c                          | 162 +++++--
 kernel/bpf/syscall.c                          |  25 +-
 kernel/bpf/verifier.c                         |  42 +-
 net/core/dev.c                                |   7 +-
 net/core/filter.c                             |   2 +-
 net/core/xdp.c                                |  58 +++
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   8 +-
 .../selftests/bpf/prog_tests/xdp_metadata.c   | 394 +++++++++++++++++
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  93 ++++
 .../selftests/bpf/progs/xdp_metadata.c        |  70 +++
 .../selftests/bpf/progs/xdp_metadata2.c       |  15 +
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 405 ++++++++++++++++++
 tools/testing/selftests/bpf/xdp_metadata.h    |   7 +
 36 files changed, 1688 insertions(+), 167 deletions(-)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata2.c
 create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_metadata.h

-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH bpf-next v3 01/12] bpf: Document XDP RX metadata
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-08  4:25   ` Jakub Kicinski
  2022-12-06  2:45 ` [PATCH bpf-next v3 02/12] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Document all current use-cases and assumptions.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst

diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
new file mode 100644
index 000000000000..498eae718275
--- /dev/null
+++ b/Documentation/bpf/xdp-rx-metadata.rst
@@ -0,0 +1,90 @@
+===============
+XDP RX Metadata
+===============
+
+XDP programs support creating and passing custom metadata via
+``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
+entities:
+
+1. ``AF_XDP`` consumer.
+2. Kernel core stack via ``XDP_PASS``.
+3. Another device via ``bpf_redirect_map``.
+4. Other BPF programs via ``bpf_tail_call``.
+
+General Design
+==============
+
+XDP has access to a set of kfuncs to manipulate the metadata. Every
+device driver implements these kfuncs. The set of kfuncs is
+declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
+
+Currently, the following kfuncs are supported. In the future, as more
+metadata is supported, this set will grow:
+
+- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
+  indicate whether the device supports RX timestamps
+- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
+- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
+  indicate whether the device supports RX hash
+- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash
+
+Within the XDP frame, the metadata layout is as follows::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+             ^                 ^
+             |                 |
+   xdp_buff->data_meta   xdp_buff->data
+
+AF_XDP
+======
+
+``AF_XDP`` use-case implies that there is a contract between the BPF program
+that redirects XDP frames into the ``XSK`` and the final consumer.
+Thus the BPF program manually allocates a fixed number of
+metadata bytes via ``bpf_xdp_adjust_meta`` and calls a subset
+of kfuncs to populate them. The user-space ``XSK`` consumer looks
+at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
+
+Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+                               ^
+                               |
+                        rx_desc->address
+
+XDP_PASS
+========
+
+This is the path where the packets processed by the XDP program are passed
+into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
+Currently, every driver has custom kernel code to parse the descriptors and
+populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
+In the future, we'd like to support a case where an XDP program can override
+some of that metadata.
+
+The plan of record is to make this path similar to ``bpf_redirect_map``
+so the program can control which metadata is passed to the skb layer.
+
+bpf_redirect_map
+================
+
+``bpf_redirect_map`` can redirect the frame to a different device.
+In this case we don't know ahead of time whether that final consumer
+will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
+Additionally, the final consumer doesn't have access to the original
+hardware descriptor and can't access any of the original metadata.
+
+For this use-case, only custom metadata is currently supported. If
+the frame is eventually passed to the kernel, the skb created from such
+a frame won't have any skb metadata. The ``XSK`` consumer will only
+have access to the custom metadata.
+
+bpf_tail_call
+=============
+
+No special handling here. A tail-called program operates on the same context
+as the original one.
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH bpf-next v3 02/12] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 01/12] bpf: Document XDP RX metadata Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-08  4:26   ` Jakub Kicinski
  2022-12-06  2:45 ` [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs Stanislav Fomichev
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

BPF offloading infra will be reused to implement
bound-but-not-offloaded bpf programs. Rename existing
helpers for clarity. No functional changes.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h   |  8 ++++----
 kernel/bpf/core.c     |  4 ++--
 kernel/bpf/offload.c  |  4 ++--
 kernel/bpf/syscall.c  | 22 +++++++++++-----------
 kernel/bpf/verifier.c | 18 +++++++++---------
 net/core/dev.c        |  2 +-
 net/core/filter.c     |  2 +-
 7 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4920ac252754..d5d479dae118 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2481,12 +2481,12 @@ void unpriv_ebpf_notify(int new_state);
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
 int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
 
-static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux)
+static inline bool bpf_prog_is_offloaded(const struct bpf_prog_aux *aux)
 {
 	return aux->offload_requested;
 }
 
-static inline bool bpf_map_is_dev_bound(struct bpf_map *map)
+static inline bool bpf_map_is_offloaded(struct bpf_map *map)
 {
 	return unlikely(map->ops == &bpf_map_offload_ops);
 }
@@ -2513,12 +2513,12 @@ static inline int bpf_prog_offload_init(struct bpf_prog *prog,
 	return -EOPNOTSUPP;
 }
 
-static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux)
+static inline bool bpf_prog_is_offloaded(struct bpf_prog_aux *aux)
 {
 	return false;
 }
 
-static inline bool bpf_map_is_dev_bound(struct bpf_map *map)
+static inline bool bpf_map_is_offloaded(struct bpf_map *map)
 {
 	return false;
 }
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2e57fc839a5c..641ab412ad7e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2183,7 +2183,7 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 	 * valid program, which in this case would simply not
 	 * be JITed, but falls back to the interpreter.
 	 */
-	if (!bpf_prog_is_dev_bound(fp->aux)) {
+	if (!bpf_prog_is_offloaded(fp->aux)) {
 		*err = bpf_prog_alloc_jited_linfo(fp);
 		if (*err)
 			return fp;
@@ -2554,7 +2554,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 #endif
 	bpf_free_used_maps(aux);
 	bpf_free_used_btfs(aux);
-	if (bpf_prog_is_dev_bound(aux))
+	if (bpf_prog_is_offloaded(aux))
 		bpf_prog_offload_destroy(aux->prog);
 #ifdef CONFIG_PERF_EVENTS
 	if (aux->prog->has_callchain_buf)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 13e4efc971e6..f5769a8ecbee 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -549,7 +549,7 @@ static bool __bpf_offload_dev_match(struct bpf_prog *prog,
 	struct bpf_offload_netdev *ondev1, *ondev2;
 	struct bpf_prog_offload *offload;
 
-	if (!bpf_prog_is_dev_bound(prog->aux))
+	if (!bpf_prog_is_offloaded(prog->aux))
 		return false;
 
 	offload = prog->aux->offload;
@@ -581,7 +581,7 @@ bool bpf_offload_prog_map_match(struct bpf_prog *prog, struct bpf_map *map)
 	struct bpf_offloaded_map *offmap;
 	bool ret;
 
-	if (!bpf_map_is_dev_bound(map))
+	if (!bpf_map_is_offloaded(map))
 		return bpf_map_offload_neutral(map);
 	offmap = map_to_offmap(map);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..13bc96035116 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -181,7 +181,7 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
 	int err;
 
 	/* Need to create a kthread, thus must support schedule */
-	if (bpf_map_is_dev_bound(map)) {
+	if (bpf_map_is_offloaded(map)) {
 		return bpf_map_offload_update_elem(map, key, value, flags);
 	} else if (map->map_type == BPF_MAP_TYPE_CPUMAP ||
 		   map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
@@ -238,7 +238,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	void *ptr;
 	int err;
 
-	if (bpf_map_is_dev_bound(map))
+	if (bpf_map_is_offloaded(map))
 		return bpf_map_offload_lookup_elem(map, key, value);
 
 	bpf_disable_instrumentation();
@@ -1483,7 +1483,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 		goto err_put;
 	}
 
-	if (bpf_map_is_dev_bound(map)) {
+	if (bpf_map_is_offloaded(map)) {
 		err = bpf_map_offload_delete_elem(map, key);
 		goto out;
 	} else if (IS_FD_PROG_ARRAY(map) ||
@@ -1547,7 +1547,7 @@ static int map_get_next_key(union bpf_attr *attr)
 	if (!next_key)
 		goto free_key;
 
-	if (bpf_map_is_dev_bound(map)) {
+	if (bpf_map_is_offloaded(map)) {
 		err = bpf_map_offload_get_next_key(map, key, next_key);
 		goto out;
 	}
@@ -1605,7 +1605,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 				   map->key_size))
 			break;
 
-		if (bpf_map_is_dev_bound(map)) {
+		if (bpf_map_is_offloaded(map)) {
 			err = bpf_map_offload_delete_elem(map, key);
 			break;
 		}
@@ -1851,7 +1851,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 		   map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
 		   map->map_type == BPF_MAP_TYPE_LRU_HASH ||
 		   map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
-		if (!bpf_map_is_dev_bound(map)) {
+		if (!bpf_map_is_offloaded(map)) {
 			bpf_disable_instrumentation();
 			rcu_read_lock();
 			err = map->ops->map_lookup_and_delete_elem(map, key, value, attr->flags);
@@ -1944,7 +1944,7 @@ static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
 	if (!ops)
 		return -EINVAL;
 
-	if (!bpf_prog_is_dev_bound(prog->aux))
+	if (!bpf_prog_is_offloaded(prog->aux))
 		prog->aux->ops = ops;
 	else
 		prog->aux->ops = &bpf_offload_prog_ops;
@@ -2255,7 +2255,7 @@ bool bpf_prog_get_ok(struct bpf_prog *prog,
 
 	if (prog->type != *attach_type)
 		return false;
-	if (bpf_prog_is_dev_bound(prog->aux) && !attach_drv)
+	if (bpf_prog_is_offloaded(prog->aux) && !attach_drv)
 		return false;
 
 	return true;
@@ -2598,7 +2598,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	atomic64_set(&prog->aux->refcnt, 1);
 	prog->gpl_compatible = is_gpl ? 1 : 0;
 
-	if (bpf_prog_is_dev_bound(prog->aux)) {
+	if (bpf_prog_is_offloaded(prog->aux)) {
 		err = bpf_prog_offload_init(prog, attr);
 		if (err)
 			goto free_prog_sec;
@@ -3997,7 +3997,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 			return -EFAULT;
 	}
 
-	if (bpf_prog_is_dev_bound(prog->aux)) {
+	if (bpf_prog_is_offloaded(prog->aux)) {
 		err = bpf_prog_offload_info_fill(&info, prog);
 		if (err)
 			return err;
@@ -4225,7 +4225,7 @@ static int bpf_map_get_info_by_fd(struct file *file,
 	}
 	info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
 
-	if (bpf_map_is_dev_bound(map)) {
+	if (bpf_map_is_offloaded(map)) {
 		err = bpf_map_offload_info_fill(&info, map);
 		if (err)
 			return err;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1d51bd9596da..fc4e313a4d2e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13648,7 +13648,7 @@ static int do_check(struct bpf_verifier_env *env)
 			env->prev_log_len = env->log.len_used;
 		}
 
-		if (bpf_prog_is_dev_bound(env->prog->aux)) {
+		if (bpf_prog_is_offloaded(env->prog->aux)) {
 			err = bpf_prog_offload_verify_insn(env, env->insn_idx,
 							   env->prev_insn_idx);
 			if (err)
@@ -14128,7 +14128,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		}
 	}
 
-	if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
+	if ((bpf_prog_is_offloaded(prog->aux) || bpf_map_is_offloaded(map)) &&
 	    !bpf_offload_prog_map_match(prog, map)) {
 		verbose(env, "offload device mismatch between prog and map\n");
 		return -EINVAL;
@@ -14609,7 +14609,7 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
 	unsigned int orig_prog_len = env->prog->len;
 	int err;
 
-	if (bpf_prog_is_dev_bound(env->prog->aux))
+	if (bpf_prog_is_offloaded(env->prog->aux))
 		bpf_prog_offload_remove_insns(env, off, cnt);
 
 	err = bpf_remove_insns(env->prog, off, cnt);
@@ -14690,7 +14690,7 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
 		else
 			continue;
 
-		if (bpf_prog_is_dev_bound(env->prog->aux))
+		if (bpf_prog_is_offloaded(env->prog->aux))
 			bpf_prog_offload_replace_insn(env, i, &ja);
 
 		memcpy(insn, &ja, sizeof(ja));
@@ -14873,7 +14873,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 	}
 
-	if (bpf_prog_is_dev_bound(env->prog->aux))
+	if (bpf_prog_is_offloaded(env->prog->aux))
 		return 0;
 
 	insn = env->prog->insnsi + delta;
@@ -15273,7 +15273,7 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 	int err = 0;
 
 	if (env->prog->jit_requested &&
-	    !bpf_prog_is_dev_bound(env->prog->aux)) {
+	    !bpf_prog_is_offloaded(env->prog->aux)) {
 		err = jit_subprogs(env);
 		if (err == 0)
 			return 0;
@@ -16747,7 +16747,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	if (ret < 0)
 		goto skip_full_check;
 
-	if (bpf_prog_is_dev_bound(env->prog->aux)) {
+	if (bpf_prog_is_offloaded(env->prog->aux)) {
 		ret = bpf_prog_offload_verifier_prep(env->prog);
 		if (ret)
 			goto skip_full_check;
@@ -16760,7 +16760,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	ret = do_check_subprogs(env);
 	ret = ret ?: do_check_main(env);
 
-	if (ret == 0 && bpf_prog_is_dev_bound(env->prog->aux))
+	if (ret == 0 && bpf_prog_is_offloaded(env->prog->aux))
 		ret = bpf_prog_offload_finalize(env);
 
 skip_full_check:
@@ -16795,7 +16795,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	/* do 32-bit optimization after insn patching has done so those patched
 	 * insns could be handled correctly.
 	 */
-	if (ret == 0 && !bpf_prog_is_dev_bound(env->prog->aux)) {
+	if (ret == 0 && !bpf_prog_is_offloaded(env->prog->aux)) {
 		ret = opt_subreg_zext_lo32_rnd_hi32(env, attr);
 		env->prog->aux->verifier_zext = bpf_jit_needs_zext() ? !ret
 								     : false;
diff --git a/net/core/dev.c b/net/core/dev.c
index 7627c475d991..5b221568dfd4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9224,7 +9224,7 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 			NL_SET_ERR_MSG(extack, "Native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
-		if (!offload && bpf_prog_is_dev_bound(new_prog->aux)) {
+		if (!offload && bpf_prog_is_offloaded(new_prog->aux)) {
 			NL_SET_ERR_MSG(extack, "Using device-bound program without HW_MODE flag is not supported");
 			return -EINVAL;
 		}
diff --git a/net/core/filter.c b/net/core/filter.c
index 8607136b6e2c..d89c30ba2623 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8719,7 +8719,7 @@ static bool xdp_is_valid_access(int off, int size,
 	}
 
 	if (type == BPF_WRITE) {
-		if (bpf_prog_is_dev_bound(prog->aux)) {
+		if (bpf_prog_is_offloaded(prog->aux)) {
 			switch (off) {
 			case offsetof(struct xdp_md, rx_queue_index):
 				return __is_valid_xdp_access(off, size);
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 01/12] bpf: Document XDP RX metadata Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 02/12] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-07  4:29   ` Alexei Starovoitov
                     ` (4 more replies)
  2022-12-06  2:45 ` [PATCH bpf-next v3 04/12] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
                   ` (9 subsequent siblings)
  12 siblings, 5 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

There is one ndo handler per kfunc; the verifier replaces a call to the
generic kfunc with a call to the per-device one.

For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
implements all possible metadata kfuncs. Not all devices have to
implement them. If a kfunc is not supported by the target device,
the default implementation is called instead.

Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
we treat prog_ifindex as the target device for kfunc resolution.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h            |  20 +++-
 include/linux/netdevice.h      |   5 +
 include/net/xdp.h              |  29 ++++++
 include/uapi/linux/bpf.h       |   5 +
 kernel/bpf/arraymap.c          |  17 +++-
 kernel/bpf/core.c              |   2 +-
 kernel/bpf/offload.c           | 162 ++++++++++++++++++++++++++++-----
 kernel/bpf/syscall.c           |   7 +-
 kernel/bpf/verifier.c          |  24 ++++-
 net/core/dev.c                 |   5 +
 net/core/xdp.c                 |  58 ++++++++++++
 tools/include/uapi/linux/bpf.h |   5 +
 12 files changed, 304 insertions(+), 35 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d5d479dae118..b46b60f4eae1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1261,7 +1261,8 @@ struct bpf_prog_aux {
 	enum bpf_prog_type saved_dst_prog_type;
 	enum bpf_attach_type saved_dst_attach_type;
 	bool verifier_zext; /* Zero extensions has been inserted by verifier. */
-	bool offload_requested;
+	bool dev_bound; /* Program is bound to the netdev. */
+	bool offload_requested; /* Program is bound and offloaded to the netdev. */
 	bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
 	bool func_proto_unreliable;
 	bool sleepable;
@@ -2476,10 +2477,18 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
 				       struct net_device *netdev);
 bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
 
+void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
+
 void unpriv_ebpf_notify(int new_state);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
 int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
+void bpf_offload_bound_netdev_unregister(struct net_device *dev);
+
+static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux)
+{
+	return aux->dev_bound;
+}
 
 static inline bool bpf_prog_is_offloaded(const struct bpf_prog_aux *aux)
 {
@@ -2513,6 +2522,15 @@ static inline int bpf_prog_offload_init(struct bpf_prog *prog,
 	return -EOPNOTSUPP;
 }
 
+static inline void bpf_offload_bound_netdev_unregister(struct net_device *dev)
+{
+}
+
+static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux)
+{
+	return false;
+}
+
 static inline bool bpf_prog_is_offloaded(struct bpf_prog_aux *aux)
 {
 	return false;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5aa35c58c342..2eabb9157767 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -74,6 +74,7 @@ struct udp_tunnel_nic_info;
 struct udp_tunnel_nic;
 struct bpf_prog;
 struct xdp_buff;
+struct xdp_md;
 
 void synchronize_net(void);
 void netdev_set_default_ethtool_ops(struct net_device *dev,
@@ -1611,6 +1612,10 @@ struct net_device_ops {
 	ktime_t			(*ndo_get_tstamp)(struct net_device *dev,
 						  const struct skb_shared_hwtstamps *hwtstamps,
 						  bool cycles);
+	bool			(*ndo_xdp_rx_timestamp_supported)(const struct xdp_md *ctx);
+	u64			(*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx);
+	bool			(*ndo_xdp_rx_hash_supported)(const struct xdp_md *ctx);
+	u32			(*ndo_xdp_rx_hash)(const struct xdp_md *ctx);
 };
 
 /**
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 55dbc68bfffc..c24aba5c363b 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -409,4 +409,33 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
 
 #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
 
+#define XDP_METADATA_KFUNC_xxx	\
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \
+			   bpf_xdp_metadata_rx_timestamp_supported) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
+			   bpf_xdp_metadata_rx_timestamp) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED, \
+			   bpf_xdp_metadata_rx_hash_supported) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
+			   bpf_xdp_metadata_rx_hash) \
+
+enum {
+#define XDP_METADATA_KFUNC(name, str) name,
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+MAX_XDP_METADATA_KFUNC,
+};
+
+#ifdef CONFIG_NET
+u32 xdp_metadata_kfunc_id(int id);
+#else
+static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
+#endif
+
+struct xdp_md;
+bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx);
+u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx);
+bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx);
+u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx);
+
 #endif /* __LINUX_NET_XDP_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f89de51a45db..790650a81f2b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1156,6 +1156,11 @@ enum bpf_link_type {
  */
 #define BPF_F_XDP_HAS_FRAGS	(1U << 5)
 
+/* If BPF_F_XDP_HAS_METADATA is used in BPF_PROG_LOAD command, the loaded
+ * program becomes device-bound but can access its XDP metadata.
+ */
+#define BPF_F_XDP_HAS_METADATA	(1U << 6)
+
 /* link_create.kprobe_multi.flags used in LINK_CREATE command for
  * BPF_TRACE_KPROBE_MULTI attach type to create return probe.
  */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 484706959556..9b190b72ffce 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -881,12 +881,21 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
 	if (IS_ERR(prog))
 		return prog;
 
-	if (!bpf_prog_map_compatible(map, prog)) {
-		bpf_prog_put(prog);
-		return ERR_PTR(-EINVAL);
-	}
+	/* When tail-calling from a non-dev-bound program to a dev-bound one,
+	 * XDP metadata helpers should be disabled. Until it's implemented,
+	 * prohibit adding dev-bound programs to tail-call maps.
+	 */
+	if (bpf_prog_is_dev_bound(prog->aux))
+		goto err;
+
+	if (!bpf_prog_map_compatible(map, prog))
+		goto err;
 
 	return prog;
+
+err:
+	bpf_prog_put(prog);
+	return ERR_PTR(-EINVAL);
 }
 
 static void prog_fd_array_put_ptr(void *ptr)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 641ab412ad7e..71c6dc081f62 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2554,7 +2554,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 #endif
 	bpf_free_used_maps(aux);
 	bpf_free_used_btfs(aux);
-	if (bpf_prog_is_offloaded(aux))
+	if (bpf_prog_is_dev_bound(aux))
 		bpf_prog_offload_destroy(aux->prog);
 #ifdef CONFIG_PERF_EVENTS
 	if (aux->prog->has_callchain_buf)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index f5769a8ecbee..bad8bab916eb 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -41,7 +41,7 @@ struct bpf_offload_dev {
 struct bpf_offload_netdev {
 	struct rhash_head l;
 	struct net_device *netdev;
-	struct bpf_offload_dev *offdev;
+	struct bpf_offload_dev *offdev; /* NULL when bound-only */
 	struct list_head progs;
 	struct list_head maps;
 	struct list_head offdev_netdevs;
@@ -58,6 +58,12 @@ static const struct rhashtable_params offdevs_params = {
 static struct rhashtable offdevs;
 static bool offdevs_inited;
 
+static int __bpf_offload_init(void);
+static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
+					     struct net_device *netdev);
+static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
+						struct net_device *netdev);
+
 static int bpf_dev_offload_check(struct net_device *netdev)
 {
 	if (!netdev)
@@ -87,13 +93,17 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
 	    attr->prog_type != BPF_PROG_TYPE_XDP)
 		return -EINVAL;
 
-	if (attr->prog_flags)
+	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
 		return -EINVAL;
 
+	err = __bpf_offload_init();
+	if (err)
+		return err;
+
 	offload = kzalloc(sizeof(*offload), GFP_USER);
 	if (!offload)
 		return -ENOMEM;
 
 	offload->prog = prog;
 
 	offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
@@ -102,11 +112,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
 	if (err)
 		goto err_maybe_put;
 
+	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
+
 	down_write(&bpf_devs_lock);
 	ondev = bpf_offload_find_netdev(offload->netdev);
 	if (!ondev) {
-		err = -EINVAL;
-		goto err_unlock;
+		if (!prog->aux->offload_requested) {
+			/* When only binding to the device, explicitly
+			 * create an entry in the hashtable. See related
+			 * maybe_remove_bound_netdev.
+			 */
+			err = __bpf_offload_dev_netdev_register(NULL, offload->netdev);
+			if (err)
+				goto err_unlock;
+			ondev = bpf_offload_find_netdev(offload->netdev);
+		}
+		if (!ondev) {
+			err = -EINVAL;
+			goto err_unlock;
+		}
 	}
 	offload->offdev = ondev->offdev;
 	prog->aux->offload = offload;
@@ -209,6 +233,19 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
 	up_read(&bpf_devs_lock);
 }
 
+static void maybe_remove_bound_netdev(struct net_device *dev)
+{
+	struct bpf_offload_netdev *ondev;
+
+	rtnl_lock();
+	down_write(&bpf_devs_lock);
+	ondev = bpf_offload_find_netdev(dev);
+	if (ondev && !ondev->offdev && list_empty(&ondev->progs))
+		__bpf_offload_dev_netdev_unregister(NULL, dev);
+	up_write(&bpf_devs_lock);
+	rtnl_unlock();
+}
+
 static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
 {
 	struct bpf_prog_offload *offload = prog->aux->offload;
@@ -226,10 +263,17 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
 
 void bpf_prog_offload_destroy(struct bpf_prog *prog)
 {
+	struct net_device *netdev = NULL;
+
 	down_write(&bpf_devs_lock);
-	if (prog->aux->offload)
+	if (prog->aux->offload) {
+		netdev = prog->aux->offload->netdev;
 		__bpf_prog_offload_destroy(prog);
+	}
 	up_write(&bpf_devs_lock);
+
+	if (netdev)
+		maybe_remove_bound_netdev(netdev);
 }
 
 static int bpf_prog_offload_translate(struct bpf_prog *prog)
@@ -549,7 +593,7 @@ static bool __bpf_offload_dev_match(struct bpf_prog *prog,
 	struct bpf_offload_netdev *ondev1, *ondev2;
 	struct bpf_prog_offload *offload;
 
-	if (!bpf_prog_is_offloaded(prog->aux))
+	if (!bpf_prog_is_dev_bound(prog->aux))
 		return false;
 
 	offload = prog->aux->offload;
@@ -592,8 +636,8 @@ bool bpf_offload_prog_map_match(struct bpf_prog *prog, struct bpf_map *map)
 	return ret;
 }
 
-int bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
-				    struct net_device *netdev)
+static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
+					     struct net_device *netdev)
 {
 	struct bpf_offload_netdev *ondev;
 	int err;
@@ -607,15 +651,14 @@ int bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
 	INIT_LIST_HEAD(&ondev->progs);
 	INIT_LIST_HEAD(&ondev->maps);
 
-	down_write(&bpf_devs_lock);
 	err = rhashtable_insert_fast(&offdevs, &ondev->l, offdevs_params);
 	if (err) {
 		netdev_warn(netdev, "failed to register for BPF offload\n");
 		goto err_unlock_free;
 	}
 
-	list_add(&ondev->offdev_netdevs, &offdev->netdevs);
-	up_write(&bpf_devs_lock);
+	if (offdev)
+		list_add(&ondev->offdev_netdevs, &offdev->netdevs);
 	return 0;
 
 err_unlock_free:
@@ -623,29 +666,42 @@ int bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
 	kfree(ondev);
 	return err;
 }
+
+int bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
+				    struct net_device *netdev)
+{
+	int err;
+
+	down_write(&bpf_devs_lock);
+	err = __bpf_offload_dev_netdev_register(offdev, netdev);
+	up_write(&bpf_devs_lock);
+	return err;
+}
 EXPORT_SYMBOL_GPL(bpf_offload_dev_netdev_register);
 
-void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
-				       struct net_device *netdev)
+static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
+						struct net_device *netdev)
 {
-	struct bpf_offload_netdev *ondev, *altdev;
+	struct bpf_offload_netdev *ondev, *altdev = NULL;
 	struct bpf_offloaded_map *offmap, *mtmp;
 	struct bpf_prog_offload *offload, *ptmp;
 
 	ASSERT_RTNL();
 
-	down_write(&bpf_devs_lock);
 	ondev = rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
 	if (WARN_ON(!ondev))
-		goto unlock;
+		return;
 
 	WARN_ON(rhashtable_remove_fast(&offdevs, &ondev->l, offdevs_params));
-	list_del(&ondev->offdev_netdevs);
 
 	/* Try to move the objects to another netdev of the device */
-	altdev = list_first_entry_or_null(&offdev->netdevs,
-					  struct bpf_offload_netdev,
-					  offdev_netdevs);
+	if (offdev) {
+		list_del(&ondev->offdev_netdevs);
+		altdev = list_first_entry_or_null(&offdev->netdevs,
+						  struct bpf_offload_netdev,
+						  offdev_netdevs);
+	}
+
 	if (altdev) {
 		list_for_each_entry(offload, &ondev->progs, offloads)
 			offload->netdev = altdev->netdev;
@@ -664,15 +720,19 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
 	WARN_ON(!list_empty(&ondev->progs));
 	WARN_ON(!list_empty(&ondev->maps));
 	kfree(ondev);
-unlock:
+}
+
+void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
+				       struct net_device *netdev)
+{
+	down_write(&bpf_devs_lock);
+	__bpf_offload_dev_netdev_unregister(offdev, netdev);
 	up_write(&bpf_devs_lock);
 }
 EXPORT_SYMBOL_GPL(bpf_offload_dev_netdev_unregister);
 
-struct bpf_offload_dev *
-bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
+static int __bpf_offload_init(void)
 {
-	struct bpf_offload_dev *offdev;
 	int err;
 
 	down_write(&bpf_devs_lock);
@@ -680,12 +740,25 @@ bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
 		err = rhashtable_init(&offdevs, &offdevs_params);
 		if (err) {
 			up_write(&bpf_devs_lock);
-			return ERR_PTR(err);
+			return err;
 		}
 		offdevs_inited = true;
 	}
 	up_write(&bpf_devs_lock);
 
+	return 0;
+}
+
+struct bpf_offload_dev *
+bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
+{
+	struct bpf_offload_dev *offdev;
+	int err;
+
+	err = __bpf_offload_init();
+	if (err)
+		return ERR_PTR(err);
+
 	offdev = kzalloc(sizeof(*offdev), GFP_KERNEL);
 	if (!offdev)
 		return ERR_PTR(-ENOMEM);
@@ -710,3 +783,42 @@ void *bpf_offload_dev_priv(struct bpf_offload_dev *offdev)
 	return offdev->priv;
 }
 EXPORT_SYMBOL_GPL(bpf_offload_dev_priv);
+
+void bpf_offload_bound_netdev_unregister(struct net_device *dev)
+{
+	struct bpf_offload_netdev *ondev;
+
+	ASSERT_RTNL();
+
+	down_write(&bpf_devs_lock);
+	ondev = bpf_offload_find_netdev(dev);
+	if (ondev && !ondev->offdev)
+		__bpf_offload_dev_netdev_unregister(NULL, ondev->netdev);
+	up_write(&bpf_devs_lock);
+}
+
+void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
+{
+	const struct net_device_ops *netdev_ops;
+	void *p = NULL;
+
+	down_read(&bpf_devs_lock);
+	if (!prog->aux->offload || !prog->aux->offload->netdev)
+		goto out;
+
+	netdev_ops = prog->aux->offload->netdev->netdev_ops;
+
+	if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
+		p = netdev_ops->ndo_xdp_rx_timestamp_supported;
+	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
+		p = netdev_ops->ndo_xdp_rx_timestamp;
+	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
+		p = netdev_ops->ndo_xdp_rx_hash_supported;
+	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
+		p = netdev_ops->ndo_xdp_rx_hash;
+	/* fallback to default kfunc when not supported by netdev */
+out:
+	up_read(&bpf_devs_lock);
+
+	return p;
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 13bc96035116..b345a273f7d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2491,7 +2491,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 				 BPF_F_TEST_STATE_FREQ |
 				 BPF_F_SLEEPABLE |
 				 BPF_F_TEST_RND_HI32 |
-				 BPF_F_XDP_HAS_FRAGS))
+				 BPF_F_XDP_HAS_FRAGS |
+				 BPF_F_XDP_HAS_METADATA))
 		return -EINVAL;
 
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
@@ -2575,7 +2576,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	prog->aux->attach_btf = attach_btf;
 	prog->aux->attach_btf_id = attr->attach_btf_id;
 	prog->aux->dst_prog = dst_prog;
-	prog->aux->offload_requested = !!attr->prog_ifindex;
+	prog->aux->dev_bound = !!attr->prog_ifindex;
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
 
@@ -2598,7 +2599,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	atomic64_set(&prog->aux->refcnt, 1);
 	prog->gpl_compatible = is_gpl ? 1 : 0;
 
-	if (bpf_prog_is_offloaded(prog->aux)) {
+	if (bpf_prog_is_dev_bound(prog->aux)) {
 		err = bpf_prog_offload_init(prog, attr);
 		if (err)
 			goto free_prog_sec;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fc4e313a4d2e..00951a59ee26 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15323,6 +15323,24 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return -EINVAL;
 	}
 
+	*cnt = 0;
+
+	if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
+		if (bpf_prog_is_offloaded(env->prog->aux)) {
+			verbose(env, "no metadata kfuncs offload\n");
+			return -EINVAL;
+		}
+
+		if (bpf_prog_is_dev_bound(env->prog->aux)) {
+			void *p = bpf_offload_resolve_kfunc(env->prog, insn->imm);
+
+			if (p) {
+				insn->imm = BPF_CALL_IMM(p);
+				return 0;
+			}
+		}
+	}
+
 	/* insn->imm has the btf func_id. Replace it with
 	 * an address (relative to __bpf_base_call).
 	 */
@@ -15333,7 +15351,6 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return -EFAULT;
 	}
 
-	*cnt = 0;
 	insn->imm = desc->imm;
 	if (insn->off)
 		return 0;
@@ -16340,6 +16357,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 	if (tgt_prog) {
 		struct bpf_prog_aux *aux = tgt_prog->aux;
 
+		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
+			bpf_log(log, "Replacing device-bound programs not supported\n");
+			return -EINVAL;
+		}
+
 		for (i = 0; i < aux->func_info_cnt; i++)
 			if (aux->func_info[i].type_id == btf_id) {
 				subprog = i;
diff --git a/net/core/dev.c b/net/core/dev.c
index 5b221568dfd4..862e03fcffa6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9228,6 +9228,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 			NL_SET_ERR_MSG(extack, "Using device-bound program without HW_MODE flag is not supported");
 			return -EINVAL;
 		}
+		if (bpf_prog_is_dev_bound(new_prog->aux) && !bpf_offload_dev_match(new_prog, dev)) {
+			NL_SET_ERR_MSG(extack, "Cannot attach to a different target device");
+			return -EINVAL;
+		}
 		if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
 			NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
 			return -EINVAL;
@@ -10813,6 +10817,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
 		/* Shutdown queueing discipline. */
 		dev_shutdown(dev);
 
+		bpf_offload_bound_netdev_unregister(dev);
 		dev_xdp_uninstall(dev);
 
 		netdev_offload_xstats_disable_all(dev);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 844c9d99dc0e..8240805bfdb7 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
  */
 #include <linux/bpf.h>
+#include <linux/btf_ids.h>
 #include <linux/filter.h>
 #include <linux/types.h>
 #include <linux/mm.h>
@@ -709,3 +710,60 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
 
 	return nxdpf;
 }
+
+noinline bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx)
+{
+	return false;
+}
+
+noinline u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx)
+{
+	return 0;
+}
+
+noinline bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx)
+{
+	return false;
+}
+
+noinline u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx)
+{
+	return 0;
+}
+
+#ifdef CONFIG_DEBUG_INFO_BTF
+BTF_SET8_START(xdp_metadata_kfunc_ids)
+#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, 0)
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+BTF_SET8_END(xdp_metadata_kfunc_ids)
+
+static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &xdp_metadata_kfunc_ids,
+};
+
+BTF_ID_LIST(xdp_metadata_kfunc_ids_unsorted)
+#define XDP_METADATA_KFUNC(name, str) BTF_ID(func, str)
+XDP_METADATA_KFUNC_xxx
+#undef XDP_METADATA_KFUNC
+
+u32 xdp_metadata_kfunc_id(int id)
+{
+	/* xdp_metadata_kfunc_ids is sorted by BTF ID, so it can't be indexed by our enum */
+	return xdp_metadata_kfunc_ids_unsorted[id];
+}
+EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
+
+static int __init xdp_metadata_init(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set);
+}
+late_initcall(xdp_metadata_init);
+#else /* CONFIG_DEBUG_INFO_BTF */
+u32 xdp_metadata_kfunc_id(int id)
+{
+	return -1;
+}
+EXPORT_SYMBOL_GPL(xdp_metadata_kfunc_id);
+#endif /* CONFIG_DEBUG_INFO_BTF */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f89de51a45db..790650a81f2b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1156,6 +1156,11 @@ enum bpf_link_type {
  */
 #define BPF_F_XDP_HAS_FRAGS	(1U << 5)
 
+/* If BPF_F_XDP_HAS_METADATA is used in BPF_PROG_LOAD command, the loaded
+ * program becomes device-bound but can access its XDP metadata.
+ */
+#define BPF_F_XDP_HAS_METADATA	(1U << 6)
+
 /* link_create.kprobe_multi.flags used in LINK_CREATE command for
  * BPF_TRACE_KPROBE_MULTI attach type to create return probe.
  */
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH bpf-next v3 04/12] veth: Introduce veth_xdp_buff wrapper for xdp_buff
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 05/12] veth: Support RX XDP metadata Stanislav Fomichev
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

No functional changes. Boilerplate to allow stuffing more data after xdp_buff.

Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/veth.c | 56 +++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index ac7c0653695f..04ffd8cb2945 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -116,6 +116,10 @@ static struct {
 	{ "peer_ifindex" },
 };
 
+struct veth_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 static int veth_get_link_ksettings(struct net_device *dev,
 				   struct ethtool_link_ksettings *cmd)
 {
@@ -592,23 +596,24 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (likely(xdp_prog)) {
-		struct xdp_buff xdp;
+		struct veth_xdp_buff vxbuf;
+		struct xdp_buff *xdp = &vxbuf.xdp;
 		u32 act;
 
-		xdp_convert_frame_to_buff(frame, &xdp);
-		xdp.rxq = &rq->xdp_rxq;
+		xdp_convert_frame_to_buff(frame, xdp);
+		xdp->rxq = &rq->xdp_rxq;
 
-		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+		act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 		switch (act) {
 		case XDP_PASS:
-			if (xdp_update_frame_from_buff(&xdp, frame))
+			if (xdp_update_frame_from_buff(xdp, frame))
 				goto err_xdp;
 			break;
 		case XDP_TX:
 			orig_frame = *frame;
-			xdp.rxq->mem = frame->mem;
-			if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
+			xdp->rxq->mem = frame->mem;
+			if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 				trace_xdp_exception(rq->dev, xdp_prog, act);
 				frame = &orig_frame;
 				stats->rx_drops++;
@@ -619,8 +624,8 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			orig_frame = *frame;
-			xdp.rxq->mem = frame->mem;
-			if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
+			xdp->rxq->mem = frame->mem;
+			if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 				frame = &orig_frame;
 				stats->rx_drops++;
 				goto err_xdp;
@@ -801,7 +806,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 {
 	void *orig_data, *orig_data_end;
 	struct bpf_prog *xdp_prog;
-	struct xdp_buff xdp;
+	struct veth_xdp_buff vxbuf;
+	struct xdp_buff *xdp = &vxbuf.xdp;
 	u32 act, metalen;
 	int off;
 
@@ -815,22 +821,22 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	}
 
 	__skb_push(skb, skb->data - skb_mac_header(skb));
-	if (veth_convert_skb_to_xdp_buff(rq, &xdp, &skb))
+	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
 		goto drop;
 
-	orig_data = xdp.data;
-	orig_data_end = xdp.data_end;
+	orig_data = xdp->data;
+	orig_data_end = xdp->data_end;
 
-	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	switch (act) {
 	case XDP_PASS:
 		break;
 	case XDP_TX:
-		veth_xdp_get(&xdp);
+		veth_xdp_get(xdp);
 		consume_skb(skb);
-		xdp.rxq->mem = rq->xdp_mem;
-		if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
+		xdp->rxq->mem = rq->xdp_mem;
+		if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 			trace_xdp_exception(rq->dev, xdp_prog, act);
 			stats->rx_drops++;
 			goto err_xdp;
@@ -839,10 +845,10 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		rcu_read_unlock();
 		goto xdp_xmit;
 	case XDP_REDIRECT:
-		veth_xdp_get(&xdp);
+		veth_xdp_get(xdp);
 		consume_skb(skb);
-		xdp.rxq->mem = rq->xdp_mem;
-		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
+		xdp->rxq->mem = rq->xdp_mem;
+		if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 			stats->rx_drops++;
 			goto err_xdp;
 		}
@@ -862,7 +868,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	/* check if bpf_xdp_adjust_head was used */
-	off = orig_data - xdp.data;
+	off = orig_data - xdp->data;
 	if (off > 0)
 		__skb_push(skb, off);
 	else if (off < 0)
@@ -871,21 +877,21 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	skb_reset_mac_header(skb);
 
 	/* check if bpf_xdp_adjust_tail was used */
-	off = xdp.data_end - orig_data_end;
+	off = xdp->data_end - orig_data_end;
 	if (off != 0)
 		__skb_put(skb, off); /* positive on grow, negative on shrink */
 
 	/* XDP frag metadata (e.g. nr_frags) are updated in eBPF helpers
 	 * (e.g. bpf_xdp_adjust_tail), we need to update data_len here.
 	 */
-	if (xdp_buff_has_frags(&xdp))
+	if (xdp_buff_has_frags(xdp))
 		skb->data_len = skb_shinfo(skb)->xdp_frags_size;
 	else
 		skb->data_len = 0;
 
 	skb->protocol = eth_type_trans(skb, rq->dev);
 
-	metalen = xdp.data - xdp.data_meta;
+	metalen = xdp->data - xdp->data_meta;
 	if (metalen)
 		skb_metadata_set(skb, metalen);
 out:
@@ -898,7 +904,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	return NULL;
 err_xdp:
 	rcu_read_unlock();
-	xdp_return_buff(&xdp);
+	xdp_return_buff(xdp);
 xdp_xmit:
 	return NULL;
 }
-- 
2.39.0.rc0.267.gcb52ba06e7-goog




* [PATCH bpf-next v3 05/12] veth: Support RX XDP metadata
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (3 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 04/12] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 06/12] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

The goal is to enable end-to-end testing of the metadata for AF_XDP.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/veth.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 04ffd8cb2945..d15302672493 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -118,6 +118,7 @@ static struct {
 
 struct veth_xdp_buff {
 	struct xdp_buff xdp;
+	struct sk_buff *skb;
 };
 
 static int veth_get_link_ksettings(struct net_device *dev,
@@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 
 		xdp_convert_frame_to_buff(frame, xdp);
 		xdp->rxq = &rq->xdp_rxq;
+		vxbuf.skb = NULL;
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
 
@@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
 		goto drop;
+	vxbuf.skb = skb;
 
 	orig_data = xdp->data;
 	orig_data_end = xdp->data_end;
@@ -1601,6 +1604,30 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
+static bool veth_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
+{
+	return true;
+}
+
+static u64 veth_xdp_rx_timestamp(const struct xdp_md *ctx)
+{
+	return ktime_get_mono_fast_ns();
+}
+
+static bool veth_xdp_rx_hash_supported(const struct xdp_md *ctx)
+{
+	return true;
+}
+
+static u32 veth_xdp_rx_hash(const struct xdp_md *ctx)
+{
+	struct veth_xdp_buff *_ctx = (void *)ctx;
+
+	if (_ctx->skb)
+		return skb_get_hash(_ctx->skb);
+	return 0;
+}
+
 static const struct net_device_ops veth_netdev_ops = {
 	.ndo_init            = veth_dev_init,
 	.ndo_open            = veth_open,
@@ -1620,6 +1647,11 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_bpf		= veth_xdp,
 	.ndo_xdp_xmit		= veth_ndo_xdp_xmit,
 	.ndo_get_peer_dev	= veth_peer_dev,
+
+	.ndo_xdp_rx_timestamp_supported = veth_xdp_rx_timestamp_supported,
+	.ndo_xdp_rx_timestamp	= veth_xdp_rx_timestamp,
+	.ndo_xdp_rx_hash_supported = veth_xdp_rx_hash_supported,
+	.ndo_xdp_rx_hash	= veth_xdp_rx_hash,
 };
 
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH bpf-next v3 06/12] selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (4 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 05/12] veth: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 07/12] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

- create new netns
- create veth pair (veTX+veRX)
- setup AF_XDP socket for both interfaces
- attach bpf to veRX
- send packet via veTX
- verify the packet has expected metadata at veRX

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/Makefile          |   2 +-
 .../selftests/bpf/prog_tests/xdp_metadata.c   | 394 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_metadata.c        |  70 ++++
 .../selftests/bpf/progs/xdp_metadata2.c       |  15 +
 tools/testing/selftests/bpf/xdp_metadata.h    |   7 +
 5 files changed, 487 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata2.c
 create mode 100644 tools/testing/selftests/bpf/xdp_metadata.h

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 6a0f043dc410..4eed22fa3681 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -527,7 +527,7 @@ TRUNNER_BPF_PROGS_DIR := progs
 TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c	\
 			 network_helpers.c testing_helpers.c		\
 			 btf_helpers.c flow_dissector_load.h		\
-			 cap_helpers.c
+			 cap_helpers.c xsk.c
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
 		       $(OUTPUT)/liburandom_read.so			\
 		       $(OUTPUT)/xdp_synproxy				\
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
new file mode 100644
index 000000000000..0303dc2a43f0
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -0,0 +1,394 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "xdp_metadata.skel.h"
+#include "xdp_metadata2.skel.h"
+#include "xdp_metadata.h"
+#include "xsk.h"
+
+#include <bpf/btf.h>
+#include <linux/errqueue.h>
+#include <linux/if_link.h>
+#include <linux/net_tstamp.h>
+#include <linux/udp.h>
+#include <sys/mman.h>
+#include <net/if.h>
+#include <poll.h>
+
+#define TX_NAME "veTX"
+#define RX_NAME "veRX"
+
+#define UDP_PAYLOAD_BYTES 4
+
+#define AF_XDP_SOURCE_PORT 1234
+#define AF_XDP_CONSUMER_PORT 8080
+
+#define UMEM_NUM 16
+#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM)
+#define XDP_FLAGS XDP_FLAGS_DRV_MODE
+#define QUEUE_ID 0
+
+#define TX_ADDR "10.0.0.1"
+#define RX_ADDR "10.0.0.2"
+#define PREFIX_LEN "8"
+#define FAMILY AF_INET
+
+#define SYS(cmd) ({ \
+	if (!ASSERT_OK(system(cmd), (cmd))) \
+		goto out; \
+})
+
+struct xsk {
+	void *umem_area;
+	struct xsk_umem *umem;
+	struct xsk_ring_prod fill;
+	struct xsk_ring_cons comp;
+	struct xsk_ring_prod tx;
+	struct xsk_ring_cons rx;
+	struct xsk_socket *socket;
+};
+
+static int open_xsk(const char *ifname, struct xsk *xsk)
+{
+	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	const struct xsk_socket_config socket_config = {
+		.rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
+		.xdp_flags = XDP_FLAGS,
+		.bind_flags = XDP_COPY,
+	};
+	const struct xsk_umem_config umem_config = {
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
+	};
+	__u32 idx;
+	__u64 addr;
+	int ret;
+	int i;
+
+	xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (!ASSERT_NEQ(xsk->umem_area, MAP_FAILED, "mmap"))
+		return -1;
+
+	ret = xsk_umem__create(&xsk->umem,
+			       xsk->umem_area, UMEM_SIZE,
+			       &xsk->fill,
+			       &xsk->comp,
+			       &umem_config);
+	if (!ASSERT_OK(ret, "xsk_umem__create"))
+		return ret;
+
+	ret = xsk_socket__create(&xsk->socket, ifname, QUEUE_ID,
+				 xsk->umem,
+				 &xsk->rx,
+				 &xsk->tx,
+				 &socket_config);
+	if (!ASSERT_OK(ret, "xsk_socket__create"))
+		return ret;
+
+	/* First half of umem is for TX. This way address matches 1-to-1
+	 * to the completion queue index.
+	 */
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = i * UMEM_FRAME_SIZE;
+		printf("%p: tx_desc[%d] -> %llx\n", xsk, i, addr);
+	}
+
+	/* Second half of umem is for RX. */
+
+	ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx);
+	if (!ASSERT_EQ(UMEM_NUM / 2, ret, "xsk_ring_prod__reserve"))
+		return ret;
+	if (!ASSERT_EQ(idx, 0, "fill idx != 0"))
+		return -1;
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE;
+		printf("%p: rx_desc[%d] -> %llx\n", xsk, i, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, i) = addr;
+	}
+	xsk_ring_prod__submit(&xsk->fill, ret);
+
+	return 0;
+}
+
+static void close_xsk(struct xsk *xsk)
+{
+	if (xsk->umem)
+		xsk_umem__delete(xsk->umem);
+	if (xsk->socket)
+		xsk_socket__delete(xsk->socket);
+	munmap(xsk->umem_area, UMEM_SIZE);
+}
+
+static void ip_csum(struct iphdr *iph)
+{
+	__u32 sum = 0;
+	__u16 *p;
+	int i;
+
+	iph->check = 0;
+	p = (void *)iph;
+	for (i = 0; i < sizeof(*iph) / sizeof(*p); i++)
+		sum += p[i];
+
+	while (sum >> 16)
+		sum = (sum & 0xffff) + (sum >> 16);
+
+	iph->check = ~sum;
+}
+
+static int generate_packet(struct xsk *xsk, __u16 dst_port)
+{
+	struct xdp_desc *tx_desc;
+	struct udphdr *udph;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	void *data;
+	__u32 idx;
+	int ret;
+
+	ret = xsk_ring_prod__reserve(&xsk->tx, 1, &idx);
+	if (!ASSERT_EQ(ret, 1, "xsk_ring_prod__reserve"))
+		return -1;
+
+	tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx);
+	tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE;
+	printf("%p: tx_desc[%u]->addr=%llx\n", xsk, idx, tx_desc->addr);
+	data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr);
+
+	eth = data;
+	iph = (void *)(eth + 1);
+	udph = (void *)(iph + 1);
+
+	memcpy(eth->h_dest, "\x00\x00\x00\x00\x00\x02", ETH_ALEN);
+	memcpy(eth->h_source, "\x00\x00\x00\x00\x00\x01", ETH_ALEN);
+	eth->h_proto = htons(ETH_P_IP);
+
+	iph->version = 0x4;
+	iph->ihl = 0x5;
+	iph->tos = 0x9;
+	iph->tot_len = htons(sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES);
+	iph->id = 0;
+	iph->frag_off = 0;
+	iph->ttl = 0;
+	iph->protocol = IPPROTO_UDP;
+	ASSERT_EQ(inet_pton(FAMILY, TX_ADDR, &iph->saddr), 1, "inet_pton(TX_ADDR)");
+	ASSERT_EQ(inet_pton(FAMILY, RX_ADDR, &iph->daddr), 1, "inet_pton(RX_ADDR)");
+	ip_csum(iph);
+
+	udph->source = htons(AF_XDP_SOURCE_PORT);
+	udph->dest = htons(dst_port);
+	udph->len = htons(sizeof(*udph) + UDP_PAYLOAD_BYTES);
+	udph->check = 0;
+
+	memset(udph + 1, 0xAA, UDP_PAYLOAD_BYTES);
+
+	tx_desc->len = sizeof(*eth) + sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES;
+	xsk_ring_prod__submit(&xsk->tx, 1);
+
+	ret = sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0);
+	if (!ASSERT_GE(ret, 0, "sendto"))
+		return ret;
+
+	return 0;
+}
+
+static void complete_tx(struct xsk *xsk)
+{
+	__u32 idx;
+	__u64 addr;
+
+	if (ASSERT_EQ(xsk_ring_cons__peek(&xsk->comp, 1, &idx), 1, "xsk_ring_cons__peek")) {
+		addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx);
+
+		printf("%p: refill idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static void refill_rx(struct xsk *xsk, __u64 addr)
+{
+	__u32 idx;
+
+	if (ASSERT_EQ(xsk_ring_prod__reserve(&xsk->fill, 1, &idx), 1, "xsk_ring_prod__reserve")) {
+		printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static int verify_xsk_metadata(struct xsk *xsk)
+{
+	const struct xdp_desc *rx_desc;
+	struct pollfd fds = {};
+	struct xdp_meta *meta;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	__u64 comp_addr;
+	void *data;
+	__u64 addr;
+	__u32 idx;
+	int ret;
+
+	ret = recvfrom(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, NULL);
+	if (!ASSERT_EQ(ret, 0, "recvfrom"))
+		return -1;
+
+	fds.fd = xsk_socket__fd(xsk->socket);
+	fds.events = POLLIN;
+
+	ret = poll(&fds, 1, 1000);
+	if (!ASSERT_GT(ret, 0, "poll"))
+		return -1;
+
+	ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx);
+	if (!ASSERT_EQ(ret, 1, "xsk_ring_cons__peek"))
+		return -2;
+
+	rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx);
+	comp_addr = xsk_umem__extract_addr(rx_desc->addr);
+	addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
+	printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
+	       xsk, idx, rx_desc->addr, addr, comp_addr);
+	data = xsk_umem__get_data(xsk->umem_area, addr);
+
+	/* Make sure we got the packet offset correctly. */
+
+	eth = data;
+	ASSERT_EQ(eth->h_proto, htons(ETH_P_IP), "eth->h_proto");
+	iph = (void *)(eth + 1);
+	ASSERT_EQ((int)iph->version, 4, "iph->version");
+
+	/* custom metadata */
+
+	meta = data - sizeof(struct xdp_meta);
+
+	if (!ASSERT_NEQ(meta->rx_timestamp, 0, "rx_timestamp"))
+		return -1;
+
+	if (!ASSERT_NEQ(meta->rx_hash, 0, "rx_hash"))
+		return -1;
+
+	xsk_ring_cons__release(&xsk->rx, 1);
+	refill_rx(xsk, comp_addr);
+
+	return 0;
+}
+
+void test_xdp_metadata(void)
+{
+	struct xdp_metadata2 *bpf_obj2 = NULL;
+	struct xdp_metadata *bpf_obj = NULL;
+	struct bpf_program *new_prog, *prog;
+	struct nstoken *tok = NULL;
+	__u32 queue_id = QUEUE_ID;
+	struct bpf_map *prog_arr;
+	struct xsk tx_xsk = {};
+	struct xsk rx_xsk = {};
+	__u32 val, key = 0;
+	int rx_ifindex;
+	int sock_fd;
+	int ret;
+
+	/* Setup new networking namespace, with a veth pair. */
+
+	SYS("ip netns add xdp_metadata");
+	tok = open_netns("xdp_metadata");
+	SYS("ip link add numtxqueues 1 numrxqueues 1 " TX_NAME
+	    " type veth peer " RX_NAME " numtxqueues 1 numrxqueues 1");
+	SYS("ip link set dev " TX_NAME " address 00:00:00:00:00:01");
+	SYS("ip link set dev " RX_NAME " address 00:00:00:00:00:02");
+	SYS("ip link set dev " TX_NAME " up");
+	SYS("ip link set dev " RX_NAME " up");
+	SYS("ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
+	SYS("ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+
+	rx_ifindex = if_nametoindex(RX_NAME);
+
+	/* Setup separate AF_XDP for TX and RX interfaces. */
+
+	ret = open_xsk(TX_NAME, &tx_xsk);
+	if (!ASSERT_OK(ret, "open_xsk(TX_NAME)"))
+		goto out;
+
+	ret = open_xsk(RX_NAME, &rx_xsk);
+	if (!ASSERT_OK(ret, "open_xsk(RX_NAME)"))
+		goto out;
+
+	bpf_obj = xdp_metadata__open();
+	if (!ASSERT_OK_PTR(bpf_obj, "open skeleton"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx");
+	bpf_program__set_ifindex(prog, rx_ifindex);
+	bpf_program__set_flags(prog, BPF_F_XDP_HAS_METADATA);
+
+	if (!ASSERT_OK(xdp_metadata__load(bpf_obj), "load skeleton"))
+		goto out;
+
+	/* Make sure we can't add dev-bound programs to prog maps. */
+	prog_arr = bpf_object__find_map_by_name(bpf_obj->obj, "prog_arr");
+	if (!ASSERT_OK_PTR(prog_arr, "no prog_arr map"))
+		goto out;
+
+	val = bpf_program__fd(prog);
+	if (!ASSERT_ERR(bpf_map__update_elem(prog_arr, &key, sizeof(key),
+					     &val, sizeof(val), BPF_ANY),
+			"update prog_arr"))
+		goto out;
+
+	/* Make sure we can't replace dev-bound program with a non-dev-bound one. */
+
+	bpf_obj2 = xdp_metadata2__open();
+	if (!ASSERT_OK_PTR(bpf_obj2, "open skeleton"))
+		goto out;
+
+	new_prog = bpf_object__find_program_by_name(bpf_obj2->obj, "freplace_rx");
+	bpf_program__set_attach_target(new_prog, bpf_program__fd(prog), "rx");
+
+	if (!ASSERT_ERR(xdp_metadata2__load(bpf_obj2), "load freplace skeleton"))
+		goto out;
+
+	/* Attach BPF program to RX interface. */
+
+	ret = bpf_xdp_attach(rx_ifindex,
+			     bpf_program__fd(bpf_obj->progs.rx),
+			     XDP_FLAGS, NULL);
+	if (!ASSERT_GE(ret, 0, "bpf_xdp_attach"))
+		goto out;
+
+	sock_fd = xsk_socket__fd(rx_xsk.socket);
+	ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0);
+	if (!ASSERT_GE(ret, 0, "bpf_map_update_elem"))
+		goto out;
+
+	/* Send packet destined to RX AF_XDP socket. */
+	if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0,
+		       "generate AF_XDP_CONSUMER_PORT"))
+		goto out;
+
+	/* Verify AF_XDP RX packet has proper metadata. */
+	if (!ASSERT_GE(verify_xsk_metadata(&rx_xsk), 0,
+		       "verify_xsk_metadata"))
+		goto out;
+
+	complete_tx(&tx_xsk);
+
+out:
+	close_xsk(&rx_xsk);
+	close_xsk(&tx_xsk);
+	if (bpf_obj2)
+		xdp_metadata2__destroy(bpf_obj2);
+	if (bpf_obj)
+		xdp_metadata__destroy(bpf_obj);
+	system("ip netns del xdp_metadata");
+	if (tok)
+		close_netns(tok);
+}
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
new file mode 100644
index 000000000000..7a3a72d2ba66
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+
+#ifndef ETH_P_IP
+#define ETH_P_IP 0x0800
+#endif
+
+#include "xdp_metadata.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 4);
+	__type(key, __u32);
+	__type(value, __u32);
+} xsk SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u32);
+} prog_arr SEC(".maps");
+
+extern bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) __ksym;
+extern __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) __ksym;
+extern bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx) __ksym;
+extern __u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx) __ksym;
+
+SEC("xdp")
+int rx(struct xdp_md *ctx)
+{
+	void *data, *data_meta;
+	struct xdp_meta *meta;
+	int ret;
+
+	/* Reserve enough for all custom metadata. */
+
+	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	if (ret != 0)
+		return XDP_DROP;
+
+	data = (void *)(long)ctx->data;
+	data_meta = (void *)(long)ctx->data_meta;
+
+	if (data_meta + sizeof(struct xdp_meta) > data)
+		return XDP_DROP;
+
+	meta = data_meta;
+
+	/* Export metadata. */
+
+	if (bpf_xdp_metadata_rx_timestamp_supported(ctx))
+		meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
+
+	if (bpf_xdp_metadata_rx_hash_supported(ctx))
+		meta->rx_hash = bpf_xdp_metadata_rx_hash(ctx);
+
+	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
+}
+
+SEC("?freplace/rx")
+int freplace_rx(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata2.c b/tools/testing/selftests/bpf/progs/xdp_metadata2.c
new file mode 100644
index 000000000000..bec1732ade62
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata2.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+
+#include "xdp_metadata.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+SEC("freplace/rx")
+int freplace_rx(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
new file mode 100644
index 000000000000..c4892d122b7f
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#pragma once
+
+struct xdp_meta {
+	__u64 rx_timestamp;
+	__u32 rx_hash;
+};
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH bpf-next v3 07/12] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (5 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 06/12] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-08  6:11   ` Tariq Toukan
  2022-12-06  2:45 ` [PATCH bpf-next v3 08/12] mlx4: Support RX XDP metadata Stanislav Fomichev
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

No functional changes. This is boilerplate that allows stuffing more
data after the xdp_buff.

Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 8f762fc170b3..9c114fc723e3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -661,9 +661,14 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
 #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
 #endif
 
+struct mlx4_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_xdp_buff mxbuf = {};
 	int factor = priv->cqe_factor;
 	struct mlx4_en_rx_ring *ring;
 	struct bpf_prog *xdp_prog;
@@ -671,7 +676,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	bool doorbell_pending;
 	bool xdp_redir_flush;
 	struct mlx4_cqe *cqe;
-	struct xdp_buff xdp;
 	int polled = 0;
 	int index;
 
@@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	ring = priv->rx_ring[cq_ring];
 
 	xdp_prog = rcu_dereference_bh(ring->xdp_prog);
-	xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
+	xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
 	doorbell_pending = false;
 	xdp_redir_flush = false;
 
@@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						priv->frag_info[0].frag_size,
 						DMA_FROM_DEVICE);
 
-			xdp_prepare_buff(&xdp, va - frags[0].page_offset,
+			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
 					 frags[0].page_offset, length, false);
-			orig_data = xdp.data;
+			orig_data = mxbuf.xdp.data;
 
-			act = bpf_prog_run_xdp(xdp_prog, &xdp);
+			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
 
-			length = xdp.data_end - xdp.data;
-			if (xdp.data != orig_data) {
-				frags[0].page_offset = xdp.data -
-					xdp.data_hard_start;
-				va = xdp.data;
+			length = mxbuf.xdp.data_end - mxbuf.xdp.data;
+			if (mxbuf.xdp.data != orig_data) {
+				frags[0].page_offset = mxbuf.xdp.data -
+					mxbuf.xdp.data_hard_start;
+				va = mxbuf.xdp.data;
 			}
 
 			switch (act) {
 			case XDP_PASS:
 				break;
 			case XDP_REDIRECT:
-				if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) {
+				if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
 					ring->xdp_redirect++;
 					xdp_redir_flush = true;
 					frags[0].page = NULL;
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH bpf-next v3 08/12] mlx4: Support RX XDP metadata
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (6 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 07/12] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-08  6:09   ` Tariq Toukan
  2022-12-06  2:45 ` [PATCH bpf-next v3 09/12] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Support RX timestamp and hash for now. Tested using the program from
the next patch.

Also enable XDP metadata support; it's not clear why it was disabled,
as there is enough headroom.

Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_clock.c | 13 +++++--
 .../net/ethernet/mellanox/mlx4/en_netdev.c    | 10 +++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 38 ++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |  1 +
 include/linux/mlx4/device.h                   |  7 ++++
 5 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 98b5ffb4d729..9e3b76182088 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -58,9 +58,7 @@ u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
 	return hi | lo;
 }
 
-void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
-			    struct skb_shared_hwtstamps *hwts,
-			    u64 timestamp)
+u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp)
 {
 	unsigned int seq;
 	u64 nsec;
@@ -70,8 +68,15 @@ void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
 		nsec = timecounter_cyc2time(&mdev->clock, timestamp);
 	} while (read_seqretry(&mdev->clock_lock, seq));
 
+	return ns_to_ktime(nsec);
+}
+
+void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
+			    struct skb_shared_hwtstamps *hwts,
+			    u64 timestamp)
+{
 	memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
-	hwts->hwtstamp = ns_to_ktime(nsec);
+	hwts->hwtstamp = mlx4_en_get_hwtstamp(mdev, timestamp);
 }
 
 /**
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 8800d3f1f55c..1cb63746a851 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2855,6 +2855,11 @@ static const struct net_device_ops mlx4_netdev_ops = {
 	.ndo_features_check	= mlx4_en_features_check,
 	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
 	.ndo_bpf		= mlx4_xdp,
+
+	.ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
+	.ndo_xdp_rx_timestamp	= mlx4_xdp_rx_timestamp,
+	.ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
+	.ndo_xdp_rx_hash	= mlx4_xdp_rx_hash,
 };
 
 static const struct net_device_ops mlx4_netdev_ops_master = {
@@ -2887,6 +2892,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
 	.ndo_features_check	= mlx4_en_features_check,
 	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
 	.ndo_bpf		= mlx4_xdp,
+
+	.ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
+	.ndo_xdp_rx_timestamp	= mlx4_xdp_rx_timestamp,
+	.ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
+	.ndo_xdp_rx_hash	= mlx4_xdp_rx_hash,
 };
 
 struct mlx4_en_bond {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9c114fc723e3..1b8e1b2d8729 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -663,8 +663,40 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
 
 struct mlx4_xdp_buff {
 	struct xdp_buff xdp;
+	struct mlx4_cqe *cqe;
+	struct mlx4_en_dev *mdev;
+	struct mlx4_en_rx_ring *ring;
+	struct net_device *dev;
 };
 
+bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
+{
+	struct mlx4_xdp_buff *_ctx = (void *)ctx;
+
+	return _ctx->ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL;
+}
+
+u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx)
+{
+	struct mlx4_xdp_buff *_ctx = (void *)ctx;
+
+	return mlx4_en_get_hwtstamp(_ctx->mdev, mlx4_en_get_cqe_ts(_ctx->cqe));
+}
+
+bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx)
+{
+	struct mlx4_xdp_buff *_ctx = (void *)ctx;
+
+	return _ctx->dev->features & NETIF_F_RXHASH;
+}
+
+u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx)
+{
+	struct mlx4_xdp_buff *_ctx = (void *)ctx;
+
+	return be32_to_cpu(_ctx->cqe->immed_rss_invalid);
+}
+
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -781,8 +813,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						DMA_FROM_DEVICE);
 
 			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
-					 frags[0].page_offset, length, false);
+					 frags[0].page_offset, length, true);
 			orig_data = mxbuf.xdp.data;
+			mxbuf.cqe = cqe;
+			mxbuf.mdev = priv->mdev;
+			mxbuf.ring = ring;
+			mxbuf.dev = dev;
 
 			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index e132ff4c82f2..b7c0d4899ad7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -792,6 +792,7 @@ int mlx4_en_netdev_event(struct notifier_block *this,
  * Functions for time stamping
  */
 u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe);
+u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp);
 void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
 			    struct skb_shared_hwtstamps *hwts,
 			    u64 timestamp);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 6646634a0b9d..d5904da1d490 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1585,4 +1585,11 @@ static inline int mlx4_get_num_reserved_uar(struct mlx4_dev *dev)
 	/* The first 128 UARs are used for EQ doorbells */
 	return (128 >> (PAGE_SHIFT - dev->uar_page_shift));
 }
+
+struct xdp_md;
+bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx);
+u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx);
+bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx);
+u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx);
+
 #endif /* MLX4_DEVICE_H */
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH bpf-next v3 09/12] xsk: Add cb area to struct xdp_buff_xsk
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (7 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 08/12] mlx4: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 10/12] mlx5: Introduce mlx5_xdp_buff wrapper for xdp_buff Stanislav Fomichev
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Toke Høiland-Jørgensen,
	David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

From: Toke Høiland-Jørgensen <toke@redhat.com>

Add an area after the xdp_buff in struct xdp_buff_xsk that drivers can use
to stash extra information to use in metadata kfuncs. The maximum size of
24 bytes means the full xdp_buff_xsk structure will take up exactly two
cache lines (with the cb field spanning both). Also add a macro drivers can
use to check their own wrapping structs against the available size.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/net/xsk_buff_pool.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index f787c3f524b0..3e952e569418 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -19,8 +19,11 @@ struct xdp_sock;
 struct device;
 struct page;
 
+#define XSK_PRIV_MAX 24
+
 struct xdp_buff_xsk {
 	struct xdp_buff xdp;
+	u8 cb[XSK_PRIV_MAX];
 	dma_addr_t dma;
 	dma_addr_t frame_dma;
 	struct xsk_buff_pool *pool;
@@ -28,6 +31,8 @@ struct xdp_buff_xsk {
 	struct list_head free_list_node;
 };
 
+#define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb))
+
 struct xsk_dma_map {
 	dma_addr_t *dma_pages;
 	struct device *dev;
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH bpf-next v3 10/12] mlx5: Introduce mlx5_xdp_buff wrapper for xdp_buff
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (8 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 09/12] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-06  2:45 ` [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata Stanislav Fomichev
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Toke Høiland-Jørgensen,
	Saeed Mahameed, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

From: Toke Høiland-Jørgensen <toke@redhat.com>

Preparation for implementing HW metadata kfuncs. No functional change.

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |  6 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 25 +++++----
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 56 +++++++++----------
 5 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ff5b302531d5..cdbaac5f6d25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -469,6 +469,7 @@ struct mlx5e_txqsq {
 union mlx5e_alloc_unit {
 	struct page *page;
 	struct xdp_buff *xsk;
+	struct mlx5_xdp_buff *mxbuf;
 };
 
 /* XDP packets can be transmitted in different ways. On completion, we need to
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 20507ef2f956..db49b813bcb5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -158,8 +158,9 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
 
 /* returns true if packet was consumed by xdp */
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
-		      struct bpf_prog *prog, struct xdp_buff *xdp)
+		      struct bpf_prog *prog, struct mlx5_xdp_buff *mxbuf)
 {
+	struct xdp_buff *xdp = &mxbuf->xdp;
 	u32 act;
 	int err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index bc2d9034af5b..a33b448d542d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -44,10 +44,14 @@
 	(MLX5E_XDP_INLINE_WQE_MAX_DS_CNT * MLX5_SEND_WQE_DS - \
 	 sizeof(struct mlx5_wqe_inline_seg))
 
+struct mlx5_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 struct mlx5e_xsk_param;
 int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk);
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
-		      struct bpf_prog *prog, struct xdp_buff *xdp);
+		      struct bpf_prog *prog, struct mlx5_xdp_buff *mlctx);
 void mlx5e_xdp_mpwqe_complete(struct mlx5e_xdpsq *sq);
 bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq);
 void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index c91b54d9ff27..5e88dc61824e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -22,6 +22,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		goto err;
 
 	BUILD_BUG_ON(sizeof(wi->alloc_units[0]) != sizeof(wi->alloc_units[0].xsk));
+	XSK_CHECK_PRIV_TYPE(struct mlx5_xdp_buff);
 	batch = xsk_buff_alloc_batch(rq->xsk_pool, (struct xdp_buff **)wi->alloc_units,
 				     rq->mpwqe.pages_per_wqe);
 
@@ -233,7 +234,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    u32 head_offset,
 						    u32 page_idx)
 {
-	struct xdp_buff *xdp = wi->alloc_units[page_idx].xsk;
+	struct mlx5_xdp_buff *mxbuf = wi->alloc_units[page_idx].mxbuf;
 	struct bpf_prog *prog;
 
 	/* Check packet size. Note LRO doesn't use linear SKB */
@@ -249,9 +250,9 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(head_offset);
 
-	xsk_buff_set_size(xdp, cqe_bcnt);
-	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
-	net_prefetch(xdp->data);
+	xsk_buff_set_size(&mxbuf->xdp, cqe_bcnt);
+	xsk_buff_dma_sync_for_cpu(&mxbuf->xdp, rq->xsk_pool);
+	net_prefetch(mxbuf->xdp.data);
 
 	/* Possible flows:
 	 * - XDP_REDIRECT to XSKMAP:
@@ -269,7 +270,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	 */
 
 	prog = rcu_dereference(rq->xdp_prog);
-	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, xdp))) {
+	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, mxbuf))) {
 		if (likely(__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)))
 			__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
 		return NULL; /* page/packet was consumed by XDP */
@@ -278,14 +279,14 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	/* XDP_PASS: copy the data from the UMEM to a new SKB and reuse the
 	 * frame. On SKB allocation failure, NULL is returned.
 	 */
-	return mlx5e_xsk_construct_skb(rq, xdp);
+	return mlx5e_xsk_construct_skb(rq, &mxbuf->xdp);
 }
 
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
 					      u32 cqe_bcnt)
 {
-	struct xdp_buff *xdp = wi->au->xsk;
+	struct mlx5_xdp_buff *mxbuf = wi->au->mxbuf;
 	struct bpf_prog *prog;
 
 	/* wi->offset is not used in this function, because xdp->data and the
@@ -295,17 +296,17 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(wi->offset);
 
-	xsk_buff_set_size(xdp, cqe_bcnt);
-	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
-	net_prefetch(xdp->data);
+	xsk_buff_set_size(&mxbuf->xdp, cqe_bcnt);
+	xsk_buff_dma_sync_for_cpu(&mxbuf->xdp, rq->xsk_pool);
+	net_prefetch(mxbuf->xdp.data);
 
 	prog = rcu_dereference(rq->xdp_prog);
-	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, xdp)))
+	if (likely(prog && mlx5e_xdp_handle(rq, NULL, prog, mxbuf)))
 		return NULL; /* page/packet was consumed by XDP */
 
 	/* XDP_PASS: copy the data from the UMEM to a new SKB. The frame reuse
 	 * will be handled by mlx5e_free_rx_wqe.
 	 * On SKB allocation failure, NULL is returned.
 	 */
-	return mlx5e_xsk_construct_skb(rq, xdp);
+	return mlx5e_xsk_construct_skb(rq, &mxbuf->xdp);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b1ea0b995d9c..434025703e50 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1565,10 +1565,10 @@ struct sk_buff *mlx5e_build_linear_skb(struct mlx5e_rq *rq, void *va,
 }
 
 static void mlx5e_fill_xdp_buff(struct mlx5e_rq *rq, void *va, u16 headroom,
-				u32 len, struct xdp_buff *xdp)
+				u32 len, struct mlx5_xdp_buff *mxbuf)
 {
-	xdp_init_buff(xdp, rq->buff.frame0_sz, &rq->xdp_rxq);
-	xdp_prepare_buff(xdp, va, headroom, len, true);
+	xdp_init_buff(&mxbuf->xdp, rq->buff.frame0_sz, &rq->xdp_rxq);
+	xdp_prepare_buff(&mxbuf->xdp, va, headroom, len, true);
 }
 
 static struct sk_buff *
@@ -1595,16 +1595,16 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 
 	prog = rcu_dereference(rq->xdp_prog);
 	if (prog) {
-		struct xdp_buff xdp;
+		struct mlx5_xdp_buff mxbuf;
 
 		net_prefetchw(va); /* xdp_frame data area */
-		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &xdp);
-		if (mlx5e_xdp_handle(rq, au->page, prog, &xdp))
+		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &mxbuf);
+		if (mlx5e_xdp_handle(rq, au->page, prog, &mxbuf))
 			return NULL; /* page/packet was consumed by XDP */
 
-		rx_headroom = xdp.data - xdp.data_hard_start;
-		metasize = xdp.data - xdp.data_meta;
-		cqe_bcnt = xdp.data_end - xdp.data;
+		rx_headroom = mxbuf.xdp.data - mxbuf.xdp.data_hard_start;
+		metasize = mxbuf.xdp.data - mxbuf.xdp.data_meta;
+		cqe_bcnt = mxbuf.xdp.data_end - mxbuf.xdp.data;
 	}
 	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt, metasize);
@@ -1626,9 +1626,9 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
 	struct skb_shared_info *sinfo;
+	struct mlx5_xdp_buff mxbuf;
 	u32 frag_consumed_bytes;
 	struct bpf_prog *prog;
-	struct xdp_buff xdp;
 	struct sk_buff *skb;
 	dma_addr_t addr;
 	u32 truesize;
@@ -1643,8 +1643,8 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	net_prefetchw(va); /* xdp_frame data area */
 	net_prefetch(va + rx_headroom);
 
-	mlx5e_fill_xdp_buff(rq, va, rx_headroom, frag_consumed_bytes, &xdp);
-	sinfo = xdp_get_shared_info_from_buff(&xdp);
+	mlx5e_fill_xdp_buff(rq, va, rx_headroom, frag_consumed_bytes, &mxbuf);
+	sinfo = xdp_get_shared_info_from_buff(&mxbuf.xdp);
 	truesize = 0;
 
 	cqe_bcnt -= frag_consumed_bytes;
@@ -1662,13 +1662,13 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		dma_sync_single_for_cpu(rq->pdev, addr + wi->offset,
 					frag_consumed_bytes, rq->buff.map_dir);
 
-		if (!xdp_buff_has_frags(&xdp)) {
+		if (!xdp_buff_has_frags(&mxbuf.xdp)) {
 			/* Init on the first fragment to avoid cold cache access
 			 * when possible.
 			 */
 			sinfo->nr_frags = 0;
 			sinfo->xdp_frags_size = 0;
-			xdp_buff_set_frags_flag(&xdp);
+			xdp_buff_set_frags_flag(&mxbuf.xdp);
 		}
 
 		frag = &sinfo->frags[sinfo->nr_frags++];
@@ -1677,7 +1677,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		skb_frag_size_set(frag, frag_consumed_bytes);
 
 		if (page_is_pfmemalloc(au->page))
-			xdp_buff_set_frag_pfmemalloc(&xdp);
+			xdp_buff_set_frag_pfmemalloc(&mxbuf.xdp);
 
 		sinfo->xdp_frags_size += frag_consumed_bytes;
 		truesize += frag_info->frag_stride;
@@ -1690,7 +1690,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	au = head_wi->au;
 
 	prog = rcu_dereference(rq->xdp_prog);
-	if (prog && mlx5e_xdp_handle(rq, au->page, prog, &xdp)) {
+	if (prog && mlx5e_xdp_handle(rq, au->page, prog, &mxbuf)) {
 		if (test_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
 			int i;
 
@@ -1700,22 +1700,22 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		return NULL; /* page/packet was consumed by XDP */
 	}
 
-	skb = mlx5e_build_linear_skb(rq, xdp.data_hard_start, rq->buff.frame0_sz,
-				     xdp.data - xdp.data_hard_start,
-				     xdp.data_end - xdp.data,
-				     xdp.data - xdp.data_meta);
+	skb = mlx5e_build_linear_skb(rq, mxbuf.xdp.data_hard_start, rq->buff.frame0_sz,
+				     mxbuf.xdp.data - mxbuf.xdp.data_hard_start,
+				     mxbuf.xdp.data_end - mxbuf.xdp.data,
+				     mxbuf.xdp.data - mxbuf.xdp.data_meta);
 	if (unlikely(!skb))
 		return NULL;
 
 	page_ref_inc(au->page);
 
-	if (unlikely(xdp_buff_has_frags(&xdp))) {
+	if (unlikely(xdp_buff_has_frags(&mxbuf.xdp))) {
 		int i;
 
 		/* sinfo->nr_frags is reset by build_skb, calculate again. */
 		xdp_update_skb_shared_info(skb, wi - head_wi - 1,
 					   sinfo->xdp_frags_size, truesize,
-					   xdp_buff_is_frag_pfmemalloc(&xdp));
+					   xdp_buff_is_frag_pfmemalloc(&mxbuf.xdp));
 
 		for (i = 0; i < sinfo->nr_frags; i++) {
 			skb_frag_t *frag = &sinfo->frags[i];
@@ -1996,19 +1996,19 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 
 	prog = rcu_dereference(rq->xdp_prog);
 	if (prog) {
-		struct xdp_buff xdp;
+		struct mlx5_xdp_buff mxbuf;
 
 		net_prefetchw(va); /* xdp_frame data area */
-		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &xdp);
-		if (mlx5e_xdp_handle(rq, au->page, prog, &xdp)) {
+		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &mxbuf);
+		if (mlx5e_xdp_handle(rq, au->page, prog, &mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
 				__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
 			return NULL; /* page/packet was consumed by XDP */
 		}
 
-		rx_headroom = xdp.data - xdp.data_hard_start;
-		metasize = xdp.data - xdp.data_meta;
-		cqe_bcnt = xdp.data_end - xdp.data;
+		rx_headroom = mxbuf.xdp.data - mxbuf.xdp.data_hard_start;
+		metasize = mxbuf.xdp.data - mxbuf.xdp.data_meta;
+		cqe_bcnt = mxbuf.xdp.data_end - mxbuf.xdp.data;
 	}
 	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt, metasize);
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (9 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 10/12] mlx5: Introduce mlx5_xdp_buff wrapper for xdp_buff Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-08 22:59   ` Toke Høiland-Jørgensen
  2022-12-06  2:45 ` [PATCH bpf-next v3 12/12] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
  2022-12-08 22:28 ` [xdp-hints] [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Toke Høiland-Jørgensen
  12 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Toke Høiland-Jørgensen,
	Saeed Mahameed, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

From: Toke Høiland-Jørgensen <toke@redhat.com>

Support RX hash and timestamp metadata kfuncs. To implement them, pass
the cqe pointer into the mlx5e_skb_from_* functions so it can be
retrieved from the XDP ctx.

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 10 +++-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  | 29 +++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |  7 +++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 10 ++++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |  2 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  4 ++
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 48 ++++++++++---------
 7 files changed, 85 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index cdbaac5f6d25..8337ff0cacd5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -627,10 +627,11 @@ struct mlx5e_rq;
 typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq*, struct mlx5_cqe64*);
 typedef struct sk_buff *
 (*mlx5e_fp_skb_from_cqe_mpwrq)(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-			       u16 cqe_bcnt, u32 head_offset, u32 page_idx);
+			       struct mlx5_cqe64 *cqe, u16 cqe_bcnt,
+			       u32 head_offset, u32 page_idx);
 typedef struct sk_buff *
 (*mlx5e_fp_skb_from_cqe)(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			 u32 cqe_bcnt);
+			 struct mlx5_cqe64 *cqe, u32 cqe_bcnt);
 typedef bool (*mlx5e_fp_post_rx_wqes)(struct mlx5e_rq *rq);
 typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq*, u16);
 typedef void (*mlx5e_fp_shampo_dealloc_hd)(struct mlx5e_rq*, u16, u16, bool);
@@ -1036,6 +1037,11 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
 			   u16 vid);
 void mlx5e_timestamp_init(struct mlx5e_priv *priv);
 
+static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
+{
+	return config->rx_filter == HWTSTAMP_FILTER_ALL;
+}
+
 struct mlx5e_xsk_param;
 
 struct mlx5e_rq_param;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index db49b813bcb5..2a4700b3695a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -156,6 +156,35 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
 	return true;
 }
 
+bool mlx5e_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
+{
+	const struct mlx5_xdp_buff *_ctx = (void *)ctx;
+
+	return mlx5e_rx_hw_stamp(_ctx->rq->tstamp);
+}
+
+u64 mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx)
+{
+	const struct mlx5_xdp_buff *_ctx = (void *)ctx;
+
+	return mlx5e_cqe_ts_to_ns(_ctx->rq->ptp_cyc2time,
+				  _ctx->rq->clock, get_cqe_ts(_ctx->cqe));
+}
+
+bool mlx5e_xdp_rx_hash_supported(const struct xdp_md *ctx)
+{
+	const struct mlx5_xdp_buff *_ctx = (void *)ctx;
+
+	return _ctx->xdp.rxq->dev->features & NETIF_F_RXHASH;
+}
+
+u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx)
+{
+	const struct mlx5_xdp_buff *_ctx = (void *)ctx;
+
+	return be32_to_cpu(_ctx->cqe->rss_hash_result);
+}
+
 /* returns true if packet was consumed by xdp */
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
 		      struct bpf_prog *prog, struct mlx5_xdp_buff *mxbuf)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index a33b448d542d..a5fc30b07617 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -46,6 +46,8 @@
 
 struct mlx5_xdp_buff {
 	struct xdp_buff xdp;
+	struct mlx5_cqe64 *cqe;
+	struct mlx5e_rq *rq;
 };
 
 struct mlx5e_xsk_param;
@@ -60,6 +62,11 @@ void mlx5e_xdp_rx_poll_complete(struct mlx5e_rq *rq);
 int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 		   u32 flags);
 
+bool mlx5e_xdp_rx_hash_supported(const struct xdp_md *ctx);
+u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx);
+bool mlx5e_xdp_rx_timestamp_supported(const struct xdp_md *ctx);
+u64 mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx);
+
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq,
 							  struct mlx5e_xmit_data *xdptxd,
 							  struct skb_shared_info *sinfo,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 5e88dc61824e..05cf7987585a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -49,6 +49,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 			umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
 				.ptag = cpu_to_be64(addr | MLX5_EN_WR),
 			};
+			wi->alloc_units[i].mxbuf->rq = rq;
 		}
 	} else if (unlikely(rq->mpwqe.umr_mode == MLX5E_MPWRQ_UMR_MODE_UNALIGNED)) {
 		for (i = 0; i < batch; i++) {
@@ -58,6 +59,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 				.key = rq->mkey_be,
 				.va = cpu_to_be64(addr),
 			};
+			wi->alloc_units[i].mxbuf->rq = rq;
 		}
 	} else if (likely(rq->mpwqe.umr_mode == MLX5E_MPWRQ_UMR_MODE_TRIPLE)) {
 		u32 mapping_size = 1 << (rq->mpwqe.page_shift - 2);
@@ -81,6 +83,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 				.key = rq->mkey_be,
 				.va = cpu_to_be64(rq->wqe_overflow.addr),
 			};
+			wi->alloc_units[i].mxbuf->rq = rq;
 		}
 	} else {
 		__be32 pad_size = cpu_to_be32((1 << rq->mpwqe.page_shift) -
@@ -100,6 +103,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 				.va = cpu_to_be64(rq->wqe_overflow.addr),
 				.bcount = pad_size,
 			};
+			wi->alloc_units[i].mxbuf->rq = rq;
 		}
 	}
 
@@ -230,6 +234,7 @@ static struct sk_buff *mlx5e_xsk_construct_skb(struct mlx5e_rq *rq, struct xdp_b
 
 struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    struct mlx5e_mpw_info *wi,
+						    struct mlx5_cqe64 *cqe,
 						    u16 cqe_bcnt,
 						    u32 head_offset,
 						    u32 page_idx)
@@ -250,6 +255,8 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(head_offset);
 
+	/* mxbuf->rq is set on allocation, but cqe is per-packet so set it here */
+	mxbuf->cqe = cqe;
 	xsk_buff_set_size(&mxbuf->xdp, cqe_bcnt);
 	xsk_buff_dma_sync_for_cpu(&mxbuf->xdp, rq->xsk_pool);
 	net_prefetch(mxbuf->xdp.data);
@@ -284,6 +291,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
+					      struct mlx5_cqe64 *cqe,
 					      u32 cqe_bcnt)
 {
 	struct mlx5_xdp_buff *mxbuf = wi->au->mxbuf;
@@ -296,6 +304,8 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(wi->offset);
 
+	/* mxbuf->rq is set on allocation, but cqe is per-packet so set it here */
+	mxbuf->cqe = cqe;
 	xsk_buff_set_size(&mxbuf->xdp, cqe_bcnt);
 	xsk_buff_dma_sync_for_cpu(&mxbuf->xdp, rq->xsk_pool);
 	net_prefetch(mxbuf->xdp.data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index 087c943bd8e9..cefc0ef6105d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -13,11 +13,13 @@ int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    struct mlx5e_mpw_info *wi,
+						    struct mlx5_cqe64 *cqe,
 						    u16 cqe_bcnt,
 						    u32 head_offset,
 						    u32 page_idx);
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
+					      struct mlx5_cqe64 *cqe,
 					      u32 cqe_bcnt);
 
 #endif /* __MLX5_EN_XSK_RX_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 217c8a478977..967a82bf34b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4891,6 +4891,10 @@ const struct net_device_ops mlx5e_netdev_ops = {
 	.ndo_tx_timeout          = mlx5e_tx_timeout,
 	.ndo_bpf		 = mlx5e_xdp,
 	.ndo_xdp_xmit            = mlx5e_xdp_xmit,
+	.ndo_xdp_rx_timestamp_supported = mlx5e_xdp_rx_timestamp_supported,
+	.ndo_xdp_rx_timestamp    = mlx5e_xdp_rx_timestamp,
+	.ndo_xdp_rx_hash_supported = mlx5e_xdp_rx_hash_supported,
+	.ndo_xdp_rx_hash         = mlx5e_xdp_rx_hash,
 	.ndo_xsk_wakeup          = mlx5e_xsk_wakeup,
 #ifdef CONFIG_MLX5_EN_ARFS
 	.ndo_rx_flow_steer	 = mlx5e_rx_flow_steer,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 434025703e50..1bc631abe24c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -62,10 +62,12 @@
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-				u16 cqe_bcnt, u32 head_offset, u32 page_idx);
+				struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
+				u32 page_idx);
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-				   u16 cqe_bcnt, u32 head_offset, u32 page_idx);
+				   struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
+				   u32 page_idx);
 static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
@@ -76,11 +78,6 @@ const struct mlx5e_rx_handlers mlx5e_rx_handlers_nic = {
 	.handle_rx_cqe_mpwqe_shampo = mlx5e_handle_rx_cqe_mpwrq_shampo,
 };
 
-static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
-{
-	return config->rx_filter == HWTSTAMP_FILTER_ALL;
-}
-
 static inline void mlx5e_read_cqe_slot(struct mlx5_cqwq *wq,
 				       u32 cqcc, void *data)
 {
@@ -1564,16 +1561,18 @@ struct sk_buff *mlx5e_build_linear_skb(struct mlx5e_rq *rq, void *va,
 	return skb;
 }
 
-static void mlx5e_fill_xdp_buff(struct mlx5e_rq *rq, void *va, u16 headroom,
+static void mlx5e_fill_xdp_buff(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe, void *va, u16 headroom,
 				u32 len, struct mlx5_xdp_buff *mxbuf)
 {
 	xdp_init_buff(&mxbuf->xdp, rq->buff.frame0_sz, &rq->xdp_rxq);
 	xdp_prepare_buff(&mxbuf->xdp, va, headroom, len, true);
+	mxbuf->cqe = cqe;
+	mxbuf->rq = rq;
 }
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			  u32 cqe_bcnt)
+			  struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
 	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
@@ -1598,7 +1597,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 		struct mlx5_xdp_buff mxbuf;
 
 		net_prefetchw(va); /* xdp_frame data area */
-		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &mxbuf);
+		mlx5e_fill_xdp_buff(rq, cqe, va, rx_headroom, cqe_bcnt, &mxbuf);
 		if (mlx5e_xdp_handle(rq, au->page, prog, &mxbuf))
 			return NULL; /* page/packet was consumed by XDP */
 
@@ -1619,7 +1618,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
-			     u32 cqe_bcnt)
+			     struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
 	struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
 	struct mlx5e_wqe_frag_info *head_wi = wi;
@@ -1643,7 +1642,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	net_prefetchw(va); /* xdp_frame data area */
 	net_prefetch(va + rx_headroom);
 
-	mlx5e_fill_xdp_buff(rq, va, rx_headroom, frag_consumed_bytes, &mxbuf);
+	mlx5e_fill_xdp_buff(rq, cqe, va, rx_headroom, frag_consumed_bytes, &mxbuf);
 	sinfo = xdp_get_shared_info_from_buff(&mxbuf.xdp);
 	truesize = 0;
 
@@ -1766,7 +1765,7 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
 			      mlx5e_xsk_skb_from_cqe_linear,
-			      rq, wi, cqe_bcnt);
+			      rq, wi, cqe, cqe_bcnt);
 	if (!skb) {
 		/* probably for XDP */
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
@@ -1819,7 +1818,7 @@ static void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
-			      rq, wi, cqe_bcnt);
+			      rq, wi, cqe, cqe_bcnt);
 	if (!skb) {
 		/* probably for XDP */
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
@@ -1878,7 +1877,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq, struct mlx5_cqe64
 	skb = INDIRECT_CALL_2(rq->mpwqe.skb_from_cqe_mpwrq,
 			      mlx5e_skb_from_cqe_mpwrq_linear,
 			      mlx5e_skb_from_cqe_mpwrq_nonlinear,
-			      rq, wi, cqe_bcnt, head_offset, page_idx);
+			      rq, wi, cqe, cqe_bcnt, head_offset, page_idx);
 	if (!skb)
 		goto mpwrq_cqe_out;
 
@@ -1929,7 +1928,8 @@ mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-				   u16 cqe_bcnt, u32 head_offset, u32 page_idx)
+				   struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
+				   u32 page_idx)
 {
 	union mlx5e_alloc_unit *au = &wi->alloc_units[page_idx];
 	u16 headlen = min_t(u16, MLX5E_RX_MAX_HEAD, cqe_bcnt);
@@ -1968,7 +1968,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-				u16 cqe_bcnt, u32 head_offset, u32 page_idx)
+				struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
+				u32 page_idx)
 {
 	union mlx5e_alloc_unit *au = &wi->alloc_units[page_idx];
 	u16 rx_headroom = rq->buff.headroom;
@@ -1999,7 +2000,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		struct mlx5_xdp_buff mxbuf;
 
 		net_prefetchw(va); /* xdp_frame data area */
-		mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &mxbuf);
+		mlx5e_fill_xdp_buff(rq, cqe, va, rx_headroom, cqe_bcnt, &mxbuf);
 		if (mlx5e_xdp_handle(rq, au->page, prog, &mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
 				__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
@@ -2163,8 +2164,8 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 		if (likely(head_size))
 			*skb = mlx5e_skb_from_cqe_shampo(rq, wi, cqe, header_index);
 		else
-			*skb = mlx5e_skb_from_cqe_mpwrq_nonlinear(rq, wi, cqe_bcnt, data_offset,
-								  page_idx);
+			*skb = mlx5e_skb_from_cqe_mpwrq_nonlinear(rq, wi, cqe, cqe_bcnt,
+								  data_offset, page_idx);
 		if (unlikely(!*skb))
 			goto free_hd_entry;
 
@@ -2238,7 +2239,8 @@ static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cq
 			      mlx5e_skb_from_cqe_mpwrq_linear,
 			      mlx5e_skb_from_cqe_mpwrq_nonlinear,
 			      mlx5e_xsk_skb_from_cqe_mpwrq_linear,
-			      rq, wi, cqe_bcnt, head_offset, page_idx);
+			      rq, wi, cqe, cqe_bcnt, head_offset,
+			      page_idx);
 	if (!skb)
 		goto mpwrq_cqe_out;
 
@@ -2483,7 +2485,7 @@ static void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
-			      rq, wi, cqe_bcnt);
+			      rq, wi, cqe, cqe_bcnt);
 	if (!skb)
 		goto wq_free_wqe;
 
@@ -2575,7 +2577,7 @@ static void mlx5e_trap_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe
 		goto free_wqe;
 	}
 
-	skb = mlx5e_skb_from_cqe_nonlinear(rq, wi, cqe_bcnt);
+	skb = mlx5e_skb_from_cqe_nonlinear(rq, wi, cqe, cqe_bcnt);
 	if (!skb)
 		goto free_wqe;
 
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* [PATCH bpf-next v3 12/12] selftests/bpf: Simple program to dump XDP RX metadata
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (10 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-06  2:45 ` Stanislav Fomichev
  2022-12-08 22:28 ` [xdp-hints] [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Toke Høiland-Jørgensen
  12 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-06  2:45 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

To be used for verification of driver implementations. Note that
the skb path is gone from the series, but I'm keeping the
implementation around for possible future work.

$ xdp_hw_metadata <ifname>

On the other machine:

$ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP
$ echo -n skb | nc -u -q1 <target> 9092 # for skb

Sample output:

  # xdp
  xsk_ring_cons__peek: 1
  0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
  rx_timestamp_supported: 1
  rx_timestamp: 1667850075063948829
  0x19f9090: complete idx=8 addr=8000

  # skb
  found skb hwtstamp = 1668314052.854274681

Decoding:
  # xdp
  rx_timestamp=1667850075.063948829

  $ date -d @1667850075
  Mon Nov  7 11:41:15 AM PST 2022
  $ date
  Mon Nov  7 11:42:05 AM PST 2022

  # skb
  $ date -d @1668314052
  Sat Nov 12 08:34:12 PM PST 2022
  $ date
  Sat Nov 12 08:37:06 PM PST 2022

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   6 +-
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  93 ++++
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 405 ++++++++++++++++++
 4 files changed, 504 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
 create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 07d2d0a8c5cb..01e3baeefd4f 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -46,3 +46,4 @@ test_cpp
 xskxceiver
 xdp_redirect_multi
 xdp_synproxy
+xdp_hw_metadata
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 4eed22fa3681..189b39b0e5d0 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -83,7 +83,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \
 TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
 	test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
-	xskxceiver xdp_redirect_multi xdp_synproxy veristat
+	xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata
 
 TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file
 TEST_GEN_FILES += liburandom_read.so
@@ -241,6 +241,9 @@ $(OUTPUT)/test_maps: $(TESTING_HELPERS)
 $(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS)
 $(OUTPUT)/xsk.o: $(BPFOBJ)
 $(OUTPUT)/xskxceiver: $(OUTPUT)/xsk.o
+$(OUTPUT)/xdp_hw_metadata: $(OUTPUT)/xsk.o $(OUTPUT)/xdp_hw_metadata.skel.h
+$(OUTPUT)/xdp_hw_metadata: $(OUTPUT)/network_helpers.o
+$(OUTPUT)/xdp_hw_metadata: LDFLAGS += -static
 
 BPFTOOL ?= $(DEFAULT_BPFTOOL)
 $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile)    \
@@ -383,6 +386,7 @@ linked_maps.skel.h-deps := linked_maps1.bpf.o linked_maps2.bpf.o
 test_subskeleton.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o test_subskeleton.bpf.o
 test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o
 test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o
+xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o
 
 LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
 
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
new file mode 100644
index 000000000000..0ae409094883
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/in.h>
+#include <linux/udp.h>
+#include <stdbool.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "xdp_metadata.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 256);
+	__type(key, __u32);
+	__type(value, __u32);
+} xsk SEC(".maps");
+
+extern bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) __ksym;
+extern __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) __ksym;
+extern bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx) __ksym;
+extern __u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx) __ksym;
+
+SEC("xdp")
+int rx(struct xdp_md *ctx)
+{
+	void *data, *data_meta, *data_end;
+	struct ipv6hdr *ip6h = NULL;
+	struct ethhdr *eth = NULL;
+	struct udphdr *udp = NULL;
+	struct iphdr *iph = NULL;
+	struct xdp_meta *meta;
+	int ret;
+
+	data = (void *)(long)ctx->data;
+	data_end = (void *)(long)ctx->data_end;
+	eth = data;
+	if (eth + 1 < data_end) {
+		if (eth->h_proto == bpf_htons(ETH_P_IP)) {
+			iph = (void *)(eth + 1);
+			if (iph + 1 < data_end && iph->protocol == IPPROTO_UDP)
+				udp = (void *)(iph + 1);
+		}
+		if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
+			ip6h = (void *)(eth + 1);
+			if (ip6h + 1 < data_end && ip6h->nexthdr == IPPROTO_UDP)
+				udp = (void *)(ip6h + 1);
+		}
+		if (udp && udp + 1 > data_end)
+			udp = NULL;
+	}
+
+	if (!udp)
+		return XDP_PASS;
+
+	if (udp->dest != bpf_htons(9091))
+		return XDP_PASS;
+
+	bpf_printk("forwarding UDP:9091 to AF_XDP");
+
+	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	if (ret != 0) {
+		bpf_printk("bpf_xdp_adjust_meta returned %d", ret);
+		return XDP_PASS;
+	}
+
+	data = (void *)(long)ctx->data;
+	data_meta = (void *)(long)ctx->data_meta;
+	meta = data_meta;
+
+	if (meta + 1 > data) {
+		bpf_printk("bpf_xdp_adjust_meta doesn't appear to work");
+		return XDP_PASS;
+	}
+
+	if (bpf_xdp_metadata_rx_timestamp_supported(ctx)) {
+		meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
+		bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
+	}
+
+	if (bpf_xdp_metadata_rx_hash_supported(ctx)) {
+		meta->rx_hash = bpf_xdp_metadata_rx_hash(ctx);
+		bpf_printk("populated rx_hash with %u", meta->rx_hash);
+	}
+
+	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
new file mode 100644
index 000000000000..29f9d01c1da1
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -0,0 +1,405 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Reference program for verifying XDP metadata on real HW. Functional test
+ * only, doesn't test performance.
+ *
+ * RX:
+ * - UDP 9091 packets are diverted into AF_XDP
+ * - Metadata verified:
+ *   - rx_timestamp
+ *   - rx_hash
+ *
+ * TX:
+ * - TBD
+ */
+
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "xdp_hw_metadata.skel.h"
+#include "xsk.h"
+
+#include <error.h>
+#include <linux/errqueue.h>
+#include <linux/if_link.h>
+#include <linux/net_tstamp.h>
+#include <linux/udp.h>
+#include <linux/sockios.h>
+#include <sys/mman.h>
+#include <net/if.h>
+#include <poll.h>
+
+#include "xdp_metadata.h"
+
+#define UMEM_NUM 16
+#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM)
+#define XDP_FLAGS (XDP_FLAGS_DRV_MODE | XDP_FLAGS_REPLACE)
+
+struct xsk {
+	void *umem_area;
+	struct xsk_umem *umem;
+	struct xsk_ring_prod fill;
+	struct xsk_ring_cons comp;
+	struct xsk_ring_prod tx;
+	struct xsk_ring_cons rx;
+	struct xsk_socket *socket;
+};
+
+struct xdp_hw_metadata *bpf_obj;
+struct xsk *rx_xsk;
+const char *ifname;
+int ifindex;
+int rxq;
+
+void test__fail(void) { /* for network_helpers.c */ }
+
+static int open_xsk(const char *ifname, struct xsk *xsk, __u32 queue_id)
+{
+	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	const struct xsk_socket_config socket_config = {
+		.rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
+		.xdp_flags = XDP_FLAGS,
+		.bind_flags = XDP_COPY,
+	};
+	const struct xsk_umem_config umem_config = {
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
+	};
+	__u32 idx;
+	u64 addr;
+	int ret;
+	int i;
+
+	xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (xsk->umem_area == MAP_FAILED)
+		return -ENOMEM;
+
+	ret = xsk_umem__create(&xsk->umem,
+			       xsk->umem_area, UMEM_SIZE,
+			       &xsk->fill,
+			       &xsk->comp,
+			       &umem_config);
+	if (ret)
+		return ret;
+
+	ret = xsk_socket__create(&xsk->socket, ifname, queue_id,
+				 xsk->umem,
+				 &xsk->rx,
+				 &xsk->tx,
+				 &socket_config);
+	if (ret)
+		return ret;
+
+	/* First half of umem is for TX. This way address matches 1-to-1
+	 * to the completion queue index.
+	 */
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = i * UMEM_FRAME_SIZE;
+		printf("%p: tx_desc[%d] -> %lx\n", xsk, i, addr);
+	}
+
+	/* Second half of umem is for RX. */
+
+	ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx);
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE;
+		printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx + i) = addr;
+	}
+	xsk_ring_prod__submit(&xsk->fill, ret);
+
+	return 0;
+}
+
+static void close_xsk(struct xsk *xsk)
+{
+	if (xsk->socket)
+		xsk_socket__delete(xsk->socket);
+	if (xsk->umem)
+		xsk_umem__delete(xsk->umem);
+	munmap(xsk->umem_area, UMEM_SIZE);
+}
+
+static void refill_rx(struct xsk *xsk, __u64 addr)
+{
+	__u32 idx;
+
+	if (xsk_ring_prod__reserve(&xsk->fill, 1, &idx) == 1) {
+		printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static void verify_xdp_metadata(void *data)
+{
+	struct xdp_meta *meta;
+
+	meta = data - sizeof(*meta);
+
+	printf("rx_timestamp: %llu\n", meta->rx_timestamp);
+	printf("rx_hash: %u\n", meta->rx_hash);
+}
+
+static void verify_skb_metadata(int fd)
+{
+	char cmsg_buf[1024];
+	char packet_buf[128];
+
+	struct scm_timestamping *ts;
+	struct iovec packet_iov;
+	struct cmsghdr *cmsg;
+	struct msghdr hdr;
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_iov = &packet_iov;
+	hdr.msg_iovlen = 1;
+	packet_iov.iov_base = packet_buf;
+	packet_iov.iov_len = sizeof(packet_buf);
+
+	hdr.msg_control = cmsg_buf;
+	hdr.msg_controllen = sizeof(cmsg_buf);
+
+	if (recvmsg(fd, &hdr, 0) < 0)
+		error(-1, errno, "recvmsg");
+
+	for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != NULL;
+	     cmsg = CMSG_NXTHDR(&hdr, cmsg)) {
+
+		if (cmsg->cmsg_level != SOL_SOCKET)
+			continue;
+
+		switch (cmsg->cmsg_type) {
+		case SCM_TIMESTAMPING:
+			ts = (struct scm_timestamping *)CMSG_DATA(cmsg);
+			if (ts->ts[2].tv_sec || ts->ts[2].tv_nsec) {
+				printf("found skb hwtstamp = %lu.%lu\n",
+				       ts->ts[2].tv_sec, ts->ts[2].tv_nsec);
+				return;
+			}
+			break;
+		default:
+			break;
+		}
+	}
+
+	printf("skb hwtstamp is not found!\n");
+}
+
+static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
+{
+	const struct xdp_desc *rx_desc;
+	struct pollfd fds[rxq + 1];
+	__u64 comp_addr;
+	__u64 addr;
+	__u32 idx;
+	int ret;
+	int i;
+
+	for (i = 0; i < rxq; i++) {
+		fds[i].fd = xsk_socket__fd(rx_xsk[i].socket);
+		fds[i].events = POLLIN;
+		fds[i].revents = 0;
+	}
+
+	fds[rxq].fd = server_fd;
+	fds[rxq].events = POLLIN;
+	fds[rxq].revents = 0;
+
+	while (true) {
+		errno = 0;
+		ret = poll(fds, rxq + 1, 1000);
+		printf("poll: %d (%d)\n", ret, errno);
+		if (ret < 0)
+			break;
+		if (ret == 0)
+			continue;
+
+		if (fds[rxq].revents)
+			verify_skb_metadata(server_fd);
+
+		for (i = 0; i < rxq; i++) {
+			if (fds[i].revents == 0)
+				continue;
+
+			struct xsk *xsk = &rx_xsk[i];
+
+			ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx);
+			printf("xsk_ring_cons__peek: %d\n", ret);
+			if (ret != 1)
+				continue;
+
+			rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx);
+			comp_addr = xsk_umem__extract_addr(rx_desc->addr);
+			addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
+			printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
+			       xsk, idx, rx_desc->addr, addr, comp_addr);
+			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
+			xsk_ring_cons__release(&xsk->rx, 1);
+			refill_rx(xsk, comp_addr);
+		}
+	}
+
+	return 0;
+}
+
+struct ethtool_channels {
+	__u32	cmd;
+	__u32	max_rx;
+	__u32	max_tx;
+	__u32	max_other;
+	__u32	max_combined;
+	__u32	rx_count;
+	__u32	tx_count;
+	__u32	other_count;
+	__u32	combined_count;
+};
+
+#define ETHTOOL_GCHANNELS	0x0000003c /* Get no of channels */
+
+static int rxq_num(const char *ifname)
+{
+	struct ethtool_channels ch = {
+		.cmd = ETHTOOL_GCHANNELS,
+	};
+
+	struct ifreq ifr = {
+		.ifr_data = (void *)&ch,
+	};
+	int fd, ret;
+	strcpy(ifr.ifr_name, ifname);
+
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (fd < 0)
+		error(-1, errno, "socket");
+
+	ret = ioctl(fd, SIOCETHTOOL, &ifr);
+	if (ret < 0)
+		error(-1, errno, "ioctl(SIOCETHTOOL)");
+
+	close(fd);
+
+	return ch.rx_count + ch.combined_count;
+}
+
+static void cleanup(void)
+{
+	LIBBPF_OPTS(bpf_xdp_attach_opts, opts);
+	int ret;
+	int i;
+
+	if (bpf_obj) {
+		opts.old_prog_fd = bpf_program__fd(bpf_obj->progs.rx);
+		if (opts.old_prog_fd >= 0) {
+			printf("detaching bpf program....\n");
+			ret = bpf_xdp_detach(ifindex, XDP_FLAGS, &opts);
+			if (ret)
+				printf("failed to detach XDP program: %d\n", ret);
+		}
+	}
+
+	for (i = 0; i < rxq; i++)
+		close_xsk(&rx_xsk[i]);
+
+	if (bpf_obj)
+		xdp_hw_metadata__destroy(bpf_obj);
+}
+
+static void handle_signal(int sig)
+{
+	/* interrupting poll() is all we need */
+}
+
+static void timestamping_enable(int fd, int val)
+{
+	int ret;
+
+	ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
+	if (ret < 0)
+		error(-1, errno, "setsockopt(SO_TIMESTAMPING)");
+}
+
+int main(int argc, char *argv[])
+{
+	int server_fd = -1;
+	int ret;
+	int i;
+
+	struct bpf_program *prog;
+
+	if (argc != 2) {
+		fprintf(stderr, "pass device name\n");
+		return -1;
+	}
+
+	ifname = argv[1];
+	ifindex = if_nametoindex(ifname);
+	rxq = rxq_num(ifname);
+
+	printf("rxq: %d\n", rxq);
+
+	rx_xsk = malloc(sizeof(struct xsk) * rxq);
+	if (!rx_xsk)
+		error(-1, ENOMEM, "malloc");
+
+	for (i = 0; i < rxq; i++) {
+		printf("open_xsk(%s, %p, %d)\n", ifname, &rx_xsk[i], i);
+		ret = open_xsk(ifname, &rx_xsk[i], i);
+		if (ret)
+			error(-1, -ret, "open_xsk");
+
+		printf("xsk_socket__fd() -> %d\n", xsk_socket__fd(rx_xsk[i].socket));
+	}
+
+	printf("open bpf program...\n");
+	bpf_obj = xdp_hw_metadata__open();
+	if (libbpf_get_error(bpf_obj))
+		error(-1, libbpf_get_error(bpf_obj), "xdp_hw_metadata__open");
+
+	prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx");
+	bpf_program__set_ifindex(prog, ifindex);
+	bpf_program__set_flags(prog, BPF_F_XDP_HAS_METADATA);
+
+	printf("load bpf program...\n");
+	ret = xdp_hw_metadata__load(bpf_obj);
+	if (ret)
+		error(-1, -ret, "xdp_hw_metadata__load");
+
+	printf("prepare skb endpoint...\n");
+	server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, 1000);
+	if (server_fd < 0)
+		error(-1, errno, "start_server");
+	timestamping_enable(server_fd,
+			    SOF_TIMESTAMPING_SOFTWARE |
+			    SOF_TIMESTAMPING_RAW_HARDWARE);
+
+	printf("prepare xsk map...\n");
+	for (i = 0; i < rxq; i++) {
+		int sock_fd = xsk_socket__fd(rx_xsk[i].socket);
+		__u32 queue_id = i;
+
+		printf("map[%d] = %d\n", queue_id, sock_fd);
+		ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0);
+		if (ret)
+			error(-1, -ret, "bpf_map_update_elem");
+	}
+
+	printf("attach bpf program...\n");
+	ret = bpf_xdp_attach(ifindex,
+			     bpf_program__fd(bpf_obj->progs.rx),
+			     XDP_FLAGS, NULL);
+	if (ret)
+		error(-1, -ret, "bpf_xdp_attach");
+
+	signal(SIGINT, handle_signal);
+	ret = verify_metadata(rx_xsk, rxq, server_fd);
+	close(server_fd);
+	cleanup();
+	if (ret)
+		error(-1, -ret, "verify_metadata");
+}
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-06  2:45 ` [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs Stanislav Fomichev
@ 2022-12-07  4:29   ` Alexei Starovoitov
  2022-12-07  4:52     ` Stanislav Fomichev
  2022-12-08  2:47   ` Martin KaFai Lau
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Alexei Starovoitov @ 2022-12-07  4:29 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Mon, Dec 05, 2022 at 06:45:45PM -0800, Stanislav Fomichev wrote:
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index fc4e313a4d2e..00951a59ee26 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15323,6 +15323,24 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  		return -EINVAL;
>  	}
>  
> +	*cnt = 0;
> +
> +	if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
> +		if (bpf_prog_is_offloaded(env->prog->aux)) {
> +			verbose(env, "no metadata kfuncs offload\n");
> +			return -EINVAL;
> +		}

If I'm reading this correctly then this error will trigger
for any XDP prog trying to use a kfunc?

I was hoping that BPF CI can prove my point, but it failed to
build your newly added xdp_hw_metadata.c test.


* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-07  4:29   ` Alexei Starovoitov
@ 2022-12-07  4:52     ` Stanislav Fomichev
  2022-12-07  7:23       ` Martin KaFai Lau
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-07  4:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Tue, Dec 6, 2022 at 8:29 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Dec 05, 2022 at 06:45:45PM -0800, Stanislav Fomichev wrote:
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index fc4e313a4d2e..00951a59ee26 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -15323,6 +15323,24 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >               return -EINVAL;
> >       }
> >
> > +     *cnt = 0;
> > +
> > +     if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
> > +             if (bpf_prog_is_offloaded(env->prog->aux)) {
> > +                     verbose(env, "no metadata kfuncs offload\n");
> > +                     return -EINVAL;
> > +             }
>
> If I'm reading this correctly then this error will trigger
> for any XDP prog trying to use a kfunc?

bpf_prog_is_offloaded() should return true only when the program is
fully offloaded to the device (like nfp). So here the intent is to
reject kfunc programs because nfp should somehow implement them first.
Unless I'm not setting offload_requested somewhere, not sure I see the
problem. LMK if I missed something.

> I was hoping that BPF CI can prove my point, but it failed to
> build your newly added xdp_hw_metadata.c test.

Ugh, will take a look, thank you!


* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-07  4:52     ` Stanislav Fomichev
@ 2022-12-07  7:23       ` Martin KaFai Lau
  2022-12-07 18:05         ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2022-12-07  7:23 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, song, yhs, john.fastabend, kpsingh,
	haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	Alexei Starovoitov

On 12/6/22 8:52 PM, Stanislav Fomichev wrote:
> On Tue, Dec 6, 2022 at 8:29 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Mon, Dec 05, 2022 at 06:45:45PM -0800, Stanislav Fomichev wrote:
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index fc4e313a4d2e..00951a59ee26 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -15323,6 +15323,24 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>                return -EINVAL;
>>>        }
>>>
>>> +     *cnt = 0;
>>> +
>>> +     if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
>>> +             if (bpf_prog_is_offloaded(env->prog->aux)) {
>>> +                     verbose(env, "no metadata kfuncs offload\n");
>>> +                     return -EINVAL;
>>> +             }
>>
>> If I'm reading this correctly then this error will trigger
>> for any XDP prog trying to use a kfunc?
> 
> bpf_prog_is_offloaded() should return true only when the program is
> fully offloaded to the device (like nfp). So here the intent is to
> reject kfunc programs because nfp should somehow implement them first.
> Unless I'm not setting offload_requested somewhere, not sure I see the
> problem. LMK if I missed something.

It errors out for all kfuncs here though. Or is it meant to error out for the
XDP_METADATA_KFUNC_* only?



* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-07  7:23       ` Martin KaFai Lau
@ 2022-12-07 18:05         ` Stanislav Fomichev
  0 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-07 18:05 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, ast, daniel, andrii, song, yhs, john.fastabend, kpsingh,
	haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	Alexei Starovoitov

On Tue, Dec 6, 2022 at 11:24 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/6/22 8:52 PM, Stanislav Fomichev wrote:
> > On Tue, Dec 6, 2022 at 8:29 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
> >> On Mon, Dec 05, 2022 at 06:45:45PM -0800, Stanislav Fomichev wrote:
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index fc4e313a4d2e..00951a59ee26 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -15323,6 +15323,24 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >>>                return -EINVAL;
> >>>        }
> >>>
> >>> +     *cnt = 0;
> >>> +
> >>> +     if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
> >>> +             if (bpf_prog_is_offloaded(env->prog->aux)) {
> >>> +                     verbose(env, "no metadata kfuncs offload\n");
> >>> +                     return -EINVAL;
> >>> +             }
> >>
> >> If I'm reading this correctly then this error will trigger
> >> for any XDP prog trying to use a kfunc?
> >
> > bpf_prog_is_offloaded() should return true only when the program is
> > fully offloaded to the device (like nfp). So here the intent is to
> > reject kfunc programs because nfp should somehow implement them first.
> > Unless I'm not setting offload_requested somewhere, not sure I see the
> > problem. LMK if I missed something.
>
> It errors out for all kfuncs here though. Or is it meant to error out for the
> XDP_METADATA_KFUNC_* only?

Ah, good point, I was somewhat assuming that xdp doesn't use kfuncs
right now and I can just assume that kfunc == metadata_kfunc.
Will make this more selective, thanks!


* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-06  2:45 ` [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs Stanislav Fomichev
  2022-12-07  4:29   ` Alexei Starovoitov
@ 2022-12-08  2:47   ` Martin KaFai Lau
  2022-12-08 19:07     ` Stanislav Fomichev
  2022-12-08  5:00   ` Jakub Kicinski
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2022-12-08  2:47 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On 12/5/22 6:45 PM, Stanislav Fomichev wrote:
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 55dbc68bfffc..c24aba5c363b 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -409,4 +409,33 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
>   
>   #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
>   
> +#define XDP_METADATA_KFUNC_xxx	\
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \
> +			   bpf_xdp_metadata_rx_timestamp_supported) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
> +			   bpf_xdp_metadata_rx_timestamp) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED, \
> +			   bpf_xdp_metadata_rx_hash_supported) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
> +			   bpf_xdp_metadata_rx_hash) \
> +
> +enum {
> +#define XDP_METADATA_KFUNC(name, str) name,
> +XDP_METADATA_KFUNC_xxx
> +#undef XDP_METADATA_KFUNC
> +MAX_XDP_METADATA_KFUNC,
> +};
> +
> +#ifdef CONFIG_NET

I think this is no longer needed because xdp_metadata_kfunc_id() is only used in 
offload.c which should be CONFIG_NET only.

> +u32 xdp_metadata_kfunc_id(int id);
> +#else
> +static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
> +#endif
> +
> +struct xdp_md;
> +bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx);
> +u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx);
> +bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx);
> +u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx);
> +
>   #endif /* __LINUX_NET_XDP_H__ */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f89de51a45db..790650a81f2b 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1156,6 +1156,11 @@ enum bpf_link_type {
>    */
>   #define BPF_F_XDP_HAS_FRAGS	(1U << 5)
>   
> +/* If BPF_F_XDP_HAS_METADATA is used in BPF_PROG_LOAD command, the loaded
> + * program becomes device-bound but can access it's XDP metadata.
> + */
> +#define BPF_F_XDP_HAS_METADATA	(1U << 6)
> +

[ ... ]

> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index f5769a8ecbee..bad8bab916eb 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -41,7 +41,7 @@ struct bpf_offload_dev {
>   struct bpf_offload_netdev {
>   	struct rhash_head l;
>   	struct net_device *netdev;
> -	struct bpf_offload_dev *offdev;
> +	struct bpf_offload_dev *offdev; /* NULL when bound-only */
>   	struct list_head progs;
>   	struct list_head maps;
>   	struct list_head offdev_netdevs;
> @@ -58,6 +58,12 @@ static const struct rhashtable_params offdevs_params = {
>   static struct rhashtable offdevs;
>   static bool offdevs_inited;
>   
> +static int __bpf_offload_init(void);
> +static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
> +					     struct net_device *netdev);
> +static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
> +						struct net_device *netdev);
> +
>   static int bpf_dev_offload_check(struct net_device *netdev)
>   {
>   	if (!netdev)
> @@ -87,13 +93,17 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>   	    attr->prog_type != BPF_PROG_TYPE_XDP)
>   		return -EINVAL;
>   
> -	if (attr->prog_flags)
> +	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
>   		return -EINVAL;
>   
>   	offload = kzalloc(sizeof(*offload), GFP_USER);
>   	if (!offload)
>   		return -ENOMEM;
>   
> +	err = __bpf_offload_init();
> +	if (err)
> +		return err;
> +
>   	offload->prog = prog;
>   
>   	offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
> @@ -102,11 +112,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>   	if (err)
>   		goto err_maybe_put;
>   
> +	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
> +

If I read the set correctly, bpf prog can either use metadata kfunc or offload 
but not both. It is fine to start with only supporting metadata kfunc when there 
is no offload, but it would be useful to understand the reason. I assume an offloaded 
bpf prog should still be able to call the bpf helpers like adjust_head/tail and 
the same should go for any kfunc?

Also, the BPF_F_XDP_HAS_METADATA feels like it is acting more like 
BPF_F_XDP_DEV_BOUND_ONLY.

>   	down_write(&bpf_devs_lock);
>   	ondev = bpf_offload_find_netdev(offload->netdev);
>   	if (!ondev) {
> -		err = -EINVAL;
> -		goto err_unlock;
> +		if (!prog->aux->offload_requested) {

nit. bpf_prog_is_offloaded(prog->aux)

> +			/* When only binding to the device, explicitly
> +			 * create an entry in the hashtable. See related
> +			 * maybe_remove_bound_netdev.
> +			 */
> +			err = __bpf_offload_dev_netdev_register(NULL, offload->netdev);
> +			if (err)
> +				goto err_unlock;
> +			ondev = bpf_offload_find_netdev(offload->netdev);
> +		}
> +		if (!ondev) {
> +			err = -EINVAL;
> +			goto err_unlock;
> +		}
>   	}
>   	offload->offdev = ondev->offdev;
>   	prog->aux->offload = offload;
> @@ -209,6 +233,19 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
>   	up_read(&bpf_devs_lock);
>   }
>   
> +static void maybe_remove_bound_netdev(struct net_device *dev)
> +{
> +	struct bpf_offload_netdev *ondev;
> +
> +	rtnl_lock();
> +	down_write(&bpf_devs_lock);
> +	ondev = bpf_offload_find_netdev(dev);
> +	if (ondev && !ondev->offdev && list_empty(&ondev->progs))
> +		__bpf_offload_dev_netdev_unregister(NULL, dev);
> +	up_write(&bpf_devs_lock);
> +	rtnl_unlock();
> +}
> +
>   static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
>   {
>   	struct bpf_prog_offload *offload = prog->aux->offload;
> @@ -226,10 +263,17 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
>   
>   void bpf_prog_offload_destroy(struct bpf_prog *prog)
>   {
> +	struct net_device *netdev = NULL;
> +
>   	down_write(&bpf_devs_lock);
> -	if (prog->aux->offload)
> +	if (prog->aux->offload) {
> +		netdev = prog->aux->offload->netdev;
>   		__bpf_prog_offload_destroy(prog);
> +	}
>   	up_write(&bpf_devs_lock);
> +
> +	if (netdev)

Maybe I have missed a refcnt or lock somewhere. Is it possible that netdev may 
have been freed?

> +		maybe_remove_bound_netdev(netdev);
>   }
>   

[ ... ]

> +void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> +{
> +	const struct net_device_ops *netdev_ops;
> +	void *p = NULL;
> +
> +	down_read(&bpf_devs_lock);
> +	if (!prog->aux->offload || !prog->aux->offload->netdev)
> +		goto out;
> +
> +	netdev_ops = prog->aux->offload->netdev->netdev_ops;
> +
> +	if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
> +		p = netdev_ops->ndo_xdp_rx_timestamp_supported;
> +	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> +		p = netdev_ops->ndo_xdp_rx_timestamp;
> +	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
> +		p = netdev_ops->ndo_xdp_rx_hash_supported;
> +	else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> +		p = netdev_ops->ndo_xdp_rx_hash;
> +	/* fallback to default kfunc when not supported by netdev */
> +out:
> +	up_read(&bpf_devs_lock);
> +
> +	return p;
> +}
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 13bc96035116..b345a273f7d0 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2491,7 +2491,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
>   				 BPF_F_TEST_STATE_FREQ |
>   				 BPF_F_SLEEPABLE |
>   				 BPF_F_TEST_RND_HI32 |
> -				 BPF_F_XDP_HAS_FRAGS))
> +				 BPF_F_XDP_HAS_FRAGS |
> +				 BPF_F_XDP_HAS_METADATA))
>   		return -EINVAL;
>   
>   	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> @@ -2575,7 +2576,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
>   	prog->aux->attach_btf = attach_btf;
>   	prog->aux->attach_btf_id = attr->attach_btf_id;
>   	prog->aux->dst_prog = dst_prog;
> -	prog->aux->offload_requested = !!attr->prog_ifindex;
> +	prog->aux->dev_bound = !!attr->prog_ifindex;
>   	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
>   	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
>   
> @@ -2598,7 +2599,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
>   	atomic64_set(&prog->aux->refcnt, 1);
>   	prog->gpl_compatible = is_gpl ? 1 : 0;
>   
> -	if (bpf_prog_is_offloaded(prog->aux)) {
> +	if (bpf_prog_is_dev_bound(prog->aux)) {
>   		err = bpf_prog_offload_init(prog, attr);
>   		if (err)
>   			goto free_prog_sec;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index fc4e313a4d2e..00951a59ee26 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15323,6 +15323,24 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   		return -EINVAL;
>   	}
>   
> +	*cnt = 0;
> +
> +	if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {

hmmm... does it need the BPF_PROG_TYPE_XDP check? Is the below 
bpf_prog_is_dev_bound() and the earlier 
'register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set)' good enough?

> +		if (bpf_prog_is_offloaded(env->prog->aux)) {
> +			verbose(env, "no metadata kfuncs offload\n");
> +			return -EINVAL;
> +		}
> +
> +		if (bpf_prog_is_dev_bound(env->prog->aux)) {
> +			void *p = bpf_offload_resolve_kfunc(env->prog, insn->imm);
> +
> +			if (p) {
> +				insn->imm = BPF_CALL_IMM(p);
> +				return 0;
> +			}
> +		}
> +	}
> +




* Re: [PATCH bpf-next v3 01/12] bpf: Document XDP RX metadata
  2022-12-06  2:45 ` [PATCH bpf-next v3 01/12] bpf: Document XDP RX metadata Stanislav Fomichev
@ 2022-12-08  4:25   ` Jakub Kicinski
  2022-12-08 19:06     ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-12-08  4:25 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Mon,  5 Dec 2022 18:45:43 -0800 Stanislav Fomichev wrote:
> +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> +  indicate whether the device supports RX timestamps
> +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
> +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> +  indicate whether the device supports RX hash
> +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash

Would you mind pointing to the discussion about the separate
_supported() kfuncs? I recall folks had concerns about the function
call overhead, and now we have 2 calls per field? :S


* Re: [PATCH bpf-next v3 02/12] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
  2022-12-06  2:45 ` [PATCH bpf-next v3 02/12] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
@ 2022-12-08  4:26   ` Jakub Kicinski
  0 siblings, 0 replies; 61+ messages in thread
From: Jakub Kicinski @ 2022-12-08  4:26 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Mon,  5 Dec 2022 18:45:44 -0800 Stanislav Fomichev wrote:
> BPF offloading infra will be reused to implement
> bound-but-not-offloaded bpf programs. Rename existing
> helpers for clarity. No functional changes.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Reviewed-by: Jakub Kicinski <kuba@kernel.org>


* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-06  2:45 ` [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs Stanislav Fomichev
  2022-12-07  4:29   ` Alexei Starovoitov
  2022-12-08  2:47   ` Martin KaFai Lau
@ 2022-12-08  5:00   ` Jakub Kicinski
  2022-12-08 19:07     ` Stanislav Fomichev
  2022-12-08 22:39   ` [xdp-hints] " Toke Høiland-Jørgensen
  2022-12-09 11:10   ` Jesper Dangaard Brouer
  4 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-12-08  5:00 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

The offload tests still pass after this, right?
TBH I don't remember this code well enough to spot major issues.

On Mon,  5 Dec 2022 18:45:45 -0800 Stanislav Fomichev wrote:
> There is an ndo handler per kfunc, the verifier replaces a call to the
> generic kfunc with a call to the per-device one.
> 
> For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> implements all possible metadata kfuncs. Not all devices have to
> implement them. If a kfunc is not supported by the target device,
> the default implementation is called instead.
> 
> Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
> we treat prog_index as target device for kfunc resolution.

> @@ -2476,10 +2477,18 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
>  				       struct net_device *netdev);
>  bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
>  
> +void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id);

There seems to be some mis-naming going on. I expected:

  offloaded =~ nfp
  dev_bound == XDP w/ funcs

*_offload_resolve_kfunc looks misnamed? Unless you want to resolve 
for HW offload?

>  void unpriv_ebpf_notify(int new_state);
>  
>  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
>  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
> +void bpf_offload_bound_netdev_unregister(struct net_device *dev);

ditto: offload_bound is a mix of terms no?

> @@ -1611,6 +1612,10 @@ struct net_device_ops {
>  	ktime_t			(*ndo_get_tstamp)(struct net_device *dev,
>  						  const struct skb_shared_hwtstamps *hwtstamps,
>  						  bool cycles);
> +	bool			(*ndo_xdp_rx_timestamp_supported)(const struct xdp_md *ctx);
> +	u64			(*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx);
> +	bool			(*ndo_xdp_rx_hash_supported)(const struct xdp_md *ctx);
> +	u32			(*ndo_xdp_rx_hash)(const struct xdp_md *ctx);
>  };

Is this on the fast path? Can we do an indirection?
Put these ops in their own struct and add a pointer to that struct 
in net_device_ops? Purely for grouping reasons because the netdev
ops are getting orders of magnitude past the size where you can
actually find stuff in this struct.

>  	bpf_free_used_maps(aux);
>  	bpf_free_used_btfs(aux);
> -	if (bpf_prog_is_offloaded(aux))
> +	if (bpf_prog_is_dev_bound(aux))
>  		bpf_prog_offload_destroy(aux->prog);

This also looks a touch like a mix of terms (condition vs function
called).

> +static int __bpf_offload_init(void);
> +static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
> +					     struct net_device *netdev);
> +static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
> +						struct net_device *netdev);

fwd declarations are yuck

>  static int bpf_dev_offload_check(struct net_device *netdev)
>  {
>  	if (!netdev)
> @@ -87,13 +93,17 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>  	    attr->prog_type != BPF_PROG_TYPE_XDP)
>  		return -EINVAL;
>  
> -	if (attr->prog_flags)
> +	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
>  		return -EINVAL;
>  
>  	offload = kzalloc(sizeof(*offload), GFP_USER);
>  	if (!offload)
>  		return -ENOMEM;
>  
> +	err = __bpf_offload_init();
> +	if (err)
> +		return err;

leaks offload

> @@ -209,6 +233,19 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
>  	up_read(&bpf_devs_lock);
>  }
>  
> +static void maybe_remove_bound_netdev(struct net_device *dev)
> +{

func name prefix ?

> -struct bpf_offload_dev *
> -bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
> +static int __bpf_offload_init(void)
>  {
> -	struct bpf_offload_dev *offdev;
>  	int err;
>  
>  	down_write(&bpf_devs_lock);
> @@ -680,12 +740,25 @@ bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
>  		err = rhashtable_init(&offdevs, &offdevs_params);
>  		if (err) {
>  			up_write(&bpf_devs_lock);
> -			return ERR_PTR(err);
> +			return err;
>  		}
>  		offdevs_inited = true;
>  	}
>  	up_write(&bpf_devs_lock);
>  
> +	return 0;
> +}

Would late_initcall() or some such not work for this?

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5b221568dfd4..862e03fcffa6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9228,6 +9228,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
>  			NL_SET_ERR_MSG(extack, "Using device-bound program without HW_MODE flag is not supported");

extack should get updated here, I reckon, maybe in previous patch

>  			return -EINVAL;
>  		}
> +		if (bpf_prog_is_dev_bound(new_prog->aux) && !bpf_offload_dev_match(new_prog, dev)) {

bound_dev_match() ?

> +			NL_SET_ERR_MSG(extack, "Cannot attach to a different target device");

different than.. ?

> +			return -EINVAL;
> +		}
>  		if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
>  			NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
>  			return -EINVAL;


* Re: [PATCH bpf-next v3 08/12] mxl4: Support RX XDP metadata
  2022-12-06  2:45 ` [PATCH bpf-next v3 08/12] mxl4: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-08  6:09   ` Tariq Toukan
  2022-12-08 19:07     ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Tariq Toukan @ 2022-12-08  6:09 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Typo in title mxl4 -> mlx4.
Preferably: net/mlx4_en.

On 12/6/2022 4:45 AM, Stanislav Fomichev wrote:
> RX timestamp and hash for now. Tested using the prog from the next
> patch.
> 
> Also enabling xdp metadata support; don't see why it's disabled,
> there is enough headroom..
> 
> Cc: Tariq Toukan <tariqt@nvidia.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_clock.c | 13 +++++--
>   .../net/ethernet/mellanox/mlx4/en_netdev.c    | 10 +++++
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 38 ++++++++++++++++++-
>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |  1 +
>   include/linux/mlx4/device.h                   |  7 ++++
>   5 files changed, 64 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> index 98b5ffb4d729..9e3b76182088 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> @@ -58,9 +58,7 @@ u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
>   	return hi | lo;
>   }
>   
> -void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> -			    struct skb_shared_hwtstamps *hwts,
> -			    u64 timestamp)
> +u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp)
>   {
>   	unsigned int seq;
>   	u64 nsec;
> @@ -70,8 +68,15 @@ void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
>   		nsec = timecounter_cyc2time(&mdev->clock, timestamp);
>   	} while (read_seqretry(&mdev->clock_lock, seq));
>   
> +	return ns_to_ktime(nsec);
> +}
> +
> +void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> +			    struct skb_shared_hwtstamps *hwts,
> +			    u64 timestamp)
> +{
>   	memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
> -	hwts->hwtstamp = ns_to_ktime(nsec);
> +	hwts->hwtstamp = mlx4_en_get_hwtstamp(mdev, timestamp);
>   }
>   
>   /**
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 8800d3f1f55c..1cb63746a851 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2855,6 +2855,11 @@ static const struct net_device_ops mlx4_netdev_ops = {
>   	.ndo_features_check	= mlx4_en_features_check,
>   	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
>   	.ndo_bpf		= mlx4_xdp,
> +
> +	.ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
> +	.ndo_xdp_rx_timestamp	= mlx4_xdp_rx_timestamp,
> +	.ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
> +	.ndo_xdp_rx_hash	= mlx4_xdp_rx_hash,
>   };
>   
>   static const struct net_device_ops mlx4_netdev_ops_master = {
> @@ -2887,6 +2892,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
>   	.ndo_features_check	= mlx4_en_features_check,
>   	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
>   	.ndo_bpf		= mlx4_xdp,
> +
> +	.ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
> +	.ndo_xdp_rx_timestamp	= mlx4_xdp_rx_timestamp,
> +	.ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
> +	.ndo_xdp_rx_hash	= mlx4_xdp_rx_hash,
>   };
>   
>   struct mlx4_en_bond {
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 9c114fc723e3..1b8e1b2d8729 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -663,8 +663,40 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
>   
>   struct mlx4_xdp_buff {
>   	struct xdp_buff xdp;
> +	struct mlx4_cqe *cqe;
> +	struct mlx4_en_dev *mdev;
> +	struct mlx4_en_rx_ring *ring;
> +	struct net_device *dev;
>   };
>   
> +bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
> +{
> +	struct mlx4_xdp_buff *_ctx = (void *)ctx;
> +
> +	return _ctx->ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL;
> +}
> +
> +u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx)
> +{
> +	struct mlx4_xdp_buff *_ctx = (void *)ctx;
> +
> +	return mlx4_en_get_hwtstamp(_ctx->mdev, mlx4_en_get_cqe_ts(_ctx->cqe));
> +}
> +
> +bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx)
> +{
> +	struct mlx4_xdp_buff *_ctx = (void *)ctx;
> +
> +	return _ctx->dev->features & NETIF_F_RXHASH;
> +}
> +
> +u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx)
> +{
> +	struct mlx4_xdp_buff *_ctx = (void *)ctx;
> +
> +	return be32_to_cpu(_ctx->cqe->immed_rss_invalid);
> +}
> +
>   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
>   {
>   	struct mlx4_en_priv *priv = netdev_priv(dev);
> @@ -781,8 +813,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   						DMA_FROM_DEVICE);
>   
>   			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
> -					 frags[0].page_offset, length, false);
> +					 frags[0].page_offset, length, true);
>   			orig_data = mxbuf.xdp.data;
> +			mxbuf.cqe = cqe;
> +			mxbuf.mdev = priv->mdev;
> +			mxbuf.ring = ring;
> +			mxbuf.dev = dev;
>   
>   			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index e132ff4c82f2..b7c0d4899ad7 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -792,6 +792,7 @@ int mlx4_en_netdev_event(struct notifier_block *this,
>    * Functions for time stamping
>    */
>   u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe);
> +u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp);
>   void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
>   			    struct skb_shared_hwtstamps *hwts,
>   			    u64 timestamp);
> diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
> index 6646634a0b9d..d5904da1d490 100644
> --- a/include/linux/mlx4/device.h
> +++ b/include/linux/mlx4/device.h
> @@ -1585,4 +1585,11 @@ static inline int mlx4_get_num_reserved_uar(struct mlx4_dev *dev)
>   	/* The first 128 UARs are used for EQ doorbells */
>   	return (128 >> (PAGE_SHIFT - dev->uar_page_shift));
>   }
> +
> +struct xdp_md;
> +bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx);
> +u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx);
> +bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx);
> +u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx);
> +

These are ethernet only functions, not known to the mlx4 core driver.
Please move to mlx4_en.h, and use mlx4_en_xdp_*() prefix.

>   #endif /* MLX4_DEVICE_H */


* Re: [PATCH bpf-next v3 07/12] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-12-06  2:45 ` [PATCH bpf-next v3 07/12] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
@ 2022-12-08  6:11   ` Tariq Toukan
  2022-12-08 19:07     ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Tariq Toukan @ 2022-12-08  6:11 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev



On 12/6/2022 4:45 AM, Stanislav Fomichev wrote:
> No functional changes. Boilerplate to allow stuffing more data after xdp_buff.
> 
> Cc: Tariq Toukan <tariqt@nvidia.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
>   1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 8f762fc170b3..9c114fc723e3 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -661,9 +661,14 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
>   #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
>   #endif
>   
> +struct mlx4_xdp_buff {
> +	struct xdp_buff xdp;
> +};
> +

Prefer name with 'en', struct mlx4_en_xdp_buff.

>   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
>   {
>   	struct mlx4_en_priv *priv = netdev_priv(dev);
> +	struct mlx4_xdp_buff mxbuf = {};
>   	int factor = priv->cqe_factor;
>   	struct mlx4_en_rx_ring *ring;
>   	struct bpf_prog *xdp_prog;
> @@ -671,7 +676,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   	bool doorbell_pending;
>   	bool xdp_redir_flush;
>   	struct mlx4_cqe *cqe;
> -	struct xdp_buff xdp;
>   	int polled = 0;
>   	int index;
>   
> @@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   	ring = priv->rx_ring[cq_ring];
>   
>   	xdp_prog = rcu_dereference_bh(ring->xdp_prog);
> -	xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
> +	xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
>   	doorbell_pending = false;
>   	xdp_redir_flush = false;
>   
> @@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   						priv->frag_info[0].frag_size,
>   						DMA_FROM_DEVICE);
>   
> -			xdp_prepare_buff(&xdp, va - frags[0].page_offset,
> +			xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
>   					 frags[0].page_offset, length, false);
> -			orig_data = xdp.data;
> +			orig_data = mxbuf.xdp.data;
>   
> -			act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +			act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
>   
> -			length = xdp.data_end - xdp.data;
> -			if (xdp.data != orig_data) {
> -				frags[0].page_offset = xdp.data -
> -					xdp.data_hard_start;
> -				va = xdp.data;
> +			length = mxbuf.xdp.data_end - mxbuf.xdp.data;
> +			if (mxbuf.xdp.data != orig_data) {
> +				frags[0].page_offset = mxbuf.xdp.data -
> +					mxbuf.xdp.data_hard_start;
> +				va = mxbuf.xdp.data;
>   			}
>   
>   			switch (act) {
>   			case XDP_PASS:
>   				break;
>   			case XDP_REDIRECT:
> -				if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) {
> +				if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
>   					ring->xdp_redirect++;
>   					xdp_redir_flush = true;
>   					frags[0].page = NULL;


* Re: [PATCH bpf-next v3 01/12] bpf: Document XDP RX metadata
  2022-12-08  4:25   ` Jakub Kicinski
@ 2022-12-08 19:06     ` Stanislav Fomichev
  0 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 19:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Wed, Dec 7, 2022 at 8:26 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon,  5 Dec 2022 18:45:43 -0800 Stanislav Fomichev wrote:
> > +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> > +  indicate whether the device supports RX timestamps
> > +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
> > +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> > +  indicate whether the device supports RX hash
> > +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash
>
> Would you mind pointing to the discussion about the separate
> _supported() kfuncs? I recall folks had concerns about the function
> call overhead, and now we have 2 calls per field? :S

Take a look at [0] and [1]. I'm still assuming that we might support
some restricted set of kfuncs that can be unrolled, so I'm keeping
these APIs simple/separate.

0: https://lore.kernel.org/bpf/CAADnVQJMvPjXCtKNH+WCryPmukgbWTrJyHqxrnO=2YraZEukPg@mail.gmail.com
1: https://lore.kernel.org/bpf/Y4XZkZJHVvLgTIk9@lavr/


* Re: [PATCH bpf-next v3 07/12] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
  2022-12-08  6:11   ` Tariq Toukan
@ 2022-12-08 19:07     ` Stanislav Fomichev
  0 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 19:07 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Wed, Dec 7, 2022 at 10:11 PM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
>
>
> On 12/6/2022 4:45 AM, Stanislav Fomichev wrote:
> > No functional changes. Boilerplate to allow stuffing more data after xdp_buff.
> >
> > Cc: Tariq Toukan <tariqt@nvidia.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
> >   1 file changed, 15 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index 8f762fc170b3..9c114fc723e3 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -661,9 +661,14 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
> >   #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
> >   #endif
> >
> > +struct mlx4_xdp_buff {
> > +     struct xdp_buff xdp;
> > +};
> > +
>
> Prefer name with 'en', struct mlx4_en_xdp_buff.

Sure, will rename!


> >   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
> >   {
> >       struct mlx4_en_priv *priv = netdev_priv(dev);
> > +     struct mlx4_xdp_buff mxbuf = {};
> >       int factor = priv->cqe_factor;
> >       struct mlx4_en_rx_ring *ring;
> >       struct bpf_prog *xdp_prog;
> > @@ -671,7 +676,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >       bool doorbell_pending;
> >       bool xdp_redir_flush;
> >       struct mlx4_cqe *cqe;
> > -     struct xdp_buff xdp;
> >       int polled = 0;
> >       int index;
> >
> > @@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >       ring = priv->rx_ring[cq_ring];
> >
> >       xdp_prog = rcu_dereference_bh(ring->xdp_prog);
> > -     xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
> > +     xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
> >       doorbell_pending = false;
> >       xdp_redir_flush = false;
> >
> > @@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >                                               priv->frag_info[0].frag_size,
> >                                               DMA_FROM_DEVICE);
> >
> > -                     xdp_prepare_buff(&xdp, va - frags[0].page_offset,
> > +                     xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
> >                                        frags[0].page_offset, length, false);
> > -                     orig_data = xdp.data;
> > +                     orig_data = mxbuf.xdp.data;
> >
> > -                     act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > +                     act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
> >
> > -                     length = xdp.data_end - xdp.data;
> > -                     if (xdp.data != orig_data) {
> > -                             frags[0].page_offset = xdp.data -
> > -                                     xdp.data_hard_start;
> > -                             va = xdp.data;
> > +                     length = mxbuf.xdp.data_end - mxbuf.xdp.data;
> > +                     if (mxbuf.xdp.data != orig_data) {
> > +                             frags[0].page_offset = mxbuf.xdp.data -
> > +                                     mxbuf.xdp.data_hard_start;
> > +                             va = mxbuf.xdp.data;
> >                       }
> >
> >                       switch (act) {
> >                       case XDP_PASS:
> >                               break;
> >                       case XDP_REDIRECT:
> > -                             if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) {
> > +                             if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
> >                                       ring->xdp_redirect++;
> >                                       xdp_redir_flush = true;
> >                                       frags[0].page = NULL;


* Re: [PATCH bpf-next v3 08/12] mxl4: Support RX XDP metadata
  2022-12-08  6:09   ` Tariq Toukan
@ 2022-12-08 19:07     ` Stanislav Fomichev
  2022-12-08 20:23       ` Tariq Toukan
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 19:07 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Wed, Dec 7, 2022 at 10:09 PM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
> Typo in title mxl4 -> mlx4.
> Preferably: net/mlx4_en.

Oh, I always have to fight with this. Somehow mxl feels more natural
:-) Thanks for spotting, will use net/mlx4_en instead. (presumably
the same applies to mlx5?)

> On 12/6/2022 4:45 AM, Stanislav Fomichev wrote:
> > RX timestamp and hash for now. Tested using the prog from the next
> > patch.
> >
> > Also enabling xdp metadata support; don't see why it's disabled,
> > there is enough headroom..
> >
> > Cc: Tariq Toukan <tariqt@nvidia.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >   drivers/net/ethernet/mellanox/mlx4/en_clock.c | 13 +++++--
> >   .../net/ethernet/mellanox/mlx4/en_netdev.c    | 10 +++++
> >   drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 38 ++++++++++++++++++-
> >   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |  1 +
> >   include/linux/mlx4/device.h                   |  7 ++++
> >   5 files changed, 64 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> > index 98b5ffb4d729..9e3b76182088 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> > @@ -58,9 +58,7 @@ u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
> >       return hi | lo;
> >   }
> >
> > -void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> > -                         struct skb_shared_hwtstamps *hwts,
> > -                         u64 timestamp)
> > +u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp)
> >   {
> >       unsigned int seq;
> >       u64 nsec;
> > @@ -70,8 +68,15 @@ void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> >               nsec = timecounter_cyc2time(&mdev->clock, timestamp);
> >       } while (read_seqretry(&mdev->clock_lock, seq));
> >
> > +     return ns_to_ktime(nsec);
> > +}
> > +
> > +void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> > +                         struct skb_shared_hwtstamps *hwts,
> > +                         u64 timestamp)
> > +{
> >       memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
> > -     hwts->hwtstamp = ns_to_ktime(nsec);
> > +     hwts->hwtstamp = mlx4_en_get_hwtstamp(mdev, timestamp);
> >   }
> >
> >   /**
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > index 8800d3f1f55c..1cb63746a851 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > @@ -2855,6 +2855,11 @@ static const struct net_device_ops mlx4_netdev_ops = {
> >       .ndo_features_check     = mlx4_en_features_check,
> >       .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
> >       .ndo_bpf                = mlx4_xdp,
> > +
> > +     .ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
> > +     .ndo_xdp_rx_timestamp   = mlx4_xdp_rx_timestamp,
> > +     .ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
> > +     .ndo_xdp_rx_hash        = mlx4_xdp_rx_hash,
> >   };
> >
> >   static const struct net_device_ops mlx4_netdev_ops_master = {
> > @@ -2887,6 +2892,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
> >       .ndo_features_check     = mlx4_en_features_check,
> >       .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
> >       .ndo_bpf                = mlx4_xdp,
> > +
> > +     .ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
> > +     .ndo_xdp_rx_timestamp   = mlx4_xdp_rx_timestamp,
> > +     .ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
> > +     .ndo_xdp_rx_hash        = mlx4_xdp_rx_hash,
> >   };
> >
> >   struct mlx4_en_bond {
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index 9c114fc723e3..1b8e1b2d8729 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -663,8 +663,40 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
> >
> >   struct mlx4_xdp_buff {
> >       struct xdp_buff xdp;
> > +     struct mlx4_cqe *cqe;
> > +     struct mlx4_en_dev *mdev;
> > +     struct mlx4_en_rx_ring *ring;
> > +     struct net_device *dev;
> >   };
> >
> > +bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx)
> > +{
> > +     struct mlx4_xdp_buff *_ctx = (void *)ctx;
> > +
> > +     return _ctx->ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL;
> > +}
> > +
> > +u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx)
> > +{
> > +     struct mlx4_xdp_buff *_ctx = (void *)ctx;
> > +
> > +     return mlx4_en_get_hwtstamp(_ctx->mdev, mlx4_en_get_cqe_ts(_ctx->cqe));
> > +}
> > +
> > +bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx)
> > +{
> > +     struct mlx4_xdp_buff *_ctx = (void *)ctx;
> > +
> > +     return _ctx->dev->features & NETIF_F_RXHASH;
> > +}
> > +
> > +u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx)
> > +{
> > +     struct mlx4_xdp_buff *_ctx = (void *)ctx;
> > +
> > +     return be32_to_cpu(_ctx->cqe->immed_rss_invalid);
> > +}
> > +
> >   int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
> >   {
> >       struct mlx4_en_priv *priv = netdev_priv(dev);
> > @@ -781,8 +813,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >                                               DMA_FROM_DEVICE);
> >
> >                       xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset,
> > -                                      frags[0].page_offset, length, false);
> > +                                      frags[0].page_offset, length, true);
> >                       orig_data = mxbuf.xdp.data;
> > +                     mxbuf.cqe = cqe;
> > +                     mxbuf.mdev = priv->mdev;
> > +                     mxbuf.ring = ring;
> > +                     mxbuf.dev = dev;
> >
> >                       act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp);
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > index e132ff4c82f2..b7c0d4899ad7 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > @@ -792,6 +792,7 @@ int mlx4_en_netdev_event(struct notifier_block *this,
> >    * Functions for time stamping
> >    */
> >   u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe);
> > +u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp);
> >   void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> >                           struct skb_shared_hwtstamps *hwts,
> >                           u64 timestamp);
> > diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
> > index 6646634a0b9d..d5904da1d490 100644
> > --- a/include/linux/mlx4/device.h
> > +++ b/include/linux/mlx4/device.h
> > @@ -1585,4 +1585,11 @@ static inline int mlx4_get_num_reserved_uar(struct mlx4_dev *dev)
> >       /* The first 128 UARs are used for EQ doorbells */
> >       return (128 >> (PAGE_SHIFT - dev->uar_page_shift));
> >   }
> > +
> > +struct xdp_md;
> > +bool mlx4_xdp_rx_timestamp_supported(const struct xdp_md *ctx);
> > +u64 mlx4_xdp_rx_timestamp(const struct xdp_md *ctx);
> > +bool mlx4_xdp_rx_hash_supported(const struct xdp_md *ctx);
> > +u32 mlx4_xdp_rx_hash(const struct xdp_md *ctx);
> > +
>
> These are ethernet only functions, not known to the mlx4 core driver.
> Please move to mlx4_en.h, and use mlx4_en_xdp_*() prefix.

For sure, thanks for the review!

> >   #endif /* MLX4_DEVICE_H */

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-08  2:47   ` Martin KaFai Lau
@ 2022-12-08 19:07     ` Stanislav Fomichev
  2022-12-08 22:53       ` Martin KaFai Lau
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 19:07 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On Wed, Dec 7, 2022 at 6:47 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/5/22 6:45 PM, Stanislav Fomichev wrote:
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 55dbc68bfffc..c24aba5c363b 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -409,4 +409,33 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> >
> >   #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
> >
> > +#define XDP_METADATA_KFUNC_xxx       \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \
> > +                        bpf_xdp_metadata_rx_timestamp_supported) \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
> > +                        bpf_xdp_metadata_rx_timestamp) \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED, \
> > +                        bpf_xdp_metadata_rx_hash_supported) \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
> > +                        bpf_xdp_metadata_rx_hash) \
> > +
> > +enum {
> > +#define XDP_METADATA_KFUNC(name, str) name,
> > +XDP_METADATA_KFUNC_xxx
> > +#undef XDP_METADATA_KFUNC
> > +MAX_XDP_METADATA_KFUNC,
> > +};
> > +
> > +#ifdef CONFIG_NET
>
> I think this is no longer needed because xdp_metadata_kfunc_id() is only used in
> offload.c which should be CONFIG_NET only.

Seems to be the case. At least my build tests with weird configs work,
thank you!

> > +u32 xdp_metadata_kfunc_id(int id);
> > +#else
> > +static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
> > +#endif
> > +
> > +struct xdp_md;
> > +bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx);
> > +u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx);
> > +bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx);
> > +u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx);
> > +
> >   #endif /* __LINUX_NET_XDP_H__ */
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index f89de51a45db..790650a81f2b 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1156,6 +1156,11 @@ enum bpf_link_type {
> >    */
> >   #define BPF_F_XDP_HAS_FRAGS (1U << 5)
> >
> > +/* If BPF_F_XDP_HAS_METADATA is used in BPF_PROG_LOAD command, the loaded
> > + * program becomes device-bound but can access its XDP metadata.
> > + */
> > +#define BPF_F_XDP_HAS_METADATA       (1U << 6)
> > +
>
> [ ... ]
>
> > diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> > index f5769a8ecbee..bad8bab916eb 100644
> > --- a/kernel/bpf/offload.c
> > +++ b/kernel/bpf/offload.c
> > @@ -41,7 +41,7 @@ struct bpf_offload_dev {
> >   struct bpf_offload_netdev {
> >       struct rhash_head l;
> >       struct net_device *netdev;
> > -     struct bpf_offload_dev *offdev;
> > +     struct bpf_offload_dev *offdev; /* NULL when bound-only */
> >       struct list_head progs;
> >       struct list_head maps;
> >       struct list_head offdev_netdevs;
> > @@ -58,6 +58,12 @@ static const struct rhashtable_params offdevs_params = {
> >   static struct rhashtable offdevs;
> >   static bool offdevs_inited;
> >
> > +static int __bpf_offload_init(void);
> > +static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
> > +                                          struct net_device *netdev);
> > +static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
> > +                                             struct net_device *netdev);
> > +
> >   static int bpf_dev_offload_check(struct net_device *netdev)
> >   {
> >       if (!netdev)
> > @@ -87,13 +93,17 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> >           attr->prog_type != BPF_PROG_TYPE_XDP)
> >               return -EINVAL;
> >
> > -     if (attr->prog_flags)
> > +     if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
> >               return -EINVAL;
> >
> >       offload = kzalloc(sizeof(*offload), GFP_USER);
> >       if (!offload)
> >               return -ENOMEM;
> >
> > +     err = __bpf_offload_init();
> > +     if (err)
> > +             return err;
> > +
> >       offload->prog = prog;
> >
> >       offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
> > @@ -102,11 +112,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> >       if (err)
> >               goto err_maybe_put;
> >
> > +     prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
> > +
>
> If I read the set correctly, bpf prog can either use metadata kfunc or offload
> but not both. It is fine to start with only supporting metadata kfunc when there
> is no offload but will be useful to understand the reason. I assume an offloaded
> bpf prog should still be able to call the bpf helpers like adjust_head/tail and
> the same should go for any kfunc?

Yes, I'm assuming there should be some work on the offloaded device
drivers to support metadata kfuncs.
Offloaded kfuncs, in general, seem hard (how do we call kernel func
from the device-offloaded prog?); so refusing kfuncs early for the
offloaded case seems fair for now?

> Also, the BPF_F_XDP_HAS_METADATA feels like it is acting more like
> BPF_F_XDP_DEV_BOUND_ONLY.

SG. Seems like a better option in case, in the future, binding to
devices gives some other nice perks besides the metadata.

> >       down_write(&bpf_devs_lock);
> >       ondev = bpf_offload_find_netdev(offload->netdev);
> >       if (!ondev) {
> > -             err = -EINVAL;
> > -             goto err_unlock;
> > +             if (!prog->aux->offload_requested) {
>
> nit. bpf_prog_is_offloaded(prog->aux)

Thx!

> > +                     /* When only binding to the device, explicitly
> > +                      * create an entry in the hashtable. See related
> > +                      * maybe_remove_bound_netdev.
> > +                      */
> > +                     err = __bpf_offload_dev_netdev_register(NULL, offload->netdev);
> > +                     if (err)
> > +                             goto err_unlock;
> > +                     ondev = bpf_offload_find_netdev(offload->netdev);
> > +             }
> > +             if (!ondev) {
> > +                     err = -EINVAL;
> > +                     goto err_unlock;
> > +             }
> >       }
> >       offload->offdev = ondev->offdev;
> >       prog->aux->offload = offload;
> > @@ -209,6 +233,19 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
> >       up_read(&bpf_devs_lock);
> >   }
> >
> > +static void maybe_remove_bound_netdev(struct net_device *dev)
> > +{
> > +     struct bpf_offload_netdev *ondev;
> > +
> > +     rtnl_lock();
> > +     down_write(&bpf_devs_lock);
> > +     ondev = bpf_offload_find_netdev(dev);
> > +     if (ondev && !ondev->offdev && list_empty(&ondev->progs))
> > +             __bpf_offload_dev_netdev_unregister(NULL, dev);
> > +     up_write(&bpf_devs_lock);
> > +     rtnl_unlock();
> > +}
> > +
> >   static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
> >   {
> >       struct bpf_prog_offload *offload = prog->aux->offload;
> > @@ -226,10 +263,17 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
> >
> >   void bpf_prog_offload_destroy(struct bpf_prog *prog)
> >   {
> > +     struct net_device *netdev = NULL;
> > +
> >       down_write(&bpf_devs_lock);
> > -     if (prog->aux->offload)
> > +     if (prog->aux->offload) {
> > +             netdev = prog->aux->offload->netdev;
> >               __bpf_prog_offload_destroy(prog);
> > +     }
> >       up_write(&bpf_devs_lock);
> > +
> > +     if (netdev)
>
> May be I have missed a refcnt or lock somewhere.  Is it possible that netdev may
> have been freed?

Yeah, with the offload framework, there are no refcnts. We put an
"offloaded" device into a separate hashtable (protected by
rtnl/semaphore).
maybe_remove_bound_netdev will re-grab the locks (due to ordering:
rtnl->bpf_devs_lock) and remove the device from the hashtable if it's
still there.
At least this is how, I think, it should work; LMK if something is
still fishy here...

Or is the concern here that somebody might allocate new netdev reusing
the same address? I think I have enough checks in
maybe_remove_bound_netdev to guard against that. Or, at least, to make
it safe :-)

> > +             maybe_remove_bound_netdev(netdev);
> >   }
> >
>
> [ ... ]
>
> > +void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> > +{
> > +     const struct net_device_ops *netdev_ops;
> > +     void *p = NULL;
> > +
> > +     down_read(&bpf_devs_lock);
> > +     if (!prog->aux->offload || !prog->aux->offload->netdev)
> > +             goto out;
> > +
> > +     netdev_ops = prog->aux->offload->netdev->netdev_ops;
> > +
> > +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED))
> > +             p = netdev_ops->ndo_xdp_rx_timestamp_supported;
> > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP))
> > +             p = netdev_ops->ndo_xdp_rx_timestamp;
> > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED))
> > +             p = netdev_ops->ndo_xdp_rx_hash_supported;
> > +     else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
> > +             p = netdev_ops->ndo_xdp_rx_hash;
> > +     /* fallback to default kfunc when not supported by netdev */
> > +out:
> > +     up_read(&bpf_devs_lock);
> > +
> > +     return p;
> > +}
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 13bc96035116..b345a273f7d0 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2491,7 +2491,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
> >                                BPF_F_TEST_STATE_FREQ |
> >                                BPF_F_SLEEPABLE |
> >                                BPF_F_TEST_RND_HI32 |
> > -                              BPF_F_XDP_HAS_FRAGS))
> > +                              BPF_F_XDP_HAS_FRAGS |
> > +                              BPF_F_XDP_HAS_METADATA))
> >               return -EINVAL;
> >
> >       if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> > @@ -2575,7 +2576,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
> >       prog->aux->attach_btf = attach_btf;
> >       prog->aux->attach_btf_id = attr->attach_btf_id;
> >       prog->aux->dst_prog = dst_prog;
> > -     prog->aux->offload_requested = !!attr->prog_ifindex;
> > +     prog->aux->dev_bound = !!attr->prog_ifindex;
> >       prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
> >       prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
> >
> > @@ -2598,7 +2599,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
> >       atomic64_set(&prog->aux->refcnt, 1);
> >       prog->gpl_compatible = is_gpl ? 1 : 0;
> >
> > -     if (bpf_prog_is_offloaded(prog->aux)) {
> > +     if (bpf_prog_is_dev_bound(prog->aux)) {
> >               err = bpf_prog_offload_init(prog, attr);
> >               if (err)
> >                       goto free_prog_sec;
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index fc4e313a4d2e..00951a59ee26 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -15323,6 +15323,24 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >               return -EINVAL;
> >       }
> >
> > +     *cnt = 0;
> > +
> > +     if (resolve_prog_type(env->prog) == BPF_PROG_TYPE_XDP) {
>
> hmmm...does it need BPF_PROG_TYPE_XDP check? Is the below
> bpf_prog_is_dev_bound() and the eariler
> 'register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set)' good enough?

Should be enough, yeah, will drop. I was being a bit defensive here in
case we have non-xdp device-bound programs in the future.


> > +             if (bpf_prog_is_offloaded(env->prog->aux)) {
> > +                     verbose(env, "no metadata kfuncs offload\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (bpf_prog_is_dev_bound(env->prog->aux)) {
> > +                     void *p = bpf_offload_resolve_kfunc(env->prog, insn->imm);
> > +
> > +                     if (p) {
> > +                             insn->imm = BPF_CALL_IMM(p);
> > +                             return 0;
> > +                     }
> > +             }
> > +     }
> > +
>
>


* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-08  5:00   ` Jakub Kicinski
@ 2022-12-08 19:07     ` Stanislav Fomichev
  2022-12-09  1:30       ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 19:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Wed, Dec 7, 2022 at 9:00 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> The offload tests still pass after this, right?

Yeah, had to bring them back in shape just for the purpose of making
sure they're still happy:
https://lore.kernel.org/bpf/20221206232739.2504890-1-sdf@google.com/

> TBH I don't remember this code well enough to spot major issues.

No worries! Appreciate the review and the comments on consistency; I'm
also mostly unaware how this whole offloading works :-)

> On Mon,  5 Dec 2022 18:45:45 -0800 Stanislav Fomichev wrote:
> > There is an ndo handler per kfunc, the verifier replaces a call to the
> > generic kfunc with a call to the per-device one.
> >
> > For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> > implements all possible metadata kfuncs. Not all devices have to
> > implement them. If kfunc is not supported by the target device,
> > the default implementation is called instead.
> >
> > Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
> > we treat prog_ifindex as target device for kfunc resolution.
>
> > @@ -2476,10 +2477,18 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
> >                                      struct net_device *netdev);
> >  bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev);
> >
> > +void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
>
> There seems to be some mis-naming going on. I expected:
>
>   offloaded =~ nfp
>   dev_bound == XDP w/ funcs
>
> *_offload_resolve_kfunc looks misnamed? Unless you want to resolve
> for HW offload?

Yeah, I had the same expectations, but I was also assuming that this
bpf_offload_resolve_kfunc might at some point handle offloaded
metadata kfuncs.
But looking at it again, agree that the following looks a bit off:

if (bpf_prog_is_dev_bound()) {
   xxx = bpf_offload_resolve_kfunc()
}

Let me use the dev_bound prefix more consistently here and in the
other places you've pointed out.

> >  void unpriv_ebpf_notify(int new_state);
> >
> >  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
> >  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
> > +void bpf_offload_bound_netdev_unregister(struct net_device *dev);
>
> ditto: offload_bound is a mix of terms no?

Ack, will do bpf_dev_bound_netdev_unregister here, thanks!

> > @@ -1611,6 +1612,10 @@ struct net_device_ops {
> >       ktime_t                 (*ndo_get_tstamp)(struct net_device *dev,
> >                                                 const struct skb_shared_hwtstamps *hwtstamps,
> >                                                 bool cycles);
> > +     bool                    (*ndo_xdp_rx_timestamp_supported)(const struct xdp_md *ctx);
> > +     u64                     (*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx);
> > +     bool                    (*ndo_xdp_rx_hash_supported)(const struct xdp_md *ctx);
> > +     u32                     (*ndo_xdp_rx_hash)(const struct xdp_md *ctx);
> >  };
>
> Is this on the fast path? Can we do an indirection?

No, we resolve them at load time from "generic"
bpf_xdp_metadata_rx_<xxx> to ndo_xdp_rx_<xxx>.

> Put these ops in their own struct and add a pointer to that struct
> in net_device_ops? Purely for grouping reasons because the netdev
> ops are getting orders of magnitude past the size where you can
> actually find stuff in this struct.

Oh, great idea, will do!

> >       bpf_free_used_maps(aux);
> >       bpf_free_used_btfs(aux);
> > -     if (bpf_prog_is_offloaded(aux))
> > +     if (bpf_prog_is_dev_bound(aux))
> >               bpf_prog_offload_destroy(aux->prog);
>
> This also looks a touch like a mix of terms (condition vs function
> called).

Here, not sure, open to suggestions. These
bpf_prog_offload_init/bpf_prog_offload_destroy are generic enough
(now) that I'm calling them for both dev_bound/offloaded.

The following paths trigger for both offloaded/dev_bound cases:

if (bpf_prog_is_dev_bound()) bpf_prog_offload_init();
if (bpf_prog_is_dev_bound()) bpf_prog_offload_destroy();

Do you think it's worth having completely separate
dev_bound/offloaded paths? Or, alternatively, can rename to
bpf_prog_dev_bound_{init,destroy} but still handle both cases?

> > +static int __bpf_offload_init(void);
> > +static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
> > +                                          struct net_device *netdev);
> > +static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
> > +                                             struct net_device *netdev);
>
> fwd declarations are yuck

SG, will move them here instead.

> >  static int bpf_dev_offload_check(struct net_device *netdev)
> >  {
> >       if (!netdev)
> > @@ -87,13 +93,17 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> >           attr->prog_type != BPF_PROG_TYPE_XDP)
> >               return -EINVAL;
> >
> > -     if (attr->prog_flags)
> > +     if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
> >               return -EINVAL;
> >
> >       offload = kzalloc(sizeof(*offload), GFP_USER);
> >       if (!offload)
> >               return -ENOMEM;
> >
> > +     err = __bpf_offload_init();
> > +     if (err)
> > +             return err;
>
> leaks offload

Oops, let me actually move this to late_initcall as you suggest below.

> > @@ -209,6 +233,19 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
> >       up_read(&bpf_devs_lock);
> >  }
> >
> > +static void maybe_remove_bound_netdev(struct net_device *dev)
> > +{
>
> func name prefix ?

Good point, will rename to bpf_dev_bound_try_remove_netdev.

> > -struct bpf_offload_dev *
> > -bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
> > +static int __bpf_offload_init(void)
> >  {
> > -     struct bpf_offload_dev *offdev;
> >       int err;
> >
> >       down_write(&bpf_devs_lock);
> > @@ -680,12 +740,25 @@ bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
> >               err = rhashtable_init(&offdevs, &offdevs_params);
> >               if (err) {
> >                       up_write(&bpf_devs_lock);
> > -                     return ERR_PTR(err);
> > +                     return err;
> >               }
> >               offdevs_inited = true;
> >       }
> >       up_write(&bpf_devs_lock);
> >
> > +     return 0;
> > +}
>
> Would late_initcall() or some such not work for this?

Agreed, let's move it to the initcall instead.

> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 5b221568dfd4..862e03fcffa6 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -9228,6 +9228,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
> >                       NL_SET_ERR_MSG(extack, "Using device-bound program without HW_MODE flag is not supported");
>
> extack should get updated here, I reckon, maybe in previous patch

Oh, thanks for spotting, will fix.

> >                       return -EINVAL;
> >               }
> > +             if (bpf_prog_is_dev_bound(new_prog->aux) && !bpf_offload_dev_match(new_prog, dev)) {
>
> bound_dev_match() ?

Right, so this is another case where it works for both cases. Maybe
rename to bpf_dev_bound_match and use for both offloaded/dev_bound? Or
do you prefer completely separate paths?

> > +                     NL_SET_ERR_MSG(extack, "Cannot attach to a different target device");
>
> different than.. ?

Borrowing from netdevsim, lmk if the following won't work here:

"Program bound to different device"

> > +                     return -EINVAL;
> > +             }
> >               if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
> >                       NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
> >                       return -EINVAL;


* Re: [PATCH bpf-next v3 08/12] mxl4: Support RX XDP metadata
  2022-12-08 19:07     ` Stanislav Fomichev
@ 2022-12-08 20:23       ` Tariq Toukan
  0 siblings, 0 replies; 61+ messages in thread
From: Tariq Toukan @ 2022-12-08 20:23 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Tariq Toukan, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev



On 12/8/2022 9:07 PM, Stanislav Fomichev wrote:
> On Wed, Dec 7, 2022 at 10:09 PM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>>
>> Typo in title mxl4 -> mlx4.
>> Preferably: net/mlx4_en.
> 
> Oh, I always have to fight with this. Somehow mxl feels more natural
> :-) Thanks for spotting, will use net/mlx4_en instead. (presumably the
> same should be for mlx5?)
> 

For the newer mlx5 driver we use a shorter form, net/mlx5e.

>> On 12/6/2022 4:45 AM, Stanislav Fomichev wrote:
>>> RX timestamp and hash for now. Tested using the prog from the next
>>> patch.
>>>
>>> Also enabling xdp metadata support; don't see why it's disabled,
>>> there is enough headroom..
>>>
>>> Cc: Tariq Toukan <tariqt@nvidia.com>
>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>> Cc: David Ahern <dsahern@gmail.com>
>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Cc: Willem de Bruijn <willemb@google.com>
>>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>>> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>>> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>>> Cc: Maryam Tahhan <mtahhan@redhat.com>
>>> Cc: xdp-hints@xdp-project.net
>>> Cc: netdev@vger.kernel.org
>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>> ---
>>>    drivers/net/ethernet/mellanox/mlx4/en_clock.c | 13 +++++--
>>>    .../net/ethernet/mellanox/mlx4/en_netdev.c    | 10 +++++
>>>    drivers/net/ethernet/mellanox/mlx4/en_rx.c    | 38 ++++++++++++++++++-
>>>    drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |  1 +
>>>    include/linux/mlx4/device.h                   |  7 ++++
>>>    5 files changed, 64 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
>>> index 98b5ffb4d729..9e3b76182088 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
>>> @@ -58,9 +58,7 @@ u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
>>>        return hi | lo;
>>>    }
>>>
>>> -void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
>>> -                         struct skb_shared_hwtstamps *hwts,
>>> -                         u64 timestamp)
>>> +u64 mlx4_en_get_hwtstamp(struct mlx4_en_dev *mdev, u64 timestamp)
>>>    {
>>>        unsigned int seq;
>>>        u64 nsec;
>>> @@ -70,8 +68,15 @@ void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
>>>                nsec = timecounter_cyc2time(&mdev->clock, timestamp);
>>>        } while (read_seqretry(&mdev->clock_lock, seq));
>>>
>>> +     return ns_to_ktime(nsec);
>>> +}
>>> +
>>> +void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
>>> +                         struct skb_shared_hwtstamps *hwts,
>>> +                         u64 timestamp)
>>> +{
>>>        memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
>>> -     hwts->hwtstamp = ns_to_ktime(nsec);
>>> +     hwts->hwtstamp = mlx4_en_get_hwtstamp(mdev, timestamp);
>>>    }
>>>
>>>    /**
>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> index 8800d3f1f55c..1cb63746a851 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> @@ -2855,6 +2855,11 @@ static const struct net_device_ops mlx4_netdev_ops = {
>>>        .ndo_features_check     = mlx4_en_features_check,
>>>        .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
>>>        .ndo_bpf                = mlx4_xdp,
>>> +
>>> +     .ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
>>> +     .ndo_xdp_rx_timestamp   = mlx4_xdp_rx_timestamp,
>>> +     .ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
>>> +     .ndo_xdp_rx_hash        = mlx4_xdp_rx_hash,
>>>    };
>>>
>>>    static const struct net_device_ops mlx4_netdev_ops_master = {
>>> @@ -2887,6 +2892,11 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
>>>        .ndo_features_check     = mlx4_en_features_check,
>>>        .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
>>>        .ndo_bpf                = mlx4_xdp,
>>> +
>>> +     .ndo_xdp_rx_timestamp_supported = mlx4_xdp_rx_timestamp_supported,
>>> +     .ndo_xdp_rx_timestamp   = mlx4_xdp_rx_timestamp,
>>> +     .ndo_xdp_rx_hash_supported = mlx4_xdp_rx_hash_supported,
>>> +     .ndo_xdp_rx_hash        = mlx4_xdp_rx_hash,
>>>    };
>>>
>>>    struct mlx4_en_bond {
>>> [... en_rx.c, mlx4_en.h and include/linux/mlx4/device.h hunks trimmed; identical to the quote earlier in the thread ...]
>>
>> These are ethernet only functions, not known to the mlx4 core driver.
>> Please move to mlx4_en.h, and use mlx4_en_xdp_*() prefix.
> 
> For sure, thanks for the review!
> 
>>>    #endif /* MLX4_DEVICE_H */

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v3 00/12] xdp: hints via kfuncs
  2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
                   ` (11 preceding siblings ...)
  2022-12-06  2:45 ` [PATCH bpf-next v3 12/12] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
@ 2022-12-08 22:28 ` Toke Høiland-Jørgensen
  2022-12-08 23:47   ` Stanislav Fomichev
  12 siblings, 1 reply; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-08 22:28 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> Please see the first patch in the series for the overall
> design and use-cases.
>
> Changes since v3:
>
> - Rework prog->bound_netdev refcounting (Jakub/Martin)
>
>   Now it's based on the offload.c framework. It mostly fits, except
>   I had to automatically insert a HT entry for the netdev. In the
>   offloaded case, the netdev is added via a call to
>   bpf_offload_dev_netdev_register from the driver init path; with
>   dev-bound programs, we have to manually add (and remove) the entry.
>
>   As suggested by Toke, I'm also prohibiting putting dev-bound programs
>   into prog-array map; essentially prohibiting tail calling into it.
>   I'm also disabling freplace of the dev-bound programs. Both of those
>   restrictions can be loosened up eventually.

I thought it would be a shame that we don't support at least freplace
programs from the get-go (as that would exclude libxdp from taking
advantage of this). So see below for a patch implementing this :)

-Toke




commit 3abb333e5fd2e8a0920b77013499bdae0ee3db43
Author: Toke Høiland-Jørgensen <toke@redhat.com>
Date:   Thu Dec 8 23:10:54 2022 +0100

    bpf: Support consuming XDP HW metadata from fext programs
    
    Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
    programs that consume HW metadata, implement support for propagating the
    offload information. The extension program doesn't need to set a flag or
    ifindex; these will just be propagated from the target by the verifier.
    We need to create a separate offload object for the extension program,
    though, since it can be reattached to a different program later (which
    means we can't just inherit the offload information from the target).
    
    An additional check is added on attach that the new target is compatible
    with the offload information in the extension prog.
    
    Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b46b60f4eae1..cfa5c847cf2c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2482,6 +2482,7 @@ void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
 void unpriv_ebpf_notify(int new_state);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
+int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev);
 int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
 void bpf_offload_bound_netdev_unregister(struct net_device *dev);
 
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index bad8bab916eb..b059a7b53457 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -83,36 +83,25 @@ bpf_offload_find_netdev(struct net_device *netdev)
 	return rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
 }
 
-int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
+int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev)
 {
 	struct bpf_offload_netdev *ondev;
 	struct bpf_prog_offload *offload;
 	int err;
 
-	if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
-	    attr->prog_type != BPF_PROG_TYPE_XDP)
+	if (!netdev)
 		return -EINVAL;
 
-	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
-		return -EINVAL;
+	err = __bpf_offload_init();
+	if (err)
+		return err;
 
 	offload = kzalloc(sizeof(*offload), GFP_USER);
 	if (!offload)
 		return -ENOMEM;
 
-	err = __bpf_offload_init();
-	if (err)
-		return err;
-
 	offload->prog = prog;
-
-	offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
-					   attr->prog_ifindex);
-	err = bpf_dev_offload_check(offload->netdev);
-	if (err)
-		goto err_maybe_put;
-
-	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
+	offload->netdev = netdev;
 
 	down_write(&bpf_devs_lock);
 	ondev = bpf_offload_find_netdev(offload->netdev);
@@ -135,19 +124,46 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
 	offload->offdev = ondev->offdev;
 	prog->aux->offload = offload;
 	list_add_tail(&offload->offloads, &ondev->progs);
-	dev_put(offload->netdev);
 	up_write(&bpf_devs_lock);
 
 	return 0;
 err_unlock:
 	up_write(&bpf_devs_lock);
-err_maybe_put:
-	if (offload->netdev)
-		dev_put(offload->netdev);
 	kfree(offload);
 	return err;
 }
 
+int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
+{
+	struct net_device *netdev;
+	int err;
+
+	if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
+	    attr->prog_type != BPF_PROG_TYPE_XDP)
+		return -EINVAL;
+
+	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
+		return -EINVAL;
+
+	netdev = dev_get_by_index(current->nsproxy->net_ns, attr->prog_ifindex);
+	if (!netdev)
+		return -EINVAL;
+
+	err = bpf_dev_offload_check(netdev);
+	if (err)
+		goto out;
+
+	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
+
+	err = __bpf_prog_offload_init(prog, netdev);
+	if (err)
+		goto out;
+
+out:
+	dev_put(netdev);
+	return err;
+}
+
 int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
 {
 	struct bpf_prog_offload *offload;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b345a273f7d0..606e6de5f716 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 			goto out_put_prog;
 		}
 
+		if (bpf_prog_is_dev_bound(tgt_prog->aux) &&
+		    (bpf_prog_is_offloaded(tgt_prog->aux) ||
+		     !bpf_prog_is_dev_bound(prog->aux) ||
+		     !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
+			err = -EINVAL;
+			goto out_put_prog;
+		}
+
 		key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
 	}
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bc8d9b8d4f47..d92e28dd220e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16379,11 +16379,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 	if (tgt_prog) {
 		struct bpf_prog_aux *aux = tgt_prog->aux;
 
-		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
-			bpf_log(log, "Replacing device-bound programs not supported\n");
-			return -EINVAL;
-		}
-
 		for (i = 0; i < aux->func_info_cnt; i++)
 			if (aux->func_info[i].type_id == btf_id) {
 				subprog = i;
@@ -16644,10 +16639,22 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	if (tgt_prog && prog->type == BPF_PROG_TYPE_EXT) {
 		/* to make freplace equivalent to their targets, they need to
 		 * inherit env->ops and expected_attach_type for the rest of the
-		 * verification
+		 * verification; we also need to propagate the prog offload data
+		 * for resolving kfuncs.
 		 */
 		env->ops = bpf_verifier_ops[tgt_prog->type];
 		prog->expected_attach_type = tgt_prog->expected_attach_type;
+
+		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
+			if (bpf_prog_is_offloaded(tgt_prog->aux))
+				return -EINVAL;
+
+			prog->aux->dev_bound = true;
+			ret = __bpf_prog_offload_init(prog,
+						      tgt_prog->aux->offload->netdev);
+			if (ret)
+				return ret;
+		}
 	}
 
 	/* store info about the attachment target that will be used later */


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-06  2:45 ` [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs Stanislav Fomichev
                     ` (2 preceding siblings ...)
  2022-12-08  5:00   ` Jakub Kicinski
@ 2022-12-08 22:39   ` Toke Høiland-Jørgensen
  2022-12-08 23:46     ` Stanislav Fomichev
  2022-12-09 11:10   ` Jesper Dangaard Brouer
  4 siblings, 1 reply; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-08 22:39 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> There is an ndo handler per kfunc, the verifier replaces a call to the
> generic kfunc with a call to the per-device one.
>
> For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> implements all possible metadata kfuncs. Not all devices have to
> implement them. If kfunc is not supported by the target device,
> the default implementation is called instead.

So one unfortunate consequence of this "fallback to the default
implementation" is that it's really easy to get a step wrong and end up
with something that doesn't work. Specifically, if you load an XDP
program that calls the metadata kfuncs, but don't set the ifindex and
flag on load, the kfunc resolution will work just fine, but you'll end
up calling the default kfunc implementations (and get no data). I ran
into this multiple times just now when playing around with it and
implementing the freplace support.

So I really think it would be a better user experience if we completely
block (with a nice error message!) the calling of the metadata kfuncs if
the program is not device-bound...

Another UX thing I ran into is that libbpf will bail out if it can't
find the kfunc in the kernel vmlinux, even if the code calling the
function is behind an always-false if statement (which would be
eliminated as dead code from the verifier). This makes it a bit hard to
conditionally use them. Should libbpf just allow the load without
performing the relocation (and let the verifier worry about it), or
should we have a bpf_core_kfunc_exists() macro to use for checking?
Maybe both?

> Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
> we treat prog_ifindex as target device for kfunc resolution.

[...]

> -	if (!bpf_prog_map_compatible(map, prog)) {
> -		bpf_prog_put(prog);
> -		return ERR_PTR(-EINVAL);
> -	}
> +	/* When tail-calling from a non-dev-bound program to a dev-bound one,
> +	 * XDP metadata helpers should be disabled. Until it's implemented,
> +	 * prohibit adding dev-bound programs to tail-call maps.
> +	 */
> +	if (bpf_prog_is_dev_bound(prog->aux))
> +		goto err;
> +
> +	if (!bpf_prog_map_compatible(map, prog))
> +		goto err;

I think it's better to move the new check into bpf_prog_map_compatible()
itself; that way it'll cover cpumaps and devmaps as well :)

-Toke


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-08 19:07     ` Stanislav Fomichev
@ 2022-12-08 22:53       ` Martin KaFai Lau
  2022-12-08 23:45         ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2022-12-08 22:53 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On 12/8/22 11:07 AM, Stanislav Fomichev wrote:
>>> @@ -102,11 +112,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>>>        if (err)
>>>                goto err_maybe_put;
>>>
>>> +     prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
>>> +
>>
>> If I read the set correctly, bpf prog can either use metadata kfunc or offload
>> but not both. It is fine to start with only supporting metadata kfunc when there
>> is no offload but will be useful to understand the reason. I assume an offloaded
>> bpf prog should still be able to call the bpf helpers like adjust_head/tail and
>> the same should go for any kfunc?
> 
> Yes, I'm assuming there should be some work on the offloaded device
> drivers to support metadata kfuncs.
> Offloaded kfuncs, in general, seem hard (how do we call kernel func
> from the device-offloaded prog?); so refusing kfuncs early for the
> offloaded case seems fair for now?

Ah, ok.  I was actually thinking the HW offloaded prog can just use the software 
ndo_* kfunc (like other bpf-helpers).  From skimming some 
bpf_prog_offload_ops:prepare implementation, I think you are right and it seems 
BPF_PSEUDO_KFUNC_CALL has not been recognized yet.

[ ... ]

>>> @@ -226,10 +263,17 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
>>>
>>>    void bpf_prog_offload_destroy(struct bpf_prog *prog)
>>>    {
>>> +     struct net_device *netdev = NULL;
>>> +
>>>        down_write(&bpf_devs_lock);
>>> -     if (prog->aux->offload)
>>> +     if (prog->aux->offload) {
>>> +             netdev = prog->aux->offload->netdev;
>>>                __bpf_prog_offload_destroy(prog);
>>> +     }
>>>        up_write(&bpf_devs_lock);
>>> +
>>> +     if (netdev)
>>
>> May be I have missed a refcnt or lock somewhere.  Is it possible that netdev may
>> have been freed?
> 
> Yeah, with the offload framework, there are no refcnts. We put an
> "offloaded" device into a separate hashtable (protected by
> rtnl/semaphore).
> maybe_remove_bound_netdev will re-grab the locks (due to ordering:
> rtnl->bpf_devs_lock) and remove the device from the hashtable if it's
> still there.
> At least this is how, I think, it should work; LMK if something is
> still fishy here...
> 
> Or is the concern here that somebody might allocate new netdev reusing
> the same address? I think I have enough checks in
> maybe_remove_bound_netdev to guard against that. Or, at least, to make
> it safe :-)

Race is ok because ondev needs to be removed anyway when '!ondev->offdev && 
list_empty(&ondev->progs)'?  hmmm... tricky, please add a comment. :)

Why can't it be done together under the bpf_devs_lock above?  Can't the above
take an extra rtnl_lock before bpf_devs_lock?


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-06  2:45 ` [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata Stanislav Fomichev
@ 2022-12-08 22:59   ` Toke Høiland-Jørgensen
  2022-12-08 23:45     ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-08 22:59 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, Saeed Mahameed, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Stanislav Fomichev <sdf@google.com> writes:

> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> XDP ctx to do this.

So I finally managed to get enough ducks in row to actually benchmark
this. With the caveat that I suddenly can't get the timestamp support to
work (it was working in an earlier version, but now
timestamp_supported() just returns false). I'm not sure if this is an
issue with the enablement patch, or if I just haven't gotten the
hardware configured properly. I'll investigate some more, but figured
I'd post these results now:

Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
Overhead:                   1,754,153 pps /  2.86 ns/pkt

As per the above, this is with calling three kfuncs/pkt
(metadata_supported(), rx_hash_supported() and rx_hash()). So that's
~0.95 ns per function call, which is a bit less, but not far off from
the ~1.2 ns that I'm used to. The tests where I accidentally called the
default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
definitely in that ballpark.

I'm not doing anything with the data, just reading it into an on-stack
buffer, so this is the smallest possible delta from just getting the
data out of the driver. I did confirm that the call instructions are
still in the BPF program bytecode when it's dumped back out from the
kernel.

-Toke


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-08 22:53       ` Martin KaFai Lau
@ 2022-12-08 23:45         ` Stanislav Fomichev
  0 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 23:45 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: ast, daniel, andrii, song, yhs, john.fastabend, kpsingh, haoluo,
	jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, bpf

On Thu, Dec 8, 2022 at 2:53 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/8/22 11:07 AM, Stanislav Fomichev wrote:
> >>> @@ -102,11 +112,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> >>>        if (err)
> >>>                goto err_maybe_put;
> >>>
> >>> +     prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
> >>> +
> >>
> >> If I read the set correctly, bpf prog can either use metadata kfunc or offload
> >> but not both. It is fine to start with only supporting metadata kfunc when there
> >> is no offload but will be useful to understand the reason. I assume an offloaded
> >> bpf prog should still be able to call the bpf helpers like adjust_head/tail and
> >> the same should go for any kfunc?
> >
> > Yes, I'm assuming there should be some work on the offloaded device
> > drivers to support metadata kfuncs.
> > Offloaded kfuncs, in general, seem hard (how do we call kernel func
> > from the device-offloaded prog?); so refusing kfuncs early for the
> > offloaded case seems fair for now?
>
> Ah, ok.  I was actually thinking the HW offloaded prog can just use the software
> ndo_* kfunc (like other bpf-helpers).  From skimming some
> bpf_prog_offload_ops:prepare implementation, I think you are right and it seems
> BPF_PSEUDO_KFUNC_CALL has not been recognized yet.
>
> [ ... ]
>
> >>> @@ -226,10 +263,17 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
> >>>
> >>>    void bpf_prog_offload_destroy(struct bpf_prog *prog)
> >>>    {
> >>> +     struct net_device *netdev = NULL;
> >>> +
> >>>        down_write(&bpf_devs_lock);
> >>> -     if (prog->aux->offload)
> >>> +     if (prog->aux->offload) {
> >>> +             netdev = prog->aux->offload->netdev;
> >>>                __bpf_prog_offload_destroy(prog);
> >>> +     }
> >>>        up_write(&bpf_devs_lock);
> >>> +
> >>> +     if (netdev)
> >>
> >> May be I have missed a refcnt or lock somewhere.  Is it possible that netdev may
> >> have been freed?
> >
> > Yeah, with the offload framework, there are no refcnts. We put an
> > "offloaded" device into a separate hashtable (protected by
> > rtnl/semaphore).
> > maybe_remove_bound_netdev will re-grab the locks (due to ordering:
> > rtnl->bpf_devs_lock) and remove the device from the hashtable if it's
> > still there.
> > At least this is how, I think, it should work; LMK if something is
> > still fishy here...
> >
> > Or is the concern here that somebody might allocate new netdev reusing
> > the same address? I think I have enough checks in
> > maybe_remove_bound_netdev to guard against that. Or, at least, to make
> > it safe :-)
>
> Race is ok because ondev needs to be removed anyway when '!ondev->offdev &&
> list_empty(&ondev->progs)'?  hmmm... tricky, please add a comment. :)
>
> Why it cannot be done together in the bpf_devs_lock above?  The above cannot
> take an extra rtnl_lock before bpf_devs_lock?

Hm, let's take an extra rtnl to avoid this complexity, agreed. I guess
I was trying to avoid taking it, but this path is still 'dev_bound ==
true' protected, so shouldn't affect the rest of the progs.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-08 22:59   ` Toke Høiland-Jørgensen
@ 2022-12-08 23:45     ` Stanislav Fomichev
  2022-12-09  0:02       ` [xdp-hints] " Toke Høiland-Jørgensen
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 23:45 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Saeed Mahameed, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > From: Toke Høiland-Jørgensen <toke@redhat.com>
> >
> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> > XDP ctx to do this.
>
> So I finally managed to get enough ducks in row to actually benchmark
> this. With the caveat that I suddenly can't get the timestamp support to
> work (it was working in an earlier version, but now
> timestamp_supported() just returns false). I'm not sure if this is an
> issue with the enablement patch, or if I just haven't gotten the
> hardware configured properly. I'll investigate some more, but figured
> I'd post these results now:
>
> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>
> As per the above, this is with calling three kfuncs/pkt
> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
> ~0.95 ns per function call, which is a bit less, but not far off from
> the ~1.2 ns that I'm used to. The tests where I accidentally called the
> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
> definitely in that ballpark.
>
> I'm not doing anything with the data, just reading it into an on-stack
> buffer, so this is the smallest possible delta from just getting the
> data out of the driver. I did confirm that the call instructions are
> still in the BPF program bytecode when it's dumped back out from the
> kernel.
>
> -Toke
>

Oh, that's great, thanks for running the numbers! Will definitely
reference them in v4!
Presumably, we should be able to at least unroll most of the
_supported callbacks if we want, they should be relatively easy; but
the numbers look fine as is?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-08 22:39   ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-12-08 23:46     ` Stanislav Fomichev
  2022-12-09  0:07       ` [xdp-hints] " Toke Høiland-Jørgensen
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 23:46 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Thu, Dec 8, 2022 at 2:39 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > There is an ndo handler per kfunc, the verifier replaces a call to the
> > generic kfunc with a call to the per-device one.
> >
> > For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> > implements all possible metadata kfuncs. Not all devices have to
> > implement them. If kfunc is not supported by the target device,
> > the default implementation is called instead.
>
> So one unfortunate consequence of this "fallback to the default
> implementation" is that it's really easy to get a step wrong and end up
> with something that doesn't work. Specifically, if you load an XDP
> program that calls the metadata kfuncs, but don't set the ifindex and
> flag on load, the kfunc resolution will work just fine, but you'll end
> up calling the default kfunc implementations (and get no data). I ran
> into this multiple times just now when playing around with it and
> implementing the freplace support.
>
> So I really think it would be a better user experience if we completely
> block (with a nice error message!) the calling of the metadata kfuncs if
> the program is not device-bound...

Oh, right, that's a good point. Having defaults for dev-bound only
makes total sense.

> Another UX thing I ran into is that libbpf will bail out if it can't
> find the kfunc in the kernel vmlinux, even if the code calling the
> function is behind an always-false if statement (which would be
> eliminated as dead code from the verifier). This makes it a bit hard to
> conditionally use them. Should libbpf just allow the load without
> performing the relocation (and let the verifier worry about it), or
> should we have a bpf_core_kfunc_exists() macro to use for checking?
> Maybe both?

I'm not sure how libbpf can allow the load without performing the
relocation; maybe I'm missing something.
IIUC, libbpf uses the kfunc name (from the relocation?) and replaces
it with the kfunc id, right?

Having bpf_core_kfunc_exists would help, but this probably needs
compiler work first to preserve some of the kfunc traces in vmlinux.h?

So yeah, I don't have any good ideas/suggestions here on how to make
it all magically work :-(

> > Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
> > we treat prog_ifindex as target device for kfunc resolution.
>
> [...]
>
> > -     if (!bpf_prog_map_compatible(map, prog)) {
> > -             bpf_prog_put(prog);
> > -             return ERR_PTR(-EINVAL);
> > -     }
> > +     /* When tail-calling from a non-dev-bound program to a dev-bound one,
> > +      * XDP metadata helpers should be disabled. Until it's implemented,
> > +      * prohibit adding dev-bound programs to tail-call maps.
> > +      */
> > +     if (bpf_prog_is_dev_bound(prog->aux))
> > +             goto err;
> > +
> > +     if (!bpf_prog_map_compatible(map, prog))
> > +             goto err;
>
> I think it's better to move the new check into bpf_prog_map_compatible()
> itself; that way it'll cover cpumaps and devmaps as well :)

Will do, thanks!

> -Toke
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] [PATCH bpf-next v3 00/12] xdp: hints via kfuncs
  2022-12-08 22:28 ` [xdp-hints] [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Toke Høiland-Jørgensen
@ 2022-12-08 23:47   ` Stanislav Fomichev
  2022-12-09  0:14     ` [xdp-hints] " Toke Høiland-Jørgensen
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-08 23:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Thu, Dec 8, 2022 at 2:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > Please see the first patch in the series for the overall
> > design and use-cases.
> >
> > Changes since v3:
> >
> > - Rework prog->bound_netdev refcounting (Jakub/Martin)
> >
> >   Now it's based on the offload.c framework. It mostly fits, except
> >   I had to automatically insert a HT entry for the netdev. In the
> >   offloaded case, the netdev is added via a call to
> >   bpf_offload_dev_netdev_register from the driver init path; with
> >   dev-bound programs, we have to manually add (and remove) the entry.
> >
> >   As suggested by Toke, I'm also prohibiting putting dev-bound programs
> >   into prog-array map; essentially prohibiting tail calling into it.
> >   I'm also disabling freplace of the dev-bound programs. Both of those
> >   restrictions can be loosened up eventually.
>
> I thought it would be a shame that we don't support at least freplace
> programs from the get-go (as that would exclude libxdp from taking
> advantage of this). So see below for a patch implementing this :)
>
> -Toke

Damn, now I need to write a selftest :-)
But seriously, thank you for taking care of this; will try to include it,
preserving your SoB!


> commit 3abb333e5fd2e8a0920b77013499bdae0ee3db43
> Author: Toke Høiland-Jørgensen <toke@redhat.com>
> Date:   Thu Dec 8 23:10:54 2022 +0100
>
>     bpf: Support consuming XDP HW metadata from fext programs
>
>     Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
>     programs that consume HW metadata, implement support for propagating the
>     offload information. The extension program doesn't need to set a flag or
>     ifindex, it these will just be propagated from the target by the verifier.
>     We need to create a separate offload object for the extension program,
>     though, since it can be reattached to a different program later (which
>     means we can't just inhering the offload information from the target).
>
>     An additional check is added on attach that the new target is compatible
>     with the offload information in the extension prog.
>
>     Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b46b60f4eae1..cfa5c847cf2c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2482,6 +2482,7 @@ void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
>  void unpriv_ebpf_notify(int new_state);
>
>  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
> +int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev);
>  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
>  void bpf_offload_bound_netdev_unregister(struct net_device *dev);
>
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index bad8bab916eb..b059a7b53457 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -83,36 +83,25 @@ bpf_offload_find_netdev(struct net_device *netdev)
>         return rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
>  }
>
> -int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> +int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev)
>  {
>         struct bpf_offload_netdev *ondev;
>         struct bpf_prog_offload *offload;
>         int err;
>
> -       if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
> -           attr->prog_type != BPF_PROG_TYPE_XDP)
> +       if (!netdev)
>                 return -EINVAL;
>
> -       if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
> -               return -EINVAL;
> +       err = __bpf_offload_init();
> +       if (err)
> +               return err;
>
>         offload = kzalloc(sizeof(*offload), GFP_USER);
>         if (!offload)
>                 return -ENOMEM;
>
> -       err = __bpf_offload_init();
> -       if (err)
> -               return err;
> -
>         offload->prog = prog;
> -
> -       offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
> -                                          attr->prog_ifindex);
> -       err = bpf_dev_offload_check(offload->netdev);
> -       if (err)
> -               goto err_maybe_put;
> -
> -       prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
> +       offload->netdev = netdev;
>
>         down_write(&bpf_devs_lock);
>         ondev = bpf_offload_find_netdev(offload->netdev);
> @@ -135,19 +124,46 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>         offload->offdev = ondev->offdev;
>         prog->aux->offload = offload;
>         list_add_tail(&offload->offloads, &ondev->progs);
> -       dev_put(offload->netdev);
>         up_write(&bpf_devs_lock);
>
>         return 0;
>  err_unlock:
>         up_write(&bpf_devs_lock);
> -err_maybe_put:
> -       if (offload->netdev)
> -               dev_put(offload->netdev);
>         kfree(offload);
>         return err;
>  }
>
> +int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> +{
> +       struct net_device *netdev;
> +       int err;
> +
> +       if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
> +           attr->prog_type != BPF_PROG_TYPE_XDP)
> +               return -EINVAL;
> +
> +       if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
> +               return -EINVAL;
> +
> +       netdev = dev_get_by_index(current->nsproxy->net_ns, attr->prog_ifindex);
> +       if (!netdev)
> +               return -EINVAL;
> +
> +       err = bpf_dev_offload_check(netdev);
> +       if (err)
> +               goto out;
> +
> +       prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
> +
> +       err = __bpf_prog_offload_init(prog, netdev);
> +       if (err)
> +               goto out;
> +
> +out:
> +       dev_put(netdev);
> +       return err;
> +}
> +
>  int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
>  {
>         struct bpf_prog_offload *offload;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index b345a273f7d0..606e6de5f716 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>                         goto out_put_prog;
>                 }
>
> +               if (bpf_prog_is_dev_bound(tgt_prog->aux) &&
> +                   (bpf_prog_is_offloaded(tgt_prog->aux) ||
> +                    !bpf_prog_is_dev_bound(prog->aux) ||
> +                    !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
> +                       err = -EINVAL;
> +                       goto out_put_prog;
> +               }
> +
>                 key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
>         }
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index bc8d9b8d4f47..d92e28dd220e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -16379,11 +16379,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>         if (tgt_prog) {
>                 struct bpf_prog_aux *aux = tgt_prog->aux;
>
> -               if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> -                       bpf_log(log, "Replacing device-bound programs not supported\n");
> -                       return -EINVAL;
> -               }
> -
>                 for (i = 0; i < aux->func_info_cnt; i++)
>                         if (aux->func_info[i].type_id == btf_id) {
>                                 subprog = i;
> @@ -16644,10 +16639,22 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
>         if (tgt_prog && prog->type == BPF_PROG_TYPE_EXT) {
>                 /* to make freplace equivalent to their targets, they need to
>                  * inherit env->ops and expected_attach_type for the rest of the
> -                * verification
> +                * verification; we also need to propagate the prog offload data
> +                * for resolving kfuncs.
>                  */
>                 env->ops = bpf_verifier_ops[tgt_prog->type];
>                 prog->expected_attach_type = tgt_prog->expected_attach_type;
> +
> +               if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
> +                       if (bpf_prog_is_offloaded(tgt_prog->aux))
> +                               return -EINVAL;
> +
> +                       prog->aux->dev_bound = true;
> +                       ret = __bpf_prog_offload_init(prog,
> +                                                     tgt_prog->aux->offload->netdev);
> +                       if (ret)
> +                               return ret;
> +               }
>         }
>
>         /* store info about the attachment target that will be used later */
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-08 23:45     ` Stanislav Fomichev
@ 2022-12-09  0:02       ` Toke Høiland-Jørgensen
  2022-12-09  0:07         ` Alexei Starovoitov
  0 siblings, 1 reply; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-09  0:02 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, Saeed Mahameed, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev

Stanislav Fomichev <sdf@google.com> writes:

> On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >
>> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>> > XDP ctx to do this.
>>
>> So I finally managed to get enough ducks in a row to actually benchmark
>> this. With the caveat that I suddenly can't get the timestamp support to
>> work (it was working in an earlier version, but now
>> timestamp_supported() just returns false). I'm not sure if this is an
>> issue with the enablement patch, or if I just haven't gotten the
>> hardware configured properly. I'll investigate some more, but figured
>> I'd post these results now:
>>
>> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
>> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
>> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>>
>> As per the above, this is with calling three kfuncs/pkt
>> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
>> ~0.95 ns per function call, which is a bit less, but not far off from
>> the ~1.2 ns that I'm used to. The tests where I accidentally called the
>> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
>> definitely in that ballpark.
>>
>> I'm not doing anything with the data, just reading it into an on-stack
>> buffer, so this is the smallest possible delta from just getting the
>> data out of the driver. I did confirm that the call instructions are
>> still in the BPF program bytecode when it's dumped back out from the
>> kernel.
>>
>> -Toke
>>
>
> Oh, that's great, thanks for running the numbers! Will definitely
> reference them in v4!
> Presumably, we should be able to at least unroll most of the
> _supported callbacks if we want, they should be relatively easy; but
> the numbers look fine as is?

Well, this is for one (and a half) piece of metadata. If we extrapolate,
it adds up quickly. Say we add csum and vlan tags, and maybe
another callback to get the type of hash (l3/l4). Those would probably
be relevant for most packets in a fairly common setup. Extrapolating
from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
baseline of 39 ns.

So in that sense I still think unrolling makes sense. At least for the
_supported() calls, as eating a whole function call just for that is
probably a bit much (which I think was also Jakub's point in a sibling
thread somewhere).

-Toke


* Re: [xdp-hints] Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-08 23:46     ` Stanislav Fomichev
@ 2022-12-09  0:07       ` Toke Høiland-Jørgensen
  2022-12-09  2:57         ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-09  0:07 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

>> Another UX thing I ran into is that libbpf will bail out if it can't
>> find the kfunc in the kernel vmlinux, even if the code calling the
>> function is behind an always-false if statement (which would be
>> eliminated as dead code from the verifier). This makes it a bit hard to
>> conditionally use them. Should libbpf just allow the load without
>> performing the relocation (and let the verifier worry about it), or
>> should we have a bpf_core_kfunc_exists() macro to use for checking?
>> Maybe both?
>
> I'm not sure how libbpf can allow the load without performing the
> relocation; maybe I'm missing something.
> IIUC, libbpf uses the kfunc name (from the relocation?) and replaces
> it with the kfunc id, right?

Yeah, so if it can't find the kfunc in vmlinux, just write an id of 0.
This will trip the check at the top of fixup_kfunc_call() in the
verifier, but if the code is hidden behind an always-false branch (an
rodata variable set to zero, say) the instructions should get eliminated
before they reach that point. That way you can at least turn it off at
runtime (after having done some kind of feature detection) without
having to compile it out of your program entirely.

> Having bpf_core_kfunc_exists would help, but this probably needs
> compiler work first to preserve some of the kfunc traces in vmlinux.h?

I am not sure how the existing macros work, TBH. Hopefully someone else
can chime in :)

-Toke
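
[ The pattern under discussion — gating the kfunc call behind an always-false
rodata branch so the verifier eliminates it as dead code — would look roughly
like this on the BPF side. The kfunc declaration and flag name are
illustrative, and this would only load if libbpf were taught to write a zero
id for unresolved kfuncs as suggested above: ]

```c
#include <linux/bpf.h>
#include <stdbool.h>
#include <bpf/bpf_helpers.h>	/* SEC(), __ksym */

/* Stand-in for whatever kfunc the kernel may or may not export;
 * the exact signature is an assumption here. */
extern __u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx) __ksym;

/* Userspace sets this via the skeleton's ->rodata before load,
 * after probing vmlinux BTF for the kfunc; defaults to false. */
const volatile bool have_rx_hash = false;

SEC("xdp")
int rx(struct xdp_md *ctx)
{
	__u32 hash = 0;

	if (have_rx_hash)
		/* Dead code when have_rx_hash == false: the verifier
		 * prunes this branch, so a zero kfunc id here never
		 * trips the check in fixup_kfunc_call(). */
		hash = bpf_xdp_metadata_rx_hash(ctx);

	return XDP_PASS;
}
```

[ The BTF probe that flips have_rx_hash is exactly where a
bpf_core_kfunc_exists()-style helper would slot in. ]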


* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09  0:02       ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-12-09  0:07         ` Alexei Starovoitov
  2022-12-09  0:29           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 61+ messages in thread
From: Alexei Starovoitov @ 2022-12-09  0:07 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Stanislav Fomichev, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed,
	David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development

On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Stanislav Fomichev <sdf@google.com> writes:
> >>
> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >
> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> >> > XDP ctx to do this.
> >>
> >> So I finally managed to get enough ducks in a row to actually benchmark
> >> this. With the caveat that I suddenly can't get the timestamp support to
> >> work (it was working in an earlier version, but now
> >> timestamp_supported() just returns false). I'm not sure if this is an
> >> issue with the enablement patch, or if I just haven't gotten the
> >> hardware configured properly. I'll investigate some more, but figured
> >> I'd post these results now:
> >>
> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
> >>
> >> As per the above, this is with calling three kfuncs/pkt
> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
> >> ~0.95 ns per function call, which is a bit less, but not far off from
> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
> >> definitely in that ballpark.
> >>
> >> I'm not doing anything with the data, just reading it into an on-stack
> >> buffer, so this is the smallest possible delta from just getting the
> >> data out of the driver. I did confirm that the call instructions are
> >> still in the BPF program bytecode when it's dumped back out from the
> >> kernel.
> >>
> >> -Toke
> >>
> >
> > Oh, that's great, thanks for running the numbers! Will definitely
> > reference them in v4!
> > Presumably, we should be able to at least unroll most of the
> > _supported callbacks if we want, they should be relatively easy; but
> > the numbers look fine as is?
>
> >> Well, this is for one (and a half) piece of metadata. If we extrapolate,
> >> it adds up quickly. Say we add csum and vlan tags, and maybe
> another callback to get the type of hash (l3/l4). Those would probably
> be relevant for most packets in a fairly common setup. Extrapolating
> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
> baseline of 39 ns.
>
> So in that sense I still think unrolling makes sense. At least for the
> _supported() calls, as eating a whole function call just for that is
> probably a bit much (which I think was also Jakub's point in a sibling
> thread somewhere).

imo the overhead is tiny enough that we can wait until
generic 'kfunc inlining' infra is ready.

We're planning to dual-compile some_kernel_file.c
into native arch and into bpf arch.
Then the verifier will automatically inline bpf asm
of corresponding kfunc.


* Re: [xdp-hints] Re: [PATCH bpf-next v3 00/12] xdp: hints via kfuncs
  2022-12-08 23:47   ` Stanislav Fomichev
@ 2022-12-09  0:14     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-09  0:14 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

Stanislav Fomichev <sdf@google.com> writes:

> On Thu, Dec 8, 2022 at 2:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > Please see the first patch in the series for the overall
>> > design and use-cases.
>> >
>> > Changes since v3:
>> >
>> > - Rework prog->bound_netdev refcounting (Jakub/Martin)
>> >
>> >   Now it's based on the offload.c framework. It mostly fits, except
>> >   I had to automatically insert a HT entry for the netdev. In the
>> >   offloaded case, the netdev is added via a call to
>> >   bpf_offload_dev_netdev_register from the driver init path; with
>> >   dev-bound programs, we have to manually add (and remove) the entry.
>> >
>> >   As suggested by Toke, I'm also prohibiting putting dev-bound programs
>> >   into prog-array map; essentially prohibiting tail calling into it.
>> >   I'm also disabling freplace of the dev-bound programs. Both of those
>> >   restrictions can be loosened up eventually.
>>
>> I thought it would be a shame that we don't support at least freplace
>> programs from the get-go (as that would exclude libxdp from taking
>> advantage of this). So see below for a patch implementing this :)
>>
>> -Toke
>
> Damn, now I need to write a selftest :-)
> But seriously, thank you for taking care of this; will try to include
> it, preserving your SoB!

Cool, thanks! I just realised I made one mistake in the attach check,
though:

[...]

>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index b345a273f7d0..606e6de5f716 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>>                         goto out_put_prog;
>>                 }
>>
>> +               if (bpf_prog_is_dev_bound(tgt_prog->aux) &&
>> +                   (bpf_prog_is_offloaded(tgt_prog->aux) ||
>> +                    !bpf_prog_is_dev_bound(prog->aux) ||
>> +                    !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {

This should switch the order of the is_dev_bound() checks, like:

+               if (bpf_prog_is_dev_bound(prog->aux) &&
+                   (bpf_prog_is_offloaded(tgt_prog->aux) ||
+                    !bpf_prog_is_dev_bound(tgt_prog->aux) ||
+                    !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {

I.e., first check bpf_prog_is_dev_bound(prog->aux) (the program being
attached), and only perform the other checks if we're attaching
something that has been verified as being dev-bound. It should be fine
to attach a non-devbound function to a devbound parent program (since
that non-devbound function can't call any of the kfuncs).

-Toke



* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09  0:07         ` Alexei Starovoitov
@ 2022-12-09  0:29           ` Toke Høiland-Jørgensen
  2022-12-09  0:32             ` Alexei Starovoitov
  0 siblings, 1 reply; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-09  0:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed,
	David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@google.com> writes:
>> >>
>> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >
>> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>> >> > XDP ctx to do this.
>> >>
>> >> So I finally managed to get enough ducks in a row to actually benchmark
>> >> this. With the caveat that I suddenly can't get the timestamp support to
>> >> work (it was working in an earlier version, but now
>> >> timestamp_supported() just returns false). I'm not sure if this is an
>> >> issue with the enablement patch, or if I just haven't gotten the
>> >> hardware configured properly. I'll investigate some more, but figured
>> >> I'd post these results now:
>> >>
>> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
>> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
>> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>> >>
>> >> As per the above, this is with calling three kfuncs/pkt
>> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
>> >> ~0.95 ns per function call, which is a bit less, but not far off from
>> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
>> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
>> >> definitely in that ballpark.
>> >>
>> >> I'm not doing anything with the data, just reading it into an on-stack
>> >> buffer, so this is the smallest possible delta from just getting the
>> >> data out of the driver. I did confirm that the call instructions are
>> >> still in the BPF program bytecode when it's dumped back out from the
>> >> kernel.
>> >>
>> >> -Toke
>> >>
>> >
>> > Oh, that's great, thanks for running the numbers! Will definitely
>> > reference them in v4!
>> > Presumably, we should be able to at least unroll most of the
>> > _supported callbacks if we want, they should be relatively easy; but
>> > the numbers look fine as is?
>>
>> >> Well, this is for one (and a half) piece of metadata. If we extrapolate,
>> >> it adds up quickly. Say we add csum and vlan tags, and maybe
>> another callback to get the type of hash (l3/l4). Those would probably
>> be relevant for most packets in a fairly common setup. Extrapolating
>> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
>> baseline of 39 ns.
>>
>> So in that sense I still think unrolling makes sense. At least for the
>> _supported() calls, as eating a whole function call just for that is
>> probably a bit much (which I think was also Jakub's point in a sibling
>> thread somewhere).
>
> imo the overhead is tiny enough that we can wait until
> generic 'kfunc inlining' infra is ready.
>
> We're planning to dual-compile some_kernel_file.c
> into native arch and into bpf arch.
> Then the verifier will automatically inline bpf asm
> of corresponding kfunc.

Is that "planning" or "actively working on"? Just trying to get a sense
of the time frames here, as this sounds neat, but also something that
could potentially require quite a bit of fiddling with the build system
to get to work? :)

-Toke



* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09  0:29           ` Toke Høiland-Jørgensen
@ 2022-12-09  0:32             ` Alexei Starovoitov
  2022-12-09  0:53               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 61+ messages in thread
From: Alexei Starovoitov @ 2022-12-09  0:32 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Stanislav Fomichev, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed,
	David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development

On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Stanislav Fomichev <sdf@google.com> writes:
> >>
> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Stanislav Fomichev <sdf@google.com> writes:
> >> >>
> >> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> >
> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> >> >> > XDP ctx to do this.
> >> >>
> >> >> So I finally managed to get enough ducks in a row to actually benchmark
> >> >> this. With the caveat that I suddenly can't get the timestamp support to
> >> >> work (it was working in an earlier version, but now
> >> >> timestamp_supported() just returns false). I'm not sure if this is an
> >> >> issue with the enablement patch, or if I just haven't gotten the
> >> >> hardware configured properly. I'll investigate some more, but figured
> >> >> I'd post these results now:
> >> >>
> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
> >> >>
> >> >> As per the above, this is with calling three kfuncs/pkt
> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
> >> >> definitely in that ballpark.
> >> >>
> >> >> I'm not doing anything with the data, just reading it into an on-stack
> >> >> buffer, so this is the smallest possible delta from just getting the
> >> >> data out of the driver. I did confirm that the call instructions are
> >> >> still in the BPF program bytecode when it's dumped back out from the
> >> >> kernel.
> >> >>
> >> >> -Toke
> >> >>
> >> >
> >> > Oh, that's great, thanks for running the numbers! Will definitely
> >> > reference them in v4!
> >> > Presumably, we should be able to at least unroll most of the
> >> > _supported callbacks if we want, they should be relatively easy; but
> >> > the numbers look fine as is?
> >>
> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate,
> >> >> it adds up quickly. Say we add csum and vlan tags, and maybe
> >> another callback to get the type of hash (l3/l4). Those would probably
> >> be relevant for most packets in a fairly common setup. Extrapolating
> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
> >> baseline of 39 ns.
> >>
> >> So in that sense I still think unrolling makes sense. At least for the
> >> _supported() calls, as eating a whole function call just for that is
> >> probably a bit much (which I think was also Jakub's point in a sibling
> >> thread somewhere).
> >
> > imo the overhead is tiny enough that we can wait until
> > generic 'kfunc inlining' infra is ready.
> >
> > We're planning to dual-compile some_kernel_file.c
> > into native arch and into bpf arch.
> > Then the verifier will automatically inline bpf asm
> > of corresponding kfunc.
>
> Is that "planning" or "actively working on"? Just trying to get a sense
> of the time frames here, as this sounds neat, but also something that
> could potentially require quite a bit of fiddling with the build system
> to get to work? :)

"planning", but regardless how long it takes I'd rather not
add any more tech debt in the form of manual bpf asm generation.
We have too much of it already: gen_lookup, convert_ctx_access, etc.


* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09  0:32             ` Alexei Starovoitov
@ 2022-12-09  0:53               ` Toke Høiland-Jørgensen
  2022-12-09  2:57                 ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-09  0:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed,
	David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@google.com> writes:
>> >>
>> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >>
>> >> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >
>> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>> >> >> > XDP ctx to do this.
>> >> >>
>> >> >> So I finally managed to get enough ducks in a row to actually benchmark
>> >> >> this. With the caveat that I suddenly can't get the timestamp support to
>> >> >> work (it was working in an earlier version, but now
>> >> >> timestamp_supported() just returns false). I'm not sure if this is an
>> >> >> issue with the enablement patch, or if I just haven't gotten the
>> >> >> hardware configured properly. I'll investigate some more, but figured
>> >> >> I'd post these results now:
>> >> >>
>> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
>> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
>> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>> >> >>
>> >> >> As per the above, this is with calling three kfuncs/pkt
>> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
>> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
>> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
>> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
>> >> >> definitely in that ballpark.
>> >> >>
>> >> >> I'm not doing anything with the data, just reading it into an on-stack
>> >> >> buffer, so this is the smallest possible delta from just getting the
>> >> >> data out of the driver. I did confirm that the call instructions are
>> >> >> still in the BPF program bytecode when it's dumped back out from the
>> >> >> kernel.
>> >> >>
>> >> >> -Toke
>> >> >>
>> >> >
>> >> > Oh, that's great, thanks for running the numbers! Will definitely
>> >> > reference them in v4!
>> >> > Presumably, we should be able to at least unroll most of the
>> >> > _supported callbacks if we want, they should be relatively easy; but
>> >> > the numbers look fine as is?
>> >>
>> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate,
>> >> >> it adds up quickly. Say we add csum and vlan tags, and maybe
>> >> another callback to get the type of hash (l3/l4). Those would probably
>> >> be relevant for most packets in a fairly common setup. Extrapolating
>> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
>> >> baseline of 39 ns.
>> >>
>> >> So in that sense I still think unrolling makes sense. At least for the
>> >> _supported() calls, as eating a whole function call just for that is
>> >> probably a bit much (which I think was also Jakub's point in a sibling
>> >> thread somewhere).
>> >
>> > imo the overhead is tiny enough that we can wait until
>> > generic 'kfunc inlining' infra is ready.
>> >
>> > We're planning to dual-compile some_kernel_file.c
>> > into native arch and into bpf arch.
>> > Then the verifier will automatically inline bpf asm
>> > of corresponding kfunc.
>>
>> Is that "planning" or "actively working on"? Just trying to get a sense
>> of the time frames here, as this sounds neat, but also something that
>> could potentially require quite a bit of fiddling with the build system
>> to get to work? :)
>
> "planning", but regardless how long it takes I'd rather not
> add any more tech debt in the form of manual bpf asm generation.
> We have too much of it already: gen_lookup, convert_ctx_access, etc.

Right, I'm no fan of the manual ASM stuff either. However, if we're
stuck with the function call overhead for the foreseeable future, maybe
we should think about other ways of cutting down the number of function
calls needed?

One thing I can think of is to get rid of the individual _supported()
kfuncs and instead have a single one that lets you query multiple
features at once, like:

__u64 features_supported, features_wanted = XDP_META_RX_HASH | XDP_META_TIMESTAMP;

features_supported = bpf_xdp_metadata_query_features(ctx, features_wanted);

if (features_supported & XDP_META_RX_HASH)
  hash = bpf_xdp_metadata_rx_hash(ctx);

...etc


-Toke


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-08 19:07     ` Stanislav Fomichev
@ 2022-12-09  1:30       ` Jakub Kicinski
  2022-12-09  2:57         ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-12-09  1:30 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Thu, 8 Dec 2022 11:07:43 -0800 Stanislav Fomichev wrote:
> > >       bpf_free_used_maps(aux);
> > >       bpf_free_used_btfs(aux);
> > > -     if (bpf_prog_is_offloaded(aux))
> > > +     if (bpf_prog_is_dev_bound(aux))
> > >               bpf_prog_offload_destroy(aux->prog);  
> >
> > This also looks a touch like a mix of terms (condition vs function
> > called).  
> 
> Here, not sure, open to suggestions. These
> bpf_prog_offload_init/bpf_prog_offload_destroy are generic enough
> (now) that I'm calling them for both dev_bound/offloaded.
> 
> The following paths trigger for both offloaded/dev_bound cases:
> 
> if (bpf_prog_is_dev_bound()) bpf_prog_offload_init();
> if (bpf_prog_is_dev_bound()) bpf_prog_offload_destroy();
> 
> Do you think it's worth it having completely separate
> dev_bound/offloaded paths? Or, alternatively, can rename to
> bpf_prog_dev_bound_{init,destroy} but still handle both cases?

Any offload should be bound, right? So I think functions which handle
both can use the bound naming scheme, only the offload-specific ones 
should explicitly use offload?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-09  0:07       ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-12-09  2:57         ` Stanislav Fomichev
  2022-12-10  0:42           ` Martin KaFai Lau
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-09  2:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Jakub Kicinski,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Thu, Dec 8, 2022 at 4:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> >> Another UX thing I ran into is that libbpf will bail out if it can't
> >> find the kfunc in the kernel vmlinux, even if the code calling the
> >> function is behind an always-false if statement (which would be
> >> eliminated as dead code from the verifier). This makes it a bit hard to
> >> conditionally use them. Should libbpf just allow the load without
> >> performing the relocation (and let the verifier worry about it), or
> >> should we have a bpf_core_kfunc_exists() macro to use for checking?
> >> Maybe both?
> >
> > I'm not sure how libbpf can allow the load without performing the
> > relocation; maybe I'm missing something.
> > IIUC, libbpf uses the kfunc name (from the relocation?) and replaces
> > it with the kfunc id, right?
>
> Yeah, so if it can't find the kfunc in vmlinux, just write an id of 0.
> This will trip the check at the top of fixup_kfunc_call() in the
> verifier, but if the code is hidden behind an always-false branch (an
> rodata variable set to zero, say) the instructions should get eliminated
> before they reach that point. That way you can at least turn it off at
> runtime (after having done some kind of feature detection) without
> having to compile it out of your program entirely.
>
> > Having bpf_core_kfunc_exists would help, but this probably needs
> > compiler work first to preserve some of the kfunc traces in vmlinux.h?
>
> I am not sure how the existing macros work, TBH. Hopefully someone else
> can chime in :)

+1

I think we need to poke Andrii as a follow up :-)

> -Toke
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-09  1:30       ` Jakub Kicinski
@ 2022-12-09  2:57         ` Stanislav Fomichev
  0 siblings, 0 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-09  2:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On Thu, Dec 8, 2022 at 5:30 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 8 Dec 2022 11:07:43 -0800 Stanislav Fomichev wrote:
> > > >       bpf_free_used_maps(aux);
> > > >       bpf_free_used_btfs(aux);
> > > > -     if (bpf_prog_is_offloaded(aux))
> > > > +     if (bpf_prog_is_dev_bound(aux))
> > > >               bpf_prog_offload_destroy(aux->prog);
> > >
> > > This also looks a touch like a mix of terms (condition vs function
> > > called).
> >
> > Here, not sure, open to suggestions. These
> > bpf_prog_offload_init/bpf_prog_offload_destroy are generic enough
> > (now) that I'm calling them for both dev_bound/offloaded.
> >
> > The following paths trigger for both offloaded/dev_bound cases:
> >
> > if (bpf_prog_is_dev_bound()) bpf_prog_offload_init();
> > if (bpf_prog_is_dev_bound()) bpf_prog_offload_destroy();
> >
> > Do you think it's worth it having completely separate
> > dev_bound/offloaded paths? Or, alternatively, can rename to
> > bpf_prog_dev_bound_{init,destroy} but still handle both cases?
>
> Any offload should be bound, right? So I think functions which handle
> both can use the bound naming scheme, only the offload-specific ones
> should explicitly use offload?

Agreed. Will rename the common ones to dev_bound!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09  0:53               ` Toke Høiland-Jørgensen
@ 2022-12-09  2:57                 ` Stanislav Fomichev
  2022-12-09  5:24                   ` Saeed Mahameed
  2022-12-09 14:42                   ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-09  2:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed,
	David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development

On Thu, Dec 8, 2022 at 4:54 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>
> >> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Stanislav Fomichev <sdf@google.com> writes:
> >> >>
> >> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >> >>
> >> >> >> Stanislav Fomichev <sdf@google.com> writes:
> >> >> >>
> >> >> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> >> >
> >> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> >> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> >> >> >> > XDP ctx to do this.
> >> >> >>
> >> >> >> So I finally managed to get enough ducks in row to actually benchmark
> >> >> >> this. With the caveat that I suddenly can't get the timestamp support to
> >> >> >> work (it was working in an earlier version, but now
> >> >> >> timestamp_supported() just returns false). I'm not sure if this is an
> >> >> >> issue with the enablement patch, or if I just haven't gotten the
> >> >> >> hardware configured properly. I'll investigate some more, but figured
> >> >> >> I'd post these results now:
> >> >> >>
> >> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
> >> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
> >> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
> >> >> >>
> >> >> >> As per the above, this is with calling three kfuncs/pkt
> >> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
> >> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
> >> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
> >> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
> >> >> >> definitely in that ballpark.
> >> >> >>
> >> >> >> I'm not doing anything with the data, just reading it into an on-stack
> >> >> >> buffer, so this is the smallest possible delta from just getting the
> >> >> >> data out of the driver. I did confirm that the call instructions are
> >> >> >> still in the BPF program bytecode when it's dumped back out from the
> >> >> >> kernel.
> >> >> >>
> >> >> >> -Toke
> >> >> >>
> >> >> >
> >> >> > Oh, that's great, thanks for running the numbers! Will definitely
> >> >> > reference them in v4!
> >> >> > Presumably, we should be able to at least unroll most of the
> >> >> > _supported callbacks if we want, they should be relatively easy; but
> >> >> > the numbers look fine as is?
> >> >>
> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate
> >> >> it adds up quickly. Say we add csum and vlan tags, say, and maybe
> >> >> another callback to get the type of hash (l3/l4). Those would probably
> >> >> be relevant for most packets in a fairly common setup. Extrapolating
> >> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
> >> >> baseline of 39 ns.
> >> >>
> >> >> So in that sense I still think unrolling makes sense. At least for the
> >> >> _supported() calls, as eating a whole function call just for that is
> >> >> probably a bit much (which I think was also Jakub's point in a sibling
> >> >> thread somewhere).
> >> >
> >> > imo the overhead is tiny enough that we can wait until
> >> > generic 'kfunc inlining' infra is ready.
> >> >
> >> > We're planning to dual-compile some_kernel_file.c
> >> > into native arch and into bpf arch.
> >> > Then the verifier will automatically inline bpf asm
> >> > of corresponding kfunc.
> >>
> >> Is that "planning" or "actively working on"? Just trying to get a sense
> >> of the time frames here, as this sounds neat, but also something that
> >> could potentially require quite a bit of fiddling with the build system
> >> to get to work? :)
> >
> > "planning", but regardless how long it takes I'd rather not
> > add any more tech debt in the form of manual bpf asm generation.
> > We have too much of it already: gen_lookup, convert_ctx_access, etc.
>
> Right, I'm no fan of the manual ASM stuff either. However, if we're
> stuck with the function call overhead for the foreseeable future, maybe
> we should think about other ways of cutting down the number of function
> calls needed?
>
> One thing I can think of is to get rid of the individual _supported()
> kfuncs and instead have a single one that lets you query multiple
> features at once, like:
>
> __u64 features_supported, features_wanted = XDP_META_RX_HASH | XDP_META_TIMESTAMP;
>
> features_supported = bpf_xdp_metadata_query_features(ctx, features_wanted);
>
> if (features_supported & XDP_META_RX_HASH)
>   hash = bpf_xdp_metadata_rx_hash(ctx);
>
> ...etc

I'm not too happy about having the bitmasks tbh :-(
If we want to get rid of the cost of those _supported calls, maybe we
can do some kind of libbpf-like probing? That would require loading a
program + waiting for some packet though :-(

Or maybe they can just be cached for now?

if (unlikely(!got_first_packet)) {
  have_hash = bpf_xdp_metadata_rx_hash_supported();
  have_timestamp = bpf_xdp_metadata_rx_timestamp_supported();
  got_first_packet = true;
}

if (have_hash) {}
if (have_timestamp) {}

That should hopefully work until generic inlining infra?

> -Toke
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09  2:57                 ` Stanislav Fomichev
@ 2022-12-09  5:24                   ` Saeed Mahameed
  2022-12-09 12:59                     ` Jesper Dangaard Brouer
  2022-12-09 14:42                   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 61+ messages in thread
From: Saeed Mahameed @ 2022-12-09  5:24 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, Network Development

On 08 Dec 18:57, Stanislav Fomichev wrote:
>On Thu, Dec 8, 2022 at 4:54 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> >>
>> >> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >>
>> >> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >> >>
>> >> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >> >>
>> >> >> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >> >
>> >> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>> >> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>> >> >> >> > XDP ctx to do this.
>> >> >> >>
>> >> >> >> So I finally managed to get enough ducks in row to actually benchmark
>> >> >> >> this. With the caveat that I suddenly can't get the timestamp support to
>> >> >> >> work (it was working in an earlier version, but now
>> >> >> >> timestamp_supported() just returns false). I'm not sure if this is an
>> >> >> >> issue with the enablement patch, or if I just haven't gotten the
>> >> >> >> hardware configured properly. I'll investigate some more, but figured
>> >> >> >> I'd post these results now:
>> >> >> >>
>> >> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
>> >> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
>> >> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>> >> >> >>
>> >> >> >> As per the above, this is with calling three kfuncs/pkt
>> >> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
>> >> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
>> >> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
>> >> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
>> >> >> >> definitely in that ballpark.
>> >> >> >>
>> >> >> >> I'm not doing anything with the data, just reading it into an on-stack
>> >> >> >> buffer, so this is the smallest possible delta from just getting the
>> >> >> >> data out of the driver. I did confirm that the call instructions are
>> >> >> >> still in the BPF program bytecode when it's dumped back out from the
>> >> >> >> kernel.
>> >> >> >>
>> >> >> >> -Toke
>> >> >> >>
>> >> >> >
>> >> >> > Oh, that's great, thanks for running the numbers! Will definitely
>> >> >> > reference them in v4!
>> >> >> > Presumably, we should be able to at least unroll most of the
>> >> >> > _supported callbacks if we want, they should be relatively easy; but
>> >> >> > the numbers look fine as is?
>> >> >>
>> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate
>> >> >> it adds up quickly. Say we add csum and vlan tags, say, and maybe
>> >> >> another callback to get the type of hash (l3/l4). Those would probably
>> >> >> be relevant for most packets in a fairly common setup. Extrapolating
>> >> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
>> >> >> baseline of 39 ns.
>> >> >>
>> >> >> So in that sense I still think unrolling makes sense. At least for the
>> >> >> _supported() calls, as eating a whole function call just for that is
>> >> >> probably a bit much (which I think was also Jakub's point in a sibling
>> >> >> thread somewhere).
>> >> >
>> >> > imo the overhead is tiny enough that we can wait until
>> >> > generic 'kfunc inlining' infra is ready.
>> >> >
>> >> > We're planning to dual-compile some_kernel_file.c
>> >> > into native arch and into bpf arch.
>> >> > Then the verifier will automatically inline bpf asm
>> >> > of corresponding kfunc.
>> >>
>> >> Is that "planning" or "actively working on"? Just trying to get a sense
>> >> of the time frames here, as this sounds neat, but also something that
>> >> could potentially require quite a bit of fiddling with the build system
>> >> to get to work? :)
>> >
>> > "planning", but regardless how long it takes I'd rather not
>> > add any more tech debt in the form of manual bpf asm generation.
>> > We have too much of it already: gen_lookup, convert_ctx_access, etc.
>>
>> Right, I'm no fan of the manual ASM stuff either. However, if we're
>> stuck with the function call overhead for the foreseeable future, maybe
>> we should think about other ways of cutting down the number of function
>> calls needed?
>>
>> One thing I can think of is to get rid of the individual _supported()
>> kfuncs and instead have a single one that lets you query multiple
>> features at once, like:
>>
>> __u64 features_supported, features_wanted = XDP_META_RX_HASH | XDP_META_TIMESTAMP;
>>
>> features_supported = bpf_xdp_metadata_query_features(ctx, features_wanted);
>>
>> if (features_supported & XDP_META_RX_HASH)
>>   hash = bpf_xdp_metadata_rx_hash(ctx);
>>
>> ...etc
>
>I'm not too happy about having the bitmasks tbh :-(
>If we want to get rid of the cost of those _supported calls, maybe we
>can do some kind of libbpf-like probing? That would require loading a
>program + waiting for some packet though :-(
>
>Or maybe they can just be cached for now?
>
>if (unlikely(!got_first_packet)) {
>  have_hash = bpf_xdp_metadata_rx_hash_supported();
>  have_timestamp = bpf_xdp_metadata_rx_timestamp_supported();
>  got_first_packet = true;
>}

hash/timestamp/csum is per packet .. vlan as well, depending how you look
at it..

Sorry I haven't been following the progress of xdp metadata, but why did
we drop the idea of BTF and the driver copying metadata in front of the
xdp frame?

hopefully future HW generations will do that for free .. 

if BTF is the problem then each vendor can provide bpf func(s) that would
parse the metadata inside the xdp/bpf prog domain to help programs
extract the vendor-specific data..


>
>if (have_hash) {}
>if (have_timestamp) {}
>
>That should hopefully work until generic inlining infra?
>
>> -Toke
>>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-06  2:45 ` [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs Stanislav Fomichev
                     ` (3 preceding siblings ...)
  2022-12-08 22:39   ` [xdp-hints] " Toke Høiland-Jørgensen
@ 2022-12-09 11:10   ` Jesper Dangaard Brouer
  2022-12-09 17:47     ` Stanislav Fomichev
  4 siblings, 1 reply; 61+ messages in thread
From: Jesper Dangaard Brouer @ 2022-12-09 11:10 UTC (permalink / raw)
  To: Stanislav Fomichev, bpf
  Cc: brouer, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev


On 06/12/2022 03.45, Stanislav Fomichev wrote:
> There is an ndo handler per kfunc, the verifier replaces a call to the
> generic kfunc with a call to the per-device one.
> 
> For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> implements all possible metatada kfuncs. Not all devices have to
> implement them. If kfunc is not supported by the target device,
> the default implementation is called instead.
> 
> Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
> we treat prog_index as target device for kfunc resolution.
> 

[...cut...]
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5aa35c58c342..2eabb9157767 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -74,6 +74,7 @@ struct udp_tunnel_nic_info;
>   struct udp_tunnel_nic;
>   struct bpf_prog;
>   struct xdp_buff;
> +struct xdp_md;
>   
>   void synchronize_net(void);
>   void netdev_set_default_ethtool_ops(struct net_device *dev,
> @@ -1611,6 +1612,10 @@ struct net_device_ops {
>   	ktime_t			(*ndo_get_tstamp)(struct net_device *dev,
>   						  const struct skb_shared_hwtstamps *hwtstamps,
>   						  bool cycles);
> +	bool			(*ndo_xdp_rx_timestamp_supported)(const struct xdp_md *ctx);
> +	u64			(*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx);
> +	bool			(*ndo_xdp_rx_hash_supported)(const struct xdp_md *ctx);
> +	u32			(*ndo_xdp_rx_hash)(const struct xdp_md *ctx);
>   };
>   

Would it make sense to add a 'flags' parameter to ndo_xdp_rx_timestamp
and ndo_xdp_rx_hash ?

E.g. we could have a "STORE" flag that asks the kernel to store this
information for later. This will be helpful for both the SKB and
redirect use-cases.
For a redirect, e.g. into a veth, the BPF-prog can then use the same
function bpf_xdp_metadata_rx_hash() to retrieve the RX-hash, as it can
obtain the "stored" value (from the BPF-prog that did the redirect).

(p.s. Hopefully a const 'flags' variable can be optimized when unrolling
to eliminate store instructions when flags==0)

>   /**
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 55dbc68bfffc..c24aba5c363b 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -409,4 +409,33 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
>   
>   #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
>   
> +#define XDP_METADATA_KFUNC_xxx	\
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \
> +			   bpf_xdp_metadata_rx_timestamp_supported) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
> +			   bpf_xdp_metadata_rx_timestamp) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED, \
> +			   bpf_xdp_metadata_rx_hash_supported) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
> +			   bpf_xdp_metadata_rx_hash) \
> +
> +enum {
> +#define XDP_METADATA_KFUNC(name, str) name,
> +XDP_METADATA_KFUNC_xxx
> +#undef XDP_METADATA_KFUNC
> +MAX_XDP_METADATA_KFUNC,
> +};
> +
> +#ifdef CONFIG_NET
> +u32 xdp_metadata_kfunc_id(int id);
> +#else
> +static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
> +#endif
> +
> +struct xdp_md;
> +bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx);
> +u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx);
> +bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx);
> +u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx);
> +


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09  5:24                   ` Saeed Mahameed
@ 2022-12-09 12:59                     ` Jesper Dangaard Brouer
  2022-12-09 14:37                       ` Toke Høiland-Jørgensen
  2022-12-09 15:19                       ` Dave Taht
  0 siblings, 2 replies; 61+ messages in thread
From: Jesper Dangaard Brouer @ 2022-12-09 12:59 UTC (permalink / raw)
  To: Saeed Mahameed, Stanislav Fomichev
  Cc: brouer, Toke Høiland-Jørgensen, Alexei Starovoitov,
	bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	Network Development


On 09/12/2022 06.24, Saeed Mahameed wrote:
> On 08 Dec 18:57, Stanislav Fomichev wrote:
>> On Thu, Dec 8, 2022 at 4:54 PM Toke Høiland-Jørgensen 
>> <toke@redhat.com> wrote:
>>>
>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>>
>>> > On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> >>
>>> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>> >>
>>> >> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> >> >>
>>> >> >> Stanislav Fomichev <sdf@google.com> writes:
>>> >> >>
>>> >> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> >> >> >>
>>> >> >> >> Stanislav Fomichev <sdf@google.com> writes:
>>> >> >> >>
>>> >> >> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
>>> >> >> >> >
>>> >> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>>> >> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>>> >> >> >> > XDP ctx to do this.
>>> >> >> >>
>>> >> >> >> So I finally managed to get enough ducks in row to actually benchmark
>>> >> >> >> this. With the caveat that I suddenly can't get the timestamp support to
>>> >> >> >> work (it was working in an earlier version, but now
>>> >> >> >> timestamp_supported() just returns false). I'm not sure if this is an
>>> >> >> >> issue with the enablement patch, or if I just haven't gotten the
>>> >> >> >> hardware configured properly. I'll investigate some more, but figured
>>> >> >> >> I'd post these results now:
>>> >> >> >>
>>> >> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
>>> >> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
>>> >> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>>> >> >> >>
>>> >> >> >> As per the above, this is with calling three kfuncs/pkt
>>> >> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
>>> >> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
>>> >> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
>>> >> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
>>> >> >> >> definitely in that ballpark.
>>> >> >> >>
>>> >> >> >> I'm not doing anything with the data, just reading it into an on-stack
>>> >> >> >> buffer, so this is the smallest possible delta from just getting the
>>> >> >> >> data out of the driver. I did confirm that the call instructions are
>>> >> >> >> still in the BPF program bytecode when it's dumped back out from the
>>> >> >> >> kernel.
>>> >> >> >>
>>> >> >> >> -Toke
>>> >> >> >>
>>> >> >> >
>>> >> >> > Oh, that's great, thanks for running the numbers! Will definitely
>>> >> >> > reference them in v4!
>>> >> >> > Presumably, we should be able to at least unroll most of the
>>> >> >> > _supported callbacks if we want, they should be relatively easy; but
>>> >> >> > the numbers look fine as is?
>>> >> >>
>>> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate
>>> >> >> it adds up quickly. Say we add csum and vlan tags, say, and maybe
>>> >> >> another callback to get the type of hash (l3/l4). Those would probably
>>> >> >> be relevant for most packets in a fairly common setup. Extrapolating
>>> >> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
>>> >> >> baseline of 39 ns.
>>> >> >>
>>> >> >> So in that sense I still think unrolling makes sense. At least for the
>>> >> >> _supported() calls, as eating a whole function call just for that is
>>> >> >> probably a bit much (which I think was also Jakub's point in a sibling
>>> >> >> thread somewhere).
>>> >> >
>>> >> > imo the overhead is tiny enough that we can wait until
>>> >> > generic 'kfunc inlining' infra is ready.
>>> >> >
>>> >> > We're planning to dual-compile some_kernel_file.c
>>> >> > into native arch and into bpf arch.
>>> >> > Then the verifier will automatically inline bpf asm
>>> >> > of corresponding kfunc.
>>> >>
>>> >> Is that "planning" or "actively working on"? Just trying to get a sense
>>> >> of the time frames here, as this sounds neat, but also something that
>>> >> could potentially require quite a bit of fiddling with the build system
>>> >> to get to work? :)
>>> >
>>> > "planning", but regardless how long it takes I'd rather not
>>> > add any more tech debt in the form of manual bpf asm generation.
>>> > We have too much of it already: gen_lookup, convert_ctx_access, etc.
>>>
>>> Right, I'm no fan of the manual ASM stuff either. However, if we're
>>> stuck with the function call overhead for the foreseeable future, maybe
>>> we should think about other ways of cutting down the number of function
>>> calls needed?
>>>
>>> One thing I can think of is to get rid of the individual _supported()
>>> kfuncs and instead have a single one that lets you query multiple
>>> features at once, like:
>>>
>>> __u64 features_supported, features_wanted = XDP_META_RX_HASH | 
>>> XDP_META_TIMESTAMP;
>>>
>>> features_supported = bpf_xdp_metadata_query_features(ctx, 
>>> features_wanted);
>>>
>>> if (features_supported & XDP_META_RX_HASH)
>>>   hash = bpf_xdp_metadata_rx_hash(ctx);
>>>
>>> ...etc
>>
>> I'm not too happy about having the bitmasks tbh :-(
>> If we want to get rid of the cost of those _supported calls, maybe we
>> can do some kind of libbpf-like probing? That would require loading a
>> program + waiting for some packet though :-(
>>
>> Or maybe they can just be cached for now?
>>
>> if (unlikely(!got_first_packet)) {
>>  have_hash = bpf_xdp_metadata_rx_hash_supported();
>>  have_timestamp = bpf_xdp_metadata_rx_timestamp_supported();
>>  got_first_packet = true;
>> }
> 
> hash/timestamp/csum is per packet .. vlan as well, depending how you look
> at it..

True, we cannot cache this as it is *per packet* info.

> Sorry I haven't been following the progress of xdp metadata, but why did
> we drop the idea of BTF and the driver copying metadata in front of the
> xdp frame?
> 

It took me some time to understand this new approach, and why it makes
sense.  This is my understanding of the design direction change:

This approach gives more control to the XDP BPF-prog to pick and choose
which XDP hints are relevant for the specific use-case.  BPF-prog can
also skip storing hints anywhere and just read+react on value (that e.g.
comes from RX-desc).

For the redirect, AF_XDP, chained BPF-progs, XDP-to-TC and SKB-creation
use-cases, we *do* need to store hints somewhere, as the RX-desc will be
out of scope.  I think this patchset hand-waves and says the BPF-prog
can just manually store this in a prog-custom layout in the metadata
area.  I'm not super happy with ignoring/hand-waving all these
use-cases, but I hope/think we can later extend this with some more
structure to support these use-cases better (with this patchset as a
foundation).

I actually like this kfunc design, because the BPF-progs get an
intuitive API, and on the driver side we can hide the details of how to
extract the HW hints.


> hopefully future HW generations will do that for free ..

True.  I think it is worth repeating that the approach of storing HW
hints in the metadata area (in front of the packet data) was to allow
future HW generations to write this.  That would eliminate the 6 ns
(that I showed it costs), and then it would be up to the XDP BPF-prog to
pick and choose which hints to read, like this patchset already offers.

This patchset isn't incompatible with future HW generations doing this,
as the kfunc would hide the details and point to this area instead of
the RX-desc.  While we would get the "store for free" from hardware, I
do worry that reading this memory area (which will be part of the DMA
area) is going to be slower than reading from the RX-desc.

> if btf is the problem then each vendor can provide a bpf func(s) that would
> parse the metdata inside of the xdp/bpf prog domain to help programs
> extract the vendor specific data..
> 

In some sense, if unrolling becomes a thing, then this patchset is
partly doing this.

I did imagine that, as a follow-up on the XDP-hints with BTF patchset,
we would allow drivers to load a BPF-prog that changed/selected which HW
hints were relevant, to reduce the 6 ns overhead we introduced.

--Jesper


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09 12:59                     ` Jesper Dangaard Brouer
@ 2022-12-09 14:37                       ` Toke Høiland-Jørgensen
  2022-12-09 15:19                       ` Dave Taht
  1 sibling, 0 replies; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-09 14:37 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Saeed Mahameed, Stanislav Fomichev
  Cc: brouer, Alexei Starovoitov, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa,
	Saeed Mahameed, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, Network Development

Jesper Dangaard Brouer <jbrouer@redhat.com> writes:

>> hash/timestap/csum is per packet .. vlan as well depending how you look at
>> it..
>
> True, we cannot cache this as it is *per packet* info.
>
>> Sorry I haven't been following the progress of xdp meta data, but why did
>> we drop the idea of btf and driver copying metdata in front of the xdp
>> frame ?
>> 
>
> It took me some time to understand this new approach, and why it makes
> sense.  This is my understanding of the design direction change:
>
> This approach gives more control to the XDP BPF-prog to pick and choose
> which XDP hints are relevant for the specific use-case.  BPF-prog can
> also skip storing hints anywhere and just read+react on value (that e.g.
> comes from RX-desc).
>
> For the use-cases redirect, AF_XDP, chained BPF-progs, XDP-to-TC,
> SKB-creation, we *do* need to store hints somewhere, as RX-desc will be
> out-of-scope.  I this patchset hand-waves and says BPF-prog can just
> manually store this in a prog custom layout in metadata area.  I'm not
> super happy with ignoring/hand-waving all these use-case, but I
> hope/think we later can extend this some more structure to support these
> use-cases better (with this patchset as a foundation).

I don't think this approach "hand-waves" the need to store the metadata,
it just declares it out of scope :)

Which makes sense, because "accessing the metadata" and "storing it for
later use" are two different problems, where the second one builds on
top of the first one. I.e., once we have a way to access the data, we
can build upon that to have a way to store it somewhere.

> I actually like this kfunc design, because the BPF-prog's get an
> intuitive API, and on driver side we can hide the details of howto
> extract the HW hints.

+1

>> hopefully future HW generations will do that for free ..
>
> True.  I think it is worth repeating, that the approach of storing HW
> hints in metadata area (in-front of packet data) was to allow future HW
> generations to write this.  Thus, eliminating the 6 ns (that I showed it
> cost), and then it would be up-to XDP BPF-prog to pick and choose which
> to read, like this patchset already offers.
>
> This patchset isn't incompatible with future HW generations doing this,
> as the kfunc would hide the details and point to this area instead of
> the RX-desc.  While we get the "store for free" from hardware, I do
> worry that reading this memory area (which will part of DMA area) is
> going to be slower than reading from RX-desc.

Agreed (choked on the "isn't incompatible" double negative at first). If
the hardware stores the data next to the packet data, the kfuncs can
just read them from there. If it turns out that we can even make the
layout for some fields the same across drivers, we could even have the
generic kfunc implementations just read this area (which also nicely
solves the "storage" problem).
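A plain-C sketch of that idea (the struct name, field set, and fixed
placement are all speculative, not from the patchset): if drivers agreed
on a common layout written right in front of the packet data, one
generic kfunc body could serve every conforming driver:

```c
#include <stdint.h>
#include <string.h>

/* speculative common layout, written by the driver (or future HW)
 * immediately in front of the packet data */
struct xdp_hints_common {
	uint32_t rx_hash;
	uint64_t rx_timestamp;
} __attribute__((packed));

/* what a driver's RX path (or the NIC via DMA) would do per packet */
static void driver_write_hints(uint8_t *pkt_data, uint32_t hash, uint64_t ts)
{
	struct xdp_hints_common h = { .rx_hash = hash, .rx_timestamp = ts };

	memcpy(pkt_data - sizeof(h), &h, sizeof(h));
}

/* driver-agnostic kfunc body: reads from the shared layout instead of
 * touching any driver-specific RX descriptor */
static uint32_t generic_xdp_rx_hash(const uint8_t *pkt_data)
{
	struct xdp_hints_common h;

	/* the hints sit at a fixed negative offset from the packet start */
	memcpy(&h, pkt_data - sizeof(h), sizeof(h));
	return h.rx_hash;
}
```

The same area doubling as the "storage" for redirect/skb use-cases is
what makes this layout attractive, per the discussion above.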

-Toke


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09  2:57                 ` Stanislav Fomichev
  2022-12-09  5:24                   ` Saeed Mahameed
@ 2022-12-09 14:42                   ` Toke Høiland-Jørgensen
  2022-12-09 16:45                     ` Jakub Kicinski
  1 sibling, 1 reply; 61+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-12-09 14:42 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed,
	David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development

Stanislav Fomichev <sdf@google.com> writes:

> On Thu, Dec 8, 2022 at 4:54 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> >>
>> >> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >>
>> >> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >> >>
>> >> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >> >>
>> >> >> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >> >
>> >> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>> >> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>> >> >> >> > XDP ctx to do this.
>> >> >> >>
>> >> >> >> So I finally managed to get enough ducks in row to actually benchmark
>> >> >> >> this. With the caveat that I suddenly can't get the timestamp support to
>> >> >> >> work (it was working in an earlier version, but now
>> >> >> >> timestamp_supported() just returns false). I'm not sure if this is an
>> >> >> >> issue with the enablement patch, or if I just haven't gotten the
>> >> >> >> hardware configured properly. I'll investigate some more, but figured
>> >> >> >> I'd post these results now:
>> >> >> >>
>> >> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
>> >> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
>> >> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>> >> >> >>
>> >> >> >> As per the above, this is with calling three kfuncs/pkt
>> >> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
>> >> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
>> >> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
>> >> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
>> >> >> >> definitely in that ballpark.
>> >> >> >>
>> >> >> >> I'm not doing anything with the data, just reading it into an on-stack
>> >> >> >> buffer, so this is the smallest possible delta from just getting the
>> >> >> >> data out of the driver. I did confirm that the call instructions are
>> >> >> >> still in the BPF program bytecode when it's dumped back out from the
>> >> >> >> kernel.
>> >> >> >>
>> >> >> >> -Toke
>> >> >> >>
>> >> >> >
>> >> >> > Oh, that's great, thanks for running the numbers! Will definitely
>> >> >> > reference them in v4!
>> >> >> > Presumably, we should be able to at least unroll most of the
>> >> >> > _supported callbacks if we want, they should be relatively easy; but
>> >> >> > the numbers look fine as is?
>> >> >>
>> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate
>> >> >> it adds up quickly. Say we add csum and vlan tags, say, and maybe
>> >> >> another callback to get the type of hash (l3/l4). Those would probably
>> >> >> be relevant for most packets in a fairly common setup. Extrapolating
>> >> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
>> >> >> baseline of 39 ns.
>> >> >>
>> >> >> So in that sense I still think unrolling makes sense. At least for the
>> >> >> _supported() calls, as eating a whole function call just for that is
>> >> >> probably a bit much (which I think was also Jakub's point in a sibling
>> >> >> thread somewhere).
>> >> >
>> >> > imo the overhead is tiny enough that we can wait until
>> >> > generic 'kfunc inlining' infra is ready.
>> >> >
>> >> > We're planning to dual-compile some_kernel_file.c
>> >> > into native arch and into bpf arch.
>> >> > Then the verifier will automatically inline bpf asm
>> >> > of corresponding kfunc.
>> >>
>> >> Is that "planning" or "actively working on"? Just trying to get a sense
>> >> of the time frames here, as this sounds neat, but also something that
>> >> could potentially require quite a bit of fiddling with the build system
>> >> to get to work? :)
>> >
>> > "planning", but regardless how long it takes I'd rather not
>> > add any more tech debt in the form of manual bpf asm generation.
>> > We have too much of it already: gen_lookup, convert_ctx_access, etc.
>>
>> Right, I'm no fan of the manual ASM stuff either. However, if we're
>> stuck with the function call overhead for the foreseeable future, maybe
>> we should think about other ways of cutting down the number of function
>> calls needed?
>>
>> One thing I can think of is to get rid of the individual _supported()
>> kfuncs and instead have a single one that lets you query multiple
>> features at once, like:
>>
>> __u64 features_supported, features_wanted = XDP_META_RX_HASH | XDP_META_TIMESTAMP;
>>
>> features_supported = bpf_xdp_metadata_query_features(ctx, features_wanted);
>>
>> if (features_supported & XDP_META_RX_HASH)
>>   hash = bpf_xdp_metadata_rx_hash(ctx);
>>
>> ...etc
>
> I'm not too happy about having the bitmasks tbh :-(
> If we want to get rid of the cost of those _supported calls, maybe we
> can do some kind of libbpf-like probing? That would require loading a
> program + waiting for some packet though :-(

If we expect the program to do out of band probing, we could just get
rid of the _supported() functions entirely?

I mean, to me, the whole point of having the separate _supported()
function for each item was to have a lower-overhead way of checking if
the metadata item was supported. But if the overhead is not actually
lower (because both incur a function call), why have them at all? Then
we could just change the implementation from this:

bool mlx5e_xdp_rx_hash_supported(const struct xdp_md *ctx)
{
	const struct mlx5_xdp_buff *_ctx = (void *)ctx;

	return _ctx->xdp.rxq->dev->features & NETIF_F_RXHASH;
}

u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx)
{
	const struct mlx5_xdp_buff *_ctx = (void *)ctx;

	return be32_to_cpu(_ctx->cqe->rss_hash_result);
}

to this:

u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx)
{
	const struct mlx5_xdp_buff *_ctx = (void *)ctx;

	if (!(_ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
                return 0;

	return be32_to_cpu(_ctx->cqe->rss_hash_result);
}

-Toke


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09 12:59                     ` Jesper Dangaard Brouer
  2022-12-09 14:37                       ` Toke Høiland-Jørgensen
@ 2022-12-09 15:19                       ` Dave Taht
  1 sibling, 0 replies; 61+ messages in thread
From: Dave Taht @ 2022-12-09 15:19 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Saeed Mahameed, Stanislav Fomichev, brouer,
	Toke Høiland-Jørgensen, Alexei Starovoitov, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	Network Development

On Fri, Dec 9, 2022 at 5:29 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 09/12/2022 06.24, Saeed Mahameed wrote:
> > On 08 Dec 18:57, Stanislav Fomichev wrote:
> >> On Thu, Dec 8, 2022 at 4:54 PM Toke Høiland-Jørgensen
> >> <toke@redhat.com> wrote:
> >>>
> >>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>>
> >>> > On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>> >>
> >>> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>> >>
> >>> >> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>> >> >>
> >>> >> >> Stanislav Fomichev <sdf@google.com> writes:
> >>> >> >>
> >>> >> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>> >> >> >>
> >>> >> >> >> Stanislav Fomichev <sdf@google.com> writes:
> >>> >> >> >>
> >>> >> >> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
> >>> >> >> >> >
> >>> >> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> >>> >> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> >>> >> >> >> > XDP ctx to do this.
> >>> >> >> >>
> >>> >> >> >> So I finally managed to get enough ducks in row to actually benchmark
> >>> >> >> >> this. With the caveat that I suddenly can't get the timestamp support to
> >>> >> >> >> work (it was working in an earlier version, but now
> >>> >> >> >> timestamp_supported() just returns false). I'm not sure if this is an
> >>> >> >> >> issue with the enablement patch, or if I just haven't gotten the
> >>> >> >> >> hardware configured properly. I'll investigate some more, but figured
> >>> >> >> >> I'd post these results now:
> >>> >> >> >>
> >>> >> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
> >>> >> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
> >>> >> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
> >>> >> >> >>
> >>> >> >> >> As per the above, this is with calling three kfuncs/pkt
> >>> >> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
> >>> >> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
> >>> >> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
> >>> >> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
> >>> >> >> >> definitely in that ballpark.
> >>> >> >> >>
> >>> >> >> >> I'm not doing anything with the data, just reading it into an on-stack
> >>> >> >> >> buffer, so this is the smallest possible delta from just getting the
> >>> >> >> >> data out of the driver. I did confirm that the call instructions are
> >>> >> >> >> still in the BPF program bytecode when it's dumped back out from the
> >>> >> >> >> kernel.
> >>> >> >> >>
> >>> >> >> >> -Toke
> >>> >> >> >>
> >>> >> >> >
> >>> >> >> > Oh, that's great, thanks for running the numbers! Will definitely
> >>> >> >> > reference them in v4!
> >>> >> >> > Presumably, we should be able to at least unroll most of the
> >>> >> >> > _supported callbacks if we want, they should be relatively easy; but
> >>> >> >> > the numbers look fine as is?
> >>> >> >>
> >>> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate
> >>> >> >> it adds up quickly. Say we add csum and vlan tags, say, and maybe
> >>> >> >> another callback to get the type of hash (l3/l4). Those would probably
> >>> >> >> be relevant for most packets in a fairly common setup. Extrapolating
> >>> >> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
> >>> >> >> baseline of 39 ns.
> >>> >> >>
> >>> >> >> So in that sense I still think unrolling makes sense. At least for the
> >>> >> >> _supported() calls, as eating a whole function call just for that is
> >>> >> >> probably a bit much (which I think was also Jakub's point in a sibling
> >>> >> >> thread somewhere).
> >>> >> >
> >>> >> > imo the overhead is tiny enough that we can wait until
> >>> >> > generic 'kfunc inlining' infra is ready.
> >>> >> >
> >>> >> > We're planning to dual-compile some_kernel_file.c
> >>> >> > into native arch and into bpf arch.
> >>> >> > Then the verifier will automatically inline bpf asm
> >>> >> > of corresponding kfunc.
> >>> >>
> >>> >> Is that "planning" or "actively working on"? Just trying to get a sense
> >>> >> of the time frames here, as this sounds neat, but also something that
> >>> >> could potentially require quite a bit of fiddling with the build system
> >>> >> to get to work? :)
> >>> >
> >>> > "planning", but regardless how long it takes I'd rather not
> >>> > add any more tech debt in the form of manual bpf asm generation.
> >>> > We have too much of it already: gen_lookup, convert_ctx_access, etc.
> >>>
> >>> Right, I'm no fan of the manual ASM stuff either. However, if we're
> >>> stuck with the function call overhead for the foreseeable future, maybe
> >>> we should think about other ways of cutting down the number of function
> >>> calls needed?
> >>>
> >>> One thing I can think of is to get rid of the individual _supported()
> >>> kfuncs and instead have a single one that lets you query multiple
> >>> features at once, like:
> >>>
> >>> __u64 features_supported, features_wanted = XDP_META_RX_HASH |
> >>> XDP_META_TIMESTAMP;
> >>>
> >>> features_supported = bpf_xdp_metadata_query_features(ctx,
> >>> features_wanted);
> >>>
> >>> if (features_supported & XDP_META_RX_HASH)
> >>>   hash = bpf_xdp_metadata_rx_hash(ctx);
> >>>
> >>> ...etc
> >>
> >> I'm not too happy about having the bitmasks tbh :-(
> >> If we want to get rid of the cost of those _supported calls, maybe we
> >> can do some kind of libbpf-like probing? That would require loading a
> >> program + waiting for some packet though :-(
> >>
> >> Or maybe they can just be cached for now?
> >>
> >> if (unlikely(!got_first_packet)) {
> >>  have_hash = bpf_xdp_metadata_rx_hash_supported();
> >>  have_timestamp = bpf_xdp_metadata_rx_timestamp_supported();
> >>  got_first_packet = true;
> >> }
> >
> > hash/timestap/csum is per packet .. vlan as well depending how you look at
> > it..
>
> True, we cannot cache this as it is *per packet* info.
>
> > Sorry I haven't been following the progress of xdp meta data, but why did
> > we drop the idea of btf and driver copying metdata in front of the xdp
> > frame ?
> >
>
> It took me some time to understand this new approach, and why it makes
> sense.  This is my understanding of the design direction change:
>
> This approach gives more control to the XDP BPF-prog to pick and choose
> which XDP hints are relevant for the specific use-case.  BPF-prog can
> also skip storing hints anywhere and just read+react on value (that e.g.
> comes from RX-desc).
>
> For the use-cases redirect, AF_XDP, chained BPF-progs, XDP-to-TC,
> SKB-creation, we *do* need to store hints somewhere, as RX-desc will be
> out-of-scope.  I this patchset hand-waves and says BPF-prog can just
> manually store this in a prog custom layout in metadata area.  I'm not
> super happy with ignoring/hand-waving all these use-case, but I
> hope/think we later can extend this some more structure to support these
> use-cases better (with this patchset as a foundation).
>
> I actually like this kfunc design, because the BPF-prog's get an
> intuitive API, and on driver side we can hide the details of howto
> extract the HW hints.
>
>
> > hopefully future HW generations will do that for free ..
>
> True.  I think it is worth repeating, that the approach of storing HW
> hints in metadata area (in-front of packet data) was to allow future HW
> generations to write this.  Thus, eliminating the 6 ns (that I showed it
> cost), and then it would be up-to XDP BPF-prog to pick and choose which
> to read, like this patchset already offers.

As a hope for future generations of hw, being able to choose a cpu to
interrupt from an LPM table would be great. I keep hoping to find a card
that can do this already...

Also I would like to thank everyone working on this project so far for
what you've accomplished. We're now pushing 20Gbit (through a vlan even)
through libreqos.io for thousands of ISP subscribers using all this
great stuff, on 16 cores at only 24% CPU through CAKE, and also
successfully monitoring TCP RTTs at this scale via ebpf pping.

( https://www.yahoo.com/now/libreqoe-releases-version-1-3-214700756.html )
"Our hat is off to the creators of CAKE and the new Linux XDP and eBPF
subsystems!"

In our case, the timestamp and *3* hashes are needed for CAKE, and
interrupting the right cpu would be great...

>
> This patchset isn't incompatible with future HW generations doing this,
> as the kfunc would hide the details and point to this area instead of
> the RX-desc.  While we get the "store for free" from hardware, I do
> worry that reading this memory area (which will part of DMA area) is
> going to be slower than reading from RX-desc.
>
> > if btf is the problem then each vendor can provide a bpf func(s) that would
> > parse the metdata inside of the xdp/bpf prog domain to help programs
> > extract the vendor specific data..
> >
>
> In some sense, if unroll will becomes a thing, then this patchset is
> partly doing this.
>
> I did imagine that after/followup on XDP-hints with BTF patchset, we
> would allow drivers to load an BPF-prog that changed/selected which HW
> hints were relevant, to reduce those 6 ns overhead we introduced.
>
> --Jesper
>


-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09 14:42                   ` Toke Høiland-Jørgensen
@ 2022-12-09 16:45                     ` Jakub Kicinski
  2022-12-09 17:46                       ` Stanislav Fomichev
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-12-09 16:45 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Stanislav Fomichev, Alexei Starovoitov, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa,
	Saeed Mahameed, David Ahern, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development

On Fri, 09 Dec 2022 15:42:37 +0100 Toke Høiland-Jørgensen wrote:
> If we expect the program to do out of band probing, we could just get
> rid of the _supported() functions entirely?
> 
> I mean, to me, the whole point of having the separate _supported()
> function for each item was to have a lower-overhead way of checking if
> the metadata item was supported. But if the overhead is not actually
> lower (because both incur a function call), why have them at all? Then
> we could just change the implementation from this:
> 
> bool mlx5e_xdp_rx_hash_supported(const struct xdp_md *ctx)
> {
> 	const struct mlx5_xdp_buff *_ctx = (void *)ctx;
> 
> 	return _ctx->xdp.rxq->dev->features & NETIF_F_RXHASH;
> }
> 
> u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx)
> {
> 	const struct mlx5_xdp_buff *_ctx = (void *)ctx;
> 
> 	return be32_to_cpu(_ctx->cqe->rss_hash_result);
> }
> 
> to this:
> 
> u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx)
> {
> 	const struct mlx5_xdp_buff *_ctx = (void *)ctx;
> 
> 	if (!(_ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
>                 return 0;
> 
> 	return be32_to_cpu(_ctx->cqe->rss_hash_result);
> }

Are there no corner cases? E.g. in case of an L2 frame you'd then
expect a hash of 0? Rather than no hash? 

If I understand we went for the _supported() thing to make inlining 
the check easier than inlining the actual read of the field.
But we're told inlining is a bit of a wait.. so isn't the motivation
for the _supported() pretty much gone? And should we go back to
returning an error from the actual read?

Is partial inlining hard? (inline just the check and generate a full
call for the read, ending up with the same code as with _supported())
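A plain-C illustration of that split (all names and the flag value are
stand-ins, not kernel code): a `static inline` check in front of an
out-of-line read is functionally what partial inlining would produce:

```c
#include <stdint.h>

#define NETIF_F_RXHASH (1u << 0)   /* illustrative flag value, not the real one */

struct fake_ctx {                  /* stand-in for the driver's xdp ctx */
	uint32_t dev_features;
	uint32_t rss_hash_result;
};

/* the out-of-line part: a real call, only reached when the feature exists */
static uint32_t rx_hash_read(const struct fake_ctx *ctx)
{
	return ctx->rss_hash_result;
}

/* the part the verifier would inline at the call site: just the check */
static inline int rx_hash(const struct fake_ctx *ctx, uint32_t *hash)
{
	if (!(ctx->dev_features & NETIF_F_RXHASH))
		return -95;                /* -EOPNOTSUPP */
	*hash = rx_hash_read(ctx);
	return 0;
}
```

The fast path for unsupported metadata then costs only the inlined
branch, matching what the separate _supported() call was meant to buy.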

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09 16:45                     ` Jakub Kicinski
@ 2022-12-09 17:46                       ` Stanislav Fomichev
  2022-12-09 22:13                         ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-09 17:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed, David Ahern,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	Network Development

On Fri, Dec 9, 2022 at 8:45 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 09 Dec 2022 15:42:37 +0100 Toke Høiland-Jørgensen wrote:
> > If we expect the program to do out of band probing, we could just get
> > rid of the _supported() functions entirely?
> >
> > I mean, to me, the whole point of having the separate _supported()
> > function for each item was to have a lower-overhead way of checking if
> > the metadata item was supported. But if the overhead is not actually
> > lower (because both incur a function call), why have them at all? Then
> > we could just change the implementation from this:
> >
> > bool mlx5e_xdp_rx_hash_supported(const struct xdp_md *ctx)
> > {
> >       const struct mlx5_xdp_buff *_ctx = (void *)ctx;
> >
> >       return _ctx->xdp.rxq->dev->features & NETIF_F_RXHASH;
> > }
> >
> > u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx)
> > {
> >       const struct mlx5_xdp_buff *_ctx = (void *)ctx;
> >
> >       return be32_to_cpu(_ctx->cqe->rss_hash_result);
> > }
> >
> > to this:
> >
> > u32 mlx5e_xdp_rx_hash(const struct xdp_md *ctx)
> > {
> >       const struct mlx5_xdp_buff *_ctx = (void *)ctx;
> >
> >       if (!(_ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
> >                 return 0;
> >
> >       return be32_to_cpu(_ctx->cqe->rss_hash_result);
> > }
>
> Are there no corner cases? E.g. in case of an L2 frame you'd then
> expect a hash of 0? Rather than no hash?
>
> If I understand we went for the _supported() thing to make inlining
> the check easier than inlining the actual read of the field.
> But we're told inlining is a bit of a wait.. so isn't the motivation
> for the _supported() pretty much gone? And we should we go back to
> returning an error from the actual read?

Seems fair, we can always bring those _supported() calls back in the
future when the inlining is available and having those separate calls
shows clear benefit.
Then let's go back to a more conventional form below?

int mlx5e_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
{
	const struct mlx5_xdp_buff *_ctx = (void *)ctx;

	if (!(_ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
		return -EOPNOTSUPP;

	*hash = be32_to_cpu(_ctx->cqe->rss_hash_result);
	return 0;
}

> Is partial inlining hard? (inline just the check and generate a full
> call for the read, ending up with the same code as with _supported())

I'm assuming you're suggesting to do this partial inlining manually
(as in, writing the code to output this bytecode)?
This probably also falls into the "manual bpf asm generation tech debt" bucket?
LMK if I missed your point.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-09 11:10   ` Jesper Dangaard Brouer
@ 2022-12-09 17:47     ` Stanislav Fomichev
  2022-12-11 11:09       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 61+ messages in thread
From: Stanislav Fomichev @ 2022-12-09 17:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, brouer, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev

On Fri, Dec 9, 2022 at 3:11 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 06/12/2022 03.45, Stanislav Fomichev wrote:
> > There is an ndo handler per kfunc, the verifier replaces a call to the
> > generic kfunc with a call to the per-device one.
> >
> > For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
> > implements all possible metatada kfuncs. Not all devices have to
> > implement them. If kfunc is not supported by the target device,
> > the default implementation is called instead.
> >
> > Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
> > we treat prog_index as target device for kfunc resolution.
> >
>
> [...cut...]
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 5aa35c58c342..2eabb9157767 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -74,6 +74,7 @@ struct udp_tunnel_nic_info;
> >   struct udp_tunnel_nic;
> >   struct bpf_prog;
> >   struct xdp_buff;
> > +struct xdp_md;
> >
> >   void synchronize_net(void);
> >   void netdev_set_default_ethtool_ops(struct net_device *dev,
> > @@ -1611,6 +1612,10 @@ struct net_device_ops {
> >       ktime_t                 (*ndo_get_tstamp)(struct net_device *dev,
> >                                                 const struct skb_shared_hwtstamps *hwtstamps,
> >                                                 bool cycles);
> > +     bool                    (*ndo_xdp_rx_timestamp_supported)(const struct xdp_md *ctx);
> > +     u64                     (*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx);
> > +     bool                    (*ndo_xdp_rx_hash_supported)(const struct xdp_md *ctx);
> > +     u32                     (*ndo_xdp_rx_hash)(const struct xdp_md *ctx);
> >   };
> >
>
> Would it make sense to add a 'flags' parameter to ndo_xdp_rx_timestamp
> and ndo_xdp_rx_hash ?
>
> E.g. we could have a "STORE" flag that asks the kernel to store this
> information for later. This will be helpful for both the SKB and
> redirect use-cases.
> For redirect e.g into a veth, then BPF-prog can use the same function
> bpf_xdp_metadata_rx_hash() to receive the RX-hash, as it can obtain the
> "stored" value (from the BPF-prog that did the redirect).
>
> (p.s. Hopefully a const 'flags' variable can be optimized when unrolling
> to eliminate store instructions when flags==0)

Are we concerned that doing this without a flag and with another
function call will be expensive?
For the xdp->skb path, I was hoping we would be able to do something like:

hash = bpf_xdp_metadata_rx_hash(ctx);
bpf_xdp_metadata_export_rx_hash_to_skb(ctx, hash);

This should also let the users adjust the metadata before storing it.
Am I missing something here? Why would the flag be preferable?
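A plain-C simulation of that read-then-export idea (the export helper
and the conversion-context struct are hypothetical; nothing like them
exists in the patchset yet):

```c
#include <stdint.h>

/* stand-in for the per-packet state the kernel would keep while
 * converting an xdp_buff into an skb; names are made up */
struct fake_conv_ctx {
	uint32_t cqe_hash;     /* what the RX descriptor holds */
	uint32_t skb_hash;     /* what the later-built skb would get */
	int      skb_hash_set;
};

/* the read kfunc, as in the patchset */
static uint32_t rx_hash(const struct fake_conv_ctx *ctx)
{
	return ctx->cqe_hash;
}

/* hypothetical export kfunc: the prog hands back a (possibly adjusted)
 * value for the kernel to use when it builds the skb */
static void export_rx_hash_to_skb(struct fake_conv_ctx *ctx, uint32_t hash)
{
	ctx->skb_hash = hash;
	ctx->skb_hash_set = 1;
}
```

Because the program sits between the read and the export, it can rewrite
the value in flight, which is the "adjust before storing" property the
flag-based variant would not give for free.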


> >   /**
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 55dbc68bfffc..c24aba5c363b 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -409,4 +409,33 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> >
> >   #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
> >
> > +#define XDP_METADATA_KFUNC_xxx       \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \
> > +                        bpf_xdp_metadata_rx_timestamp_supported) \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
> > +                        bpf_xdp_metadata_rx_timestamp) \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH_SUPPORTED, \
> > +                        bpf_xdp_metadata_rx_hash_supported) \
> > +     XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
> > +                        bpf_xdp_metadata_rx_hash) \
> > +
> > +enum {
> > +#define XDP_METADATA_KFUNC(name, str) name,
> > +XDP_METADATA_KFUNC_xxx
> > +#undef XDP_METADATA_KFUNC
> > +MAX_XDP_METADATA_KFUNC,
> > +};
> > +
> > +#ifdef CONFIG_NET
> > +u32 xdp_metadata_kfunc_id(int id);
> > +#else
> > +static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
> > +#endif
> > +
> > +struct xdp_md;
> > +bool bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx);
> > +u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx);
> > +bool bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx);
> > +u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx);
> > +
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
  2022-12-09 17:46                       ` Stanislav Fomichev
@ 2022-12-09 22:13                         ` Jakub Kicinski
  0 siblings, 0 replies; 61+ messages in thread
From: Jakub Kicinski @ 2022-12-09 22:13 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed, David Ahern,
	Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	Network Development

On Fri, 9 Dec 2022 09:46:20 -0800 Stanislav Fomichev wrote:
> > Is partial inlining hard? (inline just the check and generate a full
> > call for the read, ending up with the same code as with _supported())  
> 
> I'm assuming you're suggesting to do this partial inlining manually
> (as in, writing the code to output this bytecode)?
> This probably also falls into the "manual bpf asm generation tech debt" bucket?
> LMK if I missed your point.

Maybe just ignore that, I'm not sure how the unrolling of 
the _supported() calls was expected to work in the first place.


* Re: [xdp-hints] Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-09  2:57         ` Stanislav Fomichev
@ 2022-12-10  0:42           ` Martin KaFai Lau
  2022-12-10  1:12             ` Martin KaFai Lau
  0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2022-12-10  0:42 UTC (permalink / raw)
  To: Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, song, yhs, john.fastabend, kpsingh,
	haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On 12/8/22 6:57 PM, Stanislav Fomichev wrote:
> On Thu, Dec 8, 2022 at 4:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>>>> Another UX thing I ran into is that libbpf will bail out if it can't
>>>> find the kfunc in the kernel vmlinux, even if the code calling the
>>>> function is behind an always-false if statement (which would be
>>>> eliminated as dead code from the verifier). This makes it a bit hard to
>>>> conditionally use them. Should libbpf just allow the load without
>>>> performing the relocation (and let the verifier worry about it), or
>>>> should we have a bpf_core_kfunc_exists() macro to use for checking?
>>>> Maybe both?
>>>
>>> I'm not sure how libbpf can allow the load without performing the
>>> relocation; maybe I'm missing something.
>>> IIUC, libbpf uses the kfunc name (from the relocation?) and replaces
>>> it with the kfunc id, right?
>>
>> Yeah, so if it can't find the kfunc in vmlinux, just write an id of 0.
>> This will trip the check at the top of fixup_kfunc_call() in the
>> verifier, but if the code is hidden behind an always-false branch (an
>> rodata variable set to zero, say) the instructions should get eliminated
>> before they reach that point. That way you can at least turn it off at
>> runtime (after having done some kind of feature detection) without
>> having to compile it out of your program entirely.
>>
>>> Having bpf_core_kfunc_exists would help, but this probably needs
>>> compiler work first to preserve some of the kfunc traces in vmlinux.h?

hmm.... if I follow correctly, the ask is for libbpf to accept a bpf prog
using a kfunc that does not exist in the running kernel?

Have you tried "__weak":

extern void dummy_kfunc(void) __ksym __weak;

SEC("tc")
int load(struct __sk_buff *skb)
{
	if (dummy_kfunc) {
		dummy_kfunc();
		return TC_ACT_SHOT;
	}
	return TC_ACT_UNSPEC;
}



* Re: [xdp-hints] Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-10  0:42           ` Martin KaFai Lau
@ 2022-12-10  1:12             ` Martin KaFai Lau
  0 siblings, 0 replies; 61+ messages in thread
From: Martin KaFai Lau @ 2022-12-10  1:12 UTC (permalink / raw)
  To: Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: bpf, ast, daniel, andrii, song, yhs, john.fastabend, kpsingh,
	haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn,
	Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev

On 12/9/22 4:42 PM, Martin KaFai Lau wrote:
> On 12/8/22 6:57 PM, Stanislav Fomichev wrote:
>> On Thu, Dec 8, 2022 at 4:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>> Stanislav Fomichev <sdf@google.com> writes:
>>>
>>>>> Another UX thing I ran into is that libbpf will bail out if it can't
>>>>> find the kfunc in the kernel vmlinux, even if the code calling the
>>>>> function is behind an always-false if statement (which would be
>>>>> eliminated as dead code from the verifier). This makes it a bit hard to
>>>>> conditionally use them. Should libbpf just allow the load without
>>>>> performing the relocation (and let the verifier worry about it), or
>>>>> should we have a bpf_core_kfunc_exists() macro to use for checking?
>>>>> Maybe both?
>>>>
>>>> I'm not sure how libbpf can allow the load without performing the
>>>> relocation; maybe I'm missing something.
>>>> IIUC, libbpf uses the kfunc name (from the relocation?) and replaces
>>>> it with the kfunc id, right?
>>>
>>> Yeah, so if it can't find the kfunc in vmlinux, just write an id of 0.
>>> This will trip the check at the top of fixup_kfunc_call() in the
>>> verifier, but if the code is hidden behind an always-false branch (an
>>> rodata variable set to zero, say) the instructions should get eliminated
>>> before they reach that point. That way you can at least turn it off at
>>> runtime (after having done some kind of feature detection) without
>>> having to compile it out of your program entirely.
>>>
>>>> Having bpf_core_kfunc_exists would help, but this probably needs
>>>> compiler work first to preserve some of the kfunc traces in vmlinux.h?
> 
> hmm.... if I follow correctly, the ask is for libbpf to accept a bpf prog
> using a kfunc that does not exist in the running kernel?
> 
> Have you tried "__weak":
> 
> extern void dummy_kfunc(void) __ksym __weak;
> 
> SEC("tc")
> int load(struct __sk_buff *skb)
> {
>      if (dummy_kfunc) {

Sadly, that won't work: only VAR ksyms are supported for the address
load (ld), not functions.

>          dummy_kfunc();
>          return TC_ACT_SHOT;
>      }
>      return TC_ACT_UNSPEC;
> }
> 



* Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs
  2022-12-09 17:47     ` Stanislav Fomichev
@ 2022-12-11 11:09       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 61+ messages in thread
From: Jesper Dangaard Brouer @ 2022-12-11 11:09 UTC (permalink / raw)
  To: Stanislav Fomichev, Jesper Dangaard Brouer
  Cc: brouer, bpf, ast, daniel, andrii, martin.lau, song, yhs,
	john.fastabend, kpsingh, haoluo, jolsa, David Ahern,
	Jakub Kicinski, Willem de Bruijn, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev


On 09/12/2022 18.47, Stanislav Fomichev wrote:
> On Fri, Dec 9, 2022 at 3:11 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>> On 06/12/2022 03.45, Stanislav Fomichev wrote:
>>> There is an ndo handler per kfunc, the verifier replaces a call to the
>>> generic kfunc with a call to the per-device one.
>>>
>>> For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which
>>> implements all possible metatada kfuncs. Not all devices have to
>>> implement them. If kfunc is not supported by the target device,
>>> the default implementation is called instead.
>>>
>>> Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags,
>>> we treat prog_index as target device for kfunc resolution.
>>>
>>
>> [...cut...]
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 5aa35c58c342..2eabb9157767 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -74,6 +74,7 @@ struct udp_tunnel_nic_info;
>>>    struct udp_tunnel_nic;
>>>    struct bpf_prog;
>>>    struct xdp_buff;
>>> +struct xdp_md;
>>>
>>>    void synchronize_net(void);
>>>    void netdev_set_default_ethtool_ops(struct net_device *dev,
>>> @@ -1611,6 +1612,10 @@ struct net_device_ops {
>>>        ktime_t                 (*ndo_get_tstamp)(struct net_device *dev,
>>>                                                  const struct skb_shared_hwtstamps *hwtstamps,
>>>                                                  bool cycles);
>>> +     bool                    (*ndo_xdp_rx_timestamp_supported)(const struct xdp_md *ctx);
>>> +     u64                     (*ndo_xdp_rx_timestamp)(const struct xdp_md *ctx);
>>> +     bool                    (*ndo_xdp_rx_hash_supported)(const struct xdp_md *ctx);
>>> +     u32                     (*ndo_xdp_rx_hash)(const struct xdp_md *ctx);
>>>    };
>>>
>>
>> Would it make sense to add a 'flags' parameter to ndo_xdp_rx_timestamp
>> and ndo_xdp_rx_hash ?
>>
>> E.g. we could have a "STORE" flag that asks the kernel to store this
>> information for later. This will be helpful for both the SKB and
>> redirect use-cases.
>> For redirect e.g into a veth, then BPF-prog can use the same function
>> bpf_xdp_metadata_rx_hash() to receive the RX-hash, as it can obtain the
>> "stored" value (from the BPF-prog that did the redirect).
>>
>> (p.s. Hopefully a const 'flags' variable can be optimized when unrolling
>> to eliminate store instructions when flags==0)
> 
> Are we concerned that doing this without a flag and with another
> function call will be expensive?

Yes, but if we can unroll (to avoid the function calls), the explicit
API below would be the more flexible option.

> For the xdp->skb path, I was hoping we would be able to do something like:
> 
> hash = bpf_xdp_metadata_rx_hash(ctx);
> bpf_xdp_metadata_export_rx_hash_to_skb(ctx, hash);
> 
> This should also let the users adjust the metadata before storing it.
> Am I missing something here? Why would the flag be preferable?

I do like this ability to let users adjust the metadata before storing
it; that makes for a more flexible API for the BPF programmer, and I
like your "export" suggestion.  My main concern was the performance
overhead of the extra function call, which I guess can be removed via
unrolling later.
Unrolling these 'export' functions might also be easier to accept from
a maintainer perspective: since they are not device-driver specific, we
can place them in the core BPF code.

--Jesper



end of thread, other threads:[~2022-12-11 11:10 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
2022-12-06  2:45 [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Stanislav Fomichev
2022-12-06  2:45 ` [PATCH bpf-next v3 01/12] bpf: Document XDP RX metadata Stanislav Fomichev
2022-12-08  4:25   ` Jakub Kicinski
2022-12-08 19:06     ` Stanislav Fomichev
2022-12-06  2:45 ` [PATCH bpf-next v3 02/12] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
2022-12-08  4:26   ` Jakub Kicinski
2022-12-06  2:45 ` [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs Stanislav Fomichev
2022-12-07  4:29   ` Alexei Starovoitov
2022-12-07  4:52     ` Stanislav Fomichev
2022-12-07  7:23       ` Martin KaFai Lau
2022-12-07 18:05         ` Stanislav Fomichev
2022-12-08  2:47   ` Martin KaFai Lau
2022-12-08 19:07     ` Stanislav Fomichev
2022-12-08 22:53       ` Martin KaFai Lau
2022-12-08 23:45         ` Stanislav Fomichev
2022-12-08  5:00   ` Jakub Kicinski
2022-12-08 19:07     ` Stanislav Fomichev
2022-12-09  1:30       ` Jakub Kicinski
2022-12-09  2:57         ` Stanislav Fomichev
2022-12-08 22:39   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-12-08 23:46     ` Stanislav Fomichev
2022-12-09  0:07       ` [xdp-hints] " Toke Høiland-Jørgensen
2022-12-09  2:57         ` Stanislav Fomichev
2022-12-10  0:42           ` Martin KaFai Lau
2022-12-10  1:12             ` Martin KaFai Lau
2022-12-09 11:10   ` Jesper Dangaard Brouer
2022-12-09 17:47     ` Stanislav Fomichev
2022-12-11 11:09       ` Jesper Dangaard Brouer
2022-12-06  2:45 ` [PATCH bpf-next v3 04/12] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-12-06  2:45 ` [PATCH bpf-next v3 05/12] veth: Support RX XDP metadata Stanislav Fomichev
2022-12-06  2:45 ` [PATCH bpf-next v3 06/12] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-12-06  2:45 ` [PATCH bpf-next v3 07/12] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-12-08  6:11   ` Tariq Toukan
2022-12-08 19:07     ` Stanislav Fomichev
2022-12-06  2:45 ` [PATCH bpf-next v3 08/12] mxl4: Support RX XDP metadata Stanislav Fomichev
2022-12-08  6:09   ` Tariq Toukan
2022-12-08 19:07     ` Stanislav Fomichev
2022-12-08 20:23       ` Tariq Toukan
2022-12-06  2:45 ` [PATCH bpf-next v3 09/12] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
2022-12-06  2:45 ` [PATCH bpf-next v3 10/12] mlx5: Introduce mlx5_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-12-06  2:45 ` [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata Stanislav Fomichev
2022-12-08 22:59   ` Toke Høiland-Jørgensen
2022-12-08 23:45     ` Stanislav Fomichev
2022-12-09  0:02       ` [xdp-hints] " Toke Høiland-Jørgensen
2022-12-09  0:07         ` Alexei Starovoitov
2022-12-09  0:29           ` Toke Høiland-Jørgensen
2022-12-09  0:32             ` Alexei Starovoitov
2022-12-09  0:53               ` Toke Høiland-Jørgensen
2022-12-09  2:57                 ` Stanislav Fomichev
2022-12-09  5:24                   ` Saeed Mahameed
2022-12-09 12:59                     ` Jesper Dangaard Brouer
2022-12-09 14:37                       ` Toke Høiland-Jørgensen
2022-12-09 15:19                       ` Dave Taht
2022-12-09 14:42                   ` Toke Høiland-Jørgensen
2022-12-09 16:45                     ` Jakub Kicinski
2022-12-09 17:46                       ` Stanislav Fomichev
2022-12-09 22:13                         ` Jakub Kicinski
2022-12-06  2:45 ` [PATCH bpf-next v3 12/12] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
2022-12-08 22:28 ` [xdp-hints] [PATCH bpf-next v3 00/12] xdp: hints via kfuncs Toke Høiland-Jørgensen
2022-12-08 23:47   ` Stanislav Fomichev
2022-12-09  0:14     ` [xdp-hints] " Toke Høiland-Jørgensen
