* [PATCH v2 bpf 0/5] New netdev feature flags for XDP
@ 2020-12-04 10:28 ` alardam
  0 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:28 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Implement support for checking if a netdev has native XDP and AF_XDP
zero-copy support. Previously, there was no way to do this other than to
try to create an AF_XDP socket on the interface or load an XDP program and
see if it worked. This series changes that by extending the existing
netdev_features in the following way:
 * xdp        - full XDP support (XDP_{TX, PASS, DROP, ABORT, REDIRECT})
 * af-xdp-zc  - AF_XDP zero-copy support
NICs supporting these features are updated by turning the corresponding
netdev feature flags on.

NOTE:
 Only a compilation check was performed for:
  - ice,
  - igb,
  - mlx5,
  - mlx4,
  - bnxt,
  - dpaa2,
  - mvneta,
  - mvpp2,
  - qede,
  - sfc,
  - netsec,
  - cpsw,
  - xen,
  - netronome,
  - ena,
  - virtio_net.

The libbpf library is extended to provide a simple API for gathering
information about the XDP capabilities supported by a netdev. This API
uses the netlink interface towards the kernel. With it, the xsk options
supported by a netdev can be queried beforehand.
The new API is used in the core xsk code as well as in the xdpsock sample.

These new flags also solve the problem of reliably detecting zero-copy
support. There are drivers out there that only support XDP partially, so it
is possible to successfully load an XDP program in native mode, but the
driver will still not be able to support zero-copy because it lacks
XDP_REDIRECT support. With the af-xdp-zc flag this check becomes trivial,
as sketched below.
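
As a rough sketch of what that bind-time check can look like (the helper
name here is made up for illustration; it only relies on the XDP_F_FULL_ZC
mask and the net_device::xdp_properties field introduced in patch 1/5, and
is not the literal code of patch 4/5):

	static bool xsk_dev_supports_zc(const struct net_device *dev)
	{
		/* zero-copy needs full XDP support plus the zero-copy bit */
		return (dev->xdp_properties & XDP_F_FULL_ZC) == XDP_F_FULL_ZC;
	}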

Changes since v1:
 * Replace the netdev_feature flags with a bitmap of XDP-specific
   properties. New kernel and uapi interfaces are added to handle access
   to the XDP netdev properties bitmap. [Toke]

 * Set more fine-grained XDP properties for netdevs when necessary. [Toke]

 * Extend the ethtool netlink interface to provide access to the XDP
   bitmap (XDP_PROPERTIES_GET). [Toke]

 * Remove the libbpf patches for now.
---
Marek Majtyka (5):
  net: ethtool: add xdp properties flag set
  drivers/net: turn XDP properties on
  xsk: add usage of xdp properties flags
  xsk: add check for full support of XDP in bind
  ethtool: provide xdp info with XDP_PROPERTIES_GET

 .../networking/netdev-xdp-properties.rst      | 42 ++++++++
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  1 +
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  3 +
 drivers/net/ethernet/intel/ice/ice_main.c     |  4 +
 drivers/net/ethernet/intel/igb/igb_main.c     |  2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  3 +
 drivers/net/ethernet/marvell/mvneta.c         |  3 +
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  3 +
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  2 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  3 +
 .../ethernet/netronome/nfp/nfp_net_common.c   |  5 +
 drivers/net/ethernet/qlogic/qede/qede_main.c  |  2 +
 drivers/net/ethernet/sfc/efx.c                |  2 +
 drivers/net/ethernet/socionext/netsec.c       |  2 +
 drivers/net/ethernet/ti/cpsw.c                |  3 +
 drivers/net/ethernet/ti/cpsw_new.c            |  2 +
 drivers/net/tun.c                             |  4 +
 drivers/net/veth.c                            |  2 +
 drivers/net/virtio_net.c                      |  2 +
 drivers/net/xen-netfront.c                    |  2 +
 include/linux/netdevice.h                     |  2 +
 include/linux/xdp_properties.h                | 53 +++++++++++
 include/net/xdp.h                             | 95 +++++++++++++++++++
 include/net/xdp_sock_drv.h                    | 10 ++
 include/uapi/linux/ethtool.h                  |  1 +
 include/uapi/linux/ethtool_netlink.h          | 14 +++
 include/uapi/linux/if_xdp.h                   |  1 +
 include/uapi/linux/xdp_properties.h           | 32 +++++++
 net/ethtool/Makefile                          |  2 +-
 net/ethtool/common.c                          | 11 +++
 net/ethtool/common.h                          |  4 +
 net/ethtool/netlink.c                         | 38 +++++---
 net/ethtool/netlink.h                         |  2 +
 net/ethtool/strset.c                          |  5 +
 net/ethtool/xdp.c                             | 76 +++++++++++++++
 net/xdp/xsk.c                                 |  4 +-
 net/xdp/xsk_buff_pool.c                       | 20 +++-
 tools/include/uapi/linux/if_xdp.h             |  1 +
 tools/lib/bpf/xsk.c                           |  3 +
 41 files changed, 449 insertions(+), 20 deletions(-)
 create mode 100644 Documentation/networking/netdev-xdp-properties.rst
 create mode 100644 include/linux/xdp_properties.h
 create mode 100644 include/uapi/linux/xdp_properties.h
 create mode 100644 net/ethtool/xdp.c


base-commit: eceae70bdeaeb6b8ceb662983cf663ff352fbc96
-- 
2.27.0


* [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 10:28 ` [Intel-wired-lan] " alardam
@ 2020-12-04 10:28   ` alardam
  -1 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:28 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Implement support for checking what kind of XDP functionality a netdev
supports. Previously, there was no way to do this other than to try
to create an AF_XDP socket on the interface or load an XDP program and see
if it worked. This commit changes that by adding a new variable which
describes the supported XDP functions at a fairly detailed level:
 - aborted
 - drop
 - pass
 - tx
 - redirect
 - zero copy
 - hardware offload.

Zero-copy mode requires that the XDP redirect operation is implemented in
the driver and that the driver also supports zero-copy.
Full mode requires that all XDP operations are implemented in the driver.
Basic mode is full mode without the redirect operation.

Initially, these new flags are disabled for all drivers by default.
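
For illustration, a driver that only has basic support would advertise it
with (a sketch using the helpers added below; patch 2/5 wires this up per
driver):

	xdp_set_basic_properties(&netdev->xdp_properties);	/* aborted, drop, pass, tx */

while a driver with redirect and AF_XDP zero-copy support would advertise:

	xdp_set_full_properties(&netdev->xdp_properties);	/* basic + redirect */
	xsk_set_zc_property(&netdev->xdp_properties);		/* AF_XDP zero-copy */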

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 .../networking/netdev-xdp-properties.rst      | 42 ++++++++
 include/linux/netdevice.h                     |  2 +
 include/linux/xdp_properties.h                | 53 +++++++++++
 include/net/xdp.h                             | 95 +++++++++++++++++++
 include/net/xdp_sock_drv.h                    | 10 ++
 include/uapi/linux/ethtool.h                  |  1 +
 include/uapi/linux/xdp_properties.h           | 32 +++++++
 net/ethtool/common.c                          | 11 +++
 net/ethtool/common.h                          |  4 +
 net/ethtool/strset.c                          |  5 +
 10 files changed, 255 insertions(+)
 create mode 100644 Documentation/networking/netdev-xdp-properties.rst
 create mode 100644 include/linux/xdp_properties.h
 create mode 100644 include/uapi/linux/xdp_properties.h

diff --git a/Documentation/networking/netdev-xdp-properties.rst b/Documentation/networking/netdev-xdp-properties.rst
new file mode 100644
index 000000000000..4a434a1c512b
--- /dev/null
+++ b/Documentation/networking/netdev-xdp-properties.rst
@@ -0,0 +1,42 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Netdev XDP properties
+=====================
+
+ * XDP PROPERTIES FLAGS
+
+The following netdev XDP property flags can be retrieved over the netlink
+ethtool interface in the same way as netdev feature flags. These property
+flags are read-only and cannot be changed at runtime.
+
+
+*  XDP_ABORTED
+
+This property informs whether the netdev supports the XDP aborted action.
+
+*  XDP_DROP
+
+This property informs whether the netdev supports the XDP drop action.
+
+*  XDP_PASS
+
+This property informs whether the netdev supports the XDP pass action.
+
+*  XDP_TX
+
+This property informs whether the netdev supports the XDP tx action.
+
+*  XDP_REDIRECT
+
+This property informs whether the netdev supports the XDP redirect action.
+It assumes that all previously mentioned flags are enabled.
+
+*  XDP_ZEROCOPY
+
+This property informs whether the netdev driver supports XDP zero copy.
+It assumes that all previously mentioned flags are enabled.
+
+*  XDP_HW_OFFLOAD
+
+This property informs whether the netdev driver supports XDP hw offloading.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 52d1cc2bd8a7..2544c7f0e1b7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -43,6 +43,7 @@
 #include <net/xdp.h>
 
 #include <linux/netdev_features.h>
+#include <linux/xdp_properties.h>
 #include <linux/neighbour.h>
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
@@ -2171,6 +2172,7 @@ struct net_device {
 
 	/* protected by rtnl_lock */
 	struct bpf_xdp_entity	xdp_state[__MAX_XDP_MODE];
+	xdp_properties_t	xdp_properties;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/linux/xdp_properties.h b/include/linux/xdp_properties.h
new file mode 100644
index 000000000000..c72c9bcc50de
--- /dev/null
+++ b/include/linux/xdp_properties.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Network device xdp properties.
+ */
+#ifndef _LINUX_XDP_PROPERTIES_H
+#define _LINUX_XDP_PROPERTIES_H
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <asm/byteorder.h>
+
+typedef u64 xdp_properties_t;
+
+enum {
+	XDP_F_ABORTED_BIT,
+	XDP_F_DROP_BIT,
+	XDP_F_PASS_BIT,
+	XDP_F_TX_BIT,
+	XDP_F_REDIRECT_BIT,
+	XDP_F_ZEROCOPY_BIT,
+	XDP_F_HW_OFFLOAD_BIT,
+
+	/*
+	 * Add your fresh new property above and remember to update
+	 * xdp_properties_strings [] in net/core/ethtool.c and maybe
+	 * some xdp_properties mask #defines below. Please also describe it
+	 * in Documentation/networking/xdp_properties.rst.
+	 */
+
+	/**/XDP_PROPERTIES_COUNT
+};
+
+#define __XDP_F_BIT(bit)	((xdp_properties_t)1 << (bit))
+#define __XDP_F(name)		__XDP_F_BIT(XDP_F_##name##_BIT)
+
+#define XDP_F_ABORTED		__XDP_F(ABORTED)
+#define XDP_F_DROP		__XDP_F(DROP)
+#define XDP_F_PASS		__XDP_F(PASS)
+#define XDP_F_TX		__XDP_F(TX)
+#define XDP_F_REDIRECT		__XDP_F(REDIRECT)
+#define XDP_F_ZEROCOPY		__XDP_F(ZEROCOPY)
+#define XDP_F_HW_OFFLOAD	__XDP_F(HW_OFFLOAD)
+
+#define XDP_F_BASIC		(XDP_F_ABORTED |	\
+				 XDP_F_DROP |		\
+				 XDP_F_PASS |		\
+				 XDP_F_TX)
+
+#define XDP_F_FULL		(XDP_F_BASIC | XDP_F_REDIRECT)
+
+#define XDP_F_FULL_ZC		(XDP_F_FULL | XDP_F_ZEROCOPY)
+
+#endif /* _LINUX_XDP_PROPERTIES_H */
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 700ad5db7f5d..a9fabc1282cf 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -7,6 +7,7 @@
 #define __LINUX_NET_XDP_H__
 
 #include <linux/skbuff.h> /* skb_shared_info */
+#include <linux/xdp_properties.h>
 
 /**
  * DOC: XDP RX-queue information
@@ -255,6 +256,100 @@ struct xdp_attachment_info {
 	u32 flags;
 };
 
+#if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
+
+static __always_inline void
+xdp_set_aborted_property(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_ABORTED;
+}
+
+static __always_inline void
+xdp_set_pass_property(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_PASS;
+}
+
+static __always_inline void
+xdp_set_drop_property(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_DROP;
+}
+
+static __always_inline void
+xdp_set_tx_property(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_TX;
+}
+
+static __always_inline void
+xdp_set_redirect_property(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_REDIRECT;
+}
+
+static __always_inline void
+xdp_set_hw_offload_property(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_HW_OFFLOAD;
+}
+
+static __always_inline void
+xdp_set_basic_properties(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_BASIC;
+}
+
+static __always_inline void
+xdp_set_full_properties(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_FULL;
+}
+
+#else
+
+static __always_inline void
+xdp_set_aborted_property(xdp_properties_t *properties)
+{
+}
+
+static __always_inline void
+xdp_set_pass_property(xdp_properties_t *properties)
+{
+}
+
+static __always_inline void
+xdp_set_drop_property(xdp_properties_t *properties)
+{
+}
+
+static __always_inline void
+xdp_set_tx_property(xdp_properties_t *properties)
+{
+}
+
+static __always_inline void
+xdp_set_redirect_property(xdp_properties_t *properties)
+{
+}
+
+static __always_inline void
+xdp_set_hw_offload_property(xdp_properties_t *properties)
+{
+}
+
+static __always_inline void
+xdp_set_basic_properties(xdp_properties_t *properties)
+{
+}
+
+static __always_inline void
+xdp_set_full_properties(xdp_properties_t *properties)
+{
+}
+
+#endif
+
 struct netdev_bpf;
 bool xdp_attachment_flags_ok(struct xdp_attachment_info *info,
 			     struct netdev_bpf *bpf);
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 4e295541e396..48a3b6d165c7 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -8,6 +8,7 @@
 
 #include <net/xdp_sock.h>
 #include <net/xsk_buff_pool.h>
+#include <linux/xdp_properties.h>
 
 #ifdef CONFIG_XDP_SOCKETS
 
@@ -117,6 +118,11 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 	xp_dma_sync_for_device(pool, dma, size);
 }
 
+static inline void xsk_set_zc_property(xdp_properties_t *properties)
+{
+	*properties |= XDP_F_ZEROCOPY;
+}
+
 #else
 
 static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
@@ -242,6 +248,10 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 {
 }
 
+static inline void xsk_set_zc_property(xdp_properties_t *properties)
+{
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_DRV_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 9ca87bc73c44..dfcb0e2c98b2 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -688,6 +688,7 @@ enum ethtool_stringset {
 	ETH_SS_TS_TX_TYPES,
 	ETH_SS_TS_RX_FILTERS,
 	ETH_SS_UDP_TUNNEL_TYPES,
+	ETH_SS_XDP_PROPERTIES,
 
 	/* add new constants above here */
 	ETH_SS_COUNT
diff --git a/include/uapi/linux/xdp_properties.h b/include/uapi/linux/xdp_properties.h
new file mode 100644
index 000000000000..e85be03eb707
--- /dev/null
+++ b/include/uapi/linux/xdp_properties.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+/*
+ * Copyright (c) 2020 Intel
+ */
+
+#ifndef __UAPI_LINUX_XDP_PROPERTIES__
+#define __UAPI_LINUX_XDP_PROPERTIES__
+
+/* ETH_GSTRING_LEN define is needed. */
+#include <linux/ethtool.h>
+
+#define XDP_PROPERTIES_ABORTED_STR	"xdp-aborted"
+#define XDP_PROPERTIES_DROP_STR		"xdp-drop"
+#define XDP_PROPERTIES_PASS_STR		"xdp-pass"
+#define XDP_PROPERTIES_TX_STR		"xdp-tx"
+#define XDP_PROPERTIES_REDIRECT_STR	"xdp-redirect"
+#define XDP_PROPERTIES_ZEROCOPY_STR	"xdp-zerocopy"
+#define XDP_PROPERTIES_HW_OFFLOAD_STR	"xdp-hw-offload"
+
+#define	DECLARE_XDP_PROPERTIES_TABLE(name)		\
+	const char name[][ETH_GSTRING_LEN] = {		\
+		XDP_PROPERTIES_ABORTED_STR,		\
+		XDP_PROPERTIES_DROP_STR,		\
+		XDP_PROPERTIES_PASS_STR,		\
+		XDP_PROPERTIES_TX_STR,			\
+		XDP_PROPERTIES_REDIRECT_STR,		\
+		XDP_PROPERTIES_ZEROCOPY_STR,		\
+		XDP_PROPERTIES_HW_OFFLOAD_STR,		\
+	}
+
+#endif  /* __UAPI_LINUX_XDP_PROPERTIES__ */
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 24036e3055a1..8f15f96b8922 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -4,6 +4,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/phy.h>
 #include <linux/rtnetlink.h>
+#include <uapi/linux/xdp_properties.h>
 
 #include "common.h"
 
@@ -283,6 +284,16 @@ const char udp_tunnel_type_names[][ETH_GSTRING_LEN] = {
 static_assert(ARRAY_SIZE(udp_tunnel_type_names) ==
 	      __ETHTOOL_UDP_TUNNEL_TYPE_CNT);
 
+const char xdp_properties_strings[XDP_PROPERTIES_COUNT][ETH_GSTRING_LEN] = {
+	[XDP_F_ABORTED_BIT] =		XDP_PROPERTIES_ABORTED_STR,
+	[XDP_F_DROP_BIT] =		XDP_PROPERTIES_DROP_STR,
+	[XDP_F_PASS_BIT] =		XDP_PROPERTIES_PASS_STR,
+	[XDP_F_TX_BIT] =		XDP_PROPERTIES_TX_STR,
+	[XDP_F_REDIRECT_BIT] =		XDP_PROPERTIES_REDIRECT_STR,
+	[XDP_F_ZEROCOPY_BIT] =		XDP_PROPERTIES_ZEROCOPY_STR,
+	[XDP_F_HW_OFFLOAD_BIT] =	XDP_PROPERTIES_HW_OFFLOAD_STR,
+};
+
 /* return false if legacy contained non-0 deprecated fields
  * maxtxpkt/maxrxpkt. rest of ksettings always updated
  */
diff --git a/net/ethtool/common.h b/net/ethtool/common.h
index 3d9251c95a8b..85a35f8781eb 100644
--- a/net/ethtool/common.h
+++ b/net/ethtool/common.h
@@ -5,8 +5,10 @@
 
 #include <linux/netdevice.h>
 #include <linux/ethtool.h>
+#include <linux/xdp_properties.h>
 
 #define ETHTOOL_DEV_FEATURE_WORDS	DIV_ROUND_UP(NETDEV_FEATURE_COUNT, 32)
+#define ETHTOOL_XDP_PROPERTIES_WORDS	DIV_ROUND_UP(XDP_PROPERTIES_COUNT, 32)
 
 /* compose link mode index from speed, type and duplex */
 #define ETHTOOL_LINK_MODE(speed, type, duplex) \
@@ -22,6 +24,8 @@ extern const char
 tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN];
 extern const char
 phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN];
+extern const char
+xdp_properties_strings[XDP_PROPERTIES_COUNT][ETH_GSTRING_LEN];
 extern const char link_mode_names[][ETH_GSTRING_LEN];
 extern const char netif_msg_class_names[][ETH_GSTRING_LEN];
 extern const char wol_mode_names[][ETH_GSTRING_LEN];
diff --git a/net/ethtool/strset.c b/net/ethtool/strset.c
index 0baad0ce1832..684e751b31a9 100644
--- a/net/ethtool/strset.c
+++ b/net/ethtool/strset.c
@@ -80,6 +80,11 @@ static const struct strset_info info_template[] = {
 		.count		= __ETHTOOL_UDP_TUNNEL_TYPE_CNT,
 		.strings	= udp_tunnel_type_names,
 	},
+	[ETH_SS_XDP_PROPERTIES] = {
+		.per_dev	= false,
+		.count		= ARRAY_SIZE(xdp_properties_strings),
+		.strings	= xdp_properties_strings,
+	},
 };
 
 struct strset_req_info {
-- 
2.27.0


* [PATCH v2 bpf 2/5] drivers/net: turn XDP properties on
  2020-12-04 10:28 ` [Intel-wired-lan] " alardam
@ 2020-12-04 10:28   ` alardam
  -1 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:28 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Turn the 'hw-offload' property flag on for:
 - netronome.

Turn the 'native' and 'zerocopy' property flags on for:
 - i40e
 - ice
 - ixgbe
 - mlx5.

Turn the 'native' property flags on for:
 - igb
 - tun
 - veth
 - dpaa2
 - mvneta
 - mvpp2
 - qede
 - sfc
 - netsec
 - cpsw
 - xen
 - virtio_net.

Turn the 'basic' (tx, pass, aborted and drop) property flags on for:
 - netronome
 - ena
 - mlx4.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c        | 2 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt.c           | 1 +
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c    | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c         | 3 +++
 drivers/net/ethernet/intel/ice/ice_main.c           | 4 ++++
 drivers/net/ethernet/intel/igb/igb_main.c           | 2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c       | 3 +++
 drivers/net/ethernet/marvell/mvneta.c               | 3 +++
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c     | 3 +++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c      | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c   | 3 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 5 +++++
 drivers/net/ethernet/qlogic/qede/qede_main.c        | 2 ++
 drivers/net/ethernet/sfc/efx.c                      | 2 ++
 drivers/net/ethernet/socionext/netsec.c             | 2 ++
 drivers/net/ethernet/ti/cpsw.c                      | 3 +++
 drivers/net/ethernet/ti/cpsw_new.c                  | 2 ++
 drivers/net/tun.c                                   | 4 ++++
 drivers/net/veth.c                                  | 2 ++
 drivers/net/virtio_net.c                            | 2 ++
 drivers/net/xen-netfront.c                          | 2 ++
 21 files changed, 53 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6ad59f0068f6..a0a7558d733b 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -4290,6 +4290,8 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	netdev->priv_flags |= IFF_UNICAST_FLT;
 
+	xdp_set_basic_properties(&netdev->xdp_properties);
+
 	u64_stats_init(&adapter->syncp);
 
 	rc = ena_enable_msix_and_set_admin_interrupts(adapter);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 725d929eddb1..5a153102d73b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -12604,6 +12604,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->features |= dev->hw_features | NETIF_F_HIGHDMA;
 	if (dev->features & NETIF_F_GRO_HW)
 		dev->features &= ~NETIF_F_LRO;
+	xdp_set_full_properties(&dev->xdp_properties);
 	dev->priv_flags |= IFF_UNICAST_FLT;
 
 #ifdef CONFIG_BNXT_SRIOV
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 40953980e846..abdd4ceed6f2 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -4014,6 +4014,7 @@ static int dpaa2_eth_netdev_init(struct net_device *net_dev)
 			    NETIF_F_SG | NETIF_F_HIGHDMA |
 			    NETIF_F_LLTX | NETIF_F_HW_TC;
 	net_dev->hw_features = net_dev->features;
+	xdp_set_full_properties(&net_dev->xdp_properties);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 4f8a2154b93f..6e5dae9b871f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12875,6 +12875,9 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	netdev->features |= hw_features | NETIF_F_HW_VLAN_CTAG_FILTER;
 	netdev->hw_enc_features |= NETIF_F_TSO_MANGLEID;
 
+	xdp_set_full_properties(&netdev->xdp_properties);
+	xsk_set_zc_property(&netdev->xdp_properties);
+
 	if (vsi->type == I40E_VSI_MAIN) {
 		SET_NETDEV_DEV(netdev, &pf->pdev->dev);
 		ether_addr_copy(mac_addr, hw->mac.perm_addr);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 2dea4d0e9415..638942df136b 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -13,6 +13,7 @@
 #include "ice_dcb_lib.h"
 #include "ice_dcb_nl.h"
 #include "ice_devlink.h"
+#include <net/xdp_sock_drv.h>
 
 #define DRV_SUMMARY	"Intel(R) Ethernet Connection E800 Series Linux Driver"
 static const char ice_driver_string[] = DRV_SUMMARY;
@@ -2979,6 +2980,9 @@ static int ice_cfg_netdev(struct ice_vsi *vsi)
 
 	ice_set_netdev_features(netdev);
 
+	xdp_set_full_properties(&netdev->xdp_properties);
+	xsk_set_zc_property(&netdev->xdp_properties);
+
 	ice_set_ops(netdev);
 
 	if (vsi->type == ICE_VSI_PF) {
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 6a4ef4934fcf..ed7e0a2efe1a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3297,6 +3297,8 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	netdev->priv_flags |= IFF_UNICAST_FLT;
 
+	xdp_set_full_properties(&netdev->xdp_properties);
+
 	/* MTU range: 68 - 9216 */
 	netdev->min_mtu = ETH_MIN_MTU;
 	netdev->max_mtu = MAX_STD_JUMBO_FRAME_SIZE;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 50e6b8b6ba7b..6fa98bf48e21 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10844,6 +10844,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	netdev->priv_flags |= IFF_UNICAST_FLT;
 	netdev->priv_flags |= IFF_SUPP_NOFCS;
 
+	xdp_set_full_properties(&netdev->xdp_properties);
+	xsk_set_zc_property(&netdev->xdp_properties);
+
 	/* MTU range: 68 - 9710 */
 	netdev->min_mtu = ETH_MIN_MTU;
 	netdev->max_mtu = IXGBE_MAX_JUMBO_FRAME_SIZE - (ETH_HLEN + ETH_FCS_LEN);
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ba6dcb19bb1d..6431772b4706 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5262,6 +5262,9 @@ static int mvneta_probe(struct platform_device *pdev)
 			NETIF_F_TSO | NETIF_F_RXCSUM;
 	dev->hw_features |= dev->features;
 	dev->vlan_features |= dev->features;
+
+	xdp_set_full_properties(&dev->xdp_properties);
+
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 	dev->gso_max_segs = MVNETA_MAX_TSO_SEGS;
 
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 5504cbc24970..4d6a86b40403 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -6475,6 +6475,9 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		mvpp2_set_hw_csum(port, port->pool_long->id);
 
 	dev->vlan_features |= features;
+
+	xdp_set_full_properties(&dev->xdp_properties);
+
 	dev->gso_max_segs = MVPP2_MAX_TSO_SEGS;
 	dev->priv_flags |= IFF_UNICAST_FLT;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 106513f772c3..3b81c98b85a0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3395,6 +3395,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 		priv->rss_hash_fn = ETH_RSS_HASH_TOP;
 	}
 
+	xdp_set_basic_properties(&dev->xdp_properties);
+
 	/* MTU range: 68 - hw-specific max */
 	dev->min_mtu = ETH_MIN_MTU;
 	dev->max_mtu = priv->max_mtu;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 427fc376fe1a..0f6055528a32 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4981,6 +4981,9 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	netdev->features         |= NETIF_F_HIGHDMA;
 	netdev->features         |= NETIF_F_HW_VLAN_STAG_FILTER;
 
+	xdp_set_full_properties(&netdev->xdp_properties);
+	xsk_set_zc_property(&netdev->xdp_properties);
+
 	netdev->priv_flags       |= IFF_UNICAST_FLT;
 
 	mlx5e_set_netdev_dev_addr(netdev);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b4acf2f41e84..37280465326c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -4099,8 +4099,13 @@ int nfp_net_init(struct nfp_net *nn)
 		return err;
 
 	if (nn->dp.netdev) {
+		struct net_device *dev = nn->dp.netdev;
+
 		nfp_net_netdev_init(nn);
 
+		xdp_set_hw_offload_property(&dev->xdp_properties);
+		xdp_set_basic_properties(&dev->xdp_properties);
+
 		err = nfp_ccm_mbox_init(nn);
 		if (err)
 			return err;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 9cf960a6d007..fc11fae05857 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -842,6 +842,8 @@ static void qede_init_ndev(struct qede_dev *edev)
 
 	ndev->hw_features = hw_features;
 
+	xdp_set_full_properties(&ndev->xdp_properties);
+
 	/* MTU range: 46 - 9600 */
 	ndev->min_mtu = ETH_ZLEN - ETH_HLEN;
 	ndev->max_mtu = QEDE_MAX_JUMBO_PACKET_SIZE;
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 718308076341..bbf6d3255040 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1111,6 +1111,8 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
 	netif_info(efx, probe, efx->net_dev,
 		   "Solarflare NIC detected\n");
 
+	xdp_set_full_properties(&efx->net_dev->xdp_properties);
+
 	if (!efx->type->is_vf)
 		efx_probe_vpd_strings(efx);
 
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 27d3c9d9210e..df1f952f678a 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -2100,6 +2100,8 @@ static int netsec_probe(struct platform_device *pdev)
 				NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 	ndev->hw_features = ndev->features;
 
+	xdp_set_full_properties(&ndev->xdp_properties);
+
 	priv->rx_cksum_offload_flag = true;
 
 	ret = netsec_register_mdio(priv, phy_addr);
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 9fd1f77190ad..02fd7275e477 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1476,6 +1476,8 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	cpsw->slaves[1].ndev = ndev;
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
 
+	xdp_set_full_properties(&ndev->xdp_properties);
+
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
 
@@ -1654,6 +1656,7 @@ static int cpsw_probe(struct platform_device *pdev)
 	cpsw->slaves[0].ndev = ndev;
 
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
+	xdp_set_full_properties(&ndev->xdp_properties);
 
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index f779d2e1b5c5..22bf1b0d4d48 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1416,6 +1416,8 @@ static int cpsw_create_ports(struct cpsw_common *cpsw)
 		ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER |
 				  NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_NETNS_LOCAL;
 
+		xdp_set_full_properties(&ndev->xdp_properties);
+
 		ndev->netdev_ops = &cpsw_netdev_ops;
 		ndev->ethtool_ops = &cpsw_ethtool_ops;
 		SET_NETDEV_DEV(ndev, dev);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8867d39db6ac..6d16e878b1bd 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2721,6 +2721,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 				     ~(NETIF_F_HW_VLAN_CTAG_TX |
 				       NETIF_F_HW_VLAN_STAG_TX);
 
+		/* Currently tap does not support XDP, only tun does. */
+		if (tun->flags == IFF_TUN)
+			xdp_set_full_properties(&dev->xdp_properties);
+
 		tun->flags = (tun->flags & ~TUN_FEATURES) |
 			      (ifr->ifr_flags & TUN_FEATURES);
 
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9bd37c7151f8..5a48823a0377 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1270,6 +1270,8 @@ static void veth_setup(struct net_device *dev)
 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;
 	dev->mpls_features = NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE;
+
+	xdp_set_full_properties(&dev->xdp_properties);
 }
 
 /*
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 052975ea0af4..f05a45942d37 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3018,6 +3018,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 	dev->vlan_features = dev->features;
 
+	xdp_set_full_properties(&dev->xdp_properties);
+
 	/* MTU range: 68 - 65535 */
 	dev->min_mtu = MIN_MTU;
 	dev->max_mtu = MAX_MTU;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index b01848ef4649..e2c3c668abae 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1556,6 +1556,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
          */
 	netdev->features |= netdev->hw_features;
 
+	xdp_set_full_properties(&netdev->xdp_properties);
+
 	netdev->ethtool_ops = &xennet_ethtool_ops;
 	netdev->min_mtu = ETH_MIN_MTU;
 	netdev->max_mtu = XEN_NETIF_MAX_TX_SIZE;
-- 
2.27.0


+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1111,6 +1111,8 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
 	netif_info(efx, probe, efx->net_dev,
 		   "Solarflare NIC detected\n");
 
+	xdp_set_full_properties(&efx->net_dev->xdp_properties);
+
 	if (!efx->type->is_vf)
 		efx_probe_vpd_strings(efx);
 
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 27d3c9d9210e..df1f952f678a 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -2100,6 +2100,8 @@ static int netsec_probe(struct platform_device *pdev)
 				NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 	ndev->hw_features = ndev->features;
 
+	xdp_set_full_properties(&ndev->xdp_properties);
+
 	priv->rx_cksum_offload_flag = true;
 
 	ret = netsec_register_mdio(priv, phy_addr);
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 9fd1f77190ad..02fd7275e477 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1476,6 +1476,8 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	cpsw->slaves[1].ndev = ndev;
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
 
+	xdp_set_full_properties(&ndev->xdp_properties);
+
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
 
@@ -1654,6 +1656,7 @@ static int cpsw_probe(struct platform_device *pdev)
 	cpsw->slaves[0].ndev = ndev;
 
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
+	xdp_set_full_properties(&ndev->xdp_properties);
 
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index f779d2e1b5c5..22bf1b0d4d48 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1416,6 +1416,8 @@ static int cpsw_create_ports(struct cpsw_common *cpsw)
 		ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER |
 				  NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_NETNS_LOCAL;
 
+		xdp_set_full_properties(&ndev->xdp_properties);
+
 		ndev->netdev_ops = &cpsw_netdev_ops;
 		ndev->ethtool_ops = &cpsw_ethtool_ops;
 		SET_NETDEV_DEV(ndev, dev);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8867d39db6ac..6d16e878b1bd 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2721,6 +2721,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 				     ~(NETIF_F_HW_VLAN_CTAG_TX |
 				       NETIF_F_HW_VLAN_STAG_TX);
 
+		/* Currently tap does not support XDP, only tun does. */
+		if (tun->flags == IFF_TUN)
+			xdp_set_full_properties(&dev->xdp_properties);
+
 		tun->flags = (tun->flags & ~TUN_FEATURES) |
 			      (ifr->ifr_flags & TUN_FEATURES);
 
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9bd37c7151f8..5a48823a0377 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1270,6 +1270,8 @@ static void veth_setup(struct net_device *dev)
 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;
 	dev->mpls_features = NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE;
+
+	xdp_set_full_properties(&dev->xdp_properties);
 }
 
 /*
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 052975ea0af4..f05a45942d37 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3018,6 +3018,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 	dev->vlan_features = dev->features;
 
+	xdp_set_full_properties(&dev->xdp_properties);
+
 	/* MTU range: 68 - 65535 */
 	dev->min_mtu = MIN_MTU;
 	dev->max_mtu = MAX_MTU;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index b01848ef4649..e2c3c668abae 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1556,6 +1556,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
          */
 	netdev->features |= netdev->hw_features;
 
+	xdp_set_full_properties(&netdev->xdp_properties);
+
 	netdev->ethtool_ops = &xennet_ethtool_ops;
 	netdev->min_mtu = ETH_MIN_MTU;
 	netdev->max_mtu = XEN_NETIF_MAX_TX_SIZE;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH v2 bpf 3/5] xsk: add usage of xdp properties flags
  2020-12-04 10:28 ` [Intel-wired-lan] " alardam
@ 2020-12-04 10:28   ` alardam
  -1 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:28 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Change the XSK precondition check from testing ndo function pointers
to testing the xdp properties flags.
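
For reference, the new test requires every bit in XDP_F_FULL_ZC to be
set, rather than checking two ndo pointers. A rough sketch of the idea,
assuming the flag layout described in patch 1/5 (illustrative only; the
authoritative definitions live in that patch):

  /* illustrative only, see patch 1/5 for the real definitions */
  #define XDP_F_FULL    (XDP_F_ABORTED | XDP_F_DROP | XDP_F_PASS | \
                         XDP_F_TX | XDP_F_REDIRECT)
  #define XDP_F_FULL_ZC (XDP_F_FULL | XDP_F_ZEROCOPY)

  /* all bits must be present; a partial match is not enough */
  if ((netdev->xdp_properties & XDP_F_FULL_ZC) != XDP_F_FULL_ZC)
          return -EOPNOTSUPP;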

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 net/xdp/xsk_buff_pool.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 96bb607853ad..7ff82e2b2b43 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -158,8 +158,7 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 		/* For copy-mode, we are done. */
 		return 0;
 
-	if (!netdev->netdev_ops->ndo_bpf ||
-	    !netdev->netdev_ops->ndo_xsk_wakeup) {
+	if ((netdev->xdp_properties & XDP_F_FULL_ZC) != XDP_F_FULL_ZC) {
 		err = -EOPNOTSUPP;
 		goto err_unreg_pool;
 	}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [Intel-wired-lan] [PATCH v2 bpf 3/5] xsk: add usage of xdp properties flags
@ 2020-12-04 10:28   ` alardam
  0 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:28 UTC (permalink / raw)
  To: intel-wired-lan

From: Marek Majtyka <marekx.majtyka@intel.com>

Change the XSK precondition check from testing ndo function pointers
to testing the xdp properties flags.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 net/xdp/xsk_buff_pool.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 96bb607853ad..7ff82e2b2b43 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -158,8 +158,7 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 		/* For copy-mode, we are done. */
 		return 0;
 
-	if (!netdev->netdev_ops->ndo_bpf ||
-	    !netdev->netdev_ops->ndo_xsk_wakeup) {
+	if ((netdev->xdp_properties & XDP_F_FULL_ZC) != XDP_F_FULL_ZC) {
 		err = -EOPNOTSUPP;
 		goto err_unreg_pool;
 	}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH v2 bpf 4/5] xsk: add check for full support of XDP in bind
  2020-12-04 10:28 ` [Intel-wired-lan] " alardam
@ 2020-12-04 10:29   ` alardam
  -1 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:29 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Add a check for full XDP support in AF_XDP socket bind.

To be able to use an AF_XDP socket in zero-copy mode, the driver must
support both XDP_REDIRECT (i.e. XDP native mode) and zero-copy itself.
The problem is that there are drivers out there that only support XDP
partially, so it is possible to successfully load an XDP program in
native mode even though the driver cannot support zero-copy because it
lacks XDP_REDIRECT support. We can now alleviate this problem by using
the new XDP netdev capability that signifies whether full XDP support
is present. This check can be triggered by a new bind flag called
XDP_CHECK_NATIVE_MODE.

To simplify usage, the check is triggered automatically from inside the
libbpf library by turning on the new XDP_CHECK_NATIVE_MODE flag if and
only if driver mode is selected for the socket. As a result, the
xsk_bind function uses the xdp netdev feature flags to decide whether
native mode makes sense for a given interface, and the xsk socket is
either bound or an error is returned. Apart from this change, and to
catch all invalid inputs in a single place, an additional check forbids
combining skb mode and zero-copy settings, as that combination makes no
sense.
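
For illustration, a userspace application that requests driver mode via
libbpf now gets this check implicitly. A minimal sketch (the interface
name and queue id are placeholders, and the umem and rx/tx rings are
assumed to be set up already; uses libbpf's xsk.h together with
linux/if_link.h and linux/if_xdp.h):

  struct xsk_socket_config cfg = {
          .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
          .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
          .xdp_flags = XDP_FLAGS_DRV_MODE, /* libbpf adds XDP_CHECK_NATIVE_MODE */
          .bind_flags = XDP_USE_NEED_WAKEUP,
  };
  struct xsk_socket *xsk;
  int err;

  err = xsk_socket__create(&xsk, "eth0", 0, umem, &rx, &tx, &cfg);
  if (err) /* -EOPNOTSUPP: no full native XDP support on this netdev */
          return err;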

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 include/uapi/linux/if_xdp.h       |  1 +
 net/xdp/xsk.c                     |  4 ++--
 net/xdp/xsk_buff_pool.c           | 17 ++++++++++++++++-
 tools/include/uapi/linux/if_xdp.h |  1 +
 tools/lib/bpf/xsk.c               |  3 +++
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index a78a8096f4ce..8f47754dacce 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -25,6 +25,7 @@
  * application.
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
+#define XDP_CHECK_NATIVE_MODE (1 << 4)
 
 /* Flags for xsk_umem_config flags */
 #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 7588e599a048..3b45754274bb 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -764,7 +764,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 
 	flags = sxdp->sxdp_flags;
 	if (flags & ~(XDP_SHARED_UMEM | XDP_COPY | XDP_ZEROCOPY |
-		      XDP_USE_NEED_WAKEUP))
+		      XDP_USE_NEED_WAKEUP | XDP_CHECK_NATIVE_MODE))
 		return -EINVAL;
 
 	rtnl_lock();
@@ -792,7 +792,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		struct socket *sock;
 
 		if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY) ||
-		    (flags & XDP_USE_NEED_WAKEUP)) {
+		    (flags & XDP_USE_NEED_WAKEUP) || (flags & XDP_CHECK_NATIVE_MODE)) {
 			/* Cannot specify flags for shared sockets. */
 			err = -EINVAL;
 			goto out_unlock;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 7ff82e2b2b43..47e283ea1dca 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -123,7 +123,7 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
 static int __xp_assign_dev(struct xsk_buff_pool *pool,
 			   struct net_device *netdev, u16 queue_id, u16 flags)
 {
-	bool force_zc, force_copy;
+	bool force_zc, force_copy, force_check;
 	struct netdev_bpf bpf;
 	int err = 0;
 
@@ -131,10 +131,24 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 
 	force_zc = flags & XDP_ZEROCOPY;
 	force_copy = flags & XDP_COPY;
+	force_check = flags & XDP_CHECK_NATIVE_MODE;
+
 
 	if (force_zc && force_copy)
 		return -EINVAL;
 
+	if (!(flags & XDP_SHARED_UMEM)) {
+		if (force_check) {
+			/* forbid driver mode without full XDP support */
+			if (!(XDP_F_REDIRECT & netdev->xdp_properties))
+				return -EOPNOTSUPP;
+		} else {
+			/* forbid skb mode and zero copy */
+			if (force_zc)
+				return -EINVAL;
+		}
+	}
+
 	if (xsk_get_pool_from_qid(netdev, queue_id))
 		return -EBUSY;
 
@@ -204,6 +218,7 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem,
 		return -EINVAL;
 
 	flags = umem->zc ? XDP_ZEROCOPY : XDP_COPY;
+	flags |= XDP_SHARED_UMEM;
 	if (pool->uses_need_wakeup)
 		flags |= XDP_USE_NEED_WAKEUP;
 
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index a78a8096f4ce..8f47754dacce 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -25,6 +25,7 @@
  * application.
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
+#define XDP_CHECK_NATIVE_MODE (1 << 4)
 
 /* Flags for xsk_umem_config flags */
 #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index e3e41ceeb1bc..c309d2c87be3 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -18,6 +18,7 @@
 #include <linux/ethtool.h>
 #include <linux/filter.h>
 #include <linux/if_ether.h>
+#include <linux/if_link.h>
 #include <linux/if_packet.h>
 #include <linux/if_xdp.h>
 #include <linux/kernel.h>
@@ -901,6 +902,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
 		sxdp.sxdp_shared_umem_fd = umem->fd;
 	} else {
 		sxdp.sxdp_flags = xsk->config.bind_flags;
+		if (xsk->config.xdp_flags & XDP_FLAGS_DRV_MODE)
+			sxdp.sxdp_flags |= XDP_CHECK_NATIVE_MODE;
 	}
 
 	err = bind(xsk->fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [Intel-wired-lan] [PATCH v2 bpf 4/5] xsk: add check for full support of XDP in bind
@ 2020-12-04 10:29   ` alardam
  0 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:29 UTC (permalink / raw)
  To: intel-wired-lan

From: Marek Majtyka <marekx.majtyka@intel.com>

Add a check for full XDP support in AF_XDP socket bind.

To be able to use an AF_XDP socket in zero-copy mode, the driver must
support both XDP_REDIRECT (i.e. XDP native mode) and zero-copy itself.
The problem is that there are drivers out there that only support XDP
partially, so it is possible to successfully load an XDP program in
native mode even though the driver cannot support zero-copy because it
lacks XDP_REDIRECT support. We can now alleviate this problem by using
the new XDP netdev capability that signifies whether full XDP support
is present. This check can be triggered by a new bind flag called
XDP_CHECK_NATIVE_MODE.

To simplify usage, the check is triggered automatically from inside the
libbpf library by turning on the new XDP_CHECK_NATIVE_MODE flag if and
only if driver mode is selected for the socket. As a result, the
xsk_bind function uses the xdp netdev feature flags to decide whether
native mode makes sense for a given interface, and the xsk socket is
either bound or an error is returned. Apart from this change, and to
catch all invalid inputs in a single place, an additional check forbids
combining skb mode and zero-copy settings, as that combination makes no
sense.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 include/uapi/linux/if_xdp.h       |  1 +
 net/xdp/xsk.c                     |  4 ++--
 net/xdp/xsk_buff_pool.c           | 17 ++++++++++++++++-
 tools/include/uapi/linux/if_xdp.h |  1 +
 tools/lib/bpf/xsk.c               |  3 +++
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index a78a8096f4ce..8f47754dacce 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -25,6 +25,7 @@
  * application.
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
+#define XDP_CHECK_NATIVE_MODE (1 << 4)
 
 /* Flags for xsk_umem_config flags */
 #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 7588e599a048..3b45754274bb 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -764,7 +764,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 
 	flags = sxdp->sxdp_flags;
 	if (flags & ~(XDP_SHARED_UMEM | XDP_COPY | XDP_ZEROCOPY |
-		      XDP_USE_NEED_WAKEUP))
+		      XDP_USE_NEED_WAKEUP | XDP_CHECK_NATIVE_MODE))
 		return -EINVAL;
 
 	rtnl_lock();
@@ -792,7 +792,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		struct socket *sock;
 
 		if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY) ||
-		    (flags & XDP_USE_NEED_WAKEUP)) {
+		    (flags & XDP_USE_NEED_WAKEUP) || (flags & XDP_CHECK_NATIVE_MODE)) {
 			/* Cannot specify flags for shared sockets. */
 			err = -EINVAL;
 			goto out_unlock;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 7ff82e2b2b43..47e283ea1dca 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -123,7 +123,7 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
 static int __xp_assign_dev(struct xsk_buff_pool *pool,
 			   struct net_device *netdev, u16 queue_id, u16 flags)
 {
-	bool force_zc, force_copy;
+	bool force_zc, force_copy, force_check;
 	struct netdev_bpf bpf;
 	int err = 0;
 
@@ -131,10 +131,24 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 
 	force_zc = flags & XDP_ZEROCOPY;
 	force_copy = flags & XDP_COPY;
+	force_check = flags & XDP_CHECK_NATIVE_MODE;
+
 
 	if (force_zc && force_copy)
 		return -EINVAL;
 
+	if (!(flags & XDP_SHARED_UMEM)) {
+		if (force_check) {
+			/* forbid driver mode without full XDP support */
+			if (!(XDP_F_REDIRECT & netdev->xdp_properties))
+				return -EOPNOTSUPP;
+		} else {
+			/* forbid skb mode and zero copy */
+			if (force_zc)
+				return -EINVAL;
+		}
+	}
+
 	if (xsk_get_pool_from_qid(netdev, queue_id))
 		return -EBUSY;
 
@@ -204,6 +218,7 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem,
 		return -EINVAL;
 
 	flags = umem->zc ? XDP_ZEROCOPY : XDP_COPY;
+	flags |= XDP_SHARED_UMEM;
 	if (pool->uses_need_wakeup)
 		flags |= XDP_USE_NEED_WAKEUP;
 
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index a78a8096f4ce..8f47754dacce 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -25,6 +25,7 @@
  * application.
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
+#define XDP_CHECK_NATIVE_MODE (1 << 4)
 
 /* Flags for xsk_umem_config flags */
 #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index e3e41ceeb1bc..c309d2c87be3 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -18,6 +18,7 @@
 #include <linux/ethtool.h>
 #include <linux/filter.h>
 #include <linux/if_ether.h>
+#include <linux/if_link.h>
 #include <linux/if_packet.h>
 #include <linux/if_xdp.h>
 #include <linux/kernel.h>
@@ -901,6 +902,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
 		sxdp.sxdp_shared_umem_fd = umem->fd;
 	} else {
 		sxdp.sxdp_flags = xsk->config.bind_flags;
+		if (xsk->config.xdp_flags & XDP_FLAGS_DRV_MODE)
+			sxdp.sxdp_flags |= XDP_CHECK_NATIVE_MODE;
 	}
 
 	err = bind(xsk->fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH v2 bpf 5/5] ethtool: provide xdp info with XDP_PROPERTIES_GET
  2020-12-04 10:28 ` [Intel-wired-lan] " alardam
@ 2020-12-04 10:29   ` alardam
  -1 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:29 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Implement the XDP_PROPERTIES_GET request, which reports a network
device's supported xdp functionality.
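
The request/reply layout follows the usual ethtool netlink pattern,
roughly as sketched below (the per-flag names in the bitset come from
the strings added in patch 1/5):

  ETHTOOL_MSG_XDP_PROPERTIES_GET
    ETHTOOL_A_XDP_PROPERTIES_HEADER (nest)
      ETHTOOL_A_HEADER_DEV_NAME or ETHTOOL_A_HEADER_DEV_INDEX

  ETHTOOL_MSG_XDP_PROPERTIES_GET_REPLY
    ETHTOOL_A_XDP_PROPERTIES_HEADER (nest)
    ETHTOOL_A_XDP_PROPERTIES_DATA (bitset of the xdp property flags)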

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 include/uapi/linux/ethtool_netlink.h | 14 +++++
 net/ethtool/Makefile                 |  2 +-
 net/ethtool/netlink.c                | 38 +++++++++-----
 net/ethtool/netlink.h                |  2 +
 net/ethtool/xdp.c                    | 76 ++++++++++++++++++++++++++++
 5 files changed, 117 insertions(+), 15 deletions(-)
 create mode 100644 net/ethtool/xdp.c

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index e2bf36e6964b..764d6edc2862 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -42,6 +42,7 @@ enum {
 	ETHTOOL_MSG_CABLE_TEST_ACT,
 	ETHTOOL_MSG_CABLE_TEST_TDR_ACT,
 	ETHTOOL_MSG_TUNNEL_INFO_GET,
+	ETHTOOL_MSG_XDP_PROPERTIES_GET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -80,6 +81,7 @@ enum {
 	ETHTOOL_MSG_CABLE_TEST_NTF,
 	ETHTOOL_MSG_CABLE_TEST_TDR_NTF,
 	ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY,
+	ETHTOOL_MSG_XDP_PROPERTIES_GET_REPLY,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -628,6 +630,18 @@ enum {
 	ETHTOOL_A_TUNNEL_INFO_MAX = (__ETHTOOL_A_TUNNEL_INFO_CNT - 1)
 };
 
+/* XDP_PROPERTIES */
+
+enum {
+	ETHTOOL_A_XDP_PROPERTIES_UNSPEC,
+	ETHTOOL_A_XDP_PROPERTIES_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_XDP_PROPERTIES_DATA,				/* bitset */
+
+	/* add new constants above here */
+	__ETHTOOL_A_XDP_PROPERTIES_CNT,
+	ETHTOOL_A_XDP_PROPERTIES_MAX = __ETHTOOL_A_XDP_PROPERTIES_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 7a849ff22dad..23d49eb07a7f 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
 ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
 		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
 		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
-		   tunnels.o
+		   tunnels.o xdp.o
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 50d3c8896f91..06c943c78a11 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -231,20 +231,21 @@ struct ethnl_dump_ctx {
 
 static const struct ethnl_request_ops *
 ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
-	[ETHTOOL_MSG_STRSET_GET]	= &ethnl_strset_request_ops,
-	[ETHTOOL_MSG_LINKINFO_GET]	= &ethnl_linkinfo_request_ops,
-	[ETHTOOL_MSG_LINKMODES_GET]	= &ethnl_linkmodes_request_ops,
-	[ETHTOOL_MSG_LINKSTATE_GET]	= &ethnl_linkstate_request_ops,
-	[ETHTOOL_MSG_DEBUG_GET]		= &ethnl_debug_request_ops,
-	[ETHTOOL_MSG_WOL_GET]		= &ethnl_wol_request_ops,
-	[ETHTOOL_MSG_FEATURES_GET]	= &ethnl_features_request_ops,
-	[ETHTOOL_MSG_PRIVFLAGS_GET]	= &ethnl_privflags_request_ops,
-	[ETHTOOL_MSG_RINGS_GET]		= &ethnl_rings_request_ops,
-	[ETHTOOL_MSG_CHANNELS_GET]	= &ethnl_channels_request_ops,
-	[ETHTOOL_MSG_COALESCE_GET]	= &ethnl_coalesce_request_ops,
-	[ETHTOOL_MSG_PAUSE_GET]		= &ethnl_pause_request_ops,
-	[ETHTOOL_MSG_EEE_GET]		= &ethnl_eee_request_ops,
-	[ETHTOOL_MSG_TSINFO_GET]	= &ethnl_tsinfo_request_ops,
+	[ETHTOOL_MSG_STRSET_GET]		= &ethnl_strset_request_ops,
+	[ETHTOOL_MSG_LINKINFO_GET]		= &ethnl_linkinfo_request_ops,
+	[ETHTOOL_MSG_LINKMODES_GET]		= &ethnl_linkmodes_request_ops,
+	[ETHTOOL_MSG_LINKSTATE_GET]		= &ethnl_linkstate_request_ops,
+	[ETHTOOL_MSG_DEBUG_GET]			= &ethnl_debug_request_ops,
+	[ETHTOOL_MSG_WOL_GET]			= &ethnl_wol_request_ops,
+	[ETHTOOL_MSG_FEATURES_GET]		= &ethnl_features_request_ops,
+	[ETHTOOL_MSG_PRIVFLAGS_GET]		= &ethnl_privflags_request_ops,
+	[ETHTOOL_MSG_RINGS_GET]			= &ethnl_rings_request_ops,
+	[ETHTOOL_MSG_CHANNELS_GET]		= &ethnl_channels_request_ops,
+	[ETHTOOL_MSG_COALESCE_GET]		= &ethnl_coalesce_request_ops,
+	[ETHTOOL_MSG_PAUSE_GET]			= &ethnl_pause_request_ops,
+	[ETHTOOL_MSG_EEE_GET]			= &ethnl_eee_request_ops,
+	[ETHTOOL_MSG_TSINFO_GET]		= &ethnl_tsinfo_request_ops,
+	[ETHTOOL_MSG_XDP_PROPERTIES_GET]	= &ethnl_xdp_request_ops,
 };
 
 static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
@@ -912,6 +913,15 @@ static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_tunnel_info_get_policy,
 		.maxattr = ARRAY_SIZE(ethnl_tunnel_info_get_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_XDP_PROPERTIES_GET,
+		.doit	= ethnl_default_doit,
+		.start	= ethnl_default_start,
+		.dumpit	= ethnl_default_dumpit,
+		.done	= ethnl_default_done,
+		.policy = ethnl_properties_get_policy,
+		.maxattr = ARRAY_SIZE(ethnl_properties_get_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index d8efec516d86..c5875e97b707 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -344,6 +344,7 @@ extern const struct ethnl_request_ops ethnl_coalesce_request_ops;
 extern const struct ethnl_request_ops ethnl_pause_request_ops;
 extern const struct ethnl_request_ops ethnl_eee_request_ops;
 extern const struct ethnl_request_ops ethnl_tsinfo_request_ops;
+extern const struct ethnl_request_ops ethnl_xdp_request_ops;
 
 extern const struct nla_policy ethnl_header_policy[ETHTOOL_A_HEADER_FLAGS + 1];
 extern const struct nla_policy ethnl_header_policy_stats[ETHTOOL_A_HEADER_FLAGS + 1];
@@ -375,6 +376,7 @@ extern const struct nla_policy ethnl_tsinfo_get_policy[ETHTOOL_A_TSINFO_HEADER +
 extern const struct nla_policy ethnl_cable_test_act_policy[ETHTOOL_A_CABLE_TEST_HEADER + 1];
 extern const struct nla_policy ethnl_cable_test_tdr_act_policy[ETHTOOL_A_CABLE_TEST_TDR_CFG + 1];
 extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INFO_HEADER + 1];
+extern const struct nla_policy ethnl_properties_get_policy[ETHTOOL_A_XDP_PROPERTIES_HEADER + 1];
 
 int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
 int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/ethtool/xdp.c b/net/ethtool/xdp.c
new file mode 100644
index 000000000000..fc0e87b6ed80
--- /dev/null
+++ b/net/ethtool/xdp.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "netlink.h"
+#include "common.h"
+#include "bitset.h"
+
+struct properties_req_info {
+	struct ethnl_req_info	base;
+};
+
+struct properties_reply_data {
+	struct ethnl_reply_data	base;
+	u32			properties[ETHTOOL_XDP_PROPERTIES_WORDS];
+};
+
+const struct nla_policy ethnl_properties_get_policy[] = {
+	[ETHTOOL_A_XDP_PROPERTIES_HEADER]	=
+		NLA_POLICY_NESTED(ethnl_header_policy),
+};
+
+#define PROPERTIES_REPDATA(__reply_base) \
+	container_of(__reply_base, struct properties_reply_data, base)
+
+static void ethnl_properties_to_bitmap32(u32 *dest, xdp_properties_t src)
+{
+	unsigned int i;
+
+	for (i = 0; i < ETHTOOL_XDP_PROPERTIES_WORDS; i++)
+		dest[i] = src >> (32 * i);
+}
+
+static int properties_prepare_data(const struct ethnl_req_info *req_base,
+				   struct ethnl_reply_data *reply_base,
+				   struct genl_info *info)
+{
+	struct properties_reply_data *data = PROPERTIES_REPDATA(reply_base);
+	struct net_device *dev = reply_base->dev;
+
+	ethnl_properties_to_bitmap32(data->properties, dev->xdp_properties);
+
+	return 0;
+}
+
+static int properties_reply_size(const struct ethnl_req_info *req_base,
+				 const struct ethnl_reply_data *reply_base)
+{
+	const struct properties_reply_data *data = PROPERTIES_REPDATA(reply_base);
+	bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS;
+
+	return ethnl_bitset32_size(data->properties, NULL, XDP_PROPERTIES_COUNT,
+				   xdp_properties_strings, compact);
+}
+
+static int properties_fill_reply(struct sk_buff *skb,
+				 const struct ethnl_req_info *req_base,
+				 const struct ethnl_reply_data *reply_base)
+{
+	const struct properties_reply_data *data = PROPERTIES_REPDATA(reply_base);
+	bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS;
+
+	return ethnl_put_bitset32(skb, ETHTOOL_A_XDP_PROPERTIES_DATA, data->properties,
+				  NULL, XDP_PROPERTIES_COUNT,
+				  xdp_properties_strings, compact);
+}
+
+const struct ethnl_request_ops ethnl_xdp_request_ops = {
+	.request_cmd		= ETHTOOL_MSG_XDP_PROPERTIES_GET,
+	.reply_cmd		= ETHTOOL_MSG_XDP_PROPERTIES_GET_REPLY,
+	.hdr_attr		= ETHTOOL_A_XDP_PROPERTIES_HEADER,
+	.req_info_size		= sizeof(struct properties_req_info),
+	.reply_data_size	= sizeof(struct properties_reply_data),
+
+	.prepare_data		= properties_prepare_data,
+	.reply_size		= properties_reply_size,
+	.fill_reply		= properties_fill_reply,
+};
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [Intel-wired-lan] [PATCH v2 bpf 5/5] ethtool: provide xdp info with XDP_PROPERTIES_GET
@ 2020-12-04 10:29   ` alardam
  0 siblings, 0 replies; 120+ messages in thread
From: alardam @ 2020-12-04 10:29 UTC (permalink / raw)
  To: intel-wired-lan

From: Marek Majtyka <marekx.majtyka@intel.com>

Implement the XDP_PROPERTIES_GET request, which reports a network
device's supported xdp functionality.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 include/uapi/linux/ethtool_netlink.h | 14 +++++
 net/ethtool/Makefile                 |  2 +-
 net/ethtool/netlink.c                | 38 +++++++++-----
 net/ethtool/netlink.h                |  2 +
 net/ethtool/xdp.c                    | 76 ++++++++++++++++++++++++++++
 5 files changed, 117 insertions(+), 15 deletions(-)
 create mode 100644 net/ethtool/xdp.c

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index e2bf36e6964b..764d6edc2862 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -42,6 +42,7 @@ enum {
 	ETHTOOL_MSG_CABLE_TEST_ACT,
 	ETHTOOL_MSG_CABLE_TEST_TDR_ACT,
 	ETHTOOL_MSG_TUNNEL_INFO_GET,
+	ETHTOOL_MSG_XDP_PROPERTIES_GET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -80,6 +81,7 @@ enum {
 	ETHTOOL_MSG_CABLE_TEST_NTF,
 	ETHTOOL_MSG_CABLE_TEST_TDR_NTF,
 	ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY,
+	ETHTOOL_MSG_XDP_PROPERTIES_GET_REPLY,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -628,6 +630,18 @@ enum {
 	ETHTOOL_A_TUNNEL_INFO_MAX = (__ETHTOOL_A_TUNNEL_INFO_CNT - 1)
 };
 
+/* XDP_PROPERTIES */
+
+enum {
+	ETHTOOL_A_XDP_PROPERTIES_UNSPEC,
+	ETHTOOL_A_XDP_PROPERTIES_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_XDP_PROPERTIES_DATA,				/* bitset */
+
+	/* add new constants above here */
+	__ETHTOOL_A_XDP_PROPERTIES_CNT,
+	ETHTOOL_A_XDP_PROPERTIES_MAX = __ETHTOOL_A_XDP_PROPERTIES_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 7a849ff22dad..23d49eb07a7f 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
 ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
 		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
 		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
-		   tunnels.o
+		   tunnels.o xdp.o
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 50d3c8896f91..06c943c78a11 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -231,20 +231,21 @@ struct ethnl_dump_ctx {
 
 static const struct ethnl_request_ops *
 ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
-	[ETHTOOL_MSG_STRSET_GET]	= &ethnl_strset_request_ops,
-	[ETHTOOL_MSG_LINKINFO_GET]	= &ethnl_linkinfo_request_ops,
-	[ETHTOOL_MSG_LINKMODES_GET]	= &ethnl_linkmodes_request_ops,
-	[ETHTOOL_MSG_LINKSTATE_GET]	= &ethnl_linkstate_request_ops,
-	[ETHTOOL_MSG_DEBUG_GET]		= &ethnl_debug_request_ops,
-	[ETHTOOL_MSG_WOL_GET]		= &ethnl_wol_request_ops,
-	[ETHTOOL_MSG_FEATURES_GET]	= &ethnl_features_request_ops,
-	[ETHTOOL_MSG_PRIVFLAGS_GET]	= &ethnl_privflags_request_ops,
-	[ETHTOOL_MSG_RINGS_GET]		= &ethnl_rings_request_ops,
-	[ETHTOOL_MSG_CHANNELS_GET]	= &ethnl_channels_request_ops,
-	[ETHTOOL_MSG_COALESCE_GET]	= &ethnl_coalesce_request_ops,
-	[ETHTOOL_MSG_PAUSE_GET]		= &ethnl_pause_request_ops,
-	[ETHTOOL_MSG_EEE_GET]		= &ethnl_eee_request_ops,
-	[ETHTOOL_MSG_TSINFO_GET]	= &ethnl_tsinfo_request_ops,
+	[ETHTOOL_MSG_STRSET_GET]		= &ethnl_strset_request_ops,
+	[ETHTOOL_MSG_LINKINFO_GET]		= &ethnl_linkinfo_request_ops,
+	[ETHTOOL_MSG_LINKMODES_GET]		= &ethnl_linkmodes_request_ops,
+	[ETHTOOL_MSG_LINKSTATE_GET]		= &ethnl_linkstate_request_ops,
+	[ETHTOOL_MSG_DEBUG_GET]			= &ethnl_debug_request_ops,
+	[ETHTOOL_MSG_WOL_GET]			= &ethnl_wol_request_ops,
+	[ETHTOOL_MSG_FEATURES_GET]		= &ethnl_features_request_ops,
+	[ETHTOOL_MSG_PRIVFLAGS_GET]		= &ethnl_privflags_request_ops,
+	[ETHTOOL_MSG_RINGS_GET]			= &ethnl_rings_request_ops,
+	[ETHTOOL_MSG_CHANNELS_GET]		= &ethnl_channels_request_ops,
+	[ETHTOOL_MSG_COALESCE_GET]		= &ethnl_coalesce_request_ops,
+	[ETHTOOL_MSG_PAUSE_GET]			= &ethnl_pause_request_ops,
+	[ETHTOOL_MSG_EEE_GET]			= &ethnl_eee_request_ops,
+	[ETHTOOL_MSG_TSINFO_GET]		= &ethnl_tsinfo_request_ops,
+	[ETHTOOL_MSG_XDP_PROPERTIES_GET]	= &ethnl_xdp_request_ops,
 };
 
 static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
@@ -912,6 +913,15 @@ static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_tunnel_info_get_policy,
 		.maxattr = ARRAY_SIZE(ethnl_tunnel_info_get_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_XDP_PROPERTIES_GET,
+		.doit	= ethnl_default_doit,
+		.start	= ethnl_default_start,
+		.dumpit	= ethnl_default_dumpit,
+		.done	= ethnl_default_done,
+		.policy = ethnl_properties_get_policy,
+		.maxattr = ARRAY_SIZE(ethnl_properties_get_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index d8efec516d86..c5875e97b707 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -344,6 +344,7 @@ extern const struct ethnl_request_ops ethnl_coalesce_request_ops;
 extern const struct ethnl_request_ops ethnl_pause_request_ops;
 extern const struct ethnl_request_ops ethnl_eee_request_ops;
 extern const struct ethnl_request_ops ethnl_tsinfo_request_ops;
+extern const struct ethnl_request_ops ethnl_xdp_request_ops;
 
 extern const struct nla_policy ethnl_header_policy[ETHTOOL_A_HEADER_FLAGS + 1];
 extern const struct nla_policy ethnl_header_policy_stats[ETHTOOL_A_HEADER_FLAGS + 1];
@@ -375,6 +376,7 @@ extern const struct nla_policy ethnl_tsinfo_get_policy[ETHTOOL_A_TSINFO_HEADER +
 extern const struct nla_policy ethnl_cable_test_act_policy[ETHTOOL_A_CABLE_TEST_HEADER + 1];
 extern const struct nla_policy ethnl_cable_test_tdr_act_policy[ETHTOOL_A_CABLE_TEST_TDR_CFG + 1];
 extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INFO_HEADER + 1];
+extern const struct nla_policy ethnl_properties_get_policy[ETHTOOL_A_XDP_PROPERTIES_HEADER + 1];
 
 int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
 int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/ethtool/xdp.c b/net/ethtool/xdp.c
new file mode 100644
index 000000000000..fc0e87b6ed80
--- /dev/null
+++ b/net/ethtool/xdp.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "netlink.h"
+#include "common.h"
+#include "bitset.h"
+
+struct properties_req_info {
+	struct ethnl_req_info	base;
+};
+
+struct properties_reply_data {
+	struct ethnl_reply_data	base;
+	u32			properties[ETHTOOL_XDP_PROPERTIES_WORDS];
+};
+
+const struct nla_policy ethnl_properties_get_policy[] = {
+	[ETHTOOL_A_XDP_PROPERTIES_HEADER]	=
+		NLA_POLICY_NESTED(ethnl_header_policy),
+};
+
+#define PROPERTIES_REPDATA(__reply_base) \
+	container_of(__reply_base, struct properties_reply_data, base)
+
+static void ethnl_properties_to_bitmap32(u32 *dest, xdp_properties_t src)
+{
+	unsigned int i;
+
+	for (i = 0; i < ETHTOOL_XDP_PROPERTIES_WORDS; i++)
+		dest[i] = src >> (32 * i);
+}
+
+static int properties_prepare_data(const struct ethnl_req_info *req_base,
+				   struct ethnl_reply_data *reply_base,
+				   struct genl_info *info)
+{
+	struct properties_reply_data *data = PROPERTIES_REPDATA(reply_base);
+	struct net_device *dev = reply_base->dev;
+
+	ethnl_properties_to_bitmap32(data->properties, dev->xdp_properties);
+
+	return 0;
+}
+
+static int properties_reply_size(const struct ethnl_req_info *req_base,
+				 const struct ethnl_reply_data *reply_base)
+{
+	const struct properties_reply_data *data = PROPERTIES_REPDATA(reply_base);
+	bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS;
+
+	return ethnl_bitset32_size(data->properties, NULL, XDP_PROPERTIES_COUNT,
+				   xdp_properties_strings, compact);
+}
+
+static int properties_fill_reply(struct sk_buff *skb,
+				 const struct ethnl_req_info *req_base,
+				 const struct ethnl_reply_data *reply_base)
+{
+	const struct properties_reply_data *data = PROPERTIES_REPDATA(reply_base);
+	bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS;
+
+	return ethnl_put_bitset32(skb, ETHTOOL_A_XDP_PROPERTIES_DATA, data->properties,
+				  NULL, XDP_PROPERTIES_COUNT,
+				  xdp_properties_strings, compact);
+}
+
+const struct ethnl_request_ops ethnl_xdp_request_ops = {
+	.request_cmd		= ETHTOOL_MSG_XDP_PROPERTIES_GET,
+	.reply_cmd		= ETHTOOL_MSG_XDP_PROPERTIES_GET_REPLY,
+	.hdr_attr		= ETHTOOL_A_XDP_PROPERTIES_HEADER,
+	.req_info_size		= sizeof(struct properties_req_info),
+	.reply_data_size	= sizeof(struct properties_reply_data),
+
+	.prepare_data		= properties_prepare_data,
+	.reply_size		= properties_reply_size,
+	.fill_reply		= properties_fill_reply,
+};
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 10:28   ` [Intel-wired-lan] " alardam
@ 2020-12-04 12:18     ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-04 12:18 UTC (permalink / raw)
  To: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, daniel, netdev, davem, john.fastabend, hawk
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

alardam@gmail.com writes:

> From: Marek Majtyka <marekx.majtyka@intel.com>
>
> Implement support for checking what kind of xdp functionality a netdev
> supports. Previously, there was no way to do this other than to try
> to create an AF_XDP socket on the interface or load an XDP program and see
> if it worked. This commit changes this by adding a new variable which
> describes all xdp supported functions on pretty detailed level:

I like the direction this is going! :)

>  - aborted
>  - drop
>  - pass
>  - tx
>  - redirect

Drivers can in principle implement support for the XDP_REDIRECT return
code (and calling xdp_do_redirect()) without implementing ndo_xdp_xmit()
for being the *target* of a redirect. While my quick grepping doesn't
turn up any drivers that do only one of these right now, I think we've
had examples of it in the past, so it would probably be better to split
the redirect feature flag in two.

This would also make it trivial to replace the check in __xdp_enqueue()
(in devmap.c) from looking at whether the ndo is defined, and just
checking the flag. It would be great if you could do this as part of
this series.
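
Something like this, say (the flag name is made up here, assuming the
redirect-target capability gets its own bit):

  /* devmap.c, __xdp_enqueue(), hypothetical replacement */
  -	if (!dev->netdev_ops->ndo_xdp_xmit)
  -		return -EOPNOTSUPP;
  +	if (!(dev->xdp_properties & XDP_F_XMIT_TARGET))
  +		return -EOPNOTSUPP;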

Maybe we could even make the 'redirect target' flag be set automatically
if a driver implements ndo_xdp_xmit?

>  - zero copy
>  - hardware offload.
>
> Zerocopy mode requires that redirect xdp operation is implemented
> in a driver and the driver supports also zero copy mode.
> Full mode requires that all xdp operation are implemented in the driver.
> Basic mode is just full mode without redirect operation.
>
> Initially, these new flags are disabled for all drivers by default.
>
> Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
> ---
>  .../networking/netdev-xdp-properties.rst      | 42 ++++++++
>  include/linux/netdevice.h                     |  2 +
>  include/linux/xdp_properties.h                | 53 +++++++++++
>  include/net/xdp.h                             | 95 +++++++++++++++++++
>  include/net/xdp_sock_drv.h                    | 10 ++
>  include/uapi/linux/ethtool.h                  |  1 +
>  include/uapi/linux/xdp_properties.h           | 32 +++++++
>  net/ethtool/common.c                          | 11 +++
>  net/ethtool/common.h                          |  4 +
>  net/ethtool/strset.c                          |  5 +
>  10 files changed, 255 insertions(+)
>  create mode 100644 Documentation/networking/netdev-xdp-properties.rst
>  create mode 100644 include/linux/xdp_properties.h
>  create mode 100644 include/uapi/linux/xdp_properties.h
>
> diff --git a/Documentation/networking/netdev-xdp-properties.rst b/Documentation/networking/netdev-xdp-properties.rst
> new file mode 100644
> index 000000000000..4a434a1c512b
> --- /dev/null
> +++ b/Documentation/networking/netdev-xdp-properties.rst
> @@ -0,0 +1,42 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Netdev XDP properties
> +=====================
> +
> + * XDP PROPERTIES FLAGS
> +
> +Following netdev xdp properties flags can be retrieve over netlink ethtool
> +interface the same way as netdev feature flags. These properties flags are
> +read only and cannot be change in the runtime.
> +
> +
> +*  XDP_ABORTED
> +
> +This property informs if netdev supports xdp aborted action.
> +
> +*  XDP_DROP
> +
> +This property informs if netdev supports xdp drop action.
> +
> +*  XDP_PASS
> +
> +This property informs if netdev supports xdp pass action.
> +
> +*  XDP_TX
> +
> +This property informs if netdev supports xdp tx action.
> +
> +*  XDP_REDIRECT
> +
> +This property informs if netdev supports xdp redirect action.
> +It assumes the all beforehand mentioned flags are enabled.
> +
> +*  XDP_ZEROCOPY
> +
> +This property informs if netdev driver supports xdp zero copy.
> +It assumes the all beforehand mentioned flags are enabled.

Nit: I think 'XDP_ZEROCOPY' can lead people to think that this is
zero-copy support for all XDP operations, which is obviously not the
case. So maybe 'XDP_SOCK_ZEROCOPY' (and update the description to
mention AF_XDP sockets explicitly)?

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* [Intel-wired-lan] [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
@ 2020-12-04 12:18     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-04 12:18 UTC (permalink / raw)
  To: intel-wired-lan

alardam at gmail.com writes:

> From: Marek Majtyka <marekx.majtyka@intel.com>
>
> Implement support for checking what kind of xdp functionality a netdev
> supports. Previously, there was no way to do this other than to try
> to create an AF_XDP socket on the interface or load an XDP program and see
> if it worked. This commit changes this by adding a new variable which
> describes all xdp supported functions on pretty detailed level:

I like the direction this is going! :)

>  - aborted
>  - drop
>  - pass
>  - tx
>  - redirect

Drivers can in principle implement support for the XDP_REDIRECT return
code (and calling xdp_do_redirect()) without implementing ndo_xdp_xmit()
for being the *target* of a redirect. While my quick grepping doesn't
turn up any drivers that do only one of these right now, I think we've
had examples of it in the past, so it would probably be better to split
the redirect feature flag in two.

This would also make it trivial to replace the check in __xdp_enqueue()
(in devmap.c) from looking at whether the ndo is defined, and just
checking the flag. It would be great if you could do this as part of
this series.

Maybe we could even make the 'redirect target' flag be set automatically
if a driver implements ndo_xdp_xmit?

>  - zero copy
>  - hardware offload.
>
> Zerocopy mode requires that redirect xdp operation is implemented
> in a driver and the driver supports also zero copy mode.
> Full mode requires that all xdp operation are implemented in the driver.
> Basic mode is just full mode without redirect operation.
>
> Initially, these new flags are disabled for all drivers by default.
>
> Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
> ---
>  .../networking/netdev-xdp-properties.rst      | 42 ++++++++
>  include/linux/netdevice.h                     |  2 +
>  include/linux/xdp_properties.h                | 53 +++++++++++
>  include/net/xdp.h                             | 95 +++++++++++++++++++
>  include/net/xdp_sock_drv.h                    | 10 ++
>  include/uapi/linux/ethtool.h                  |  1 +
>  include/uapi/linux/xdp_properties.h           | 32 +++++++
>  net/ethtool/common.c                          | 11 +++
>  net/ethtool/common.h                          |  4 +
>  net/ethtool/strset.c                          |  5 +
>  10 files changed, 255 insertions(+)
>  create mode 100644 Documentation/networking/netdev-xdp-properties.rst
>  create mode 100644 include/linux/xdp_properties.h
>  create mode 100644 include/uapi/linux/xdp_properties.h
>
> diff --git a/Documentation/networking/netdev-xdp-properties.rst b/Documentation/networking/netdev-xdp-properties.rst
> new file mode 100644
> index 000000000000..4a434a1c512b
> --- /dev/null
> +++ b/Documentation/networking/netdev-xdp-properties.rst
> @@ -0,0 +1,42 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Netdev XDP properties
> +=====================
> +
> + * XDP PROPERTIES FLAGS
> +
> +Following netdev xdp properties flags can be retrieve over netlink ethtool
> +interface the same way as netdev feature flags. These properties flags are
> +read only and cannot be change in the runtime.
> +
> +
> +*  XDP_ABORTED
> +
> +This property informs if netdev supports xdp aborted action.
> +
> +*  XDP_DROP
> +
> +This property informs if netdev supports xdp drop action.
> +
> +*  XDP_PASS
> +
> +This property informs if netdev supports xdp pass action.
> +
> +*  XDP_TX
> +
> +This property informs if netdev supports xdp tx action.
> +
> +*  XDP_REDIRECT
> +
> +This property informs if netdev supports xdp redirect action.
> +It assumes the all beforehand mentioned flags are enabled.
> +
> +*  XDP_ZEROCOPY
> +
> +This property informs if netdev driver supports xdp zero copy.
> +It assumes the all beforehand mentioned flags are enabled.

Nit: I think 'XDP_ZEROCOPY' can lead people to think that this is
zero-copy support for all XDP operations, which is obviously not the
case. So maybe 'XDP_SOCK_ZEROCOPY' (and update the description to
mention AF_XDP sockets explicitly)?

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 2/5] drivers/net: turn XDP properties on
  2020-12-04 10:28   ` [Intel-wired-lan] " alardam
@ 2020-12-04 12:19     ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-04 12:19 UTC (permalink / raw)
  To: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, daniel, netdev, davem, john.fastabend, hawk
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

alardam@gmail.com writes:

> From: Marek Majtyka <marekx.majtyka@intel.com>
>
> Turn 'hw-offload' property flag on for:
>  - netronome.

Can you add this to netdevsim as well, please? That way we can add a
test for it in test_offload.py once the userspace bits land in
ethtool...
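
A minimal sketch of what that could look like (the exact init function
in netdevsim may differ, so treat the placement as a placeholder):

  /* drivers/net/netdevsim/netdev.c, hypothetical placement */
  static void nsim_setup(struct net_device *dev)
  {
  	...
  	xdp_set_full_properties(&dev->xdp_properties);
  	xdp_set_hw_offload_property(&dev->xdp_properties);
  }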

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* [Intel-wired-lan] [PATCH v2 bpf 2/5] drivers/net: turn XDP properties on
@ 2020-12-04 12:19     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-04 12:19 UTC (permalink / raw)
  To: intel-wired-lan

alardam at gmail.com writes:

> From: Marek Majtyka <marekx.majtyka@intel.com>
>
> Turn 'hw-offload' property flag on for:
>  - netronome.

Can you add this to netdevsim as well, please? That way we can add a
test for it in test_offload.py once the userspace bits land in
ethtool...

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 12:18     ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2020-12-04 12:46       ` Maciej Fijalkowski
  -1 siblings, 0 replies; 120+ messages in thread
From: Maciej Fijalkowski @ 2020-12-04 12:46 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, daniel, netdev, davem, john.fastabend, hawk, jonathan.lemon,
	bpf, jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka

On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-Jørgensen wrote:
> alardam@gmail.com writes:
> 
> > From: Marek Majtyka <marekx.majtyka@intel.com>
> >
> > Implement support for checking what kind of xdp functionality a netdev
> > supports. Previously, there was no way to do this other than to try
> > to create an AF_XDP socket on the interface or load an XDP program and see
> > if it worked. This commit changes this by adding a new variable which
> > describes all xdp supported functions on pretty detailed level:
> 
> I like the direction this is going! :)
> 
> >  - aborted
> >  - drop
> >  - pass
> >  - tx
> >  - redirect
> 
> Drivers can in principle implement support for the XDP_REDIRECT return
> code (and calling xdp_do_redirect()) without implementing ndo_xdp_xmit()
> for being the *target* of a redirect. While my quick grepping doesn't
> turn up any drivers that do only one of these right now, I think we've
> had examples of it in the past, so it would probably be better to split
> the redirect feature flag in two.
> 
> This would also make it trivial to replace the check in __xdp_enqueue()
> (in devmap.c) from looking at whether the ndo is defined, and just
> checking the flag. It would be great if you could do this as part of
> this series.
> 
> Maybe we could even make the 'redirect target' flag be set automatically
> if a driver implements ndo_xdp_xmit?

+1

> 
> >  - zero copy
> >  - hardware offload.
> >
> > Zerocopy mode requires that redirect xdp operation is implemented
> > in a driver and the driver supports also zero copy mode.
> > Full mode requires that all xdp operation are implemented in the driver.
> > Basic mode is just full mode without redirect operation.
> >
> > Initially, these new flags are disabled for all drivers by default.
> >
> > Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
> > ---
> >  .../networking/netdev-xdp-properties.rst      | 42 ++++++++
> >  include/linux/netdevice.h                     |  2 +
> >  include/linux/xdp_properties.h                | 53 +++++++++++
> >  include/net/xdp.h                             | 95 +++++++++++++++++++
> >  include/net/xdp_sock_drv.h                    | 10 ++
> >  include/uapi/linux/ethtool.h                  |  1 +
> >  include/uapi/linux/xdp_properties.h           | 32 +++++++
> >  net/ethtool/common.c                          | 11 +++
> >  net/ethtool/common.h                          |  4 +
> >  net/ethtool/strset.c                          |  5 +
> >  10 files changed, 255 insertions(+)
> >  create mode 100644 Documentation/networking/netdev-xdp-properties.rst
> >  create mode 100644 include/linux/xdp_properties.h
> >  create mode 100644 include/uapi/linux/xdp_properties.h
> >
> > diff --git a/Documentation/networking/netdev-xdp-properties.rst b/Documentation/networking/netdev-xdp-properties.rst
> > new file mode 100644
> > index 000000000000..4a434a1c512b
> > --- /dev/null
> > +++ b/Documentation/networking/netdev-xdp-properties.rst
> > @@ -0,0 +1,42 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=====================
> > +Netdev XDP properties
> > +=====================
> > +
> > + * XDP PROPERTIES FLAGS
> > +
> > +Following netdev xdp properties flags can be retrieve over netlink ethtool
> > +interface the same way as netdev feature flags. These properties flags are
> > +read only and cannot be change in the runtime.
> > +
> > +
> > +*  XDP_ABORTED
> > +
> > +This property informs if netdev supports xdp aborted action.
> > +
> > +*  XDP_DROP
> > +
> > +This property informs if netdev supports xdp drop action.
> > +
> > +*  XDP_PASS
> > +
> > +This property informs if netdev supports xdp pass action.
> > +
> > +*  XDP_TX
> > +
> > +This property informs if netdev supports xdp tx action.
> > +
> > +*  XDP_REDIRECT
> > +
> > +This property informs if netdev supports xdp redirect action.
> > +It assumes the all beforehand mentioned flags are enabled.
> > +
> > +*  XDP_ZEROCOPY
> > +
> > +This property informs if netdev driver supports xdp zero copy.
> > +It assumes the all beforehand mentioned flags are enabled.
> 
> Nit: I think 'XDP_ZEROCOPY' can lead people to think that this is
> zero-copy support for all XDP operations, which is obviously not the
> case. So maybe 'XDP_SOCK_ZEROCOPY' (and update the description to
> mention AF_XDP sockets explicitly)?

AF_XDP_ZEROCOPY?

> 
> -Toke
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* [Intel-wired-lan] [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
@ 2020-12-04 12:46       ` Maciej Fijalkowski
  0 siblings, 0 replies; 120+ messages in thread
From: Maciej Fijalkowski @ 2020-12-04 12:46 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-Jørgensen wrote:
> alardam at gmail.com writes:
> 
> > From: Marek Majtyka <marekx.majtyka@intel.com>
> >
> > Implement support for checking what kind of xdp functionality a netdev
> > supports. Previously, there was no way to do this other than to try
> > to create an AF_XDP socket on the interface or load an XDP program and see
> > if it worked. This commit changes this by adding a new variable which
> > describes all xdp supported functions on pretty detailed level:
> 
> I like the direction this is going! :)
> 
> >  - aborted
> >  - drop
> >  - pass
> >  - tx
> >  - redirect
> 
> Drivers can in principle implement support for the XDP_REDIRECT return
> code (and calling xdp_do_redirect()) without implementing ndo_xdp_xmit()
> for being the *target* of a redirect. While my quick grepping doesn't
> turn up any drivers that do only one of these right now, I think we've
> had examples of it in the past, so it would probably be better to split
> the redirect feature flag in two.
> 
> This would also make it trivial to replace the check in __xdp_enqueue()
> (in devmap.c) from looking at whether the ndo is defined, and just
> checking the flag. It would be great if you could do this as part of
> this series.
> 
> Maybe we could even make the 'redirect target' flag be set automatically
> if a driver implements ndo_xdp_xmit?

+1

> 
> >  - zero copy
> >  - hardware offload.
> >
> > Zerocopy mode requires that redirect xdp operation is implemented
> > in a driver and the driver supports also zero copy mode.
> > Full mode requires that all xdp operation are implemented in the driver.
> > Basic mode is just full mode without redirect operation.
> >
> > Initially, these new flags are disabled for all drivers by default.
> >
> > Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
> > ---
> >  .../networking/netdev-xdp-properties.rst      | 42 ++++++++
> >  include/linux/netdevice.h                     |  2 +
> >  include/linux/xdp_properties.h                | 53 +++++++++++
> >  include/net/xdp.h                             | 95 +++++++++++++++++++
> >  include/net/xdp_sock_drv.h                    | 10 ++
> >  include/uapi/linux/ethtool.h                  |  1 +
> >  include/uapi/linux/xdp_properties.h           | 32 +++++++
> >  net/ethtool/common.c                          | 11 +++
> >  net/ethtool/common.h                          |  4 +
> >  net/ethtool/strset.c                          |  5 +
> >  10 files changed, 255 insertions(+)
> >  create mode 100644 Documentation/networking/netdev-xdp-properties.rst
> >  create mode 100644 include/linux/xdp_properties.h
> >  create mode 100644 include/uapi/linux/xdp_properties.h
> >
> > diff --git a/Documentation/networking/netdev-xdp-properties.rst b/Documentation/networking/netdev-xdp-properties.rst
> > new file mode 100644
> > index 000000000000..4a434a1c512b
> > --- /dev/null
> > +++ b/Documentation/networking/netdev-xdp-properties.rst
> > @@ -0,0 +1,42 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=====================
> > +Netdev XDP properties
> > +=====================
> > +
> > + * XDP PROPERTIES FLAGS
> > +
> > +Following netdev xdp properties flags can be retrieve over netlink ethtool
> > +interface the same way as netdev feature flags. These properties flags are
> > +read only and cannot be change in the runtime.
> > +
> > +
> > +*  XDP_ABORTED
> > +
> > +This property informs if netdev supports xdp aborted action.
> > +
> > +*  XDP_DROP
> > +
> > +This property informs if netdev supports xdp drop action.
> > +
> > +*  XDP_PASS
> > +
> > +This property informs if netdev supports xdp pass action.
> > +
> > +*  XDP_TX
> > +
> > +This property informs if netdev supports xdp tx action.
> > +
> > +*  XDP_REDIRECT
> > +
> > +This property informs if netdev supports xdp redirect action.
> > +It assumes the all beforehand mentioned flags are enabled.
> > +
> > +*  XDP_ZEROCOPY
> > +
> > +This property informs if netdev driver supports xdp zero copy.
> > +It assumes the all beforehand mentioned flags are enabled.
> 
> Nit: I think 'XDP_ZEROCOPY' can lead people to think that this is
> zero-copy support for all XDP operations, which is obviously not the
> case. So maybe 'XDP_SOCK_ZEROCOPY' (and update the description to
> mention AF_XDP sockets explicitly)?

AF_XDP_ZEROCOPY?

> 
> -Toke
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 10:28   ` [Intel-wired-lan] " alardam
@ 2020-12-04 12:57     ` Maciej Fijalkowski
  -1 siblings, 0 replies; 120+ messages in thread
From: Maciej Fijalkowski @ 2020-12-04 12:57 UTC (permalink / raw)
  To: alardam
  Cc: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka

On Fri, Dec 04, 2020 at 11:28:57AM +0100, alardam@gmail.com wrote:
> From: Marek Majtyka <marekx.majtyka@intel.com>
> 
> Implement support for checking what kind of xdp functionality a netdev
> supports. Previously, there was no way to do this other than to try
> to create an AF_XDP socket on the interface or load an XDP program and see
> if it worked. This commit changes this by adding a new variable which
> describes all supported xdp functions at a pretty detailed level:
>  - aborted
>  - drop
>  - pass
>  - tx
>  - redirect
>  - zero copy
>  - hardware offload.
> 
> Zerocopy mode requires that the redirect xdp operation is implemented
> in the driver and that the driver also supports zero copy mode.
> Full mode requires that all xdp operations are implemented in the driver.
> Basic mode is just full mode without the redirect operation.
> 
> Initially, these new flags are disabled for all drivers by default.
> 
> Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
> ---
>  .../networking/netdev-xdp-properties.rst      | 42 ++++++++
>  include/linux/netdevice.h                     |  2 +
>  include/linux/xdp_properties.h                | 53 +++++++++++
>  include/net/xdp.h                             | 95 +++++++++++++++++++
>  include/net/xdp_sock_drv.h                    | 10 ++
>  include/uapi/linux/ethtool.h                  |  1 +
>  include/uapi/linux/xdp_properties.h           | 32 +++++++
>  net/ethtool/common.c                          | 11 +++
>  net/ethtool/common.h                          |  4 +
>  net/ethtool/strset.c                          |  5 +
>  10 files changed, 255 insertions(+)
>  create mode 100644 Documentation/networking/netdev-xdp-properties.rst
>  create mode 100644 include/linux/xdp_properties.h
>  create mode 100644 include/uapi/linux/xdp_properties.h
> 
> diff --git a/Documentation/networking/netdev-xdp-properties.rst b/Documentation/networking/netdev-xdp-properties.rst
> new file mode 100644
> index 000000000000..4a434a1c512b
> --- /dev/null
> +++ b/Documentation/networking/netdev-xdp-properties.rst
> @@ -0,0 +1,42 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Netdev XDP properties
> +=====================
> +
> + * XDP PROPERTIES FLAGS
> +
> +The following netdev xdp properties flags can be retrieved over the netlink
> +ethtool interface the same way as netdev feature flags. These property flags
> +are read only and cannot be changed at runtime.
> +
> +
> +*  XDP_ABORTED
> +
> +This property indicates whether the netdev supports the xdp aborted action.
> +
> +*  XDP_DROP
> +
> +This property indicates whether the netdev supports the xdp drop action.
> +
> +*  XDP_PASS
> +
> +This property indicates whether the netdev supports the xdp pass action.
> +
> +*  XDP_TX
> +
> +This property indicates whether the netdev supports the xdp tx action.
> +
> +*  XDP_REDIRECT
> +
> +This property indicates whether the netdev supports the xdp redirect action.
> +It assumes that all the flags mentioned above are enabled.
> +
> +*  XDP_ZEROCOPY
> +
> +This property indicates whether the netdev driver supports xdp zero copy.
> +It assumes that all the flags mentioned above are enabled.
> +
> +*  XDP_HW_OFFLOAD
> +
> +This property indicates whether the netdev driver supports xdp hw offloading.
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 52d1cc2bd8a7..2544c7f0e1b7 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -43,6 +43,7 @@
>  #include <net/xdp.h>
>  
>  #include <linux/netdev_features.h>
> +#include <linux/xdp_properties.h>
>  #include <linux/neighbour.h>
>  #include <uapi/linux/netdevice.h>
>  #include <uapi/linux/if_bonding.h>
> @@ -2171,6 +2172,7 @@ struct net_device {
>  
>  	/* protected by rtnl_lock */
>  	struct bpf_xdp_entity	xdp_state[__MAX_XDP_MODE];
> +	xdp_properties_t	xdp_properties;
>  };
>  #define to_net_dev(d) container_of(d, struct net_device, dev)
>  
> diff --git a/include/linux/xdp_properties.h b/include/linux/xdp_properties.h
> new file mode 100644
> index 000000000000..c72c9bcc50de
> --- /dev/null
> +++ b/include/linux/xdp_properties.h
> @@ -0,0 +1,53 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Network device xdp properties.
> + */
> +#ifndef _LINUX_XDP_PROPERTIES_H
> +#define _LINUX_XDP_PROPERTIES_H
> +
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <asm/byteorder.h>
> +
> +typedef u64 xdp_properties_t;
> +
> +enum {
> +	XDP_F_ABORTED_BIT,
> +	XDP_F_DROP_BIT,
> +	XDP_F_PASS_BIT,
> +	XDP_F_TX_BIT,
> +	XDP_F_REDIRECT_BIT,
> +	XDP_F_ZEROCOPY_BIT,
> +	XDP_F_HW_OFFLOAD_BIT,
> +
> +	/*
> +	 * Add your fresh new property above and remember to update
> +	 * xdp_properties_strings[] in net/ethtool/common.c and maybe
> +	 * some xdp_properties mask #defines below. Please also describe it
> +	 * in Documentation/networking/netdev-xdp-properties.rst.
> +	 */
> +
> +	/**/XDP_PROPERTIES_COUNT
> +};
> +
> +#define __XDP_F_BIT(bit)	((xdp_properties_t)1 << (bit))
> +#define __XDP_F(name)		__XDP_F_BIT(XDP_F_##name##_BIT)
> +
> +#define XDP_F_ABORTED		__XDP_F(ABORTED)
> +#define XDP_F_DROP		__XDP_F(DROP)
> +#define XDP_F_PASS		__XDP_F(PASS)
> +#define XDP_F_TX		__XDP_F(TX)
> +#define XDP_F_REDIRECT		__XDP_F(REDIRECT)
> +#define XDP_F_ZEROCOPY		__XDP_F(ZEROCOPY)
> +#define XDP_F_HW_OFFLOAD	__XDP_F(HW_OFFLOAD)
> +
> +#define XDP_F_BASIC		(XDP_F_ABORTED |	\
> +				 XDP_F_DROP |		\
> +				 XDP_F_PASS |		\
> +				 XDP_F_TX)
> +
> +#define XDP_F_FULL		(XDP_F_BASIC | XDP_F_REDIRECT)
> +
> +#define XDP_F_FULL_ZC		(XDP_F_FULL | XDP_F_ZEROCOPY)

Seems like you're not making use of this flag? Next patch combines two
calls for XDP_F_FULL and XDP_F_ZEROCOPY, like:

xdp_set_full_properties(&netdev->xdp_properties);
xsk_set_zc_property(&netdev->xdp_properties);

So either drop the flag, or introduce xdp_set_full_zc_properties().
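
Just to illustrate, the combined helper could be as simple as this
(untested sketch, reusing the XDP_F_FULL_ZC mask defined above):

static __always_inline void
xdp_set_full_zc_properties(xdp_properties_t *properties)
{
	/* XDP_F_FULL_ZC == XDP_F_FULL | XDP_F_ZEROCOPY */
	*properties |= XDP_F_FULL_ZC;
}

so that drivers make a single call instead of the two shown above.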

I was also wondering whether it would make sense to align the naming here
and refer to these as 'xdp features', as netdevice.h tends to do, rather
than 'xdp properties'.

> +
> +#endif /* _LINUX_XDP_PROPERTIES_H */
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 700ad5db7f5d..a9fabc1282cf 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -7,6 +7,7 @@
>  #define __LINUX_NET_XDP_H__
>  
>  #include <linux/skbuff.h> /* skb_shared_info */
> +#include <linux/xdp_properties.h>
>  
>  /**
>   * DOC: XDP RX-queue information
> @@ -255,6 +256,100 @@ struct xdp_attachment_info {
>  	u32 flags;
>  };
>  
> +#if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
> +
> +static __always_inline void
> +xdp_set_aborted_property(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_ABORTED;
> +}
> +
> +static __always_inline void
> +xdp_set_pass_property(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_PASS;
> +}
> +
> +static __always_inline void
> +xdp_set_drop_property(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_DROP;
> +}
> +
> +static __always_inline void
> +xdp_set_tx_property(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_TX;
> +}
> +
> +static __always_inline void
> +xdp_set_redirect_property(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_REDIRECT;
> +}
> +
> +static __always_inline void
> +xdp_set_hw_offload_property(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_HW_OFFLOAD;
> +}
> +
> +static __always_inline void
> +xdp_set_basic_properties(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_BASIC;
> +}
> +
> +static __always_inline void
> +xdp_set_full_properties(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_FULL;
> +}
> +
> +#else
> +
> +static __always_inline void
> +xdp_set_aborted_property(xdp_properties_t *properties)
> +{
> +}
> +
> +static __always_inline void
> +xdp_set_pass_property(xdp_properties_t *properties)
> +{
> +}
> +
> +static __always_inline void
> +xdp_set_drop_property(xdp_properties_t *properties)
> +{
> +}
> +
> +static __always_inline void
> +xdp_set_tx_property(xdp_properties_t *properties)
> +{
> +}
> +
> +static __always_inline void
> +xdp_set_redirect_property(xdp_properties_t *properties)
> +{
> +}
> +
> +static __always_inline void
> +xdp_set_hw_offload_property(xdp_properties_t *properties)
> +{
> +}
> +
> +static __always_inline void
> +xdp_set_basic_properties(xdp_properties_t *properties)
> +{
> +}
> +
> +static __always_inline void
> +xdp_set_full_properties(xdp_properties_t *properties)
> +{
> +}
> +
> +#endif
> +
>  struct netdev_bpf;
>  bool xdp_attachment_flags_ok(struct xdp_attachment_info *info,
>  			     struct netdev_bpf *bpf);
> diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
> index 4e295541e396..48a3b6d165c7 100644
> --- a/include/net/xdp_sock_drv.h
> +++ b/include/net/xdp_sock_drv.h
> @@ -8,6 +8,7 @@
>  
>  #include <net/xdp_sock.h>
>  #include <net/xsk_buff_pool.h>
> +#include <linux/xdp_properties.h>
>  
>  #ifdef CONFIG_XDP_SOCKETS
>  
> @@ -117,6 +118,11 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
>  	xp_dma_sync_for_device(pool, dma, size);
>  }
>  
> +static inline void xsk_set_zc_property(xdp_properties_t *properties)
> +{
> +	*properties |= XDP_F_ZEROCOPY;
> +}
> +
>  #else
>  
>  static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
> @@ -242,6 +248,10 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
>  {
>  }
>  
> +static inline void xsk_set_zc_property(xdp_properties_t *properties)
> +{
> +}
> +
>  #endif /* CONFIG_XDP_SOCKETS */
>  
>  #endif /* _LINUX_XDP_SOCK_DRV_H */
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 9ca87bc73c44..dfcb0e2c98b2 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -688,6 +688,7 @@ enum ethtool_stringset {
>  	ETH_SS_TS_TX_TYPES,
>  	ETH_SS_TS_RX_FILTERS,
>  	ETH_SS_UDP_TUNNEL_TYPES,
> +	ETH_SS_XDP_PROPERTIES,
>  
>  	/* add new constants above here */
>  	ETH_SS_COUNT
> diff --git a/include/uapi/linux/xdp_properties.h b/include/uapi/linux/xdp_properties.h
> new file mode 100644
> index 000000000000..e85be03eb707
> --- /dev/null
> +++ b/include/uapi/linux/xdp_properties.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +
> +/*
> + * Copyright (c) 2020 Intel
> + */
> +
> +#ifndef __UAPI_LINUX_XDP_PROPERTIES__
> +#define __UAPI_LINUX_XDP_PROPERTIES__
> +
> +/* ETH_GSTRING_LEN define is needed. */
> +#include <linux/ethtool.h>
> +
> +#define XDP_PROPERTIES_ABORTED_STR	"xdp-aborted"
> +#define XDP_PROPERTIES_DROP_STR		"xdp-drop"
> +#define XDP_PROPERTIES_PASS_STR		"xdp-pass"
> +#define XDP_PROPERTIES_TX_STR		"xdp-tx"
> +#define XDP_PROPERTIES_REDIRECT_STR	"xdp-redirect"
> +#define XDP_PROPERTIES_ZEROCOPY_STR	"xdp-zerocopy"
> +#define XDP_PROPERTIES_HW_OFFLOAD_STR	"xdp-hw-offload"
> +
> +#define	DECLARE_XDP_PROPERTIES_TABLE(name)		\
> +	const char name[][ETH_GSTRING_LEN] = {		\
> +		XDP_PROPERTIES_ABORTED_STR,		\
> +		XDP_PROPERTIES_DROP_STR,		\
> +		XDP_PROPERTIES_PASS_STR,		\
> +		XDP_PROPERTIES_TX_STR,			\
> +		XDP_PROPERTIES_REDIRECT_STR,		\
> +		XDP_PROPERTIES_ZEROCOPY_STR,		\
> +		XDP_PROPERTIES_HW_OFFLOAD_STR,		\
> +	}
> +
> +#endif  /* __UAPI_LINUX_XDP_PROPERTIES__ */
> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> index 24036e3055a1..8f15f96b8922 100644
> --- a/net/ethtool/common.c
> +++ b/net/ethtool/common.c
> @@ -4,6 +4,7 @@
>  #include <linux/net_tstamp.h>
>  #include <linux/phy.h>
>  #include <linux/rtnetlink.h>
> +#include <uapi/linux/xdp_properties.h>
>  
>  #include "common.h"
>  
> @@ -283,6 +284,16 @@ const char udp_tunnel_type_names[][ETH_GSTRING_LEN] = {
>  static_assert(ARRAY_SIZE(udp_tunnel_type_names) ==
>  	      __ETHTOOL_UDP_TUNNEL_TYPE_CNT);
>  
> +const char xdp_properties_strings[XDP_PROPERTIES_COUNT][ETH_GSTRING_LEN] = {
> +	[XDP_F_ABORTED_BIT] =		XDP_PROPERTIES_ABORTED_STR,
> +	[XDP_F_DROP_BIT] =		XDP_PROPERTIES_DROP_STR,
> +	[XDP_F_PASS_BIT] =		XDP_PROPERTIES_PASS_STR,
> +	[XDP_F_TX_BIT] =		XDP_PROPERTIES_TX_STR,
> +	[XDP_F_REDIRECT_BIT] =		XDP_PROPERTIES_REDIRECT_STR,
> +	[XDP_F_ZEROCOPY_BIT] =		XDP_PROPERTIES_ZEROCOPY_STR,
> +	[XDP_F_HW_OFFLOAD_BIT] =	XDP_PROPERTIES_HW_OFFLOAD_STR,
> +};
> +
>  /* return false if legacy contained non-0 deprecated fields
>   * maxtxpkt/maxrxpkt. rest of ksettings always updated
>   */
> diff --git a/net/ethtool/common.h b/net/ethtool/common.h
> index 3d9251c95a8b..85a35f8781eb 100644
> --- a/net/ethtool/common.h
> +++ b/net/ethtool/common.h
> @@ -5,8 +5,10 @@
>  
>  #include <linux/netdevice.h>
>  #include <linux/ethtool.h>
> +#include <linux/xdp_properties.h>
>  
>  #define ETHTOOL_DEV_FEATURE_WORDS	DIV_ROUND_UP(NETDEV_FEATURE_COUNT, 32)
> +#define ETHTOOL_XDP_PROPERTIES_WORDS	DIV_ROUND_UP(XDP_PROPERTIES_COUNT, 32)
>  
>  /* compose link mode index from speed, type and duplex */
>  #define ETHTOOL_LINK_MODE(speed, type, duplex) \
> @@ -22,6 +24,8 @@ extern const char
>  tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN];
>  extern const char
>  phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN];
> +extern const char
> +xdp_properties_strings[XDP_PROPERTIES_COUNT][ETH_GSTRING_LEN];
>  extern const char link_mode_names[][ETH_GSTRING_LEN];
>  extern const char netif_msg_class_names[][ETH_GSTRING_LEN];
>  extern const char wol_mode_names[][ETH_GSTRING_LEN];
> diff --git a/net/ethtool/strset.c b/net/ethtool/strset.c
> index 0baad0ce1832..684e751b31a9 100644
> --- a/net/ethtool/strset.c
> +++ b/net/ethtool/strset.c
> @@ -80,6 +80,11 @@ static const struct strset_info info_template[] = {
>  		.count		= __ETHTOOL_UDP_TUNNEL_TYPE_CNT,
>  		.strings	= udp_tunnel_type_names,
>  	},
> +	[ETH_SS_XDP_PROPERTIES] = {
> +		.per_dev	= false,
> +		.count		= ARRAY_SIZE(xdp_properties_strings),
> +		.strings	= xdp_properties_strings,
> +	},
>  };
>  
>  struct strset_req_info {
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 12:46       ` [Intel-wired-lan] " Maciej Fijalkowski
@ 2020-12-04 15:21         ` Daniel Borkmann
  -1 siblings, 0 replies; 120+ messages in thread
From: Daniel Borkmann @ 2020-12-04 15:21 UTC (permalink / raw)
  To: Maciej Fijalkowski, Toke Høiland-Jørgensen
  Cc: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, netdev, davem, john.fastabend, hawk, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka

On 12/4/20 1:46 PM, Maciej Fijalkowski wrote:
> On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-Jørgensen wrote:
>> alardam@gmail.com writes:
>>> From: Marek Majtyka <marekx.majtyka@intel.com>
>>>
>>> Implement support for checking what kind of xdp functionality a netdev
>>> supports. Previously, there was no way to do this other than to try
>>> to create an AF_XDP socket on the interface or load an XDP program and see
>>> if it worked. This commit changes this by adding a new variable which
>>> describes all xdp supported functions on pretty detailed level:
>>
>> I like the direction this is going! :)
>>
>>>   - aborted
>>>   - drop
>>>   - pass
>>>   - tx

I strongly think we should _not_ merge any native XDP driver patchset that does
not support/implement the above return codes. Could we instead group them together
and call this something like XDP_BASE functionality to not give a wrong impression?
If this is properly documented that these are basic must-have _requirements_, then
users and driver developers both know what the expectations are.
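
(That grouping is conceptually already here as the XDP_F_BASIC mask in this
patch; an XDP_BASE naming would just re-badge it as the mandatory baseline,
roughly:

#define XDP_F_BASE	(XDP_F_ABORTED | XDP_F_DROP | \
			 XDP_F_PASS | XDP_F_TX)

with the documentation stating that every native XDP driver must set all of
these bits.)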

>>>   - redirect
>>
>> Drivers can in principle implement support for the XDP_REDIRECT return
>> code (and calling xdp_do_redirect()) without implementing ndo_xdp_xmit()
>> for being the *target* of a redirect. While my quick grepping doesn't
>> turn up any drivers that do only one of these right now, I think we've
>> had examples of it in the past, so it would probably be better to split
>> the redirect feature flag in two.
>>
>> This would also make it trivial to replace the check in __xdp_enqueue()
>> (in devmap.c) from looking at whether the ndo is defined, and just
>> checking the flag. It would be great if you could do this as part of
>> this series.
>>
>> Maybe we could even make the 'redirect target' flag be set automatically
>> if a driver implements ndo_xdp_xmit?
> 
> +1
> 
>>>   - zero copy
>>>   - hardware offload.

One other thing that is quite annoying to figure out sometimes and not always
obvious from reading the driver code (and it may even differ depending on how
the driver was built :/) is how much XDP headroom a driver really provides.

We tried to standardize on a minimum guaranteed amount, but unfortunately not
everyone seems to implement it. I think it would be very useful to query this
from the application side: for example, consider an app that inserts a BPF
prog at XDP doing custom encap shortly before XDP_TX; it would be useful to
know which of the different encaps it implements are realistically possible on
the underlying XDP supported dev.
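
(Today an application can only discover this the hard way, e.g. with a probe
along these lines - untested sketch - where the program checks at runtime
whether the headroom it needs is actually available:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int encap_probe(struct xdp_md *ctx)
{
	/* Try to reserve 64 bytes of headroom for a custom encap header.
	 * bpf_xdp_adjust_head() fails when the driver did not leave enough
	 * headroom, in which case we just pass the packet on unchanged.
	 */
	if (bpf_xdp_adjust_head(ctx, -64))
		return XDP_PASS;

	/* ... recheck data/data_end and write the encap header here ... */
	return XDP_TX;
}

char _license[] SEC("license") = "GPL";

but that only tells you at runtime, which is exactly why having it queryable
up front would help.)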

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 0/5] New netdev feature flags for XDP
  2020-12-04 10:28 ` [Intel-wired-lan] " alardam
@ 2020-12-04 17:20   ` Jakub Kicinski
  -1 siblings, 0 replies; 120+ messages in thread
From: Jakub Kicinski @ 2020-12-04 17:20 UTC (permalink / raw)
  To: alardam
  Cc: magnus.karlsson, bjorn.topel, andrii.nakryiko, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke, maciej.fijalkowski,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Fri,  4 Dec 2020 11:28:56 +0100 alardam@gmail.com wrote:
>  * Extend ethtool netlink interface in order to get access to the XDP
>    bitmap (XDP_PROPERTIES_GET). [Toke]

That's a good direction, but I don't see why XDP caps belong in ethtool
at all? We use rtnetlink to manage the progs...

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 15:21         ` [Intel-wired-lan] " Daniel Borkmann
@ 2020-12-04 17:20           ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-04 17:20 UTC (permalink / raw)
  To: Daniel Borkmann, Maciej Fijalkowski
  Cc: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, netdev, davem, john.fastabend, hawk, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka, Jesper Dangaard Brouer

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 12/4/20 1:46 PM, Maciej Fijalkowski wrote:
>> On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-Jørgensen wrote:
>>> alardam@gmail.com writes:
>>>> From: Marek Majtyka <marekx.majtyka@intel.com>
>>>>
>>>> Implement support for checking what kind of xdp functionality a netdev
>>>> supports. Previously, there was no way to do this other than to try
>>>> to create an AF_XDP socket on the interface or load an XDP program and see
>>>> if it worked. This commit changes this by adding a new variable which
>>>> describes all xdp supported functions on pretty detailed level:
>>>
>>> I like the direction this is going! :)
>>>
>>>>   - aborted
>>>>   - drop
>>>>   - pass
>>>>   - tx
>
> I strongly think we should _not_ merge any native XDP driver patchset
> that does not support/implement the above return codes. Could we
> instead group them together and call this something like XDP_BASE
> functionality to not give a wrong impression? If this is properly
> documented that these are basic must-have _requirements_, then users
> and driver developers both know what the expectations are.

I think there may have been drivers that only did DROP/PASS on first
merge; but adding TX to the "base set" is fine by me, as long as it's
actually enforced ;)

(As in, we originally said the same thing about the full feature set and
that never really worked out).

>>>>   - redirect
>>>
>>> Drivers can in principle implement support for the XDP_REDIRECT return
>>> code (and calling xdp_do_redirect()) without implementing ndo_xdp_xmit()
>>> for being the *target* of a redirect. While my quick grepping doesn't
>>> turn up any drivers that do only one of these right now, I think we've
>>> had examples of it in the past, so it would probably be better to split
>>> the redirect feature flag in two.
>>>
>>> This would also make it trivial to replace the check in __xdp_enqueue()
>>> (in devmap.c) from looking at whether the ndo is defined, and just
>>> checking the flag. It would be great if you could do this as part of
>>> this series.
>>>
>>> Maybe we could even make the 'redirect target' flag be set automatically
>>> if a driver implements ndo_xdp_xmit?
>> 
>> +1
>> 
>>>>   - zero copy
>>>>   - hardware offload.
>
> One other thing that is quite annoying to figure out sometimes and not always
> obvious from reading the driver code (and it may even differ depending on how
> the driver was built :/) is how much XDP headroom a driver really provides.
>
> We tried to standardize on a minimum guaranteed amount, but unfortunately not
> everyone seems to implement it, but I think it would be very useful to query
> this from application side, for example, consider that an app inserts a BPF
> prog at XDP doing custom encap shortly before XDP_TX so it would be useful to
> know which of the different encaps it implements are realistically possible on
> the underlying XDP supported dev.

How many distinct values are there in reality? Enough to express this in
a few flags (XDP_HEADROOM_128, XDP_HEADROOM_192, etc?), or does it need
an additional field to get the exact value? If we implement the latter
we also run the risk of people actually implementing all sorts of weird
values, whereas if we constrain it to a few distinct values it's easier
to push back against adding new values (as it'll be obvious from the
addition of new flags).
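
Something like (names made up here, just to show the shape of it, each flag
backed by a corresponding *_BIT in the enum):

#define XDP_F_HEADROOM_128	__XDP_F(HEADROOM_128)
#define XDP_F_HEADROOM_192	__XDP_F(HEADROOM_192)
#define XDP_F_HEADROOM_256	__XDP_F(HEADROOM_256)

i.e. a handful of coarse buckets rather than a free-form byte count.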

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 0/5] New netdev feature flags for XDP
  2020-12-04 17:20   ` [Intel-wired-lan] " Jakub Kicinski
@ 2020-12-04 17:26     ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-04 17:26 UTC (permalink / raw)
  To: Jakub Kicinski, alardam
  Cc: magnus.karlsson, bjorn.topel, andrii.nakryiko, ast, daniel,
	netdev, davem, john.fastabend, hawk, maciej.fijalkowski,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

Jakub Kicinski <kuba@kernel.org> writes:

> On Fri,  4 Dec 2020 11:28:56 +0100 alardam@gmail.com wrote:
>>  * Extend ethtool netlink interface in order to get access to the XDP
>>    bitmap (XDP_PROPERTIES_GET). [Toke]
>
> That's a good direction, but I don't see why XDP caps belong in ethtool
> at all? We use rtnetlink to manage the progs...

You normally use ethtool to get all the other features a device supports,
don't you? And for XDP you even use it to configure the number of
TXQs.

I mean, it could be an rtnetlink interface as well, of course, but I
don't think it's completely weird if this goes into ethtool...

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 0/5] New netdev feature flags for XDP
  2020-12-04 17:26     ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2020-12-04 19:22       ` Jakub Kicinski
  -1 siblings, 0 replies; 120+ messages in thread
From: Jakub Kicinski @ 2020-12-04 19:22 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, ast,
	daniel, netdev, davem, john.fastabend, hawk, maciej.fijalkowski,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Fri, 04 Dec 2020 18:26:10 +0100 Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> 
> > On Fri,  4 Dec 2020 11:28:56 +0100 alardam@gmail.com wrote:  
> >>  * Extend ethtool netlink interface in order to get access to the XDP
> >>    bitmap (XDP_PROPERTIES_GET). [Toke]  
> >
> > That's a good direction, but I don't see why XDP caps belong in ethtool
> > at all? We use rtnetlink to manage the progs...  
> 
> You normally use ethtool to get all the other features a device support,
> don't you?

Not really, please take a look at all the IFLA attributes. There's 
a bunch of capabilities there.

> And for XDP you even use it to configure the number of TXQs.
> 
> I mean, it could be an rtnetlink interface as well, of course, but I
> don't think it's completely weird if this goes into ethtool...


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 17:20           ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2020-12-04 22:19             ` Daniel Borkmann
  -1 siblings, 0 replies; 120+ messages in thread
From: Daniel Borkmann @ 2020-12-04 22:19 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Maciej Fijalkowski
  Cc: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, netdev, davem, john.fastabend, hawk, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka, Jesper Dangaard Brouer

On 12/4/20 6:20 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
[...]
>> We tried to standardize on a minimum guaranteed amount, but unfortunately not
>> everyone seems to implement it, but I think it would be very useful to query
>> this from application side, for example, consider that an app inserts a BPF
>> prog at XDP doing custom encap shortly before XDP_TX so it would be useful to
>> know which of the different encaps it implements are realistically possible on
>> the underlying XDP supported dev.
> 
> How many distinct values are there in reality? Enough to express this in
> a few flags (XDP_HEADROOM_128, XDP_HEADROOM_192, etc?), or does it need
> an additional field to get the exact value? If we implement the latter
> we also run the risk of people actually implementing all sorts of weird
> values, whereas if we constrain it to a few distinct values it's easier
> to push back against adding new values (as it'll be obvious from the
> addition of new flags).

Unfortunately it's not straightforward to determine everywhere, see also [0,1]
for some data points Jesper looked into in the past; in some cases it might
differ depending on the build/runtime config.

   [0] https://lore.kernel.org/bpf/158945314698.97035.5286827951225578467.stgit@firesoul/
   [1] https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul/

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 22:19             ` [Intel-wired-lan] " Daniel Borkmann
@ 2020-12-07 11:54               ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jesper Dangaard Brouer @ 2020-12-07 11:54 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Toke Høiland-Jørgensen, Maciej Fijalkowski, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, john.fastabend, hawk, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka, brouer, Saeed Mahameed

On Fri, 4 Dec 2020 23:19:55 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 12/4/20 6:20 PM, Toke Høiland-Jørgensen wrote:
> > Daniel Borkmann <daniel@iogearbox.net> writes:  
> [...]
> >> We tried to standardize on a minimum guaranteed amount, but unfortunately not
> >> everyone seems to implement it, but I think it would be very useful to query
> >> this from application side, for example, consider that an app inserts a BPF
> >> prog at XDP doing custom encap shortly before XDP_TX so it would be useful to
> >> know which of the different encaps it implements are realistically possible on
> >> the underlying XDP supported dev.  
> > 
> > How many distinct values are there in reality? Enough to express this in
> > a few flags (XDP_HEADROOM_128, XDP_HEADROOM_192, etc?), or does it need
> > an additional field to get the exact value? If we implement the latter
> > we also run the risk of people actually implementing all sorts of weird
> > values, whereas if we constrain it to a few distinct values it's easier
> > to push back against adding new values (as it'll be obvious from the
> > addition of new flags).  
> 
> It's not everywhere straight forward to determine unfortunately, see also [0,1]
> as some data points where Jesper looked into in the past, so in some cases it
> might differ depending on the build/runtime config..
> 
>    [0] https://lore.kernel.org/bpf/158945314698.97035.5286827951225578467.stgit@firesoul/
>    [1] https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul/

Yes, unfortunately drivers have already gotten creative in this area,
and variations have sneaked in.  I remember that we were forced to
allow the SFC driver to use 128 bytes of headroom, to avoid a memory
corruption. I tried hard to make the minimum 192 bytes, as that is 3
cachelines, but I failed to enforce this.

It might be valuable to expose info on the driver's headroom size, as
this would allow end-users to take advantage of it (instead of having
to use the lowest common headroom) and to reject, up-front in userspace,
loading on e.g. SFC which has this annoying limitation.

BUT thinking about what the driver's headroom size MEANS to userspace,
I'm not sure it is wise to give this info to userspace.  The
XDP-headroom is used for several kernel-internal things that limit the
available space for growing the packet-headroom.  E.g. (1) xdp_frame is
something that we likely need to grow (even though I'm pushing back),
and (2) the metadata area, which Saeed is looking to populate from driver
code (which also reduces the packet-headroom available for encap headers).
So, userspace cannot use the XDP-headroom size for much...
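
To illustrate the point (the numbers below are examples, not ABI;
sizeof(struct xdp_frame) and the metadata size vary by kernel and
driver), the driver's reserved headroom is an upper bound, not what a
program can actually grow into:

    /* Illustration only: kernel-internal consumers of the reserved headroom. */
    #include <stdio.h>

    int main(void)
    {
            unsigned int reserved  = 256; /* e.g. XDP_PACKET_HEADROOM */
            unsigned int xdp_frame = 40;  /* example sizeof(struct xdp_frame), may grow */
            unsigned int metadata  = 32;  /* example driver-filled metadata area */

            /* What would be left for bpf_xdp_adjust_head() to grow into. */
            printf("usable packet-headroom: %u bytes\n",
                   reserved - xdp_frame - metadata);
            return 0;
    }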

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 22:19             ` [Intel-wired-lan] " Daniel Borkmann
@ 2020-12-07 12:03               ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-07 12:03 UTC (permalink / raw)
  To: Daniel Borkmann, Maciej Fijalkowski, Jesper Dangaard Brouer
  Cc: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, netdev, davem, john.fastabend, hawk, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 12/4/20 6:20 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
> [...]
>>> We tried to standardize on a minimum guaranteed amount, but unfortunately not
>>> everyone seems to implement it, but I think it would be very useful to query
>>> this from application side, for example, consider that an app inserts a BPF
>>> prog at XDP doing custom encap shortly before XDP_TX so it would be useful to
>>> know which of the different encaps it implements are realistically possible on
>>> the underlying XDP supported dev.
>> 
>> How many distinct values are there in reality? Enough to express this in
>> a few flags (XDP_HEADROOM_128, XDP_HEADROOM_192, etc?), or does it need
>> an additional field to get the exact value? If we implement the latter
>> we also run the risk of people actually implementing all sorts of weird
>> values, whereas if we constrain it to a few distinct values it's easier
>> to push back against adding new values (as it'll be obvious from the
>> addition of new flags).
>
> It's not everywhere straight forward to determine unfortunately, see also [0,1]
> as some data points where Jesper looked into in the past, so in some cases it
> might differ depending on the build/runtime config..
>
>    [0] https://lore.kernel.org/bpf/158945314698.97035.5286827951225578467.stgit@firesoul/
>    [1] https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul/

Right, well in that case maybe we should just expose the actual headroom
as a separate netlink attribute? Although I suppose that would require
another round of driver changes since Jesper's patch you linked above
only puts this into xdp_buff at XDP program runtime.
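
As a sketch only of what such an attribute could look like on the
kernel side; the IFLA_XDP_HEADROOM attribute and the dev->xdp_headroom
field are both invented for this example:

    /* Hypothetical: fill a new attribute in the IFLA_XDP nest in rtnetlink. */
    #include <linux/netdevice.h>
    #include <net/netlink.h>

    static int rtnl_xdp_put_headroom(struct sk_buff *skb, struct net_device *dev)
    {
            /* dev->xdp_headroom: assumed new field, reported by the driver */
            return nla_put_u32(skb, IFLA_XDP_HEADROOM /* made-up attribute */,
                               dev->xdp_headroom);
    }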

Jesper, WDYT?

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 0/5] New netdev feature flags for XDP
  2020-12-04 19:22       ` [Intel-wired-lan] " Jakub Kicinski
@ 2020-12-07 12:04         ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-07 12:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, ast,
	daniel, netdev, davem, john.fastabend, hawk, maciej.fijalkowski,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

Jakub Kicinski <kuba@kernel.org> writes:

> On Fri, 04 Dec 2020 18:26:10 +0100 Toke Høiland-Jørgensen wrote:
>> Jakub Kicinski <kuba@kernel.org> writes:
>> 
>> > On Fri,  4 Dec 2020 11:28:56 +0100 alardam@gmail.com wrote:  
>> >>  * Extend ethtool netlink interface in order to get access to the XDP
>> >>    bitmap (XDP_PROPERTIES_GET). [Toke]  
>> >
>> > That's a good direction, but I don't see why XDP caps belong in ethtool
>> > at all? We use rtnetlink to manage the progs...  
>> 
>> You normally use ethtool to get all the other features a device support,
>> don't you?
>
> Not really, please take a look at all the IFLA attributes. There's 
> a bunch of capabilities there.

Ah, right, TIL. Well, putting this new property in rtnetlink instead of
ethtool is fine by me as well :)

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-07 11:54               ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-12-07 12:08                 ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-07 12:08 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Daniel Borkmann
  Cc: Maciej Fijalkowski, alardam, magnus.karlsson, bjorn.topel,
	andrii.nakryiko, kuba, ast, netdev, davem, john.fastabend, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka, brouer, Saeed Mahameed

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Fri, 4 Dec 2020 23:19:55 +0100
> Daniel Borkmann <daniel@iogearbox.net> wrote:
>
>> On 12/4/20 6:20 PM, Toke Høiland-Jørgensen wrote:
>> > Daniel Borkmann <daniel@iogearbox.net> writes:  
>> [...]
>> >> We tried to standardize on a minimum guaranteed amount, but unfortunately not
>> >> everyone seems to implement it, but I think it would be very useful to query
>> >> this from application side, for example, consider that an app inserts a BPF
>> >> prog at XDP doing custom encap shortly before XDP_TX so it would be useful to
>> >> know which of the different encaps it implements are realistically possible on
>> >> the underlying XDP supported dev.  
>> > 
>> > How many distinct values are there in reality? Enough to express this in
>> > a few flags (XDP_HEADROOM_128, XDP_HEADROOM_192, etc?), or does it need
>> > an additional field to get the exact value? If we implement the latter
>> > we also run the risk of people actually implementing all sorts of weird
>> > values, whereas if we constrain it to a few distinct values it's easier
>> > to push back against adding new values (as it'll be obvious from the
>> > addition of new flags).  
>> 
>> It's not everywhere straight forward to determine unfortunately, see also [0,1]
>> as some data points where Jesper looked into in the past, so in some cases it
>> might differ depending on the build/runtime config..
>> 
>>    [0] https://lore.kernel.org/bpf/158945314698.97035.5286827951225578467.stgit@firesoul/
>>    [1] https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul/
>
> Yes, unfortunately drivers have already gotten creative in this area,
> and variations have sneaked in.  I remember that we were forced to
> allow SFC driver to use 128 bytes headroom, to avoid a memory
> corruption. I tried hard to have the minimum 192 bytes as it is 3
> cachelines, but I failed to enforce this.
>
> It might be valuable to expose info on the drivers headroom size, as
> this will allow end-users to take advantage of this (instead of having
> to use the lowest common headroom) and up-front in userspace rejecting
> to load on e.g. SFC that have this annoying limitation.
>
> BUT thinking about what the drivers headroom size MEANS to userspace,
> I'm not sure it is wise to give this info to userspace.  The
> XDP-headroom is used for several kernel internal things, that limit the
> available space for growing packet-headroom.  E.g. (1) xdp_frame is
> something that we likely need to grow (even-though I'm pushing back),
> E.g. (2) metadata area which Saeed is looking to populate from driver
> code (also reduce packet-headroom for encap-headers).  So, userspace
> cannot use the XDP-headroom size to much...

(Ah, you had already replied, sorry seems I missed that).

Can we calculate a number from the headroom that is meaningful for
userspace? I suppose that would be "total number of bytes available for
metadata+packet extension"? Even with growing data structures, any
particular kernel should be able to inform userspace of the current
value, no?
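
Whichever number gets exported, a program can also discover the
effective limit at runtime by simply attempting the adjustment; a
minimal sketch (the 64-byte encap size is arbitrary):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define ENCAP_BYTES 64 /* example encap header size */

    SEC("xdp")
    int xdp_encap_probe(struct xdp_md *ctx)
    {
            /* A negative delta grows the packet-headroom; the kernel
             * rejects it if the remaining headroom is too small.
             */
            if (bpf_xdp_adjust_head(ctx, -ENCAP_BYTES) < 0)
                    return XDP_PASS; /* not enough room, fall back */

            /* ... re-check data/data_end and build the encap header ... */
            return XDP_TX;
    }

    char _license[] SEC("license") = "GPL";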

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-04 15:21         ` [Intel-wired-lan] " Daniel Borkmann
@ 2020-12-07 12:54           ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jesper Dangaard Brouer @ 2020-12-07 12:54 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Maciej Fijalkowski, Toke Høiland-Jørgensen, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, john.fastabend, hawk, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka

On Fri, 4 Dec 2020 16:21:08 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 12/4/20 1:46 PM, Maciej Fijalkowski wrote:
> > On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-Jørgensen wrote:  
> >> alardam@gmail.com writes:  
> >>> From: Marek Majtyka <marekx.majtyka@intel.com>
> >>>
> >>> Implement support for checking what kind of xdp functionality a netdev
> >>> supports. Previously, there was no way to do this other than to try
> >>> to create an AF_XDP socket on the interface or load an XDP program and see
> >>> if it worked. This commit changes this by adding a new variable which
> >>> describes all xdp supported functions on pretty detailed level:  
> >>
> >> I like the direction this is going! :)

(Me too, don't get discouraged by our nitpicking, keep working on this! :-))

> >>  
> >>>   - aborted
> >>>   - drop
> >>>   - pass
> >>>   - tx  
> 
> I strongly think we should _not_ merge any native XDP driver patchset
> that does not support/implement the above return codes. 

I agree with the above statement.

> Could we instead group them together and call this something like
> XDP_BASE functionality to not give a wrong impression?

I disagree.  I can accept that XDP_BASE includes aborted+drop+pass.

I think we need to keep the XDP_TX action separate, because I think that
there are use-cases where we want to disable XDP_TX due to end-user
policy or hardware limitations.

Use-case(1): A cloud provider wants to give customers (running VMs) the
ability to load an XDP program for DDoS protection (only), but doesn't
want to allow the customer to use XDP_TX (which could implement LB or
cheat their VM isolation policy).

Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue
resources, as the use-case is only DDoS.  Today we have this problem
with the ixgbe hardware, which cannot load XDP programs on systems with
more than 192 CPUs.


> If this is properly documented that these are basic must-have
> _requirements_, then users and driver developers both know what the
> expectations are.

We can still document that XDP_TX is a must-have requirement when a
driver implements XDP.


> >>>   - redirect  
> >>


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-07 12:54           ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-12-07 20:52             ` John Fastabend
  -1 siblings, 0 replies; 120+ messages in thread
From: John Fastabend @ 2020-12-07 20:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Daniel Borkmann
  Cc: Maciej Fijalkowski, Toke Høiland-Jørgensen, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, john.fastabend, hawk, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciejromanfijalkowski, intel-wired-lan,
	Marek Majtyka

Jesper Dangaard Brouer wrote:
> On Fri, 4 Dec 2020 16:21:08 +0100
> Daniel Borkmann <daniel@iogearbox.net> wrote:
> 
> > On 12/4/20 1:46 PM, Maciej Fijalkowski wrote:
> > > On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-Jørgensen wrote:  
> > >> alardam@gmail.com writes:  
> > >>> From: Marek Majtyka <marekx.majtyka@intel.com>
> > >>>
> > >>> Implement support for checking what kind of xdp functionality a netdev
> > >>> supports. Previously, there was no way to do this other than to try
> > >>> to create an AF_XDP socket on the interface or load an XDP program and see
> > >>> if it worked. This commit changes this by adding a new variable which
> > >>> describes all xdp supported functions on pretty detailed level:  
> > >>
> > >> I like the direction this is going! :)
> 
> (Me too, don't get discouraged by our nitpicking, keep working on this! :-))
> 
> > >>  
> > >>>   - aborted
> > >>>   - drop
> > >>>   - pass
> > >>>   - tx  
> > 
> > I strongly think we should _not_ merge any native XDP driver patchset
> > that does not support/implement the above return codes. 
> 
> I agree, with above statement.
> 
> > Could we instead group them together and call this something like
> > XDP_BASE functionality to not give a wrong impression?
> 
> I disagree.  I can accept that XDP_BASE include aborted+drop+pass.
> 
> I think we need to keep XDP_TX action separate, because I think that
> there are use-cases where the we want to disable XDP_TX due to end-user
> policy or hardware limitations.

How about we discover this at load time though? Meaning if the program
doesn't use XDP_TX, then the hardware can skip resource allocations for
it. I think we could have the verifier or an extra pass discover the use
of XDP_TX and then pass a bit down to the driver to enable/disable TX caps.
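
A deliberately naive sketch of such a pass, just to show the shape of
it; real detection needs the verifier's value tracking, and map-derived
return values defeat a scan like this entirely:

    #include <linux/bpf.h>
    #include <stdbool.h>

    /* Scan for "r0 = XDP_TX" / "w0 = XDP_TX" immediates; anything fancier
     * (map lookups, arithmetic on r0) is invisible to this.
     */
    static bool prog_may_use_xdp_tx(const struct bpf_insn *insn, int insn_cnt)
    {
            for (int i = 0; i < insn_cnt; i++) {
                    __u8 class = BPF_CLASS(insn[i].code);

                    if ((class == BPF_ALU || class == BPF_ALU64) &&
                        BPF_OP(insn[i].code) == BPF_MOV &&
                        BPF_SRC(insn[i].code) == BPF_K &&
                        insn[i].dst_reg == BPF_REG_0 &&
                        insn[i].imm == XDP_TX)
                            return true;
            }
            return false;
    }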

> 
> Use-case(1): Cloud-provider want to give customers (running VMs) ability
> to load XDP program for DDoS protection (only), but don't want to allow
> customer to use XDP_TX (that can implement LB or cheat their VM
> isolation policy).

Not following. What interface do they want to allow loading on? If it's
the VM interface then I don't see how it matters. From outside the
VM there should be no way to discover if it's done in the VM or in tc or
some other stack.

If it's doing some onloading/offloading I would assume they need to
ensure the isolation, etc. is still maintained, because you can't
let one VM's program work on other VMs' packets safely.

So what did I miss? The above doesn't make sense to me.

> 
> Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue
> resources, as the use-case is only DDoS.  Today we have this problem
> with the ixgbe hardware, that cannot load XDP programs on systems with
> more than 192 CPUs.

The ixgbe issue is just a bug or missing feature in my opinion.

I think we just document that XDP_TX consumes resources, and if users
care they shouldn't use XDP_TX in programs; in that case the hardware
should, via program discovery, not allocate the resource. This seems
cleaner in my opinion than more bits for features.

> 
> 
> > If this is properly documented that these are basic must-have
> > _requirements_, then users and driver developers both know what the
> > expectations are.
> 
> We can still document that XDP_TX is a must-have requirement, when a
> driver implements XDP.

+1

> 
> 
> > >>>   - redirect  
> > >>
> 
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-07 20:52             ` [Intel-wired-lan] " John Fastabend
@ 2020-12-07 22:38               ` Saeed Mahameed
  -1 siblings, 0 replies; 120+ messages in thread
From: Saeed Mahameed @ 2020-12-07 22:38 UTC (permalink / raw)
  To: John Fastabend, Jesper Dangaard Brouer, Daniel Borkmann
  Cc: Maciej Fijalkowski, Toke Høiland-Jørgensen, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, hawk, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

On Mon, 2020-12-07 at 12:52 -0800, John Fastabend wrote:
> Jesper Dangaard Brouer wrote:
> > On Fri, 4 Dec 2020 16:21:08 +0100
> > Daniel Borkmann <daniel@iogearbox.net> wrote:
> > 
> > > On 12/4/20 1:46 PM, Maciej Fijalkowski wrote:
> > > > On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-
> > > > Jørgensen wrote:  
> > > > > alardam@gmail.com writes:  
> > > > > > From: Marek Majtyka <marekx.majtyka@intel.com>
> > > > > > 
> > > > > > Implement support for checking what kind of xdp
> > > > > > functionality a netdev
> > > > > > supports. Previously, there was no way to do this other
> > > > > > than to try
> > > > > > to create an AF_XDP socket on the interface or load an XDP
> > > > > > program and see
> > > > > > if it worked. This commit changes this by adding a new
> > > > > > variable which
> > > > > > describes all xdp supported functions on pretty detailed
> > > > > > level:  
> > > > > 
> > > > > I like the direction this is going! :)
> > 
> > (Me too, don't get discouraged by our nitpicking, keep working on
> > this! :-))
> > 
> > > > >  
> > > > > >   - aborted
> > > > > >   - drop
> > > > > >   - pass
> > > > > >   - tx  
> > > 
> > > I strongly think we should _not_ merge any native XDP driver
> > > patchset
> > > that does not support/implement the above return codes. 
> > 
> > I agree, with above statement.
> > 
> > > Could we instead group them together and call this something like
> > > XDP_BASE functionality to not give a wrong impression?
> > 
> > I disagree.  I can accept that XDP_BASE include aborted+drop+pass.
> > 
XDP_BASE is a weird name; I vote:
XDP_FLAG_RX,
XDP_FLAG_TX,
XDP_FLAG_REDIRECT,
XDP_FLAG_AF_XDP,
XDP_FLAG_AFXDP_ZC
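
(Values below are made up; with a flag set along those lines, the
XDP_BASE grouping from earlier in the thread is just a mask, and a basic
capability check stays a one-liner:)

    /* Hypothetical bit values for the proposed flags. */
    #define XDP_FLAG_RX        (1U << 0)  /* aborted + drop + pass */
    #define XDP_FLAG_TX        (1U << 1)
    #define XDP_FLAG_REDIRECT  (1U << 2)
    #define XDP_FLAG_AF_XDP    (1U << 3)
    #define XDP_FLAG_AFXDP_ZC  (1U << 4)

    #define XDP_BASE           XDP_FLAG_RX

    /* e.g. refuse to attach a redirecting program on a device without it:
     *   if ((caps & (XDP_BASE | XDP_FLAG_REDIRECT)) !=
     *       (XDP_BASE | XDP_FLAG_REDIRECT))
     *           return -EOPNOTSUPP;
     */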

> > I think we need to keep XDP_TX action separate, because I think
> > that
> > there are use-cases where the we want to disable XDP_TX due to end-
> > user
> > policy or hardware limitations.
> 
> How about we discover this at load time though. Meaning if the
> program
> doesn't use XDP_TX then the hardware can skip resource allocations
> for
> it. I think we could have verifier or extra pass discover the use of
> XDP_TX and then pass a bit down to driver to enable/disable TX caps.
> 

+1, how about we also attach some attributes to the program that would
tell the kernel/driver how to prepare/configure itself for the new
program?

Attributes like how much headroom the program needs, what metadata the
driver must provide, whether the driver should do csum on tx, etc.

Some attributes can be extracted from the byte code/logic; others are
stated explicitly in some predefined section in the XDP prog itself.
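
A sketch of the "predefined section" variant; the section name, struct
layout, and semantics are all invented here, nothing like this exists
today:

    #include <linux/types.h>
    #include <bpf/bpf_helpers.h>

    /* Hypothetical: a loader could read this section before attach and
     * pass the values down to the driver for resource setup.
     */
    struct xdp_prog_attrs {
            __u32 needed_headroom; /* bytes of packet-headroom the prog may grow */
            __u32 uses_tx;         /* program can return XDP_TX */
            __u32 wants_meta;      /* program expects driver-filled metadata */
    };

    struct xdp_prog_attrs attrs SEC("xdp.attrs") = {
            .needed_headroom = 64,
            .uses_tx         = 1,
            .wants_meta      = 0,
    };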

On second thought, this could be disruptive: users will eventually
want to replace XDP progs, and they might want a persistent config
prior to loading/reloading any prog, to avoid reconfigs (packet drops)
between progs.

> > Use-case(1): Cloud-provider want to give customers (running VMs)
> > ability
> > to load XDP program for DDoS protection (only), but don't want to
> > allow
> > customer to use XDP_TX (that can implement LB or cheat their VM
> > isolation policy).
> 
> Not following. What interface do they want to allow loading on? If
> its
> the VM interface then I don't see how it matters. From outside the
> VM there should be no way to discover if its done in VM or in tc or
> some other stack.
> 
> If its doing some onloading/offloading I would assume they need to
> ensure the isolation, etc. is still maintained because you can't
> let one VMs program work on other VMs packets safely.
> 
> So what did I miss, above doesn't make sense to me.
> 
> > Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue
> > resources, as the use-case is only DDoS.  Today we have this
> > problem
> > with the ixgbe hardware, that cannot load XDP programs on systems
> > with
> > more than 192 CPUs.
> 
> The ixgbe issues is just a bug or missing-feature in my opinion.
> 
> I think we just document that XDP_TX consumes resources and if users
> care they shouldn't use XD_TX in programs and in that case hardware
> should via program discovery not allocate the resource. This seems
> cleaner in my opinion then more bits for features.
> 
> > 
> > > If this is properly documented that these are basic must-have
> > > _requirements_, then users and driver developers both know what
> > > the
> > > expectations are.
> > 
> > We can still document that XDP_TX is a must-have requirement, when
> > a
> > driver implements XDP.
> 
> +1
> 

How about xdp redirect?
Do we still need to load a no-op program on the egress netdev so it
would allocate the xdp tx/redirect queues?

Adding the above discovery feature will break xdp redirect native mode
and will require a special flag for xdp_redirect, so it
actually makes more sense to have a unique knob to turn on XDP tx for
the redirect use case.






^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-07 20:52             ` [Intel-wired-lan] " John Fastabend
@ 2020-12-07 23:07               ` Maciej Fijalkowski
  -1 siblings, 0 replies; 120+ messages in thread
From: Maciej Fijalkowski @ 2020-12-07 23:07 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jesper Dangaard Brouer, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Mon, Dec 07, 2020 at 12:52:22PM -0800, John Fastabend wrote:
> Jesper Dangaard Brouer wrote:
> > On Fri, 4 Dec 2020 16:21:08 +0100
> > Daniel Borkmann <daniel@iogearbox.net> wrote:
> > 
> > > On 12/4/20 1:46 PM, Maciej Fijalkowski wrote:
> > > > On Fri, Dec 04, 2020 at 01:18:31PM +0100, Toke Høiland-Jørgensen wrote:  
> > > >> alardam@gmail.com writes:  
> > > >>> From: Marek Majtyka <marekx.majtyka@intel.com>
> > > >>>
> > > >>> Implement support for checking what kind of xdp functionality a netdev
> > > >>> supports. Previously, there was no way to do this other than to try
> > > >>> to create an AF_XDP socket on the interface or load an XDP program and see
> > > >>> if it worked. This commit changes this by adding a new variable which
> > > >>> describes all xdp supported functions on pretty detailed level:  
> > > >>
> > > >> I like the direction this is going! :)
> > 
> > (Me too, don't get discouraged by our nitpicking, keep working on this! :-))
> > 
> > > >>  
> > > >>>   - aborted
> > > >>>   - drop
> > > >>>   - pass
> > > >>>   - tx  
> > > 
> > > I strongly think we should _not_ merge any native XDP driver patchset
> > > that does not support/implement the above return codes. 
> > 
> > I agree, with above statement.
> > 
> > > Could we instead group them together and call this something like
> > > XDP_BASE functionality to not give a wrong impression?
> > 
> > I disagree.  I can accept that XDP_BASE include aborted+drop+pass.
> > 
> > I think we need to keep XDP_TX action separate, because I think that
> > there are use-cases where the we want to disable XDP_TX due to end-user
> > policy or hardware limitations.
> 
> How about we discover this at load time though. Meaning if the program
> doesn't use XDP_TX then the hardware can skip resource allocations for
> it. I think we could have verifier or extra pass discover the use of
> XDP_TX and then pass a bit down to driver to enable/disable TX caps.

+1

> 
> > 
> > Use-case(1): Cloud-provider want to give customers (running VMs) ability
> > to load XDP program for DDoS protection (only), but don't want to allow
> > customer to use XDP_TX (that can implement LB or cheat their VM
> > isolation policy).
> 
> Not following. What interface do they want to allow loading on? If its
> the VM interface then I don't see how it matters. From outside the
> VM there should be no way to discover if its done in VM or in tc or
> some other stack.
> 
> If its doing some onloading/offloading I would assume they need to
> ensure the isolation, etc. is still maintained because you can't
> let one VMs program work on other VMs packets safely.
> 
> So what did I miss, above doesn't make sense to me.
> 
> > 
> > Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue
> > resources, as the use-case is only DDoS.  Today we have this problem
> > with the ixgbe hardware, that cannot load XDP programs on systems with
> > more than 192 CPUs.
> 
> The ixgbe issues is just a bug or missing-feature in my opinion.

Not a bug, rather HW limitation?

> 
> I think we just document that XDP_TX consumes resources and if users
> care they shouldn't use XD_TX in programs and in that case hardware
> should via program discovery not allocate the resource. This seems
> cleaner in my opinion then more bits for features.

But what if I'm on some limited HW that actually has support for XDP
and I would like to utilize XDP_TX?

Not all drivers that support XDP consume Tx resources. Recently igb got
support and it shares Tx queues between netstack and XDP.

I feel like we should have a sort of best-effort approach: if we
stumble upon XDP_TX in the prog being loaded, query the driver whether
it would be able to provide the Tx resources on the current system,
given that normally we tend to have a queue per core.

In that case igb would say yes, ixgbe would say no, and the prog would
be rejected.
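
Such a query could be one new ndo; everything below (the callback name,
signature, and the ixgbe_max_xdp_queues() helper) is invented purely to
illustrate the yes/no split between igb-style and ixgbe-style drivers:

    #include <linux/errno.h>
    #include <linux/netdevice.h>

    /* Hypothetical new callback:
     *   int (*ndo_xdp_query_tx)(struct net_device *dev, unsigned int nr_cpus);
     * returning 0 if XDP_TX can be backed with Tx resources on this system.
     */

    /* igb-style: Tx queues are shared with the netstack, so always fine. */
    static int igb_xdp_query_tx(struct net_device *dev, unsigned int nr_cpus)
    {
            return 0;
    }

    /* ixgbe-style: needs a dedicated queue per CPU and may run out
     * (ixgbe_max_xdp_queues() is a made-up helper for this sketch).
     */
    static int ixgbe_xdp_query_tx(struct net_device *dev, unsigned int nr_cpus)
    {
            return nr_cpus <= ixgbe_max_xdp_queues(dev) ? 0 : -ENOSPC;
    }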

> 
> > 
> > 
> > > If this is properly documented that these are basic must-have
> > > _requirements_, then users and driver developers both know what the
> > > expectations are.
> > 
> > We can still document that XDP_TX is a must-have requirement, when a
> > driver implements XDP.
> 
> +1
> 
> > 
> > 
> > > >>>   - redirect  
> > > >>
> > 
> > 
> > -- 
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   LinkedIn: http://www.linkedin.com/in/brouer
> > 
> 
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-07 20:52             ` [Intel-wired-lan] " John Fastabend
@ 2020-12-08  1:01               ` David Ahern
  -1 siblings, 0 replies; 120+ messages in thread
From: David Ahern @ 2020-12-08  1:01 UTC (permalink / raw)
  To: John Fastabend, Jesper Dangaard Brouer, Daniel Borkmann
  Cc: Maciej Fijalkowski, Toke Høiland-Jørgensen, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, hawk, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

On 12/7/20 1:52 PM, John Fastabend wrote:
>>
>> I think we need to keep XDP_TX action separate, because I think that
>> there are use-cases where the we want to disable XDP_TX due to end-user
>> policy or hardware limitations.
> 
> How about we discover this at load time though. Meaning if the program
> doesn't use XDP_TX then the hardware can skip resource allocations for
> it. I think we could have verifier or extra pass discover the use of
> XDP_TX and then pass a bit down to driver to enable/disable TX caps.
> 

This was discussed in the context of virtio_net some months back - it is
hard, if not impossible, to know that a program will not return XDP_TX
(e.g., the value comes from a map).

Flipping that around, what if the program attach indicates whether
XDP_TX could be returned? If so, the driver manages the resource needs.
If not, no resources are needed, and if the program violates that and
returns XDP_TX, the packet is dropped.
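
A sketch of that split; the attach flag, its value, and the rxq field
are invented for the example:

    #include <linux/bpf.h>
    #include <stdbool.h>

    /* Hypothetical attach-time flag: "this program never returns XDP_TX",
     * letting the driver skip Tx queue allocation.
     */
    #define XDP_FLAGS_NO_TX  (1U << 7) /* made-up value */

    struct my_rxq {
            bool xdp_tx_enabled; /* set at attach time from the flag above */
    };

    /* Driver rx path: enforce the declaration instead of trusting it. */
    static __u32 xdp_sanitize_act(const struct my_rxq *rxq, __u32 act)
    {
            if (act == XDP_TX && !rxq->xdp_tx_enabled)
                    return XDP_DROP; /* program broke its promise, drop */
            return act;
    }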

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-08  1:01               ` [Intel-wired-lan] " David Ahern
@ 2020-12-08  8:28                 ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jesper Dangaard Brouer @ 2020-12-08  8:28 UTC (permalink / raw)
  To: David Ahern
  Cc: John Fastabend, Daniel Borkmann, Maciej Fijalkowski,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Mon, 7 Dec 2020 18:01:00 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 12/7/20 1:52 PM, John Fastabend wrote:
> >>
> >> I think we need to keep XDP_TX action separate, because I think that
> >> there are use-cases where the we want to disable XDP_TX due to end-user
> >> policy or hardware limitations.  
> > 
> > How about we discover this at load time though. 

Nitpick at XDP "attach" time. The general disconnect between BPF and
XDP is that BPF can verify at "load" time (as kernel knows what it
support) while XDP can have different support/features per driver, and
cannot do this until attachment time. (See later issue with tail calls).
(All other BPF-hooks don't have this issue)

> > Meaning if the program
> > doesn't use XDP_TX then the hardware can skip resource allocations for
> > it. I think we could have verifier or extra pass discover the use of
> > XDP_TX and then pass a bit down to driver to enable/disable TX caps.
> >   
> 
> This was discussed in the context of virtio_net some months back - it is
> hard to impossible to know a program will not return XDP_TX (e.g., value
> comes from a map).

It is hard, and sometimes not possible.  For maps the workaround is
that BPF-programmer adds a bound check on values from the map. If not
doing that the verifier have to assume all possible return codes are
used by BPF-prog.
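
E.g. something like this (just a sketch, "verdict_map" being an example
map), which lets the verifier see that only DROP/PASS can come out of
the program:

    __u32 key = 0;
    __u32 *verdict = bpf_map_lookup_elem(&verdict_map, &key);

    if (!verdict)
        return XDP_PASS;
    if (*verdict == XDP_DROP)   /* bound the value read from the map */
        return XDP_DROP;
    return XDP_PASS;            /* anything else is treated as pass */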

The real nemesis is program tail calls, that can be added dynamically
after the XDP program is attached.  It is at attachment time that
changing the NIC resources is possible.  So, for program tail calls the
verifier have to assume all possible return codes are used by BPF-prog.

BPF now have function calls and function replace right(?)  How does
this affect this detection of possible return codes?


> Flipping that around, what if the program attach indicates whether
> XDP_TX could be returned. If so, driver manages the resource needs. If
> not, no resource needed and if the program violates that and returns
> XDP_TX the packet is dropped.

I do like this idea, as IMHO we do need something that is connected
with the BPF-prog, that describes what resources the program requests
(either like above via detecting this in the verifier, or simply
manually configuring this in the BPF-prog ELF file).

The main idea is that we all (I assume) want to provide a better
end-user interface/experience, by giving direct feedback to the
end-user that "loading+attaching" this XDP BPF-prog will not work,
e.g. because the driver doesn't support a specific return code.
Thus, we need to reject "loading+attaching" if features cannot be
satisfied.

We need a solution, as today it is causing frustration for end-users
that packets can be (silently) dropped by XDP, e.g. if the driver
doesn't support XDP_REDIRECT.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-07 20:52             ` [Intel-wired-lan] " John Fastabend
@ 2020-12-08  9:00               ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jesper Dangaard Brouer @ 2020-12-08  9:00 UTC (permalink / raw)
  To: John Fastabend
  Cc: brouer, Daniel Borkmann, Maciej Fijalkowski,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Mon, 07 Dec 2020 12:52:22 -0800
John Fastabend <john.fastabend@gmail.com> wrote:

> > Use-case(1): Cloud-provider want to give customers (running VMs) ability
> > to load XDP program for DDoS protection (only), but don't want to allow
> > customer to use XDP_TX (that can implement LB or cheat their VM
> > isolation policy).  
> 
> Not following. What interface do they want to allow loading on? If its
> the VM interface then I don't see how it matters. From outside the
> VM there should be no way to discover if its done in VM or in tc or
> some other stack.
> 
> If its doing some onloading/offloading I would assume they need to
> ensure the isolation, etc. is still maintained because you can't
> let one VMs program work on other VMs packets safely.
> 
> So what did I miss, above doesn't make sense to me.

The Cloud-provider want to load customer provided BPF-code on the
physical Host-OS NIC (that support XDP).  The customer can get access
to a web-interface where they can write or upload their BPF-prog.

As multiple customers can upload BPF-progs, the Cloud-provider have to
write a BPF-prog dispatcher that runs these multiple program.  This
could be done via BPF tail-calls, or via Toke's libxdp[1], or via
devmap XDP-progs per egress port.
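
For the tail-call variant the dispatcher is basically just a prog array
indexed by a per-customer slot, something like this (only a sketch;
lookup_customer_slot() is a placeholder):

    struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 64);
        __type(key, __u32);
        __type(value, __u32);
    } customer_progs SEC(".maps");

    SEC("xdp")
    int xdp_dispatcher(struct xdp_md *ctx)
    {
        __u32 slot = lookup_customer_slot(ctx); /* e.g. by dst IP */

        bpf_tail_call(ctx, &customer_progs, slot);
        return XDP_PASS;    /* no prog installed in that slot */
    }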

The Cloud-provider don't fully trust customers BPF-prog.   They already
pre-filtered traffic to the given VM, so they can allow customers
freedom to see traffic and do XDP_PASS and XDP_DROP.  They
administratively (via ethtool) want to disable the XDP_REDIRECT and
XDP_TX driver feature, as it can be used for violation their VM
isolation policy between customers.

Is the use-case more clear now?


[1] https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-08  9:00               ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-12-08  9:42                 ` Daniel Borkmann
  -1 siblings, 0 replies; 120+ messages in thread
From: Daniel Borkmann @ 2020-12-08  9:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, John Fastabend
  Cc: Maciej Fijalkowski, Toke Høiland-Jørgensen, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, hawk, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

On 12/8/20 10:00 AM, Jesper Dangaard Brouer wrote:
> On Mon, 07 Dec 2020 12:52:22 -0800
> John Fastabend <john.fastabend@gmail.com> wrote:
> 
>>> Use-case(1): Cloud-provider want to give customers (running VMs) ability
>>> to load XDP program for DDoS protection (only), but don't want to allow
>>> customer to use XDP_TX (that can implement LB or cheat their VM
>>> isolation policy).
>>
>> Not following. What interface do they want to allow loading on? If its
>> the VM interface then I don't see how it matters. From outside the
>> VM there should be no way to discover if its done in VM or in tc or
>> some other stack.
>>
>> If its doing some onloading/offloading I would assume they need to
>> ensure the isolation, etc. is still maintained because you can't
>> let one VMs program work on other VMs packets safely.
>>
>> So what did I miss, above doesn't make sense to me.
> 
> The Cloud-provider want to load customer provided BPF-code on the
> physical Host-OS NIC (that support XDP).  The customer can get access
> to a web-interface where they can write or upload their BPF-prog.
> 
> As multiple customers can upload BPF-progs, the Cloud-provider have to
> write a BPF-prog dispatcher that runs these multiple program.  This
> could be done via BPF tail-calls, or via Toke's libxdp[1], or via
> devmap XDP-progs per egress port.
> 
> The Cloud-provider don't fully trust customers BPF-prog.   They already
> pre-filtered traffic to the given VM, so they can allow customers
> freedom to see traffic and do XDP_PASS and XDP_DROP.  They
> administratively (via ethtool) want to disable the XDP_REDIRECT and
> XDP_TX driver feature, as it can be used for violation their VM
> isolation policy between customers.
> 
> Is the use-case more clear now?

I think we're talking about two different things. The use case as I understood
it in (1) was to be able to disable XDP_TX for NICs that are deployed in the
VM. This would be a no-go as-is, since it would break my basic assumption for
attaching XDP progs: today the return codes pass/drop/tx are pretty much
available everywhere on native XDP supported NICs. And if you've tried it on
major cloud providers like AWS or Azure that offer SRIOV-based networking,
that works okay today, and further restricting it would mean breakage of
existing programs.

What you mean here is "offload" from guest to host, which is a different use
case than what John and I likely read from your description in (1). Such a
program should then be loaded via the BPF offload API. Meaning, if offload is
used and the host is then configured to disallow XDP_TX for such requests from
guests, then these get rejected through that facility; but if the /same/
program was loaded as regular native XDP, where it's still running in the
guest, then it must succeed. These are two entirely different things.

It's not clear to me whether some ethtool XDP properties flag is the right place
to describe this (plus this needs to differ between offloaded / non-offloaded progs)
or whether this should be an implementation detail for things like virtio_net e.g.
via virtio_has_feature(). Feels more like the latter to me which already has such
a facility in place.
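
For reference, virtio_net already gates XDP setup on the negotiated
feature bits in roughly this way (simplified from memory, so treat the
details as approximate):

    /* roughly what virtnet_xdp_set() does today */
    if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN)  ||
        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
        NL_SET_ERR_MSG_MOD(extack,
                   "Can't set XDP while host is implementing LRO, disable LRO first");
        return -EOPNOTSUPP;
    }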

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-08  8:28                 ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-12-08 11:58                 ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-08 11:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern
  Cc: John Fastabend, Daniel Borkmann, Maciej Fijalkowski, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, hawk, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

Jesper Dangaard Brouer <jbrouer@redhat.com> writes:

> On Mon, 7 Dec 2020 18:01:00 -0700
> David Ahern <dsahern@gmail.com> wrote:
>
>> On 12/7/20 1:52 PM, John Fastabend wrote:
>> >>
>> >> I think we need to keep XDP_TX action separate, because I think that
>> >> there are use-cases where the we want to disable XDP_TX due to end-user
>> >> policy or hardware limitations.  
>> > 
>> > How about we discover this at load time though. 
>
> Nitpick at XDP "attach" time. The general disconnect between BPF and
> XDP is that BPF can verify at "load" time (as kernel knows what it
> support) while XDP can have different support/features per driver, and
> cannot do this until attachment time. (See later issue with tail calls).
> (All other BPF-hooks don't have this issue)
>
>> > Meaning if the program
>> > doesn't use XDP_TX then the hardware can skip resource allocations for
>> > it. I think we could have verifier or extra pass discover the use of
>> > XDP_TX and then pass a bit down to driver to enable/disable TX caps.
>> >   
>> 
>> This was discussed in the context of virtio_net some months back - it is
>> hard to impossible to know a program will not return XDP_TX (e.g., value
>> comes from a map).
>
> It is hard, and sometimes not possible.  For maps the workaround is
> that BPF-programmer adds a bound check on values from the map. If not
> doing that the verifier have to assume all possible return codes are
> used by BPF-prog.
>
> The real nemesis is program tail calls, that can be added dynamically
> after the XDP program is attached.  It is at attachment time that
> changing the NIC resources is possible.  So, for program tail calls the
> verifier have to assume all possible return codes are used by BPF-prog.

We actually had someone working on a scheme for how to express this for
programs some months ago, but unfortunately that stalled out (Jesper
already knows this, but FYI to the rest of you). In any case, I view
this as a "next step". Just exposing the feature bits to userspace will
help users today, and as a side effect, this also makes drivers declare
what they support, which we can then incorporate into the core code to,
e.g., reject attachment of programs that won't work anyway. But let's
do this in increments and not make the perfect the enemy of the good
here.
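
I.e., once drivers declare the bits, the core check at attach time could
eventually be as simple as something like this (purely illustrative; none
of these names are the ones from the current series):

    /* hypothetical check in the core XDP attach path */
    if (prog_may_use(prog, XDP_REDIRECT) &&
        !xdp_dev_has_property(dev, XDP_PROP_REDIRECT)) {
        NL_SET_ERR_MSG(extack, "driver does not support XDP_REDIRECT");
        return -EOPNOTSUPP;
    }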

> BPF now have function calls and function replace right(?)  How does
> this affect this detection of possible return codes?

It does have the same issue as tail calls, in that the return code of
the function being replaced can obviously change. However, the verifier
knows the target of a replace, so it can propagate any constraints put
upon the caller if we implement it that way.

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-08 11:58                   ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2020-12-09  5:50                     ` John Fastabend
  -1 siblings, 0 replies; 120+ messages in thread
From: John Fastabend @ 2020-12-09  5:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, David Ahern
  Cc: John Fastabend, Daniel Borkmann, Maciej Fijalkowski, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, hawk, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
> 
> > On Mon, 7 Dec 2020 18:01:00 -0700
> > David Ahern <dsahern@gmail.com> wrote:
> >
> >> On 12/7/20 1:52 PM, John Fastabend wrote:
> >> >>
> >> >> I think we need to keep XDP_TX action separate, because I think that
> >> >> there are use-cases where the we want to disable XDP_TX due to end-user
> >> >> policy or hardware limitations.  
> >> > 
> >> > How about we discover this at load time though. 
> >
> > Nitpick at XDP "attach" time. The general disconnect between BPF and
> > XDP is that BPF can verify at "load" time (as kernel knows what it
> > support) while XDP can have different support/features per driver, and
> > cannot do this until attachment time. (See later issue with tail calls).
> > (All other BPF-hooks don't have this issue)
> >
> >> > Meaning if the program
> >> > doesn't use XDP_TX then the hardware can skip resource allocations for
> >> > it. I think we could have verifier or extra pass discover the use of
> >> > XDP_TX and then pass a bit down to driver to enable/disable TX caps.
> >> >   
> >> 
> >> This was discussed in the context of virtio_net some months back - it is
> >> hard to impossible to know a program will not return XDP_TX (e.g., value
> >> comes from a map).
> >
> > It is hard, and sometimes not possible.  For maps the workaround is
> > that BPF-programmer adds a bound check on values from the map. If not
> > doing that the verifier have to assume all possible return codes are
> > used by BPF-prog.
> >
> > The real nemesis is program tail calls, that can be added dynamically
> > after the XDP program is attached.  It is at attachment time that
> > changing the NIC resources is possible.  So, for program tail calls the
> > verifier have to assume all possible return codes are used by BPF-prog.
> 
> We actually had someone working on a scheme for how to express this for
> programs some months ago, but unfortunately that stalled out (Jesper
> already knows this, but FYI to the rest of you). In any case, I view
> this as a "next step". Just exposing the feature bits to userspace will
> help users today, and as a side effect, this also makes drivers declare
> what they support, which we can then incorporate into the core code to,
> e.g., reject attachment of programs that won't work anyway. But let's
> do this in increments and not make the perfect the enemy of the good
> here.
> 
> > BPF now have function calls and function replace right(?)  How does
> > this affect this detection of possible return codes?
> 
> It does have the same issue as tail calls, in that the return code of
> the function being replaced can obviously change. However, the verifier
> knows the target of a replace, so it can propagate any constraints put
> upon the caller if we implement it that way.

OK, I'm convinced it's not possible to tell at attach time if a program
will use XDP_TX or not in general. And in fact for most real programs it
likely will not be knowable. At least most programs I look at these days
use either tail calls or function calls, so it seems like a dead end.

Also above somewhere it was pointed out that XDP_REDIRECT would want
the queues and it seems even more challenging to sort that out.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-07 23:07               ` [Intel-wired-lan] " Maciej Fijalkowski
@ 2020-12-09  6:03                 ` John Fastabend
  -1 siblings, 0 replies; 120+ messages in thread
From: John Fastabend @ 2020-12-09  6:03 UTC (permalink / raw)
  To: Maciej Fijalkowski, John Fastabend
  Cc: Jesper Dangaard Brouer, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

> On Mon, Dec 07, 2020 at 12:52:22PM -0800, John Fastabend wrote:
> > Jesper Dangaard Brouer wrote:
> > > On Fri, 4 Dec 2020 16:21:08 +0100
> > > Daniel Borkmann <daniel@iogearbox.net> wrote:

[...] pruning the thread to answer Jesper.

> > > 
> > > Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue
> > > resources, as the use-case is only DDoS.  Today we have this problem
> > > with the ixgbe hardware, that cannot load XDP programs on systems with
> > > more than 192 CPUs.
> > 
> > The ixgbe issues is just a bug or missing-feature in my opinion.
> 
> Not a bug, rather HW limitation?

Well hardware has some max queue limit. Likely <192 otherwise I would
have kept doing queue per core on up to 192. But, ideally we should
still load and either share queues across multiple cores or restrict
down to a subset of CPUs. Do you need 192 cores for a 10gbps nic,
probably not. Yes, it requires some extra care, but should be doable
if someone cares enough. I gather current limitation/bug is because
no one has that configuration and/or has complained loud enough.

> 
> > 
> > I think we just document that XDP_TX consumes resources and if users
> > care they shouldn't use XD_TX in programs and in that case hardware
> > should via program discovery not allocate the resource. This seems
> > cleaner in my opinion then more bits for features.
> 
> But what if I'm with some limited HW that actually has a support for XDP
> and I would like to utilize XDP_TX?
> 
> Not all drivers that support XDP consume Tx resources. Recently igb got
> support and it shares Tx queues between netstack and XDP.

Makes sense to me.

> 
> I feel like we should have a sort-of best effort approach in case we
> stumble upon the XDP_TX in prog being loaded and query the driver if it
> would be able to provide the Tx resources on the current system, given
> that normally we tend to have a queue per core.

Why do we need to query? I guess you want some indication from the
driver it's not going to be running in the ideal NIC configuration?
I guess printing a warning would be the normal way to show that. But,
maybe your point is you want something easier to query?

> 
> In that case igb would say yes, ixgbe would say no and prog would be
> rejected.

I think the driver should load even if it can't meet the queue per
core quota. Refusing to load at all or just dropping packets on the
floor is not very friendly. I think we agree on that point.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-09  6:03                 ` [Intel-wired-lan] " John Fastabend
@ 2020-12-09  9:54                   ` Maciej Fijalkowski
  -1 siblings, 0 replies; 120+ messages in thread
From: Maciej Fijalkowski @ 2020-12-09  9:54 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jesper Dangaard Brouer, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Tue, Dec 08, 2020 at 10:03:51PM -0800, John Fastabend wrote:
> > On Mon, Dec 07, 2020 at 12:52:22PM -0800, John Fastabend wrote:
> > > Jesper Dangaard Brouer wrote:
> > > > On Fri, 4 Dec 2020 16:21:08 +0100
> > > > Daniel Borkmann <daniel@iogearbox.net> wrote:
> 
> [...] pruning the thread to answer Jesper.

I think you meant me, but thanks anyway for responding :)

> 
> > > > 
> > > > Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue
> > > > resources, as the use-case is only DDoS.  Today we have this problem
> > > > with the ixgbe hardware, that cannot load XDP programs on systems with
> > > > more than 192 CPUs.
> > > 
> > > The ixgbe issues is just a bug or missing-feature in my opinion.
> > 
> > Not a bug, rather HW limitation?
> 
> Well hardware has some max queue limit. Likely <192 otherwise I would
> have kept doing queue per core on up to 192. But, ideally we should

Data sheet states it's 128 Tx qs for ixgbe.

> still load and either share queues across multiple cores or restrict
> down to a subset of CPUs.

And that's the missing piece of logic, I suppose.

> Do you need 192 cores for a 10gbps nic, probably not.

Let's hear from Jesper :p

> Yes, it requires some extra care, but should be doable
> if someone cares enough. I gather current limitation/bug is because
> no one has that configuration and/or has complained loud enough.

I would say we're safe for queue per core approach for newer devices where
we have thousands of queues to play with. Older devices combined with big
cpu count can cause us some problems.

Wondering if drivers could have a problem when user would do something
weird as limiting the queue count to a lower value than cpu count and then
changing the irq affinity?

> 
> > 
> > > 
> > > I think we just document that XDP_TX consumes resources and if users
> > > care they shouldn't use XD_TX in programs and in that case hardware
> > > should via program discovery not allocate the resource. This seems
> > > cleaner in my opinion then more bits for features.
> > 
> > But what if I'm with some limited HW that actually has a support for XDP
> > and I would like to utilize XDP_TX?
> > 
> > Not all drivers that support XDP consume Tx resources. Recently igb got
> > support and it shares Tx queues between netstack and XDP.
> 
> Makes sense to me.
> 
> > 
> > I feel like we should have a sort-of best effort approach in case we
> > stumble upon the XDP_TX in prog being loaded and query the driver if it
> > would be able to provide the Tx resources on the current system, given
> > that normally we tend to have a queue per core.
> 
> Why do we need to query? I guess you want some indication from the
> driver it's not going to be running in the ideal NIC configuration?
> I guess printing a warning would be the normal way to show that. But,
> maybe your point is you want something easier to query?

I meant that given Jesper's example, what should we do? You don't have Tx
resources to pull at all. Should we have a data path for that case that
would share Tx qs between XDP/netstack? Probably not.

> 
> > 
> > In that case igb would say yes, ixgbe would say no and prog would be
> > rejected.
> 
> I think the driver should load even if it can't meet the queue per
> core quota. Refusing to load at all or just dropping packets on the
> floor is not very friendly. I think we agree on that point.

Agreed on that. But it needs some work. I can dabble on that a bit.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-09  5:50                     ` [Intel-wired-lan] " John Fastabend
@ 2020-12-09 10:26                       ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-09 10:26 UTC (permalink / raw)
  To: John Fastabend, Jesper Dangaard Brouer, David Ahern
  Cc: John Fastabend, Daniel Borkmann, Maciej Fijalkowski, alardam,
	magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, netdev,
	davem, hawk, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>> 
>> > On Mon, 7 Dec 2020 18:01:00 -0700
>> > David Ahern <dsahern@gmail.com> wrote:
>> >
>> >> On 12/7/20 1:52 PM, John Fastabend wrote:
>> >> >>
>> >> >> I think we need to keep XDP_TX action separate, because I think that
>> >> >> there are use-cases where the we want to disable XDP_TX due to end-user
>> >> >> policy or hardware limitations.  
>> >> > 
>> >> > How about we discover this at load time though. 
>> >
>> > Nitpick at XDP "attach" time. The general disconnect between BPF and
>> > XDP is that BPF can verify at "load" time (as kernel knows what it
>> > support) while XDP can have different support/features per driver, and
>> > cannot do this until attachment time. (See later issue with tail calls).
>> > (All other BPF-hooks don't have this issue)
>> >
>> >> > Meaning if the program
>> >> > doesn't use XDP_TX then the hardware can skip resource allocations for
>> >> > it. I think we could have verifier or extra pass discover the use of
>> >> > XDP_TX and then pass a bit down to driver to enable/disable TX caps.
>> >> >   
>> >> 
>> >> This was discussed in the context of virtio_net some months back - it is
>> >> hard to impossible to know a program will not return XDP_TX (e.g., value
>> >> comes from a map).
>> >
>> > It is hard, and sometimes not possible.  For maps the workaround is
>> > that BPF-programmer adds a bound check on values from the map. If not
>> > doing that the verifier have to assume all possible return codes are
>> > used by BPF-prog.
>> >
>> > The real nemesis is program tail calls, that can be added dynamically
>> > after the XDP program is attached.  It is at attachment time that
>> > changing the NIC resources is possible.  So, for program tail calls the
>> > verifier have to assume all possible return codes are used by BPF-prog.
>> 
>> We actually had someone working on a scheme for how to express this for
>> programs some months ago, but unfortunately that stalled out (Jesper
>> already knows this, but FYI to the rest of you). In any case, I view
>> this as a "next step". Just exposing the feature bits to userspace will
>> help users today, and as a side effect, this also makes drivers declare
>> what they support, which we can then incorporate into the core code to,
>> e.g., reject attachment of programs that won't work anyway. But let's
>> do this in increments and not make the perfect the enemy of the good
>> here.
>> 
>> > BPF now have function calls and function replace right(?)  How does
>> > this affect this detection of possible return codes?
>> 
>> It does have the same issue as tail calls, in that the return code of
>> the function being replaced can obviously change. However, the verifier
>> knows the target of a replace, so it can propagate any constraints put
>> upon the caller if we implement it that way.
>
> OK, I'm convinced it's not possible to tell at attach time if a program
> will use XDP_TX or not in general. And in fact for most real programs it
> likely will not be knowable. At least most programs I look at these days
> use either tail calls or function calls, so it seems like a dead end.
>
> Also above somewhere it was pointed out that XDP_REDIRECT would want
> the queues and it seems even more challenging to sort that out.

Yeah. Doesn't mean that all hope is lost for "reject stuff that doesn't
work". We could either do pessimistic return code detection (if we don't
know for sure assume all codes are used), or we could add metadata where
the program declares what it wants to do...

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-09  9:54                   ` [Intel-wired-lan] " Maciej Fijalkowski
@ 2020-12-09 11:52                     ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jesper Dangaard Brouer @ 2020-12-09 11:52 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Wed, 9 Dec 2020 10:54:54 +0100
Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote:

> On Tue, Dec 08, 2020 at 10:03:51PM -0800, John Fastabend wrote:
> > > On Mon, Dec 07, 2020 at 12:52:22PM -0800, John Fastabend wrote:  
> > > > Jesper Dangaard Brouer wrote:  
> > > > > On Fri, 4 Dec 2020 16:21:08 +0100
> > > > > Daniel Borkmann <daniel@iogearbox.net> wrote:  
> > 
> > [...] pruning the thread to answer Jesper.  
> 
> I think you meant me, but thanks anyway for responding :)

I was about to say that ;-)

> > > > > 
> > > > > Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue
> > > > > resources, as the use-case is only DDoS.  Today we have this problem
> > > > > with the ixgbe hardware, that cannot load XDP programs on systems with
> > > > > more than 192 CPUs.  
> > > > 
> > > > The ixgbe issues is just a bug or missing-feature in my opinion.  
> > > 
> > > Not a bug, rather HW limitation?  
> > 
> > Well hardware has some max queue limit. Likely <192 otherwise I would
> > have kept doing queue per core on up to 192. But, ideally we should  
> 
> Data sheet states its 128 Tx qs for ixgbe.

I likely remember wrong; maybe it was only ~96 CPUs.  I do remember that
some TX queues were reserved for something else, and QA reported issues
(as I don't have this high-end system myself).


> > still load and either share queues across multiple cores or restirct
> > down to a subset of CPUs.  
> 
> And that's the missing piece of logic, I suppose.
> 
> > Do you need 192 cores for a 10gbps nic, probably not.  
> 
> Let's hear from Jesper :p

LOL - of course you don't need 192 cores.  With XDP I will claim that
you only need 2 cores (with high GHz) to forward 10gbps wirespeed small
packets.

The point is that this only works when we avoid atomic lock operations
per packet and bulk the NIC PCIe tail/doorbell.  It was actually John's
invention/design to have a dedicated TX queue per core to avoid the
atomic lock operation per packet when queuing packets to the NIC.

 10G @ 64B gives a budget of 67.2 ns (241 cycles @ 3.60GHz)
 Atomic lock operation cost:[1]
 - Type:spin_lock_unlock         Per elem: 34 cycles(tsc) 9.485 ns
 - Type:spin_lock_unlock_irqsave Per elem: 61 cycles(tsc) 17.125 ns
 (And atomics can affect instructions per cycle)

But I have redesigned the ndo_xdp_xmit call to take a bulk of packets
(up to 16), so it should not be a problem to solve this by sharing a
TX-queue and taking a lock per 16 packets.  I still recommend that, for
the fallback case, you allocate a number of TX-queues and distribute
them across CPUs to avoid hitting a congested lock (the above
measurements are for the optimal non-congested atomic lock operation).

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
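
As a back-of-the-envelope check of the numbers above (assuming the usual
20 bytes of preamble + inter-frame gap on the wire for a minimum-size
frame), a tiny user-space C program:

#include <stdio.h>

int main(void)
{
	double bits_on_wire = (64 + 20) * 8;             /* 64B frame + 20B preamble/IFG */
	double budget_ns    = bits_on_wire / 10e9 * 1e9; /* at 10 Gbit/s -> 67.2 ns */
	double cycles       = budget_ns * 3.60;          /* at 3.60 GHz  -> ~242 cycles */

	printf("budget: %.1f ns, ~%.0f cycles\n", budget_ns, cycles);
	printf("spin_lock/unlock (34 cycles) eats ~%.0f%% of it\n",
	       34.0 / cycles * 100);
	return 0;
}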

> > Yes, it requires some extra care, but should be doable
> > if someone cares enough. I gather current limitation/bug is because
> > no one has that configuration and/or has complained loud enough.  
> 
> I would say we're safe for queue per core approach for newer devices where
> we have thousands of queues to play with. Older devices combined with big
> cpu count can cause us some problems.
> 
> Wondering if drivers could have a problem when user would do something
> weird as limiting the queue count to a lower value than cpu count and then
> changing the irq affinity?

Not sure what you mean.

But for the XDP RX-side we rely on the softirq NAPI guarantee to guard
against concurrent access to our (per-cpu) data structures.

> >   
> > >   
> > > > 
> > > > I think we just document that XDP_TX consumes resources and if users
> > > > care they shouldn't use XD_TX in programs and in that case hardware
> > > > should via program discovery not allocate the resource. This seems
> > > > cleaner in my opinion then more bits for features.  
> > > 
> > > But what if I'm with some limited HW that actually has a support for XDP
> > > and I would like to utilize XDP_TX?
> > > 
> > > Not all drivers that support XDP consume Tx resources. Recently igb got
> > > support and it shares Tx queues between netstack and XDP.  
> > 
> > Makes sense to me.
> >   
> > > 
> > > I feel like we should have a sort-of best effort approach in case we
> > > stumble upon the XDP_TX in prog being loaded and query the driver if it
> > > would be able to provide the Tx resources on the current system, given
> > > that normally we tend to have a queue per core.  
> > 
> > Why do we need to query? I guess you want some indication from the
> > driver its not going to be running in the ideal NIC configuraition?
> > I guess printing a warning would be the normal way to show that. But,
> > maybe your point is you want something easier to query?  
> 
> I meant that given Jesper's example, what should we do? You don't have Tx
> resources to pull at all. Should we have a data path for that case that
> would share Tx qs between XDP/netstack? Probably not.
> 

I think ixgbe should have a fallback mode, where it allocates e.g. 32
TX-queues for XDP xmits, or even just the same number as RX-queues (I
think XDP_TX and XDP_REDIRECT can share these TX-queues dedicated to
XDP).  When in fallback mode a lock needs to be taken (sharded across
CPUs), but ndo_xdp_xmit will bulk up to 16 packets, so it should not
matter too much.

I do think ixgbe should output a dmesg log message saying it is in XDP
fallback mode with X number of TX-queues.  QA usually collects the
dmesg output after a test run.
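
A minimal sketch of what such a fallback transmit path could look like
(illustrative only, not ixgbe code; the fb_* types and helpers are made
up): CPUs are sharded over a smaller set of XDP TX-queues and the queue
lock is taken once per ndo_xdp_xmit bulk rather than once per packet.

#include <linux/netdevice.h>
#include <linux/smp.h>
#include <linux/spinlock.h>
#include <net/xdp.h>

static int fb_ndo_xdp_xmit(struct net_device *dev, int n,
			   struct xdp_frame **frames, u32 flags)
{
	struct fb_priv *priv = netdev_priv(dev);	/* hypothetical */
	struct fb_xdp_ring *ring =
		&priv->xdp_ring[smp_processor_id() % priv->num_xdp_rings];
	int i, sent = 0;

	spin_lock(&ring->lock);			/* one lock per bulk */
	for (i = 0; i < n; i++) {
		if (fb_xmit_frame(ring, frames[i]))	/* hypothetical */
			break;
		sent++;
	}
	if (flags & XDP_XMIT_FLUSH)
		fb_ring_doorbell(ring);			/* hypothetical */
	spin_unlock(&ring->lock);

	return sent;		/* ndo_xdp_xmit returns frames sent */
}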

   
> > > 
> > > In that case igb would say yes, ixgbe would say no and prog would be
> > > rejected.  
> > 
> > I think the driver should load even if it can't meet the queue per
> > core quota. Refusing to load at all or just dropping packets on the
> > floor is not very friendly. I think we agree on that point.  
> 
> Agreed on that. But it needs some work. I can dabble on that a bit.
> 

I would really appreciate it if Intel could fix this in the ixgbe
driver and implement a fallback method.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-09 11:52                     ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-12-09 15:41                       ` David Ahern
  -1 siblings, 0 replies; 120+ messages in thread
From: David Ahern @ 2020-12-09 15:41 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Maciej Fijalkowski
  Cc: John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On 12/9/20 4:52 AM, Jesper Dangaard Brouer wrote:
>>> still load and either share queues across multiple cores or restirct
>>> down to a subset of CPUs.  
>>
>> And that's the missing piece of logic, I suppose.
>>
>>> Do you need 192 cores for a 10gbps nic, probably not.  
>>
>> Let's hear from Jesper :p
> 
> LOL - of-cause you don't need 192 cores.  With XDP I will claim that
> you only need 2 cores (with high GHz) to forward 10gbps wirespeed small
> packets.

You don't need 192 for 10G on Rx. However, if you are using XDP_REDIRECT
from VM tap devices the next device (presumably the host NIC) does need
to be able to handle the redirect.

My personal experience with this one is mlx5/ConnectX4-LX with a limit
of 63 queues and a server with 96 logical cpus. If the vhost thread for
the tap device runs on a cpu that does not have an XDP TX Queue, the
packet is dropped. This is a really bizarre case to debug as some
packets go out fine while others are dropped.
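
The pattern behind that failure mode looks roughly like this - an
illustrative sketch of the common "one XDP TX-queue per CPU" scheme, not
mlx5's actual code; the my_* names are made up:

#include <linux/netdevice.h>
#include <linux/smp.h>
#include <net/xdp.h>

static int my_xdp_xmit_on_this_cpu(struct net_device *dev,
				   struct xdp_frame *frame)
{
	struct my_priv *priv = netdev_priv(dev);	/* hypothetical priv */
	unsigned int cpu = smp_processor_id();

	/* e.g. 63 XDP TX-queues but 96 CPUs: redirects that land on
	 * CPUs 63..95 have no queue and the frame is silently dropped */
	if (cpu >= priv->num_xdp_txq)
		return -ENXIO;

	return my_xmit_frame(&priv->xdp_txq[cpu], frame);	/* hypothetical */
}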

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-09 11:52                     ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-12-09 15:44                       ` David Ahern
  -1 siblings, 0 replies; 120+ messages in thread
From: David Ahern @ 2020-12-09 15:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Maciej Fijalkowski
  Cc: John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On 12/9/20 4:52 AM, Jesper Dangaard Brouer wrote:
> But I have redesigned the ndo_xdp_xmit call to take a bulk of packets
> (up-to 16) so it should not be a problem to solve this by sharing
> TX-queue and talking a lock per 16 packets.  I still recommend that,
> for fallback case,  you allocated a number a TX-queue and distribute
> this across CPUs to avoid hitting a congested lock (above measurements
> are the optimal non-congested atomic lock operation)

I have been meaning to ask you why 16 for the XDP batching? If the
netdev budget is 64, why not something higher like 32 or 64?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-09 15:41                       ` [Intel-wired-lan] " David Ahern
@ 2020-12-09 17:15                         ` Saeed Mahameed
  -1 siblings, 0 replies; 120+ messages in thread
From: Saeed Mahameed @ 2020-12-09 17:15 UTC (permalink / raw)
  To: David Ahern, Jesper Dangaard Brouer, Maciej Fijalkowski
  Cc: John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Wed, 2020-12-09 at 08:41 -0700, David Ahern wrote:
> On 12/9/20 4:52 AM, Jesper Dangaard Brouer wrote:
> > > > still load and either share queues across multiple cores or
> > > > restirct
> > > > down to a subset of CPUs.  
> > > 
> > > And that's the missing piece of logic, I suppose.
> > > 
> > > > Do you need 192 cores for a 10gbps nic, probably not.  
> > > 
> > > Let's hear from Jesper :p
> > 
> > LOL - of-cause you don't need 192 cores.  With XDP I will claim
> > that
> > you only need 2 cores (with high GHz) to forward 10gbps wirespeed
> > small
> > packets.
> 
> You don't need 192 for 10G on Rx. However, if you are using
> XDP_REDIRECT
> from VM tap devices the next device (presumably the host NIC) does
> need
> to be able to handle the redirect.
> 
> My personal experience with this one is mlx5/ConnectX4-LX with a
> limit

This limit was removed from mlx5:
https://patchwork.ozlabs.org/project/netdev/patch/20200107191335.12272-5-saeedm@mellanox.com/
Note: you still need to use ethtool to increase the channel count from
64 to 128, or to 96 in your case.

> of 63 queues and a server with 96 logical cpus. If the vhost thread
> for
> the tap device runs on a cpu that does not have an XDP TX Queue, the
> packet is dropped. This is a really bizarre case to debug as some
> packets go out fine while others are dropped.

I agree, the user experience is horrible.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [Intel-wired-lan] [PATCH v2 bpf 2/5] drivers/net: turn XDP properties on
  2020-12-04 10:28   ` [Intel-wired-lan] " alardam
@ 2020-12-09 19:05     ` kernel test robot
  -1 siblings, 0 replies; 120+ messages in thread
From: kernel test robot @ 2020-12-09 19:05 UTC (permalink / raw)
  To: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, daniel, netdev, davem, john.fastabend, hawk
  Cc: kbuild-all, clang-built-linux

[-- Attachment #1: Type: text/plain, Size: 4330 bytes --]

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on eceae70bdeaeb6b8ceb662983cf663ff352fbc96]

url:    https://github.com/0day-ci/linux/commits/alardam-gmail-com/New-netdev-feature-flags-for-XDP/20201204-183428
base:    eceae70bdeaeb6b8ceb662983cf663ff352fbc96
config: x86_64-randconfig-a003-20201209 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1968804ac726e7674d5de22bc2204b45857da344)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/34e23fdbb761e9296101b14dc8c523d574ce6f74
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review alardam-gmail-com/New-netdev-feature-flags-for-XDP/20201204-183428
        git checkout 34e23fdbb761e9296101b14dc8c523d574ce6f74
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/net/ethernet/intel/ice/ice_main.c:2984:2: error: implicit declaration of function 'xsk_set_zc_properties' [-Werror,-Wimplicit-function-declaration]
           xsk_set_zc_properties(&netdev->xdp_properties);
           ^
   drivers/net/ethernet/intel/ice/ice_main.c:2984:2: note: did you mean 'xsk_set_zc_property'?
   include/net/xdp_sock_drv.h:251:20: note: 'xsk_set_zc_property' declared here
   static inline void xsk_set_zc_property(xdp_properties_t *properties)
                      ^
   1 error generated.

vim +/xsk_set_zc_properties +2984 drivers/net/ethernet/intel/ice/ice_main.c

  2951	
  2952	/**
  2953	 * ice_cfg_netdev - Allocate, configure and register a netdev
  2954	 * @vsi: the VSI associated with the new netdev
  2955	 *
  2956	 * Returns 0 on success, negative value on failure
  2957	 */
  2958	static int ice_cfg_netdev(struct ice_vsi *vsi)
  2959	{
  2960		struct ice_pf *pf = vsi->back;
  2961		struct ice_netdev_priv *np;
  2962		struct net_device *netdev;
  2963		u8 mac_addr[ETH_ALEN];
  2964		int err;
  2965	
  2966		err = ice_devlink_create_port(vsi);
  2967		if (err)
  2968			return err;
  2969	
  2970		netdev = alloc_etherdev_mqs(sizeof(*np), vsi->alloc_txq,
  2971					    vsi->alloc_rxq);
  2972		if (!netdev) {
  2973			err = -ENOMEM;
  2974			goto err_destroy_devlink_port;
  2975		}
  2976	
  2977		vsi->netdev = netdev;
  2978		np = netdev_priv(netdev);
  2979		np->vsi = vsi;
  2980	
  2981		ice_set_netdev_features(netdev);
  2982	
  2983		xdp_set_full_properties(&netdev->xdp_properties);
> 2984		xsk_set_zc_properties(&netdev->xdp_properties);
  2985	
  2986		ice_set_ops(netdev);
  2987	
  2988		if (vsi->type == ICE_VSI_PF) {
  2989			SET_NETDEV_DEV(netdev, ice_pf_to_dev(pf));
  2990			ether_addr_copy(mac_addr, vsi->port_info->mac.perm_addr);
  2991			ether_addr_copy(netdev->dev_addr, mac_addr);
  2992			ether_addr_copy(netdev->perm_addr, mac_addr);
  2993		}
  2994	
  2995		netdev->priv_flags |= IFF_UNICAST_FLT;
  2996	
  2997		/* Setup netdev TC information */
  2998		ice_vsi_cfg_netdev_tc(vsi, vsi->tc_cfg.ena_tc);
  2999	
  3000		/* setup watchdog timeout value to be 5 second */
  3001		netdev->watchdog_timeo = 5 * HZ;
  3002	
  3003		netdev->min_mtu = ETH_MIN_MTU;
  3004		netdev->max_mtu = ICE_MAX_MTU;
  3005	
  3006		err = register_netdev(vsi->netdev);
  3007		if (err)
  3008			goto err_free_netdev;
  3009	
  3010		devlink_port_type_eth_set(&vsi->devlink_port, vsi->netdev);
  3011	
  3012		netif_carrier_off(vsi->netdev);
  3013	
  3014		/* make sure transmit queues start off as stopped */
  3015		netif_tx_stop_all_queues(vsi->netdev);
  3016	
  3017		return 0;
  3018	
  3019	err_free_netdev:
  3020		free_netdev(vsi->netdev);
  3021		vsi->netdev = NULL;
  3022	err_destroy_devlink_port:
  3023		ice_devlink_destroy_port(vsi);
  3024		return err;
  3025	}
  3026	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36972 bytes --]

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-09 17:15                         ` [Intel-wired-lan] " Saeed Mahameed
@ 2020-12-10  3:34                           ` David Ahern
  -1 siblings, 0 replies; 120+ messages in thread
From: David Ahern @ 2020-12-10  3:34 UTC (permalink / raw)
  To: Saeed Mahameed, Jesper Dangaard Brouer, Maciej Fijalkowski
  Cc: John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On 12/9/20 10:15 AM, Saeed Mahameed wrote:
>> My personal experience with this one is mlx5/ConnectX4-LX with a
>> limit
> 
> This limit was removed from mlx5
> https://patchwork.ozlabs.org/project/netdev/patch/20200107191335.12272-5-saeedm@mellanox.com/
> Note: you still need to use ehttool to increase from 64 to 128 or 96 in
> your case.
> 

I asked you about that commit back in May:

https://lore.kernel.org/netdev/198081c2-cb0d-e1d5-901c-446b63c36706@gmail.com/

As noted in the thread, it did not work for me.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-10  3:34                           ` [Intel-wired-lan] " David Ahern
@ 2020-12-10  6:48                             ` Saeed Mahameed
  -1 siblings, 0 replies; 120+ messages in thread
From: Saeed Mahameed @ 2020-12-10  6:48 UTC (permalink / raw)
  To: David Ahern, Jesper Dangaard Brouer, Maciej Fijalkowski
  Cc: John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Wed, 2020-12-09 at 20:34 -0700, David Ahern wrote:
> On 12/9/20 10:15 AM, Saeed Mahameed wrote:
> > > My personal experience with this one is mlx5/ConnectX4-LX with a
> > > limit
> > 
> > This limit was removed from mlx5
> > https://patchwork.ozlabs.org/project/netdev/patch/20200107191335.12272-5-saeedm@mellanox.com/
> > Note: you still need to use ehttool to increase from 64 to 128 or
> > 96 in
> > your case.
> > 
> 
> I asked you about that commit back in May:
> 

:/ sorry, I missed this email; it must have been the mlnx-to-nvidia
email transition.

> https://lore.kernel.org/netdev/198081c2-cb0d-e1d5-901c-446b63c36706@gmail.com/
> 
> As noted in the thread, it did not work for me.

Still relevant? I might need to get you some tools to increase #msix
in firmware.





^ permalink raw reply	[flat|nested] 120+ messages in thread

* Explaining XDP redirect bulk size design (Was: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set)
  2020-12-09 15:44                       ` [Intel-wired-lan] " David Ahern
@ 2020-12-10 13:32                         ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jesper Dangaard Brouer @ 2020-12-10 13:32 UTC (permalink / raw)
  To: David Ahern, Frey Alfredsson
  Cc: brouer, Maciej Fijalkowski, John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka, Michael S. Tsirkin

On Wed, 9 Dec 2020 08:44:33 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 12/9/20 4:52 AM, Jesper Dangaard Brouer wrote:
> > But I have redesigned the ndo_xdp_xmit call to take a bulk of packets
> > (up-to 16) so it should not be a problem to solve this by sharing
> > TX-queue and talking a lock per 16 packets.  I still recommend that,
> > for fallback case,  you allocated a number a TX-queue and distribute
> > this across CPUs to avoid hitting a congested lock (above measurements
> > are the optimal non-congested atomic lock operation)  
> 
> I have been meaning to ask you why 16 for the XDP batching? If the
> netdev budget is 64, why not something higher like 32 or 64?

Thank you for asking, as there are multiple good reasons and
considerations for this 16 batch size.  Notice cpumap has batch size 8,
which is also an explicit choice.  And AF_XDP went in the wrong
direction IMHO and I think has 256.  I designed this to be a choice in
the map code, for the level of bulking it needs/wants.

The low-level explanation is that these 8 and 16 batch sizes are
optimized towards cache sizes and Intel's Line-Fill-Buffer (a prefetcher
with 10 elements).  I'm betting that the memory backing these 8 or 16
packets has a higher chance of remaining in cache, and that I can
prefetch it without evicting it from cache again.  In some cases the
pointers to these packets are queued into a ptr_ring, and it is more
optimal to write one cacheline (8 pointers) or two (16 pointers) into
the ptr_ring.
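
(That cacheline arithmetic is just 8-byte pointers against a 64-byte
x86 cacheline; a trivial check, assuming a 64-bit build:)

#include <stdio.h>

int main(void)
{
	int ptr = sizeof(void *);	/* 8 bytes on x86-64 */

	printf("8 ptrs  = %2d bytes = %d cacheline(s)\n",  8 * ptr,  8 * ptr / 64);
	printf("16 ptrs = %2d bytes = %d cacheline(s)\n", 16 * ptr, 16 * ptr / 64);
	return 0;
}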

The general explanation is my goal of doing bulking without adding
latency.  This is explicitly stated in my presentation[1] from Feb 2016,
slide 20.  Sure, you/we can likely make the micro-benchmarks look better
by using a 64 batch size, but that will introduce added latency and
likely shoot ourselves in the foot for real workloads.  With experience
from bufferbloat and real networks, we know that massive TX bulking has
bad effects.  Still, XDP-redirect does massive bulking (the NIC flush is
after the full 64 budget) and we don't have pushback or a queue
mechanism (so I know we are already shooting ourselves in the foot) ...
Fortunately we now have a PhD student working on queuing for XDP.

It is also important to understand that this is an adaptive bulking
scheme, which comes from NAPI.  We don't wait for packets that might
arrive shortly; we pick up what the NIC has available, but by only
taking 8 or 16 packets (instead of emptying the entire RX-queue), and
then spending some time sending them along, I'm hoping the NIC could
have received some more frames.  For cpumap and veth (in some cases)
they can start to consume packets from these batches, but NIC drivers
get the XDP_XMIT_FLUSH signal at NAPI-end (xdp_do_flush). Still, the
design allows NIC drivers to update their internal queue state (and
BQL), and if it gets close to full they can choose to flush/doorbell
the NIC earlier.  When doing queuing for XDP we need to expose these
NIC queue states, and having 4 calls with 16 packets (64 budget) also
gives us more chances to get NIC queue state info which the NIC already
touches.


[1] https://people.netfilter.org/hawk/presentations/devconf2016/net_stack_challenges_100G_Feb2016.pdf
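
A condensed sketch of that bulking pattern, modeled loosely on the
kernel's devmap bulk queue rather than copied from it: redirected
frames are staged per CPU and pushed to the driver in bulks of 16, with
xdp_do_flush() draining whatever is left at the end of the NAPI poll.

#include <linux/netdevice.h>
#include <net/xdp.h>

#define BULK_SIZE 16		/* the 16-packet bulk discussed above */

struct xdp_bulk_q {		/* illustrative, simplified bookkeeping */
	struct net_device *dev;
	struct xdp_frame *frames[BULK_SIZE];
	unsigned int count;
};

static void bulk_flush(struct xdp_bulk_q *bq)
{
	if (!bq->count)
		return;
	/* one driver call (and one shared-queue lock, if any) per bulk;
	 * return value / drop handling omitted in this sketch */
	bq->dev->netdev_ops->ndo_xdp_xmit(bq->dev, bq->count, bq->frames,
					  XDP_XMIT_FLUSH);
	bq->count = 0;
}

static void bulk_enqueue(struct xdp_bulk_q *bq, struct xdp_frame *frame)
{
	if (bq->count == BULK_SIZE)
		bulk_flush(bq);		/* lets the NIC gather more frames */
	bq->frames[bq->count++] = frame;
}
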
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [Intel-wired-lan] Explaining XDP redirect bulk size design (Was: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set)
  2020-12-10 13:32                         ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-12-10 14:14                           ` Magnus Karlsson
  -1 siblings, 0 replies; 120+ messages in thread
From: Magnus Karlsson @ 2020-12-10 14:14 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Ahern, Frey Alfredsson, Maciej Fijalkowski,
	Andrii Nakryiko, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, Network Development,
	Toke Høiland-Jørgensen, Alexei Starovoitov,
	Marek Majtyka, Marek Majtyka, Jonathan Lemon, intel-wired-lan,
	Jakub Kicinski, bpf, Björn Töpel, David S. Miller,
	Karlsson, Magnus

On Thu, Dec 10, 2020 at 2:32 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 9 Dec 2020 08:44:33 -0700
> David Ahern <dsahern@gmail.com> wrote:
>
> > On 12/9/20 4:52 AM, Jesper Dangaard Brouer wrote:
> > > But I have redesigned the ndo_xdp_xmit call to take a bulk of packets
> > > (up-to 16) so it should not be a problem to solve this by sharing
> > > TX-queue and talking a lock per 16 packets.  I still recommend that,
> > > for fallback case,  you allocated a number a TX-queue and distribute
> > > this across CPUs to avoid hitting a congested lock (above measurements
> > > are the optimal non-congested atomic lock operation)
> >
> > I have been meaning to ask you why 16 for the XDP batching? If the
> > netdev budget is 64, why not something higher like 32 or 64?
>
> Thanks you for asking as there are multiple good reasons and
> consideration for this 16 batch size.  Notice cpumap have batch size 8,
> which is also an explicit choice.  And AF_XDP went in the wrong
> direction IMHO and I think have 256.  I designed this to be a choice in
> the map code, for the level of bulking it needs/wants.

FYI, as far as I know, there is nothing in AF_XDP that says bulking
should be 256. There is a 256 number in the i40e driver that states
the maximum number of packets to be sent within one napi_poll loop.
But this is just a maximum number and only for that driver. (In case
you wonder, that number was inherited from the original skb Tx
implementation in the driver.) The actual batch size is controlled by
the application. If it puts 1 packet in the Tx ring and calls send(),
the batch size will be 1. If it puts 128 packets in the Tx ring and
calls send(), you get a batch size of 128, and so on. It is flexible,
so you can trade off latency against throughput in the way the
application desires. Rx batch size has also become flexible now with
the introduction of Björn's prefer_busy_poll patch set [1].

[1] https://lore.kernel.org/netdev/20201130185205.196029-1-bjorn.topel@gmail.com/
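
A minimal user-space sketch of that, using the xsk.h helpers that
shipped with libbpf at the time (umem/socket setup and error handling
omitted); the application picks the batch size simply by how many
descriptors it reserves in the Tx ring before kicking the kernel:

#include <sys/socket.h>
#include <bpf/xsk.h>

static void tx_batch(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
		     const __u64 *addrs, __u32 frame_len, unsigned int batch)
{
	__u32 idx;
	unsigned int i;

	if (xsk_ring_prod__reserve(tx, batch, &idx) != batch)
		return;				/* Tx ring full, try later */

	for (i = 0; i < batch; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

		desc->addr = addrs[i];		/* umem offsets of the frames */
		desc->len  = frame_len;
	}
	xsk_ring_prod__submit(tx, batch);

	/* kick the kernel: 1 descriptor -> batch of 1, 128 -> batch of 128 */
	sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
}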

> The low level explanation is that these 8 and 16 batch sizes are
> optimized towards cache sizes and Intel's Line-Fill-Buffer (prefetcher
> with 10 elements).  I'm betting on that memory backing these 8 or 16
> packets have higher chance to remain/being in cache, and I can prefetch
> them without evicting them from cache again.  In some cases the pointer
> to these packets are queued into a ptr_ring, and it is more optimal to
> write cacheline sizes 1 (8 pointers) or 2 (16 pointers) into the ptr_ring.
>
> The general explanation is my goal to do bulking without adding latency.
> This is explicitly stated in my presentation[1] as of Feb 2016, slide 20.
> Sure, you/we can likely make the micro-benchmarks look better by using
> 64 batch size, but that will introduce added latency and likely shoot
> our-selves in the foot for real workloads.  With experience from
> bufferbloat and real networks, we know that massive TX bulking have bad
> effects.  Still XDP-redirect does massive bulking (NIC flush is after
> full 64 budget) and we don't have pushback or a queue mechanism (so I
> know we are already shooting ourselves in the foot) ...  Fortunately we
> now have a PhD student working on queuing for XDP.
>
> It is also important to understand that this is an adaptive bulking
> scheme, which comes from NAPI.  We don't wait for packets arriving
> shortly, we pickup what NIC have available, but by only taking 8 or 16
> packets (instead of emptying the entire RX-queue), and then spending
> some time to send them along, I'm hoping that NIC could have gotten
> some more frame.  For cpumap and veth (in-some-cases) they can start to
> consume packets from these batches, but NIC drivers gets XDP_XMIT_FLUSH
> signal at NAPI-end (xdp_do_flush). Still design allows NIC drivers to
> update their internal queue state (and BQL), and if it gets close to
> full they can choose to flush/doorbell the NIC earlier.  When doing
> queuing for XDP we need to expose these NIC queue states, and having 4
> calls with 16 packets (64 budget) also gives us more chances to get NIC
> queue state info which the NIC already touch.
>
>
> [1] https://people.netfilter.org/hawk/presentations/devconf2016/net_stack_challenges_100G_Feb2016.pdf
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-10  6:48                             ` [Intel-wired-lan] " Saeed Mahameed
@ 2020-12-10 15:30                               ` David Ahern
  -1 siblings, 0 replies; 120+ messages in thread
From: David Ahern @ 2020-12-10 15:30 UTC (permalink / raw)
  To: Saeed Mahameed, Jesper Dangaard Brouer, Maciej Fijalkowski
  Cc: John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On 12/9/20 11:48 PM, Saeed Mahameed wrote:
> On Wed, 2020-12-09 at 20:34 -0700, David Ahern wrote:
>> On 12/9/20 10:15 AM, Saeed Mahameed wrote:
>>>> My personal experience with this one is mlx5/ConnectX4-LX with a
>>>> limit
>>>
>>> This limit was removed from mlx5
>>> https://patchwork.ozlabs.org/project/netdev/patch/20200107191335.12272-5-saeedm@mellanox.com/
>>> Note: you still need to use ehttool to increase from 64 to 128 or
>>> 96 in
>>> your case.
>>>
>>
>> I asked you about that commit back in May:
>>
> 
> :/, sorry i missed this email, must have been the mlnx nvidia email
> transition.
> 
>> https://lore.kernel.org/netdev/198081c2-cb0d-e1d5-901c-446b63c36706@gmail.com/
>>
>> As noted in the thread, it did not work for me.
> 
> Still relevant ? I might need to get you some tools to increase #msix
> in Firmware.
> 

Not for me at the moment, but it would be good to document what a user
needs to do - especially if it involves vendor-specific tools and steps.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [Intel-wired-lan] Explaining XDP redirect bulk size design (Was: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set)
  2020-12-10 14:14                           ` Magnus Karlsson
@ 2020-12-10 17:30                             ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jesper Dangaard Brouer @ 2020-12-10 17:30 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: David Ahern, Frey Alfredsson, Maciej Fijalkowski,
	Andrii Nakryiko, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, Network Development,
	Toke Høiland-Jørgensen, Alexei Starovoitov,
	Marek Majtyka, Marek Majtyka, Jonathan Lemon, intel-wired-lan,
	Jakub Kicinski, bpf, Björn Töpel, David S. Miller,
	Karlsson, Magnus, brouer

On Thu, 10 Dec 2020 15:14:18 +0100
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Thu, Dec 10, 2020 at 2:32 PM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Wed, 9 Dec 2020 08:44:33 -0700
> > David Ahern <dsahern@gmail.com> wrote:
> >  
> > > On 12/9/20 4:52 AM, Jesper Dangaard Brouer wrote:  
> > > > But I have redesigned the ndo_xdp_xmit call to take a bulk of packets
> > > > (up to 16), so it should not be a problem to solve this by sharing a
> > > > TX-queue and taking a lock per 16 packets.  I still recommend that,
> > > > for the fallback case, you allocate a number of TX-queues and distribute
> > > > them across CPUs to avoid hitting a congested lock (the above measurements
> > > > are the optimal non-congested atomic lock operation).  
> > >
> > > I have been meaning to ask you why 16 for the XDP batching? If the
> > > netdev budget is 64, why not something higher like 32 or 64?  
> >
> > Thank you for asking, as there are multiple good reasons and
> > considerations behind this batch size of 16.  Notice that cpumap has a
> > batch size of 8, which is also an explicit choice.  And AF_XDP went in
> > the wrong direction IMHO and I think has 256.  I designed this to be a
> > choice in the map code, for the level of bulking it needs/wants.  
> 
> FYI, as far as I know, there is nothing in AF_XDP that says bulking
> should be 256. There is a 256 number in the i40e driver that states
> the maximum number of packets to be sent within one napi_poll loop.
> But this is just a maximum number and only for that driver. (In case
> you wonder, that number was inherited from the original skb Tx
> implementation in the driver.) 

Ah, that explains the issue I have on the production system that runs
the EDT-pacer[2].  I see that the i40e function i40e_clean_tx_irq() ignores
napi_budget and uses its own budget, which defaults to 256.  It looks like I
can adjust this via ethtool -C tx-frames-irq.  I turned this down to
64 (32 was giving worse results, and below 16 the system acted strangely).

Now the issue is gone. The problem was that if TX-DMA completion
(i40e_clean_tx_irq()) was running on the same CPU that sends packets via the
FQ-pacer qdisc, the pacing was not accurate and transmission became too bursty.

The system has already been tuned via "net/core/dev_weight" and the RX/TX
biases to reduce bulking, as this can influence latency and the EDT-pacing
accuracy. (It is a middlebox bridging VLANs, with BPF-EDT timestamping and
FQ-pacing of packets to prevent bursts overflowing switch ports.)

  sudo sysctl net/core/dev_weight
  net.core.dev_weight = 1
  net.core.dev_weight_rx_bias = 32
  net.core.dev_weight_tx_bias = 1

This net.core.dev_weight_tx_bias=1 (together with dev_weight=1) causes
the qdisc transmit budget to become one packet, cycling through
NET_TX_SOFTIRQ, which consumes time and gives a little more pacing space
for the packets.
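
For reference, the way these sysctls combine is roughly the following
(a paraphrased sketch of the sysctl handlers, not the exact kernel
code; the variable names just mirror the effective RX/TX weights):

  /* Paraphrased sketch - not the exact kernel code: */
  int dev_weight         = 1;
  int dev_weight_rx_bias = 32;
  int dev_weight_tx_bias = 1;

  /* RX: packets processed per backlog/NAPI round */
  int dev_rx_weight = dev_weight * dev_weight_rx_bias;   /* 32 */
  /* TX: qdisc_run() dequeue quota per NET_TX_SOFTIRQ pass */
  int dev_tx_weight = dev_weight * dev_weight_tx_bias;   /*  1 */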


> The actual batch size is controlled by
> the application. If it puts 1 packet in the Tx ring and calls send(),
> the batch size will be 1. If it puts 128 packets in the Tx ring and
> calls send(), you get a batch size of 128, and so on. It is flexible,
> so you can trade-off latency with throughput in the way the
> application desires. Rx batch size has also become flexible now with
> the introduction of Björn's prefer_busy_poll patch set [1].
> 
> [1] https://lore.kernel.org/netdev/20201130185205.196029-1-bjorn.topel@gmail.com/

This looks like a cool trick to get even more accurate packet scheduling.

I played with the tunings and could see changed behavior with mpstat,
but ended up turning it off again, as I could not measure a direct
correlation with the bpftrace tools[3].
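
To make Magnus' point about application-controlled batching concrete,
here is a minimal sketch of the AF_XDP Tx path using libbpf's xsk
helpers (bpf/xsk.h), in the style of the xdpsock sample; the 'frames'
array and the surrounding application state are illustrative, not
real API:

  #include <sys/socket.h>
  #include <bpf/xsk.h>

  /* Send a batch of 'n' frames on an AF_XDP socket.  The batch size is
   * simply how many descriptors the application queues before kicking
   * the kernel. */
  static void tx_batch(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
                       const struct xdp_desc *frames, unsigned int n)
  {
          __u32 idx, i;

          if (xsk_ring_prod__reserve(tx, n, &idx) != n)
                  return;                 /* Tx ring full - try again later */

          for (i = 0; i < n; i++)
                  *xsk_ring_prod__tx_desc(tx, idx + i) = frames[i];

          xsk_ring_prod__submit(tx, n);
          /* Kick the kernel; it transmits up to 'n' frames. */
          sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
  }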


> > The low-level explanation is that these 8 and 16 batch sizes are
> > optimized towards cache sizes and Intel's Line-Fill-Buffer (a prefetcher
> > with 10 elements).  I'm betting that the memory backing these 8 or 16
> > packets has a higher chance of remaining in cache, and that I can prefetch
> > it without evicting it from cache again.  In some cases the pointers
> > to these packets are queued into a ptr_ring, and it is more optimal to
> > write in cacheline units of 1 (8 pointers) or 2 (16 pointers) into the ptr_ring.
> >
> > The general explanation is my goal of doing bulking without adding latency.
> > This is explicitly stated in my presentation[1] from Feb 2016, slide 20.
> > Sure, you/we can likely make the micro-benchmarks look better by using a
> > 64 batch size, but that will introduce added latency and likely shoot
> > ourselves in the foot for real workloads.  With experience from
> > bufferbloat and real networks, we know that massive TX bulking has bad
> > effects.  Still, XDP-redirect does massive bulking (the NIC flush is after
> > the full 64 budget) and we don't have pushback or a queue mechanism (so I
> > know we are already shooting ourselves in the foot) ...  Fortunately we
> > now have a PhD student working on queuing for XDP.
> >
> > It is also important to understand that this is an adaptive bulking
> > scheme, which comes from NAPI.  We don't wait for packets that may arrive
> > shortly; we pick up what the NIC has available.  But by only taking 8 or 16
> > packets (instead of emptying the entire RX-queue), and then spending
> > some time sending them along, I'm hoping that the NIC could have received
> > some more frames.  For cpumap and veth (in some cases) they can start to
> > consume packets from these batches, but NIC drivers get the XDP_XMIT_FLUSH
> > signal at NAPI-end (xdp_do_flush). Still, the design allows NIC drivers to
> > update their internal queue state (and BQL), and if it gets close to
> > full they can choose to flush/doorbell the NIC earlier.  When doing
> > queuing for XDP we need to expose these NIC queue states, and having 4
> > calls with 16 packets (64 budget) also gives us more chances to get NIC
> > queue state info that the NIC already touches.
> >
> >
> > [1] https://people.netfilter.org/hawk/presentations/devconf2016/net_stack_challenges_100G_Feb2016.pdf
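
To illustrate where that flush point sits, a rough sketch of the
driver-side pattern follows (a generic pseudo-driver, not any real
driver's code; bpf_prog_run_xdp()/xdp_do_redirect()/xdp_do_flush() are
the in-kernel APIs, while every mydrv_*() helper and the priv fields
are assumptions for the example):

  static int mydrv_napi_poll(struct napi_struct *napi, int budget)
  {
          struct mydrv *priv = mydrv_from_napi(napi);    /* illustrative */
          int done = 0;

          while (done < budget && mydrv_rx_pending(priv)) {
                  struct xdp_buff xdp;

                  mydrv_fill_xdp_buff(priv, &xdp);       /* map HW descriptor */
                  switch (bpf_prog_run_xdp(priv->xdp_prog, &xdp)) {
                  case XDP_REDIRECT:
                          /* Queued into a per-CPU bulk queue (flushed per 16) */
                          xdp_do_redirect(priv->netdev, &xdp, priv->xdp_prog);
                          break;
                  /* XDP_TX / XDP_PASS / XDP_DROP handling elided */
                  }
                  done++;
          }
          /* Flush the redirect bulk queues once per NAPI poll */
          xdp_do_flush();
          return done;
  }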

[2] https://github.com/netoptimizer/bpf-examples/tree/master/traffic-pacing-edt/

[3] https://github.com/netoptimizer/bpf-examples/tree/master/traffic-pacing-edt/bpftrace


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-10 15:30                               ` [Intel-wired-lan] " David Ahern
@ 2020-12-10 18:58                                 ` Saeed Mahameed
  -1 siblings, 0 replies; 120+ messages in thread
From: Saeed Mahameed @ 2020-12-10 18:58 UTC (permalink / raw)
  To: David Ahern, Jesper Dangaard Brouer, Maciej Fijalkowski
  Cc: John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka

On Thu, 2020-12-10 at 08:30 -0700, David Ahern wrote:
> On 12/9/20 11:48 PM, Saeed Mahameed wrote:
> > On Wed, 2020-12-09 at 20:34 -0700, David Ahern wrote:
> > > On 12/9/20 10:15 AM, Saeed Mahameed wrote:
> > > > > My personal experience with this one is mlx5/ConnectX4-LX
> > > > > with a
> > > > > limit
> > > > 
> > > > This limit was removed from mlx5
> > > > https://patchwork.ozlabs.org/project/netdev/patch/20200107191335.12272-5-saeedm@mellanox.com/
> > > > Note: you still need to use ethtool to increase from 64 to 128
> > > > or
> > > > 96 in
> > > > your case.
> > > > 
> > > 
> > > I asked you about that commit back in May:
> > > 
> > 
> > :/, sorry i missed this email, must have been the mlnx nvidia email
> > transition.
> > 
> > > https://lore.kernel.org/netdev/198081c2-cb0d-e1d5-901c-446b63c36706@gmail.com/
> > > 
> > > As noted in the thread, it did not work for me.
> > 
> > Still relevant ? I might need to get you some tools to increase
> > #msix
> > in Firmware.
> > 
> 
> not for me at the moment, but it would be good to document what a
> user
> needs to do - especially if it involves vendor specific tools and
> steps.

Ack, thanks for pointing that out. I will take action on this.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: Explaining XDP redirect bulk size design (Was: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set)
  2020-12-10 13:32                         ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-12-10 19:20                           ` Saeed Mahameed
  -1 siblings, 0 replies; 120+ messages in thread
From: Saeed Mahameed @ 2020-12-10 19:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern, Frey Alfredsson
  Cc: Maciej Fijalkowski, John Fastabend, Daniel Borkmann,
	Toke Høiland-Jørgensen, alardam, magnus.karlsson,
	bjorn.topel, andrii.nakryiko, kuba, ast, netdev, davem, hawk,
	jonathan.lemon, bpf, jeffrey.t.kirsher, maciejromanfijalkowski,
	intel-wired-lan, Marek Majtyka, Michael S. Tsirkin

On Thu, 2020-12-10 at 14:32 +0100, Jesper Dangaard Brouer wrote:
> On Wed, 9 Dec 2020 08:44:33 -0700
> David Ahern <dsahern@gmail.com> wrote:
> 
> > On 12/9/20 4:52 AM, Jesper Dangaard Brouer wrote:
> > > But I have redesigned the ndo_xdp_xmit call to take a bulk of
> > > packets (up to 16), so it should not be a problem to solve this by
> > > sharing a TX-queue and taking a lock per 16 packets.  I still
> > > recommend that, for the fallback case, you allocate a number of
> > > TX-queues and distribute them across CPUs to avoid hitting a
> > > congested lock (the above measurements are the optimal
> > > non-congested atomic lock operation).  
> > 
> > I have been meaning to ask you why 16 for the XDP batching? If the
> > netdev budget is 64, why not something higher like 32 or 64?
> 
> Thank you for asking, as there are multiple good reasons and
> considerations behind this batch size of 16.  Notice that cpumap has a
> batch size of 8, which is also an explicit choice.  And AF_XDP went in
> the wrong direction IMHO and I think has 256.  I designed this to be a
> choice in the map code, for the level of bulking it needs/wants.
> 
> The low-level explanation is that these 8 and 16 batch sizes are
> optimized towards cache sizes and Intel's Line-Fill-Buffer (a
> prefetcher with 10 elements).  I'm betting that the memory backing
> these 8 or 16 packets has a higher chance of remaining in cache, and
> that I can prefetch it without evicting it from cache again.  In some
> cases the pointers to these packets are queued into a ptr_ring, and it
> is more optimal to write in cacheline units of 1 (8 pointers) or
> 2 (16 pointers) into the ptr_ring.
> 

I've warned people about this once or twice on the mailing list. For
example, when re-populating the RX ring, a common mistake is to use the
NAPI budget, which has exactly the side effects you are explaining here,
Jesper!

These 8/16 numbers are used in more than one place in the stack: XDP,
GRO, HW buffer re-population, etc.
How can we enforce such numbers and uniform handling in all drivers?
1. Have clear documentation? Well-known defines for people to copy
   (see the sketch below)?

2. For XDP we must keep track of the memory backing of the bulked XDP
data, as Jesper pointed out, so that whatever bulk size we define
always remains cache friendly. Especially now that people have started
working on multi-buffer and other features that will extend xdp_buff
and xdp_frame, do we need a selftest that perhaps runs pahole to check
that those data structures remain within reasonable formats/sizes?
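
As a sketch of what such well-known defines could look like (names
invented for this example, loosely modelled on the per-map bulk-size
constants already in the kernel):

  /* Hypothetical shared header of well-known bulk sizes, so drivers and
   * core stop hard-coding magic numbers (illustrative names only): */
  #define XDP_XMIT_BULK_SIZE      16  /* ndo_xdp_xmit / devmap flush size          */
  #define XDP_CPUMAP_BULK_SIZE     8  /* cpumap enqueue: one cacheline of pointers */
  #define NAPI_RX_REFILL_BULK     16  /* driver RX ring re-population step         */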



> The general explanation is my goal of doing bulking without adding
> latency.  This is explicitly stated in my presentation[1] from Feb
> 2016, slide 20.  Sure, you/we can likely make the micro-benchmarks
> look better by using a 64 batch size, but that will introduce added
> latency and likely shoot ourselves in the foot for real workloads.
> With experience from bufferbloat and real networks, we know that
> massive TX bulking has bad effects.  Still, XDP-redirect does massive
> bulking (the NIC flush is after the full 64 budget) and we don't have
> pushback or a queue mechanism (so I know we are already shooting
> ourselves in the foot) ...  Fortunately we now have a PhD student
> working on queuing for XDP.
> 
> It is also important to understand that this is an adaptive bulking
> scheme, which comes from NAPI.  We don't wait for packets that may
> arrive shortly; we pick up what the NIC has available.  But by only
> taking 8 or 16 packets (instead of emptying the entire RX-queue), and
> then spending some time sending them along, I'm hoping that the NIC
> could have received some more frames.  For cpumap and veth (in some
> cases) they can start to consume packets from these batches, but NIC
> drivers get the XDP_XMIT_FLUSH signal at NAPI-end (xdp_do_flush).
> Still, the design allows NIC drivers to update their internal queue
> state (and BQL), and if it gets close to full they can choose to
> flush/doorbell the NIC earlier.  When doing queuing for XDP we need to
> expose these NIC queue states, and having 4 calls with 16 packets
> (64 budget) also gives us more chances to get NIC queue state info
> that the NIC already touches.
> 
> 
> [1] 
> https://people.netfilter.org/hawk/presentations/devconf2016/net_stack_challenges_100G_Feb2016.pdf


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2020-12-10 18:58                                 ` [Intel-wired-lan] " Saeed Mahameed
@ 2021-01-05 11:56                                   ` Marek Majtyka
  -1 siblings, 0 replies; 120+ messages in thread
From: Marek Majtyka @ 2021-01-05 11:56 UTC (permalink / raw)
  To: Saeed Mahameed, David Ahern, Maciej Fijalkowski, John Fastabend,
	Jesper Dangaard Brouer, Daniel Borkmann,
	Toke Høiland-Jørgensen, Maciej Fijalkowski
  Cc: Björn Töpel, Andrii Nakryiko, Jonathan Lemon,
	Alexei Starovoitov, Network Development, David S. Miller, hawk,
	bpf, intel-wired-lan, Jakub Kicinski, Karlsson, Magnus,
	jeffrey.t.kirsher

I would like to thank you for your time, comments, nitpicking as well
as encouragement.

One thing needs clarification, I think: those flags describe static
driver feature sets, which are read-only. They have nothing to do with
runtime driver configuration changes yet. Runtime changes of this state
can be added, but that needs a new variable and can be done later on if
someone needs it.

Obviously, it is not possible to make everybody happy, especially with
the XDP_BASE flag set. To be honest, this XDP_BASE definition is
syntactic sugar for me and I can live without it. We can either remove
it completely, from which IMO we and other developers will all suffer
later on, or maybe we can agree on these two helper sets of flags:
XDP_BASE (TX, ABORTED, PASS, DROP) and XDP_LIMITED_BASE (ABORTED, PASS,
DROP). What do you think?
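
For illustration, the two helper sets could look roughly like this
(a sketch only; the bit and macro names here are made up, not the
exact definitions from the patch set):

  enum {
          XDP_F_ABORTED_BIT,
          XDP_F_DROP_BIT,
          XDP_F_PASS_BIT,
          XDP_F_TX_BIT,
          XDP_F_REDIRECT_BIT,
          XDP_F_REDIRECT_TARGET_BIT,
          XDP_F_SOCK_ZEROCOPY_BIT,
  };

  #define XDP_F(name)        BIT(XDP_F_##name##_BIT)
  #define XDP_LIMITED_BASE   (XDP_F(ABORTED) | XDP_F(PASS) | XDP_F(DROP))
  #define XDP_BASE           (XDP_LIMITED_BASE | XDP_F(TX))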

I am also going to add a new XDP_REDIRECT_TARGET flag and support for
retrieving the XDP flags over the rtnetlink interface.

I also think that, for completeness, the ethtool implementation should
be kept together with the rtnetlink part in order to cover both the ip
and ethtool tools. Do I have your approval or disagreement? Please let
me know.

Both AF_XDP_ZEROCOPY and XDP_SOCK_ZEROCOPY are fine with me. I will
pick XDP_SOCK_ZEROCOPY unless there are protests.

I don't think the HEADROOM parameter should be passed via the flags.
It is by nature a number, and an attempt to quantize it with flags
seems like an unnatural limitation for the future.

Thanks
Marek

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-01-05 11:56                                   ` [Intel-wired-lan] " Marek Majtyka
@ 2021-02-01 16:16                                     ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-02-01 16:16 UTC (permalink / raw)
  To: Marek Majtyka, Saeed Mahameed, David Ahern, Maciej Fijalkowski,
	John Fastabend, Jesper Dangaard Brouer, Daniel Borkmann,
	Maciej Fijalkowski
  Cc: Björn Töpel, Andrii Nakryiko, Jonathan Lemon,
	Alexei Starovoitov, Network Development, David S. Miller, hawk,
	bpf, intel-wired-lan, Jakub Kicinski, Karlsson, Magnus,
	jeffrey.t.kirsher

Marek Majtyka <alardam@gmail.com> writes:

> I would like to thank you for your time, comments, nitpicking as well
> as encouraging.
>
> One thing needs clarification I think, that is, that those flags
> describe driver static feature sets - which are read-only. They have
> nothing in common with driver runtime configuration change yet.
> Runtime change of this state can be added but it needs a new variable
> and it can be done later on if someone needs it.
>
> Obviously, it is not possible to make everybody happy, especially with
> XDP_BASE flags set. To be honest, this XDP_BASE definition is a
> syntactic sugar for me and I can live without it. We can either remove
> it completely, from
> which IMO we all and other developers will suffer later on, or maybe
> we can agree on these two helper set of flags: XDP_BASE (TX, ABORTED,
> PASS, DROP) and XDP_LIMITED_BASE(ABORTED,PASS_DROP).
> What do you think?
>
> I am also going to add a new XDP_REDIRECT_TARGET flag and retrieving
> XDP flags over rtnelink interface.
>
> I also think that for completeness, ethtool implementation should be
> kept  together with rtnelink part in order to cover both ip and
> ethtool tools. Do I have your approval or disagreement? Please let me
> know.

Hi Marek,

I just realised that it seems no one actually replied to your email. On
my part at least that was because I didn't have any objections, so I'm
hoping you didn't feel the lack of response was discouraging (and that
you're still working on a revision of this series)? :)

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-01 16:16                                     ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2021-02-02 11:26                                       ` Marek Majtyka
  -1 siblings, 0 replies; 120+ messages in thread
From: Marek Majtyka @ 2021-02-02 11:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Saeed Mahameed, David Ahern, Maciej Fijalkowski, John Fastabend,
	Jesper Dangaard Brouer, Daniel Borkmann, Maciej Fijalkowski,
	Björn Töpel, Andrii Nakryiko, Jonathan Lemon,
	Alexei Starovoitov, Network Development, David S. Miller, hawk,
	bpf, intel-wired-lan, Jakub Kicinski, Karlsson, Magnus,
	jeffrey.t.kirsher

Thanks Toke,

In fact, I was waiting for a single confirmation, disagreement or
comment. I have it now. As there are no more comments, I am getting
down to work right away.

Marek




On Mon, Feb 1, 2021 at 5:16 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Marek Majtyka <alardam@gmail.com> writes:
>
> > I would like to thank you for your time, comments, nitpicking as well
> > as encouraging.
> >
> > One thing needs clarification I think, that is, that those flags
> > describe driver static feature sets - which are read-only. They have
> > nothing in common with driver runtime configuration change yet.
> > Runtime change of this state can be added but it needs a new variable
> > and it can be done later on if someone needs it.
> >
> > Obviously, it is not possible to make everybody happy, especially with
> > XDP_BASE flags set. To be honest, this XDP_BASE definition is a
> > syntactic sugar for me and I can live without it. We can either remove
> > it completely, from
> > which IMO we all and other developers will suffer later on, or maybe
> > we can agree on these two helper set of flags: XDP_BASE (TX, ABORTED,
> > PASS, DROP) and XDP_LIMITED_BASE(ABORTED,PASS_DROP).
> > What do you think?
> >
> > I am also going to add a new XDP_REDIRECT_TARGET flag and retrieving
> > XDP flags over rtnelink interface.
> >
> > I also think that for completeness, ethtool implementation should be
> > kept  together with rtnelink part in order to cover both ip and
> > ethtool tools. Do I have your approval or disagreement? Please let me
> > know.
>
> Hi Marek
>
> I just realised that it seems no one actually replied to your email. On
> my part at least that was because I didn't have any objections, so I'm
> hoping you didn't feel the lack of response was discouraging (and that
> you're still working on a revision of this series)? :)
>
> -Toke
>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-02 11:26                                       ` [Intel-wired-lan] " Marek Majtyka
@ 2021-02-02 12:05                                         ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-02-02 12:05 UTC (permalink / raw)
  To: Marek Majtyka
  Cc: Saeed Mahameed, David Ahern, Maciej Fijalkowski, John Fastabend,
	Jesper Dangaard Brouer, Daniel Borkmann, Maciej Fijalkowski,
	Björn Töpel, Andrii Nakryiko, Jonathan Lemon,
	Alexei Starovoitov, Network Development, David S. Miller, hawk,
	bpf, intel-wired-lan, Jakub Kicinski, Karlsson, Magnus,
	jeffrey.t.kirsher

Marek Majtyka <alardam@gmail.com> writes:

> Thanks Toke,
>
> In fact, I was waiting for a single confirmation, disagreement or
> comment. I have it now. As there are no more comments, I am getting
> down to work right away.

Awesome! And sorry for not replying straight away - I hate it when I
send out something myself and receive no replies, so I suppose I should
get better at not doing that myself :)

As for the inclusion of the XDP_BASE / XDP_LIMITED_BASE sets (which I
just realised I didn't reply to), I am fine with defining XDP_BASE as a
shortcut for TX/ABORTED/PASS/DROP, but think we should skip
XDP_LIMITED_BASE and instead require all new drivers to implement the
full XDP_BASE set straight away. As long as we're talking about
features *implemented* by the driver, at least; i.e., it should still be
possible to *deactivate* XDP_TX if you don't want to use the HW
resources, but I don't think there's much benefit from defining the
LIMITED_BASE set as a shortcut for this mode...

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-02 12:05                                         ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2021-02-02 19:34                                           ` Jakub Kicinski
  -1 siblings, 0 replies; 120+ messages in thread
From: Jakub Kicinski @ 2021-02-02 19:34 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Marek Majtyka, Saeed Mahameed, David Ahern, Maciej Fijalkowski,
	John Fastabend, Jesper Dangaard Brouer, Daniel Borkmann,
	Maciej Fijalkowski, Björn Töpel, Andrii Nakryiko,
	Jonathan Lemon, Alexei Starovoitov, Network Development,
	David S. Miller, hawk, bpf, intel-wired-lan, Karlsson, Magnus,
	jeffrey.t.kirsher

On Tue, 02 Feb 2021 13:05:34 +0100 Toke Høiland-Jørgensen wrote:
> Marek Majtyka <alardam@gmail.com> writes:
> 
> > Thanks Toke,
> >
> > In fact, I was waiting for a single confirmation, disagreement or
> > comment. I have it now. As there are no more comments, I am getting
> > down to work right away.  
> 
> Awesome! And sorry for not replying straight away - I hate it when I
> send out something myself and receive no replies, so I suppose I should
> get better at not doing that myself :)
> 
> As for the inclusion of the XDP_BASE / XDP_LIMITED_BASE sets (which I
> just realised I didn't reply to), I am fine with defining XDP_BASE as a
> shortcut for TX/ABORTED/PASS/DROP, but think we should skip
> XDP_LIMITED_BASE and instead require all new drivers to implement the
> full XDP_BASE set straight away. As long as we're talking about
> features *implemented* by the driver, at least; i.e., it should still be
> possible to *deactivate* XDP_TX if you don't want to use the HW
> resources, but I don't think there's much benefit from defining the
> LIMITED_BASE set as a shortcut for this mode...

I still have mixed feelings about these flags. The first step IMO
should be adding validation tests. I bet^W pray every vendor has
validation tests, but since they are not unified we don't know what
level of interoperability we're achieving in practice. That doesn't
matter for a trivial feature like the base actions, but we'll inevitably
move on to defining more advanced capabilities, and the question of
"what does supporting X actually mean" will come up (3 years later, when
we don't remember ourselves).

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-02 19:34                                           ` [Intel-wired-lan] " Jakub Kicinski
@ 2021-02-03 12:50                                             ` Marek Majtyka
  -1 siblings, 0 replies; 120+ messages in thread
From: Marek Majtyka @ 2021-02-03 12:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, Saeed Mahameed, David Ahern,
	Maciej Fijalkowski, John Fastabend, Jesper Dangaard Brouer,
	Daniel Borkmann, Maciej Fijalkowski, Björn Töpel,
	Andrii Nakryiko, Jonathan Lemon, Alexei Starovoitov,
	Network Development, David S. Miller, hawk, bpf, intel-wired-lan,
	Karlsson, Magnus, jeffrey.t.kirsher

On Tue, Feb 2, 2021 at 8:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 02 Feb 2021 13:05:34 +0100 Toke Høiland-Jørgensen wrote:
> > Marek Majtyka <alardam@gmail.com> writes:
> >
> > > Thanks Toke,
> > >
> > > In fact, I was waiting for a single confirmation, disagreement or
> > > comment. I have it now. As there are no more comments, I am getting
> > > down to work right away.
> >
> > Awesome! And sorry for not replying straight away - I hate it when I
> > send out something myself and receive no replies, so I suppose I should
> > get better at not doing that myself :)
> >
> > As for the inclusion of the XDP_BASE / XDP_LIMITED_BASE sets (which I
> > just realised I didn't reply to), I am fine with defining XDP_BASE as a
> > shortcut for TX/ABORTED/PASS/DROP, but think we should skip
> > XDP_LIMITED_BASE and instead require all new drivers to implement the
> > full XDP_BASE set straight away. As long as we're talking about
> > features *implemented* by the driver, at least; i.e., it should still be
> > possible to *deactivate* XDP_TX if you don't want to use the HW
> > resources, but I don't think there's much benefit from defining the
> > LIMITED_BASE set as a shortcut for this mode...
>
> I still have mixed feelings about these flags. The first step IMO
> should be adding validation tests. I bet^W pray every vendor has
> validation tests but since they are not unified we don't know what
> level of interoperability we're achieving in practice. That doesn't
> matter for trivial feature like base actions, but we'll inevitably
> move on to defining more advanced capabilities and the question of
> "what supporting X actually mean" will come up (3 years later, when
> we don't remember ourselves).

I am a bit confused now. Did you mean validation tests of those XDP
flags I am working on, or some other validation tests?
What should these tests verify? Can you please elaborate a bit more on
the topic - just a few sentences on how you see it?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-03 12:50                                             ` [Intel-wired-lan] " Marek Majtyka
@ 2021-02-03 17:02                                               ` Jakub Kicinski
  -1 siblings, 0 replies; 120+ messages in thread
From: Jakub Kicinski @ 2021-02-03 17:02 UTC (permalink / raw)
  To: Marek Majtyka
  Cc: Toke Høiland-Jørgensen, Saeed Mahameed, David Ahern,
	Maciej Fijalkowski, John Fastabend, Jesper Dangaard Brouer,
	Daniel Borkmann, Maciej Fijalkowski, Björn Töpel,
	Andrii Nakryiko, Jonathan Lemon, Alexei Starovoitov,
	Network Development, David S. Miller, hawk, bpf, intel-wired-lan,
	Karlsson, Magnus, jeffrey.t.kirsher

On Wed, 3 Feb 2021 13:50:59 +0100 Marek Majtyka wrote:
> On Tue, Feb 2, 2021 at 8:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Tue, 02 Feb 2021 13:05:34 +0100 Toke Høiland-Jørgensen wrote:  
> > > Awesome! And sorry for not replying straight away - I hate it when I
> > > send out something myself and receive no replies, so I suppose I should
> > > get better at not doing that myself :)
> > >
> > > As for the inclusion of the XDP_BASE / XDP_LIMITED_BASE sets (which I
> > > just realised I didn't reply to), I am fine with defining XDP_BASE as a
> > > shortcut for TX/ABORTED/PASS/DROP, but think we should skip
> > > XDP_LIMITED_BASE and instead require all new drivers to implement the
> > > full XDP_BASE set straight away. As long as we're talking about
> > > features *implemented* by the driver, at least; i.e., it should still be
> > > possible to *deactivate* XDP_TX if you don't want to use the HW
> > > resources, but I don't think there's much benefit from defining the
> > > LIMITED_BASE set as a shortcut for this mode...  
> >
> > I still have mixed feelings about these flags. The first step IMO
> > should be adding validation tests. I bet^W pray every vendor has
> > validation tests but since they are not unified we don't know what
> > level of interoperability we're achieving in practice. That doesn't
> > matter for trivial feature like base actions, but we'll inevitably
> > move on to defining more advanced capabilities and the question of
> > "what supporting X actually mean" will come up (3 years later, when
> > we don't remember ourselves).  
> 
> I am a bit confused now. Did you mean validation tests of those XDP
> flags, which I am working on or some other validation tests?
> What should these tests verify? Can you please elaborate more on the
> topic, please - just a few sentences how are you see it?

Conformance tests can be written for all features, whether they have 
an explicit capability in the uAPI or not. But for those that do IMO
the tests should be required.

Let me give you an example. This set adds a bit that says Intel NICs 
can do XDP_TX and XDP_REDIRECT, yet we both know of the Tx queue
shenanigans. So can i40e do XDP_REDIRECT or can it not?

If we have exhaustive conformance tests we can confidently answer that
question. And the answer may not be "yes" or "no", it may actually be
"we need more options because many implementations fall in between".

I think readable (IOW not written in some insane DSL) tests can also 
be useful for users who want to check which features their program /
deployment will require.
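
To give a concrete (hypothetical) example of what such a conformance
test could be built around: a minimal XDP program that exercises
XDP_REDIRECT on the device under test, with the harness populating the
map and verifying that traffic actually arrives on the target
interface. The map and section names below are illustrative:

  /* redirect_conformance.bpf.c - sketch only */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_DEVMAP);
          __uint(max_entries, 1);
          __uint(key_size, sizeof(int));
          __uint(value_size, sizeof(int));
  } tx_port SEC(".maps");

  SEC("xdp")
  int xdp_redirect_conformance(struct xdp_md *ctx)
  {
          /* Slot 0 is filled by the test harness with the target ifindex */
          return bpf_redirect_map(&tx_port, 0, 0);
  }

  char _license[] SEC("license") = "GPL";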

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-03 17:02                                               ` [Intel-wired-lan] " Jakub Kicinski
@ 2021-02-10 10:53                                                 ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-02-10 10:53 UTC (permalink / raw)
  To: Jakub Kicinski, Marek Majtyka
  Cc: Saeed Mahameed, David Ahern, Maciej Fijalkowski, John Fastabend,
	Jesper Dangaard Brouer, Daniel Borkmann, Maciej Fijalkowski,
	Björn Töpel, Andrii Nakryiko, Jonathan Lemon,
	Alexei Starovoitov, Network Development, David S. Miller, hawk,
	bpf, intel-wired-lan, Karlsson, Magnus, jeffrey.t.kirsher

Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 3 Feb 2021 13:50:59 +0100 Marek Majtyka wrote:
>> On Tue, Feb 2, 2021 at 8:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> > On Tue, 02 Feb 2021 13:05:34 +0100 Toke Høiland-Jørgensen wrote:  
>> > > Awesome! And sorry for not replying straight away - I hate it when I
>> > > send out something myself and receive no replies, so I suppose I should
>> > > get better at not doing that myself :)
>> > >
>> > > As for the inclusion of the XDP_BASE / XDP_LIMITED_BASE sets (which I
>> > > just realised I didn't reply to), I am fine with defining XDP_BASE as a
>> > > shortcut for TX/ABORTED/PASS/DROP, but think we should skip
>> > > XDP_LIMITED_BASE and instead require all new drivers to implement the
>> > > full XDP_BASE set straight away. As long as we're talking about
>> > > features *implemented* by the driver, at least; i.e., it should still be
>> > > possible to *deactivate* XDP_TX if you don't want to use the HW
>> > > resources, but I don't think there's much benefit from defining the
>> > > LIMITED_BASE set as a shortcut for this mode...  
>> >
>> > I still have mixed feelings about these flags. The first step IMO
>> > should be adding validation tests. I bet^W pray every vendor has
>> > validation tests but since they are not unified we don't know what
>> > level of interoperability we're achieving in practice. That doesn't
>> > matter for trivial feature like base actions, but we'll inevitably
>> > move on to defining more advanced capabilities and the question of
>> > "what supporting X actually mean" will come up (3 years later, when
>> > we don't remember ourselves).  
>> 
>> I am a bit confused now. Did you mean validation tests of those XDP
>> flags, which I am working on or some other validation tests?
>> What should these tests verify? Can you please elaborate more on the
>> topic, please - just a few sentences how are you see it?
>
> Conformance tests can be written for all features, whether they have 
> an explicit capability in the uAPI or not. But for those that do IMO
> the tests should be required.
>
> Let me give you an example. This set adds a bit that says Intel NICs 
> can do XDP_TX and XDP_REDIRECT, yet we both know of the Tx queue
> shenanigans. So can i40e do XDP_REDIRECT or can it not?
>
> If we have exhaustive conformance tests we can confidently answer that
> question. And the answer may not be "yes" or "no", it may actually be
> "we need more options because many implementations fall in between".
>
> I think readable (IOW not written in some insane DSL) tests can also 
> be useful for users who want to check which features their program /
> deployment will require.

While I do agree that that kind of conformance test would be great, I
don't think it has to hold up this series (the perfect being the enemy
of the good, and all that). We have a real problem today that userspace
can't tell if a given driver implements, say, XDP_REDIRECT, and so
people try to use it and spend days wondering which black hole their
packets disappear into. And for things like container migration we need
to be able to predict whether a given host supports a feature *before*
we start the migration and try to use it.
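
A minimal sketch of the kind of "just try it" probe users end up writing
today - load a trivial program and attempt a native-mode attach - which,
even when it succeeds, says nothing about XDP_REDIRECT in particular
(assumes a pre-1.0 libbpf: bpf_load_program()/bpf_set_link_xdp_fd();
newer libbpf spells these bpf_prog_load()/bpf_xdp_attach(); error
handling trimmed):

/* xdp_probe.c - crude capability probe: can we attach anything natively? */
#include <stdio.h>
#include <net/if.h>
#include <linux/bpf.h>
#include <linux/if_link.h>
#include <bpf/bpf.h>

int main(int argc, char **argv)
{
	/* Trivial program: return XDP_PASS */
	struct bpf_insn prog[] = {
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K,
		  .dst_reg = BPF_REG_0, .imm = XDP_PASS },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	int ifindex, prog_fd, err;

	if (argc < 2)
		return 1;
	ifindex = if_nametoindex(argv[1]);
	if (!ifindex)
		return 1;

	prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog,
				   sizeof(prog) / sizeof(prog[0]),
				   "GPL", 0, NULL, 0);
	if (prog_fd < 0)
		return 1;

	err = bpf_set_link_xdp_fd(ifindex, prog_fd, XDP_FLAGS_DRV_MODE);
	printf("%s: native XDP attach %s\n", argv[1],
	       err ? "failed" : "succeeded");
	if (!err)
		bpf_set_link_xdp_fd(ifindex, -1, XDP_FLAGS_DRV_MODE);
	return err ? 1 : 0;
}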

I view the feature flags as a list of features *implemented* by the
driver. Which should be pretty static in a given kernel, but may be
different than the features currently *enabled* on a given system (due
to, e.g., the TX queue stuff).

The simple way to expose the latter would be to just have a second set
of flags indicating the current configured state; and for that I guess
we should at least agree what "enabled" means; and a conformance test
would be a way to do this, of course.

I don't see why we can't do this in stages, though; start with the first
set of flags ('implemented'), move on to the second one ('enabled'), and
then to things like making the kernel react to the flags by rejecting
insertion into devmaps for invalid interfaces...
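
For the devmap case the check itself would be tiny - a sketch along
these lines, where dev->xdp_properties and XDP_F_NDO_XMIT are
placeholder names for the bitmap proposed in this series rather than an
existing kernel API, called from the devmap update path before the
device is added:

/* Sketch only - placeholder names, not current kernel API. */
static int dev_map_check_xdp_target(const struct net_device *dev)
{
	/* The driver must actually implement ndo_xdp_xmit() ... */
	if (!dev->netdev_ops->ndo_xdp_xmit)
		return -EOPNOTSUPP;

	/* ... and advertise itself as a valid redirect target. */
	if (!(dev->xdp_properties & XDP_F_NDO_XMIT))
		return -EOPNOTSUPP;

	return 0;
}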

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-10 10:53                                                 ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2021-02-10 18:31                                                   ` Jakub Kicinski
  -1 siblings, 0 replies; 120+ messages in thread
From: Jakub Kicinski @ 2021-02-10 18:31 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Marek Majtyka, Saeed Mahameed, David Ahern, Maciej Fijalkowski,
	John Fastabend, Jesper Dangaard Brouer, Daniel Borkmann,
	Maciej Fijalkowski, Björn Töpel, Andrii Nakryiko,
	Jonathan Lemon, Alexei Starovoitov, Network Development,
	David S. Miller, hawk, bpf, intel-wired-lan, Karlsson, Magnus,
	jeffrey.t.kirsher

On Wed, 10 Feb 2021 11:53:53 +0100 Toke Høiland-Jørgensen wrote:
> >> I am a bit confused now. Did you mean validation tests of those XDP
> >> flags, which I am working on or some other validation tests?
> >> What should these tests verify? Can you please elaborate more on the
> >> topic, please - just a few sentences how are you see it?  
> >
> > Conformance tests can be written for all features, whether they have 
> > an explicit capability in the uAPI or not. But for those that do IMO
> > the tests should be required.
> >
> > Let me give you an example. This set adds a bit that says Intel NICs 
> > can do XDP_TX and XDP_REDIRECT, yet we both know of the Tx queue
> > shenanigans. So can i40e do XDP_REDIRECT or can it not?
> >
> > If we have exhaustive conformance tests we can confidently answer that
> > question. And the answer may not be "yes" or "no", it may actually be
> > "we need more options because many implementations fall in between".
> >
> > I think readable (IOW not written in some insane DSL) tests can also 
> > be useful for users who want to check which features their program /
> > deployment will require.  
> 
> While I do agree that that kind of conformance test would be great, I
> don't think it has to hold up this series (the perfect being the enemy
> of the good, and all that). We have a real problem today that userspace
> can't tell if a given driver implements, say, XDP_REDIRECT, and so
> people try to use it and spend days wondering which black hole their
> packets disappear into. And for things like container migration we need
> to be able to predict whether a given host supports a feature *before*
> we start the migration and try to use it.

Unless you have a strong definition of what XDP_REDIRECT means, the flag
itself is not worth much. We're not talking about normal ethtool feature
flags, which are primarily stack-driven; XDP is implemented mostly by
the driver, so each vendor can do their own thing. Maybe I've seen one
vendor incompatibility too many at my day job to hope for the best...

> I view the feature flags as a list of features *implemented* by the
> driver. Which should be pretty static in a given kernel, but may be
> different than the features currently *enabled* on a given system (due
> to, e.g., the TX queue stuff).

Hm, maybe I'm not being clear enough. The way XDP_REDIRECT (your
example) is implemented across drivers differs in meaningful ways.
Hence the need for conformance testing. We don't have a golden SW
standard to fall back on, like we do with HW offloads.

Also IDK why those tests are considered such a huge ask. As I said most
vendors probably already have them, and I'd guess good distros do too.
So let's work together.

> The simple way to expose the latter would be to just have a second set
> of flags indicating the current configured state; and for that I guess
> we should at least agree what "enabled" means; and a conformance test
> would be a way to do this, of course.
> 
> I don't see why we can't do this in stages, though; start with the first
> set of flags ('implemented'), move on to the second one ('enabled'), and
> then to things like making the kernel react to the flags by rejecting
> insertion into devmaps for invalid interfaces...

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-10 18:31                                                   ` [Intel-wired-lan] " Jakub Kicinski
@ 2021-02-10 22:52                                                     ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-02-10 22:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Marek Majtyka, Saeed Mahameed, David Ahern, Maciej Fijalkowski,
	John Fastabend, Jesper Dangaard Brouer, Daniel Borkmann,
	Maciej Fijalkowski, Björn Töpel, Andrii Nakryiko,
	Jonathan Lemon, Alexei Starovoitov, Network Development,
	David S. Miller, hawk, bpf, intel-wired-lan, Karlsson, Magnus,
	jeffrey.t.kirsher

Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 10 Feb 2021 11:53:53 +0100 Toke Høiland-Jørgensen wrote:
>> >> I am a bit confused now. Did you mean validation tests of those XDP
>> >> flags, which I am working on or some other validation tests?
>> >> What should these tests verify? Can you please elaborate more on the
>> >> topic, please - just a few sentences how are you see it?  
>> >
>> > Conformance tests can be written for all features, whether they have 
>> > an explicit capability in the uAPI or not. But for those that do IMO
>> > the tests should be required.
>> >
>> > Let me give you an example. This set adds a bit that says Intel NICs 
>> > can do XDP_TX and XDP_REDIRECT, yet we both know of the Tx queue
>> > shenanigans. So can i40e do XDP_REDIRECT or can it not?
>> >
>> > If we have exhaustive conformance tests we can confidently answer that
>> > question. And the answer may not be "yes" or "no", it may actually be
>> > "we need more options because many implementations fall in between".
>> >
>> > I think readable (IOW not written in some insane DSL) tests can also 
>> > be useful for users who want to check which features their program /
>> > deployment will require.  
>> 
>> While I do agree that that kind of conformance test would be great, I
>> don't think it has to hold up this series (the perfect being the enemy
>> of the good, and all that). We have a real problem today that userspace
>> can't tell if a given driver implements, say, XDP_REDIRECT, and so
>> people try to use it and spend days wondering which black hole their
>> packets disappear into. And for things like container migration we need
>> to be able to predict whether a given host supports a feature *before*
>> we start the migration and try to use it.
>
> Unless you have a strong definition of what XDP_REDIRECT means the flag
> itself is not worth much. We're not talking about normal ethtool feature
> flags which are primarily stack-driven, XDP is implemented mostly by
> the driver, each vendor can do their own thing. Maybe I've seen one
> vendor incompatibility too many at my day job to hope for the best...

I'm totally on board with documenting what a feature means. E.g., for
XDP_REDIRECT, whether it's acceptable to fail the redirect in some
situations even when it's active, or if there should always be a
slow-path fallback.

But I disagree that the flag is worthless without it. People are running
into real issues with trying to run XDP_REDIRECT programs on a driver
that doesn't support it at all, and it's incredibly confusing. The
latest example popped up literally yesterday:

https://lore.kernel.org/xdp-newbies/CAM-scZPPeu44FeCPGO=Qz=03CrhhfB1GdJ8FNEpPqP_G27c6mQ@mail.gmail.com/

>> I view the feature flags as a list of features *implemented* by the
>> driver. Which should be pretty static in a given kernel, but may be
>> different than the features currently *enabled* on a given system (due
>> to, e.g., the TX queue stuff).
>
> Hm, maybe I'm not being clear enough. The way XDP_REDIRECT (your
> example) is implemented across drivers differs in a meaningful ways. 
> Hence the need for conformance testing. We don't have a golden SW
> standard to fall back on, like we do with HW offloads.

I'm not disagreeing that we need to harmonise what "implementing a
feature" means. Maybe I'm just not sure what you mean by "conformance
testing"? What would that look like, specifically? A script in selftests
that sets up a redirect between two interfaces and that we tell people to
run? Or what? How would you catch, say, the issue where things start
falling apart if a machine has more CPUs than the NIC has TXQs?
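
(For reference, the pattern that breaks is roughly the per-CPU queue
selection used by several ndo_xdp_xmit() implementations - a simplified
sketch with made-up names, not any particular driver:)

static int example_xdp_xmit(struct net_device *dev, int n,
			    struct xdp_frame **frames, u32 flags)
{
	struct example_priv *priv = netdev_priv(dev);
	unsigned int qid = smp_processor_id();

	/* One XDP TX ring per CPU is assumed; on a machine with more
	 * CPUs than rings, redirects from the extra CPUs just fail. */
	if (qid >= priv->num_xdp_queues)
		return -ENXIO;

	return example_ring_xmit(priv->xdp_rings[qid], n, frames, flags);
}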

> Also IDK why those tests are considered such a huge ask. As I said most
> vendors probably already have them, and so I'd guess do good distros.
> So let's work together.

I guess what I'm afraid of is that this will end up delaying or stalling
a fix for a long-standing issue (which is what I consider this series to
be, as shown by the example above). Maybe you can alleviate that by
expanding a bit on what you mean?

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-10 22:52                                                     ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2021-02-12  1:26                                                       ` Jakub Kicinski
  -1 siblings, 0 replies; 120+ messages in thread
From: Jakub Kicinski @ 2021-02-12  1:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Marek Majtyka, Saeed Mahameed, David Ahern, Maciej Fijalkowski,
	John Fastabend, Jesper Dangaard Brouer, Daniel Borkmann,
	Maciej Fijalkowski, Björn Töpel, Andrii Nakryiko,
	Jonathan Lemon, Alexei Starovoitov, Network Development,
	David S. Miller, hawk, bpf, intel-wired-lan, Karlsson, Magnus,
	jeffrey.t.kirsher

On Wed, 10 Feb 2021 23:52:39 +0100 Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> > On Wed, 10 Feb 2021 11:53:53 +0100 Toke Høiland-Jørgensen wrote:  
> >> While I do agree that that kind of conformance test would be great, I
> >> don't think it has to hold up this series (the perfect being the enemy
> >> of the good, and all that). We have a real problem today that userspace
> >> can't tell if a given driver implements, say, XDP_REDIRECT, and so
> >> people try to use it and spend days wondering which black hole their
> >> packets disappear into. And for things like container migration we need
> >> to be able to predict whether a given host supports a feature *before*
> >> we start the migration and try to use it.  
> >
> > Unless you have a strong definition of what XDP_REDIRECT means the flag
> > itself is not worth much. We're not talking about normal ethtool feature
> > flags which are primarily stack-driven, XDP is implemented mostly by
> > the driver, each vendor can do their own thing. Maybe I've seen one
> > vendor incompatibility too many at my day job to hope for the best...  
> 
> I'm totally on board with documenting what a feature means.

We're trying documentation in devlink etc. and it's not that great.
It's never clear and comprehensive enough, and barely anyone reads it.

> E.g., for
> XDP_REDIRECT, whether it's acceptable to fail the redirect in some
> situations even when it's active, or if there should always be a
> slow-path fallback.
> 
> But I disagree that the flag is worthless without it. People are running
> into real issues with trying to run XDP_REDIRECT programs on a driver
> that doesn't support it at all, and it's incredibly confusing. The
> latest example popped up literally yesterday:
> 
> https://lore.kernel.org/xdp-newbies/CAM-scZPPeu44FeCPGO=Qz=03CrhhfB1GdJ8FNEpPqP_G27c6mQ@mail.gmail.com/

To help with such confusion we'd actually have to validate the program
against the device caps. But perhaps I'm less concerned about a
newcomer not knowing how to use things and more concerned about
providing abstractions which will make programs work dependably
across vendors and HW generations.

> >> I view the feature flags as a list of features *implemented* by the
> >> driver. Which should be pretty static in a given kernel, but may be
> >> different than the features currently *enabled* on a given system (due
> >> to, e.g., the TX queue stuff).  
> >
> > Hm, maybe I'm not being clear enough. The way XDP_REDIRECT (your
> > example) is implemented across drivers differs in a meaningful ways. 
> > Hence the need for conformance testing. We don't have a golden SW
> > standard to fall back on, like we do with HW offloads.  
> 
> I'm not disagreeing that we need to harmonise what "implementing a
> feature" means. Maybe I'm just not sure what you mean by "conformance
> testing"? What would that look like, specifically? 

We developed a pretty good set of tests at my previous job for testing
driver XDP as well as checking that the offload conforms to the SW
behavior. I assume any vendor who takes quality seriously has
comprehensive XDP tests.

If those tests were upstream / common so that we could run them
against every implementation, the features which are supported by
a driver would fall naturally out of the set of tests which passed.
And the structure of the capability API could be based on what the
tests need to know to make a SKIP vs FAIL decision.

Common tests would obviously also ease the validation burden, reduce
the burden on vendors of writing tests, and make it far easier for new
implementations to be confidently submitted.

> A script in selftest that sets up a redirect between two interfaces
> that we tell people to run? Or what? How would you catch, say, that
> issue where if a machine has more CPUs than the NIC has TXQs things
> start falling apart?

selftests should be a good place, but I don't mind the location.
The point is having tests which anyone (vendors and users) can run
to test their platforms. One of the tests should indeed test if every
CPU in the platform can XDP_REDIRECT. Shouldn't it be a rather trivial
combination of tun/veth, mh and taskset?
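
Something like the sketch below for the taskset part, perhaps: pin to
each online CPU in turn and push one packet out of the interface under
test, with the actual pass/fail verdict taken out of band (counters, or
a receiver on the peer end). The address, port and names are made up;
only the pinning/sending skeleton is the point:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

static int send_one(const char *ifname)
{
	struct sockaddr_in dst = {
		.sin_family = AF_INET,
		.sin_port   = htons(9),			/* discard */
	};
	int fd, err;

	dst.sin_addr.s_addr = htonl(0x0a000002);	/* 10.0.0.2 (test peer) */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, strlen(ifname));
	err = sendto(fd, "ping", 4, 0, (struct sockaddr *)&dst, sizeof(dst));
	close(fd);
	return err < 0 ? -1 : 0;
}

int main(int argc, char **argv)
{
	long cpu, ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	cpu_set_t set;

	if (argc < 2)
		return 1;
	for (cpu = 0; cpu < ncpus; cpu++) {
		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		if (sched_setaffinity(0, sizeof(set), &set))
			continue;	/* CPU offline etc., skip */
		printf("cpu %ld: send %s\n", cpu,
		       send_one(argv[1]) ? "FAILED" : "ok");
	}
	return 0;
}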

> > Also IDK why those tests are considered such a huge ask. As I said most
> > vendors probably already have them, and so I'd guess do good distros.
> > So let's work together.  
> 
> I guess what I'm afraid of is that this will end up delaying or stalling
> a fix for a long-standing issue (which is what I consider this series as
> shown by the example above). Maybe you can alleviate that by expanding a
> bit on what you mean?

I hope what I wrote helps a little. I'm not good at explaining. 

Perhaps I have seen one too many vendor incompatibilities to trust that
adding a driver API without a validation suite will result in something
usable in production settings.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-12  1:26                                                       ` [Intel-wired-lan] " Jakub Kicinski
@ 2021-02-12  2:05                                                         ` Alexei Starovoitov
  -1 siblings, 0 replies; 120+ messages in thread
From: Alexei Starovoitov @ 2021-02-12  2:05 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, Marek Majtyka, Saeed Mahameed,
	David Ahern, Maciej Fijalkowski, John Fastabend,
	Jesper Dangaard Brouer, Daniel Borkmann, Maciej Fijalkowski,
	Björn Töpel, Andrii Nakryiko, Jonathan Lemon,
	Alexei Starovoitov, Network Development, David S. Miller,
	Jesper Dangaard Brouer, bpf, intel-wired-lan, Karlsson, Magnus,
	Jeff Kirsher

On Thu, Feb 11, 2021 at 5:26 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Perhaps I had seen one too many vendor incompatibility to trust that
> adding a driver API without a validation suite will result in something
> usable in production settings.

I agree with Jakub. I don't see how extra ethtool reporting will help.
Anyone who wants to know whether eth0 supports XDP_REDIRECT can already do so:
ethtool -S eth0 | grep xdp_redirect

I think converging on the same stat names across the drivers will make
the whole thing much more user friendly than new apis.
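
A few lines of C (or a one-line script) can do that check
programmatically - a minimal sketch, assuming the driver happens to
expose a counter whose name contains "xdp_redirect", which is the part
that converged stat names would guarantee:

#include <stdio.h>
#include <string.h>

static int has_xdp_redirect_stat(const char *ifname)
{
	char cmd[128], line[256];
	int found = 0;
	FILE *f;

	snprintf(cmd, sizeof(cmd), "ethtool -S %s", ifname);
	f = popen(cmd, "r");
	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f))
		if (strstr(line, "xdp_redirect"))
			found = 1;
	pclose(f);
	return found;
}

int main(int argc, char **argv)
{
	if (argc < 2)
		return 2;
	printf("%s: xdp_redirect counter %s\n", argv[1],
	       has_xdp_redirect_stat(argv[1]) ? "present" : "absent");
	return 0;
}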

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-12  2:05                                                         ` [Intel-wired-lan] " Alexei Starovoitov
@ 2021-02-12  7:02                                                           ` Marek Majtyka
  -1 siblings, 0 replies; 120+ messages in thread
From: Marek Majtyka @ 2021-02-12  7:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jakub Kicinski, Toke Høiland-Jørgensen, Saeed Mahameed,
	David Ahern, Maciej Fijalkowski, John Fastabend,
	Jesper Dangaard Brouer, Daniel Borkmann, Maciej Fijalkowski,
	Björn Töpel, Andrii Nakryiko, Jonathan Lemon,
	Alexei Starovoitov, Network Development, David S. Miller,
	Jesper Dangaard Brouer, bpf, intel-wired-lan, Karlsson, Magnus,
	Jeff Kirsher

On Fri, Feb 12, 2021 at 3:05 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Feb 11, 2021 at 5:26 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Perhaps I had seen one too many vendor incompatibility to trust that
> > adding a driver API without a validation suite will result in something
> > usable in production settings.
>
> I agree with Jakub. I don't see how extra ethtool reporting will help.
> Anyone who wants to know whether eth0 supports XDP_REDIRECT can already do so:
> ethtool -S eth0 | grep xdp_redirect

Doing things right can never be treated as just an addition - it is the
other way around. Option -S is for statistics, and additionally it can
show something (AFAIR there was no such xdp_redirect counter before, so
it must be something new - thanks for the info). But nevertheless it
cannot cover all the needs IMO.

Some questions worth considering:
Is this extra reporting via statistics clearly documented in ethtool?
Is this going to be clearly documented? Would it be easy for
users/admins to find it?
What about zero copy? Can it be made available via statistics, too?
What about the drivers' XDP transmit locking flag (the latest request from Jesper)?

>
> I think converging on the same stat names across the drivers will make
> the whole thing much more user friendly than new apis.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
  2021-02-12  7:02                                                           ` [Intel-wired-lan] " Marek Majtyka
@ 2021-02-16 14:30                                                             ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 120+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-02-16 14:30 UTC (permalink / raw)
  To: Marek Majtyka, Alexei Starovoitov
  Cc: Jakub Kicinski, Saeed Mahameed, David Ahern, Maciej Fijalkowski,
	John Fastabend, Jesper Dangaard Brouer, Daniel Borkmann,
	Maciej Fijalkowski, Björn Töpel, Andrii Nakryiko,
	Jonathan Lemon, Alexei Starovoitov, Network Development,
	David S. Miller, Jesper Dangaard Brouer, bpf, intel-wired-lan,
	Karlsson, Magnus, Jeff Kirsher

Marek Majtyka <alardam@gmail.com> writes:

> On Fri, Feb 12, 2021 at 3:05 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Thu, Feb 11, 2021 at 5:26 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> >
>> > Perhaps I had seen one too many vendor incompatibility to trust that
>> > adding a driver API without a validation suite will result in something
>> > usable in production settings.
>>
>> I agree with Jakub. I don't see how extra ethtool reporting will help.
>> Anyone who wants to know whether eth0 supports XDP_REDIRECT can already do so:
>> ethtool -S eth0 | grep xdp_redirect
>
> Doing things right can never be treated as an addition. It is the
> other way around. Option -S is for statistics and additionally it can
> show something (AFAIR there wasn't such counter xdp_redirect, it must
> be something new, thanks for the info). But  nevertheless it cannot
> cover all needs IMO.
>
> Some questions worth to consider:
> Is this extra reporting function of statistics clearly documented in
> ethtool? Is this going to be clearly documented? Would it be easier
> for users/admins to find it?
> What about zero copy? Can it be available via statistics, too?
> What about drivers XDP transmit locking flag (latest request from Jesper)?


There is no way the statistics are enough. And saying "just grep for
xdp_redirect in ethtool -S" is bordering on active hostility towards
users.

We need drivers to export explicit features so we can do things like:

- Explicitly reject attaching a program that tries to do xdp_redirect on
  an interface that doesn't support it.

- Prevent devices that don't implement ndo_xdp_xmit() from being
  inserted into a devmap (oh, and this is one thing you can't know at all
  from the statistics, BTW).

- Expose the features in a machine-readable format (like the ethtool
  flags in your patch) so applications can discover in a reliable way
  what is available and do proper fallback if features are missing.

I can accept that we need some kind of conformance test to define what
each flag means (which would be kinda like a selftest for the feature
flags), but we definitely need the feature flags themselves!

-Toke


^ permalink raw reply	[flat|nested] 120+ messages in thread

end of thread, other threads:[~2021-02-16 14:32 UTC | newest]

Thread overview: 120+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-04 10:28 [PATCH v2 bpf 0/5] New netdev feature flags for XDP alardam
2020-12-04 10:28 ` [Intel-wired-lan] " alardam
2020-12-04 10:28 ` [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set alardam
2020-12-04 10:28   ` [Intel-wired-lan] " alardam
2020-12-04 12:18   ` Toke Høiland-Jørgensen
2020-12-04 12:18     ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2020-12-04 12:46     ` Maciej Fijalkowski
2020-12-04 12:46       ` [Intel-wired-lan] " Maciej Fijalkowski
2020-12-04 15:21       ` Daniel Borkmann
2020-12-04 15:21         ` [Intel-wired-lan] " Daniel Borkmann
2020-12-04 17:20         ` Toke Høiland-Jørgensen
2020-12-04 17:20           ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2020-12-04 22:19           ` Daniel Borkmann
2020-12-04 22:19             ` [Intel-wired-lan] " Daniel Borkmann
2020-12-07 11:54             ` Jesper Dangaard Brouer
2020-12-07 11:54               ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-12-07 12:08               ` Toke Høiland-Jørgensen
2020-12-07 12:08                 ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2020-12-07 12:03             ` Toke Høiland-Jørgensen
2020-12-07 12:03               ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2020-12-07 12:54         ` Jesper Dangaard Brouer
2020-12-07 12:54           ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-12-07 20:52           ` John Fastabend
2020-12-07 20:52             ` [Intel-wired-lan] " John Fastabend
2020-12-07 22:38             ` Saeed Mahameed
2020-12-07 22:38               ` [Intel-wired-lan] " Saeed Mahameed
2020-12-07 23:07             ` Maciej Fijalkowski
2020-12-07 23:07               ` [Intel-wired-lan] " Maciej Fijalkowski
2020-12-09  6:03               ` John Fastabend
2020-12-09  6:03                 ` [Intel-wired-lan] " John Fastabend
2020-12-09  9:54                 ` Maciej Fijalkowski
2020-12-09  9:54                   ` [Intel-wired-lan] " Maciej Fijalkowski
2020-12-09 11:52                   ` Jesper Dangaard Brouer
2020-12-09 11:52                     ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-12-09 15:41                     ` David Ahern
2020-12-09 15:41                       ` [Intel-wired-lan] " David Ahern
2020-12-09 17:15                       ` Saeed Mahameed
2020-12-09 17:15                         ` [Intel-wired-lan] " Saeed Mahameed
2020-12-10  3:34                         ` David Ahern
2020-12-10  3:34                           ` [Intel-wired-lan] " David Ahern
2020-12-10  6:48                           ` Saeed Mahameed
2020-12-10  6:48                             ` [Intel-wired-lan] " Saeed Mahameed
2020-12-10 15:30                             ` David Ahern
2020-12-10 15:30                               ` [Intel-wired-lan] " David Ahern
2020-12-10 18:58                               ` Saeed Mahameed
2020-12-10 18:58                                 ` [Intel-wired-lan] " Saeed Mahameed
2021-01-05 11:56                                 ` Marek Majtyka
2021-01-05 11:56                                   ` [Intel-wired-lan] " Marek Majtyka
2021-02-01 16:16                                   ` Toke Høiland-Jørgensen
2021-02-01 16:16                                     ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2021-02-02 11:26                                     ` Marek Majtyka
2021-02-02 11:26                                       ` [Intel-wired-lan] " Marek Majtyka
2021-02-02 12:05                                       ` Toke Høiland-Jørgensen
2021-02-02 12:05                                         ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2021-02-02 19:34                                         ` Jakub Kicinski
2021-02-02 19:34                                           ` [Intel-wired-lan] " Jakub Kicinski
2021-02-03 12:50                                           ` Marek Majtyka
2021-02-03 12:50                                             ` [Intel-wired-lan] " Marek Majtyka
2021-02-03 17:02                                             ` Jakub Kicinski
2021-02-03 17:02                                               ` [Intel-wired-lan] " Jakub Kicinski
2021-02-10 10:53                                               ` Toke Høiland-Jørgensen
2021-02-10 10:53                                                 ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2021-02-10 18:31                                                 ` Jakub Kicinski
2021-02-10 18:31                                                   ` [Intel-wired-lan] " Jakub Kicinski
2021-02-10 22:52                                                   ` Toke Høiland-Jørgensen
2021-02-10 22:52                                                     ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2021-02-12  1:26                                                     ` Jakub Kicinski
2021-02-12  1:26                                                       ` [Intel-wired-lan] " Jakub Kicinski
2021-02-12  2:05                                                       ` Alexei Starovoitov
2021-02-12  2:05                                                         ` [Intel-wired-lan] " Alexei Starovoitov
2021-02-12  7:02                                                         ` Marek Majtyka
2021-02-12  7:02                                                           ` [Intel-wired-lan] " Marek Majtyka
2021-02-16 14:30                                                           ` Toke Høiland-Jørgensen
2021-02-16 14:30                                                             ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2020-12-09 15:44                     ` David Ahern
2020-12-09 15:44                       ` [Intel-wired-lan] " David Ahern
2020-12-10 13:32                       ` Explaining XDP redirect bulk size design (Was: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set) Jesper Dangaard Brouer
2020-12-10 13:32                         ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-12-10 14:14                         ` Magnus Karlsson
2020-12-10 14:14                           ` Magnus Karlsson
2020-12-10 17:30                           ` Jesper Dangaard Brouer
2020-12-10 17:30                             ` Jesper Dangaard Brouer
2020-12-10 19:20                         ` Saeed Mahameed
2020-12-10 19:20                           ` [Intel-wired-lan] " Saeed Mahameed
2020-12-08  1:01             ` [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set David Ahern
2020-12-08  1:01               ` [Intel-wired-lan] " David Ahern
2020-12-08  8:28               ` Jesper Dangaard Brouer
2020-12-08  8:28                 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-12-08 11:58                 ` Toke Høiland-Jørgensen
2020-12-08 11:58                   ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2020-12-09  5:50                   ` John Fastabend
2020-12-09  5:50                     ` [Intel-wired-lan] " John Fastabend
2020-12-09 10:26                     ` Toke Høiland-Jørgensen
2020-12-09 10:26                       ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2020-12-08  9:00             ` Jesper Dangaard Brouer
2020-12-08  9:00               ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-12-08  9:42               ` Daniel Borkmann
2020-12-08  9:42                 ` [Intel-wired-lan] " Daniel Borkmann
2020-12-04 12:57   ` Maciej Fijalkowski
2020-12-04 12:57     ` [Intel-wired-lan] " Maciej Fijalkowski
2020-12-04 10:28 ` [PATCH v2 bpf 2/5] drivers/net: turn XDP properties on alardam
2020-12-04 10:28   ` [Intel-wired-lan] " alardam
2020-12-04 12:19   ` Toke Høiland-Jørgensen
2020-12-04 12:19     ` [Intel-wired-lan] " Toke =?unknown-8bit?q?H=C3=B8iland-J=C3=B8rgensen?=
2020-12-09 19:05   ` kernel test robot
2020-12-09 19:05     ` kernel test robot
2020-12-04 10:28 ` [PATCH v2 bpf 3/5] xsk: add usage of xdp properties flags alardam
2020-12-04 10:28   ` [Intel-wired-lan] " alardam
2020-12-04 10:29 ` [PATCH v2 bpf 4/5] xsk: add check for full support of XDP in bind alardam
2020-12-04 10:29   ` [Intel-wired-lan] " alardam
2020-12-04 10:29 ` [PATCH v2 bpf 5/5] ethtool: provide xdp info with XDP_PROPERTIES_GET alardam
2020-12-04 10:29   ` [Intel-wired-lan] " alardam
2020-12-04 17:20 ` [PATCH v2 bpf 0/5] New netdev feature flags for XDP Jakub Kicinski
2020-12-04 17:20   ` [Intel-wired-lan] " Jakub Kicinski
2020-12-04 17:26   ` Toke Høiland-Jørgensen
2020-12-04 17:26     ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2020-12-04 19:22     ` Jakub Kicinski
2020-12-04 19:22       ` [Intel-wired-lan] " Jakub Kicinski
2020-12-07 12:04       ` Toke Høiland-Jørgensen
2020-12-07 12:04         ` [Intel-wired-lan] " Toke Høiland-Jørgensen
