netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] New netdev feature flags for XDP
@ 2020-11-16  9:34 alardam
  2020-11-16  9:34 ` [PATCH 1/8] net: ethtool: extend netdev_features flag set alardam
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Implement support for checking if a netdev has native XDP and AF_XDP zero
copy support. Previously, there was no way to do this other than to try
to create an AF_XDP socket on the interface or load an XDP program and
see if it worked. This commit changes this by extending existing
netdev_features in the following way:
 * xdp        - full XDP support (XDP_{TX, PASS, DROP, ABORT, REDIRECT})
 * af-xdp-zc  - AF_XDP zero copy support
NICs supporting these features are updated by turning the corresponding
netdev feature flags on.

NOTE:
 Only the compilation check was performed for:
  - ice, 
  - igb,
  - mlx5, 
  - bnxt, 
  - dpaa2, 
  - mvmeta, 
  - mvpp2, 
  - qede,
  - sfc, 
  - netsec, 
  - cpsw, 
  - xen, 
  - virtio_net.

Libbpf library is extended in order to provide a simple API for gathering
information about XDP supported capabilities of a netdev. This API
utilizes netlink interface towards kernel. With this function it is
possible to get xsk supported options for netdev beforehand.
The new API is used in core xsk code as well as in the xdpsock sample.

These new flags also solve the problem with strict recognition of zero
copy support. The problem is that there are drivers out there that only
support XDP partially, so it is possible to successfully load the XDP
program in native mode, but it will still not be able to support zero-copy
as it does not have XDP_REDIRECT support. With af-xdp-zc flag the check
is possible and trivial.

Marek Majtyka (8):
  net: ethtool: extend netdev_features flag set
  drivers/net: turn XDP flags on
  xsk: add usage of xdp netdev_features flags
  xsk: add check for full support of XDP in bind
  libbpf: extend netlink attribute API
  libbpf: add functions to get XSK modes
  libbpf: add API to get XSK/XDP caps
  samples/bpf/xdp: apply netdev XDP/XSK modes info

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   1 +
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |   1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   2 +
 drivers/net/ethernet/intel/ice/ice_main.c     |   4 +
 drivers/net/ethernet/intel/igb/igb_main.c     |   2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   3 +
 drivers/net/ethernet/marvell/mvneta.c         |   1 +
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   1 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   2 +
 drivers/net/ethernet/qlogic/qede/qede_main.c  |   1 +
 drivers/net/ethernet/sfc/efx.c                |   1 +
 drivers/net/ethernet/socionext/netsec.c       |   1 +
 drivers/net/ethernet/ti/cpsw.c                |   2 +
 drivers/net/ethernet/ti/cpsw_new.c            |   2 +
 drivers/net/tun.c                             |   4 +
 drivers/net/veth.c                            |   1 +
 drivers/net/virtio_net.c                      |   1 +
 drivers/net/xen-netfront.c                    |   1 +
 include/linux/netdev_features.h               |   6 +
 include/net/xdp.h                             |  13 +
 include/net/xdp_sock_drv.h                    |  11 +
 include/uapi/linux/if_xdp.h                   |   1 +
 net/ethtool/common.c                          |   2 +
 net/xdp/xsk.c                                 |   4 +-
 net/xdp/xsk_buff_pool.c                       |  21 +-
 samples/bpf/xdpsock_user.c                    | 117 +++++-
 tools/include/uapi/linux/ethtool.h            |  44 ++
 tools/include/uapi/linux/if_xdp.h             |   1 +
 tools/lib/bpf/ethtool.h                       |  49 +++
 tools/lib/bpf/libbpf.h                        |   1 +
 tools/lib/bpf/libbpf.map                      |   1 +
 tools/lib/bpf/netlink.c                       | 379 +++++++++++++++++-
 tools/lib/bpf/nlattr.c                        | 105 +++++
 tools/lib/bpf/nlattr.h                        |  22 +
 tools/lib/bpf/xsk.c                           |  54 ++-
 tools/lib/bpf/xsk.h                           |   3 +-
 36 files changed, 845 insertions(+), 20 deletions(-)
 create mode 100644 tools/lib/bpf/ethtool.h

-- 
2.20.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/8] net: ethtool: extend netdev_features flag set
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
@ 2020-11-16  9:34 ` alardam
  2020-11-16  9:34 ` [PATCH 2/8] drivers/net: turn XDP flags on alardam
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Implement support for checking if a netdev has XDP and AF_XDP zero copy
support. Previously, there was no way to do this other than to try
to create an AF_XDP socket on the interface or load an XDP program
and see if it worked. This commit changes this by extending existing
netdev_features in the following way:
 * xdp        - full XDP support (XDP_{TX, PASS, DROP, ABORT, REDIRECT})
 * af-xdp-zc  - AF_XDP zero copy support

By default these new flags are disabled for all drivers.

    $ ethtool -k enp1s0f0
     Features for enp1s0f0:
     ..
     xdp: off [fixed]
     af-xdp-zc: off [fixed]

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 include/linux/netdev_features.h |  6 ++++++
 include/net/xdp.h               | 13 +++++++++++++
 include/net/xdp_sock_drv.h      | 11 +++++++++++
 net/ethtool/common.c            |  2 ++
 4 files changed, 32 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 934de56644e7..d154ee7209b9 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -85,6 +85,8 @@ enum {
 
 	NETIF_F_HW_MACSEC_BIT,		/* Offload MACsec operations */
 
+	NETIF_F_XDP_BIT,		/* XDP support */
+	NETIF_F_AF_XDP_ZC_BIT,		/* AF_XDP zero-copy support */
 	/*
 	 * Add your fresh new feature above and remember to update
 	 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -157,6 +159,9 @@ enum {
 #define NETIF_F_GRO_FRAGLIST	__NETIF_F(GRO_FRAGLIST)
 #define NETIF_F_GSO_FRAGLIST	__NETIF_F(GSO_FRAGLIST)
 #define NETIF_F_HW_MACSEC	__NETIF_F(HW_MACSEC)
+#define NETIF_F_XDP		__NETIF_F(XDP)
+#define NETIF_F_AF_XDP_ZC	__NETIF_F(AF_XDP_ZC)
+
 
 /* Finds the next feature with the highest number of the range of start till 0.
  */
@@ -182,6 +187,7 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start)
 /* Features valid for ethtool to change */
 /* = all defined minus driver/device-class-related */
 #define NETIF_F_NEVER_CHANGE	(NETIF_F_VLAN_CHALLENGED | \
+				 NETIF_F_XDP | NETIF_F_AF_XDP_ZC | \
 				 NETIF_F_LLTX | NETIF_F_NETNS_LOCAL)
 
 /* remember that ((t)1 << t_BITS) is undefined in C99 */
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 7d48b2ae217a..82bb47372b02 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -254,6 +254,19 @@ struct xdp_attachment_info {
 	u32 flags;
 };
 
+#if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
+static __always_inline void
+xdp_set_feature_flag(netdev_features_t *features)
+{
+	*features |= NETIF_F_XDP;
+}
+#else
+static __always_inline void
+xdp_set_feature_flag(netdev_features_t *features)
+{
+}
+#endif
+
 struct netdev_bpf;
 bool xdp_attachment_flags_ok(struct xdp_attachment_info *info,
 			     struct netdev_bpf *bpf);
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 5b1ee8a9976d..86b41f89d09d 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -22,6 +22,12 @@ void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool);
 void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool);
 bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool);
 
+static __always_inline void
+xsk_set_feature_flag(netdev_features_t *features)
+{
+	*features |= NETIF_F_AF_XDP_ZC;
+}
+
 static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
 {
 	return XDP_PACKET_HEADROOM + pool->headroom;
@@ -235,6 +241,11 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 {
 }
 
+static __always_inline void
+xsk_set_feature_flag(netdev_features_t *features)
+{
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_DRV_H */
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 24036e3055a1..eed225283eee 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -68,6 +68,8 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
 	[NETIF_F_HW_TLS_RX_BIT] =	 "tls-hw-rx-offload",
 	[NETIF_F_GRO_FRAGLIST_BIT] =	 "rx-gro-list",
 	[NETIF_F_HW_MACSEC_BIT] =	 "macsec-hw-offload",
+	[NETIF_F_XDP_BIT] =              "xdp",
+	[NETIF_F_AF_XDP_ZC_BIT] =        "af-xdp-zc",
 };
 
 const char
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/8] drivers/net: turn XDP flags on
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
  2020-11-16  9:34 ` [PATCH 1/8] net: ethtool: extend netdev_features flag set alardam
@ 2020-11-16  9:34 ` alardam
  2020-11-16  9:34 ` [PATCH 3/8] xsk: add usage of xdp netdev_features flags alardam
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Turn 'xdp' and 'af-xdp-zc' feature flags on for:
 - i40e
 - ice
 - ixgbe
 - mlx5.

Turn 'xdp' feature flag on for:
 - igb
 - tun
 - veth
 - dpaa2
 - mvneta
 - mvpp2
 - qede
 - sfc
 - netsec
 - cpsw
 - xen
 - virtio_net.

The first group of NICs is currently visible with ethtool as:
  $ ethtool -k enp1s0f0
    Features for enp1s0f0:
    ..
    xdp: on [fixed]
    af-xdp-zc: on [fixed]

whereas for the second group output is as:
  $ ethtool -k enp1s0f0
    Features for enp1s0f0:
    ..
    xdp: on [fixed]
    af-xdp-zc: off [fixed]

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 1 +
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c  | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c       | 2 ++
 drivers/net/ethernet/intel/ice/ice_main.c         | 4 ++++
 drivers/net/ethernet/intel/igb/igb_main.c         | 2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 3 +++
 drivers/net/ethernet/marvell/mvneta.c             | 1 +
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c   | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c      | 1 +
 drivers/net/ethernet/sfc/efx.c                    | 1 +
 drivers/net/ethernet/socionext/netsec.c           | 1 +
 drivers/net/ethernet/ti/cpsw.c                    | 2 ++
 drivers/net/ethernet/ti/cpsw_new.c                | 2 ++
 drivers/net/tun.c                                 | 4 ++++
 drivers/net/veth.c                                | 1 +
 drivers/net/virtio_net.c                          | 1 +
 drivers/net/xen-netfront.c                        | 1 +
 18 files changed, 31 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7975f59735d6..9f689717319d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -12604,6 +12604,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->features |= dev->hw_features | NETIF_F_HIGHDMA;
 	if (dev->features & NETIF_F_GRO_HW)
 		dev->features &= ~NETIF_F_LRO;
+	xdp_set_feature_flag(&dev->features);
 	dev->priv_flags |= IFF_UNICAST_FLT;
 
 #ifdef CONFIG_BNXT_SRIOV
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index cf9400a9886d..418ec3dae1dd 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -4014,6 +4014,7 @@ static int dpaa2_eth_netdev_init(struct net_device *net_dev)
 			    NETIF_F_SG | NETIF_F_HIGHDMA |
 			    NETIF_F_LLTX | NETIF_F_HW_TC;
 	net_dev->hw_features = net_dev->features;
+	xdp_set_feature_flag(&net_dev->features);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 4f8a2154b93f..0b7825f629c6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12873,6 +12873,8 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	netdev->hw_features |= hw_features;
 
 	netdev->features |= hw_features | NETIF_F_HW_VLAN_CTAG_FILTER;
+	xdp_set_feature_flag(&netdev->features);
+	xsk_set_feature_flag(&netdev->features);
 	netdev->hw_enc_features |= NETIF_F_TSO_MANGLEID;
 
 	if (vsi->type == I40E_VSI_MAIN) {
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 2dea4d0e9415..7b932ba42f09 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -13,6 +13,7 @@
 #include "ice_dcb_lib.h"
 #include "ice_dcb_nl.h"
 #include "ice_devlink.h"
+#include <net/xdp_sock_drv.h>
 
 #define DRV_SUMMARY	"Intel(R) Ethernet Connection E800 Series Linux Driver"
 static const char ice_driver_string[] = DRV_SUMMARY;
@@ -2941,6 +2942,9 @@ static void ice_set_netdev_features(struct net_device *netdev)
 
 	/* enable features */
 	netdev->features |= netdev->hw_features;
+	xdp_set_feature_flag(&netdev->features);
+	xsk_set_feature_flag(&netdev->features);
+
 	/* encap and VLAN devices inherit default, csumo and tso features */
 	netdev->hw_enc_features |= dflt_features | csumo_features |
 				   tso_features;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 5fc2c381da55..e89a0442606f 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3293,6 +3293,8 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 			    NETIF_F_HW_VLAN_CTAG_RX |
 			    NETIF_F_HW_VLAN_CTAG_TX;
 
+	xdp_set_feature_flag(&netdev->features);
+
 	netdev->priv_flags |= IFF_SUPP_NOFCS;
 
 	netdev->priv_flags |= IFF_UNICAST_FLT;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 45ae33e15303..fccc282a9492 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10880,6 +10880,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
 		netdev->features |= NETIF_F_LRO;
 
+	xdp_set_feature_flag(&netdev->features);
+	xsk_set_feature_flag(&netdev->features);
+
 	if (ixgbe_check_fw_error(adapter)) {
 		err = -EIO;
 		goto err_sw_init;
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 183530ed4d1d..b80c1eca820e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5262,6 +5262,7 @@ static int mvneta_probe(struct platform_device *pdev)
 			NETIF_F_TSO | NETIF_F_RXCSUM;
 	dev->hw_features |= dev->features;
 	dev->vlan_features |= dev->features;
+	xdp_set_feature_flag(&dev->features);
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 	dev->gso_max_segs = MVNETA_MAX_TSO_SEGS;
 
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 3069e192d773..6add63ef9ac0 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -6475,6 +6475,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		mvpp2_set_hw_csum(port, port->pool_long->id);
 
 	dev->vlan_features |= features;
+	xdp_set_feature_flag(&dev->features);
 	dev->gso_max_segs = MVPP2_MAX_TSO_SEGS;
 	dev->priv_flags |= IFF_UNICAST_FLT;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 527c5f12c5af..67aaec330816 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4980,6 +4980,8 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->features         |= NETIF_F_HIGHDMA;
 	netdev->features         |= NETIF_F_HW_VLAN_STAG_FILTER;
+	xdp_set_feature_flag(&netdev->features);
+	xsk_set_feature_flag(&netdev->features);
 
 	netdev->priv_flags       |= IFF_UNICAST_FLT;
 
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 05e3a3b60269..add1b820adea 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -839,6 +839,7 @@ static void qede_init_ndev(struct qede_dev *edev)
 	ndev->features = hw_features | NETIF_F_RXHASH | NETIF_F_RXCSUM |
 			 NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HIGHDMA |
 			 NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_TX;
+	xdp_set_feature_flag(&ndev->features);
 
 	ndev->hw_features = hw_features;
 
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 718308076341..ca5a582ce26b 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1101,6 +1101,7 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
 	efx = netdev_priv(net_dev);
 	efx->type = (const struct efx_nic_type *) entry->driver_data;
 	efx->fixed_features |= NETIF_F_HIGHDMA;
+	xdp_set_feature_flag(&efx->fixed_features);
 
 	pci_set_drvdata(pci_dev, efx);
 	SET_NETDEV_DEV(net_dev, &pci_dev->dev);
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 1503cc9ec6e2..8a8cca53821d 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -2099,6 +2099,7 @@ static int netsec_probe(struct platform_device *pdev)
 	ndev->features |= NETIF_F_HIGHDMA | NETIF_F_RXCSUM | NETIF_F_GSO |
 				NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 	ndev->hw_features = ndev->features;
+	xdp_set_feature_flag(&ndev->features);
 
 	priv->rx_cksum_offload_flag = true;
 
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 9fd1f77190ad..ca5548f92b56 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1475,6 +1475,7 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	priv_sl2->emac_port = 1;
 	cpsw->slaves[1].ndev = ndev;
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
+	xdp_set_feature_flag(&ndev->features);
 
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
@@ -1654,6 +1655,7 @@ static int cpsw_probe(struct platform_device *pdev)
 	cpsw->slaves[0].ndev = ndev;
 
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
+	xdp_set_feature_flag(&ndev->features);
 
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index f779d2e1b5c5..9a8fb3dd8a19 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1416,6 +1416,8 @@ static int cpsw_create_ports(struct cpsw_common *cpsw)
 		ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER |
 				  NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_NETNS_LOCAL;
 
+		xdp_set_feature_flag(&ndev->features);
+
 		ndev->netdev_ops = &cpsw_netdev_ops;
 		ndev->ethtool_ops = &cpsw_ethtool_ops;
 		SET_NETDEV_DEV(ndev, dev);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3d45d56172cb..2187714598d6 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2721,6 +2721,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 				     ~(NETIF_F_HW_VLAN_CTAG_TX |
 				       NETIF_F_HW_VLAN_STAG_TX);
 
+		/* Currently tap does not support XDP, only tun does. */
+		if (tun->flags == IFF_TUN)
+			xdp_set_feature_flag(&dev->features);
+
 		tun->flags = (tun->flags & ~TUN_FEATURES) |
 			      (ifr->ifr_flags & TUN_FEATURES);
 
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c737668008a..26f2e83d9a2c 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1266,6 +1266,7 @@ static void veth_setup(struct net_device *dev)
 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;
 	dev->mpls_features = NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE;
+	xdp_set_feature_flag(&dev->features);
 }
 
 /*
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 21b71148c532..ee33e9fce4c5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3017,6 +3017,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		dev->hw_features |= NETIF_F_LRO;
 
 	dev->vlan_features = dev->features;
+	xdp_set_feature_flag(&dev->features);
 
 	/* MTU range: 68 - 65535 */
 	dev->min_mtu = MIN_MTU;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 920cac4385bf..abe5f0104c73 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1555,6 +1555,7 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
          * negotiate with the backend regarding supported features.
          */
 	netdev->features |= netdev->hw_features;
+	xdp_set_feature_flag(&netdev->features);
 
 	netdev->ethtool_ops = &xennet_ethtool_ops;
 	netdev->min_mtu = ETH_MIN_MTU;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/8] xsk: add usage of xdp netdev_features flags
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
  2020-11-16  9:34 ` [PATCH 1/8] net: ethtool: extend netdev_features flag set alardam
  2020-11-16  9:34 ` [PATCH 2/8] drivers/net: turn XDP flags on alardam
@ 2020-11-16  9:34 ` alardam
  2020-11-16  9:34 ` [PATCH 4/8] xsk: add check for full support of XDP in bind alardam
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Change necessary condition check for XSK from ndo functions to
netdev_features flags.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 net/xdp/xsk_buff_pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 8a3bf4e1318e..76922696ad3c 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -159,8 +159,8 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 		/* For copy-mode, we are done. */
 		return 0;
 
-	if (!netdev->netdev_ops->ndo_bpf ||
-	    !netdev->netdev_ops->ndo_xsk_wakeup) {
+	if (!(NETIF_F_XDP & netdev->features &&
+	      NETIF_F_AF_XDP_ZC & netdev->features)) {
 		err = -EOPNOTSUPP;
 		goto err_unreg_pool;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/8] xsk: add check for full support of XDP in bind
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
                   ` (2 preceding siblings ...)
  2020-11-16  9:34 ` [PATCH 3/8] xsk: add usage of xdp netdev_features flags alardam
@ 2020-11-16  9:34 ` alardam
  2020-11-16  9:34 ` [PATCH 5/8] libbpf: extend netlink attribute API alardam
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Add check for full support of XDP in AF_XDP socket bind.

To be able to use an AF_XDP socket with zero-copy, there needs to be
support for both XDP_REDIRECT in the driver (XDP native mode) and the
driver needs to support zero-copy. The problem is that there are
drivers out there that only support XDP partially, so it is possible
to successfully load the XDP program in native mode, but it will still
not be able to support zero-copy as it does not have XDP_REDIRECT
support. We can now alleviate this problem by using the new XDP netdev
capability that signifies if full XDP support is indeed present. This
check can be triggered by a new bind flag called
XDP_CHECK_NATIVE_MODE.

To simplify usage, this check is triggered automatically from inside
libbpf library via turning on the new XDP_CHECK_NATIVE_MODE flag if and
only if the driver mode is selected for the socket. As a result, the
xsk_bind function decides if the native mode for a given interface makes
sense or not using xdp netdev feature flags. Eventually the xsk socket is
bound or an error is returned. Apart from this change and to catch all
invalid inputs in a single place, an additional check is set to forbid
sbk mode and zero copy settings at the same time as that combination makes
no sense.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 include/uapi/linux/if_xdp.h       |  1 +
 net/xdp/xsk.c                     |  4 ++--
 net/xdp/xsk_buff_pool.c           | 17 ++++++++++++++++-
 tools/include/uapi/linux/if_xdp.h |  1 +
 tools/lib/bpf/xsk.c               |  3 +++
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index a78a8096f4ce..8f47754dacce 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -25,6 +25,7 @@
  * application.
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
+#define XDP_CHECK_NATIVE_MODE (1 << 4)
 
 /* Flags for xsk_umem_config flags */
 #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index cfbec3989a76..a9c386083377 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -658,7 +658,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 
 	flags = sxdp->sxdp_flags;
 	if (flags & ~(XDP_SHARED_UMEM | XDP_COPY | XDP_ZEROCOPY |
-		      XDP_USE_NEED_WAKEUP))
+		      XDP_USE_NEED_WAKEUP | XDP_CHECK_NATIVE_MODE))
 		return -EINVAL;
 
 	rtnl_lock();
@@ -686,7 +686,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		struct socket *sock;
 
 		if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY) ||
-		    (flags & XDP_USE_NEED_WAKEUP)) {
+		    (flags & XDP_USE_NEED_WAKEUP) || (flags & XDP_CHECK_NATIVE_MODE)) {
 			/* Cannot specify flags for shared sockets. */
 			err = -EINVAL;
 			goto out_unlock;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 76922696ad3c..231d88ddd978 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -123,7 +123,7 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
 static int __xp_assign_dev(struct xsk_buff_pool *pool,
 			   struct net_device *netdev, u16 queue_id, u16 flags)
 {
-	bool force_zc, force_copy;
+	bool force_zc, force_copy, force_check;
 	struct netdev_bpf bpf;
 	int err = 0;
 
@@ -131,10 +131,24 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 
 	force_zc = flags & XDP_ZEROCOPY;
 	force_copy = flags & XDP_COPY;
+	force_check = flags & XDP_CHECK_NATIVE_MODE;
+
 
 	if (force_zc && force_copy)
 		return -EINVAL;
 
+	if (!(flags & XDP_SHARED_UMEM)) {
+		if (force_check) {
+			/* forbid driver mode without full XDP support */
+			if (!(NETIF_F_XDP & netdev->features))
+				return -EOPNOTSUPP;
+		} else {
+			/* forbid skb mode and zero copy */
+			if (force_zc)
+				return -EINVAL;
+		}
+	}
+
 	if (xsk_get_pool_from_qid(netdev, queue_id))
 		return -EBUSY;
 
@@ -206,6 +220,7 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem,
 		return -EINVAL;
 
 	flags = umem->zc ? XDP_ZEROCOPY : XDP_COPY;
+	flags |= XDP_SHARED_UMEM;
 	if (pool->uses_need_wakeup)
 		flags |= XDP_USE_NEED_WAKEUP;
 
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index a78a8096f4ce..8f47754dacce 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -25,6 +25,7 @@
  * application.
  */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
+#define XDP_CHECK_NATIVE_MODE (1 << 4)
 
 /* Flags for xsk_umem_config flags */
 #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 9bc537d0b92d..7951f7ea6db3 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -18,6 +18,7 @@
 #include <linux/ethtool.h>
 #include <linux/filter.h>
 #include <linux/if_ether.h>
+#include <linux/if_link.h>
 #include <linux/if_packet.h>
 #include <linux/if_xdp.h>
 #include <linux/kernel.h>
@@ -827,6 +828,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
 		sxdp.sxdp_shared_umem_fd = umem->fd;
 	} else {
 		sxdp.sxdp_flags = xsk->config.bind_flags;
+		if (xsk->config.xdp_flags & XDP_FLAGS_DRV_MODE)
+			sxdp.sxdp_flags |= XDP_CHECK_NATIVE_MODE;
 	}
 
 	err = bind(xsk->fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/8] libbpf: extend netlink attribute API
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
                   ` (3 preceding siblings ...)
  2020-11-16  9:34 ` [PATCH 4/8] xsk: add check for full support of XDP in bind alardam
@ 2020-11-16  9:34 ` alardam
  2020-11-16  9:34 ` [PATCH 6/8] libbpf: add functions to get XSK modes alardam
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Extend netlink attribute API to put a different attribute into
the netlink message (nest{start, end}, string, u32, flag, etc).
Add new API to parse attribute array.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 tools/lib/bpf/nlattr.c | 105 +++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/nlattr.h |  22 +++++++++
 2 files changed, 127 insertions(+)

diff --git a/tools/lib/bpf/nlattr.c b/tools/lib/bpf/nlattr.c
index b607fa9852b1..b37b4d266832 100644
--- a/tools/lib/bpf/nlattr.c
+++ b/tools/lib/bpf/nlattr.c
@@ -83,6 +83,52 @@ static inline int nlmsg_len(const struct nlmsghdr *nlh)
 	return nlh->nlmsg_len - NLMSG_HDRLEN;
 }
 
+/**
+ * Create attribute index table based on a stream of attributes.
+ * @arg tb		Index array to be filled (indexed from 0 to elem-1).
+ * @arg elem		Number of attributes in the table.
+ * @arg maxtype		Maximum attribute type expected and accepted.
+ * @arg head		Head of attribute stream.
+ * @arg policy		Attribute validation policy.
+ *
+ * Iterates over the stream of attributes and stores a pointer to each
+ * attribute in the index array using incremented counter as index to
+ * the array. Attribute with a index greater than or equal to the elem value
+ * specified will be ignored and function terminates with error. If a policy
+ * is not NULL, the attribute will be validated using the specified policy.
+ *
+ * @see nla_validate
+ * @return 0 on success or a negative error code.
+ */
+int libbpf_nla_parse_table(struct nlattr *tb[], int elem, struct nlattr *head,
+			   int maxtype, struct libbpf_nla_policy *policy)
+{
+	struct nlattr *nla;
+	int rem, err = 0;
+	int idx = 0;
+
+	memset(tb, 0, sizeof(struct nlattr *) * elem);
+
+	libbpf_nla_for_each_attr(nla, libbpf_nla_data(head), libbpf_nla_len(head), rem) {
+		if (idx >= elem) {
+			err = -EMSGSIZE;
+			goto errout;
+		}
+
+		if (policy) {
+			err = validate_nla(nla, maxtype, policy);
+			if (err < 0)
+				goto errout;
+		}
+
+		tb[idx] = nla;
+		idx++;
+	}
+
+errout:
+	return err;
+}
+
 /**
  * Create attribute index based on a stream of attributes.
  * @arg tb		Index array to be filled (maxtype+1 elements).
@@ -193,3 +239,62 @@ int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh)
 
 	return 0;
 }
+
+struct nlattr *libbpf_nla_next(struct nlattr *current)
+{
+	return current + NLA_ALIGN(current->nla_len) / sizeof(struct nlattr);
+}
+
+struct nlattr *libbpf_nla_nest_start(struct nlattr *start, int attrtype)
+{
+	start->nla_len = NLA_HDRLEN;
+	start->nla_type = attrtype | NLA_F_NESTED;
+	return start + 1;
+}
+
+int libbpf_nla_nest_end(struct nlattr *start, struct nlattr *next)
+{
+	start->nla_len += (unsigned char *)next  - (unsigned char *)start - NLA_HDRLEN;
+	return start->nla_len;
+}
+
+struct nlattr *libbpf_nla_put_u32(struct nlattr *start, int attrtype, uint32_t val)
+{
+	struct nlattr *next = start + 1;
+
+	start->nla_type = attrtype;
+	start->nla_len = NLA_HDRLEN + NLA_ALIGN(sizeof(uint32_t));
+	memcpy((char *)next, &val, sizeof(uint32_t));
+
+	return next + 1;
+}
+
+struct nlattr *libbpf_nla_put_str(struct nlattr *start, int attrtype,
+				  const char *string, int max_len)
+{
+	struct nlattr *next = start + 1;
+	size_t len = max_len > 0 ? strnlen(string, max_len - 1) : 0;
+	char *ptr = ((char *)next) + len;
+
+	start->nla_type = attrtype;
+	start->nla_len = NLA_HDRLEN + NLA_ALIGN(len + 1);
+	strncpy((char *)next, string, len);
+
+	for (size_t idx = len; idx < start->nla_len; ++idx, ptr++)
+		*ptr = '\0';
+
+	return libbpf_nla_next(start);
+}
+
+struct nlattr *libbpf_nla_put_flag(struct nlattr *start, int attrtype)
+{
+	start->nla_type = attrtype;
+	start->nla_len = NLA_HDRLEN;
+
+	return start + 1;
+}
+
+int libbpf_nla_attrs_length(struct nlattr *start, struct nlattr *next)
+{
+	return ((unsigned char *)next  - (unsigned char *)start);
+}
diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
index 6cc3ac91690f..93e18accfce5 100644
--- a/tools/lib/bpf/nlattr.h
+++ b/tools/lib/bpf/nlattr.h
@@ -76,6 +76,11 @@ static inline uint8_t libbpf_nla_getattr_u8(const struct nlattr *nla)
 	return *(uint8_t *)libbpf_nla_data(nla);
 }
 
+static inline uint16_t libbpf_nla_getattr_u16(const struct nlattr *nla)
+{
+	return *(uint16_t *)libbpf_nla_data(nla);
+}
+
 static inline uint32_t libbpf_nla_getattr_u32(const struct nlattr *nla)
 {
 	return *(uint32_t *)libbpf_nla_data(nla);
@@ -100,7 +105,24 @@ int libbpf_nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head,
 int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
 			    struct nlattr *nla,
 			    struct libbpf_nla_policy *policy);
+int libbpf_nla_parse_table(struct nlattr *tb[], int elem, struct nlattr *head,
+			   int type, struct libbpf_nla_policy *policy);
 
 int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
 
+struct nlattr *libbpf_nla_next(struct nlattr *current);
+
+struct nlattr *libbpf_nla_nest_start(struct nlattr *start, int attrtype);
+
+int libbpf_nla_nest_end(struct nlattr *start, struct nlattr *next);
+
+struct nlattr *libbpf_nla_put_u32(struct nlattr *start, int attrtype, uint32_t val);
+
+struct nlattr *libbpf_nla_put_str(struct nlattr *start, int attrtype,
+				  const char *string, int max_len);
+
+struct nlattr *libbpf_nla_put_flag(struct nlattr *start, int attrtype);
+
+int libbpf_nla_attrs_length(struct nlattr *start, struct nlattr *next);
+
 #endif /* __LIBBPF_NLATTR_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/8] libbpf: add functions to get XSK modes
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
                   ` (4 preceding siblings ...)
  2020-11-16  9:34 ` [PATCH 5/8] libbpf: extend netlink attribute API alardam
@ 2020-11-16  9:34 ` alardam
  2020-11-17  7:05   ` kernel test robot
  2020-11-16  9:34 ` [PATCH 7/8] libbpf: add API to get XSK/XDP caps alardam
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Add functions to get XDP/XSK modes from netdev feature flags
over netlink ethtool family interface. These functions provide
functionalities that are going to be used in upcoming changes
together constituting new libbpf public API function which
informs about key xsk capabilities of given network interface.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 tools/include/uapi/linux/ethtool.h |  44 ++++
 tools/lib/bpf/ethtool.h            |  49 ++++
 tools/lib/bpf/libbpf.h             |   1 +
 tools/lib/bpf/netlink.c            | 379 ++++++++++++++++++++++++++++-
 4 files changed, 469 insertions(+), 4 deletions(-)
 create mode 100644 tools/lib/bpf/ethtool.h

diff --git a/tools/include/uapi/linux/ethtool.h b/tools/include/uapi/linux/ethtool.h
index c86c3e942df9..cf3041d302e4 100644
--- a/tools/include/uapi/linux/ethtool.h
+++ b/tools/include/uapi/linux/ethtool.h
@@ -48,4 +48,48 @@ struct ethtool_channels {
 	__u32	combined_count;
 };
 
+#define ETH_GSTRING_LEN		32
+
+/**
+ * enum ethtool_stringset - string set ID
+ * @ETH_SS_TEST: Self-test result names, for use with %ETHTOOL_TEST
+ * @ETH_SS_STATS: Statistic names, for use with %ETHTOOL_GSTATS
+ * @ETH_SS_PRIV_FLAGS: Driver private flag names, for use with
+ *	%ETHTOOL_GPFLAGS and %ETHTOOL_SPFLAGS
+ * @ETH_SS_NTUPLE_FILTERS: Previously used with %ETHTOOL_GRXNTUPLE;
+ *	now deprecated
+ * @ETH_SS_FEATURES: Device feature names
+ * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names
+ * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS
+ * @ETH_SS_PHY_TUNABLES: PHY tunable names
+ * @ETH_SS_LINK_MODES: link mode names
+ * @ETH_SS_MSG_CLASSES: debug message class names
+ * @ETH_SS_WOL_MODES: wake-on-lan modes
+ * @ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags
+ * @ETH_SS_TS_TX_TYPES: timestamping Tx types
+ * @ETH_SS_TS_RX_FILTERS: timestamping Rx filters
+ * @ETH_SS_UDP_TUNNEL_TYPES: UDP tunnel types
+ */
+enum ethtool_stringset {
+	ETH_SS_TEST		= 0,
+	ETH_SS_STATS,
+	ETH_SS_PRIV_FLAGS,
+	ETH_SS_NTUPLE_FILTERS,
+	ETH_SS_FEATURES,
+	ETH_SS_RSS_HASH_FUNCS,
+	ETH_SS_TUNABLES,
+	ETH_SS_PHY_STATS,
+	ETH_SS_PHY_TUNABLES,
+	ETH_SS_LINK_MODES,
+	ETH_SS_MSG_CLASSES,
+	ETH_SS_WOL_MODES,
+	ETH_SS_SOF_TIMESTAMPING,
+	ETH_SS_TS_TX_TYPES,
+	ETH_SS_TS_RX_FILTERS,
+	ETH_SS_UDP_TUNNEL_TYPES,
+
+	/* add new constants above here */
+	ETH_SS_COUNT
+};
+
 #endif /* _UAPI_LINUX_ETHTOOL_H */
diff --git a/tools/lib/bpf/ethtool.h b/tools/lib/bpf/ethtool.h
new file mode 100644
index 000000000000..14b2ae47bc26
--- /dev/null
+++ b/tools/lib/bpf/ethtool.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+
+/*
+ * Generic netlink ethtool family required defines
+ *
+ * Copyright (c) 2020 Intel
+ */
+
+#ifndef __LIBBPF_ETHTOOL_H_
+#define __LIBBPF_ETHTOOL_H_
+
+#include <linux/ethtool_netlink.h>
+
+#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
+#define FEATURE_BITS_TO_BLOCKS(n_bits)      DIV_ROUND_UP(n_bits, 32U)
+
+#define FEATURE_WORD(blocks, index)  ((blocks)[(index) / 32U])
+#define FEATURE_FIELD_FLAG(index)       (1U << (index) % 32U)
+#define FEATURE_BIT_IS_SET(blocks, index)        \
+		(FEATURE_WORD(blocks, index) & FEATURE_FIELD_FLAG(index))
+
+#define NETDEV_XDP_STR			"xdp"
+#define NETDEV_XDP_LEN			4
+
+#define NETDEV_AF_XDP_ZC_STR		"af-xdp-zc"
+#define NETDEV_AF_XDP_ZC_LEN		10
+
+#define BUF_SIZE_4096			4096
+#define BUF_SIZE_8192			8192
+
+#define MAX_FEATURES			500
+
+struct ethnl_params {
+	const char *ifname;
+	const char *nl_family;
+	int features;
+	int xdp_idx;
+	int xdp_zc_idx;
+	int xdp_flags;
+	int xdp_zc_flags;
+	__u16 fam_id;
+};
+
+int libbpf_ethnl_get_ethtool_family_id(struct ethnl_params *param);
+int libbpf_ethnl_get_netdev_features(struct ethnl_params *param);
+int libbpf_ethnl_get_active_bits(struct ethnl_params *param);
+
+#endif /* __LIBBPF_ETHTOOL_H_ */
+
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 6909ee81113a..4f0656716eee 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -41,6 +41,7 @@ enum libbpf_errno {
 	LIBBPF_ERRNO__WRNGPID,	/* Wrong pid in netlink message */
 	LIBBPF_ERRNO__INVSEQ,	/* Invalid netlink sequence */
 	LIBBPF_ERRNO__NLPARSE,	/* netlink parsing error */
+	LIBBPF_ERRNO__INVXDP,	/* Invalid XDP data */
 	__LIBBPF_ERRNO__END,
 };
 
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 4dd73de00b6f..a5344401b842 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -6,6 +6,8 @@
 #include <unistd.h>
 #include <linux/bpf.h>
 #include <linux/rtnetlink.h>
+#include <linux/genetlink.h>
+#include <linux/if.h>
 #include <sys/socket.h>
 #include <errno.h>
 #include <time.h>
@@ -14,6 +16,7 @@
 #include "libbpf.h"
 #include "libbpf_internal.h"
 #include "nlattr.h"
+#include "ethtool.h"
 
 #ifndef SOL_NETLINK
 #define SOL_NETLINK 270
@@ -23,6 +26,11 @@ typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
 
 typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t,
 			      void *cookie);
+struct ethnl_msg {
+	struct nlmsghdr nlh;
+	struct genlmsghdr genlhdr;
+	char msg[BUF_SIZE_4096];
+};
 
 struct xdp_id_md {
 	int ifindex;
@@ -30,7 +38,7 @@ struct xdp_id_md {
 	struct xdp_link_info info;
 };
 
-static int libbpf_netlink_open(__u32 *nl_pid)
+static int libbpf_netlink_open(__u32 *nl_pid, int protocol)
 {
 	struct sockaddr_nl sa;
 	socklen_t addrlen;
@@ -40,7 +48,7 @@ static int libbpf_netlink_open(__u32 *nl_pid)
 	memset(&sa, 0, sizeof(sa));
 	sa.nl_family = AF_NETLINK;
 
-	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+	sock = socket(AF_NETLINK, SOCK_RAW, protocol);
 	if (sock < 0)
 		return -errno;
 
@@ -143,7 +151,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 	} req;
 	__u32 nl_pid = 0;
 
-	sock = libbpf_netlink_open(&nl_pid);
+	sock = libbpf_netlink_open(&nl_pid, NETLINK_ROUTE);
 	if (sock < 0)
 		return sock;
 
@@ -302,7 +310,7 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 	if (flags && flags & mask)
 		return -EINVAL;
 
-	sock = libbpf_netlink_open(&nl_pid);
+	sock = libbpf_netlink_open(&nl_pid, NETLINK_ROUTE);
 	if (sock < 0)
 		return sock;
 
@@ -370,3 +378,366 @@ int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 	return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
 				dump_link_nlmsg, cookie);
 }
+
+static int libbpf_ethtool_parse_feature_strings(struct nlattr *start, int elem,
+						int *xdp, int *xdp_zc)
+{
+	struct nlattr *tbs[__ETHTOOL_A_STRING_CNT + 1];
+	struct nlattr *tab[elem > 0 ? elem : 0];
+	struct libbpf_nla_policy policy[] = {
+		[ETHTOOL_A_STRING_UNSPEC] = {
+		.type = LIBBPF_NLA_UNSPEC,
+		.minlen = 0,
+		.maxlen = 0,
+		},
+		[ETHTOOL_A_STRING_INDEX] = {
+		.type = LIBBPF_NLA_U32,
+		.minlen = sizeof(uint32_t),
+		.maxlen = sizeof(uint32_t),
+		},
+		[ETHTOOL_A_STRING_VALUE] = {
+		.type = LIBBPF_NLA_STRING,
+		.minlen = 1,
+		.maxlen = ETH_GSTRING_LEN,
+		}
+	};
+	const char *f;
+	int n = 0;
+	__u32 v;
+	int ret;
+	int i;
+
+	if (!xdp || !xdp_zc || !start || elem <= 0)
+		return -EINVAL;
+
+	*xdp = -1;
+	*xdp_zc = -1;
+
+	ret = libbpf_nla_parse_table(tab, elem, start, 0, NULL);
+	if (ret)
+		goto cleanup;
+
+	for (i = 0; tab[i] && i < elem; ++i) {
+		ret = libbpf_nla_parse_nested(tbs, __ETHTOOL_A_STRING_CNT, tab[i], policy);
+		if (ret)
+			break;
+
+		if (tbs[ETHTOOL_A_STRING_INDEX] && tbs[ETHTOOL_A_STRING_VALUE]) {
+			f = libbpf_nla_getattr_str(tbs[ETHTOOL_A_STRING_VALUE]);
+			v = libbpf_nla_getattr_u32(tbs[ETHTOOL_A_STRING_INDEX]);
+
+			if (!strncmp(NETDEV_XDP_STR, f, NETDEV_XDP_LEN)) {
+				*xdp = v;
+				n++;
+			}
+
+			if (!strncmp(NETDEV_AF_XDP_ZC_STR, f, NETDEV_AF_XDP_ZC_LEN)) {
+				*xdp_zc = v;
+				n++;
+			}
+		} else {
+			ret = -LIBBPF_ERRNO__NLPARSE;
+			break;
+		}
+	}
+
+cleanup:
+	/* If error occurred return it. */
+	if (ret)
+		return ret;
+
+	/*
+	 * If zero or two xdp flags found that is okay.
+	 * Zero means older kernel without any xdp flags added.
+	 * Two means newer kernel with xdp flags added.
+	 * Both flags were added in single commit, so that
+	 * n == 1 is a faulty value.
+	 */
+	if (n == 2 || n == 0)
+		return 0;
+
+	/* If no error and one or more than 2 xdp flags found return error */
+	return -LIBBPF_ERRNO__INVXDP;
+}
+
+static int libbpf_ethnl_send(int sock, __u32 seq, __u32 nl_pid, struct ethnl_msg *req)
+{
+	ssize_t written;
+
+	req->nlh.nlmsg_pid = nl_pid;
+	req->nlh.nlmsg_seq = seq;
+
+	written = send(sock, req, req->nlh.nlmsg_len, 0);
+	if (written < 0)
+		return -errno;
+
+	if (written == req->nlh.nlmsg_len)
+		return 0;
+	else
+		return -errno;
+}
+
+static int libbpf_ethnl_validate(int len, __u16 fam_id, __u32 nl_pid, __u32 seq,
+				 struct ethnl_msg *req)
+{
+	if (!NLMSG_OK(&req->nlh, (unsigned int)len))
+		return -ENOMSG;
+
+	if (req->nlh.nlmsg_pid != nl_pid)
+		return -LIBBPF_ERRNO__WRNGPID;
+
+	if (req->nlh.nlmsg_seq != seq)
+		return -LIBBPF_ERRNO__INVSEQ;
+
+	if (req->nlh.nlmsg_type != fam_id) {
+		int ret = -ENOMSG;
+
+		if (req->nlh.nlmsg_type == NLMSG_ERROR) {
+			struct nlmsgerr *err = (struct nlmsgerr *)&req->genlhdr;
+
+			if (err->error)
+				ret = err->error;
+			libbpf_nla_dump_errormsg(&req->nlh);
+		}
+		return ret;
+	}
+
+	return 0;
+}
+
+static int libbpf_ethnl_send_recv(struct ethnl_msg *req, struct ethnl_params *param)
+{
+	__u32 nl_pid;
+	__u32 seq;
+	int sock;
+	int ret;
+	int len;
+
+	sock = libbpf_netlink_open(&nl_pid, NETLINK_GENERIC);
+	if (sock < 0) {
+		ret = sock;
+		goto cleanup;
+	}
+
+	seq = time(NULL);
+	ret = libbpf_ethnl_send(sock, seq, nl_pid, req);
+	if (ret)
+		goto cleanup;
+
+	len = recv(sock, req, sizeof(struct ethnl_msg), 0);
+	if (len < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	ret = libbpf_ethnl_validate(len, param->fam_id, nl_pid, seq, req);
+	if (ret < 0)
+		goto cleanup;
+
+	ret = len;
+
+cleanup:
+	if (sock >= 0)
+		close(sock);
+
+	return ret;
+}
+
+int libbpf_ethnl_get_netdev_features(struct ethnl_params *param)
+{
+	struct ethnl_msg req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct genlmsghdr)),
+		.nlh.nlmsg_flags = NLM_F_REQUEST,
+		.nlh.nlmsg_type = param->fam_id,
+		.nlh.nlmsg_pid = 0,
+		.genlhdr.version = ETHTOOL_GENL_VERSION,
+		.genlhdr.cmd = ETHTOOL_MSG_STRSET_GET,
+		.genlhdr.reserved = 0,
+	};
+	struct nlattr *tbn[__ETHTOOL_A_STRINGSETS_CNT + 1];
+	struct nlattr *tbnn[__ETHTOOL_A_STRINGSET_CNT + 1];
+	struct nlattr *tb[__ETHTOOL_A_STRSET_CNT + 1];
+	struct nlattr *nla, *nla_next, *nla_set;
+	int string_set = ETH_SS_FEATURES;
+	int ret;
+	int len;
+
+	memset(&req.msg, 0, BUF_SIZE_4096);
+
+	nla = (struct nlattr *)req.msg;
+	nla_next = libbpf_nla_nest_start(nla, ETHTOOL_A_STRSET_HEADER);
+	nla_next = libbpf_nla_put_str(nla_next, ETHTOOL_A_HEADER_DEV_NAME,
+				      param->ifname, IFNAMSIZ);
+	libbpf_nla_nest_end(nla, nla_next);
+
+	nla = nla_next;
+	nla_set = libbpf_nla_nest_start(nla, ETHTOOL_A_STRSET_STRINGSETS);
+	nla_next = libbpf_nla_nest_start(nla_set, ETHTOOL_A_STRINGSETS_STRINGSET);
+	nla_next = libbpf_nla_put_u32(nla_next, ETHTOOL_A_STRINGSET_ID, string_set);
+	libbpf_nla_nest_end(nla_set, nla_next);
+	libbpf_nla_nest_end(nla, nla_next);
+	if (!param->features)
+		nla_next = libbpf_nla_put_flag(nla_next, ETHTOOL_A_STRSET_COUNTS_ONLY);
+
+	req.nlh.nlmsg_len += libbpf_nla_attrs_length((struct nlattr *)req.msg, nla_next);
+
+	len = libbpf_ethnl_send_recv(&req, param);
+	if (len < 0)
+		return len;
+
+	/* set parsing error, and change if succeeded */
+	ret = -LIBBPF_ERRNO__NLPARSE;
+	nla = (struct nlattr *)req.msg;
+	len = len - NLMSG_HDRLEN - GENL_HDRLEN;
+
+	if (libbpf_nla_parse(tb, __ETHTOOL_A_STRSET_CNT, nla, len, NULL))
+		return ret;
+
+	if (!tb[ETHTOOL_A_STRSET_STRINGSETS])
+		return ret;
+
+	if (libbpf_nla_parse_nested(tbn, __ETHTOOL_A_STRINGSETS_CNT,
+				    tb[ETHTOOL_A_STRSET_STRINGSETS], NULL))
+		return ret;
+
+	if (!tbn[ETHTOOL_A_STRINGSETS_STRINGSET])
+		return ret;
+
+	if (libbpf_nla_parse_nested(tbnn, __ETHTOOL_A_STRINGSET_CNT,
+				    tbn[ETHTOOL_A_STRINGSETS_STRINGSET], NULL))
+		return ret;
+
+	if (param->features == 0) {
+		if (!tbnn[ETHTOOL_A_STRINGSET_COUNT])
+			return ret;
+
+		param->features = libbpf_nla_getattr_u32(tbnn[ETHTOOL_A_STRINGSET_COUNT]);
+
+		/* success */
+		ret = 0;
+	} else if (param->features > 0) {
+		if (!tbnn[ETHTOOL_A_STRINGSET_STRINGS])
+			return ret;
+
+		/*
+		 * Upper boundary is known, but it is input from socket stream.
+		 * Let's perform upper limit check anyway, and limit it up to
+		 * MAX_FEATURES (which is still far more than is actually needed).
+		 */
+		if (param->features > MAX_FEATURES)
+			param->features = MAX_FEATURES;
+
+		/* success if returns 0 */
+		ret = libbpf_ethtool_parse_feature_strings(tbnn[ETHTOOL_A_STRINGSET_STRINGS],
+							   param->features, &param->xdp_idx,
+							   &param->xdp_zc_idx);
+	}
+
+	return ret;
+}
+
+int libbpf_ethnl_get_ethtool_family_id(struct ethnl_params *param)
+{
+	struct ethnl_msg req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct genlmsghdr)),
+		.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		.nlh.nlmsg_type = GENL_ID_CTRL,
+		.nlh.nlmsg_pid = 0,
+		.genlhdr.version = ETHTOOL_GENL_VERSION,
+		.genlhdr.cmd = CTRL_CMD_GETFAMILY,
+		.genlhdr.reserved = 0,
+	};
+	struct nlattr *tb[__CTRL_ATTR_MAX + 1] = {0};
+	struct nlattr *nla, *nla_next;
+	int ret = -1;
+	int len;
+
+	memset(&req.msg, 0, BUF_SIZE_4096);
+	param->fam_id = GENL_ID_CTRL;
+
+	nla = (struct nlattr *)req.msg;
+	nla_next = libbpf_nla_put_str(nla, CTRL_ATTR_FAMILY_NAME, param->nl_family, GENL_NAMSIZ);
+	req.nlh.nlmsg_len += libbpf_nla_attrs_length(nla, nla_next);
+
+	len = libbpf_ethnl_send_recv(&req, param);
+	if (len < 0)
+		return len;
+
+	/* set parsing error, and change if succeeded */
+	ret = -LIBBPF_ERRNO__NLPARSE;
+	len = len - NLMSG_HDRLEN - GENL_HDRLEN;
+	if (!libbpf_nla_parse(tb, __CTRL_ATTR_MAX, nla, len, NULL)) {
+		if (tb[CTRL_ATTR_FAMILY_ID]) {
+			param->fam_id = libbpf_nla_getattr_u16(tb[CTRL_ATTR_FAMILY_ID]);
+			ret = 0;
+		}
+	}
+
+	return ret;
+}
+
+int libbpf_ethnl_get_active_bits(struct ethnl_params *param)
+{
+	struct ethnl_msg req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct genlmsghdr)),
+		.nlh.nlmsg_flags = NLM_F_REQUEST,
+		.nlh.nlmsg_type = param->fam_id,
+		.nlh.nlmsg_pid = 0,
+		.genlhdr.cmd = ETHTOOL_MSG_FEATURES_GET,
+		.genlhdr.version = ETHTOOL_GENL_VERSION,
+		.genlhdr.reserved = 0,
+	};
+	__u32  active[FEATURE_BITS_TO_BLOCKS(param->features)];
+	struct nlattr *tb[__ETHTOOL_A_FEATURES_CNT + 1];
+	struct nlattr *tbn[__ETHTOOL_A_BITSET_CNT + 1];
+	int flags = ETHTOOL_FLAG_COMPACT_BITSETS;
+	struct nlattr *nla, *nla_next;
+	int ret = -1;
+	int len;
+
+	memset(&req.msg, 0, BUF_SIZE_4096);
+
+	nla = (struct nlattr *)req.msg;
+	nla_next = libbpf_nla_nest_start(nla, ETHTOOL_A_FEATURES_HEADER);
+	nla_next = libbpf_nla_put_str(nla_next, ETHTOOL_A_HEADER_DEV_NAME, param->ifname, IFNAMSIZ);
+	nla_next = libbpf_nla_put_u32(nla_next, ETHTOOL_A_HEADER_FLAGS, flags);
+	libbpf_nla_nest_end(nla, nla_next);
+	req.nlh.nlmsg_len += libbpf_nla_attrs_length(nla, nla_next);
+
+	len = libbpf_ethnl_send_recv(&req, param);
+	if (len < 0)
+		return len;
+
+	ret = -LIBBPF_ERRNO__NLPARSE;
+	nla = (struct nlattr *)req.msg;
+	len = len - NLMSG_HDRLEN - GENL_HDRLEN;
+	if (libbpf_nla_parse(tb, __ETHTOOL_A_FEATURES_CNT, nla, len, NULL))
+		return ret;
+
+	if (!tb[ETHTOOL_A_FEATURES_ACTIVE])
+		return ret;
+
+	if (libbpf_nla_parse_nested(tbn, __ETHTOOL_A_BITSET_CNT,
+				    tb[ETHTOOL_A_FEATURES_ACTIVE], NULL))
+		return ret;
+
+	if (!tbn[ETHTOOL_A_BITSET_VALUE])
+		return ret;
+
+	for (unsigned int i = 0; i < FEATURE_BITS_TO_BLOCKS(param->features); ++i)
+		active[i]  = libbpf_nla_getattr_u32(tbn[ETHTOOL_A_BITSET_VALUE] + i);
+
+	/* mark successful parsing */
+	ret = 0;
+	if (FEATURE_BIT_IS_SET(active, param->xdp_idx)) {
+		param->xdp_flags = 1;
+		if (FEATURE_BIT_IS_SET(active, param->xdp_zc_idx))
+			param->xdp_zc_flags = 1;
+	} else {
+		/* zero copy without driver mode makes no sense */
+		if (FEATURE_BIT_IS_SET(active, param->xdp_zc_idx))
+			ret = -LIBBPF_ERRNO__INVXDP;
+	}
+
+	return ret;
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/8] libbpf: add API to get XSK/XDP caps
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
                   ` (5 preceding siblings ...)
  2020-11-16  9:34 ` [PATCH 6/8] libbpf: add functions to get XSK modes alardam
@ 2020-11-16  9:34 ` alardam
  2020-11-16  9:34 ` [PATCH 8/8] samples/bpf/xdp: apply netdev XDP/XSK modes info alardam
  2020-11-16 13:25 ` [PATCH 0/8] New netdev feature flags for XDP Toke Høiland-Jørgensen
  8 siblings, 0 replies; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Add public xsk API to read supported XDP functions of a netdev:
 - XDP driver mode (SKB, DRV),
 - XSK bind mode (COPY, ZEROCOPY).

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 tools/lib/bpf/libbpf.map |  1 +
 tools/lib/bpf/xsk.c      | 51 +++++++++++++++++++++++++++++++++++++++-
 tools/lib/bpf/xsk.h      |  3 ++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 29ff4807b909..0867dd078a65 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -345,4 +345,5 @@ LIBBPF_0.3.0 {
 		btf__parse_split;
 		btf__new_empty_split;
 		btf__new_split;
+		xsk_socket__get_caps;
 } LIBBPF_0.2.0;
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 7951f7ea6db3..433c58fbed68 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -15,7 +15,6 @@
 #include <arpa/inet.h>
 #include <asm/barrier.h>
 #include <linux/compiler.h>
-#include <linux/ethtool.h>
 #include <linux/filter.h>
 #include <linux/if_ether.h>
 #include <linux/if_link.h>
@@ -31,6 +30,7 @@
 #include <sys/types.h>
 
 #include "bpf.h"
+#include "ethtool.h"
 #include "libbpf.h"
 #include "libbpf_internal.h"
 #include "xsk.h"
@@ -931,3 +931,52 @@ void xsk_socket__delete(struct xsk_socket *xsk)
 		close(xsk->fd);
 	free(xsk);
 }
+
+int xsk_socket__get_caps(const char *ifname, __u32 *xdp_caps, __u16 *bind_caps)
+{
+	struct ethnl_params param;
+	int ret;
+
+	if (!xdp_caps || !bind_caps || !ifname ||
+	    (strnlen(ifname, IFNAMSIZ) >= IFNAMSIZ))
+		return -EINVAL;
+
+	param.nl_family = ETHTOOL_GENL_NAME;
+	param.xdp_zc_flags = 0;
+	param.ifname = ifname;
+	param.xdp_flags = 0;
+
+	/* First, get the netlink family id */
+	ret = libbpf_ethnl_get_ethtool_family_id(&param);
+	if (ret)
+		return ret;
+
+	/* Second, get number of features */
+	param.features = 0;
+	ret = libbpf_ethnl_get_netdev_features(&param);
+	if (ret)
+		return ret;
+
+	/* Third, get the features description */
+	ret = libbpf_ethnl_get_netdev_features(&param);
+	if (ret)
+		return ret;
+
+	*xdp_caps  = XDP_FLAGS_SKB_MODE;
+	*bind_caps = XDP_COPY;
+
+	if (param.xdp_idx == -1 || param.xdp_zc_idx == -1)
+		return 0;
+
+	/* Finally, get features flags and process it */
+	ret = libbpf_ethnl_get_active_bits(&param);
+	if (!ret) {
+		if (param.xdp_flags) {
+			*xdp_caps |= XDP_FLAGS_DRV_MODE;
+			if (param.xdp_zc_flags)
+				*bind_caps |= XDP_ZEROCOPY;
+		}
+	}
+
+	return ret;
+}
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 1069c46364ff..ae29004570b2 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -247,7 +247,8 @@ xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
 /* Returns 0 for success and -EBUSY if the umem is still in use. */
 LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem);
 LIBBPF_API void xsk_socket__delete(struct xsk_socket *xsk);
-
+LIBBPF_API int xsk_socket__get_caps(const char *ifname, __u32 *xdp_caps,
+				    __u16 *bind_caps);
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 8/8] samples/bpf/xdp: apply netdev XDP/XSK modes info
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
                   ` (6 preceding siblings ...)
  2020-11-16  9:34 ` [PATCH 7/8] libbpf: add API to get XSK/XDP caps alardam
@ 2020-11-16  9:34 ` alardam
  2020-11-16 13:25 ` [PATCH 0/8] New netdev feature flags for XDP Toke Høiland-Jørgensen
  8 siblings, 0 replies; 14+ messages in thread
From: alardam @ 2020-11-16  9:34 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba, ast, daniel,
	netdev, davem, john.fastabend, hawk, toke
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

From: Marek Majtyka <marekx.majtyka@intel.com>

Update xdpsock sample so that it utilizes netlink ethtool interface
to get available XDP/XSK modes. This allows to automatically choose
the best available mode of operation, if these are not provided explicitly.

Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
---
 samples/bpf/xdpsock_user.c | 117 ++++++++++++++++++++++++++++++++++---
 1 file changed, 108 insertions(+), 9 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 1149e94ca32f..780e5d1d73a0 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -53,6 +53,9 @@
 
 #define DEBUG_HEXDUMP 0
 
+#define XDP_MODES		(XDP_FLAGS_SKB_MODE | XDP_FLAGS_DRV_MODE)
+#define XSK_MODES		(XDP_COPY | XDP_ZEROCOPY)
+
 typedef __u64 u64;
 typedef __u32 u32;
 typedef __u16 u16;
@@ -86,7 +89,7 @@ static u32 irq_no;
 static int irqs_at_init = -1;
 static int opt_poll;
 static int opt_interval = 1;
-static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
+static u16 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
 static u32 opt_umem_flags;
 static int opt_unaligned_chunks;
 static int opt_mmap_flags;
@@ -95,6 +98,8 @@ static int opt_timeout = 1000;
 static bool opt_need_wakeup = true;
 static u32 opt_num_xsks = 1;
 static u32 prog_id;
+static u32 xdp_caps;
+static u16 bind_caps;
 
 struct xsk_ring_stats {
 	unsigned long rx_npkts;
@@ -957,6 +962,26 @@ static void usage(const char *prog)
 	exit(EXIT_FAILURE);
 }
 
+static inline void set_drv_mode(void)
+{
+	opt_xdp_flags |= XDP_FLAGS_DRV_MODE;
+}
+
+static inline void set_skb_mode(void)
+{
+	opt_xdp_flags |= XDP_FLAGS_SKB_MODE;
+}
+
+static inline void set_zc_mode(void)
+{
+	opt_xdp_bind_flags |= XDP_ZEROCOPY;
+}
+
+static inline void set_copy_mode(void)
+{
+	opt_xdp_bind_flags |= XDP_COPY;
+}
+
 static void parse_command_line(int argc, char **argv)
 {
 	int option_index, c;
@@ -989,20 +1014,19 @@ static void parse_command_line(int argc, char **argv)
 			opt_poll = 1;
 			break;
 		case 'S':
-			opt_xdp_flags |= XDP_FLAGS_SKB_MODE;
-			opt_xdp_bind_flags |= XDP_COPY;
+			set_skb_mode();
 			break;
 		case 'N':
-			/* default, set below */
+			set_drv_mode();
 			break;
 		case 'n':
 			opt_interval = atoi(optarg);
 			break;
 		case 'z':
-			opt_xdp_bind_flags |= XDP_ZEROCOPY;
+			set_zc_mode();
 			break;
 		case 'c':
-			opt_xdp_bind_flags |= XDP_COPY;
+			set_copy_mode();
 			break;
 		case 'u':
 			opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNK_FLAG;
@@ -1069,9 +1093,6 @@ static void parse_command_line(int argc, char **argv)
 		}
 	}
 
-	if (!(opt_xdp_flags & XDP_FLAGS_SKB_MODE))
-		opt_xdp_flags |= XDP_FLAGS_DRV_MODE;
-
 	opt_ifindex = if_nametoindex(opt_if);
 	if (!opt_ifindex) {
 		fprintf(stderr, "ERROR: interface \"%s\" does not exist\n",
@@ -1461,6 +1482,76 @@ static void enter_xsks_into_map(struct bpf_object *obj)
 	}
 }
 
+static inline u32 xdp_mode_not_set(void)
+{
+	return (opt_xdp_flags & XDP_MODES) == 0;
+}
+
+static inline u16 bind_mode_not_set(void)
+{
+	return (opt_xdp_bind_flags & XSK_MODES) == 0;
+}
+
+static inline u16 zc_mode_set(void)
+{
+	return opt_xdp_bind_flags & XDP_ZEROCOPY;
+}
+
+static inline u32 drv_mode_set(void)
+{
+	return opt_xdp_flags & XDP_FLAGS_DRV_MODE;
+}
+
+static inline u16 zc_mode_available(void)
+{
+	return bind_caps & XDP_ZEROCOPY;
+}
+
+static inline u32 drv_mode_available(void)
+{
+	return xdp_caps & XDP_FLAGS_DRV_MODE;
+}
+
+static void set_xsk_default_flags(void)
+{
+	if (drv_mode_available()) {
+		set_drv_mode();
+
+		if (zc_mode_available())
+			set_zc_mode();
+		else
+			set_copy_mode();
+	} else {
+		set_skb_mode();
+		set_copy_mode();
+	}
+}
+
+static void adjust_missing_flags(void)
+{
+	if (xdp_mode_not_set()) {
+		if (bind_mode_not_set()) {
+			set_xsk_default_flags();
+		} else {
+			if (zc_mode_set()) {
+				set_drv_mode();
+			} else {
+				if (drv_mode_available())
+					set_drv_mode();
+				else
+					set_skb_mode();
+			}
+		}
+	} else {
+		if (bind_mode_not_set()) {
+			if (drv_mode_set() && zc_mode_available())
+				set_zc_mode();
+			else
+				set_copy_mode();
+		}
+	}
+}
+
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
@@ -1473,6 +1564,14 @@ int main(int argc, char **argv)
 
 	parse_command_line(argc, argv);
 
+	ret = xsk_socket__get_caps(opt_if, &xdp_caps, &bind_caps);
+	if (ret) {
+		fprintf(stderr, "ERROR: xsk_socket__get_caps\n");
+		exit(EXIT_FAILURE);
+	}
+
+	adjust_missing_flags();
+
 	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
 		fprintf(stderr, "ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
 			strerror(errno));
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/8] New netdev feature flags for XDP
  2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
                   ` (7 preceding siblings ...)
  2020-11-16  9:34 ` [PATCH 8/8] samples/bpf/xdp: apply netdev XDP/XSK modes info alardam
@ 2020-11-16 13:25 ` Toke Høiland-Jørgensen
  2020-11-17  7:37   ` [Intel-wired-lan] " Magnus Karlsson
  8 siblings, 1 reply; 14+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-11-16 13:25 UTC (permalink / raw)
  To: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, daniel, netdev, davem, john.fastabend, hawk
  Cc: maciej.fijalkowski, jonathan.lemon, bpf, jeffrey.t.kirsher,
	maciejromanfijalkowski, intel-wired-lan, Marek Majtyka

alardam@gmail.com writes:

> From: Marek Majtyka <marekx.majtyka@intel.com>
>
> Implement support for checking if a netdev has native XDP and AF_XDP zero
> copy support. Previously, there was no way to do this other than to try
> to create an AF_XDP socket on the interface or load an XDP program and
> see if it worked. This commit changes this by extending existing
> netdev_features in the following way:
>  * xdp        - full XDP support (XDP_{TX, PASS, DROP, ABORT, REDIRECT})
>  * af-xdp-zc  - AF_XDP zero copy support
> NICs supporting these features are updated by turning the corresponding
> netdev feature flags on.

Thank you for working on this! The lack of a way to discover whether an
interface supports XDP is really annoying.

However, I don't think just having two separate netdev feature flags for
XDP and AF_XDP is going to cut it. Whatever mechanism we end up will
need to be able to express at least the following, in addition to your
two flags:

- Which return codes does it support (with DROP/PASS, TX and REDIRECT as
  separate options)?
- Does this interface be used as a target for XDP_REDIRECT
  (supported/supported but not enabled)?
- Does the interface support offloaded XDP?

That's already five or six more flags, and we can't rule out that we'll
need more; so I'm not sure if just defining feature bits for all of them
is a good idea.

In addition, we should be able to check this in a way so we can reject
XDP programs that use features that are not supported. E.g., program
uses REDIRECT return code (or helper), but the interface doesn't support
it? Reject at attach/load time! Or the user attempts to insert an
interface into a redirect map, but that interface doesn't implement
ndo_xdp_xmit()? Reject the insert! Etc.

That last bit can be added later, of course, but we need to make sure we
design the support in a way that it is possible to do so...

-Toke


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 6/8] libbpf: add functions to get XSK modes
  2020-11-16  9:34 ` [PATCH 6/8] libbpf: add functions to get XSK modes alardam
@ 2020-11-17  7:05   ` kernel test robot
  0 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2020-11-17  7:05 UTC (permalink / raw)
  To: alardam, magnus.karlsson, bjorn.topel, andrii.nakryiko, kuba,
	ast, daniel, netdev, davem, john.fastabend, hawk
  Cc: kbuild-all, clang-built-linux

[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]
[also build test ERROR on bpf/master jkirsher-next-queue/dev-queue linus/master v5.10-rc4 next-20201116]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/alardam-gmail-com/New-netdev-feature-flags-for-XDP/20201116-173655
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-a015-20201116 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project ace9653c11c6308401dcda2e8b26bf97e6e66e30)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/fd9c1373aeb0b1aad73c9660e2d8457d3cfed55f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review alardam-gmail-com/New-netdev-feature-flags-for-XDP/20201116-173655
        git checkout fd9c1373aeb0b1aad73c9660e2d8457d3cfed55f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from netlink.c:19:
>> ./ethtool.h:12:10: fatal error: 'linux/ethtool_netlink.h' file not found
   #include <linux/ethtool_netlink.h>
            ^~~~~~~~~~~~~~~~~~~~~~~~~
   1 error generated.
   make[6]: *** [tools/build/Makefile.build:97: kernel/bpf/preload/staticobjs/netlink.o] Error 1
   make[6]: Target '__build' not remade because of errors.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33138 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/8] New netdev feature flags for XDP
  2020-11-16 13:25 ` [PATCH 0/8] New netdev feature flags for XDP Toke Høiland-Jørgensen
@ 2020-11-17  7:37   ` Magnus Karlsson
  2020-11-17  8:55     ` Marek Majtyka
  0 siblings, 1 reply; 14+ messages in thread
From: Magnus Karlsson @ 2020-11-17  7:37 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Marek Majtyka, Karlsson, Magnus, Björn Töpel,
	Andrii Nakryiko, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Network Development, David S. Miller,
	John Fastabend, hawk, Maciej Fijalkowski, Marek Majtyka,
	intel-wired-lan, Jonathan Lemon, bpf

On Mon, Nov 16, 2020 at 2:25 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> alardam@gmail.com writes:
>
> > From: Marek Majtyka <marekx.majtyka@intel.com>
> >
> > Implement support for checking if a netdev has native XDP and AF_XDP zero
> > copy support. Previously, there was no way to do this other than to try
> > to create an AF_XDP socket on the interface or load an XDP program and
> > see if it worked. This commit changes this by extending existing
> > netdev_features in the following way:
> >  * xdp        - full XDP support (XDP_{TX, PASS, DROP, ABORT, REDIRECT})
> >  * af-xdp-zc  - AF_XDP zero copy support
> > NICs supporting these features are updated by turning the corresponding
> > netdev feature flags on.
>
> Thank you for working on this! The lack of a way to discover whether an
> interface supports XDP is really annoying.
>
> However, I don't think just having two separate netdev feature flags for
> XDP and AF_XDP is going to cut it. Whatever mechanism we end up will
> need to be able to express at least the following, in addition to your
> two flags:
>
> - Which return codes does it support (with DROP/PASS, TX and REDIRECT as
>   separate options)?
> - Does this interface be used as a target for XDP_REDIRECT
>   (supported/supported but not enabled)?
> - Does the interface support offloaded XDP?

If we want feature discovery on this level, which seems to be a good
idea and goal to have, then it is a dead end to bunch all XDP features
into one. But fortunately, this can easily be addressed.

> That's already five or six more flags, and we can't rule out that we'll
> need more; so I'm not sure if just defining feature bits for all of them
> is a good idea.

I think this is an important question. Is extending the netdev
features flags the right way to go? If not, is there some other
interface in the kernel that could be used/extended for this? If none
of these are possible, then we (unfortunately) need a new interface
and in that case, what should it look like?

Thanks for taking a look at this Toke.

> In addition, we should be able to check this in a way so we can reject
> XDP programs that use features that are not supported. E.g., program
> uses REDIRECT return code (or helper), but the interface doesn't support
> it? Reject at attach/load time! Or the user attempts to insert an
> interface into a redirect map, but that interface doesn't implement
> ndo_xdp_xmit()? Reject the insert! Etc.
>
> That last bit can be added later, of course, but we need to make sure we
> design the support in a way that it is possible to do so...
>
> -Toke
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/8] New netdev feature flags for XDP
  2020-11-17  7:37   ` [Intel-wired-lan] " Magnus Karlsson
@ 2020-11-17  8:55     ` Marek Majtyka
  2020-11-17 18:38       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Majtyka @ 2020-11-17  8:55 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Toke Høiland-Jørgensen, Karlsson, Magnus,
	Björn Töpel, Andrii Nakryiko, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Network Development,
	David S. Miller, John Fastabend, hawk, Maciej Fijalkowski,
	Marek Majtyka, intel-wired-lan, Jonathan Lemon, bpf

On Tue, Nov 17, 2020 at 8:37 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:

Thank you for your quick answers and comments.

>
> On Mon, Nov 16, 2020 at 2:25 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > alardam@gmail.com writes:
> >
> > > From: Marek Majtyka <marekx.majtyka@intel.com>
> > >
> > > Implement support for checking if a netdev has native XDP and AF_XDP zero
> > > copy support. Previously, there was no way to do this other than to try
> > > to create an AF_XDP socket on the interface or load an XDP program and
> > > see if it worked. This commit changes this by extending existing
> > > netdev_features in the following way:
> > >  * xdp        - full XDP support (XDP_{TX, PASS, DROP, ABORT, REDIRECT})
> > >  * af-xdp-zc  - AF_XDP zero copy support
> > > NICs supporting these features are updated by turning the corresponding
> > > netdev feature flags on.
> >
> > Thank you for working on this! The lack of a way to discover whether an
> > interface supports XDP is really annoying.
> >
> > However, I don't think just having two separate netdev feature flags for
> > XDP and AF_XDP is going to cut it. Whatever mechanism we end up will
> > need to be able to express at least the following, in addition to your
> > two flags:
> >
> > - Which return codes does it support (with DROP/PASS, TX and REDIRECT as
> >   separate options)?
> > - Does this interface be used as a target for XDP_REDIRECT
> >   (supported/supported but not enabled)?
> > - Does the interface support offloaded XDP?
>
> If we want feature discovery on this level, which seems to be a good
> idea and goal to have, then it is a dead end to bunch all XDP features
> into one. But fortunately, this can easily be addressed.

Do you think that is it still considerable to have a single netdev
flag that means "some" XDP feature support which would activate new
further functionalities?

>
> > That's already five or six more flags, and we can't rule out that we'll
> > need more; so I'm not sure if just defining feature bits for all of them
> > is a good idea.
>
> I think this is an important question. Is extending the netdev
> features flags the right way to go? If not, is there some other
> interface in the kernel that could be used/extended for this? If none
> of these are possible, then we (unfortunately) need a new interface
> and in that case, what should it look like?

Toke, are you thinking about any particular existing interface or a
new specific one?

>
> Thanks for taking a look at this Toke.
>
> > In addition, we should be able to check this in a way so we can reject
> > XDP programs that use features that are not supported. E.g., program
> > uses REDIRECT return code (or helper), but the interface doesn't support
> > it? Reject at attach/load time! Or the user attempts to insert an
> > interface into a redirect map, but that interface doesn't implement
> > ndo_xdp_xmit()? Reject the insert! Etc.
> >
> > That last bit can be added later, of course, but we need to make sure we
> > design the support in a way that it is possible to do so...
> >
> > -Toke
> >
> > _______________________________________________
> > Intel-wired-lan mailing list
> > Intel-wired-lan@osuosl.org
> > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/8] New netdev feature flags for XDP
  2020-11-17  8:55     ` Marek Majtyka
@ 2020-11-17 18:38       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 14+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-11-17 18:38 UTC (permalink / raw)
  To: Marek Majtyka, Magnus Karlsson
  Cc: Karlsson, Magnus, Björn Töpel, Andrii Nakryiko,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Network Development, David S. Miller, John Fastabend, hawk,
	Maciej Fijalkowski, Marek Majtyka, intel-wired-lan,
	Jonathan Lemon, bpf

Marek Majtyka <alardam@gmail.com> writes:

> On Tue, Nov 17, 2020 at 8:37 AM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
>
> Thank you for your quick answers and comments.
>
>>
>> On Mon, Nov 16, 2020 at 2:25 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >
>> > alardam@gmail.com writes:
>> >
>> > > From: Marek Majtyka <marekx.majtyka@intel.com>
>> > >
>> > > Implement support for checking if a netdev has native XDP and AF_XDP zero
>> > > copy support. Previously, there was no way to do this other than to try
>> > > to create an AF_XDP socket on the interface or load an XDP program and
>> > > see if it worked. This commit changes this by extending existing
>> > > netdev_features in the following way:
>> > >  * xdp        - full XDP support (XDP_{TX, PASS, DROP, ABORT, REDIRECT})
>> > >  * af-xdp-zc  - AF_XDP zero copy support
>> > > NICs supporting these features are updated by turning the corresponding
>> > > netdev feature flags on.
>> >
>> > Thank you for working on this! The lack of a way to discover whether an
>> > interface supports XDP is really annoying.
>> >
>> > However, I don't think just having two separate netdev feature flags for
>> > XDP and AF_XDP is going to cut it. Whatever mechanism we end up will
>> > need to be able to express at least the following, in addition to your
>> > two flags:
>> >
>> > - Which return codes does it support (with DROP/PASS, TX and REDIRECT as
>> >   separate options)?
>> > - Does this interface be used as a target for XDP_REDIRECT
>> >   (supported/supported but not enabled)?
>> > - Does the interface support offloaded XDP?
>>
>> If we want feature discovery on this level, which seems to be a good
>> idea and goal to have, then it is a dead end to bunch all XDP features
>> into one. But fortunately, this can easily be addressed.
>
> Do you think that is it still considerable to have a single netdev
> flag that means "some" XDP feature support which would activate new
> further functionalities?

Why bother? The presence of any XDP-specific feature bits would imply
the support for XDP :)

>> > That's already five or six more flags, and we can't rule out that we'll
>> > need more; so I'm not sure if just defining feature bits for all of them
>> > is a good idea.
>>
>> I think this is an important question. Is extending the netdev
>> features flags the right way to go? If not, is there some other
>> interface in the kernel that could be used/extended for this? If none
>> of these are possible, then we (unfortunately) need a new interface
>> and in that case, what should it look like?
>
> Toke, are you thinking about any particular existing interface or a
> new specific one?

I have mostly been thinking about the internal kernel interface. The
simple thing would just be to define a whole new bitmap of XDP-specific
feature bits that the rest of the kernel can consume. That would also
mean we don't have to do pointer chasing to see if the ndos are
implemented, which Jesper pointed out the other day actually shows up on
his profiling traces.

The downside to having them be feature flags is that they can get out of
sync, of course. But if we block the support from working unless the
right flags are set, that should at least make driver developers pay
attention. Although we'd have to change all the drivers in one go, but I
suppose that's not too onerous seeing as you just did that for this
series :)

So what that boils down to is basically what you're doing in this
series, but more fine grained, via a new netdev->xdp_features instead of
burning bits in netdev->features.

As for UAPI, i dunno? Ethtool is netlink now, right? So it should be
fairly easy to just extend with a new attribute for XDP?

I believe there was originally some resistance to explicitly exposing
XDP capabilities to userspace because we wanted all drivers to implement
all features. Clearly that has not panned out, though, so as far as I'm
concerned we can just expose it and be done with it :) But I'll let
others weigh in here; the original discussions predate my involvement.

-Toke


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-11-17 18:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-16  9:34 [PATCH 0/8] New netdev feature flags for XDP alardam
2020-11-16  9:34 ` [PATCH 1/8] net: ethtool: extend netdev_features flag set alardam
2020-11-16  9:34 ` [PATCH 2/8] drivers/net: turn XDP flags on alardam
2020-11-16  9:34 ` [PATCH 3/8] xsk: add usage of xdp netdev_features flags alardam
2020-11-16  9:34 ` [PATCH 4/8] xsk: add check for full support of XDP in bind alardam
2020-11-16  9:34 ` [PATCH 5/8] libbpf: extend netlink attribute API alardam
2020-11-16  9:34 ` [PATCH 6/8] libbpf: add functions to get XSK modes alardam
2020-11-17  7:05   ` kernel test robot
2020-11-16  9:34 ` [PATCH 7/8] libbpf: add API to get XSK/XDP caps alardam
2020-11-16  9:34 ` [PATCH 8/8] samples/bpf/xdp: apply netdev XDP/XSK modes info alardam
2020-11-16 13:25 ` [PATCH 0/8] New netdev feature flags for XDP Toke Høiland-Jørgensen
2020-11-17  7:37   ` [Intel-wired-lan] " Magnus Karlsson
2020-11-17  8:55     ` Marek Majtyka
2020-11-17 18:38       ` Toke Høiland-Jørgensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).