netdev.vger.kernel.org archive mirror
* [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests
@ 2022-06-16 18:05 Maciej Fijalkowski
  2022-06-16 18:06 ` [PATCH v4 bpf-next 01/10] ice: compress branches in ice_set_features() Maciej Fijalkowski
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:05 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

v3->v4:
* use ice_{down,up} rather than ice_{stop,open} and check retvals
  when toggling loopback mode (Jakub)
* Remove patch that was throwing away xsk->outstanding_tx (Magnus)

v2->v3:
* split the loopback patch into an ice_set_features() refactor part and
  a part with the actual loopback toggling support (Alexandr)
* collect more acks from Magnus

v1->v2:
* collect acks from Magnus
* drop redundant 'ret' variable in patch 4 (Magnus)
* drop redundant assignments to ifobject->xdp_flags in patch 10 (Magnus)
* use NETIF_F_LOOPBACK instead of introducing priv-flag (Alexandr)

Hi!

This set makes it possible to test the ice driver with the test suite
that xdpxceiver provides. In order to do that, the device needs to be
put into loopback mode, so the first patch refactors ice_set_features()
and the second patch adds NETIF_F_LOOPBACK support to it. Furthermore,
ethtool's loopback test is fixed for ice (patches 3 and 4). Besides
turning the device into loopback mode, I limited the queue count to 1
so that no steering filters are needed.
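
For reference, limiting the queue count is a manual step; it can be
done with ethtool (assumed invocation, the test suite does not
reconfigure channels by itself):

$ ethtool -L <iface> combined 1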

The rest of the work is related to xdpxceiver. Currently it is assumed
that the underlying device supports native XDP. That is true for veth,
but might not be true for other devices used with xdpxceiver once this
patch set lands. So, patch 5 adds logic to find out whether the
underlying device supports XDP, so that TEST_MODE_DRV can be used by
the test suite.
Patch 6 restores a close() on the netns fd that was removed by
accident.

In patch 7, a default Rx pkt stream is added so that physical device
testing can use a shared UMEM, with Tx utilizing the first half of the
buffer space and Rx the second. Then, patch 8 adds support for running
xdpxceiver on physical devices.

Patch 9 prepares xdpxceiver for ZC drivers that clean Tx descriptors
lazily (and therefore produce XSK descs to the CQ late) so that the
pacing algorithm keeps working.

Patch 10 finally adds a new TEST_MODE_ZC for testing zero copy AF_XDP
driver support.

This work has already allowed us to spot and fix two bugs on the
AF_XDP kernel side ([0], [1]).

v1 is here [2].
v2 is here [3].
v3 is here [4].

[0]: https://lore.kernel.org/bpf/20220425153745.481322-1-maciej.fijalkowski@intel.com/
[1]: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com/
[2]: https://lore.kernel.org/bpf/20220610150923.583202-1-maciej.fijalkowski@intel.com/
[3]: https://lore.kernel.org/bpf/20220614174749.901044-1-maciej.fijalkowski@intel.com/
[4]: https://lore.kernel.org/bpf/20220615161041.902916-1-maciej.fijalkowski@intel.com/

Thank you.


Maciej Fijalkowski (10):
  ice: compress branches in ice_set_features()
  ice: allow toggling loopback mode via ndo_set_features callback
  ice: check DD bit on Rx descriptor rather than (EOP | RS)
  ice: do not setup vlan for loopback VSI
  selftests: xsk: query for native XDP support
  selftests: xsk: add missing close() on netns fd
  selftests: xsk: introduce default Rx pkt stream
  selftests: xsk: add support for executing tests on physical device
  selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion()
  selftests: xsk: add support for zero copy testing

 drivers/net/ethernet/intel/ice/ice_ethtool.c |   2 +-
 drivers/net/ethernet/intel/ice/ice_main.c    |  81 ++--
 tools/testing/selftests/bpf/test_xsk.sh      |  52 ++-
 tools/testing/selftests/bpf/xdpxceiver.c     | 380 ++++++++++++++-----
 tools/testing/selftests/bpf/xdpxceiver.h     |   9 +-
 5 files changed, 378 insertions(+), 146 deletions(-)

-- 
2.27.0



* [PATCH v4 bpf-next 01/10] ice: compress branches in ice_set_features()
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-18  1:46   ` John Fastabend
  2022-06-16 18:06 ` [PATCH v4 bpf-next 02/10] ice: allow toggling loopback mode via ndo_set_features callback Maciej Fijalkowski
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski,
	Alexandr Lobakin

Instead of a rather verbose comparison of the current netdev->features
bits against the incoming ones from the user, let us compress them with
a helper feature set that is the result of netdev->features XOR
features. This way, the current, extensive branches:

	if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT))
		set_feature(true);
	else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT)
		set_feature(false);

can become:

	netdev_features_t changed = netdev->features ^ features;

	if (changed & NETIF_F_BIT)
		set_feature(!!(features & NETIF_F_BIT));

This is nothing new, as several other drivers already use this
approach, which I find much more convenient.

CC: Alexandr Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 40 +++++++++++------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index e1cae253412c..23d1b1fc39fb 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5910,44 +5910,41 @@ ice_set_vlan_features(struct net_device *netdev, netdev_features_t features)
 static int
 ice_set_features(struct net_device *netdev, netdev_features_t features)
 {
+	netdev_features_t changed = netdev->features ^ features;
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 	struct ice_vsi *vsi = np->vsi;
 	struct ice_pf *pf = vsi->back;
 	int ret = 0;
 
 	/* Don't set any netdev advanced features with device in Safe Mode */
-	if (ice_is_safe_mode(vsi->back)) {
-		dev_err(ice_pf_to_dev(vsi->back), "Device is in Safe Mode - not enabling advanced netdev features\n");
+	if (ice_is_safe_mode(pf)) {
+		dev_err(ice_pf_to_dev(vsi->back),
+			"Device is in Safe Mode - not enabling advanced netdev features\n");
 		return ret;
 	}
 
 	/* Do not change setting during reset */
 	if (ice_is_reset_in_progress(pf->state)) {
-		dev_err(ice_pf_to_dev(vsi->back), "Device is resetting, changing advanced netdev features temporarily unavailable.\n");
+		dev_err(ice_pf_to_dev(pf),
+			"Device is resetting, changing advanced netdev features temporarily unavailable.\n");
 		return -EBUSY;
 	}
 
 	/* Multiple features can be changed in one call so keep features in
 	 * separate if/else statements to guarantee each feature is checked
 	 */
-	if (features & NETIF_F_RXHASH && !(netdev->features & NETIF_F_RXHASH))
-		ice_vsi_manage_rss_lut(vsi, true);
-	else if (!(features & NETIF_F_RXHASH) &&
-		 netdev->features & NETIF_F_RXHASH)
-		ice_vsi_manage_rss_lut(vsi, false);
+	if (changed & NETIF_F_RXHASH)
+		ice_vsi_manage_rss_lut(vsi, !!(features & NETIF_F_RXHASH));
 
 	ret = ice_set_vlan_features(netdev, features);
 	if (ret)
 		return ret;
 
-	if ((features & NETIF_F_NTUPLE) &&
-	    !(netdev->features & NETIF_F_NTUPLE)) {
-		ice_vsi_manage_fdir(vsi, true);
-		ice_init_arfs(vsi);
-	} else if (!(features & NETIF_F_NTUPLE) &&
-		 (netdev->features & NETIF_F_NTUPLE)) {
-		ice_vsi_manage_fdir(vsi, false);
-		ice_clear_arfs(vsi);
+	if (changed & NETIF_F_NTUPLE) {
+		bool ena = !!(features & NETIF_F_NTUPLE);
+
+		ice_vsi_manage_fdir(vsi, ena);
+		ena ? ice_init_arfs(vsi) : ice_clear_arfs(vsi);
 	}
 
 	/* don't turn off hw_tc_offload when ADQ is already enabled */
@@ -5956,11 +5953,12 @@ ice_set_features(struct net_device *netdev, netdev_features_t features)
 		return -EACCES;
 	}
 
-	if ((features & NETIF_F_HW_TC) &&
-	    !(netdev->features & NETIF_F_HW_TC))
-		set_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
-	else
-		clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
+	if (changed & NETIF_F_HW_TC) {
+		bool ena = !!(features & NETIF_F_HW_TC);
+
+		ena ? set_bit(ICE_FLAG_CLS_FLOWER, pf->flags) :
+		      clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
+	}
 
 	return 0;
 }
-- 
2.27.0



* [PATCH v4 bpf-next 02/10] ice: allow toggling loopback mode via ndo_set_features callback
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
  2022-06-16 18:06 ` [PATCH v4 bpf-next 01/10] ice: compress branches in ice_set_features() Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-16 18:21   ` Jakub Kicinski
  2022-06-18  1:57   ` John Fastabend
  2022-06-16 18:06 ` [PATCH v4 bpf-next 03/10] ice: check DD bit on Rx descriptor rather than (EOP | RS) Maciej Fijalkowski
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski,
	Alexandr Lobakin

Add support for NETIF_F_LOOPBACK. This feature can be set via:
$ ethtool -K eth0 loopback <on|off>

The feature can be useful for local data path tests.
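
For readers outside ice, a minimal sketch of the generic pattern
(hypothetical foo_* names, not taken from this patch): advertise the
bit in hw_features and react to it in the ndo_set_features callback.

	static int foo_set_features(struct net_device *netdev,
				    netdev_features_t features)
	{
		netdev_features_t changed = netdev->features ^ features;

		/* act only when the loopback bit actually flipped */
		if (changed & NETIF_F_LOOPBACK)
			return foo_set_loopback(netdev,
						!!(features & NETIF_F_LOOPBACK));
		return 0;
	}

The ice implementation below follows this shape, additionally taking
the interface down and back up around the MAC loopback admin queue
call.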

CC: Alexandr Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 33 ++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 23d1b1fc39fb..5bdd515142ec 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3358,6 +3358,7 @@ static void ice_set_netdev_features(struct net_device *netdev)
 	netdev->features |= netdev->hw_features;
 
 	netdev->hw_features |= NETIF_F_HW_TC;
+	netdev->hw_features |= NETIF_F_LOOPBACK;
 
 	/* encap and VLAN devices inherit default, csumo and tso features */
 	netdev->hw_enc_features |= dflt_features | csumo_features |
@@ -5902,6 +5903,33 @@ ice_set_vlan_features(struct net_device *netdev, netdev_features_t features)
 	return 0;
 }
 
+/**
+ * ice_set_loopback - turn on/off loopback mode on underlying PF
+ * @vsi: ptr to VSI
+ * @ena: flag to indicate the on/off setting
+ */
+static int
+ice_set_loopback(struct ice_vsi *vsi, bool ena)
+{
+	bool if_running = netif_running(vsi->netdev);
+	int ret;
+
+	if (if_running && !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) {
+		ret = ice_down(vsi);
+		if (ret) {
+			netdev_err(vsi->netdev, "Preparing device to toggle loopback failed\n");
+			return ret;
+		}
+	}
+	ret = ice_aq_set_mac_loopback(&vsi->back->hw, ena, NULL);
+	if (ret)
+		netdev_err(vsi->netdev, "Failed to toggle loopback state\n");
+	if (if_running)
+		ret = ice_up(vsi);
+
+	return ret;
+}
+
 /**
  * ice_set_features - set the netdev feature flags
  * @netdev: ptr to the netdev being adjusted
@@ -5960,7 +5988,10 @@ ice_set_features(struct net_device *netdev, netdev_features_t features)
 		      clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
 	}
 
-	return 0;
+	if (changed & NETIF_F_LOOPBACK)
+		ret = ice_set_loopback(vsi, !!(features & NETIF_F_LOOPBACK));
+
+	return ret;
 }
 
 /**
-- 
2.27.0



* [PATCH v4 bpf-next 03/10] ice: check DD bit on Rx descriptor rather than (EOP | RS)
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
  2022-06-16 18:06 ` [PATCH v4 bpf-next 01/10] ice: compress branches in ice_set_features() Maciej Fijalkowski
  2022-06-16 18:06 ` [PATCH v4 bpf-next 02/10] ice: allow toggling loopback mode via ndo_set_features callback Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-18  1:58   ` John Fastabend
  2022-06-16 18:06 ` [PATCH v4 bpf-next 04/10] ice: do not setup vlan for loopback VSI Maciej Fijalkowski
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

The Tx side sets the EOP and RS bits on descriptors to indicate that a
particular descriptor is the last one and needs to generate an irq once
it has been sent. These bits should not be checked on the completion
path, regardless of whether it is Tx or Rx. The DD bit serves this
purpose: it indicates that a particular descriptor either carries an Rx
packet or was successfully Txed.

Look at the DD bit being set in ice_lbtest_receive_frames() instead of
the EOP and RS pair.
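
For illustration, here is why the old check could appear to work at
all; the bit values below are assumptions of this note, based on the
ice descriptor layout, and are not part of the patch:

	/* ICE_TX_DESC_CMD_EOP | ICE_TX_DESC_CMD_RS == 0x3 (bits 0-1),
	 * while BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S) == 0x1 (bit 0).
	 * The Tx command bits thus aliased the DD (and EOF) positions in
	 * the Rx flex descriptor status word, so the old test passed by
	 * accident rather than by design.
	 */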

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 1e71b70f0e52..b6275a29fa0d 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -658,7 +658,7 @@ static int ice_lbtest_receive_frames(struct ice_rx_ring *rx_ring)
 		rx_desc = ICE_RX_DESC(rx_ring, i);
 
 		if (!(rx_desc->wb.status_error0 &
-		    cpu_to_le16(ICE_TX_DESC_CMD_EOP | ICE_TX_DESC_CMD_RS)))
+		    cpu_to_le16(BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S))))
 			continue;
 
 		rx_buf = &rx_ring->rx_buf[i];
-- 
2.27.0



* [PATCH v4 bpf-next 04/10] ice: do not setup vlan for loopback VSI
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
                   ` (2 preceding siblings ...)
  2022-06-16 18:06 ` [PATCH v4 bpf-next 03/10] ice: check DD bit on Rx descriptor rather than (EOP | RS) Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-18  2:01   ` John Fastabend
  2022-06-16 18:06 ` [PATCH v4 bpf-next 05/10] selftests: xsk: query for native XDP support Maciej Fijalkowski
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Currently the loopback test is failing due to an error returned from
ice_vsi_vlan_setup(). Skip calling it when preparing the loopback VSI.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 5bdd515142ec..882f8e280317 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6026,10 +6026,12 @@ int ice_vsi_cfg(struct ice_vsi *vsi)
 	if (vsi->netdev) {
 		ice_set_rx_mode(vsi->netdev);
 
-		err = ice_vsi_vlan_setup(vsi);
+		if (vsi->type != ICE_VSI_LB) {
+			err = ice_vsi_vlan_setup(vsi);
 
-		if (err)
-			return err;
+			if (err)
+				return err;
+		}
 	}
 	ice_vsi_cfg_dcb_rings(vsi);
 
-- 
2.27.0



* [PATCH v4 bpf-next 05/10] selftests: xsk: query for native XDP support
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
                   ` (3 preceding siblings ...)
  2022-06-16 18:06 ` [PATCH v4 bpf-next 04/10] ice: do not setup vlan for loopback VSI Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-18  2:12   ` John Fastabend
  2022-06-16 18:06 ` [PATCH v4 bpf-next 06/10] selftests: xsk: add missing close() on netns fd Maciej Fijalkowski
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Currently, xdpxceiver assumes that the underlying device supports XDP
in native mode - this is fine for now, since tests can run only on a
veth pair. A future commit is going to allow running the test suite
against physical devices, so let us query the device to see whether it
is capable of running XDP programs in native mode. This way xdpxceiver
will not try to run TEST_MODE_DRV if the device being tested does not
support it.

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/testing/selftests/bpf/xdpxceiver.c | 36 ++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
index e5992a6b5e09..a1e410f6a5d8 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.c
+++ b/tools/testing/selftests/bpf/xdpxceiver.c
@@ -98,6 +98,8 @@
 #include <unistd.h>
 #include <stdatomic.h>
 #include <bpf/xsk.h>
+#include <bpf/bpf.h>
+#include <linux/filter.h>
 #include "xdpxceiver.h"
 #include "../kselftest.h"
 
@@ -1605,10 +1607,37 @@ static void ifobject_delete(struct ifobject *ifobj)
 	free(ifobj);
 }
 
+static bool is_xdp_supported(struct ifobject *ifobject)
+{
+	int flags = XDP_FLAGS_DRV_MODE;
+
+	LIBBPF_OPTS(bpf_link_create_opts, opts, .flags = flags);
+	struct bpf_insn insns[2] = {
+		BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
+		BPF_EXIT_INSN()
+	};
+	int ifindex = if_nametoindex(ifobject->ifname);
+	int prog_fd, insn_cnt = ARRAY_SIZE(insns);
+	int err;
+
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, NULL);
+	if (prog_fd < 0)
+		return false;
+
+	err = bpf_xdp_attach(ifindex, prog_fd, flags, NULL);
+	if (err)
+		return false;
+
+	bpf_xdp_detach(ifindex, flags, NULL);
+
+	return true;
+}
+
 int main(int argc, char **argv)
 {
 	struct pkt_stream *pkt_stream_default;
 	struct ifobject *ifobj_tx, *ifobj_rx;
+	int modes = TEST_MODE_SKB + 1;
 	u32 i, j, failed_tests = 0;
 	struct test_spec test;
 
@@ -1636,15 +1665,18 @@ int main(int argc, char **argv)
 	init_iface(ifobj_rx, MAC2, MAC1, IP2, IP1, UDP_PORT2, UDP_PORT1,
 		   worker_testapp_validate_rx);
 
+	if (is_xdp_supported(ifobj_tx))
+		modes++;
+
 	test_spec_init(&test, ifobj_tx, ifobj_rx, 0);
 	pkt_stream_default = pkt_stream_generate(ifobj_tx->umem, DEFAULT_PKT_CNT, PKT_SIZE);
 	if (!pkt_stream_default)
 		exit_with_error(ENOMEM);
 	test.pkt_stream_default = pkt_stream_default;
 
-	ksft_set_plan(TEST_MODE_MAX * TEST_TYPE_MAX);
+	ksft_set_plan(modes * TEST_TYPE_MAX);
 
-	for (i = 0; i < TEST_MODE_MAX; i++)
+	for (i = 0; i < modes; i++)
 		for (j = 0; j < TEST_TYPE_MAX; j++) {
 			test_spec_init(&test, ifobj_tx, ifobj_rx, i);
 			run_pkt_test(&test, i, j);
-- 
2.27.0



* [PATCH v4 bpf-next 06/10] selftests: xsk: add missing close() on netns fd
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
                   ` (4 preceding siblings ...)
  2022-06-16 18:06 ` [PATCH v4 bpf-next 05/10] selftests: xsk: query for native XDP support Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-18  2:14   ` John Fastabend
  2022-06-16 18:06 ` [PATCH v4 bpf-next 07/10] selftests: xsk: introduce default Rx pkt stream Maciej Fijalkowski
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Commit 1034b03e54ac ("selftests: xsk: Simplify cleanup of ifobjects")
removed the close() on the netns fd, which is not correct, so let us
restore it.

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/testing/selftests/bpf/xdpxceiver.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
index a1e410f6a5d8..81ad69ed5839 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.c
+++ b/tools/testing/selftests/bpf/xdpxceiver.c
@@ -1591,6 +1591,8 @@ static struct ifobject *ifobject_create(void)
 	if (!ifobj->umem)
 		goto out_umem;
 
+	ifobj->ns_fd = -1;
+
 	return ifobj;
 
 out_umem:
@@ -1602,6 +1604,8 @@ static struct ifobject *ifobject_create(void)
 
 static void ifobject_delete(struct ifobject *ifobj)
 {
+	if (ifobj->ns_fd != -1)
+		close(ifobj->ns_fd);
 	free(ifobj->umem);
 	free(ifobj->xsk_arr);
 	free(ifobj);
-- 
2.27.0



* [PATCH v4 bpf-next 07/10] selftests: xsk: introduce default Rx pkt stream
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
                   ` (5 preceding siblings ...)
  2022-06-16 18:06 ` [PATCH v4 bpf-next 06/10] selftests: xsk: add missing close() on netns fd Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-18  2:41   ` John Fastabend
  2022-06-16 18:06 ` [PATCH v4 bpf-next 08/10] selftests: xsk: add support for executing tests on physical device Maciej Fijalkowski
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

In order to prepare xdpxceiver for physical device testing, let us
introduce a default Rx pkt stream. The reason for doing so is that
physical device testing will use a UMEM of doubled size, where half of
it will be used by Tx and the other half by Rx. This means that pkt
addresses will differ between the Tx and Rx streams. The Rx thread will
initialize the xsk_umem_info::base_addr that is added here, so that
pkt_set(), when working on the Rx UMEM, will add this offset and the
second half of the UMEM space will be used. Note that currently
base_addr is 0 on both sides; a future commit will do the mentioned
initialization.

Previously, veth based testing worked on separate UMEMs, so a single
default stream was fine.
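
As a worked example (assuming the defaults xdpxceiver uses, i.e.
DEFAULT_UMEM_BUFFERS frames of XSK_UMEM__DEFAULT_FRAME_SIZE bytes), the
shared UMEM for physical device tests will end up laid out as:

	/*
	 *   half = DEFAULT_UMEM_BUFFERS * XSK_UMEM__DEFAULT_FRAME_SIZE;
	 *
	 *   [0, half)         Tx half, base_addr == 0
	 *   [half, 2 * half)  Rx half, base_addr == half
	 *
	 * pkt_set() computes pkt->addr = addr + umem->base_addr, so the
	 * same logical stream resolves to different halves on each side.
	 */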

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/testing/selftests/bpf/xdpxceiver.c | 74 +++++++++++++++---------
 tools/testing/selftests/bpf/xdpxceiver.h |  4 +-
 2 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
index 81ad69ed5839..3d0731a80e4a 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.c
+++ b/tools/testing/selftests/bpf/xdpxceiver.c
@@ -428,15 +428,16 @@ static void __test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
 		ifobj->use_poll = false;
 		ifobj->use_fill_ring = true;
 		ifobj->release_rx = true;
-		ifobj->pkt_stream = test->pkt_stream_default;
 		ifobj->validation_func = NULL;
 
 		if (i == 0) {
 			ifobj->rx_on = false;
 			ifobj->tx_on = true;
+			ifobj->pkt_stream = test->tx_pkt_stream_default;
 		} else {
 			ifobj->rx_on = true;
 			ifobj->tx_on = false;
+			ifobj->pkt_stream = test->rx_pkt_stream_default;
 		}
 
 		memset(ifobj->umem, 0, sizeof(*ifobj->umem));
@@ -460,12 +461,15 @@ static void __test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
 static void test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
 			   struct ifobject *ifobj_rx, enum test_mode mode)
 {
-	struct pkt_stream *pkt_stream;
+	struct pkt_stream *tx_pkt_stream;
+	struct pkt_stream *rx_pkt_stream;
 	u32 i;
 
-	pkt_stream = test->pkt_stream_default;
+	tx_pkt_stream = test->tx_pkt_stream_default;
+	rx_pkt_stream = test->rx_pkt_stream_default;
 	memset(test, 0, sizeof(*test));
-	test->pkt_stream_default = pkt_stream;
+	test->tx_pkt_stream_default = tx_pkt_stream;
+	test->rx_pkt_stream_default = rx_pkt_stream;
 
 	for (i = 0; i < MAX_INTERFACES; i++) {
 		struct ifobject *ifobj = i ? ifobj_rx : ifobj_tx;
@@ -526,16 +530,17 @@ static void pkt_stream_delete(struct pkt_stream *pkt_stream)
 static void pkt_stream_restore_default(struct test_spec *test)
 {
 	struct pkt_stream *tx_pkt_stream = test->ifobj_tx->pkt_stream;
+	struct pkt_stream *rx_pkt_stream = test->ifobj_rx->pkt_stream;
 
-	if (tx_pkt_stream != test->pkt_stream_default) {
+	if (tx_pkt_stream != test->tx_pkt_stream_default) {
 		pkt_stream_delete(test->ifobj_tx->pkt_stream);
-		test->ifobj_tx->pkt_stream = test->pkt_stream_default;
+		test->ifobj_tx->pkt_stream = test->tx_pkt_stream_default;
 	}
 
-	if (test->ifobj_rx->pkt_stream != test->pkt_stream_default &&
-	    test->ifobj_rx->pkt_stream != tx_pkt_stream)
+	if (rx_pkt_stream != test->rx_pkt_stream_default) {
 		pkt_stream_delete(test->ifobj_rx->pkt_stream);
-	test->ifobj_rx->pkt_stream = test->pkt_stream_default;
+		test->ifobj_rx->pkt_stream = test->rx_pkt_stream_default;
+	}
 }
 
 static struct pkt_stream *__pkt_stream_alloc(u32 nb_pkts)
@@ -558,7 +563,7 @@ static struct pkt_stream *__pkt_stream_alloc(u32 nb_pkts)
 
 static void pkt_set(struct xsk_umem_info *umem, struct pkt *pkt, u64 addr, u32 len)
 {
-	pkt->addr = addr;
+	pkt->addr = addr + umem->base_addr;
 	pkt->len = len;
 	if (len > umem->frame_size - XDP_PACKET_HEADROOM - MIN_PKT_SIZE * 2 - umem->frame_headroom)
 		pkt->valid = false;
@@ -597,22 +602,29 @@ static void pkt_stream_replace(struct test_spec *test, u32 nb_pkts, u32 pkt_len)
 
 	pkt_stream = pkt_stream_generate(test->ifobj_tx->umem, nb_pkts, pkt_len);
 	test->ifobj_tx->pkt_stream = pkt_stream;
+	pkt_stream = pkt_stream_generate(test->ifobj_rx->umem, nb_pkts, pkt_len);
 	test->ifobj_rx->pkt_stream = pkt_stream;
 }
 
-static void pkt_stream_replace_half(struct test_spec *test, u32 pkt_len, int offset)
+static void __pkt_stream_replace_half(struct ifobject *ifobj, u32 pkt_len,
+				      int offset)
 {
-	struct xsk_umem_info *umem = test->ifobj_tx->umem;
+	struct xsk_umem_info *umem = ifobj->umem;
 	struct pkt_stream *pkt_stream;
 	u32 i;
 
-	pkt_stream = pkt_stream_clone(umem, test->pkt_stream_default);
-	for (i = 1; i < test->pkt_stream_default->nb_pkts; i += 2)
+	pkt_stream = pkt_stream_clone(umem, ifobj->pkt_stream);
+	for (i = 1; i < ifobj->pkt_stream->nb_pkts; i += 2)
 		pkt_set(umem, &pkt_stream->pkts[i],
 			(i % umem->num_frames) * umem->frame_size + offset, pkt_len);
 
-	test->ifobj_tx->pkt_stream = pkt_stream;
-	test->ifobj_rx->pkt_stream = pkt_stream;
+	ifobj->pkt_stream = pkt_stream;
+}
+
+static void pkt_stream_replace_half(struct test_spec *test, u32 pkt_len, int offset)
+{
+	__pkt_stream_replace_half(test->ifobj_tx, pkt_len, offset);
+	__pkt_stream_replace_half(test->ifobj_rx, pkt_len, offset);
 }
 
 static void pkt_stream_receive_half(struct test_spec *test)
@@ -654,7 +666,8 @@ static struct pkt *pkt_generate(struct ifobject *ifobject, u32 pkt_nb)
 	return pkt;
 }
 
-static void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts, u32 nb_pkts)
+static void __pkt_stream_generate_custom(struct ifobject *ifobj,
+					 struct pkt *pkts, u32 nb_pkts)
 {
 	struct pkt_stream *pkt_stream;
 	u32 i;
@@ -663,15 +676,20 @@ static void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts,
 	if (!pkt_stream)
 		exit_with_error(ENOMEM);
 
-	test->ifobj_tx->pkt_stream = pkt_stream;
-	test->ifobj_rx->pkt_stream = pkt_stream;
-
 	for (i = 0; i < nb_pkts; i++) {
-		pkt_stream->pkts[i].addr = pkts[i].addr;
+		pkt_stream->pkts[i].addr = pkts[i].addr + ifobj->umem->base_addr;
 		pkt_stream->pkts[i].len = pkts[i].len;
 		pkt_stream->pkts[i].payload = i;
 		pkt_stream->pkts[i].valid = pkts[i].valid;
 	}
+
+	ifobj->pkt_stream = pkt_stream;
+}
+
+static void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts, u32 nb_pkts)
+{
+	__pkt_stream_generate_custom(test->ifobj_tx, pkts, nb_pkts);
+	__pkt_stream_generate_custom(test->ifobj_rx, pkts, nb_pkts);
 }
 
 static void pkt_dump(void *pkt, u32 len)
@@ -1639,7 +1657,8 @@ static bool is_xdp_supported(struct ifobject *ifobject)
 
 int main(int argc, char **argv)
 {
-	struct pkt_stream *pkt_stream_default;
+	struct pkt_stream *rx_pkt_stream_default;
+	struct pkt_stream *tx_pkt_stream_default;
 	struct ifobject *ifobj_tx, *ifobj_rx;
 	int modes = TEST_MODE_SKB + 1;
 	u32 i, j, failed_tests = 0;
@@ -1673,10 +1692,12 @@ int main(int argc, char **argv)
 		modes++;
 
 	test_spec_init(&test, ifobj_tx, ifobj_rx, 0);
-	pkt_stream_default = pkt_stream_generate(ifobj_tx->umem, DEFAULT_PKT_CNT, PKT_SIZE);
-	if (!pkt_stream_default)
+	tx_pkt_stream_default = pkt_stream_generate(ifobj_tx->umem, DEFAULT_PKT_CNT, PKT_SIZE);
+	rx_pkt_stream_default = pkt_stream_generate(ifobj_rx->umem, DEFAULT_PKT_CNT, PKT_SIZE);
+	if (!tx_pkt_stream_default || !rx_pkt_stream_default)
 		exit_with_error(ENOMEM);
-	test.pkt_stream_default = pkt_stream_default;
+	test.tx_pkt_stream_default = tx_pkt_stream_default;
+	test.rx_pkt_stream_default = rx_pkt_stream_default;
 
 	ksft_set_plan(modes * TEST_TYPE_MAX);
 
@@ -1690,7 +1711,8 @@ int main(int argc, char **argv)
 				failed_tests++;
 		}
 
-	pkt_stream_delete(pkt_stream_default);
+	pkt_stream_delete(tx_pkt_stream_default);
+	pkt_stream_delete(rx_pkt_stream_default);
 	ifobject_delete(ifobj_tx);
 	ifobject_delete(ifobj_rx);
 
diff --git a/tools/testing/selftests/bpf/xdpxceiver.h b/tools/testing/selftests/bpf/xdpxceiver.h
index 8f672b0fe0e1..ccfc829b2e5e 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.h
+++ b/tools/testing/selftests/bpf/xdpxceiver.h
@@ -95,6 +95,7 @@ struct xsk_umem_info {
 	u32 frame_headroom;
 	void *buffer;
 	u32 frame_size;
+	u32 base_addr;
 	bool unaligned_mode;
 };
 
@@ -155,7 +156,8 @@ struct ifobject {
 struct test_spec {
 	struct ifobject *ifobj_tx;
 	struct ifobject *ifobj_rx;
-	struct pkt_stream *pkt_stream_default;
+	struct pkt_stream *tx_pkt_stream_default;
+	struct pkt_stream *rx_pkt_stream_default;
 	u16 total_steps;
 	u16 current_step;
 	u16 nb_sockets;
-- 
2.27.0



* [PATCH v4 bpf-next 08/10] selftests: xsk: add support for executing tests on physical device
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
                   ` (6 preceding siblings ...)
  2022-06-16 18:06 ` [PATCH v4 bpf-next 07/10] selftests: xsk: introduce default Rx pkt stream Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-16 18:06 ` [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion() Maciej Fijalkowski
  2022-06-16 18:06 ` [PATCH v4 bpf-next 10/10] selftests: xsk: add support for zero copy testing Maciej Fijalkowski
  9 siblings, 0 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Currently, the architecture of xdpxceiver is designed strictly for
conducting veth based tests. A veth pair is created together with a
network namespace and one of the veth interfaces is moved to the
mentioned netns. Then, separate threads for Tx and Rx are spawned that
utilize the described setup.

The infrastructure described in the paragraph above cannot be used for
testing AF_XDP support on physical devices. That testing will be
conducted on a single network interface and the same queue. Xdpxceiver
needs to be extended to distinguish between veth tests and physical
interface tests.

Since the same iface/queue id pair will be used by both Tx/Rx threads
for physical device testing, the Tx thread, which happens to run after
the Rx thread, is going to create its XSK socket with the shared umem
flag. In order to track this setting throughout the lifetime of the
spawned threads, introduce a 'shared_umem' boolean to struct ifobject
and set it to true when xdpxceiver is run against a physical device. In
such a case, the UMEM size needs to be doubled, so that half of it will
be used by the Rx thread and the other half by the Tx thread. For two
step based test types, the value of the XSKMAP element under key 0 has
to be updated, as there is now another socket for the second step.
Also, to avoid race conditions when destroying XSK resources, move this
activity to the main thread, after the spawned Rx and Tx threads have
finished their job. This way it is possible to gracefully remove the
shared umem without introducing synchronization mechanisms.
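
For context, a minimal sketch of what creating the second socket with a
shared umem boils down to in libbpf's xsk API (illustrative, not the
selftest's exact code; ring and variable names are assumed):

	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
	};

	/* a second socket created against an already-registered UMEM makes
	 * libbpf bind with the XDP_SHARED_UMEM flag under the hood */
	ret = xsk_socket__create_shared(&xsk, ifname, 0, umem->umem,
					&rxr, &txr, &fq, &cq, &cfg);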

To run the xsk selftests suite on a physical device, append "-i $IFACE"
when invoking test_xsk.sh. For veth based tests, simply skip it. When
"-i $IFACE" is in place, under the hood test_xsk.sh will use $IFACE for
both interfaces supplied to xdpxceiver, which in turn will interpret
this as a test suite execution against a physical device.

Note that currently this makes it possible to test only the SKB and
DRV modes (the latter if the underlying device has native XDP support).
ZC testing support is added in a later patch.

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/testing/selftests/bpf/test_xsk.sh  |  52 +++++--
 tools/testing/selftests/bpf/xdpxceiver.c | 189 ++++++++++++++---------
 tools/testing/selftests/bpf/xdpxceiver.h |   1 +
 3 files changed, 156 insertions(+), 86 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_xsk.sh b/tools/testing/selftests/bpf/test_xsk.sh
index 567500299231..19b24cce5414 100755
--- a/tools/testing/selftests/bpf/test_xsk.sh
+++ b/tools/testing/selftests/bpf/test_xsk.sh
@@ -73,14 +73,20 @@
 #
 # Run and dump packet contents:
 #   sudo ./test_xsk.sh -D
+#
+# Run test suite for physical device in loopback mode
+#   sudo ./test_xsk.sh -i IFACE
 
 . xsk_prereqs.sh
 
-while getopts "vD" flag
+ETH=""
+
+while getopts "vDi:" flag
 do
 	case "${flag}" in
 		v) verbose=1;;
 		D) dump_pkts=1;;
+		i) ETH=${OPTARG};;
 	esac
 done
 
@@ -132,18 +138,25 @@ setup_vethPairs() {
 	ip link set ${VETH0} up
 }
 
-validate_root_exec
-validate_veth_support ${VETH0}
-validate_ip_utility
-setup_vethPairs
-
-retval=$?
-if [ $retval -ne 0 ]; then
-	test_status $retval "${TEST_NAME}"
-	cleanup_exit ${VETH0} ${VETH1} ${NS1}
-	exit $retval
+if [ ! -z $ETH ]; then
+	VETH0=${ETH}
+	VETH1=${ETH}
+	NS1=""
+else
+	validate_root_exec
+	validate_veth_support ${VETH0}
+	validate_ip_utility
+	setup_vethPairs
+
+	retval=$?
+	if [ $retval -ne 0 ]; then
+		test_status $retval "${TEST_NAME}"
+		cleanup_exit ${VETH0} ${VETH1} ${NS1}
+		exit $retval
+	fi
 fi
 
+
 if [[ $verbose -eq 1 ]]; then
 	ARGS+="-v "
 fi
@@ -152,26 +165,33 @@ if [[ $dump_pkts -eq 1 ]]; then
 	ARGS="-D "
 fi
 
+retval=$?
 test_status $retval "${TEST_NAME}"
 
 ## START TESTS
 
 statusList=()
 
-TEST_NAME="XSK_SELFTESTS_SOFTIRQ"
+TEST_NAME="XSK_SELFTESTS_${VETH0}_SOFTIRQ"
 
 execxdpxceiver
 
-cleanup_exit ${VETH0} ${VETH1} ${NS1}
-TEST_NAME="XSK_SELFTESTS_BUSY_POLL"
+if [ -z $ETH ]; then
+	cleanup_exit ${VETH0} ${VETH1} ${NS1}
+fi
+TEST_NAME="XSK_SELFTESTS_${VETH0}_BUSY_POLL"
 busy_poll=1
 
-setup_vethPairs
+if [ -z $ETH ]; then
+	setup_vethPairs
+fi
 execxdpxceiver
 
 ## END TESTS
 
-cleanup_exit ${VETH0} ${VETH1} ${NS1}
+if [ -z $ETH ]; then
+	cleanup_exit ${VETH0} ${VETH1} ${NS1}
+fi
 
 failures=0
 echo -e "\nSummary:"
diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
index 3d0731a80e4a..de4cf0432243 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.c
+++ b/tools/testing/selftests/bpf/xdpxceiver.c
@@ -296,8 +296,8 @@ static void enable_busy_poll(struct xsk_socket_info *xsk)
 		exit_with_error(errno);
 }
 
-static int xsk_configure_socket(struct xsk_socket_info *xsk, struct xsk_umem_info *umem,
-				struct ifobject *ifobject, bool shared)
+static int __xsk_configure_socket(struct xsk_socket_info *xsk, struct xsk_umem_info *umem,
+				  struct ifobject *ifobject, bool shared)
 {
 	struct xsk_socket_config cfg = {};
 	struct xsk_ring_cons *rxr;
@@ -443,6 +443,9 @@ static void __test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
 		memset(ifobj->umem, 0, sizeof(*ifobj->umem));
 		ifobj->umem->num_frames = DEFAULT_UMEM_BUFFERS;
 		ifobj->umem->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
+		if (ifobj->shared_umem && ifobj->rx_on)
+			ifobj->umem->base_addr = DEFAULT_UMEM_BUFFERS *
+				XSK_UMEM__DEFAULT_FRAME_SIZE;
 
 		for (j = 0; j < MAX_SOCKETS; j++) {
 			memset(&ifobj->xsk_arr[j], 0, sizeof(ifobj->xsk_arr[j]));
@@ -1101,19 +1104,85 @@ static int validate_tx_invalid_descs(struct ifobject *ifobject)
 	return TEST_PASS;
 }
 
+static void xsk_configure_socket(struct test_spec *test, struct ifobject *ifobject,
+				 struct xsk_umem_info *umem, bool tx)
+{
+	int i, ret;
+
+	for (i = 0; i < test->nb_sockets; i++) {
+		bool shared = (ifobject->shared_umem && tx) ? true : !!i;
+		u32 ctr = 0;
+
+		while (ctr++ < SOCK_RECONF_CTR) {
+			ret = __xsk_configure_socket(&ifobject->xsk_arr[i], umem,
+						     ifobject, shared);
+			if (!ret)
+				break;
+
+			/* Retry if it fails as xsk_socket__create() is asynchronous */
+			if (ctr >= SOCK_RECONF_CTR)
+				exit_with_error(-ret);
+			usleep(USLEEP_MAX);
+		}
+		if (ifobject->busy_poll)
+			enable_busy_poll(&ifobject->xsk_arr[i]);
+	}
+}
+
+static void thread_common_ops_tx(struct test_spec *test, struct ifobject *ifobject)
+{
+	xsk_configure_socket(test, ifobject, test->ifobj_rx->umem, true);
+	ifobject->xsk = &ifobject->xsk_arr[0];
+	ifobject->xsk_map_fd = test->ifobj_rx->xsk_map_fd;
+	memcpy(ifobject->umem, test->ifobj_rx->umem, sizeof(struct xsk_umem_info));
+}
+
+static void xsk_populate_fill_ring(struct xsk_umem_info *umem, struct pkt_stream *pkt_stream)
+{
+	u32 idx = 0, i, buffers_to_fill;
+	int ret;
+
+	if (umem->num_frames < XSK_RING_PROD__DEFAULT_NUM_DESCS)
+		buffers_to_fill = umem->num_frames;
+	else
+		buffers_to_fill = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+
+	ret = xsk_ring_prod__reserve(&umem->fq, buffers_to_fill, &idx);
+	if (ret != buffers_to_fill)
+		exit_with_error(ENOSPC);
+	for (i = 0; i < buffers_to_fill; i++) {
+		u64 addr;
+
+		if (pkt_stream->use_addr_for_fill) {
+			struct pkt *pkt = pkt_stream_get_pkt(pkt_stream, i);
+
+			if (!pkt)
+				break;
+			addr = pkt->addr;
+		} else {
+			addr = i * umem->frame_size;
+		}
+
+		*xsk_ring_prod__fill_addr(&umem->fq, idx++) = addr;
+	}
+	xsk_ring_prod__submit(&umem->fq, buffers_to_fill);
+}
+
 static void thread_common_ops(struct test_spec *test, struct ifobject *ifobject)
 {
 	u64 umem_sz = ifobject->umem->num_frames * ifobject->umem->frame_size;
 	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
 	int ret, ifindex;
 	void *bufs;
-	u32 i;
 
 	ifobject->ns_fd = switch_namespace(ifobject->nsname);
 
 	if (ifobject->umem->unaligned_mode)
 		mmap_flags |= MAP_HUGETLB;
 
+	if (ifobject->shared_umem)
+		umem_sz *= 2;
+
 	bufs = mmap(NULL, umem_sz, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
 	if (bufs == MAP_FAILED)
 		exit_with_error(errno);
@@ -1122,24 +1191,9 @@ static void thread_common_ops(struct test_spec *test, struct ifobject *ifobject)
 	if (ret)
 		exit_with_error(-ret);
 
-	for (i = 0; i < test->nb_sockets; i++) {
-		u32 ctr = 0;
-
-		while (ctr++ < SOCK_RECONF_CTR) {
-			ret = xsk_configure_socket(&ifobject->xsk_arr[i], ifobject->umem,
-						   ifobject, !!i);
-			if (!ret)
-				break;
-
-			/* Retry if it fails as xsk_socket__create() is asynchronous */
-			if (ctr >= SOCK_RECONF_CTR)
-				exit_with_error(-ret);
-			usleep(USLEEP_MAX);
-		}
+	xsk_populate_fill_ring(ifobject->umem, ifobject->pkt_stream);
 
-		if (ifobject->busy_poll)
-			enable_busy_poll(&ifobject->xsk_arr[i]);
-	}
+	xsk_configure_socket(test, ifobject, ifobject->umem, false);
 
 	ifobject->xsk = &ifobject->xsk_arr[0];
 
@@ -1159,22 +1213,18 @@ static void thread_common_ops(struct test_spec *test, struct ifobject *ifobject)
 		exit_with_error(-ret);
 }
 
-static void testapp_cleanup_xsk_res(struct ifobject *ifobj)
-{
-	print_verbose("Destroying socket\n");
-	xsk_socket__delete(ifobj->xsk->xsk);
-	munmap(ifobj->umem->buffer, ifobj->umem->num_frames * ifobj->umem->frame_size);
-	xsk_umem__delete(ifobj->umem->umem);
-}
-
 static void *worker_testapp_validate_tx(void *arg)
 {
 	struct test_spec *test = (struct test_spec *)arg;
 	struct ifobject *ifobject = test->ifobj_tx;
 	int err;
 
-	if (test->current_step == 1)
-		thread_common_ops(test, ifobject);
+	if (test->current_step == 1) {
+		if (!ifobject->shared_umem)
+			thread_common_ops(test, ifobject);
+		else
+			thread_common_ops_tx(test, ifobject);
+	}
 
 	print_verbose("Sending %d packets on interface %s\n", ifobject->pkt_stream->nb_pkts,
 		      ifobject->ifname);
@@ -1185,53 +1235,23 @@ static void *worker_testapp_validate_tx(void *arg)
 	if (err)
 		report_failure(test);
 
-	if (test->total_steps == test->current_step || err)
-		testapp_cleanup_xsk_res(ifobject);
 	pthread_exit(NULL);
 }
 
-static void xsk_populate_fill_ring(struct xsk_umem_info *umem, struct pkt_stream *pkt_stream)
-{
-	u32 idx = 0, i, buffers_to_fill;
-	int ret;
-
-	if (umem->num_frames < XSK_RING_PROD__DEFAULT_NUM_DESCS)
-		buffers_to_fill = umem->num_frames;
-	else
-		buffers_to_fill = XSK_RING_PROD__DEFAULT_NUM_DESCS;
-
-	ret = xsk_ring_prod__reserve(&umem->fq, buffers_to_fill, &idx);
-	if (ret != buffers_to_fill)
-		exit_with_error(ENOSPC);
-	for (i = 0; i < buffers_to_fill; i++) {
-		u64 addr;
-
-		if (pkt_stream->use_addr_for_fill) {
-			struct pkt *pkt = pkt_stream_get_pkt(pkt_stream, i);
-
-			if (!pkt)
-				break;
-			addr = pkt->addr;
-		} else {
-			addr = i * umem->frame_size;
-		}
-
-		*xsk_ring_prod__fill_addr(&umem->fq, idx++) = addr;
-	}
-	xsk_ring_prod__submit(&umem->fq, buffers_to_fill);
-}
-
 static void *worker_testapp_validate_rx(void *arg)
 {
 	struct test_spec *test = (struct test_spec *)arg;
 	struct ifobject *ifobject = test->ifobj_rx;
 	struct pollfd fds = { };
+	int id = 0;
 	int err;
 
-	if (test->current_step == 1)
+	if (test->current_step == 1) {
 		thread_common_ops(test, ifobject);
-
-	xsk_populate_fill_ring(ifobject->umem, ifobject->pkt_stream);
+	} else {
+		bpf_map_delete_elem(ifobject->xsk_map_fd, &id);
+		xsk_socket__update_xskmap(ifobject->xsk->xsk, ifobject->xsk_map_fd);
+	}
 
 	fds.fd = xsk_socket__fd(ifobject->xsk->xsk);
 	fds.events = POLLIN;
@@ -1249,11 +1269,20 @@ static void *worker_testapp_validate_rx(void *arg)
 		pthread_mutex_unlock(&pacing_mutex);
 	}
 
-	if (test->total_steps == test->current_step || err)
-		testapp_cleanup_xsk_res(ifobject);
 	pthread_exit(NULL);
 }
 
+static void testapp_clean_xsk_umem(struct ifobject *ifobj)
+{
+	u64 umem_sz = ifobj->umem->num_frames * ifobj->umem->frame_size;
+
+	if (ifobj->shared_umem)
+		umem_sz *= 2;
+
+	xsk_umem__delete(ifobj->umem->umem);
+	munmap(ifobj->umem->buffer, umem_sz);
+}
+
 static int testapp_validate_traffic(struct test_spec *test)
 {
 	struct ifobject *ifobj_tx = test->ifobj_tx;
@@ -1280,6 +1309,14 @@ static int testapp_validate_traffic(struct test_spec *test)
 	pthread_join(t1, NULL);
 	pthread_join(t0, NULL);
 
+	if (test->total_steps == test->current_step || test->fail) {
+		xsk_socket__delete(ifobj_tx->xsk->xsk);
+		xsk_socket__delete(ifobj_rx->xsk->xsk);
+		testapp_clean_xsk_umem(ifobj_rx);
+		if (!ifobj_tx->shared_umem)
+			testapp_clean_xsk_umem(ifobj_tx);
+	}
+
 	return !!test->fail;
 }
 
@@ -1359,9 +1396,9 @@ static void testapp_headroom(struct test_spec *test)
 static void testapp_stats_rx_dropped(struct test_spec *test)
 {
 	test_spec_set_name(test, "STAT_RX_DROPPED");
+	pkt_stream_replace_half(test, MIN_PKT_SIZE * 4, 0);
 	test->ifobj_rx->umem->frame_headroom = test->ifobj_rx->umem->frame_size -
 		XDP_PACKET_HEADROOM - MIN_PKT_SIZE * 3;
-	pkt_stream_replace_half(test, MIN_PKT_SIZE * 4, 0);
 	pkt_stream_receive_half(test);
 	test->ifobj_rx->validation_func = validate_rx_dropped;
 	testapp_validate_traffic(test);
@@ -1484,6 +1521,11 @@ static void testapp_invalid_desc(struct test_spec *test)
 		pkts[7].valid = false;
 	}
 
+	if (test->ifobj_tx->shared_umem) {
+		pkts[4].addr += UMEM_SIZE;
+		pkts[5].addr += UMEM_SIZE;
+	}
+
 	pkt_stream_generate_custom(test, pkts, ARRAY_SIZE(pkts));
 	testapp_validate_traffic(test);
 	pkt_stream_restore_default(test);
@@ -1624,7 +1666,6 @@ static void ifobject_delete(struct ifobject *ifobj)
 {
 	if (ifobj->ns_fd != -1)
 		close(ifobj->ns_fd);
-	free(ifobj->umem);
 	free(ifobj->xsk_arr);
 	free(ifobj);
 }
@@ -1663,6 +1704,7 @@ int main(int argc, char **argv)
 	int modes = TEST_MODE_SKB + 1;
 	u32 i, j, failed_tests = 0;
 	struct test_spec test;
+	bool shared_umem;
 
 	/* Use libbpf 1.0 API mode */
 	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
@@ -1677,6 +1719,10 @@ int main(int argc, char **argv)
 	setlocale(LC_ALL, "");
 
 	parse_command_line(ifobj_tx, ifobj_rx, argc, argv);
+	shared_umem = !strcmp(ifobj_tx->ifname, ifobj_rx->ifname);
+
+	ifobj_tx->shared_umem = shared_umem;
+	ifobj_rx->shared_umem = shared_umem;
 
 	if (!validate_interface(ifobj_tx) || !validate_interface(ifobj_rx)) {
 		usage(basename(argv[0]));
@@ -1713,6 +1759,9 @@ int main(int argc, char **argv)
 
 	pkt_stream_delete(tx_pkt_stream_default);
 	pkt_stream_delete(rx_pkt_stream_default);
+	free(ifobj_rx->umem);
+	if (!ifobj_tx->shared_umem)
+		free(ifobj_tx->umem);
 	ifobject_delete(ifobj_tx);
 	ifobject_delete(ifobj_rx);
 
diff --git a/tools/testing/selftests/bpf/xdpxceiver.h b/tools/testing/selftests/bpf/xdpxceiver.h
index ccfc829b2e5e..b7aa6c7cf2be 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.h
+++ b/tools/testing/selftests/bpf/xdpxceiver.h
@@ -149,6 +149,7 @@ struct ifobject {
 	bool busy_poll;
 	bool use_fill_ring;
 	bool release_rx;
+	bool shared_umem;
 	u8 dst_mac[ETH_ALEN];
 	u8 src_mac[ETH_ALEN];
 };
-- 
2.27.0



* [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion()
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
                   ` (7 preceding siblings ...)
  2022-06-16 18:06 ` [PATCH v4 bpf-next 08/10] selftests: xsk: add support for executing tests on physical device Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  2022-06-18  2:56   ` John Fastabend
  2022-06-16 18:06 ` [PATCH v4 bpf-next 10/10] selftests: xsk: add support for zero copy testing Maciej Fijalkowski
  9 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Some of the drivers that implement AF_XDP Zero Copy support (like ice)
can take a lazy approach to cleaning Tx descriptors. For ZC, when a
descriptor is cleaned, it is placed onto the AF_XDP completion queue.
This means that the current implementation of wait_for_tx_completion()
in xdpxceiver can get into an infinite loop, as some of the descriptors
may never reach the CQ.

Change this function to rely on pkts_in_flight instead.
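
One detail worth calling out: pkts_in_flight also becomes volatile
below. The rationale is an assumption of this note rather than spelled
out in the patch, but it fits the new usage - the Tx thread now
busy-waits on a counter that the Rx thread updates, and without
volatile the compiler could hoist the load out of the wait loop:

	volatile int pkts_in_flight;	/* Tx spins on it, Rx updates it */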

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/testing/selftests/bpf/xdpxceiver.c | 3 ++-
 tools/testing/selftests/bpf/xdpxceiver.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
index de4cf0432243..13a3b2ac2399 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.c
+++ b/tools/testing/selftests/bpf/xdpxceiver.c
@@ -965,7 +965,7 @@ static int __send_pkts(struct ifobject *ifobject, u32 *pkt_nb)
 
 static void wait_for_tx_completion(struct xsk_socket_info *xsk)
 {
-	while (xsk->outstanding_tx)
+	while (pkts_in_flight)
 		complete_pkts(xsk, BATCH_SIZE);
 }
 
@@ -1269,6 +1269,7 @@ static void *worker_testapp_validate_rx(void *arg)
 		pthread_mutex_unlock(&pacing_mutex);
 	}
 
+	pkts_in_flight = 0;
 	pthread_exit(NULL);
 }
 
diff --git a/tools/testing/selftests/bpf/xdpxceiver.h b/tools/testing/selftests/bpf/xdpxceiver.h
index b7aa6c7cf2be..f364a92675f8 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.h
+++ b/tools/testing/selftests/bpf/xdpxceiver.h
@@ -170,6 +170,6 @@ pthread_barrier_t barr;
 pthread_mutex_t pacing_mutex = PTHREAD_MUTEX_INITIALIZER;
 pthread_cond_t pacing_cond = PTHREAD_COND_INITIALIZER;
 
-int pkts_in_flight;
+volatile int pkts_in_flight;
 
 #endif				/* XDPXCEIVER_H */
-- 
2.27.0



* [PATCH v4 bpf-next 10/10] selftests: xsk: add support for zero copy testing
  2022-06-16 18:05 [PATCH v4 bpf-next 00/10] AF_XDP ZC selftests Maciej Fijalkowski
                   ` (8 preceding siblings ...)
  2022-06-16 18:06 ` [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion() Maciej Fijalkowski
@ 2022-06-16 18:06 ` Maciej Fijalkowski
  9 siblings, 0 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-16 18:06 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Introduce a new mode to xdpxceiver responsible for testing the AF_XDP
zero copy support of the driver that serves the underlying physical
device. When setting up the test suite, determine whether the driver
has ZC support or not by trying to bind an XSK ZC socket to the
interface. If that succeeds, interpret it as ZC support being in place
and run the softirq and busy poll tests in zero copy mode as well.

Note that the Rx dropped tests are skipped, since the ZC path does not
touch the rx_dropped stat at all.
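
The probe condenses to the following (abridged from ifobj_zc_avail()
in the diff below; error handling elided):

	ifobject->bind_flags = XDP_USE_NEED_WAKEUP | XDP_ZEROCOPY;
	ret = __xsk_configure_socket(xsk, umem, ifobject, false);
	zc_avail = !ret;	/* bind failure => no ZC support */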

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/testing/selftests/bpf/xdpxceiver.c | 76 ++++++++++++++++++++++--
 tools/testing/selftests/bpf/xdpxceiver.h |  2 +
 2 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
index 13a3b2ac2399..98177ae01e54 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.c
+++ b/tools/testing/selftests/bpf/xdpxceiver.c
@@ -124,9 +124,20 @@ static void __exit_with_error(int error, const char *file, const char *func, int
 }
 
 #define exit_with_error(error) __exit_with_error(error, __FILE__, __func__, __LINE__)
-
-#define mode_string(test) (test)->ifobj_tx->xdp_flags & XDP_FLAGS_SKB_MODE ? "SKB" : "DRV"
 #define busy_poll_string(test) (test)->ifobj_tx->busy_poll ? "BUSY-POLL " : ""
+static char *mode_string(struct test_spec *test)
+{
+	switch (test->mode) {
+	case TEST_MODE_SKB:
+		return "SKB";
+	case TEST_MODE_DRV:
+		return "DRV";
+	case TEST_MODE_ZC:
+		return "ZC";
+	default:
+		return "BOGUS";
+	}
+}
 
 static void report_failure(struct test_spec *test)
 {
@@ -317,6 +328,51 @@ static int __xsk_configure_socket(struct xsk_socket_info *xsk, struct xsk_umem_i
 	return xsk_socket__create(&xsk->xsk, ifobject->ifname, 0, umem->umem, rxr, txr, &cfg);
 }
 
+static bool ifobj_zc_avail(struct ifobject *ifobject)
+{
+	size_t umem_sz = DEFAULT_UMEM_BUFFERS * XSK_UMEM__DEFAULT_FRAME_SIZE;
+	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	struct xsk_socket_info *xsk;
+	struct xsk_umem_info *umem;
+	bool zc_avail = false;
+	void *bufs;
+	int ret;
+
+	bufs = mmap(NULL, umem_sz, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (bufs == MAP_FAILED)
+		exit_with_error(errno);
+
+	umem = calloc(1, sizeof(struct xsk_umem_info));
+	if (!umem) {
+		munmap(bufs, umem_sz);
+		exit_with_error(-ENOMEM);
+	}
+	umem->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
+	ret = xsk_configure_umem(umem, bufs, umem_sz);
+	if (ret)
+		exit_with_error(-ret);
+
+	xsk = calloc(1, sizeof(struct xsk_socket_info));
+	if (!xsk)
+		goto out;
+	ifobject->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	ifobject->xdp_flags |= XDP_FLAGS_DRV_MODE;
+	ifobject->bind_flags = XDP_USE_NEED_WAKEUP | XDP_ZEROCOPY;
+	ifobject->rx_on = true;
+	xsk->rxqsize = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+	ret = __xsk_configure_socket(xsk, umem, ifobject, false);
+	if (!ret)
+		zc_avail = true;
+
+	xsk_socket__delete(xsk->xsk);
+	free(xsk);
+out:
+	munmap(umem->buffer, umem_sz);
+	xsk_umem__delete(umem->umem);
+	free(umem);
+	return zc_avail;
+}
+
 static struct option long_options[] = {
 	{"interface", required_argument, 0, 'i'},
 	{"busy-poll", no_argument, 0, 'b'},
@@ -483,9 +539,14 @@ static void test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
 		else
 			ifobj->xdp_flags |= XDP_FLAGS_DRV_MODE;
 
-		ifobj->bind_flags = XDP_USE_NEED_WAKEUP | XDP_COPY;
+		ifobj->bind_flags = XDP_USE_NEED_WAKEUP;
+		if (mode == TEST_MODE_ZC)
+			ifobj->bind_flags |= XDP_ZEROCOPY;
+		else
+			ifobj->bind_flags |= XDP_COPY;
 	}
 
+	test->mode = mode;
 	__test_spec_init(test, ifobj_tx, ifobj_rx);
 }
 
@@ -1557,6 +1618,10 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
 {
 	switch (type) {
 	case TEST_TYPE_STATS_RX_DROPPED:
+		if (mode == TEST_MODE_ZC) {
+			ksft_test_result_skip("Can not run RX_DROPPED test for ZC mode\n");
+			return;
+		}
 		testapp_stats_rx_dropped(test);
 		break;
 	case TEST_TYPE_STATS_TX_INVALID_DESCS:
@@ -1735,8 +1800,11 @@ int main(int argc, char **argv)
 	init_iface(ifobj_rx, MAC2, MAC1, IP2, IP1, UDP_PORT2, UDP_PORT1,
 		   worker_testapp_validate_rx);
 
-	if (is_xdp_supported(ifobj_tx))
+	if (is_xdp_supported(ifobj_tx)) {
 		modes++;
+		if (ifobj_zc_avail(ifobj_tx))
+			modes++;
+	}
 
 	test_spec_init(&test, ifobj_tx, ifobj_rx, 0);
 	tx_pkt_stream_default = pkt_stream_generate(ifobj_tx->umem, DEFAULT_PKT_CNT, PKT_SIZE);
diff --git a/tools/testing/selftests/bpf/xdpxceiver.h b/tools/testing/selftests/bpf/xdpxceiver.h
index f364a92675f8..4e552eb852b9 100644
--- a/tools/testing/selftests/bpf/xdpxceiver.h
+++ b/tools/testing/selftests/bpf/xdpxceiver.h
@@ -61,6 +61,7 @@
 enum test_mode {
 	TEST_MODE_SKB,
 	TEST_MODE_DRV,
+	TEST_MODE_ZC,
 	TEST_MODE_MAX
 };
 
@@ -163,6 +164,7 @@ struct test_spec {
 	u16 current_step;
 	u16 nb_sockets;
 	bool fail;
+	enum test_mode mode;
 	char name[MAX_TEST_NAME_SIZE];
 };
 
-- 
2.27.0



* Re: [PATCH v4 bpf-next 02/10] ice: allow toggling loopback mode via ndo_set_features callback
  2022-06-16 18:06 ` [PATCH v4 bpf-next 02/10] ice: allow toggling loopback mode via ndo_set_features callback Maciej Fijalkowski
@ 2022-06-16 18:21   ` Jakub Kicinski
  2022-06-18  1:57   ` John Fastabend
  1 sibling, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2022-06-16 18:21 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, ast, daniel, netdev, magnus.karlsson, bjorn, Alexandr Lobakin

On Thu, 16 Jun 2022 20:06:01 +0200 Maciej Fijalkowski wrote:
> Add support for NETIF_F_LOOPBACK. This feature can be set via:
> $ ethtool -K eth0 loopback <on|off>
> 
> The feature can be useful for local data path tests.
> 
> CC: Alexandr Lobakin <alexandr.lobakin@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

Acked-by: Jakub Kicinski <kuba@kernel.org>


* RE: [PATCH v4 bpf-next 01/10] ice: compress branches in ice_set_features()
  2022-06-16 18:06 ` [PATCH v4 bpf-next 01/10] ice: compress branches in ice_set_features() Maciej Fijalkowski
@ 2022-06-18  1:46   ` John Fastabend
  2022-06-20  9:48     ` Maciej Fijalkowski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2022-06-18  1:46 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski,
	Alexandr Lobakin

Maciej Fijalkowski wrote:
> Instead of a rather verbose comparison of the current netdev->features
> bits against the incoming ones from the user, let us compress them with
> a helper feature set that is the result of netdev->features XOR
> features. This way, the current, extensive branches:
> 
> 	if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT))
> 		set_feature(true);
> 	else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT)
> 		set_feature(false);
> 
> can become:
> 
> 	netdev_features_t changed = netdev->features ^ features;
> 
> 	if (changed & NETIF_F_BIT)
> 		set_feature(!!(features & NETIF_F_BIT));
> 
> This is nothing new, as several other drivers already use this
> approach, which I find much more convenient.

Looks good, a couple of nits below. Up to you whether you want to
follow through on them or not, I don't have a strong opinion. For what
it's worth, the other Intel drivers also do the
'netdev->features ^ features' assignment.

Acked-by: John Fastabend <john.fastabend@gmail.com>

> 
> CC: Alexandr Lobakin <alexandr.lobakin@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 40 +++++++++++------------
>  1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index e1cae253412c..23d1b1fc39fb 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5910,44 +5910,41 @@ ice_set_vlan_features(struct net_device *netdev, netdev_features_t features)
>  static int
>  ice_set_features(struct net_device *netdev, netdev_features_t features)
>  {
> +	netdev_features_t changed = netdev->features ^ features;
>  	struct ice_netdev_priv *np = netdev_priv(netdev);
>  	struct ice_vsi *vsi = np->vsi;
>  	struct ice_pf *pf = vsi->back;
>  	int ret = 0;
>  
>  	/* Don't set any netdev advanced features with device in Safe Mode */
> -	if (ice_is_safe_mode(vsi->back)) {
> -		dev_err(ice_pf_to_dev(vsi->back), "Device is in Safe Mode - not enabling advanced netdev features\n");
> +	if (ice_is_safe_mode(pf)) {
> +		dev_err(ice_pf_to_dev(vsi->back),

A bit of nitpicking, but if you use pf in the 'if' above, why not use it
here as well and save a few keystrokes? It would also match the usage below.

> +			"Device is in Safe Mode - not enabling advanced netdev features\n");
>  		return ret;
>  	}
>  
>  	/* Do not change setting during reset */
>  	if (ice_is_reset_in_progress(pf->state)) {
> -		dev_err(ice_pf_to_dev(vsi->back), "Device is resetting, changing advanced netdev features temporarily unavailable.\n");
> +		dev_err(ice_pf_to_dev(pf),
> +			"Device is resetting, changing advanced netdev features temporarily unavailable.\n");
>  		return -EBUSY;
>  	}
>  

[...]

> @@ -5956,11 +5953,12 @@ ice_set_features(struct net_device *netdev, netdev_features_t features)
>  		return -EACCES;
>  	}
>  
> -	if ((features & NETIF_F_HW_TC) &&
> -	    !(netdev->features & NETIF_F_HW_TC))
> -		set_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
> -	else
> -		clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
> +	if (changed & NETIF_F_HW_TC) {
> +		bool ena = !!(features & NETIF_F_HW_TC);
> +
> +		ena ? set_bit(ICE_FLAG_CLS_FLOWER, pf->flags) :
> +		      clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
> +	}

Just a note that you changed the logic slightly here. Previously the bit
was always either set or cleared; now it is only touched when the feature
actually changed. It looks like it doesn't matter, the caveat being that
I don't know what might happen in hardware.

>  
>  	return 0;
>  }
> -- 
> 2.27.0
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 bpf-next 02/10] ice: allow toggling loopback mode via ndo_set_features callback
  2022-06-16 18:06 ` [PATCH v4 bpf-next 02/10] ice: allow toggling loopback mode via ndo_set_features callback Maciej Fijalkowski
  2022-06-16 18:21   ` Jakub Kicinski
@ 2022-06-18  1:57   ` John Fastabend
  2022-06-20  9:52     ` Maciej Fijalkowski
  1 sibling, 1 reply; 27+ messages in thread
From: John Fastabend @ 2022-06-18  1:57 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski,
	Alexandr Lobakin

Maciej Fijalkowski wrote:
> Add support for NETIF_F_LOOPBACK. This feature can be set via:
> $ ethtool -K eth0 loopback <on|off>
> 
> The feature can be useful for local data path tests.
> 
> CC: Alexandr Lobakin <alexandr.lobakin@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---

Patch looks fine; one question about that ice_set_features() function
below, though.

Acked-by: John Fastabend <john.fastabend@gmail.com>

[...]

> +/**
> + * ice_set_loopback - turn on/off loopback mode on underlying PF
> + * @vsi: ptr to VSI
> + * @ena: flag to indicate the on/off setting
> + */
> +static int
> +ice_set_loopback(struct ice_vsi *vsi, bool ena)
> +{
> +	bool if_running = netif_running(vsi->netdev);
> +	int ret;
> +
> +	if (if_running && !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) {
> +		ret = ice_down(vsi);
> +		if (ret) {
> +			netdev_err(vsi->netdev, "Preparing device to toggle loopback failed\n");
> +			return ret;
> +		}
> +	}
> +	ret = ice_aq_set_mac_loopback(&vsi->back->hw, ena, NULL);
> +	if (ret)
> +		netdev_err(vsi->netdev, "Failed to toggle loopback state\n");
> +	if (if_running)
> +		ret = ice_up(vsi);
> +
> +	return ret;
> +}
> +
>  /**
>   * ice_set_features - set the netdev feature flags
>   * @netdev: ptr to the netdev being adjusted
> @@ -5960,7 +5988,10 @@ ice_set_features(struct net_device *netdev, netdev_features_t features)
>  		      clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
>  	}
>  
> -	return 0;
> +	if (changed & NETIF_F_LOOPBACK)
> +		ret = ice_set_loopback(vsi, !!(features & NETIF_F_LOOPBACK));
> +
> +	return ret;

Unrelated to your patch, but since you are touching 'ret' here a bit: how
come you return 0 in the ice_is_safe_mode() case? Shouldn't you push an
error up so the user who is doing the setting knows it didn't actually
work?

>  }
>  
>  /**
> -- 
> 2.27.0
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 bpf-next 03/10] ice: check DD bit on Rx descriptor rather than (EOP | RS)
  2022-06-16 18:06 ` [PATCH v4 bpf-next 03/10] ice: check DD bit on Rx descriptor rather than (EOP | RS) Maciej Fijalkowski
@ 2022-06-18  1:58   ` John Fastabend
  2022-06-20 10:53     ` Maciej Fijalkowski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2022-06-18  1:58 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Maciej Fijalkowski wrote:
> The Tx side sets the EOP and RS bits on descriptors to indicate that a
> particular descriptor is the last one and needs to generate an irq once
> it has been sent. These bits should not be checked on the completion
> path, regardless of whether it's the Tx or the Rx side. The DD bit
> serves this purpose, indicating that a particular descriptor is either
> ready for Rx or was successfully Txed.
> 
> Look at DD bit being set in ice_lbtest_receive_frames() instead of EOP
> and RS pair.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

Is this a bugfix? If so, it should go to the bpf tree.

> ---
>  drivers/net/ethernet/intel/ice/ice_ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> index 1e71b70f0e52..b6275a29fa0d 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> @@ -658,7 +658,7 @@ static int ice_lbtest_receive_frames(struct ice_rx_ring *rx_ring)
>  		rx_desc = ICE_RX_DESC(rx_ring, i);
>  
>  		if (!(rx_desc->wb.status_error0 &
> -		    cpu_to_le16(ICE_TX_DESC_CMD_EOP | ICE_TX_DESC_CMD_RS)))
> +		    cpu_to_le16(BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S))))
>  			continue;
>  
>  		rx_buf = &rx_ring->rx_buf[i];
> -- 
> 2.27.0
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 bpf-next 04/10] ice: do not setup vlan for loopback VSI
  2022-06-16 18:06 ` [PATCH v4 bpf-next 04/10] ice: do not setup vlan for loopback VSI Maciej Fijalkowski
@ 2022-06-18  2:01   ` John Fastabend
  0 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2022-06-18  2:01 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Maciej Fijalkowski wrote:
> Currently the loopback test is failing due to the error returned from
> ice_vsi_vlan_setup(). Skip calling it when preparing the loopback VSI.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

These look like fixes unrelated to BPF; shouldn't they go into the net
tree as driver fixes?

> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 5bdd515142ec..882f8e280317 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -6026,10 +6026,12 @@ int ice_vsi_cfg(struct ice_vsi *vsi)
>  	if (vsi->netdev) {
>  		ice_set_rx_mode(vsi->netdev);
>  
> -		err = ice_vsi_vlan_setup(vsi);
> +		if (vsi->type != ICE_VSI_LB) {
> +			err = ice_vsi_vlan_setup(vsi);
>  

The extra newline here makes it less readable, in my opinion.

> -		if (err)
> -			return err;
> +			if (err)
> +				return err;
> +		}
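
For reference, the hunk as it would read with the blank line dropped, per
the nit above:

	if (vsi->type != ICE_VSI_LB) {
		err = ice_vsi_vlan_setup(vsi);
		if (err)
			return err;
	}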

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 bpf-next 05/10] selftests: xsk: query for native XDP support
  2022-06-16 18:06 ` [PATCH v4 bpf-next 05/10] selftests: xsk: query for native XDP support Maciej Fijalkowski
@ 2022-06-18  2:12   ` John Fastabend
  2022-06-20 10:55     ` Maciej Fijalkowski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2022-06-18  2:12 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Maciej Fijalkowski wrote:
> Currently, xdpxceiver assumes that the underlying device supports XDP in
> native mode - this has been fine so far since the tests can run only on a
> veth pair. A future commit is going to allow running the test suite
> against physical devices, so let us query the device to see if it is
> capable of running XDP programs in native mode. This way xdpxceiver will
> not try to run TEST_MODE_DRV if the device being tested does not support
> it.
> 
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  tools/testing/selftests/bpf/xdpxceiver.c | 36 ++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
> index e5992a6b5e09..a1e410f6a5d8 100644
> --- a/tools/testing/selftests/bpf/xdpxceiver.c
> +++ b/tools/testing/selftests/bpf/xdpxceiver.c
> @@ -98,6 +98,8 @@
>  #include <unistd.h>
>  #include <stdatomic.h>
>  #include <bpf/xsk.h>
> +#include <bpf/bpf.h>
> +#include <linux/filter.h>
>  #include "xdpxceiver.h"
>  #include "../kselftest.h"
>  
> @@ -1605,10 +1607,37 @@ static void ifobject_delete(struct ifobject *ifobj)
>  	free(ifobj);
>  }
>  
> +static bool is_xdp_supported(struct ifobject *ifobject)
> +{
> +	int flags = XDP_FLAGS_DRV_MODE;
> +
> +	LIBBPF_OPTS(bpf_link_create_opts, opts, .flags = flags);
> +	struct bpf_insn insns[2] = {
> +		BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
> +		BPF_EXIT_INSN()
> +	};
> +	int ifindex = if_nametoindex(ifobject->ifname);
> +	int prog_fd, insn_cnt = ARRAY_SIZE(insns);
> +	int err;
> +
> +	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, NULL);
> +	if (prog_fd < 0)
> +		return false;
> +
> +	err = bpf_xdp_attach(ifindex, prog_fd, flags, NULL);
> +	if (err)

Best not to leave prog_fd open, either in the error case or in the
good case.

> +		return false;
> +
> +	bpf_xdp_detach(ifindex, flags, NULL);
> +

close(prog_fd)

> +	return true;
> +}
> +
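
A sketch of the function with both close() calls folded in, per the
comments above (nothing else changed relative to the patch):

	static bool is_xdp_supported(struct ifobject *ifobject)
	{
		int flags = XDP_FLAGS_DRV_MODE;

		LIBBPF_OPTS(bpf_link_create_opts, opts, .flags = flags);
		struct bpf_insn insns[2] = {
			BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
			BPF_EXIT_INSN()
		};
		int ifindex = if_nametoindex(ifobject->ifname);
		int prog_fd, insn_cnt = ARRAY_SIZE(insns);
		int err;

		prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, NULL);
		if (prog_fd < 0)
			return false;

		err = bpf_xdp_attach(ifindex, prog_fd, flags, NULL);
		if (err) {
			close(prog_fd);		/* don't leak the fd on failure */
			return false;
		}

		bpf_xdp_detach(ifindex, flags, NULL);
		close(prog_fd);			/* ...nor on success */

		return true;
	}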

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 bpf-next 06/10] selftests: xsk: add missing close() on netns fd
  2022-06-16 18:06 ` [PATCH v4 bpf-next 06/10] selftests: xsk: add missing close() on netns fd Maciej Fijalkowski
@ 2022-06-18  2:14   ` John Fastabend
  0 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2022-06-18  2:14 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Maciej Fijalkowski wrote:
> Commit 1034b03e54ac ("selftests: xsk: Simplify cleanup of ifobjects")
> removed the close() on the netns fd, which is not correct, so let us
> restore it.
> 
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
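
The patch body is not quoted here, but the restored cleanup would be a
single close(); the nsfd field name below is an assumption for
illustration, not taken from the patch:

	/* teardown: close the netns fd again (hypothetical field name) */
	if (ifobj->nsfd != -1)
		close(ifobj->nsfd);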

This needs a Fixes tag, and it should likely be sent to the bpf tree
unless there is some reason not to.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 bpf-next 07/10] selftests: xsk: introduce default Rx pkt stream
  2022-06-16 18:06 ` [PATCH v4 bpf-next 07/10] selftests: xsk: introduce default Rx pkt stream Maciej Fijalkowski
@ 2022-06-18  2:41   ` John Fastabend
  2022-06-20  8:41     ` Magnus Karlsson
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2022-06-18  2:41 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Maciej Fijalkowski wrote:
> In order to prepare xdpxceiver for physical device testing, let us
> introduce a default Rx pkt stream. The reason for doing this is that
> physical device testing will use a UMEM of doubled size, where half of
> it will be used by Tx and the other half by Rx. This means that pkt
> addresses will differ for the Tx and Rx streams. The Rx thread will
> initialize the xsk_umem_info::base_addr that is added here so that
> pkt_set(), when working on the Rx UMEM, will add this offset and the
> second half of the UMEM space will be used. Note that currently
> base_addr is 0 on both sides. A future commit will do the mentioned
> initialization.
> 
> Previously, veth based testing worked on separate UMEMs, so a single
> default stream was fine.
> 
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
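
The layout described above, roughly sketched; UMEM_SZ and pkt_addr() are
illustrative names for this note, not taken from xdpxceiver:

	/*   [0 ............. UMEM_SZ/2)  Tx pkt stream (base_addr == 0)
	 *   [UMEM_SZ/2 ....... UMEM_SZ)  Rx pkt stream (base_addr == UMEM_SZ/2)
	 */
	static u64 pkt_addr(struct xsk_umem_info *umem, u32 pkt_nb, u32 frame_size)
	{
		/* pkt_set() adds base_addr the same way, so Rx packets
		 * land in the second half of the doubled UMEM */
		return umem->base_addr + (u64)pkt_nb * frame_size;
	}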

Just curious, why can't we make veth use a single umem with double the
size? It would be nice to have a single test setup. Why choose two umems
vs a single one?

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion()
  2022-06-16 18:06 ` [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion() Maciej Fijalkowski
@ 2022-06-18  2:56   ` John Fastabend
  2022-06-20 12:02     ` Maciej Fijalkowski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2022-06-18  2:56 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel
  Cc: netdev, magnus.karlsson, bjorn, kuba, Maciej Fijalkowski

Maciej Fijalkowski wrote:
> Some of the drivers that implement support for AF_XDP Zero Copy (like
> ice) can take a lazy approach to cleaning Tx descriptors. For ZC, when a
> descriptor is cleaned, it is placed onto the AF_XDP completion queue.
> This means that the current implementation of wait_for_tx_completion()
> in xdpxceiver can get into an infinite loop, as some of the descriptors
> may never reach the CQ.
> 
> This function can be changed to rely on pkts_in_flight instead.
> 
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---

Sorry, I'm going to need more details to follow what's going on here.

In send_pkts() we do the expected thing and send all the pkts and
then call wait_for_tx_completion().

Wait for completion is obvious,

 static void wait_for_tx_completion(struct xsk_socket_info *xsk)               
 {                                                   
        while (xsk->outstanding_tx)                                                      
                complete_pkts(xsk, BATCH_SIZE);
 }  

The 'outstanding_tx' counter appears to be decremented in complete_pkts().
This is done by looking at xsk_ring_cons__peek(), which makes sense to me:
until a descriptor shows up there, we don't know that the pkt has been
completely sent and that we can release the resources.

Now if you just zero it on exit and call it good how do you know the
resources are safe to clean up? Or that you don't have a real bug
in the driver that isn't correctly releasing the resource.

How are users expected to handle a lazy approach to tx descriptor cleaning
in a case like this, e.g. on exit? It seems we either need to fix the root
cause of ice not putting things on the completion queue, or I have
misunderstood the patch.


>  tools/testing/selftests/bpf/xdpxceiver.c | 3 ++-
>  tools/testing/selftests/bpf/xdpxceiver.h | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
> index de4cf0432243..13a3b2ac2399 100644
> --- a/tools/testing/selftests/bpf/xdpxceiver.c
> +++ b/tools/testing/selftests/bpf/xdpxceiver.c
> @@ -965,7 +965,7 @@ static int __send_pkts(struct ifobject *ifobject, u32 *pkt_nb)
>  
>  static void wait_for_tx_completion(struct xsk_socket_info *xsk)
>  {
> -	while (xsk->outstanding_tx)
> +	while (pkts_in_flight)
>  		complete_pkts(xsk, BATCH_SIZE);
>  }
>  
> @@ -1269,6 +1269,7 @@ static void *worker_testapp_validate_rx(void *arg)
>  		pthread_mutex_unlock(&pacing_mutex);
>  	}
>  
> +	pkts_in_flight = 0;
>  	pthread_exit(NULL);
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 07/10] selftests: xsk: introduce default Rx pkt stream
  2022-06-18  2:41   ` John Fastabend
@ 2022-06-20  8:41     ` Magnus Karlsson
  0 siblings, 0 replies; 27+ messages in thread
From: Magnus Karlsson @ 2022-06-20  8:41 UTC (permalink / raw)
  To: John Fastabend
  Cc: Maciej Fijalkowski, bpf, Alexei Starovoitov, Daniel Borkmann,
	Network Development, Karlsson, Magnus, Björn Töpel,
	Jakub Kicinski

On Sat, Jun 18, 2022 at 4:44 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Maciej Fijalkowski wrote:
> > In order to prepare xdpxceiver for physical device testing, let us
> > introduce a default Rx pkt stream. The reason for doing this is that
> > physical device testing will use a UMEM of doubled size, where half of
> > it will be used by Tx and the other half by Rx. This means that pkt
> > addresses will differ for the Tx and Rx streams. The Rx thread will
> > initialize the xsk_umem_info::base_addr that is added here so that
> > pkt_set(), when working on the Rx UMEM, will add this offset and the
> > second half of the UMEM space will be used. Note that currently
> > base_addr is 0 on both sides. A future commit will do the mentioned
> > initialization.
> >
> > Previously, veth based testing worked on separate UMEMs, so a single
> > default stream was fine.
> >
> > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
>
> Just curious, why can't we make veth use a single umem with double the
> size? It would be nice to have a single test setup. Why choose two umems
> vs a single one?

Good point. We could do that, but I would prefer to keep the two modes,
as one uses the XDP_SHARED_UMEM feature and the other does not (it uses
private umems). The more modes we can test, the better. What we should do
is test both modes when possible, and also the third mode where a umem is
shared between separate queue ids and netdevs, something we do not test
at all today. So I say, keep it like this for now and I can submit
another patch set that extends the veth tests to also use a shared umem
(the double umem size case) and introduces testing of the third mode when
applicable.

> Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 01/10] ice: compress branches in ice_set_features()
  2022-06-18  1:46   ` John Fastabend
@ 2022-06-20  9:48     ` Maciej Fijalkowski
  0 siblings, 0 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-20  9:48 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, ast, daniel, netdev, magnus.karlsson, bjorn, kuba, Alexandr Lobakin

On Fri, Jun 17, 2022 at 06:46:35PM -0700, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > Instead of a rather verbose comparison of the current netdev->features
> > bits vs the incoming ones from the user, let us compress them with a
> > helper feature set that is the result of netdev->features XOR features.
> > This way, the current, extensive branches:
> > 
> > 	if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT))
> > 		set_feature(true);
> > 	else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT)
> > 		set_feature(false);
> > 
> > can become:
> > 
> > 	netdev_features_t changed = netdev->features ^ features;
> > 
> > 	if (changed & NETIF_F_BIT)
> > 		set_feature(!!(features & NETIF_F_BIT));
> > 
> > This is nothing new, as several other drivers currently use this
> > approach, which I find much more convenient.
> 
> Looks good, a couple of nits below. Up to you whether you want to follow
> through on them or not; I don't have a strong opinion. For what it's
> worth, the other Intel drivers also do the 'netdev->features ^ features'
> assignment.
> 
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> 
> > 
> > CC: Alexandr Lobakin <alexandr.lobakin@intel.com>
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_main.c | 40 +++++++++++------------
> >  1 file changed, 19 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > index e1cae253412c..23d1b1fc39fb 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -5910,44 +5910,41 @@ ice_set_vlan_features(struct net_device *netdev, netdev_features_t features)
> >  static int
> >  ice_set_features(struct net_device *netdev, netdev_features_t features)
> >  {
> > +	netdev_features_t changed = netdev->features ^ features;
> >  	struct ice_netdev_priv *np = netdev_priv(netdev);
> >  	struct ice_vsi *vsi = np->vsi;
> >  	struct ice_pf *pf = vsi->back;
> >  	int ret = 0;
> >  
> >  	/* Don't set any netdev advanced features with device in Safe Mode */
> > -	if (ice_is_safe_mode(vsi->back)) {
> > -		dev_err(ice_pf_to_dev(vsi->back), "Device is in Safe Mode - not enabling advanced netdev features\n");
> > +	if (ice_is_safe_mode(pf)) {
> > +		dev_err(ice_pf_to_dev(vsi->back),
> 
> A bit of nitpicking, but if you use pf in the 'if' above, why not use it
> here as well and save a few keystrokes? It would also match the usage below.

Hey John, thanks for all of the input!
You're right above; I just forgot to change it to use 'pf' in
ice_pf_to_dev(). Will fix.

> 
> > +			"Device is in Safe Mode - not enabling advanced netdev features\n");
> >  		return ret;
> >  	}
> >  
> >  	/* Do not change setting during reset */
> >  	if (ice_is_reset_in_progress(pf->state)) {
> > -		dev_err(ice_pf_to_dev(vsi->back), "Device is resetting, changing advanced netdev features temporarily unavailable.\n");
> > +		dev_err(ice_pf_to_dev(pf),
> > +			"Device is resetting, changing advanced netdev features temporarily unavailable.\n");
> >  		return -EBUSY;
> >  	}
> >  
> 
> [...]
> 
> > @@ -5956,11 +5953,12 @@ ice_set_features(struct net_device *netdev, netdev_features_t features)
> >  		return -EACCES;
> >  	}
> >  
> > -	if ((features & NETIF_F_HW_TC) &&
> > -	    !(netdev->features & NETIF_F_HW_TC))
> > -		set_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
> > -	else
> > -		clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
> > +	if (changed & NETIF_F_HW_TC) {
> > +		bool ena = !!(features & NETIF_F_HW_TC);
> > +
> > +		ena ? set_bit(ICE_FLAG_CLS_FLOWER, pf->flags) :
> > +		      clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
> > +	}
> 
> Just a note that you changed the logic slightly here. Previously the bit
> was always either set or cleared; now it is only touched when the feature
> actually changed. It looks like it doesn't matter, the caveat being that
> I don't know what might happen in hardware.

I believe that the previous logic was wrong: in case NETIF_F_HW_TC was
already set but was not being modified in the current ice_set_features()
call, the setting would be lost.
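
Worked through with both versions of the code quoted earlier, assuming
NETIF_F_HW_TC is set in netdev->features and in the requested features
(i.e. the user toggled some unrelated feature):

	/* old code: */
	if ((features & NETIF_F_HW_TC) &&		/* true */
	    !(netdev->features & NETIF_F_HW_TC))	/* false */
		set_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
	else						/* taken */
		clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);	/* setting silently lost */

	/* new code: the HW_TC bit is equal on both sides, so it is zero
	 * in 'changed' and the block is skipped; the flag is preserved */
	if (changed & NETIF_F_HW_TC) {
		...
	}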

> 
> >  
> >  	return 0;
> >  }
> > -- 
> > 2.27.0
> > 
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 02/10] ice: allow toggling loopback mode via ndo_set_features callback
  2022-06-18  1:57   ` John Fastabend
@ 2022-06-20  9:52     ` Maciej Fijalkowski
  0 siblings, 0 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-20  9:52 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, ast, daniel, netdev, magnus.karlsson, bjorn, kuba, Alexandr Lobakin

On Fri, Jun 17, 2022 at 06:57:41PM -0700, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > Add support for NETIF_F_LOOPBACK. This feature can be set via:
> > $ ethtool -K eth0 loopback <on|off>
> > 
> > The feature can be useful for local data path tests.
> > 
> > CC: Alexandr Lobakin <alexandr.lobakin@intel.com>
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> 
> Patch looks fine; one question about that ice_set_features() function
> below, though.
> 
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> 
> [...]
> 
> > +/**
> > + * ice_set_loopback - turn on/off loopback mode on underlying PF
> > + * @vsi: ptr to VSI
> > + * @ena: flag to indicate the on/off setting
> > + */
> > +static int
> > +ice_set_loopback(struct ice_vsi *vsi, bool ena)
> > +{
> > +	bool if_running = netif_running(vsi->netdev);
> > +	int ret;
> > +
> > +	if (if_running && !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) {
> > +		ret = ice_down(vsi);
> > +		if (ret) {
> > +			netdev_err(vsi->netdev, "Preparing device to toggle loopback failed\n");
> > +			return ret;
> > +		}
> > +	}
> > +	ret = ice_aq_set_mac_loopback(&vsi->back->hw, ena, NULL);
> > +	if (ret)
> > +		netdev_err(vsi->netdev, "Failed to toggle loopback state\n");
> > +	if (if_running)
> > +		ret = ice_up(vsi);
> > +
> > +	return ret;
> > +}
> > +
> >  /**
> >   * ice_set_features - set the netdev feature flags
> >   * @netdev: ptr to the netdev being adjusted
> > @@ -5960,7 +5988,10 @@ ice_set_features(struct net_device *netdev, netdev_features_t features)
> >  		      clear_bit(ICE_FLAG_CLS_FLOWER, pf->flags);
> >  	}
> >  
> > -	return 0;
> > +	if (changed & NETIF_F_LOOPBACK)
> > +		ret = ice_set_loopback(vsi, !!(features & NETIF_F_LOOPBACK));
> > +
> > +	return ret;
> 
> Unrelated to your patch, but since you are touching 'ret' here a bit:
> how come you return 0 in the ice_is_safe_mode() case? Shouldn't you push
> an error up so the user who is doing the setting knows it didn't actually
> work?

Safe mode is the first thing checked in ice_set_features(), and in case it
is set we give a message to the user about it and return 0 immediately.

So did you miss the immediate exit, or are you suggesting that for safe
mode we should return some error code rather than 0, which is interpreted
as 'successful' command execution?
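
If an error is preferred there, one possible variant of that early exit
(the choice of -EOPNOTSUPP is hypothetical, purely for illustration):

	if (ice_is_safe_mode(pf)) {
		dev_err(ice_pf_to_dev(pf),
			"Device is in Safe Mode - not enabling advanced netdev features\n");
		/* instead of returning 0, make the failure visible to ethtool */
		return -EOPNOTSUPP;
	}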

> 
> >  }
> >  
> >  /**
> > -- 
> > 2.27.0
> > 
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 03/10] ice: check DD bit on Rx descriptor rather than (EOP | RS)
  2022-06-18  1:58   ` John Fastabend
@ 2022-06-20 10:53     ` Maciej Fijalkowski
  0 siblings, 0 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-20 10:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: bpf, ast, daniel, netdev, magnus.karlsson, bjorn, kuba

On Fri, Jun 17, 2022 at 06:58:57PM -0700, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > The Tx side sets the EOP and RS bits on descriptors to indicate that a
> > particular descriptor is the last one and needs to generate an irq once
> > it has been sent. These bits should not be checked on the completion
> > path, regardless of whether it's the Tx or the Rx side. The DD bit
> > serves this purpose, indicating that a particular descriptor is either
> > ready for Rx or was successfully Txed.
> > 
> > Look at DD bit being set in ice_lbtest_receive_frames() instead of EOP
> > and RS pair.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> 
> Is this a bugfix? If so, it should go to the bpf tree.

The previous logic worked without this patch by accident, and I don't
change the behaviour of the loopback self test itself, so I was not sure
whether this could be classified as a bugfix.

Rx descriptor's status_error0 field has ICE_RX_FLEX_DESC_STATUS0_DD_S and
ICE_RX_FLEX_DESC_STATUS0_EOF_S set, which have the following values:

enum ice_rx_flex_desc_status_error_0_bits {
	/* Note: These are predefined bit offsets */
	ICE_RX_FLEX_DESC_STATUS0_DD_S = 0,
	ICE_RX_FLEX_DESC_STATUS0_EOF_S,
	(...)
};

The old code was only ORing the following two enums:

enum ice_tx_desc_cmd_bits {
	ICE_TX_DESC_CMD_EOP			= 0x0001,
	ICE_TX_DESC_CMD_RS			= 0x0002,
	(...)
};

If BIT() had been used around ICE_TX_DESC_CMD_EOP and ICE_TX_DESC_CMD_RS,
and if they had been checked separately, then the check against the RS bit
would fail.

(let me also check for the EOF bit in the next revision)
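
Spelling out the coincidence using the enums above:

	/* what the old code ORed together (Tx command values): */
	ICE_TX_DESC_CMD_EOP | ICE_TX_DESC_CMD_RS	/* 0x0001 | 0x0002 = 0x0003 */

	/* what is actually set in status_error0 of a received frame: */
	BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S)		/* BIT(0) = 0x0001 */
	BIT(ICE_RX_FLEX_DESC_STATUS0_EOF_S)		/* BIT(1) = 0x0002 */

	/* so (status_error0 & 0x0003) was non-zero whenever DD or EOF was
	 * set; the Tx command values aliased the Rx bit positions purely
	 * by value, which is why the old check appeared to work. */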

> 
> > ---
> >  drivers/net/ethernet/intel/ice/ice_ethtool.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> > index 1e71b70f0e52..b6275a29fa0d 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> > @@ -658,7 +658,7 @@ static int ice_lbtest_receive_frames(struct ice_rx_ring *rx_ring)
> >  		rx_desc = ICE_RX_DESC(rx_ring, i);
> >  
> >  		if (!(rx_desc->wb.status_error0 &
> > -		    cpu_to_le16(ICE_TX_DESC_CMD_EOP | ICE_TX_DESC_CMD_RS)))
> > +		    cpu_to_le16(BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S))))
> >  			continue;
> >  
> >  		rx_buf = &rx_ring->rx_buf[i];
> > -- 
> > 2.27.0
> > 
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 05/10] selftests: xsk: query for native XDP support
  2022-06-18  2:12   ` John Fastabend
@ 2022-06-20 10:55     ` Maciej Fijalkowski
  0 siblings, 0 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-20 10:55 UTC (permalink / raw)
  To: John Fastabend; +Cc: bpf, ast, daniel, netdev, magnus.karlsson, bjorn, kuba

On Fri, Jun 17, 2022 at 07:12:00PM -0700, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > Currently, xdpxceiver assumes that the underlying device supports XDP
> > in native mode - this has been fine so far since the tests can run only
> > on a veth pair. A future commit is going to allow running the test
> > suite against physical devices, so let us query the device to see if it
> > is capable of running XDP programs in native mode. This way xdpxceiver
> > will not try to run TEST_MODE_DRV if the device being tested does not
> > support it.
> > 
> > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >  tools/testing/selftests/bpf/xdpxceiver.c | 36 ++++++++++++++++++++++--
> >  1 file changed, 34 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
> > index e5992a6b5e09..a1e410f6a5d8 100644
> > --- a/tools/testing/selftests/bpf/xdpxceiver.c
> > +++ b/tools/testing/selftests/bpf/xdpxceiver.c
> > @@ -98,6 +98,8 @@
> >  #include <unistd.h>
> >  #include <stdatomic.h>
> >  #include <bpf/xsk.h>
> > +#include <bpf/bpf.h>
> > +#include <linux/filter.h>
> >  #include "xdpxceiver.h"
> >  #include "../kselftest.h"
> >  
> > @@ -1605,10 +1607,37 @@ static void ifobject_delete(struct ifobject *ifobj)
> >  	free(ifobj);
> >  }
> >  
> > +static bool is_xdp_supported(struct ifobject *ifobject)
> > +{
> > +	int flags = XDP_FLAGS_DRV_MODE;
> > +
> > +	LIBBPF_OPTS(bpf_link_create_opts, opts, .flags = flags);
> > +	struct bpf_insn insns[2] = {
> > +		BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
> > +		BPF_EXIT_INSN()
> > +	};
> > +	int ifindex = if_nametoindex(ifobject->ifname);
> > +	int prog_fd, insn_cnt = ARRAY_SIZE(insns);
> > +	int err;
> > +
> > +	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, NULL);
> > +	if (prog_fd < 0)
> > +		return false;
> > +
> > +	err = bpf_xdp_attach(ifindex, prog_fd, flags, NULL);
> > +	if (err)
> 
> Best not to leave prog_fd open, either in the error case or in the
> good case.
> 
> > +		return false;
> > +
> > +	bpf_xdp_detach(ifindex, flags, NULL);
> > +
> 
> close(prog_fd)

Will fix

> 
> > +	return true;
> > +}
> > +

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion()
  2022-06-18  2:56   ` John Fastabend
@ 2022-06-20 12:02     ` Maciej Fijalkowski
  2022-06-20 16:36       ` John Fastabend
  0 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2022-06-20 12:02 UTC (permalink / raw)
  To: John Fastabend; +Cc: bpf, ast, daniel, netdev, magnus.karlsson, bjorn, kuba

On Fri, Jun 17, 2022 at 07:56:17PM -0700, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > Some of the drivers that implement support for AF_XDP Zero Copy (like
> > ice) can take a lazy approach to cleaning Tx descriptors. For ZC, when
> > a descriptor is cleaned, it is placed onto the AF_XDP completion queue.
> > This means that the current implementation of wait_for_tx_completion()
> > in xdpxceiver can get into an infinite loop, as some of the descriptors
> > may never reach the CQ.
> > 
> > This function can be changed to rely on pkts_in_flight instead.
> > 
> > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> 
> Sorry, I'm going to need more details to follow what's going on here.
> 
> In send_pkts() we do the expected thing and send all the pkts and
> then call wait_for_tx_completion().
> 
> Wait for completion is obvious,
> 
>  static void wait_for_tx_completion(struct xsk_socket_info *xsk)               
>  {                                                   
>         while (xsk->outstanding_tx)                                                      
>                 complete_pkts(xsk, BATCH_SIZE);
>  }  
> 
> The 'outstanding_tx' counter appears to be decremented in complete_pkts().
> This is done by looking at xsk_ring_cons__peek(), which makes sense to me:
> until a descriptor shows up there, we don't know that the pkt has been
> completely sent and that we can release the resources.

This is necessary for scenarios like l2fwd in xdpsock, where you take
entries from the cq back to the fq in order to refill the rx hw queue and
keep the flow going.

> 
> Now if you just zero it on exit and call it good how do you know the
> resources are safe to clean up? Or that you don't have a real bug
> in the driver that isn't correctly releasing the resource.

xdpxceiver spawns two threads, one for tx and one for rx. From the rx
thread's POV, if receive_pkts() finished its job, then this implies that
the tx thread transmitted all of the frames that the rx thread expected to
receive. This zeroing is then only there to terminate the tx thread and
finish the current test case, so that further cases under the current mode
can be executed.

> 
> How are users expected to handle a lazy approach to tx descriptor
> cleaning in a case like this, e.g. on exit? It seems we either need to
> fix the root cause of ice not putting things on the completion queue, or
> I have misunderstood the patch.

ice puts things on the cq lazily on purpose, as we added batching to the
Tx side where we clean descs only when needed.

We need to exit the spawned threads before we detach the socket from the
interface. Socket detach is done from the main thread, and at that point
the driver goes through the tx ring and places the descriptors that are
left onto the completion queue.

> 
> 
> >  tools/testing/selftests/bpf/xdpxceiver.c | 3 ++-
> >  tools/testing/selftests/bpf/xdpxceiver.h | 2 +-
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
> > index de4cf0432243..13a3b2ac2399 100644
> > --- a/tools/testing/selftests/bpf/xdpxceiver.c
> > +++ b/tools/testing/selftests/bpf/xdpxceiver.c
> > @@ -965,7 +965,7 @@ static int __send_pkts(struct ifobject *ifobject, u32 *pkt_nb)
> >  
> >  static void wait_for_tx_completion(struct xsk_socket_info *xsk)
> >  {
> > -	while (xsk->outstanding_tx)
> > +	while (pkts_in_flight)
> >  		complete_pkts(xsk, BATCH_SIZE);
> >  }
> >  
> > @@ -1269,6 +1269,7 @@ static void *worker_testapp_validate_rx(void *arg)
> >  		pthread_mutex_unlock(&pacing_mutex);
> >  	}
> >  
> > +	pkts_in_flight = 0;
> >  	pthread_exit(NULL);
> >  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx_completion()
  2022-06-20 12:02     ` Maciej Fijalkowski
@ 2022-06-20 16:36       ` John Fastabend
  0 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2022-06-20 16:36 UTC (permalink / raw)
  To: Maciej Fijalkowski, John Fastabend
  Cc: bpf, ast, daniel, netdev, magnus.karlsson, bjorn, kuba

Maciej Fijalkowski wrote:
> On Fri, Jun 17, 2022 at 07:56:17PM -0700, John Fastabend wrote:
> > Maciej Fijalkowski wrote:
> > > Some of the drivers that implement support for AF_XDP Zero Copy (like
> > > ice) can take a lazy approach to cleaning Tx descriptors. For ZC, when
> > > a descriptor is cleaned, it is placed onto the AF_XDP completion queue.
> > > This means that the current implementation of wait_for_tx_completion()
> > > in xdpxceiver can get into an infinite loop, as some of the descriptors
> > > may never reach the CQ.
> > > 
> > > This function can be changed to rely on pkts_in_flight instead.
> > > 
> > > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > ---
> > 
> > Sorry, I'm going to need more details to follow what's going on here.
> > 
> > In send_pkts() we do the expected thing and send all the pkts and
> > then call wait_for_tx_completion().
> > 
> > Wait for completion is obvious,
> > 
> >  static void wait_for_tx_completion(struct xsk_socket_info *xsk)               
> >  {                                                   
> >         while (xsk->outstanding_tx)                                                      
> >                 complete_pkts(xsk, BATCH_SIZE);
> >  }  
> > 
> > The 'outstanding_tx' counter appears to be decremented in complete_pkts().
> > This is done by looking at xsk_ring_cons__peek(), which makes sense to me:
> > until a descriptor shows up there, we don't know that the pkt has been
> > completely sent and that we can release the resources.
> 
> This is necessary for scenarios like l2fwd in xdpsock, where you take
> entries from the cq back to the fq in order to refill the rx hw queue and
> keep the flow going.
> 
> > 
> > Now if you just zero it on exit and call it good how do you know the
> > resources are safe to clean up? Or that you don't have a real bug
> > in the driver that isn't correctly releasing the resource.
> 
> xdpxceiver spawns two threads, one for tx and one for rx. From the rx
> thread's POV, if receive_pkts() finished its job, then this implies that
> the tx thread transmitted all of the frames that the rx thread expected
> to receive. This zeroing is then only there to terminate the tx thread
> and finish the current test case, so that further cases under the current
> mode can be executed.
> 
> > 
> > How are users expected to handle a lazy approach to tx descriptor
> > cleaning in a case like this, e.g. on exit? It seems we either need to
> > fix the root cause of ice not putting things on the completion queue,
> > or I have misunderstood the patch.
> 
> ice puts things on the cq lazily on purpose, as we added batching to the
> Tx side where we clean descs only when needed.
> 
> We need to exit the spawned threads before we detach the socket from the
> interface. Socket detach is done from the main thread, and at that point
> the driver goes through the tx ring and places the descriptors that are
> left onto the completion queue.

But, in general (not just this specific xdpxceiver case), how does an
application that is tx only know when it's OK to tear things down if the
ice driver can get stuck and never puts those descriptors on the
completion queue? Should there be some timer that fires and writes these
back regardless of whether more descriptors are seen? Did I understand the
driver correctly?

Thanks,
John
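
One illustrative shape such a bound could take in an application, reusing
xsk->outstanding_tx, complete_pkts() and BATCH_SIZE from the test code
quoted above; the timeout policy itself is hypothetical, not something the
patch proposes:

	/* returns 0 once the CQ drained, -ETIMEDOUT otherwise */
	static int wait_for_tx_completion_bounded(struct xsk_socket_info *xsk,
						  unsigned int timeout_ms)
	{
		while (xsk->outstanding_tx && timeout_ms--) {
			complete_pkts(xsk, BATCH_SIZE);
			usleep(1000);	/* poll the CQ roughly once per ms */
		}
		return xsk->outstanding_tx ? -ETIMEDOUT : 0;
	}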

^ permalink raw reply	[flat|nested] 27+ messages in thread

