* [PATCHv3 net-next 0/4] selftests: bonding: use slowwait when waiting
From: Hangbin Liu @ 2024-02-02 2:37 UTC
To: netdev
Cc: Jay Vosburgh, David S . Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Liang Li, Przemek Kitszel, Hangbin Liu
Many of the waits in the bonding tests use sleep. Let's replace them with
slowwait (added in the first patch). This saves a lot of test time, e.g.:
bond-break-lacpdu-tx.sh
  before: 0m16.346s
  after:  0m2.824s
bond_options.sh
  before: 9m25.299s
  after:  6m14.439s
bond-lladdr-target.sh
  before: 0m7.090s
  after:  0m6.148s
bond_macvlan.sh
  before: 0m44.999s
  after:  0m26.468s
In total, we could save about 200 seconds.
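As a quick cross-check of the total, the before/after figures above can be summed with a few lines of shell. This is only an illustrative sketch: the helper names to_seconds and total_saved are made up here and are not part of the series.

```shell
#!/bin/bash
# Convert an "XmY.YYYs" timing (as printed by the shell's `time`) to seconds.
# Hypothetical helper, for illustration only.
to_seconds()
{
	local t=$1
	local min=${t%%m*}
	local sec=${t#*m}
	sec=${sec%s}
	awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.3f", m * 60 + s }'
}

# name, before, after: copied from the figures in this cover letter
timings="
bond-break-lacpdu-tx.sh 0m16.346s 0m2.824s
bond_options.sh 9m25.299s 6m14.439s
bond-lladdr-target.sh 0m7.090s 0m6.148s
bond_macvlan.sh 0m44.999s 0m26.468s
"

# Sum (before - after) over all listed tests.
total_saved()
{
	local total=0 name before after
	while read -r name before after; do
		[ -z "$name" ] && continue
		total=$(awk -v t="$total" -v b="$(to_seconds "$before")" \
			-v a="$(to_seconds "$after")" \
			'BEGIN { printf "%.3f", t + b - a }')
	done <<< "$timings"
	echo "$total"
}

echo "saved: $(total_saved) seconds"
```

Nearly all of the saving comes from bond_options.sh; the other three scripts contribute only a few tens of seconds between them.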
v3: Rebase to latest net-next
v2: Reduce slowwait sleep time to 0.1 (Paolo Abeni)
Reduce num_grat_arp() miimon time (Paolo Abeni)
Use slowwait for ping result in lag_lib.sh
Hangbin Liu (4):
selftests/net/forwarding: add slowwait functions
selftests: bonding: use tc filter to check if LACP was sent
selftests: bonding: reduce garp_test/arp_validate test time
selftests: bonding: use slowwait instead of hard code sleep
.../net/bonding/bond-break-lacpdu-tx.sh | 19 +++++-----
.../drivers/net/bonding/bond-lladdr-target.sh | 21 ++++++++--
.../drivers/net/bonding/bond_macvlan.sh | 5 +--
.../drivers/net/bonding/bond_options.sh | 38 ++++++++++++++-----
.../drivers/net/bonding/bond_topo_2d1c.sh | 6 +--
.../selftests/drivers/net/bonding/lag_lib.sh | 7 ++--
tools/testing/selftests/net/forwarding/lib.sh | 35 +++++++++++++++++
7 files changed, 99 insertions(+), 32 deletions(-)
--
2.43.0
* [PATCHv3 net-next 1/4] selftests/net/forwarding: add slowwait functions
From: Hangbin Liu @ 2024-02-02 2:37 UTC
To: netdev
Cc: Jay Vosburgh, David S . Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Liang Li, Przemek Kitszel, Hangbin Liu
Add slowwait functions to wait for operations that may take a long time
to finish. busywait re-runs the command in a tight loop, which wastes
CPU in this scenario. In addition, if shell debugging is enabled with
`set -x`, busywait produces far too much log output.
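A minimal usage sketch follows. slowwait is copied from the diff below so that the example runs on its own; the file_exists predicate and the marker file are made up for illustration and stand in for whatever condition a test needs to wait on.

```shell
#!/bin/bash
# slowwait as added by this patch: retry a command every 0.1s until it
# succeeds or the timeout (in seconds) expires.
slowwait()
{
	local timeout=$1; shift

	local start_time="$(date -u +%s)"
	while true
	do
		local out
		out=$("$@")
		local ret=$?
		if ((!ret)); then
			echo -n "$out"
			return 0
		fi

		local current_time="$(date -u +%s)"
		if ((current_time - start_time > timeout)); then
			echo -n "$out"
			return 1
		fi

		sleep 0.1
	done
}

# Hypothetical condition for illustration: a marker file shows up.
marker=$(mktemp -u)
file_exists() { test -e "$1"; }

# Something in the background makes the condition true after ~0.3s.
( sleep 0.3; touch "$marker" ) &

# Returns as soon as the condition holds instead of sleeping a fixed 5s.
if slowwait 5 file_exists "$marker"; then
	echo "condition met"
else
	echo "timed out"
fi
wait
rm -f "$marker"
```

Because the timeout is compared in whole seconds with a strict `>`, a call may run up to roughly one second past the nominal timeout before giving up.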
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
tools/testing/selftests/net/forwarding/lib.sh | 35 +++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index a7ecfc8cae98..db3688f52888 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -37,6 +37,32 @@ fi
source "$net_forwarding_dir/../lib.sh"
+# timeout in seconds
+slowwait()
+{
+ local timeout=$1; shift
+
+ local start_time="$(date -u +%s)"
+ while true
+ do
+ local out
+ out=$("$@")
+ local ret=$?
+ if ((!ret)); then
+ echo -n "$out"
+ return 0
+ fi
+
+ local current_time="$(date -u +%s)"
+ if ((current_time - start_time > timeout)); then
+ echo -n "$out"
+ return 1
+ fi
+
+ sleep 0.1
+ done
+}
+
##############################################################################
# Sanity checks
@@ -478,6 +504,15 @@ busywait_for_counter()
busywait "$timeout" until_counter_is ">= $((base + delta))" "$@"
}
+slowwait_for_counter()
+{
+ local timeout=$1; shift
+ local delta=$1; shift
+
+ local base=$("$@")
+ slowwait "$timeout" until_counter_is ">= $((base + delta))" "$@"
+}
+
setup_wait_dev()
{
local dev=$1; shift
--
2.43.0
* [PATCHv3 net-next 2/4] selftests: bonding: use tc filter to check if LACP was sent
From: Hangbin Liu @ 2024-02-02 2:37 UTC
To: netdev
Cc: Jay Vosburgh, David S . Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Liang Li, Przemek Kitszel, Hangbin Liu
Use a tc filter to check whether LACP was sent, which is more accurate and
saves time.
There is no need to remove the bonding module, as some test environments
may have bonding built in, and the bond link has already been deleted.
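The counter-based wait can be tried without any network setup. In the sketch below, slowwait and slowwait_for_counter are copied from patch 1/4; until_counter_is is reproduced from the forwarding lib.sh as best recalled, so treat its exact body as an assumption. fake_packets is a made-up stand-in for tc_rule_handle_stats_get that counts lines in a file instead of reading tc statistics.

```shell
#!/bin/bash
# Copied from patch 1/4 so the sketch is self-contained.
slowwait()
{
	local timeout=$1; shift

	local start_time="$(date -u +%s)"
	while true
	do
		local out
		out=$("$@")
		local ret=$?
		if ((!ret)); then
			echo -n "$out"
			return 0
		fi

		local current_time="$(date -u +%s)"
		if ((current_time - start_time > timeout)); then
			echo -n "$out"
			return 1
		fi

		sleep 0.1
	done
}

# Assumed to match the existing helper in net/forwarding/lib.sh:
# print the current counter and succeed if it satisfies the expression.
until_counter_is()
{
	local expr=$1; shift
	local current=$("$@")

	echo $((current))
	((current $expr))
}

slowwait_for_counter()
{
	local timeout=$1; shift
	local delta=$1; shift

	local base=$("$@")
	slowwait "$timeout" until_counter_is ">= $((base + delta))" "$@"
}

# Hypothetical stand-in for tc_rule_handle_stats_get: one line per "packet".
counter_file=$(mktemp)
fake_packets() { wc -l < "$counter_file"; }

# Simulate three LACPDUs arriving 0.2s apart.
(
	for n in 1 2 3; do
		sleep 0.2
		echo "pkt $n" >> "$counter_file"
	done
) &

# Same shape as the check in the patch: wait up to 15s for 2 more packets.
if slowwait_for_counter 15 2 fake_packets &> /dev/null; then
	echo "PASS, counter advanced by at least 2"
else
	echo "FAIL"
fi
wait
rm -f "$counter_file"
```

The wait ends as soon as the counter has advanced by the requested delta, which is why the real test no longer needs the fixed 15 second capture window.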
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
.../net/bonding/bond-break-lacpdu-tx.sh | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
index 6358df5752f9..1ec7f59db7f4 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
@@ -20,21 +20,21 @@
# +------+ +------+
#
# We use veths instead of physical interfaces
+REQUIRE_MZ=no
+NUM_NETIFS=0
+lib_dir=$(dirname "$0")
+source "$lib_dir"/../../../net/forwarding/lib.sh
set -e
-tmp=$(mktemp -q dump.XXXXXX)
cleanup() {
ip link del fab-br0 >/dev/null 2>&1 || :
ip link del fbond >/dev/null 2>&1 || :
ip link del veth1-bond >/dev/null 2>&1 || :
ip link del veth2-bond >/dev/null 2>&1 || :
- modprobe -r bonding >/dev/null 2>&1 || :
- rm -f -- ${tmp}
}
trap cleanup 0 1 2
cleanup
-sleep 1
# create the bridge
ip link add fab-br0 address 52:54:00:3B:7C:A6 mtu 1500 type bridge \
@@ -67,13 +67,12 @@ ip link set fab-br0 up
ip link set fbond up
ip addr add dev fab-br0 10.0.0.3
-tcpdump -n -i veth1-end -e ether proto 0x8809 >${tmp} 2>&1 &
-sleep 15
-pkill tcpdump >/dev/null 2>&1
rc=0
-num=$(grep "packets captured" ${tmp} | awk '{print $1}')
-if test "$num" -gt 0; then
- echo "PASS, captured ${num}"
+tc qdisc add dev veth1-end clsact
+tc filter add dev veth1-end ingress protocol 0x8809 pref 1 handle 101 flower skip_hw action pass
+if slowwait_for_counter 15 2 \
+ tc_rule_handle_stats_get "dev veth1-end ingress" 101 ".packets" "" &> /dev/null; then
+ echo "PASS, captured 2"
else
echo "FAIL"
rc=1
--
2.43.0
* [PATCHv3 net-next 3/4] selftests: bonding: reduce garp_test/arp_validate test time
From: Hangbin Liu @ 2024-02-02 2:37 UTC
To: netdev
Cc: Jay Vosburgh, David S . Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Liang Li, Przemek Kitszel, Hangbin Liu
The purpose of grat_arp is to test commit 9949e2efb54e ("bonding: fix
send_peer_notif overflow"). As send_peer_notif was defined as u8,
to overflow it we need

  send_peer_notif = num_peer_notif * peer_notif_delay
                  = num_grat_arp * peer_notify_delay / miimon > 255

where send_peer_notif is the kernel counter, num_peer_notif and
peer_notif_delay are the kernel parameters, and num_grat_arp,
peer_notify_delay and miimon are the user parameters.
e.g. 30 (num_grat_arp) * 1000 (peer_notify_delay) / 100 (miimon) > 255,
which needs 30s to finish sending the garp messages. To save testing time,
the only way is to reduce the miimon value, something like
30 (num_grat_arp) * 100 (peer_notify_delay) / 10 (miimon) > 255.
To save more time, the num_grat_arp 50 case is removed.
The arp_validate_test also needs to check the mii_status, and it sleeps
too long. Use slowwait to save some time.
For the other connection checks, make sure the active slave has changed
first.
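The arithmetic above can be checked directly in the shell. The helper names below are made up for illustration. The u8 truncation shows what an 8-bit counter would hold if the product were stored without widening, and the send time is only the rough num_grat_arp * peer_notify_delay estimate used in this message, not a measured value.

```shell
#!/bin/bash
# Number of peer notifications as described in the commit message:
# num_grat_arp * peer_notify_delay / miimon
peer_notif_count()
{
	local num_grat_arp=$1 peer_notify_delay=$2 miimon=$3
	echo $((num_grat_arp * peer_notify_delay / miimon))
}

# Value an 8-bit counter would hold (illustrative truncation only).
as_u8() { echo $(($1 & 0xff)); }

# Rough time to send all notifications, in seconds
# (peer_notify_delay is given in milliseconds).
send_time_s()
{
	local num_grat_arp=$1 peer_notify_delay=$2
	echo $((num_grat_arp * peer_notify_delay / 1000))
}

# Old and new parameter sets from num_grat_arp(): arp count, delay, miimon.
for cfg in "30 1000 100" "30 100 10"; do
	read -r arp delay miimon <<< "$cfg"
	n=$(peer_notif_count "$arp" "$delay" "$miimon")
	echo "num_grat_arp=$arp peer_notify_delay=$delay miimon=$miimon:" \
	     "count=$n u8=$(as_u8 "$n") time=$(send_time_s "$arp" "$delay")s"
done
```

Both parameter sets give the same count of 300, which exceeds 255, so the overflow case is still exercised while the estimated send time drops from 30s to 3s.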
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
.../drivers/net/bonding/bond_options.sh | 38 ++++++++++++++-----
1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
index d508486cc0bd..6fd0cff3e1e9 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
@@ -45,15 +45,23 @@ skip_ns()
}
active_slave=""
+active_slave_changed()
+{
+ local old_active_slave=$1
+ local new_active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" \
+ ".[].linkinfo.info_data.active_slave")
+ test "$old_active_slave" != "$new_active_slave"
+}
+
check_active_slave()
{
local target_active_slave=$1
+ slowwait 2 active_slave_changed $active_slave
active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
test "$active_slave" = "$target_active_slave"
check_err $? "Current active slave is $active_slave but not $target_active_slave"
}
-
# Test bonding prio option
prio_test()
{
@@ -84,13 +92,13 @@ prio_test()
# active slave should be the higher prio slave
ip -n ${s_ns} link set $active_slave down
- bond_check_connection "fail over"
check_active_slave eth2
+ bond_check_connection "fail over"
# when only 1 slave is up
ip -n ${s_ns} link set $active_slave down
- bond_check_connection "only 1 slave up"
check_active_slave eth0
+ bond_check_connection "only 1 slave up"
# when a higher prio slave change to up
ip -n ${s_ns} link set eth2 up
@@ -140,8 +148,8 @@ prio_test()
check_active_slave "eth1"
ip -n ${s_ns} link set $active_slave down
- bond_check_connection "change slave prio"
check_active_slave "eth0"
+ bond_check_connection "change slave prio"
fi
}
@@ -199,6 +207,15 @@ prio()
prio_ns "active-backup"
}
+wait_mii_up()
+{
+ for i in $(seq 0 2); do
+ mii_status=$(cmd_jq "ip -n ${s_ns} -j -d link show eth$i" ".[].linkinfo.info_slave_data.mii_status")
+ [ ${mii_status} != "UP" ] && return 1
+ done
+ return 0
+}
+
arp_validate_test()
{
local param="$1"
@@ -211,7 +228,7 @@ arp_validate_test()
[ $RET -ne 0 ] && log_test "arp_validate" "$retmsg"
# wait for a while to make sure the mii status stable
- sleep 5
+ slowwait 5 wait_mii_up
for i in $(seq 0 2); do
mii_status=$(cmd_jq "ip -n ${s_ns} -j -d link show eth$i" ".[].linkinfo.info_slave_data.mii_status")
if [ ${mii_status} != "UP" ]; then
@@ -276,10 +293,13 @@ garp_test()
active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
ip -n ${s_ns} link set ${active_slave} down
- exp_num=$(echo "${param}" | cut -f6 -d ' ')
- sleep $((exp_num + 2))
+ # wait for active link change
+ slowwait 2 active_slave_changed $active_slave
+ exp_num=$(echo "${param}" | cut -f6 -d ' ')
active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
+ slowwait_for_counter $((exp_num + 5)) $exp_num \
+ tc_rule_handle_stats_get "dev s${active_slave#eth} ingress" 101 ".packets" "-n ${g_ns}"
# check result
real_num=$(tc_rule_handle_stats_get "dev s${active_slave#eth} ingress" 101 ".packets" "-n ${g_ns}")
@@ -296,8 +316,8 @@ garp_test()
num_grat_arp()
{
local val
- for val in 10 20 30 50; do
- garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000"
+ for val in 10 20 30; do
+ garp_test "mode active-backup miimon 10 num_grat_arp $val peer_notify_delay 100"
log_test "num_grat_arp" "active-backup miimon num_grat_arp $val"
done
}
--
2.43.0
* [PATCHv3 net-next 4/4] selftests: bonding: use slowwait instead of hard code sleep
From: Hangbin Liu @ 2024-02-02 2:37 UTC
To: netdev
Cc: Jay Vosburgh, David S . Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Liang Li, Przemek Kitszel, Hangbin Liu
Use slowwait instead of hard-coded sleeps in the bonding tests.
In setup_prepare(), client_create() is called after server_create(), so
there should be no need to sleep in server_create(); remove it.
For lag_lib.sh, removing the bonding module may affect other running
bonding tests, and some test environments have bonding built in, so it
can't be removed. The bonding link should be removed by
lag_reset_network() or by deleting the netns.
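The two predicates added to bond-lladdr-target.sh can be exercised against canned output to see what they match. The predicates are copied from the diff below; the addr_* and link_* functions are made-up stand-ins for `ip addr show` and `ip link show`, and their sample lines are illustrative rather than captured from a real run.

```shell
#!/bin/bash
# Predicates copied from the patch: each runs the command it is given
# and greps its output.
wait_lladdr_dad()
{
	$@ | grep fe80 | grep -qv tentative
}

wait_bond_up()
{
	$@ | grep -q 'state UP'
}

# Hypothetical canned output, for illustration only.
addr_tentative() { echo "    inet6 fe80::1/64 scope link tentative"; }
addr_ready()     { echo "    inet6 fe80::1/64 scope link"; }
link_down()      { echo "2: bond0: <NO-CARRIER,MASTER,UP> mtu 1500 state DOWN"; }
link_up()        { echo "2: bond0: <MASTER,UP,LOWER_UP> mtu 1500 state UP"; }

# DAD check: fails while the fe80 address is still tentative.
wait_lladdr_dad addr_tentative || echo "DAD still in progress"
wait_lladdr_dad addr_ready     && echo "link-local address usable"

# Bond check: keys on the operational state, not the UP flag in <...>.
wait_bond_up link_down         || echo "bond not up yet"
wait_bond_up link_up           && echo "bond is up"
```

Wrapped in slowwait, each predicate is simply re-run every 0.1s until it succeeds or the timeout expires.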
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
.../drivers/net/bonding/bond-lladdr-target.sh | 21 ++++++++++++++++---
.../drivers/net/bonding/bond_macvlan.sh | 5 ++---
.../drivers/net/bonding/bond_topo_2d1c.sh | 6 +++---
.../selftests/drivers/net/bonding/lag_lib.sh | 7 +++----
4 files changed, 26 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/bonding/bond-lladdr-target.sh b/tools/testing/selftests/drivers/net/bonding/bond-lladdr-target.sh
index 89af402fabbe..78d3e0fe6604 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond-lladdr-target.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond-lladdr-target.sh
@@ -17,6 +17,11 @@
# +----------------+
#
# We use veths instead of physical interfaces
+REQUIRE_MZ=no
+NUM_NETIFS=0
+lib_dir=$(dirname "$0")
+source "$lib_dir"/../../../net/forwarding/lib.sh
+
sw="sw-$(mktemp -u XXXXXX)"
host="ns-$(mktemp -u XXXXXX)"
@@ -26,6 +31,16 @@ cleanup()
ip netns del $host
}
+wait_lladdr_dad()
+{
+ $@ | grep fe80 | grep -qv tentative
+}
+
+wait_bond_up()
+{
+ $@ | grep -q 'state UP'
+}
+
trap cleanup 0 1 2
ip netns add $sw
@@ -37,8 +52,8 @@ ip -n $host link add veth1 type veth peer name veth1 netns $sw
ip -n $sw link add br0 type bridge
ip -n $sw link set br0 up
sw_lladdr=$(ip -n $sw addr show br0 | awk '/fe80/{print $2}' | cut -d'/' -f1)
-# sleep some time to make sure bridge lladdr pass DAD
-sleep 2
+# wait some time to make sure bridge lladdr pass DAD
+slowwait 2 wait_lladdr_dad ip -n $sw addr show br0
ip -n $host link add bond0 type bond mode 1 ns_ip6_target ${sw_lladdr} \
arp_validate 3 arp_interval 1000
@@ -53,7 +68,7 @@ ip -n $sw link set veth1 master br0
ip -n $sw link set veth0 up
ip -n $sw link set veth1 up
-sleep 5
+slowwait 5 wait_bond_up ip -n $host link show bond0
rc=0
if ip -n $host link show bond0 | grep -q LOWER_UP; then
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh b/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
index b609fb6231f4..acd3ebed3e20 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
@@ -58,7 +58,7 @@ macvlan_over_bond()
ip -n ${m2_ns} addr add ${m2_ip4}/24 dev macv0
ip -n ${m2_ns} addr add ${m2_ip6}/24 dev macv0
- sleep 2
+ slowwait 2 ip netns exec ${c_ns} ping ${s_ip4} -c 1 -W 0.1 &> /dev/null
check_connection "${c_ns}" "${s_ip4}" "IPv4: client->server"
check_connection "${c_ns}" "${s_ip6}" "IPv6: client->server"
@@ -69,8 +69,7 @@ macvlan_over_bond()
check_connection "${m1_ns}" "${m2_ip4}" "IPv4: macvlan_1->macvlan_2"
check_connection "${m1_ns}" "${m2_ip6}" "IPv6: macvlan_1->macvlan_2"
-
- sleep 5
+ slowwait 5 ip netns exec ${s_ns} ping ${c_ip4} -c 1 -W 0.1 &> /dev/null
check_connection "${s_ns}" "${c_ip4}" "IPv4: server->client"
check_connection "${s_ns}" "${c_ip6}" "IPv6: server->client"
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_topo_2d1c.sh b/tools/testing/selftests/drivers/net/bonding/bond_topo_2d1c.sh
index 0eb7edfb584c..195ef83cfbf1 100644
--- a/tools/testing/selftests/drivers/net/bonding/bond_topo_2d1c.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_topo_2d1c.sh
@@ -73,7 +73,6 @@ server_create()
ip -n ${s_ns} link set bond0 up
ip -n ${s_ns} addr add ${s_ip4}/24 dev bond0
ip -n ${s_ns} addr add ${s_ip6}/24 dev bond0
- sleep 2
}
# Reset bond with new mode and options
@@ -96,7 +95,8 @@ bond_reset()
ip -n ${s_ns} link set bond0 up
ip -n ${s_ns} addr add ${s_ip4}/24 dev bond0
ip -n ${s_ns} addr add ${s_ip6}/24 dev bond0
- sleep 2
+ # Wait for IPv6 address ready as it needs DAD
+ slowwait 2 ip netns exec ${s_ns} ping6 ${c_ip6} -c 1 -W 0.1 &> /dev/null
}
server_destroy()
@@ -150,7 +150,7 @@ bond_check_connection()
{
local msg=${1:-"check connection"}
- sleep 2
+ slowwait 2 ip netns exec ${s_ns} ping ${c_ip4} -c 1 -W 0.1 &> /dev/null
ip netns exec ${s_ns} ping ${c_ip4} -c5 -i 0.1 &>/dev/null
check_err $? "${msg}: ping failed"
ip netns exec ${s_ns} ping6 ${c_ip6} -c5 -i 0.1 &>/dev/null
diff --git a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
index dbdd736a41d3..bf9bcd1b5ec0 100644
--- a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
+++ b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
@@ -107,13 +107,12 @@ lag_setup2x2()
NAMESPACES="${namespaces}"
}
-# cleanup all lag related namespaces and remove the bonding module
+# cleanup all lag related namespaces
lag_cleanup()
{
for n in ${NAMESPACES}; do
ip netns delete ${n} >/dev/null 2>&1 || true
done
- modprobe -r bonding
}
SWITCH="lag_node1"
@@ -159,7 +158,7 @@ test_bond_recovery()
create_bond $@
# verify connectivity
- ip netns exec ${CLIENT} ping ${SWITCHIP} -c 2 >/dev/null 2>&1
+ slowwait 2 ip netns exec ${CLIENT} ping ${SWITCHIP} -c 2 -W 0.1 &> /dev/null
check_err $? "No connectivity"
# force the links of the bond down
@@ -169,7 +168,7 @@ test_bond_recovery()
ip netns exec ${SWITCH} ip link set eth1 down
# re-verify connectivity
- ip netns exec ${CLIENT} ping ${SWITCHIP} -c 2 >/dev/null 2>&1
+ slowwait 2 ip netns exec ${CLIENT} ping ${SWITCHIP} -c 2 -W 0.1 &> /dev/null
local rc=$?
check_err $rc "Bond failed to recover"
--
2.43.0
* Re: [PATCHv3 net-next 4/4] selftests: bonding: use slowwait instead of hard code sleep
From: Jakub Kicinski @ 2024-02-03 17:41 UTC
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, David S . Miller, Paolo Abeni,
Eric Dumazet, Liang Li, Przemek Kitszel
On Fri, 2 Feb 2024 10:37:54 +0800 Hangbin Liu wrote:
> diff --git a/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh b/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
> index b609fb6231f4..acd3ebed3e20 100755
> --- a/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
> +++ b/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
> @@ -58,7 +58,7 @@ macvlan_over_bond()
> ip -n ${m2_ns} addr add ${m2_ip4}/24 dev macv0
> ip -n ${m2_ns} addr add ${m2_ip6}/24 dev macv0
>
> - sleep 2
> + slowwait 2 ip netns exec ${c_ns} ping ${s_ip4} -c 1 -W 0.1 &> /dev/null
>
> check_connection "${c_ns}" "${s_ip4}" "IPv4: client->server"
> check_connection "${c_ns}" "${s_ip6}" "IPv6: client->server"
> @@ -69,8 +69,7 @@ macvlan_over_bond()
> check_connection "${m1_ns}" "${m2_ip4}" "IPv4: macvlan_1->macvlan_2"
> check_connection "${m1_ns}" "${m2_ip6}" "IPv6: macvlan_1->macvlan_2"
>
> -
> - sleep 5
> + slowwait 5 ip netns exec ${s_ns} ping ${c_ip4} -c 1 -W 0.1 &> /dev/null
>
> check_connection "${s_ns}" "${c_ip4}" "IPv4: server->client"
> check_connection "${s_ns}" "${c_ip6}" "IPv6: server->client"
This makes the bond_macvlan.sh test flaky:
https://netdev.bots.linux.dev/contest.html?test=bond-macvlan-sh
I repro'd it and the ping in check_connection() fails - neigh resolution
fails. I guess we need to insert more of the slowwaits?
Reverting this patch from the pending patch tree fixes it. The runner
has no KVM support, and runs a VM with 64 CPUs. If I lower the number
of CPUs to 4 the test passes. I added the note that some flakiness may
be caused by high CPU count:
https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#tips
--
pw-bot: cr
* Re: [PATCHv3 net-next 4/4] selftests: bonding: use slowwait instead of hard code sleep
From: Hangbin Liu @ 2024-02-04 8:31 UTC
To: Jakub Kicinski, Jay Vosburgh
Cc: netdev, David S . Miller, Paolo Abeni, Eric Dumazet, Liang Li,
Przemek Kitszel
On Sat, Feb 03, 2024 at 09:41:51AM -0800, Jakub Kicinski wrote:
> On Fri, 2 Feb 2024 10:37:54 +0800 Hangbin Liu wrote:
> > diff --git a/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh b/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
> > index b609fb6231f4..acd3ebed3e20 100755
> > --- a/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
> > +++ b/tools/testing/selftests/drivers/net/bonding/bond_macvlan.sh
> > @@ -58,7 +58,7 @@ macvlan_over_bond()
> > ip -n ${m2_ns} addr add ${m2_ip4}/24 dev macv0
> > ip -n ${m2_ns} addr add ${m2_ip6}/24 dev macv0
> >
> > - sleep 2
> > + slowwait 2 ip netns exec ${c_ns} ping ${s_ip4} -c 1 -W 0.1 &> /dev/null
> >
> > check_connection "${c_ns}" "${s_ip4}" "IPv4: client->server"
> > check_connection "${c_ns}" "${s_ip6}" "IPv6: client->server"
> > @@ -69,8 +69,7 @@ macvlan_over_bond()
> > check_connection "${m1_ns}" "${m2_ip4}" "IPv4: macvlan_1->macvlan_2"
> > check_connection "${m1_ns}" "${m2_ip6}" "IPv6: macvlan_1->macvlan_2"
> >
> > -
> > - sleep 5
> > + slowwait 5 ip netns exec ${s_ns} ping ${c_ip4} -c 1 -W 0.1 &> /dev/null
> >
> > check_connection "${s_ns}" "${c_ip4}" "IPv4: server->client"
> > check_connection "${s_ns}" "${c_ip6}" "IPv6: server->client"
>
> This makes the bond_macvlan.sh test flaky:
>
> https://netdev.bots.linux.dev/contest.html?test=bond-macvlan-sh
Hi Jakub,
Thanks for the report.
>
> I repro'd it and the ping in check_connection() fails - neigh resolution
> fails. I guess we need to insert more of the slowwaits?
>
> Reverting this patch from the pending patch tree fixes it. The runner
> has no KVM support, and runs a VM with 64 CPUs. If I lower the number
> of CPUs to 4 the test passes. I added the note that some flakiness may
> be caused by high CPU count:
>
> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#tips
Sadly, I can't reproduce it with an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz,
which has 20 cores and 40 processors. From your logs[1][2][3], all the
failures occurred when pinging from the client to macvlan_2, e.g.
# TEST: balance-tlb: IPv4: client->macvlan_1 [ OK ]
# TEST: balance-tlb: IPv6: client->macvlan_1 [ OK ]
# TEST: balance-tlb: IPv4: client->macvlan_2 [FAIL]
# ping failed
# TEST: balance-tlb: IPv6: client->macvlan_2 [ OK ]
Or
# TEST: balance-alb: IPv4: client->macvlan_1 [ OK ]
# TEST: balance-alb: IPv6: client->macvlan_1 [ OK ]
# TEST: balance-alb: IPv4: client->macvlan_2 [FAIL]
# ping failed
# TEST: balance-alb: IPv6: client->macvlan_2 [ OK ]
Let's check the client to macvlan_2 connection via slowwait and see
whether that works.
[1] https://netdev-2.bots.linux.dev/vmksft-bonding/results/449541/2-bond-macvlan-sh/stdout
[2] https://netdev-2.bots.linux.dev/vmksft-bonding/results/449361/4-bond-macvlan-sh/stdout
[3] https://netdev-2.bots.linux.dev/vmksft-bonding/results/449001/4-bond-macvlan-sh/stdout
Thanks
Hangbin