All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v3 0/2] bonding: 802.3ad: fix no transmission of LACPDUs
@ 2022-08-15 15:08 Jonathan Toppins
  2022-08-15 15:08 ` [PATCH net v3 1/2] selftests: include bonding tests into the kselftest infra Jonathan Toppins
  2022-08-15 15:08 ` [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs Jonathan Toppins
  0 siblings, 2 replies; 9+ messages in thread
From: Jonathan Toppins @ 2022-08-15 15:08 UTC (permalink / raw)
  To: netdev; +Cc: liuhangbin

The first patch adds some kselftest infrastructure and the reproducer
that demonstrates the problem. The second patch fixes the issue.

v3:
 * rebased to latest net/master
 * addressed comment from Hangbin

Jonathan Toppins (2):
  selftests: include bonding tests into the kselftest infra
  bonding: 802.3ad: fix no transmission of LACPDUs

 MAINTAINERS                                   |  1 +
 drivers/net/bonding/bond_3ad.c                |  3 +-
 tools/testing/selftests/Makefile              |  1 +
 .../selftests/drivers/net/bonding/Makefile    |  6 ++
 .../net/bonding/bond-break-lacpdu-tx.sh       | 81 +++++++++++++++++++
 .../selftests/drivers/net/bonding/config      |  1 +
 .../selftests/drivers/net/bonding/settings    |  1 +
 7 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/drivers/net/bonding/Makefile
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
 create mode 100644 tools/testing/selftests/drivers/net/bonding/config
 create mode 100644 tools/testing/selftests/drivers/net/bonding/settings

-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net v3 1/2] selftests: include bonding tests into the kselftest infra
  2022-08-15 15:08 [PATCH net v3 0/2] bonding: 802.3ad: fix no transmission of LACPDUs Jonathan Toppins
@ 2022-08-15 15:08 ` Jonathan Toppins
  2022-08-16  7:28   ` Hangbin Liu
  2022-08-15 15:08 ` [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs Jonathan Toppins
  1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Toppins @ 2022-08-15 15:08 UTC (permalink / raw)
  To: netdev; +Cc: liuhangbin, Shuah Khan, linux-kernel, linux-kselftest

This creates a test collection in drivers/net/bonding for bonding
specific kernel selftests.

The first test is a reproducer that provisions a bond and given the
specific order in how the ip-link(8) commands are issued the bond never
transmits an LACPDU frame on any of its slaves.

Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
---

Notes:
    v2:
     * fully integrated the test into the kselftests infrastructure
     * moved the reproducer to under
       tools/testing/selftests/drivers/net/bonding
     * reduced the test to its minimial amount and used ip-link(8) for
       all bond interface configuration
    v3:
     * rebase to latest net/master
     * remove `#set -x` requested by Hangbin

 MAINTAINERS                                   |  1 +
 tools/testing/selftests/Makefile              |  1 +
 .../selftests/drivers/net/bonding/Makefile    |  6 ++
 .../net/bonding/bond-break-lacpdu-tx.sh       | 81 +++++++++++++++++++
 .../selftests/drivers/net/bonding/config      |  1 +
 .../selftests/drivers/net/bonding/settings    |  1 +
 6 files changed, 91 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/net/bonding/Makefile
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
 create mode 100644 tools/testing/selftests/drivers/net/bonding/config
 create mode 100644 tools/testing/selftests/drivers/net/bonding/settings

diff --git a/MAINTAINERS b/MAINTAINERS
index f2d64020399b..e5fb14dc302d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3672,6 +3672,7 @@ F:	Documentation/networking/bonding.rst
 F:	drivers/net/bonding/
 F:	include/net/bond*
 F:	include/uapi/linux/if_bonding.h
+F:	tools/testing/selftests/net/bonding/
 
 BOSCH SENSORTEC BMA400 ACCELEROMETER IIO DRIVER
 M:	Dan Robertson <dan@dlrobertson.com>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 10b34bb03bc1..c2064a35688b 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -12,6 +12,7 @@ TARGETS += cpu-hotplug
 TARGETS += damon
 TARGETS += drivers/dma-buf
 TARGETS += drivers/s390x/uvdevice
+TARGETS += drivers/net/bonding
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += filesystems
diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
new file mode 100644
index 000000000000..ab6c54b12098
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for net selftests
+
+TEST_PROGS := bond-break-lacpdu-tx.sh
+
+include ../../../lib.mk
diff --git a/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
new file mode 100755
index 000000000000..47ab90596acb
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+# Regression Test:
+#   Verify LACPDUs get transmitted after setting the MAC address of
+#   the bond.
+#
+# https://bugzilla.redhat.com/show_bug.cgi?id=2020773
+#
+#       +---------+
+#       | fab-br0 |
+#       +---------+
+#            |
+#       +---------+
+#       |  fbond  |
+#       +---------+
+#        |       |
+#    +------+ +------+
+#    |veth1 | |veth2 |
+#    +------+ +------+
+#
+# We use veths instead of physical interfaces
+
+set -e
+tmp=$(mktemp -q dump.XXXXXX)
+cleanup() {
+	ip link del fab-br0 >/dev/null 2>&1 || :
+	ip link del fbond  >/dev/null 2>&1 || :
+	ip link del veth1-bond  >/dev/null 2>&1 || :
+	ip link del veth2-bond  >/dev/null 2>&1 || :
+	modprobe -r bonding  >/dev/null 2>&1 || :
+	rm -f -- ${tmp}
+}
+
+trap cleanup 0 1 2
+cleanup
+sleep 1
+
+# create the bridge
+ip link add fab-br0 address 52:54:00:3B:7C:A6 mtu 1500 type bridge \
+	forward_delay 15
+
+# create the bond
+ip link add fbond type bond mode 4 miimon 200 xmit_hash_policy 1 \
+	ad_actor_sys_prio 65535 lacp_rate fast
+
+# set bond address
+ip link set fbond address 52:54:00:3B:7C:A6
+ip link set fbond up
+
+# set again bond sysfs parameters
+ip link set fbond type bond ad_actor_sys_prio 65535
+
+# create veths
+ip link add name veth1-bond type veth peer name veth1-end
+ip link add name veth2-bond type veth peer name veth2-end
+
+# add ports
+ip link set fbond master fab-br0
+ip link set veth1-bond down master fbond
+ip link set veth2-bond down master fbond
+
+# bring up
+ip link set veth1-end up
+ip link set veth2-end up
+ip link set fab-br0 up
+ip link set fbond up
+ip addr add dev fab-br0 10.0.0.3
+
+tcpdump -n -i veth1-end -e ether proto 0x8809 >${tmp} 2>&1 &
+sleep 15
+pkill tcpdump >/dev/null 2>&1
+rc=0
+num=$(grep "packets captured" ${tmp} | awk '{print $1}')
+if test "$num" -gt 0; then
+	echo "PASS, captured ${num}"
+else
+	echo "FAIL"
+	rc=1
+fi
+exit $rc
diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
new file mode 100644
index 000000000000..dc1c22de3c92
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/config
@@ -0,0 +1 @@
+CONFIG_BONDING=y
diff --git a/tools/testing/selftests/drivers/net/bonding/settings b/tools/testing/selftests/drivers/net/bonding/settings
new file mode 100644
index 000000000000..867e118223cd
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/settings
@@ -0,0 +1 @@
+timeout=60
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs
  2022-08-15 15:08 [PATCH net v3 0/2] bonding: 802.3ad: fix no transmission of LACPDUs Jonathan Toppins
  2022-08-15 15:08 ` [PATCH net v3 1/2] selftests: include bonding tests into the kselftest infra Jonathan Toppins
@ 2022-08-15 15:08 ` Jonathan Toppins
  2022-08-16  7:29   ` Hangbin Liu
  2022-08-16 13:11   ` Jay Vosburgh
  1 sibling, 2 replies; 9+ messages in thread
From: Jonathan Toppins @ 2022-08-15 15:08 UTC (permalink / raw)
  To: netdev
  Cc: liuhangbin, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, resulting
in the periodic timer to never fire.

To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.

line 44: ip link add fbond type bond mode 4 miimon 200 \
            xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
    bond->dev->dev_addr = new_mac

line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 60: ip link set veth1-bond down master fbond
given:
    params.ad_actor_system = 0
    params.mode = BOND_MODE_8023AD
    ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
    bond_enslave
    -> bond_3ad_initialize(); because first slave
       -> if ad.system.sys_mac_addr != bond->dev->dev_addr
          return
results:
     Nothing is run in bond_3ad_initialize() because dev_add equals
     sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
     never initialized anywhere else.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
---

Notes:
    v2:
     * split this fix from the reproducer
    v3:
     * rebased to latest net/master

 drivers/net/bonding/bond_3ad.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index d7fb33c078e8..957d30db6f95 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -84,7 +84,8 @@ enum ad_link_speed_type {
 static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
 	0, 0, 0, 0, 0, 0
 };
-static u16 ad_ticks_per_sec;
+
+static u16 ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL;
 static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
 
 static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3 1/2] selftests: include bonding tests into the kselftest infra
  2022-08-15 15:08 ` [PATCH net v3 1/2] selftests: include bonding tests into the kselftest infra Jonathan Toppins
@ 2022-08-16  7:28   ` Hangbin Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Hangbin Liu @ 2022-08-16  7:28 UTC (permalink / raw)
  To: Jonathan Toppins; +Cc: netdev, Shuah Khan, linux-kernel, linux-kselftest

On Mon, Aug 15, 2022 at 11:08:34AM -0400, Jonathan Toppins wrote:
> This creates a test collection in drivers/net/bonding for bonding
> specific kernel selftests.
> 
> The first test is a reproducer that provisions a bond and given the
> specific order in how the ip-link(8) commands are issued the bond never
> transmits an LACPDU frame on any of its slaves.
> 
> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
> ---
> 
> Notes:
>     v2:
>      * fully integrated the test into the kselftests infrastructure
>      * moved the reproducer to under
>        tools/testing/selftests/drivers/net/bonding
>      * reduced the test to its minimial amount and used ip-link(8) for
>        all bond interface configuration
>     v3:
>      * rebase to latest net/master
>      * remove `#set -x` requested by Hangbin
> 
>  MAINTAINERS                                   |  1 +
>  tools/testing/selftests/Makefile              |  1 +
>  .../selftests/drivers/net/bonding/Makefile    |  6 ++
>  .../net/bonding/bond-break-lacpdu-tx.sh       | 81 +++++++++++++++++++
>  .../selftests/drivers/net/bonding/config      |  1 +
>  .../selftests/drivers/net/bonding/settings    |  1 +
>  6 files changed, 91 insertions(+)
>  create mode 100644 tools/testing/selftests/drivers/net/bonding/Makefile
>  create mode 100755 tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
>  create mode 100644 tools/testing/selftests/drivers/net/bonding/config
>  create mode 100644 tools/testing/selftests/drivers/net/bonding/settings
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f2d64020399b..e5fb14dc302d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3672,6 +3672,7 @@ F:	Documentation/networking/bonding.rst
>  F:	drivers/net/bonding/
>  F:	include/net/bond*
>  F:	include/uapi/linux/if_bonding.h
> +F:	tools/testing/selftests/net/bonding/
>  
>  BOSCH SENSORTEC BMA400 ACCELEROMETER IIO DRIVER
>  M:	Dan Robertson <dan@dlrobertson.com>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index 10b34bb03bc1..c2064a35688b 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -12,6 +12,7 @@ TARGETS += cpu-hotplug
>  TARGETS += damon
>  TARGETS += drivers/dma-buf
>  TARGETS += drivers/s390x/uvdevice
> +TARGETS += drivers/net/bonding
>  TARGETS += efivarfs
>  TARGETS += exec
>  TARGETS += filesystems
> diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
> new file mode 100644
> index 000000000000..ab6c54b12098
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/bonding/Makefile
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# Makefile for net selftests
> +
> +TEST_PROGS := bond-break-lacpdu-tx.sh
> +
> +include ../../../lib.mk
> diff --git a/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
> new file mode 100755
> index 000000000000..47ab90596acb
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
> @@ -0,0 +1,81 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# Regression Test:
> +#   Verify LACPDUs get transmitted after setting the MAC address of
> +#   the bond.
> +#
> +# https://bugzilla.redhat.com/show_bug.cgi?id=2020773
> +#
> +#       +---------+
> +#       | fab-br0 |
> +#       +---------+
> +#            |
> +#       +---------+
> +#       |  fbond  |
> +#       +---------+
> +#        |       |
> +#    +------+ +------+
> +#    |veth1 | |veth2 |
> +#    +------+ +------+
> +#
> +# We use veths instead of physical interfaces
> +
> +set -e
> +tmp=$(mktemp -q dump.XXXXXX)
> +cleanup() {
> +	ip link del fab-br0 >/dev/null 2>&1 || :
> +	ip link del fbond  >/dev/null 2>&1 || :
> +	ip link del veth1-bond  >/dev/null 2>&1 || :
> +	ip link del veth2-bond  >/dev/null 2>&1 || :
> +	modprobe -r bonding  >/dev/null 2>&1 || :
> +	rm -f -- ${tmp}
> +}
> +
> +trap cleanup 0 1 2
> +cleanup
> +sleep 1
> +
> +# create the bridge
> +ip link add fab-br0 address 52:54:00:3B:7C:A6 mtu 1500 type bridge \
> +	forward_delay 15
> +
> +# create the bond
> +ip link add fbond type bond mode 4 miimon 200 xmit_hash_policy 1 \
> +	ad_actor_sys_prio 65535 lacp_rate fast
> +
> +# set bond address
> +ip link set fbond address 52:54:00:3B:7C:A6
> +ip link set fbond up
> +
> +# set again bond sysfs parameters
> +ip link set fbond type bond ad_actor_sys_prio 65535
> +
> +# create veths
> +ip link add name veth1-bond type veth peer name veth1-end
> +ip link add name veth2-bond type veth peer name veth2-end
> +
> +# add ports
> +ip link set fbond master fab-br0
> +ip link set veth1-bond down master fbond
> +ip link set veth2-bond down master fbond
> +
> +# bring up
> +ip link set veth1-end up
> +ip link set veth2-end up
> +ip link set fab-br0 up
> +ip link set fbond up
> +ip addr add dev fab-br0 10.0.0.3
> +
> +tcpdump -n -i veth1-end -e ether proto 0x8809 >${tmp} 2>&1 &
> +sleep 15
> +pkill tcpdump >/dev/null 2>&1
> +rc=0
> +num=$(grep "packets captured" ${tmp} | awk '{print $1}')
> +if test "$num" -gt 0; then
> +	echo "PASS, captured ${num}"
> +else
> +	echo "FAIL"
> +	rc=1
> +fi
> +exit $rc
> diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
> new file mode 100644
> index 000000000000..dc1c22de3c92
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/bonding/config
> @@ -0,0 +1 @@
> +CONFIG_BONDING=y
> diff --git a/tools/testing/selftests/drivers/net/bonding/settings b/tools/testing/selftests/drivers/net/bonding/settings
> new file mode 100644
> index 000000000000..867e118223cd
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/bonding/settings
> @@ -0,0 +1 @@
> +timeout=60
> -- 
> 2.31.1
> 

Acked-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs
  2022-08-15 15:08 ` [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs Jonathan Toppins
@ 2022-08-16  7:29   ` Hangbin Liu
  2022-08-16 13:11   ` Jay Vosburgh
  1 sibling, 0 replies; 9+ messages in thread
From: Hangbin Liu @ 2022-08-16  7:29 UTC (permalink / raw)
  To: Jonathan Toppins
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

On Mon, Aug 15, 2022 at 11:08:35AM -0400, Jonathan Toppins wrote:
> This is caused by the global variable ad_ticks_per_sec being zero as
> demonstrated by the reproducer script discussed below. This causes
> all timer values in __ad_timer_to_ticks to be zero, resulting
> in the periodic timer to never fire.
> 
> To reproduce:
> Run the script in
> `tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
> puts bonding into a state where it never transmits LACPDUs.
> 
> line 44: ip link add fbond type bond mode 4 miimon 200 \
>             xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
> setting bond param: ad_actor_sys_prio
> given:
>     params.ad_actor_system = 0
> call stack:
>     bond_option_ad_actor_sys_prio()
>     -> bond_3ad_update_ad_actor_settings()
>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>             params.ad_actor_system == 0
> results:
>      ad.system.sys_mac_addr = bond->dev->dev_addr
> 
> line 48: ip link set fbond address 52:54:00:3B:7C:A6
> setting bond MAC addr
> call stack:
>     bond->dev->dev_addr = new_mac
> 
> line 52: ip link set fbond type bond ad_actor_sys_prio 65535
> setting bond param: ad_actor_sys_prio
> given:
>     params.ad_actor_system = 0
> call stack:
>     bond_option_ad_actor_sys_prio()
>     -> bond_3ad_update_ad_actor_settings()
>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>             params.ad_actor_system == 0
> results:
>      ad.system.sys_mac_addr = bond->dev->dev_addr
> 
> line 60: ip link set veth1-bond down master fbond
> given:
>     params.ad_actor_system = 0
>     params.mode = BOND_MODE_8023AD
>     ad.system.sys_mac_addr == bond->dev->dev_addr
> call stack:
>     bond_enslave
>     -> bond_3ad_initialize(); because first slave
>        -> if ad.system.sys_mac_addr != bond->dev->dev_addr
>           return
> results:
>      Nothing is run in bond_3ad_initialize() because dev_add equals
>      sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
>      never initialized anywhere else.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
> ---
> 
> Notes:
>     v2:
>      * split this fix from the reproducer
>     v3:
>      * rebased to latest net/master
> 
>  drivers/net/bonding/bond_3ad.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index d7fb33c078e8..957d30db6f95 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -84,7 +84,8 @@ enum ad_link_speed_type {
>  static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
>  	0, 0, 0, 0, 0, 0
>  };
> -static u16 ad_ticks_per_sec;
> +
> +static u16 ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL;
>  static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
>  
>  static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
> -- 
> 2.31.1
> 

Acked-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs
  2022-08-15 15:08 ` [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs Jonathan Toppins
  2022-08-16  7:29   ` Hangbin Liu
@ 2022-08-16 13:11   ` Jay Vosburgh
  2022-08-16 13:41     ` Jonathan Toppins
  1 sibling, 1 reply; 9+ messages in thread
From: Jay Vosburgh @ 2022-08-16 13:11 UTC (permalink / raw)
  To: Jonathan Toppins
  Cc: netdev, liuhangbin, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

Jonathan Toppins <jtoppins@redhat.com> wrote:

>This is caused by the global variable ad_ticks_per_sec being zero as
>demonstrated by the reproducer script discussed below. This causes
>all timer values in __ad_timer_to_ticks to be zero, resulting
>in the periodic timer to never fire.
>
>To reproduce:
>Run the script in
>`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
>puts bonding into a state where it never transmits LACPDUs.
>
>line 44: ip link add fbond type bond mode 4 miimon 200 \
>            xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
>setting bond param: ad_actor_sys_prio
>given:
>    params.ad_actor_system = 0
>call stack:
>    bond_option_ad_actor_sys_prio()
>    -> bond_3ad_update_ad_actor_settings()
>       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>            params.ad_actor_system == 0
>results:
>     ad.system.sys_mac_addr = bond->dev->dev_addr
>
>line 48: ip link set fbond address 52:54:00:3B:7C:A6
>setting bond MAC addr
>call stack:
>    bond->dev->dev_addr = new_mac
>
>line 52: ip link set fbond type bond ad_actor_sys_prio 65535
>setting bond param: ad_actor_sys_prio
>given:
>    params.ad_actor_system = 0
>call stack:
>    bond_option_ad_actor_sys_prio()
>    -> bond_3ad_update_ad_actor_settings()
>       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>            params.ad_actor_system == 0
>results:
>     ad.system.sys_mac_addr = bond->dev->dev_addr
>
>line 60: ip link set veth1-bond down master fbond
>given:
>    params.ad_actor_system = 0
>    params.mode = BOND_MODE_8023AD
>    ad.system.sys_mac_addr == bond->dev->dev_addr
>call stack:
>    bond_enslave
>    -> bond_3ad_initialize(); because first slave
>       -> if ad.system.sys_mac_addr != bond->dev->dev_addr
>          return
>results:
>     Nothing is run in bond_3ad_initialize() because dev_add equals
>     sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
>     never initialized anywhere else.
>
>Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>---
>
>Notes:
>    v2:
>     * split this fix from the reproducer
>    v3:
>     * rebased to latest net/master
>
> drivers/net/bonding/bond_3ad.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index d7fb33c078e8..957d30db6f95 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -84,7 +84,8 @@ enum ad_link_speed_type {
> static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
> 	0, 0, 0, 0, 0, 0
> };
>-static u16 ad_ticks_per_sec;
>+
>+static u16 ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL;
> static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;

	I still feel like this is kind of a hack, as it's not really
fixing bond_3ad_initialize to actually work (which is the real problem
as I understand it).  If it's ok to skip all that for this case, then
why do we ever need to call bond_3ad_initialize?

	-J

> static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
>-- 
>2.31.1
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs
  2022-08-16 13:11   ` Jay Vosburgh
@ 2022-08-16 13:41     ` Jonathan Toppins
  2022-08-16 17:47       ` Jonathan Toppins
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Toppins @ 2022-08-16 13:41 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, liuhangbin, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

On 8/16/22 09:11, Jay Vosburgh wrote:
> Jonathan Toppins <jtoppins@redhat.com> wrote:
> 
>> This is caused by the global variable ad_ticks_per_sec being zero as
>> demonstrated by the reproducer script discussed below. This causes
>> all timer values in __ad_timer_to_ticks to be zero, resulting
>> in the periodic timer to never fire.
>>
>> To reproduce:
>> Run the script in
>> `tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
>> puts bonding into a state where it never transmits LACPDUs.
>>
>> line 44: ip link add fbond type bond mode 4 miimon 200 \
>>             xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
>> setting bond param: ad_actor_sys_prio
>> given:
>>     params.ad_actor_system = 0
>> call stack:
>>     bond_option_ad_actor_sys_prio()
>>     -> bond_3ad_update_ad_actor_settings()
>>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>>             params.ad_actor_system == 0
>> results:
>>      ad.system.sys_mac_addr = bond->dev->dev_addr
>>
>> line 48: ip link set fbond address 52:54:00:3B:7C:A6
>> setting bond MAC addr
>> call stack:
>>     bond->dev->dev_addr = new_mac
>>
>> line 52: ip link set fbond type bond ad_actor_sys_prio 65535
>> setting bond param: ad_actor_sys_prio
>> given:
>>     params.ad_actor_system = 0
>> call stack:
>>     bond_option_ad_actor_sys_prio()
>>     -> bond_3ad_update_ad_actor_settings()
>>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>>             params.ad_actor_system == 0
>> results:
>>      ad.system.sys_mac_addr = bond->dev->dev_addr
>>
>> line 60: ip link set veth1-bond down master fbond
>> given:
>>     params.ad_actor_system = 0
>>     params.mode = BOND_MODE_8023AD
>>     ad.system.sys_mac_addr == bond->dev->dev_addr
>> call stack:
>>     bond_enslave
>>     -> bond_3ad_initialize(); because first slave
>>        -> if ad.system.sys_mac_addr != bond->dev->dev_addr
>>           return
>> results:
>>      Nothing is run in bond_3ad_initialize() because dev_add equals
>>      sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
>>      never initialized anywhere else.
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>> ---
>>
>> Notes:
>>     v2:
>>      * split this fix from the reproducer
>>     v3:
>>      * rebased to latest net/master
>>
>> drivers/net/bonding/bond_3ad.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index d7fb33c078e8..957d30db6f95 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -84,7 +84,8 @@ enum ad_link_speed_type {
>> static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
>> 	0, 0, 0, 0, 0, 0
>> };
>> -static u16 ad_ticks_per_sec;
>> +
>> +static u16 ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL;
>> static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
> 
> 	I still feel like this is kind of a hack, as it's not really
> fixing bond_3ad_initialize to actually work (which is the real problem
> as I understand it).  If it's ok to skip all that for this case, then
> why do we ever need to call bond_3ad_initialize?
> 

The way it is currently written you still need to call 
bond_3ad_initialize() just not for setting the tick resolution. The 
issue here is ad_ticks_per_sec is used in several places to calculate 
timer periods, __ad_timer_to_ticks(), for various timers in the 802.3ad 
protocol. And if this variable, ad_ticks_per_sec, is left uninitialized 
all of these timer periods go to zero. Since the value passed in 
bond_3ad_initialize() is an immediate value I simply moved it off of the 
call stack and set the static global variable instead.

To fix bond_3ad_initialize(), probably something like the below is 
needed, but I do not understand why the guard if check was placed in 
bond_3ad_initialize().

diff --git i/drivers/net/bonding/bond_3ad.c w/drivers/net/bonding/bond_3ad.c
index d7fb33c078e8..5b5146f5c4ea 100644
--- i/drivers/net/bonding/bond_3ad.c
+++ w/drivers/net/bonding/bond_3ad.c
@@ -2005,32 +2005,21 @@ void bond_3ad_initiate_agg_selection(struct 
bonding *bond, int timeout)
   *
   * Can be called only after the mac address of the bond is set.
   */
-void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution)
+void bond_3ad_initialize(struct bonding *bond)
  {
-	/* check that the bond is not initialized yet */
-	if (!MAC_ADDRESS_EQUAL(&(BOND_AD_INFO(bond).system.sys_mac_addr),
-				bond->dev->dev_addr)) {
-
-		BOND_AD_INFO(bond).aggregator_identifier = 0;
-
-		BOND_AD_INFO(bond).system.sys_priority =
-			bond->params.ad_actor_sys_prio;
-		if (is_zero_ether_addr(bond->params.ad_actor_system))
-			BOND_AD_INFO(bond).system.sys_mac_addr =
-			    *((struct mac_addr *)bond->dev->dev_addr);
-		else
-			BOND_AD_INFO(bond).system.sys_mac_addr =
-			    *((struct mac_addr *)bond->params.ad_actor_system);
-
-		/* initialize how many times this module is called in one
-		 * second (should be about every 100ms)
-		 */
-		ad_ticks_per_sec = tick_resolution;
+	BOND_AD_INFO(bond).aggregator_identifier = 0;
+	BOND_AD_INFO(bond).system.sys_priority =
+		bond->params.ad_actor_sys_prio;
+	if (is_zero_ether_addr(bond->params.ad_actor_system))
+		BOND_AD_INFO(bond).system.sys_mac_addr =
+		    *((struct mac_addr *)bond->dev->dev_addr);
+	else
+		BOND_AD_INFO(bond).system.sys_mac_addr =
+		    *((struct mac_addr *)bond->params.ad_actor_system);

-		bond_3ad_initiate_agg_selection(bond,
-						AD_AGGREGATOR_SELECTION_TIMER *
-						ad_ticks_per_sec);
-	}
+	bond_3ad_initiate_agg_selection(bond,
+					AD_AGGREGATOR_SELECTION_TIMER *
+					ad_ticks_per_sec);
  }

  /**
diff --git i/drivers/net/bonding/bond_main.c 
w/drivers/net/bonding/bond_main.c
index 50e60843020c..5f56af9dc3ba 100644
--- i/drivers/net/bonding/bond_main.c
+++ w/drivers/net/bonding/bond_main.c
@@ -2078,10 +2078,10 @@ int bond_enslave(struct net_device *bond_dev, 
struct net_device *slave_dev,
  		/* if this is the first slave */
  		if (!prev_slave) {
  			SLAVE_AD_INFO(new_slave)->id = 1;
-			/* Initialize AD with the number of times that the AD timer is 
called in 1 second
-			 * can be called only after the mac address of the bond is set
+			/* can be called only after the mac address of the
+			 * bond is set
  			 */
-			bond_3ad_initialize(bond, 1000/AD_TIMER_INTERVAL);
+			bond_3ad_initialize(bond);
  		} else {
  			SLAVE_AD_INFO(new_slave)->id =
  				SLAVE_AD_INFO(prev_slave)->id + 1;
diff --git i/include/net/bond_3ad.h w/include/net/bond_3ad.h
index 184105d68294..be2992e6de5d 100644
--- i/include/net/bond_3ad.h
+++ w/include/net/bond_3ad.h
@@ -290,7 +290,7 @@ static inline const char 
*bond_3ad_churn_desc(churn_state_t state)
  }

  /* ========== AD Exported functions to the main bonding code ========== */
-void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution);
+void bond_3ad_initialize(struct bonding *bond);
  void bond_3ad_bind_slave(struct slave *slave);
  void bond_3ad_unbind_slave(struct slave *slave);
  void bond_3ad_state_machine_handler(struct work_struct *);

-Jon


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs
  2022-08-16 13:41     ` Jonathan Toppins
@ 2022-08-16 17:47       ` Jonathan Toppins
  2022-08-17 23:24         ` Jay Vosburgh
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Toppins @ 2022-08-16 17:47 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, liuhangbin, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

On 8/16/22 09:41, Jonathan Toppins wrote:
> On 8/16/22 09:11, Jay Vosburgh wrote:
>> Jonathan Toppins <jtoppins@redhat.com> wrote:
>>
>>> This is caused by the global variable ad_ticks_per_sec being zero as
>>> demonstrated by the reproducer script discussed below. This causes
>>> all timer values in __ad_timer_to_ticks to be zero, resulting
>>> in the periodic timer to never fire.
>>>
>>> To reproduce:
>>> Run the script in
>>> `tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` 
>>> which
>>> puts bonding into a state where it never transmits LACPDUs.
>>>
>>> line 44: ip link add fbond type bond mode 4 miimon 200 \
>>>             xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
>>> setting bond param: ad_actor_sys_prio
>>> given:
>>>     params.ad_actor_system = 0
>>> call stack:
>>>     bond_option_ad_actor_sys_prio()
>>>     -> bond_3ad_update_ad_actor_settings()
>>>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>>>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>>>             params.ad_actor_system == 0
>>> results:
>>>      ad.system.sys_mac_addr = bond->dev->dev_addr
>>>
>>> line 48: ip link set fbond address 52:54:00:3B:7C:A6
>>> setting bond MAC addr
>>> call stack:
>>>     bond->dev->dev_addr = new_mac
>>>
>>> line 52: ip link set fbond type bond ad_actor_sys_prio 65535
>>> setting bond param: ad_actor_sys_prio
>>> given:
>>>     params.ad_actor_system = 0
>>> call stack:
>>>     bond_option_ad_actor_sys_prio()
>>>     -> bond_3ad_update_ad_actor_settings()
>>>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>>>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>>>             params.ad_actor_system == 0
>>> results:
>>>      ad.system.sys_mac_addr = bond->dev->dev_addr
>>>
>>> line 60: ip link set veth1-bond down master fbond
>>> given:
>>>     params.ad_actor_system = 0
>>>     params.mode = BOND_MODE_8023AD
>>>     ad.system.sys_mac_addr == bond->dev->dev_addr
>>> call stack:
>>>     bond_enslave
>>>     -> bond_3ad_initialize(); because first slave
>>>        -> if ad.system.sys_mac_addr != bond->dev->dev_addr
>>>           return
>>> results:
>>>      Nothing is run in bond_3ad_initialize() because dev_add equals
>>>      sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
>>>      never initialized anywhere else.
>>>
>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>>> ---
>>>
>>> Notes:
>>>     v2:
>>>      * split this fix from the reproducer
>>>     v3:
>>>      * rebased to latest net/master
>>>
>>> drivers/net/bonding/bond_3ad.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_3ad.c 
>>> b/drivers/net/bonding/bond_3ad.c
>>> index d7fb33c078e8..957d30db6f95 100644
>>> --- a/drivers/net/bonding/bond_3ad.c
>>> +++ b/drivers/net/bonding/bond_3ad.c
>>> @@ -84,7 +84,8 @@ enum ad_link_speed_type {
>>> static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
>>>     0, 0, 0, 0, 0, 0
>>> };
>>> -static u16 ad_ticks_per_sec;
>>> +
>>> +static u16 ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL;
>>> static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
>>
>>     I still feel like this is kind of a hack, as it's not really
>> fixing bond_3ad_initialize to actually work (which is the real problem
>> as I understand it).  If it's ok to skip all that for this case, then
>> why do we ever need to call bond_3ad_initialize?
>>
> 
> The way it is currently written you still need to call 
> bond_3ad_initialize() just not for setting the tick resolution. The 
> issue here is ad_ticks_per_sec is used in several places to calculate 
> timer periods, __ad_timer_to_ticks(), for various timers in the 802.3ad 
> protocol. And if this variable, ad_ticks_per_sec, is left uninitialized 
> all of these timer periods go to zero. Since the value passed in 
> bond_3ad_initialize() is an immediate value I simply moved it off of the 
> call stack and set the static global variable instead.
> 
> To fix bond_3ad_initialize(), probably something like the below is 
> needed, but I do not understand why the guard if check was placed in 
> bond_3ad_initialize().

I looked at the history of the if guard in bond_3ad_initialize and it 
has existed since the creation of the git tree. It appears since commit 
5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes 
immediately") the if guard is no longer needed and removing the if guard 
would also fix the problem, I have not tested yet.

I think this patch series can be accepted as-is because, it does fix the 
issue as demonstrated by the kselftest accompanying the series and is 
the smallest change to fix the issue.

Further, I don't see why we want to set the file-scoped variable, 
ad_ticks_per_sec, inside bond_3ad_initialize() as ad_ticks_per_sec is 
utilized across all bonds. It seems like ad_ticks_per_sec should be 
changed to a const and set at the top of the file. I see no value in 
passing the value as an unnamed constant on the stack when 
bond_3ad_initialize is called. These changes however could be done in 
the net-next tree as follow-on cleanup patches.

Jay, how would you like to proceed?

[...]

Thanks,
-Jon


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs
  2022-08-16 17:47       ` Jonathan Toppins
@ 2022-08-17 23:24         ` Jay Vosburgh
  0 siblings, 0 replies; 9+ messages in thread
From: Jay Vosburgh @ 2022-08-17 23:24 UTC (permalink / raw)
  To: Jonathan Toppins
  Cc: netdev, liuhangbin, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

Jonathan Toppins <jtoppins@redhat.com> wrote:

>On 8/16/22 09:41, Jonathan Toppins wrote:
>> On 8/16/22 09:11, Jay Vosburgh wrote:
>>> Jonathan Toppins <jtoppins@redhat.com> wrote:
>>>
>>>> This is caused by the global variable ad_ticks_per_sec being zero as
>>>> demonstrated by the reproducer script discussed below. This causes
>>>> all timer values in __ad_timer_to_ticks to be zero, resulting
>>>> in the periodic timer to never fire.
>>>>
>>>> To reproduce:
>>>> Run the script in
>>>> `tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh`
>>>> which
>>>> puts bonding into a state where it never transmits LACPDUs.
>>>>
>>>> line 44: ip link add fbond type bond mode 4 miimon 200 \
>>>>             xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
>>>> setting bond param: ad_actor_sys_prio
>>>> given:
>>>>     params.ad_actor_system = 0
>>>> call stack:
>>>>     bond_option_ad_actor_sys_prio()
>>>>     -> bond_3ad_update_ad_actor_settings()
>>>>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>>>>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>>>>             params.ad_actor_system == 0
>>>> results:
>>>>      ad.system.sys_mac_addr = bond->dev->dev_addr
>>>>
>>>> line 48: ip link set fbond address 52:54:00:3B:7C:A6
>>>> setting bond MAC addr
>>>> call stack:
>>>>     bond->dev->dev_addr = new_mac
>>>>
>>>> line 52: ip link set fbond type bond ad_actor_sys_prio 65535
>>>> setting bond param: ad_actor_sys_prio
>>>> given:
>>>>     params.ad_actor_system = 0
>>>> call stack:
>>>>     bond_option_ad_actor_sys_prio()
>>>>     -> bond_3ad_update_ad_actor_settings()
>>>>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>>>>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>>>>             params.ad_actor_system == 0
>>>> results:
>>>>      ad.system.sys_mac_addr = bond->dev->dev_addr
>>>>
>>>> line 60: ip link set veth1-bond down master fbond
>>>> given:
>>>>     params.ad_actor_system = 0
>>>>     params.mode = BOND_MODE_8023AD
>>>>     ad.system.sys_mac_addr == bond->dev->dev_addr
>>>> call stack:
>>>>     bond_enslave
>>>>     -> bond_3ad_initialize(); because first slave
>>>>        -> if ad.system.sys_mac_addr != bond->dev->dev_addr
>>>>           return
>>>> results:
>>>>      Nothing is run in bond_3ad_initialize() because dev_add equals
>>>>      sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
>>>>      never initialized anywhere else.
>>>>
>>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>>>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>>>> ---
>>>>
>>>> Notes:
>>>>     v2:
>>>>      * split this fix from the reproducer
>>>>     v3:
>>>>      * rebased to latest net/master
>>>>
>>>> drivers/net/bonding/bond_3ad.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/bonding/bond_3ad.c
>>>> b/drivers/net/bonding/bond_3ad.c
>>>> index d7fb33c078e8..957d30db6f95 100644
>>>> --- a/drivers/net/bonding/bond_3ad.c
>>>> +++ b/drivers/net/bonding/bond_3ad.c
>>>> @@ -84,7 +84,8 @@ enum ad_link_speed_type {
>>>> static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
>>>>     0, 0, 0, 0, 0, 0
>>>> };
>>>> -static u16 ad_ticks_per_sec;
>>>> +
>>>> +static u16 ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL;
>>>> static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
>>>
>>>     I still feel like this is kind of a hack, as it's not really
>>> fixing bond_3ad_initialize to actually work (which is the real problem
>>> as I understand it).  If it's ok to skip all that for this case, then
>>> why do we ever need to call bond_3ad_initialize?
>>>
>> The way it is currently written you still need to call
>> bond_3ad_initialize() just not for setting the tick resolution. The
>> issue here is ad_ticks_per_sec is used in several places to calculate
>> timer periods, __ad_timer_to_ticks(), for various timers in the 802.3ad
>> protocol. And if this variable, ad_ticks_per_sec, is left uninitialized
>> all of these timer periods go to zero. Since the value passed in
>> bond_3ad_initialize() is an immediate value I simply moved it off of the
>> call stack and set the static global variable instead.
>> To fix bond_3ad_initialize(), probably something like the below is
>> needed, but I do not understand why the guard if check was placed in
>> bond_3ad_initialize().
>
>I looked at the history of the if guard in bond_3ad_initialize and it has
>existed since the creation of the git tree. It appears since commit
>5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately")
>the if guard is no longer needed and removing the if guard would also fix
>the problem, I have not tested yet.

	The logic in bond_3ad_initialize probably came that way when the
code was contributed sometime during 2.4.x.

	Curiosity got the better of me, and I looked at the 2.4.35 code.
I'm not sure what the point of the test was even then, since
bond_3ad_initialize is only called when adding a first interface to the
bond, and there wasn't a way to tweak the sys_mac_addr then.

	In the current code, I think the if test could fail if
sys_mac_addr is set manually to be equal to the bond's MAC prior to
adding the first interface to the bond.  As far as I can tell, the only
result of failing the MAC test would be that the agg selection timer
wouldn't be started, which is an optimization to reduce LACP aggregator
flailing when multiple interfaces are added at the same time (IEEE
802.1AX-2000 6.4.9 and 6.4.12.1.n).

>I think this patch series can be accepted as-is because, it does fix the
>issue as demonstrated by the kselftest accompanying the series and is the
>smallest change to fix the issue.
>
>Further, I don't see why we want to set the file-scoped variable,
>ad_ticks_per_sec, inside bond_3ad_initialize() as ad_ticks_per_sec is
>utilized across all bonds. It seems like ad_ticks_per_sec should be
>changed to a const and set at the top of the file. I see no value in
>passing the value as an unnamed constant on the stack when
>bond_3ad_initialize is called. These changes however could be done in the
>net-next tree as follow-on cleanup patches.
>
>Jay, how would you like to proceed?

	I don't have issue with moving ad_ticks_per_sec to a file scope
constant.  The minimal change here, though, is effectively making the
tick_resolution parameter to bond_3ad_initialize be ignored, even though
the compiler won't complain about it.

	Given that there is already mystery in how some of this works,
I'd prefer the patches to make the code clearer, so my vote is for the
"fix it right" method, i.e., make ad_ticks_per_sec a real constant,
remove the tick_resolution parameter from bond_3ad_initialize and drop
the "if MAC_ADDRESS_COMPARE" test therein.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-08-17 23:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-15 15:08 [PATCH net v3 0/2] bonding: 802.3ad: fix no transmission of LACPDUs Jonathan Toppins
2022-08-15 15:08 ` [PATCH net v3 1/2] selftests: include bonding tests into the kselftest infra Jonathan Toppins
2022-08-16  7:28   ` Hangbin Liu
2022-08-15 15:08 ` [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs Jonathan Toppins
2022-08-16  7:29   ` Hangbin Liu
2022-08-16 13:11   ` Jay Vosburgh
2022-08-16 13:41     ` Jonathan Toppins
2022-08-16 17:47       ` Jonathan Toppins
2022-08-17 23:24         ` Jay Vosburgh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.