All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor
@ 2022-07-31 12:41 Vladimir Oltean
  2022-07-31 12:41 ` [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS Vladimir Oltean
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-07-31 12:41 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

Since commit 2b86cb829976 ("net: dsa: declare lockless TX feature for
slave ports") in v5.7, DSA breaks the ARP monitoring logic from the
bonding driver, fact which was pointed out by Brian Hutchinson who uses
a linux-5.10.y stable kernel.

Initially I got lured by other similar hacks introduced for other
NETIF_F_LLTX drivers, which, inspired by the bonding documentation,
update the trans_start of their TX queues by hand.

However Jakub pointed out that this simply isn't a proper solution, and
after coming to think more about it, I agree, and it doesn't work
properly with DSA nor is it maintainable for the future changes I plan
for it (multiple DSA masters in a LAG).

I've tested these changes using a DSA-based setup and a veth-based
setup, using the active-backup mode and ARP monitoring, with and without
arp_validate.

Where I'd need some help from Jakub is to make sure these changes
somehow get integrated into the 5.10 stable kernel, since that's what
Brian, who reported the issue, actually uses. I haven't provided any
Fixes tags.

More testing and other feedback is welcome.

Link to v1:
https://patchwork.kernel.org/project/netdevbpf/patch/20220715232641.952532-1-vladimir.oltean@nxp.com/

Link to v2:
https://patchwork.kernel.org/project/netdevbpf/patch/20220727152000.3616086-1-vladimir.oltean@nxp.com/

Vladimir Oltean (4):
  net: bonding: replace dev_trans_start() with the jiffies of the last
    ARP/NS
  net/sched: remove hacks added to dev_trans_start() for bonding to work
  Revert "veth: Add updating of trans_start"
  docs: net: bonding: remove mentions of trans_start

 Documentation/networking/bonding.rst |  9 -------
 drivers/net/bonding/bond_main.c      | 35 ++++++++++++++++------------
 drivers/net/veth.c                   |  4 ----
 include/net/bonding.h                | 13 ++++++++++-
 net/sched/sch_generic.c              |  8 ++-----
 5 files changed, 34 insertions(+), 35 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-07-31 12:41 [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Vladimir Oltean
@ 2022-07-31 12:41 ` Vladimir Oltean
  2022-07-31 18:53   ` Jay Vosburgh
  2022-07-31 12:41 ` [PATCH v3 net 2/4] net/sched: remove hacks added to dev_trans_start() for bonding to work Vladimir Oltean
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-07-31 12:41 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

The bonding driver piggybacks on time stamps kept by the network stack
for the purpose of the netdev TX watchdog, and this is problematic
because it does not work with NETIF_F_LLTX devices.

It is hard to say why the driver looks at dev_trans_start() of the
slave->dev, considering that this is updated even by non-ARP/NS probes
sent by us, and even by traffic not sent by us at all (for example PTP
on physical slave devices). ARP monitoring in active-backup mode appears
to still work even if we track only the last TX time of actual ARP
probes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/bonding/bond_main.c | 35 +++++++++++++++++++--------------
 include/net/bonding.h           | 13 +++++++++++-
 2 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6ba4c83fe5fc..0337cbd0d6da 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1974,6 +1974,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 	for (i = 0; i < BOND_MAX_ARP_TARGETS; i++)
 		new_slave->target_last_arp_rx[i] = new_slave->last_rx;
 
+	new_slave->last_tx = new_slave->last_rx;
+
 	if (bond->params.miimon && !bond->params.use_carrier) {
 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
 
@@ -2857,8 +2859,11 @@ static void bond_arp_send(struct slave *slave, int arp_op, __be32 dest_ip,
 		return;
 	}
 
-	if (bond_handle_vlan(slave, tags, skb))
+	if (bond_handle_vlan(slave, tags, skb)) {
+		slave_update_last_tx(slave);
 		arp_xmit(skb);
+	}
+
 	return;
 }
 
@@ -3047,8 +3052,7 @@ static int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 			    curr_active_slave->last_link_up))
 		bond_validate_arp(bond, slave, tip, sip);
 	else if (curr_arp_slave && (arp->ar_op == htons(ARPOP_REPLY)) &&
-		 bond_time_in_interval(bond,
-				       dev_trans_start(curr_arp_slave->dev), 1))
+		 bond_time_in_interval(bond, slave_last_tx(curr_arp_slave), 1))
 		bond_validate_arp(bond, slave, sip, tip);
 
 out_unlock:
@@ -3076,8 +3080,10 @@ static void bond_ns_send(struct slave *slave, const struct in6_addr *daddr,
 	}
 
 	addrconf_addr_solict_mult(daddr, &mcaddr);
-	if (bond_handle_vlan(slave, tags, skb))
+	if (bond_handle_vlan(slave, tags, skb)) {
+		slave_update_last_tx(slave);
 		ndisc_send_skb(skb, &mcaddr, saddr);
+	}
 }
 
 static void bond_ns_send_all(struct bonding *bond, struct slave *slave)
@@ -3219,8 +3225,7 @@ static int bond_na_rcv(const struct sk_buff *skb, struct bonding *bond,
 			    curr_active_slave->last_link_up))
 		bond_validate_ns(bond, slave, saddr, daddr);
 	else if (curr_arp_slave &&
-		 bond_time_in_interval(bond,
-				       dev_trans_start(curr_arp_slave->dev), 1))
+		 bond_time_in_interval(bond, slave_last_tx(curr_arp_slave), 1))
 		bond_validate_ns(bond, slave, saddr, daddr);
 
 out:
@@ -3308,12 +3313,12 @@ static void bond_loadbalance_arp_mon(struct bonding *bond)
 	 *       so it can wait
 	 */
 	bond_for_each_slave_rcu(bond, slave, iter) {
-		unsigned long trans_start = dev_trans_start(slave->dev);
+		unsigned long last_tx = slave_last_tx(slave);
 
 		bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
 
 		if (slave->link != BOND_LINK_UP) {
-			if (bond_time_in_interval(bond, trans_start, 1) &&
+			if (bond_time_in_interval(bond, last_tx, 1) &&
 			    bond_time_in_interval(bond, slave->last_rx, 1)) {
 
 				bond_propose_link_state(slave, BOND_LINK_UP);
@@ -3338,7 +3343,7 @@ static void bond_loadbalance_arp_mon(struct bonding *bond)
 			 * when the source ip is 0, so don't take the link down
 			 * if we don't know our ip yet
 			 */
-			if (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) ||
+			if (!bond_time_in_interval(bond, last_tx, bond->params.missed_max) ||
 			    !bond_time_in_interval(bond, slave->last_rx, bond->params.missed_max)) {
 
 				bond_propose_link_state(slave, BOND_LINK_DOWN);
@@ -3404,7 +3409,7 @@ static void bond_loadbalance_arp_mon(struct bonding *bond)
  */
 static int bond_ab_arp_inspect(struct bonding *bond)
 {
-	unsigned long trans_start, last_rx;
+	unsigned long last_tx, last_rx;
 	struct list_head *iter;
 	struct slave *slave;
 	int commit = 0;
@@ -3455,9 +3460,9 @@ static int bond_ab_arp_inspect(struct bonding *bond)
 		 * - (more than missed_max*delta since receive AND
 		 *    the bond has an IP address)
 		 */
-		trans_start = dev_trans_start(slave->dev);
+		last_tx = slave_last_tx(slave);
 		if (bond_is_active_slave(slave) &&
-		    (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) ||
+		    (!bond_time_in_interval(bond, last_tx, bond->params.missed_max) ||
 		     !bond_time_in_interval(bond, last_rx, bond->params.missed_max))) {
 			bond_propose_link_state(slave, BOND_LINK_DOWN);
 			commit++;
@@ -3474,8 +3479,8 @@ static int bond_ab_arp_inspect(struct bonding *bond)
  */
 static void bond_ab_arp_commit(struct bonding *bond)
 {
-	unsigned long trans_start;
 	struct list_head *iter;
+	unsigned long last_tx;
 	struct slave *slave;
 
 	bond_for_each_slave(bond, slave, iter) {
@@ -3484,10 +3489,10 @@ static void bond_ab_arp_commit(struct bonding *bond)
 			continue;
 
 		case BOND_LINK_UP:
-			trans_start = dev_trans_start(slave->dev);
+			last_tx = slave_last_tx(slave);
 			if (rtnl_dereference(bond->curr_active_slave) != slave ||
 			    (!rtnl_dereference(bond->curr_active_slave) &&
-			     bond_time_in_interval(bond, trans_start, 1))) {
+			     bond_time_in_interval(bond, last_tx, 1))) {
 				struct slave *current_arp_slave;
 
 				current_arp_slave = rtnl_dereference(bond->current_arp_slave);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index cb904d356e31..3b816ae8b1f3 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -161,8 +161,9 @@ struct slave {
 	struct net_device *dev; /* first - useful for panic debug */
 	struct bonding *bond; /* our master */
 	int    delay;
-	/* all three in jiffies */
+	/* all 4 in jiffies */
 	unsigned long last_link_up;
+	unsigned long last_tx;
 	unsigned long last_rx;
 	unsigned long target_last_arp_rx[BOND_MAX_ARP_TARGETS];
 	s8     link;		/* one of BOND_LINK_XXXX */
@@ -539,6 +540,16 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
 	return slave->last_rx;
 }
 
+static inline void slave_update_last_tx(struct slave *slave)
+{
+	WRITE_ONCE(slave->last_tx, jiffies);
+}
+
+static inline unsigned long slave_last_tx(struct slave *slave)
+{
+	return READ_ONCE(slave->last_tx);
+}
+
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static inline netdev_tx_t bond_netpoll_send_skb(const struct slave *slave,
 					 struct sk_buff *skb)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 net 2/4] net/sched: remove hacks added to dev_trans_start() for bonding to work
  2022-07-31 12:41 [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Vladimir Oltean
  2022-07-31 12:41 ` [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS Vladimir Oltean
@ 2022-07-31 12:41 ` Vladimir Oltean
  2022-07-31 12:41 ` [PATCH v3 net 3/4] Revert "veth: Add updating of trans_start" Vladimir Oltean
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-07-31 12:41 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

Now that the bonding driver keeps track of the last TX time of ARP and
NS probes, we effectively revert the following commits:

32d3e51a82d4 ("net_sched: use macvlan real dev trans_start in dev_trans_start()")
07ce76aa9bcf ("net_sched: make dev_trans_start return vlan's real dev trans_start")

Note that the approach of continuing to hack at this function would not
get us very far, hence the desire to take a different approach. DSA is
also a virtual device that uses NETIF_F_LLTX, but there, many uppers
share the same lower (DSA master, i.e. the physical host port of a
switch). By making dev_trans_start() on a DSA interface return the
dev_trans_start() of the master, we effectively assume that all other
DSA interfaces are silent, otherwise this corrupts the validity of the
probe timestamp data from the bonding driver's perspective.

Furthermore, the hacks didn't take into consideration the fact that the
lower interface of @dev may not have been physical either. For example,
VLAN over VLAN, or DSA with 2 masters in a LAG.

And even furthermore, there are NETIF_F_LLTX devices which are not
stacked, like veth. The hack here would not work with those, because it
would not have to provide the bonding driver something to chew at all.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_generic.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index dba0b3e24af5..f2bac6a1674d 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -427,14 +427,10 @@ void __qdisc_run(struct Qdisc *q)
 
 unsigned long dev_trans_start(struct net_device *dev)
 {
-	unsigned long val, res;
+	unsigned long res = READ_ONCE(netdev_get_tx_queue(dev, 0)->trans_start);
+	unsigned long val;
 	unsigned int i;
 
-	if (is_vlan_dev(dev))
-		dev = vlan_dev_real_dev(dev);
-	else if (netif_is_macvlan(dev))
-		dev = macvlan_dev_real_dev(dev);
-	res = READ_ONCE(netdev_get_tx_queue(dev, 0)->trans_start);
 	for (i = 1; i < dev->num_tx_queues; i++) {
 		val = READ_ONCE(netdev_get_tx_queue(dev, i)->trans_start);
 		if (val && time_after(val, res))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 net 3/4] Revert "veth: Add updating of trans_start"
  2022-07-31 12:41 [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Vladimir Oltean
  2022-07-31 12:41 ` [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS Vladimir Oltean
  2022-07-31 12:41 ` [PATCH v3 net 2/4] net/sched: remove hacks added to dev_trans_start() for bonding to work Vladimir Oltean
@ 2022-07-31 12:41 ` Vladimir Oltean
  2022-07-31 12:41 ` [PATCH v3 net 4/4] docs: net: bonding: remove mentions of trans_start Vladimir Oltean
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-07-31 12:41 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

This reverts commit e66e257a5d8368d9c0ba13d4630f474436533e8b. The veth
driver no longer needs these hacks which are slightly detrimential to
the fast path performance, because the bonding driver is keeping track
of TX times of ARP and NS probes by itself, which it should.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/veth.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 2cb833b3006a..466da01ba2e3 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -312,7 +312,6 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev,
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
-	struct netdev_queue *queue = NULL;
 	struct veth_rq *rq = NULL;
 	struct net_device *rcv;
 	int length = skb->len;
@@ -330,7 +329,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	rxq = skb_get_queue_mapping(skb);
 	if (rxq < rcv->real_num_rx_queues) {
 		rq = &rcv_priv->rq[rxq];
-		queue = netdev_get_tx_queue(dev, rxq);
 
 		/* The napi pointer is available when an XDP program is
 		 * attached or when GRO is enabled
@@ -342,8 +340,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb_tx_timestamp(skb);
 	if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) {
-		if (queue)
-			txq_trans_cond_update(queue);
 		if (!use_napi)
 			dev_lstats_add(dev, length);
 	} else {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 net 4/4] docs: net: bonding: remove mentions of trans_start
  2022-07-31 12:41 [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Vladimir Oltean
                   ` (2 preceding siblings ...)
  2022-07-31 12:41 ` [PATCH v3 net 3/4] Revert "veth: Add updating of trans_start" Vladimir Oltean
@ 2022-07-31 12:41 ` Vladimir Oltean
  2022-08-01 17:58 ` [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Jakub Kicinski
  2022-08-04  2:40 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-07-31 12:41 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

ARP monitoring no longer depends on dev->last_rx or dev_trans_start(),
so delete this information.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 Documentation/networking/bonding.rst | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index 43be3782e5df..4ebbf21590c4 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -1971,15 +1971,6 @@ uses the response as an indication that the link is operating.  This
 gives some assurance that traffic is actually flowing to and from one
 or more peers on the local network.
 
-The ARP monitor relies on the device driver itself to verify
-that traffic is flowing.  In particular, the driver must keep up to
-date the last receive time, dev->last_rx.  Drivers that use NETIF_F_LLTX
-flag must also update netdev_queue->trans_start.  If they do not, then the
-ARP monitor will immediately fail any slaves using that driver, and
-those slaves will stay down.  If networking monitoring (tcpdump, etc)
-shows the ARP requests and replies on the network, then it may be that
-your device driver is not updating last_rx and trans_start.
-
 7.2 Configuring Multiple ARP Targets
 ------------------------------------
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-07-31 12:41 ` [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS Vladimir Oltean
@ 2022-07-31 18:53   ` Jay Vosburgh
  2022-07-31 19:13     ` Vladimir Oltean
  0 siblings, 1 reply; 22+ messages in thread
From: Jay Vosburgh @ 2022-07-31 18:53 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

>The bonding driver piggybacks on time stamps kept by the network stack
>for the purpose of the netdev TX watchdog, and this is problematic
>because it does not work with NETIF_F_LLTX devices.
>
>It is hard to say why the driver looks at dev_trans_start() of the
>slave->dev, considering that this is updated even by non-ARP/NS probes
>sent by us, and even by traffic not sent by us at all (for example PTP
>on physical slave devices). ARP monitoring in active-backup mode appears
>to still work even if we track only the last TX time of actual ARP
>probes.

	Because it's the closest it can get to "have we sent an ARP,"
really.  The issue with LLTX is relatively new (the bonding driver has
worked this way for longer than I've been involved, so I don't know what
the original design decisions were).

	FWIW, I've been working with the following, which is closer in
spirit to what Jakub and I discussed previously (i.e., inspecting the
device stats for virtual devices, relying on dev_trans_start for
physical devices with ndo_tx_timeout).

	This WIP includes one unrelated change: including the ifindex in
the route lookup; that would be a separate patch if it ends up being
submitted (it handles the edge case of a route on an interface other
than the bond matching before the bond itself).

	-J

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e75acb14d066..507ff5d50585 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -339,6 +339,36 @@ static bool bond_xdp_check(struct bonding *bond)
 	}
 }
 
+static void bond_poll_tx_packets(struct slave *slave)
+{
+	struct net_device *dev = slave->dev;
+	struct rtnl_link_stats64 stats;
+
+	dev_get_stats(dev, &stats);
+	if (stats.tx_packets != slave->last_tx_packets) {
+		slave->last_tx_change = jiffies;
+		slave->last_tx_packets = stats.tx_packets;
+	}
+}
+
+/* Determine time of last transmit for slave.
+ *
+ * For device drivers that maintain trans_start (presumed to be any device
+ * that implements an ndo_tx_timeout), use dev_trans_start.  For other
+ * devices (typically virtual devices), inspect interface statistics for
+ * recent change to tx_packets.
+ */
+static unsigned long bond_dev_trans_start(struct slave *slave)
+{
+	struct net_device *dev = slave->dev;
+
+	if (dev->netdev_ops->ndo_tx_timeout)
+		return dev_trans_start(dev);
+
+	bond_poll_tx_packets(slave);
+	return slave->last_tx_change;
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2943,7 +2973,7 @@ static void bond_arp_send_all(struct bonding *bond, struct slave *slave)
 
 		/* Find out through which dev should the packet go */
 		rt = ip_route_output(dev_net(bond->dev), targets[i], 0,
-				     RTO_ONLINK, 0);
+				     RTO_ONLINK, bond->dev->ifindex);
 		if (IS_ERR(rt)) {
 			/* there's no route to target - try to send arp
 			 * probe to generate any traffic (arp_validate=0)
@@ -3075,7 +3105,7 @@ static int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 		bond_validate_arp(bond, slave, tip, sip);
 	else if (curr_arp_slave && (arp->ar_op == htons(ARPOP_REPLY)) &&
 		 bond_time_in_interval(bond,
-				       dev_trans_start(curr_arp_slave->dev), 1))
+				       bond_dev_trans_start(curr_arp_slave), 1))
 		bond_validate_arp(bond, slave, sip, tip);
 
 out_unlock:
@@ -3247,7 +3277,7 @@ static int bond_na_rcv(const struct sk_buff *skb, struct bonding *bond,
 		bond_validate_ns(bond, slave, saddr, daddr);
 	else if (curr_arp_slave &&
 		 bond_time_in_interval(bond,
-				       dev_trans_start(curr_arp_slave->dev), 1))
+				       bond_dev_trans_start(curr_arp_slave), 1))
 		bond_validate_ns(bond, slave, saddr, daddr);
 
 out:
@@ -3335,7 +3365,7 @@ static void bond_loadbalance_arp_mon(struct bonding *bond)
 	 *       so it can wait
 	 */
 	bond_for_each_slave_rcu(bond, slave, iter) {
-		unsigned long trans_start = dev_trans_start(slave->dev);
+		unsigned long trans_start = bond_dev_trans_start(slave);
 
 		bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
 
@@ -3482,7 +3512,6 @@ static int bond_ab_arp_inspect(struct bonding *bond)
 		 * - (more than missed_max*delta since receive AND
 		 *    the bond has an IP address)
 		 */
-		trans_start = dev_trans_start(slave->dev);
 		if (bond_is_active_slave(slave) &&
 		    (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) ||
 		     !bond_time_in_interval(bond, last_rx, bond->params.missed_max))) {
@@ -3511,7 +3540,7 @@ static void bond_ab_arp_commit(struct bonding *bond)
 			continue;
 
 		case BOND_LINK_UP:
-			trans_start = dev_trans_start(slave->dev);
+			trans_start = bond_dev_trans_start(slave);
 			if (rtnl_dereference(bond->curr_active_slave) != slave ||
 			    (!rtnl_dereference(bond->curr_active_slave) &&
 			     bond_time_in_interval(bond, trans_start, 1))) {
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 6e78d657aa05..6449fe755e8a 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -165,6 +165,8 @@ struct slave {
 	unsigned long last_link_up;
 	unsigned long last_rx;
 	unsigned long target_last_arp_rx[BOND_MAX_ARP_TARGETS];
+	unsigned long last_tx_packets;
+	unsigned long last_tx_change;
 	s8     link;		/* one of BOND_LINK_XXXX */
 	s8     link_new_state;	/* one of BOND_LINK_XXXX */
 	u8     backup:1,   /* indicates backup slave. Value corresponds with

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-07-31 18:53   ` Jay Vosburgh
@ 2022-07-31 19:13     ` Vladimir Oltean
  2022-08-02  1:04       ` Jay Vosburgh
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-07-31 19:13 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

Hello Jay,

On Sun, Jul 31, 2022 at 11:53:55AM -0700, Jay Vosburgh wrote:
> Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> 
> >The bonding driver piggybacks on time stamps kept by the network stack
> >for the purpose of the netdev TX watchdog, and this is problematic
> >because it does not work with NETIF_F_LLTX devices.
> >
> >It is hard to say why the driver looks at dev_trans_start() of the
> >slave->dev, considering that this is updated even by non-ARP/NS probes
> >sent by us, and even by traffic not sent by us at all (for example PTP
> >on physical slave devices). ARP monitoring in active-backup mode appears
> >to still work even if we track only the last TX time of actual ARP
> >probes.
> 
> 	Because it's the closest it can get to "have we sent an ARP," really.

Does it really track this? It seems pretty easy to fool to me.
I don't know why keeping a last_tx the way my patch does wouldn't be
better.

> The issue with LLTX is relatively new (the bonding driver has worked
> this way for longer than I've been involved, so I don't know what the
> original design decisions were).
> 
> 	FWIW, I've been working with the following, which is closer in
> spirit to what Jakub and I discussed previously (i.e., inspecting the
> device stats for virtual devices, relying on dev_trans_start for
> physical devices with ndo_tx_timeout).
> 
> 	This WIP includes one unrelated change: including the ifindex in
> the route lookup; that would be a separate patch if it ends up being
> submitted (it handles the edge case of a route on an interface other
> than the bond matching before the bond itself).

The problem with dev_get_stats() is that it will contain hardware
statistics, which may be completely unrelated to the number of packets
software has sent. DSA can offload the Linux bridge and the bonding
driver as a bridge port, so dev_get_stats() on a physical port will
return the total number of packets that egressed that port, even without
CPU intervention. Again, even easier to fool if "have we sent an ARP"
is what the bonding driver actually wants to know.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor
  2022-07-31 12:41 [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Vladimir Oltean
                   ` (3 preceding siblings ...)
  2022-07-31 12:41 ` [PATCH v3 net 4/4] docs: net: bonding: remove mentions of trans_start Vladimir Oltean
@ 2022-08-01 17:58 ` Jakub Kicinski
  2022-08-01 23:39   ` Vladimir Oltean
  2022-08-04  2:40 ` patchwork-bot+netdevbpf
  5 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2022-08-01 17:58 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, Jonathan Toppins, Jay Vosburgh,
	Veaceslav Falico, Andy Gospodarek, Hangbin Liu, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Nikolay Aleksandrov, Stephen Hemminger

On Sun, 31 Jul 2022 15:41:04 +0300 Vladimir Oltean wrote:
> Where I'd need some help from Jakub is to make sure these changes
> somehow get integrated into the 5.10 stable kernel, since that's what
> Brian, who reported the issue, actually uses. I haven't provided any
> Fixes tags.

We can just tell Greg to pull them in after the patches appear in
Linus's tree, right? Or are there extra acrobatics required I missed?

As long as Jay is okay with the approach, LGTM. Thanks a lot for
working on the solution!

BTW we could potentially also revert a31d27fbed5d ("tun: fix bonding
active backup with arp monitoring") given we revert veth.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor
  2022-08-01 17:58 ` [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Jakub Kicinski
@ 2022-08-01 23:39   ` Vladimir Oltean
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-08-01 23:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, Jonathan Toppins, Jay Vosburgh,
	Veaceslav Falico, Andy Gospodarek, Hangbin Liu, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Nikolay Aleksandrov, Stephen Hemminger

On Mon, Aug 01, 2022 at 10:58:53AM -0700, Jakub Kicinski wrote:
> We can just tell Greg to pull them in after the patches appear in
> Linus's tree, right? Or are there extra acrobatics required I missed?

I forget what I was thinking of with the request for help. We can tell
Greg, but I'll have to prepare adapted patches for linux-5.15.y and
linux-5.10.y anyway, since these ones won't apply.

> BTW we could potentially also revert a31d27fbed5d ("tun: fix bonding
> active backup with arp monitoring") given we revert veth.

We could; though as I didn't notice that and as it doesn't bother
anybody we could also do it separately in net (skipping backports).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-07-31 19:13     ` Vladimir Oltean
@ 2022-08-02  1:04       ` Jay Vosburgh
  2022-08-02  1:45         ` Vladimir Oltean
  0 siblings, 1 reply; 22+ messages in thread
From: Jay Vosburgh @ 2022-08-02  1:04 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

>Hello Jay,
>
>On Sun, Jul 31, 2022 at 11:53:55AM -0700, Jay Vosburgh wrote:
>> Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>> 
>> >The bonding driver piggybacks on time stamps kept by the network stack
>> >for the purpose of the netdev TX watchdog, and this is problematic
>> >because it does not work with NETIF_F_LLTX devices.
>> >
>> >It is hard to say why the driver looks at dev_trans_start() of the
>> >slave->dev, considering that this is updated even by non-ARP/NS probes
>> >sent by us, and even by traffic not sent by us at all (for example PTP
>> >on physical slave devices). ARP monitoring in active-backup mode appears
>> >to still work even if we track only the last TX time of actual ARP
>> >probes.
>> 
>> 	Because it's the closest it can get to "have we sent an ARP," really.
>
>Does it really track this? It seems pretty easy to fool to me.
>I don't know why keeping a last_tx the way my patch does wouldn't be
>better.

	The ARP monitor in general is pretty easy to fool (which was
part of the impetus for adding the "arp_validate" logic).  Ultimately it
was simpler to have the ARP monitor logic be "interface sent something
AND received an appropriate ARP" since the "sent something" came for
free from ->trans_start (which over time became less useful for this
purpose).

	And, I'm not saying your patch isn't better, rather that what I
was intending to do is minimize the change in behavior.  My concern is
that some change in semantics will break existing configurations that
rely on the old behavior.  Then, the question becomes whether the broken
configuration was reasonable or not.

	I haven't thought of anything that seems reasonable thus far;
the major change looks to be that the new logic in your patch presumes
that arp_xmit cannot fail, so if some device was discarding all TX
packets, the new logic would update "last_rx" regardless.

>> The issue with LLTX is relatively new (the bonding driver has worked
>> this way for longer than I've been involved, so I don't know what the
>> original design decisions were).
>> 
>> 	FWIW, I've been working with the following, which is closer in
>> spirit to what Jakub and I discussed previously (i.e., inspecting the
>> device stats for virtual devices, relying on dev_trans_start for
>> physical devices with ndo_tx_timeout).
>> 
>> 	This WIP includes one unrelated change: including the ifindex in
>> the route lookup; that would be a separate patch if it ends up being
>> submitted (it handles the edge case of a route on an interface other
>> than the bond matching before the bond itself).
>
>The problem with dev_get_stats() is that it will contain hardware
>statistics, which may be completely unrelated to the number of packets
>software has sent. DSA can offload the Linux bridge and the bonding
>driver as a bridge port, so dev_get_stats() on a physical port will
>return the total number of packets that egressed that port, even without
>CPU intervention. Again, even easier to fool if "have we sent an ARP"
>is what the bonding driver actually wants to know.

	I'm not well versed in how DSA works, but if a physical port is,
well, discrete from the system to a degree, why would it be part of a
bond running on the host?

	Put another way, what sort of topology / plumbing makes sense
for DSA in conjunction with bonding?

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02  1:04       ` Jay Vosburgh
@ 2022-08-02  1:45         ` Vladimir Oltean
  2022-08-02  9:05           ` Paolo Abeni
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-08-02  1:45 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

On Mon, Aug 01, 2022 at 06:04:55PM -0700, Jay Vosburgh wrote:
> 	The ARP monitor in general is pretty easy to fool (which was
> part of the impetus for adding the "arp_validate" logic).  Ultimately it
> was simpler to have the ARP monitor logic be "interface sent something
> AND received an appropriate ARP" since the "sent something" came for
> free from ->trans_start (which over time became less useful for this
> purpose).
> 
> 	And, I'm not saying your patch isn't better, rather that what I
> was intending to do is minimize the change in behavior.  My concern is
> that some change in semantics will break existing configurations that
> rely on the old behavior.  Then, the question becomes whether the broken
> configuration was reasonable or not.
> 
> 	I haven't thought of anything that seems reasonable thus far;
> the major change looks to be that the new logic in your patch presumes
> that arp_xmit cannot fail, so if some device was discarding all TX
> packets, the new logic would update "last_rx" regardless.

I don't see why it matters that my logic presumes that arp_xmit cannot fail.
The bonding driver's logic doesn't put an equality sign anywhere between
"arp_xmit succeeded" and "the packet actually reached the target".
If it would, it would have a very broken understanding of Ethernet networks.
For all practical purposes, updating last_tx in the bonding driver
rather than letting the qdisc do it should make no perceivable
difference at all, even if the driver was to drop all TX packets as you
say.

> >The problem with dev_get_stats() is that it will contain hardware
> >statistics, which may be completely unrelated to the number of packets
> >software has sent. DSA can offload the Linux bridge and the bonding
> >driver as a bridge port, so dev_get_stats() on a physical port will
> >return the total number of packets that egressed that port, even without
> >CPU intervention. Again, even easier to fool if "have we sent an ARP"
> >is what the bonding driver actually wants to know.
> 
> 	I'm not well versed in how DSA works, but if a physical port is,
> well, discrete from the system to a degree, why would it be part of a
> bond running on the host?
> 
> 	Put another way, what sort of topology / plumbing makes sense
> for DSA in conjunction with bonding?

I won't insist on the details because it's going to be a never ending
story otherwise. The point with DSA and switchdev in general is that,
even though we're talking about network accelerating chips that are
"discrete from the system to a degree", they are integrated to the
network stack such that a user wouldn't know that they're running on
dedicated switching hardware rather than plain old software forwarding
between any other physical Ethernet cards. The deployment scripts would
look more or less the same, they would just work faster and with less
CPU involvement.

If you want to enable autonomous forwarding between 2 switch ports,
those 2 have net devices, which you put under the same plain old bridge
device. If you want to add one more logical port to the bridging domain
which is comprised of 2 physical ports in a LAG, you create a bonding
interface and put the net devices of those 2 other physical ports under
the bond, then you put the bond under the bridge. The DSA or switchdev
driver monitors the netdev adjacency events that take place and takes
note, in a "transparent offload" manner.

When a physical port is (directly or indirectly) under a bridge device,
traffic towards it will be a mix between traffic originated by the host
and traffic originated by other ports in the same bridge/VLAN that must
reach there.

The same overall goal of "transparent offload" is what makes it
desirable for DSA and switchdev drivers to report to dev_get_stats()
that their port counters are actually the mix of autonomously forwarded
flows and host terminated flows.

The conclusion is that using dev_get_stats() is just about as
inconclusive as dropping the last_tx time of ARP altogether, since with
offloading drivers, the hardware counters will be as far a metric from
the answer as you can get (you can have tens of millions of packets per
second forwarded without software intervention, and the bonding driver
would think: oh, how many ARPs!).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02  1:45         ` Vladimir Oltean
@ 2022-08-02  9:05           ` Paolo Abeni
  2022-08-02 16:11             ` Jakub Kicinski
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2022-08-02  9:05 UTC (permalink / raw)
  To: Vladimir Oltean, Jay Vosburgh
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Veaceslav Falico, Andy Gospodarek, Hangbin Liu, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Nikolay Aleksandrov, Stephen Hemminger

On Tue, 2022-08-02 at 01:45 +0000, Vladimir Oltean wrote:
> On Mon, Aug 01, 2022 at 06:04:55PM -0700, Jay Vosburgh wrote:
> > 	The ARP monitor in general is pretty easy to fool (which was
> > part of the impetus for adding the "arp_validate" logic).  Ultimately it
> > was simpler to have the ARP monitor logic be "interface sent something
> > AND received an appropriate ARP" since the "sent something" came for
> > free from ->trans_start (which over time became less useful for this
> > purpose).
> > 
> > 	And, I'm not saying your patch isn't better, rather that what I
> > was intending to do is minimize the change in behavior.  My concern is
> > that some change in semantics will break existing configurations that
> > rely on the old behavior.  Then, the question becomes whether the broken
> > configuration was reasonable or not.
> > 
> > 	I haven't thought of anything that seems reasonable thus far;
> > the major change looks to be that the new logic in your patch presumes
> > that arp_xmit cannot fail, so if some device was discarding all TX
> > packets, the new logic would update "last_rx" regardless.
> 
> I don't see why it matters that my logic presumes that arp_xmit cannot fail.
> The bonding driver's logic doesn't put an equality sign anywhere between
> "arp_xmit succeeded" and "the packet actually reached the target".
> If it would, it would have a very broken understanding of Ethernet networks.
> For all practical purposes, updating last_tx in the bonding driver
> rather than letting the qdisc do it should make no perceivable
> difference at all, even if the driver was to drop all TX packets as you
> say.

I personally like Vladimir approach. I *think* it should be reasonably
safe.

If drops in arp_xmit() are really a concerns, what about let the latter
function return the NF error code as other output functions?

In any case, this looks like a significative rework, do you mind
consider it for the net-next, when it re-open?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02  9:05           ` Paolo Abeni
@ 2022-08-02 16:11             ` Jakub Kicinski
  2022-08-02 16:29               ` Jay Vosburgh
  2022-08-02 16:30               ` Vladimir Oltean
  0 siblings, 2 replies; 22+ messages in thread
From: Jakub Kicinski @ 2022-08-02 16:11 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Paolo Abeni, Jay Vosburgh, netdev, David S. Miller, Eric Dumazet,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Veaceslav Falico, Andy Gospodarek, Hangbin Liu, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Nikolay Aleksandrov, Stephen Hemminger

On Tue, 02 Aug 2022 11:05:19 +0200 Paolo Abeni wrote:
> In any case, this looks like a significative rework, do you mind
> consider it for the net-next, when it re-open?

It does seem like it could be a lot for stable.

Perhaps we could take:

https://lore.kernel.org/all/20220727152000.3616086-1-vladimir.oltean@nxp.com/

as is, without the extra work Stephen asked for (since it's gonna be
reverted in net-next, anyway)? How do you feel about that option?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02 16:11             ` Jakub Kicinski
@ 2022-08-02 16:29               ` Jay Vosburgh
  2022-08-02 16:30               ` Vladimir Oltean
  1 sibling, 0 replies; 22+ messages in thread
From: Jay Vosburgh @ 2022-08-02 16:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vladimir Oltean, Paolo Abeni, netdev, David S. Miller,
	Eric Dumazet, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

Jakub Kicinski <kuba@kernel.org> wrote:

>On Tue, 02 Aug 2022 11:05:19 +0200 Paolo Abeni wrote:
>> In any case, this looks like a significative rework, do you mind
>> consider it for the net-next, when it re-open?
>
>It does seem like it could be a lot for stable.
>
>Perhaps we could take:
>
>https://lore.kernel.org/all/20220727152000.3616086-1-vladimir.oltean@nxp.com/
>
>as is, without the extra work Stephen asked for (since it's gonna be
>reverted in net-next, anyway)? How do you feel about that option?

	Would that mean that the older stable kernels end up with a
different implementation that's unique to stable, or are you proposing
to take the linked patch,

"net/sched: make dev_trans_start() have a better chance of working with
stacked interfaces"

	as a complete replacement for this series?

	Alternatively, would it be more comfortable to just put this
patch (1/4) to stable and not backport the others?  If I understand
correctly, this patch enables the functionality, and the others are
cleaning up logic that isn't necessary after 1/4 is applied.

	I think this patch will work as described, and haven't thought
of any non-crazy scenarios that it could break (e.g., things that depend
on the "drop after arp_send" discussed elsewhere).

	I also think this patch is preferable to the "stacked
interfaces" patch: it limits the scope to just bonding, doesn't change
dev_trans_start() itself, and should cover any type of interface in a
bond.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02 16:11             ` Jakub Kicinski
  2022-08-02 16:29               ` Jay Vosburgh
@ 2022-08-02 16:30               ` Vladimir Oltean
  2022-08-02 17:33                 ` Paolo Abeni
  1 sibling, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-08-02 16:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paolo Abeni, Jay Vosburgh, netdev, David S. Miller, Eric Dumazet,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Veaceslav Falico, Andy Gospodarek, Hangbin Liu, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Nikolay Aleksandrov, Stephen Hemminger

On Tue, Aug 02, 2022 at 09:11:10AM -0700, Jakub Kicinski wrote:
> On Tue, 02 Aug 2022 11:05:19 +0200 Paolo Abeni wrote:
> > In any case, this looks like a significative rework, do you mind
> > consider it for the net-next, when it re-open?
> 
> It does seem like it could be a lot for stable.
> 
> Perhaps we could take:
> 
> https://lore.kernel.org/all/20220727152000.3616086-1-vladimir.oltean@nxp.com/
> 
> as is, without the extra work Stephen asked for (since it's gonna be
> reverted in net-next, anyway)? How do you feel about that option?

The patch you've linked to doesn't really do something sane. Deferring the
dev_trans_start() call to the lower device means, in the context of DSA,
retrieving the trans_start of the master's TX queues. But the same DSA
master (host port) services more than 1 DSA interface, so if you have
swp0 and swp1 in a bond0 with ARP monitoring, and swp2 is running an
iperf3 session, bond0 will happily interpret that as meaning that ARP
packets are continuously being sent.

Does it work, in the sense that the link comes up when it should, and
doesn't when it shouldn't? Yeah, but this proves to me that most of the
handling around the ARP monitor is just random gibberish/snake oil that
could have simply not been written in the first place, or I'm missing
some of the finer points. (happy to be proven wrong and see Cunningham's
law work its magic)

How about applying this to the "net" tree when it starts tracking the
5.20 release candidates (effectively a continuation of today's net-next),
and I can send backport patches after a month or so of some more testing?
I can prepare a backported version of this for 5.10 that Brian could
keep in his system during this time, and we could watch that for further
strange behavior.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02 16:30               ` Vladimir Oltean
@ 2022-08-02 17:33                 ` Paolo Abeni
  2022-08-02 18:00                   ` Jay Vosburgh
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2022-08-02 17:33 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski
  Cc: Jay Vosburgh, netdev, David S. Miller, Eric Dumazet, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Veaceslav Falico, Andy Gospodarek, Hangbin Liu, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Nikolay Aleksandrov, Stephen Hemminger

On Tue, 2022-08-02 at 16:30 +0000, Vladimir Oltean wrote:
> On Tue, Aug 02, 2022 at 09:11:10AM -0700, Jakub Kicinski wrote:
> > On Tue, 02 Aug 2022 11:05:19 +0200 Paolo Abeni wrote:
> > > In any case, this looks like a significative rework, do you mind
> > > consider it for the net-next, when it re-open?
> > 
> > It does seem like it could be a lot for stable.

I'm sorry, I did not intend to block the series. It looked to me there
was no agreement on this, and I was wondering if a net-next target
would allow a clean solution to make eveyone happy.

I now see it's relevant to have something we can queue for stable.

I'm ok with Jay suggestion:
> Alternatively, would it be more comfortable to just put this
> patch (1/4) to stable and not backport the others? 

The above works for me - I thought it was not ok for Jay, but since he
is proposing such sulution, I guess I was wrong.

Paolo



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02 17:33                 ` Paolo Abeni
@ 2022-08-02 18:00                   ` Jay Vosburgh
  2022-08-02 19:10                     ` Jakub Kicinski
  0 siblings, 1 reply; 22+ messages in thread
From: Jay Vosburgh @ 2022-08-02 18:00 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Vladimir Oltean, Jakub Kicinski, netdev, David S. Miller,
	Eric Dumazet, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

Paolo Abeni <pabeni@redhat.com> wrote:

>On Tue, 2022-08-02 at 16:30 +0000, Vladimir Oltean wrote:
>> On Tue, Aug 02, 2022 at 09:11:10AM -0700, Jakub Kicinski wrote:
>> > On Tue, 02 Aug 2022 11:05:19 +0200 Paolo Abeni wrote:
>> > > In any case, this looks like a significative rework, do you mind
>> > > consider it for the net-next, when it re-open?
>> > 
>> > It does seem like it could be a lot for stable.
>
>I'm sorry, I did not intend to block the series. It looked to me there
>was no agreement on this, and I was wondering if a net-next target
>would allow a clean solution to make eveyone happy.
>
>I now see it's relevant to have something we can queue for stable.
>
>I'm ok with Jay suggestion:
>> Alternatively, would it be more comfortable to just put this
>> patch (1/4) to stable and not backport the others? 
>
>The above works for me - I thought it was not ok for Jay, but since he
>is proposing such sulution, I guess I was wrong.

	My original reluctance was that I hadn't had an opportunity to
sufficiently review the patch set to think through the potential
regressions.  There might be something I haven't thought of, but I think
would only manifest in very unusual configurations.

	I'm ok with applying the series to net-next when it's available,
and backporting 1/4 for stable (and 4/4 with it, since that's the
documentation update).

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02 18:00                   ` Jay Vosburgh
@ 2022-08-02 19:10                     ` Jakub Kicinski
  2022-08-02 20:24                       ` Jay Vosburgh
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2022-08-02 19:10 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Paolo Abeni, Vladimir Oltean, netdev, David S. Miller,
	Eric Dumazet, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

On Tue, 02 Aug 2022 11:00:41 -0700 Jay Vosburgh wrote:
> >> Alternatively, would it be more comfortable to just put this
> >> patch (1/4) to stable and not backport the others?   
> >
> >The above works for me - I thought it was not ok for Jay, but since he
> >is proposing such sulution, I guess I was wrong.  
> 
> 	My original reluctance was that I hadn't had an opportunity to
> sufficiently review the patch set to think through the potential
> regressions.  There might be something I haven't thought of, but I think
> would only manifest in very unusual configurations.
> 
> 	I'm ok with applying the series to net-next when it's available,
> and backporting 1/4 for stable (and 4/4 with it, since that's the
> documentation update).
> 
> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>

One more time, sorry :) If I'm reading things right Vladimir and 
I would like this to be part of 5.20, Paolo is okay with that,
Jay would prefer to delay it until 5.21.

Is that right?

My preference for 5.20 is because we do have active users reporting
problems in stable, and by moving to 5.21 we're delaying things by
2 weeks. At the same time, 5.20 vs 5.21 doesn't matter as we intend 
to hit stable users with these change before either of those is out.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02 19:10                     ` Jakub Kicinski
@ 2022-08-02 20:24                       ` Jay Vosburgh
  2022-08-02 20:33                         ` Jakub Kicinski
  2022-08-02 20:34                         ` Paolo Abeni
  0 siblings, 2 replies; 22+ messages in thread
From: Jay Vosburgh @ 2022-08-02 20:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paolo Abeni, Vladimir Oltean, netdev, David S. Miller,
	Eric Dumazet, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

Jakub Kicinski <kuba@kernel.org> wrote:

>On Tue, 02 Aug 2022 11:00:41 -0700 Jay Vosburgh wrote:
>> >> Alternatively, would it be more comfortable to just put this
>> >> patch (1/4) to stable and not backport the others?   
>> >
>> >The above works for me - I thought it was not ok for Jay, but since he
>> >is proposing such sulution, I guess I was wrong.  
>> 
>> 	My original reluctance was that I hadn't had an opportunity to
>> sufficiently review the patch set to think through the potential
>> regressions.  There might be something I haven't thought of, but I think
>> would only manifest in very unusual configurations.
>> 
>> 	I'm ok with applying the series to net-next when it's available,
>> and backporting 1/4 for stable (and 4/4 with it, since that's the
>> documentation update).
>> 
>> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
>
>One more time, sorry :) If I'm reading things right Vladimir and 
>I would like this to be part of 5.20, Paolo is okay with that,
>Jay would prefer to delay it until 5.21.
>
>Is that right?

	I'm sure there's an Abbott & Costello joke in here somewhere,
but I thought Paolo preferred net-next, and I said I was ok with that.

>My preference for 5.20 is because we do have active users reporting
>problems in stable, and by moving to 5.21 we're delaying things by
>2 weeks. At the same time, 5.20 vs 5.21 doesn't matter as we intend 
>to hit stable users with these change before either of those is out.

	I have no objection to 5.20 if you & Paolo don't object.

	For stable, I believe that 1/4 (and 4/4 for docs) is the minimum
set to resolve the functional issues; is the plan to send all 4 patches
to stable, or just 1 and 4?

	I do think this patch does widen the scope of failures that may
go undetected on the TX side, but most of the time the failure to
receive the ARP on the RX side should cover for that.  Regardless,
that's a concern for later that doesn't need to be hashed out right now.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02 20:24                       ` Jay Vosburgh
@ 2022-08-02 20:33                         ` Jakub Kicinski
  2022-08-02 20:34                         ` Paolo Abeni
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2022-08-02 20:33 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Paolo Abeni, Vladimir Oltean, netdev, David S. Miller,
	Eric Dumazet, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Jonathan Toppins, Veaceslav Falico, Andy Gospodarek, Hangbin Liu,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Nikolay Aleksandrov,
	Stephen Hemminger

On Tue, 02 Aug 2022 13:24:34 -0700 Jay Vosburgh wrote:
> >One more time, sorry :) If I'm reading things right Vladimir and 
> >I would like this to be part of 5.20, Paolo is okay with that,
> >Jay would prefer to delay it until 5.21.
> >
> >Is that right?  
> 
> 	I'm sure there's an Abbott & Costello joke in here somewhere,
> but I thought Paolo preferred net-next, and I said I was ok with that.

:D

> >My preference for 5.20 is because we do have active users reporting
> >problems in stable, and by moving to 5.21 we're delaying things by
> >2 weeks. At the same time, 5.20 vs 5.21 doesn't matter as we intend 
> >to hit stable users with these change before either of those is out.  
> 
> 	I have no objection to 5.20 if you & Paolo don't object.
> 
> 	For stable, I believe that 1/4 (and 4/4 for docs) is the minimum
> set to resolve the functional issues; is the plan to send all 4 patches
> to stable, or just 1 and 4?

1 & 4 for stable SGTM.

> 	I do think this patch does widen the scope of failures that may
> go undetected on the TX side, but most of the time the failure to
> receive the ARP on the RX side should cover for that.  Regardless,
> that's a concern for later that doesn't need to be hashed out right now.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
  2022-08-02 20:24                       ` Jay Vosburgh
  2022-08-02 20:33                         ` Jakub Kicinski
@ 2022-08-02 20:34                         ` Paolo Abeni
  1 sibling, 0 replies; 22+ messages in thread
From: Paolo Abeni @ 2022-08-02 20:34 UTC (permalink / raw)
  To: Jay Vosburgh, Jakub Kicinski
  Cc: Vladimir Oltean, netdev, David S. Miller, Eric Dumazet,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Jonathan Toppins,
	Veaceslav Falico, Andy Gospodarek, Hangbin Liu, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Nikolay Aleksandrov, Stephen Hemminger

On Tue, 2022-08-02 at 13:24 -0700, Jay Vosburgh wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Tue, 02 Aug 2022 11:00:41 -0700 Jay Vosburgh wrote:
> > > > > Alternatively, would it be more comfortable to just put this
> > > > > patch (1/4) to stable and not backport the others?   
> > > > 
> > > > The above works for me - I thought it was not ok for Jay, but since he
> > > > is proposing such sulution, I guess I was wrong.  
> > > 
> > > 	My original reluctance was that I hadn't had an opportunity to
> > > sufficiently review the patch set to think through the potential
> > > regressions.  There might be something I haven't thought of, but I think
> > > would only manifest in very unusual configurations.
> > > 
> > > 	I'm ok with applying the series to net-next when it's available,
> > > and backporting 1/4 for stable (and 4/4 with it, since that's the
> > > documentation update).
> > > 
> > > Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
> > 
> > One more time, sorry :) If I'm reading things right Vladimir and 
> > I would like this to be part of 5.20, Paolo is okay with that,
> > Jay would prefer to delay it until 5.21.
> > 
> > Is that right?
> 
> 	I'm sure there's an Abbott & Costello joke in here somewhere,

At least not intentionally, not on my side! :)

> but I thought Paolo preferred net-next, and I said I was ok with that.

I initially suggested net-next, but then I agreed for a minimal fix for
net.

> > My preference for 5.20 is because we do have active users reporting
> > problems in stable, and by moving to 5.21 we're delaying things by
> > 2 weeks. At the same time, 5.20 vs 5.21 doesn't matter as we intend 
> > to hit stable users with these change before either of those is out.
> 
> 	I have no objection to 5.20 if you & Paolo don't object.

I also don't have objection for 5.20 (6.0)

> 	For stable, I believe that 1/4 (and 4/4 for docs) is the minimum
> set to resolve the functional issues; is the plan to send all 4 patches
> to stable, or just 1 and 4?

I think that for stable 1 && 4 only would be the better option.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor
  2022-07-31 12:41 [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Vladimir Oltean
                   ` (4 preceding siblings ...)
  2022-08-01 17:58 ` [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Jakub Kicinski
@ 2022-08-04  2:40 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 22+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-08-04  2:40 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, davem, edumazet, kuba, pabeni, andrew, vivien.didelot,
	f.fainelli, jtoppins, j.vosburgh, vfalico, andy, liuhangbin, jhs,
	xiyou.wangcong, jiri, razor, stephen

Hello:

This series was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 31 Jul 2022 15:41:04 +0300 you wrote:
> Since commit 2b86cb829976 ("net: dsa: declare lockless TX feature for
> slave ports") in v5.7, DSA breaks the ARP monitoring logic from the
> bonding driver, fact which was pointed out by Brian Hutchinson who uses
> a linux-5.10.y stable kernel.
> 
> Initially I got lured by other similar hacks introduced for other
> NETIF_F_LLTX drivers, which, inspired by the bonding documentation,
> update the trans_start of their TX queues by hand.
> 
> [...]

Here is the summary with links:
  - [v3,net,1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
    https://git.kernel.org/netdev/net/c/06799a9085e1
  - [v3,net,2/4] net/sched: remove hacks added to dev_trans_start() for bonding to work
    https://git.kernel.org/netdev/net/c/4873a1b2024d
  - [v3,net,3/4] Revert "veth: Add updating of trans_start"
    https://git.kernel.org/netdev/net/c/08b403d5bf07
  - [v3,net,4/4] docs: net: bonding: remove mentions of trans_start
    https://git.kernel.org/netdev/net/c/cba8d8f57dfb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-08-04  2:40 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-31 12:41 [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Vladimir Oltean
2022-07-31 12:41 ` [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS Vladimir Oltean
2022-07-31 18:53   ` Jay Vosburgh
2022-07-31 19:13     ` Vladimir Oltean
2022-08-02  1:04       ` Jay Vosburgh
2022-08-02  1:45         ` Vladimir Oltean
2022-08-02  9:05           ` Paolo Abeni
2022-08-02 16:11             ` Jakub Kicinski
2022-08-02 16:29               ` Jay Vosburgh
2022-08-02 16:30               ` Vladimir Oltean
2022-08-02 17:33                 ` Paolo Abeni
2022-08-02 18:00                   ` Jay Vosburgh
2022-08-02 19:10                     ` Jakub Kicinski
2022-08-02 20:24                       ` Jay Vosburgh
2022-08-02 20:33                         ` Jakub Kicinski
2022-08-02 20:34                         ` Paolo Abeni
2022-07-31 12:41 ` [PATCH v3 net 2/4] net/sched: remove hacks added to dev_trans_start() for bonding to work Vladimir Oltean
2022-07-31 12:41 ` [PATCH v3 net 3/4] Revert "veth: Add updating of trans_start" Vladimir Oltean
2022-07-31 12:41 ` [PATCH v3 net 4/4] docs: net: bonding: remove mentions of trans_start Vladimir Oltean
2022-08-01 17:58 ` [PATCH v3 net 0/4] Make DSA work with bonding's ARP monitor Jakub Kicinski
2022-08-01 23:39   ` Vladimir Oltean
2022-08-04  2:40 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.