All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] bonding: fix 802.3ad state sent to partner when unbinding slave
@ 2018-11-26 22:54 Toni Peltonen
  2018-11-27  0:29 ` Jay Vosburgh
  0 siblings, 1 reply; 3+ messages in thread
From: Toni Peltonen @ 2018-11-26 22:54 UTC (permalink / raw)
  To: netdev

Previously when unbinding a slave the 802.3ad implementation only told
partner that the port is not suitable for aggregation by setting the port
aggregation state from aggregatable to individual. This is not enough. If the
physical layer still stays up and we only unbinded this port from the bond there
is nothing in the aggregation status alone to prevent the partner from sending
traffic towards us. To ensure that the partner doesn't consider this
port at all anymore we should also disable collecting and distributing to
signal that this actor is going away.

I have tested this behaviour againts Arista EOS switches with mlx5 cards
(physical link stays up when even when interface is down) and simulated
the same situation virtually Linux <-> Linux with two network namespaces
running two veth device pairs. In both cases setting aggregation to
individual doesn't alone prevent traffic from being to sent towards this
port given that the link stays up in partners end. Partner still keeps
it's end in collecting + distributing state and continues until timeout is
reached. In most cases this means we are losing the traffic partner sends
towards our port while we wait for timeout. This is most visible with slow
periodic time (LAPC rate slow).

Other open source implementations like Open VSwitch and libreswitch, and
vendor implementations like Arista EOS, seem to disable collecting +
distributing to when doing similar port disabling/detaching/removing change.
With this patch kernel implementation would behave the same way and ensure
partner doesn't consider our actor viable anymore.

Signed-off-by: Toni Peltonen <peltzi@peltzi.fi>
---
 drivers/net/bonding/bond_3ad.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index f43fb2f958a5..6776c33753dc 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2086,6 +2086,8 @@ void bond_3ad_unbind_slave(struct slave *slave)
 		   aggregator->aggregator_identifier);
 
 	/* Tell the partner that this port is not suitable for aggregation */
+	port->actor_oper_port_state &= ~AD_STATE_COLLECTING;
+	port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING;
 	port->actor_oper_port_state &= ~AD_STATE_AGGREGATION;
 	__update_lacpdu_from_port(port);
 	ad_lacpdu_send(port);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net] bonding: fix 802.3ad state sent to partner when unbinding slave
  2018-11-26 22:54 [PATCH net] bonding: fix 802.3ad state sent to partner when unbinding slave Toni Peltonen
@ 2018-11-27  0:29 ` Jay Vosburgh
  2018-11-27 11:44   ` Toni Peltonen
  0 siblings, 1 reply; 3+ messages in thread
From: Jay Vosburgh @ 2018-11-27  0:29 UTC (permalink / raw)
  To: Toni Peltonen; +Cc: netdev

Toni Peltonen <peltzi@peltzi.fi> wrote:

>Previously when unbinding a slave the 802.3ad implementation only told
>partner that the port is not suitable for aggregation by setting the port
>aggregation state from aggregatable to individual. This is not enough. If the
>physical layer still stays up and we only unbinded this port from the bond there
>is nothing in the aggregation status alone to prevent the partner from sending
>traffic towards us. To ensure that the partner doesn't consider this
>port at all anymore we should also disable collecting and distributing to
>signal that this actor is going away.
>
>I have tested this behaviour againts Arista EOS switches with mlx5 cards
>(physical link stays up when even when interface is down) and simulated
>the same situation virtually Linux <-> Linux with two network namespaces
>running two veth device pairs. In both cases setting aggregation to
>individual doesn't alone prevent traffic from being to sent towards this
>port given that the link stays up in partners end. Partner still keeps
>it's end in collecting + distributing state and continues until timeout is
>reached. In most cases this means we are losing the traffic partner sends
>towards our port while we wait for timeout. This is most visible with slow
>periodic time (LAPC rate slow).

	"LAPC" -> "LACP"

>Other open source implementations like Open VSwitch and libreswitch, and
>vendor implementations like Arista EOS, seem to disable collecting +
>distributing to when doing similar port disabling/detaching/removing change.
>With this patch kernel implementation would behave the same way and ensure
>partner doesn't consider our actor viable anymore.

	After re-reading the relevant bits of 802.1AX (particularly
5.4.9 on recordPDU and update_Selected) I'm going to suggest also
clearing AD_STATE_SYNCHRONIZATION, based on:

	Partner_Oper_Port_State.Synchronization is also set to TRUE if
	the value of Actor_State.Aggregation in the received PDU is set
	to FALSE (i.e., indicates an Individual link),
	Actor_State.Synchronization in the received PDU is set to TRUE,
	and LACP will actively maintain the link.

	Per the above, learing _SYNC in the LACPDU should un-sync the
port, inducing the Mux state machine (figure 5-15) to exit C_D state and
go to ATTACHED state (disabling Coll/Dist).

	But, either way, as this is a hint to get the link partner to
stop using the port, this looks reasonable to me.  

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>

	-J	

>Signed-off-by: Toni Peltonen <peltzi@peltzi.fi>
>---
> drivers/net/bonding/bond_3ad.c | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index f43fb2f958a5..6776c33753dc 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -2086,6 +2086,8 @@ void bond_3ad_unbind_slave(struct slave *slave)
> 		   aggregator->aggregator_identifier);
> 
> 	/* Tell the partner that this port is not suitable for aggregation */
>+	port->actor_oper_port_state &= ~AD_STATE_COLLECTING;
>+	port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING;
> 	port->actor_oper_port_state &= ~AD_STATE_AGGREGATION;
> 	__update_lacpdu_from_port(port);
> 	ad_lacpdu_send(port);
>-- 
>2.19.0

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net] bonding: fix 802.3ad state sent to partner when unbinding slave
  2018-11-27  0:29 ` Jay Vosburgh
@ 2018-11-27 11:44   ` Toni Peltonen
  0 siblings, 0 replies; 3+ messages in thread
From: Toni Peltonen @ 2018-11-27 11:44 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

On Mon, Nov 26, 2018 at 04:29:58PM -0800, Jay Vosburgh wrote:
> Toni Peltonen <peltzi@peltzi.fi> wrote:
> 
> >Previously when unbinding a slave the 802.3ad implementation only told
> >partner that the port is not suitable for aggregation by setting the port
> >aggregation state from aggregatable to individual. This is not enough. If the
> >physical layer still stays up and we only unbinded this port from the bond there
> >is nothing in the aggregation status alone to prevent the partner from sending
> >traffic towards us. To ensure that the partner doesn't consider this
> >port at all anymore we should also disable collecting and distributing to
> >signal that this actor is going away.
> >
> >I have tested this behaviour againts Arista EOS switches with mlx5 cards
> >(physical link stays up when even when interface is down) and simulated
> >the same situation virtually Linux <-> Linux with two network namespaces
> >running two veth device pairs. In both cases setting aggregation to
> >individual doesn't alone prevent traffic from being to sent towards this
> >port given that the link stays up in partners end. Partner still keeps
> >it's end in collecting + distributing state and continues until timeout is
> >reached. In most cases this means we are losing the traffic partner sends
> >towards our port while we wait for timeout. This is most visible with slow
> >periodic time (LAPC rate slow).
> 
> 	"LAPC" -> "LACP"
> 

I will fix the typing error and send updated version.

> >Other open source implementations like Open VSwitch and libreswitch, and
> >vendor implementations like Arista EOS, seem to disable collecting +
> >distributing to when doing similar port disabling/detaching/removing change.
> >With this patch kernel implementation would behave the same way and ensure
> >partner doesn't consider our actor viable anymore.
> 
> 	After re-reading the relevant bits of 802.1AX (particularly
> 5.4.9 on recordPDU and update_Selected) I'm going to suggest also
> clearing AD_STATE_SYNCHRONIZATION, based on:
> 
> 	Partner_Oper_Port_State.Synchronization is also set to TRUE if
> 	the value of Actor_State.Aggregation in the received PDU is set
> 	to FALSE (i.e., indicates an Individual link),
> 	Actor_State.Synchronization in the received PDU is set to TRUE,
> 	and LACP will actively maintain the link.
> 
> 	Per the above, learing _SYNC in the LACPDU should un-sync the
> port, inducing the Mux state machine (figure 5-15) to exit C_D state and
> go to ATTACHED state (disabling Coll/Dist).
> 
> 	But, either way, as this is a hint to get the link partner to
> stop using the port, this looks reasonable to me.  

After looking more closely into that I agree. It seems that also
clearing the AD_STATE_SYNCHRONIZATION would make sense here. I will test
this and send a new version with it if it doesn't cause any side
effects.

> 
> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
> 
> 	-J	
> 
> >Signed-off-by: Toni Peltonen <peltzi@peltzi.fi>
> >---
> > drivers/net/bonding/bond_3ad.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index f43fb2f958a5..6776c33753dc 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -2086,6 +2086,8 @@ void bond_3ad_unbind_slave(struct slave *slave)
> > 		   aggregator->aggregator_identifier);
> > 
> > 	/* Tell the partner that this port is not suitable for aggregation */
> >+	port->actor_oper_port_state &= ~AD_STATE_COLLECTING;
> >+	port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING;
> > 	port->actor_oper_port_state &= ~AD_STATE_AGGREGATION;
> > 	__update_lacpdu_from_port(port);
> > 	ad_lacpdu_send(port);
> >-- 
> >2.19.0
> 
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-11-27 22:42 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-26 22:54 [PATCH net] bonding: fix 802.3ad state sent to partner when unbinding slave Toni Peltonen
2018-11-27  0:29 ` Jay Vosburgh
2018-11-27 11:44   ` Toni Peltonen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.