netdev.vger.kernel.org archive mirror
* [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
@ 2015-04-27 17:38 anuradhak
  2015-04-28  5:45 ` Scott Feldman
  2015-04-29 22:08 ` Stephen Hemminger
  0 siblings, 2 replies; 9+ messages in thread
From: anuradhak @ 2015-04-27 17:38 UTC
  To: davem, sfeldma; +Cc: netdev, roopa, gospo, wkok, anuradhak

From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>

User space daemons can detect errors in the network that need to be
reported to the switch device drivers.

Drivers can react to this error state by doing a phy-down on the
switch-port which would result in a carrier-off locally and on the
directly connected switch. Doing that would prevent loops and
black-holes in the network.

One such use case is the multi-chassis LAG application -
1. The MLAG application runs on peer switches (say Switch0 and Switch1)
   synchronizing states, forwarding entries etc. between the two
   switches over the peer-link (this is a link directly connecting the
   two switches).
2. An MLAG election process designates one of the switches as a primary
   (e.g. Switch0 is primary and Switch1 is secondary).
3. The peer link plays a critical role in allowing Switch0-Switch1 to
   function as a single LAG partner to the downstream dual-connected
   servers. When the peer-link between the switches goes down we have a
   split-brain situation. Switch0 and Switch1 are no longer in sync and
   are acting independently. This can result in traffic loops and
   traffic black-holing in the network. 
4. To prevent these problems the MLAG application on the secondary
   switch phy-downs the MLAG ports on detecting the peer-link down.
   This will be seen as a carrier down on servers that are
   dual-connected to Switch0 and Switch1.
5. Specifically a dual-connected server will see a carrier-down on the
   port connected to the MLAG secondary, Switch1, and will stop using
   that port for traffic TX. So traffic black holing is prevented.
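
The decision in steps 2-5 can be sketched as below. This is only a minimal
user-space illustration, not the MLAG application itself; the names
MlagSwitch and ports_to_proto_down are hypothetical and exist only for this
example.

```python
from dataclasses import dataclass, field


@dataclass
class MlagSwitch:
    """Hypothetical model of one MLAG peer (illustration only)."""
    name: str
    role: str                       # "primary" or "secondary"
    mlag_ports: list = field(default_factory=list)


def ports_to_proto_down(switch, peer_link_up):
    """Return the MLAG ports that this switch should hold down.

    Only the secondary reacts to a peer-link failure; the primary keeps
    forwarding so dual-connected servers still have one working path.
    """
    if peer_link_up:
        return []
    if switch.role == "secondary":
        return list(switch.mlag_ports)
    return []


if __name__ == "__main__":
    sw1 = MlagSwitch("Switch1", "secondary", ["swp1", "swp2"])
    print(ports_to_proto_down(sw1, peer_link_up=False))
```

The primary never appears in the result, which matches the intent that only
the secondary's MLAG ports go carrier-down during a split-brain.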

v2 to v3:
   In response to Dave’s comments I have tried to make IFF_PROTO_DOWN
   more easily consumable by providing switchdev APIs to control the
   phy state of the switch port. The use case is relevant primarily to
   switch drivers at this point. That is the reason for making the
   change in rocker (commonly used switch driver example).

   One other change that could be done is to bring back the net-core
   change to hold the oper state down in response to IFF_PROTO_DOWN.
   This would be a driver agnostic change and the phy-down could be done
   in addition by interested switch drivers.

v1 to v2:
   Based on Dave's suggestion I have moved out aggregating of error bits
   across applications to a user space framework. This patch now simply
   notifies an aggregated error bit to drivers enabling them to handle
   the error gracefully.


Anuradha Karuppiah (4):
  net core: Add IFF_PROTO_DOWN support.
  switchdev: APIs for setting physical state of the switch port.
  rocker: Handle IFF_PROTODOWN by doing a PHYS-DOWN on the switch port.
  ip link: Config and display IFF_PROTO_DOWN flag.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>

 drivers/net/ethernet/rocker/rocker.c |   16 +++++++++++++++-
 include/net/switchdev.h              |   12 ++++++++++++
 include/uapi/linux/if.h              |    4 ++++
 net/8021q/vlan_dev.c                 |    3 ++-
 net/core/dev.c                       |    8 +++++++-
 net/switchdev/switchdev.c            |   23 +++++++++++++++++++++++
 6 files changed, 63 insertions(+), 3 deletions(-)

-- 
1.7.10.4


* Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
  2015-04-27 17:38 [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag anuradhak
@ 2015-04-28  5:45 ` Scott Feldman
  2015-04-28 15:43   ` Anuradha Karuppiah
       [not found]   ` <CACcJQnRw5HVUb0M3A2u_zbMtp85pi+kdCUa5gaY6cN4HXpVyeQ@mail.gmail.com>
  2015-04-29 22:08 ` Stephen Hemminger
  1 sibling, 2 replies; 9+ messages in thread
From: Scott Feldman @ 2015-04-28  5:45 UTC
  To: anuradhak
  Cc: David S. Miller, Netdev, Roopa Prabhu, Andy Gospodarek, Wilson Kok

On Mon, Apr 27, 2015 at 10:38 AM,  <anuradhak@cumulusnetworks.com> wrote:
> From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
>
> User space daemons can detect errors in the network that need to be
> notified to the switch device drivers.
>
> Drivers can react to this error state by doing a phy-down on the
> switch-port which would result in a carrier-off locally and on the
> directly connected switch. Doing that would prevent loops and
> black-holes in the network.

(Sorry if this was asked earlier)

Can the application simply send a SETLINK with IFF_UP clear and the
port driver's ndo_stop would bring the PHY link down?


* Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
  2015-04-28  5:45 ` Scott Feldman
@ 2015-04-28 15:43   ` Anuradha Karuppiah
       [not found]   ` <CACcJQnRw5HVUb0M3A2u_zbMtp85pi+kdCUa5gaY6cN4HXpVyeQ@mail.gmail.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Anuradha Karuppiah @ 2015-04-28 15:43 UTC
  Cc: Netdev

On Mon, Apr 27, 2015 at 10:45 PM, Scott Feldman <sfeldma@gmail.com> wrote:
> On Mon, Apr 27, 2015 at 10:38 AM,  <anuradhak@cumulusnetworks.com> wrote:
>> From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
>>
>> User space daemons can detect errors in the network that need to be
>> notified to the switch device drivers.
>>
>> Drivers can react to this error state by doing a phy-down on the
>> switch-port which would result in a carrier-off locally and on the
>> directly connected switch. Doing that would prevent loops and
>> black-holes in the network.
>
> (Sorry if this was asked earlier)
>
> Can the application simply send a SETLINK with IFF_UP clear and the
> port driver's ndo_stop would bring the PHY link down?

(Re-sending as plain text) -

Yes, clearing IFF_UP on detecting errors (PROTO_DOWN) is possible and we tried
that implementation as well. Unfortunately it failed for the following
reasons -

1. There is no way to disambiguate between admin_down (!IFF_UP) and an
APP/driver enforced error_down (IFF_PROTO_DOWN). Administrators or
automation scripts that monitor the config assumed that the switch-port
configuration had somehow fallen out of sync (and attempted to reinstate the
admin_up repeatedly).

2. Automatic error recovery was not possible; consider, for example, the
following scenario -
   a. The MLAG peer-link is down so the MLAG app on the secondary switch has
      proto_down’ed all the MLAG ports (including switch-port swp1) by clearing
      IFF_UP.
   b. At the same time the administrator is in the process of making some
      changes on the network connected to swp1. To avoid doing it live he would
      admin_disable swp1 (!IFF_UP) by doing an "ip link set swp1 down" (this
      is a no-op as event #a has already cleared IFF_UP on swp1).
   c. If the MLAG peer-link recovers at this point the MLAG app on the
      secondary switch would try to automatically recover the MLAG ports
      by clearing proto_down (i.e. setting IFF_UP); including on swp1. Doing
      that overrides the administrator’s directive to keep swp1 admin_down.
      Overriding an admin-down in a live network can be very dangerous so it
      is not possible to do auto-error-recovery unless we have a way to
      disambiguate between the admin and error states.
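
The a/b/c sequence above can be replayed on a plain flags word to show the
difference. This is a sketch only: IFF_UP is the real value from
<linux/if.h>, but the bit used for IFF_PROTO_DOWN here is a placeholder
picked for the example and is not taken from the patch.

```python
IFF_UP = 0x1                 # real value from <linux/if.h>
IFF_PROTO_DOWN = 1 << 19     # placeholder bit, assumed for illustration only


def single_flag_scenario():
    """Replay steps a-c when the app can only clear/set IFF_UP."""
    flags = IFF_UP
    flags &= ~IFF_UP         # a. MLAG app error-downs swp1
    flags &= ~IFF_UP         # b. admin runs "ip link set swp1 down" (no-op)
    flags |= IFF_UP          # c. peer-link recovers, app restores the port
    return bool(flags & IFF_UP)


def two_flag_scenario():
    """Replay steps a-c when the app owns a separate error bit."""
    flags = IFF_UP
    flags |= IFF_PROTO_DOWN  # a. app marks the error, IFF_UP untouched
    flags &= ~IFF_UP         # b. admin-down is recorded independently
    flags &= ~IFF_PROTO_DOWN # c. app clears only its own bit
    return bool(flags & IFF_UP), bool(flags & IFF_PROTO_DOWN)


if __name__ == "__main__":
    print("single flag, port up after recovery:", single_flag_scenario())
    print("two flags, (IFF_UP, PROTO_DOWN):", two_flag_scenario())
```

With a single flag the port ends up admin-up against the administrator's
wishes; with two flags the admin-down survives the automatic recovery.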


* Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
       [not found]   ` <CACcJQnRw5HVUb0M3A2u_zbMtp85pi+kdCUa5gaY6cN4HXpVyeQ@mail.gmail.com>
@ 2015-04-28 19:37     ` Scott Feldman
  2015-04-28 20:04       ` Anuradha Karuppiah
  0 siblings, 1 reply; 9+ messages in thread
From: Scott Feldman @ 2015-04-28 19:37 UTC
  To: Anuradha Karuppiah
  Cc: David S. Miller, Netdev, Roopa Prabhu, Andy Gospodarek, Wilson Kok

On Tue, Apr 28, 2015 at 8:39 AM, Anuradha Karuppiah
<anuradhak@cumulusnetworks.com> wrote:
>
>
> On Mon, Apr 27, 2015 at 10:45 PM, Scott Feldman <sfeldma@gmail.com> wrote:
>>
>> On Mon, Apr 27, 2015 at 10:38 AM,  <anuradhak@cumulusnetworks.com> wrote:
>> > From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
>> >
>> > User space daemons can detect errors in the network that need to be
>> > notified to the switch device drivers.
>> >
>> > Drivers can react to this error state by doing a phy-down on the
>> > switch-port which would result in a carrier-off locally and on the
>> > directly connected switch. Doing that would prevent loops and
>> > black-holes in the network.
>>
>> (Sorry if this was asked earlier)
>>
>> Can the application simply send a SETLINK with IFF_UP clear and the
>> port driver's ndo_stop would bring the PHY link down?
>
>
> Yes, Clearing IFF_UP on detecting errors (PROTO_DOWN) is possible and we
> tried
> that implementation as well. Unfortunately it failed because of the
> following
> reasons -
>
> 1. There is no way to disambiguate between admin_down (!IFF_UP) and an
> APP/driver enforced error_down (IFF_PROTO_DOWN). Administrator or
> automation-scripts that monitor the config assumed that switch-port
> configuration had somehow fallen out of sync (and attempted to reinstate the
> admin_up repeatedly).
>
> 2. Automatic error recovery was not possible; consider the following
> scenario
> for e.g.
>    a. The MLAG peer-link is down so the MLAG app on the secondary switch has
>       proto_down’ed all the MLAG ports (including switch-port swp1) by
> clearing
>       IFF_UP.
>    b. At the same time the administrator is in the process of making some
>       changes on the network connected to swp1. To avoid doing it live he
> would
>       admin_disable swp1 (!IFF_UP) by doing an "ip link set swp1 down" (this
>       is a no-op as event #a has already cleared IFF_UP on swp1).
>    c. If the MLAG peer-link recovers at this point the MLAG app on the
>       secondary switch would try to automatically recover the MLAG ports
>       by clearing proto_down (i.e. setting IFF_UP); including on swp1. Doing
>       that overrides the administrator’s directive to keep swp1 admin_down.
>       Overriding an admin-down in a live network can be very dangerous so it
>       is not possible to do auto-error-recovery unless we have a way to
>       disambiguate between the admin and error states

That makes sense.

Dang, this is so close to IFF_DORMANT.  The interface can be IFF_UP
and link mode can be DORMANT.  Can the port driver kill PHY link if
dev->flags&IFF_DORMANT in ndo_set_rx_mode()?  Would require
IFF_DORMANT is included in dev->flags in __dev_change_flags().


* Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
  2015-04-28 19:37     ` Scott Feldman
@ 2015-04-28 20:04       ` Anuradha Karuppiah
  2015-04-29  0:28         ` Scott Feldman
  0 siblings, 1 reply; 9+ messages in thread
From: Anuradha Karuppiah @ 2015-04-28 20:04 UTC
  To: Scott Feldman
  Cc: David S. Miller, Netdev, Roopa Prabhu, Andy Gospodarek, Wilson Kok

On Tue, Apr 28, 2015 at 12:37 PM, Scott Feldman <sfeldma@gmail.com> wrote:
> On Tue, Apr 28, 2015 at 8:39 AM, Anuradha Karuppiah
> <anuradhak@cumulusnetworks.com> wrote:
>>
>>
>> On Mon, Apr 27, 2015 at 10:45 PM, Scott Feldman <sfeldma@gmail.com> wrote:
>>>
>>> On Mon, Apr 27, 2015 at 10:38 AM,  <anuradhak@cumulusnetworks.com> wrote:
>>> > From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
>>> >
>>> > User space daemons can detect errors in the network that need to be
>>> > notified to the switch device drivers.
>>> >
>>> > Drivers can react to this error state by doing a phy-down on the
>>> > switch-port which would result in a carrier-off locally and on the
>>> > directly connected switch. Doing that would prevent loops and
>>> > black-holes in the network.
>>>
>>> (Sorry if this was asked earlier)
>>>
>>> Can the application simply send a SETLINK with IFF_UP clear and the
>>> port driver's ndo_stop would bring the PHY link down?
>>
>>
>> Yes, Clearing IFF_UP on detecting errors (PROTO_DOWN) is possible and we
>> tried
>> that implementation as well. Unfortunately it failed because of the
>> following
>> reasons -
>>
>> 1. There is no way to disambiguate between admin_down (!IFF_UP) and an
>> APP/driver enforced error_down (IFF_PROTO_DOWN). Administrator or
>> automation-scripts that monitor the config assumed that switch-port
>> configuration had somehow fallen out of sync (and attempted to reinstate the
>> admin_up repeatedly).
>>
>> 2. Automatic error recovery was not possible; consider the following
>> scenario
>> for e.g.
>>    a. The MLAG peer-link is down so the MLAG app on the secondary switch has
>>       proto_down’ed all the MLAG ports (including switch-port swp1) by
>> clearing
>>       IFF_UP.
>>    b. At the same time the administrator is in the process of making some
>>       changes on the network connected to swp1. To avoid doing it live he
>> would
>>       admin_disable swp1 (!IFF_UP) by doing an "ip link set swp1 down" (this
>>       is a no-op as event #a has already cleared IFF_UP on swp1).
>>    c. If the MLAG peer-link recovers at this point the MLAG app on the
>>       secondary switch would try to automatically recover the MLAG ports
>>       by clearing proto_down (i.e. setting IFF_UP); including on swp1. Doing
>>       that overrides the administrator’s directive to keep swp1 admin_down.
>>       Overriding an admin-down in a live network can be very dangerous so it
>>       is not possible to do auto-error-recovery unless we have a way to
>>       disambiguate between the admin and error states
>
> That makes sense.
>
> Dang, this is so close to IFF_DORMANT.  The interface can be IFF_UP
> and link mode can be DORMANT.  Can the port driver kill PHY link if
> dev->flags&IFF_DORMANT in ndo_set_rx_mode()?  Would require
> IFF_DORMANT is included in dev->flags in __dev_change_flags().

Yes, IFF_DORMANT does seem close to what is needed; in the current/standard
interpretation IFF_DORMANT keeps the switch port phy-up and running (and most
PDUs are also exchanged in the dormant state). Like you said we could
re-interpret IFF_DORMANT in this context to phy-down the switch-port;
unfortunately we are already using IFF_DORMANT as well (in its standard
interpretation)...

We are using the dormant mode (for the MLAG app itself) to hold the MLAG port
in a brief, transitional suspended state when the switch-port link/carrier up
happens. This has been done to co-ordinate states across the MLAG peer switches
and to ensure that egress port block masks are programmed on the peer switch
before transitioning the local switch port to an OPER_UP state. If we didn't do
that the dual-connected server would see duplicate packets every time a
link-down to link-up transition happened on an MLAG port.

So IFF_DORMANT re-interpretation is not going to be easily possible for the
MLAG use case.


* Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
  2015-04-28 20:04       ` Anuradha Karuppiah
@ 2015-04-29  0:28         ` Scott Feldman
  2015-04-29 22:04           ` Anuradha Karuppiah
  0 siblings, 1 reply; 9+ messages in thread
From: Scott Feldman @ 2015-04-29  0:28 UTC
  To: Anuradha Karuppiah
  Cc: David S. Miller, Netdev, Roopa Prabhu, Andy Gospodarek, Wilson Kok

On Tue, Apr 28, 2015 at 1:04 PM, Anuradha Karuppiah
<anuradhak@cumulusnetworks.com> wrote:
> On Tue, Apr 28, 2015 at 12:37 PM, Scott Feldman <sfeldma@gmail.com> wrote:
>> On Tue, Apr 28, 2015 at 8:39 AM, Anuradha Karuppiah
>> <anuradhak@cumulusnetworks.com> wrote:
>>>
>>>
>>> On Mon, Apr 27, 2015 at 10:45 PM, Scott Feldman <sfeldma@gmail.com> wrote:
>>>>
>>>> On Mon, Apr 27, 2015 at 10:38 AM,  <anuradhak@cumulusnetworks.com> wrote:
>>>> > From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
>>>> >
>>>> > User space daemons can detect errors in the network that need to be
>>>> > notified to the switch device drivers.
>>>> >
>>>> > Drivers can react to this error state by doing a phy-down on the
>>>> > switch-port which would result in a carrier-off locally and on the
>>>> > directly connected switch. Doing that would prevent loops and
>>>> > black-holes in the network.
>>>>
>>>> (Sorry if this was asked earlier)
>>>>
>>>> Can the application simply send a SETLINK with IFF_UP clear and the
>>>> port driver's ndo_stop would bring the PHY link down?
>>>
>>>
>>> Yes, Clearing IFF_UP on detecting errors (PROTO_DOWN) is possible and we
>>> tried
>>> that implementation as well. Unfortunately it failed because of the
>>> following
>>> reasons -
>>>
>>> 1. There is no way to disambiguate between admin_down (!IFF_UP) and an
>>> APP/driver enforced error_down (IFF_PROTO_DOWN). Administrator or
>>> automation-scripts that monitor the config assumed that switch-port
>>> configuration had somehow fallen out of sync (and attempted to reinstate the
>>> admin_up repeatedly).
>>>
>>> 2. Automatic error recovery was not possible; consider the following
>>> scenario
>>> for e.g.
>>>    a. The MLAG peer-link is down so the MLAG app on the secondary switch has
>>>       proto_down’ed all the MLAG ports (including switch-port swp1) by
>>> clearing
>>>       IFF_UP.
>>>    b. At the same time the administrator is in the process of making some
>>>       changes on the network connected to swp1. To avoid doing it live he
>>> would
>>>       admin_disable swp1 (!IFF_UP) by doing an "ip link set swp1 down" (this
>>>       is a no-op as event #a has already cleared IFF_UP on swp1).
>>>    c. If the MLAG peer-link recovers at this point the MLAG app on the
>>>       secondary switch would try to automatically recover the MLAG ports
>>>       by clearing proto_down (i.e. setting IFF_UP); including on swp1. Doing
>>>       that overrides the administrator’s directive to keep swp1 admin_down.
>>>       Overriding an admin-down in a live network can be very dangerous so it
>>>       is not possible to do auto-error-recovery unless we have a way to
>>>       disambiguate between the admin and error states
>>
>> That makes sense.
>>
>> Dang, this is so close to IFF_DORMANT.  The interface can be IFF_UP
>> and link mode can be DORMANT.  Can the port driver kill PHY link if
>> dev->flags&IFF_DORMANT in ndo_set_rx_mode()?  Would require
>> IFF_DORMANT is included in dev->flags in __dev_change_flags().
>
> Yes, IFF_DORMANT does seem close to what is needed; in the current/standard
> interpretation IFF_DORMANT keeps the switch port phy-up and running (and most
> PDUs are also exchanged in the dormant state). Like you said we could
> re-interpret IFF_DORMANT in this context to phy-down the switch-port;
> unfortunately we are already using IFF_DORMANT as well (in its standard
> interpretation)...

That makes sense; best to not confuse IFF_DORMANT with this new need.

> We are using the dormant mode (for the MLAG app itself) to hold the MLAG port
> in a brief/transition-ary suspended state when the switch-port link/carrier up
> happens. This has been done to co-ordinate states across the MLAG peer switches
> and to ensure that egress port block masks are programmed on the peer switch
> before transitioning the local switch port to an OPER_UP state. If we didn't do
> that the dual-connected server would see duplicate packets every time a
> link-down to link-up happened on a MLAG port.

How can we see this in action?  I didn't find where the kernel egress
blocks the port when dormant.  What are the requirements for a kernel
port driver to support your MLAG app?  Is this MLAG app available
somewhere?


* Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
  2015-04-29  0:28         ` Scott Feldman
@ 2015-04-29 22:04           ` Anuradha Karuppiah
  0 siblings, 0 replies; 9+ messages in thread
From: Anuradha Karuppiah @ 2015-04-29 22:04 UTC
  To: Scott Feldman
  Cc: David S. Miller, Netdev, Roopa Prabhu, Andy Gospodarek, Wilson Kok

On Tue, Apr 28, 2015 at 5:28 PM, Scott Feldman <sfeldma@gmail.com> wrote:
> On Tue, Apr 28, 2015 at 1:04 PM, Anuradha Karuppiah
> <anuradhak@cumulusnetworks.com> wrote:
>> On Tue, Apr 28, 2015 at 12:37 PM, Scott Feldman <sfeldma@gmail.com> wrote:
>>> On Tue, Apr 28, 2015 at 8:39 AM, Anuradha Karuppiah
>>> <anuradhak@cumulusnetworks.com> wrote:
>>>>
>>>>
>>>> On Mon, Apr 27, 2015 at 10:45 PM, Scott Feldman <sfeldma@gmail.com> wrote:
>>>>>
>>>>> On Mon, Apr 27, 2015 at 10:38 AM,  <anuradhak@cumulusnetworks.com> wrote:
>>>>> > From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
>>>>> >
>>>>> > User space daemons can detect errors in the network that need to be
>>>>> > notified to the switch device drivers.
>>>>> >
>>>>> > Drivers can react to this error state by doing a phy-down on the
>>>>> > switch-port which would result in a carrier-off locally and on the
>>>>> > directly connected switch. Doing that would prevent loops and
>>>>> > black-holes in the network.
>>>>>
>>>>> (Sorry if this was asked earlier)
>>>>>
>>>>> Can the application simply send a SETLINK with IFF_UP clear and the
>>>>> port driver's ndo_stop would bring the PHY link down?
>>>>
>>>>
>>>> Yes, Clearing IFF_UP on detecting errors (PROTO_DOWN) is possible and we
>>>> tried
>>>> that implementation as well. Unfortunately it failed because of the
>>>> following
>>>> reasons -
>>>>
>>>> 1. There is no way to disambiguate between admin_down (!IFF_UP) and an
>>>> APP/driver enforced error_down (IFF_PROTO_DOWN). Administrator or
>>>> automation-scripts that monitor the config assumed that switch-port
>>>> configuration had somehow fallen out of sync (and attempted to reinstate the
>>>> admin_up repeatedly).
>>>>
>>>> 2. Automatic error recovery was not possible; consider the following
>>>> scenario
>>>> for e.g.
>>>>    a. The MLAG peer-link is down so the MLAG app on the secondary switch has
>>>>       proto_down’ed all the MLAG ports (including switch-port swp1) by
>>>> clearing
>>>>       IFF_UP.
>>>>    b. At the same time the administrator is in the process of making some
>>>>       changes on the network connected to swp1. To avoid doing it live he
>>>> would
>>>>       admin_disable swp1 (!IFF_UP) by doing an "ip link set swp1 down" (this
>>>>       is a no-op as event #a has already cleared IFF_UP on swp1).
>>>>    c. If the MLAG peer-link recovers at this point the MLAG app on the
>>>>       secondary switch would try to automatically recover the MLAG ports
>>>>       by clearing proto_down (i.e. setting IFF_UP); including on swp1. Doing
>>>>       that overrides the administrator’s directive to keep swp1 admin_down.
>>>>       Overriding an admin-down in a live network can be very dangerous so it
>>>>       is not possible to do auto-error-recovery unless we have a way to
>>>>       disambiguate between the admin and error states
>>>
>>> That makes sense.
>>>
>>> Dang, this is so close to IFF_DORMANT.  The interface can be IFF_UP
>>> and link mode can be DORMANT.  Can the port driver kill PHY link if
>>> dev->flags&IFF_DORMANT in ndo_set_rx_mode()?  Would require
>>> IFF_DORMANT is included in dev->flags in __dev_change_flags().
>>
>> Yes, IFF_DORMANT does seem close to what is needed; in the current/standard
>> interpretation IFF_DORMANT keeps the switch port phy-up and running (and most
>> PDUs are also exchanged in the dormant state). Like you said we could
>> re-interpret IFF_DORMANT in this context to phy-down the switch-port;
>> unfortunately we are already using IFF_DORMANT as well (in its standard
>> interpretation)...
>
> That makes sense; best to not confuse IFF_DORMANT with this new need.
>
>> We are using the dormant mode (for the MLAG app itself) to hold the MLAG port
>> in a brief/transition-ary suspended state when the switch-port link/carrier up
>> happens. This has been done to co-ordinate states across the MLAG peer switches
>> and to ensure that egress port block masks are programmed on the peer switch
>> before transitioning the local switch port to an OPER_UP state. If we didn't do
>> that the dual-connected server would see duplicate packets every time a
>> link-down to link-up happened on a MLAG port.
>
> How can we see this in action?  I didn't find where the kernel egress
> blocks the port when dormant.  What are the requirements for a kernel
> port driver to support your MLAG app?  Is this MLAG app available
> somewhere?

Traffic blocking on local dormant switch ports is being done implicitly by
using MSTP, which puts !OPER_UP (OPER_DOWN, OPER_DORMANT) ports in an STP
disabled/blocking state. Egress traffic blocking is really needed on the peer
switch to prevent the traffic from the peer link being sent again to
the server. We are using ebtables for this purpose currently so there are no
additional kernel requirements.

Further details on the MLAG app can also be found at -
https://www.netdev01.org/sessions/23

We are actively working on consolidating the MLAG app and making it available
for everybody's use soon. Getting proto_down out was part of that process.

PROTO_DOWN also has other use cases - like a link-dampening app which can
monitor (and proto_down) flapping or otherwise-misbehaving switch ports and
attempt paced/periodic auto-recovery.
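
A link-dampening app of that kind could look roughly like the sketch below.
It is an assumption-laden illustration, not an existing tool: the class
name, thresholds and doubling hold time are all hypothetical, and the
proto_down attribute stands in for whatever call would actually set the
flag on the port.

```python
class LinkDampener:
    """Hypothetical per-port flap monitor with paced auto-recovery."""

    def __init__(self, max_flaps=3, window=10.0, base_hold=5.0, max_hold=60.0):
        self.max_flaps = max_flaps    # flaps tolerated inside the window
        self.window = window          # seconds of history to consider
        self.hold = base_hold         # current hold-down time in seconds
        self.max_hold = max_hold      # upper bound for the hold-down time
        self.flap_times = []
        self.proto_down = False
        self.retry_at = None

    def on_flap(self, now):
        """Record a carrier flap and proto-down the port if it flaps too often."""
        self.flap_times = [t for t in self.flap_times if now - t < self.window]
        self.flap_times.append(now)
        if not self.proto_down and len(self.flap_times) >= self.max_flaps:
            self.proto_down = True
            self.retry_at = now + self.hold
            # Pace the next recovery attempt: double the hold, up to max_hold.
            self.hold = min(self.hold * 2, self.max_hold)

    def tick(self, now):
        """Attempt recovery once the hold time has expired; return proto_down."""
        if self.proto_down and now >= self.retry_at:
            self.proto_down = False
            self.flap_times.clear()
        return self.proto_down


if __name__ == "__main__":
    d = LinkDampener()
    for t in (0.0, 1.0, 2.0):
        d.on_flap(t)
    print("proto_down after 3 flaps:", d.proto_down)
```

Because the error state lives in its own bit, a recovery attempt like
tick() would not disturb an administrator's admin-down on the same port.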


* Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
  2015-04-27 17:38 [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag anuradhak
  2015-04-28  5:45 ` Scott Feldman
@ 2015-04-29 22:08 ` Stephen Hemminger
  2015-04-29 22:58   ` Anuradha Karuppiah
  1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2015-04-29 22:08 UTC
  To: anuradhak; +Cc: davem, sfeldma, netdev, roopa, gospo, wkok

On Mon, 27 Apr 2015 10:38:20 -0700
anuradhak@cumulusnetworks.com wrote:

> From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
> 
> User space daemons can detect errors in the network that need to be
> notified to the switch device drivers. 
> 
> Drivers can react to this error state by doing a phy-down on the
> switch-port which would result in a carrier-off locally and on the
> directly connected switch. Doing that would prevent loops and
> black-holes in the network.
> 
> One such use case is the multi-chassis LAG application -
> 1. The MLAG application runs on peer switches (say Switch0 and Switch1)
>    synchronizing states, forwarding entries etc. between the two
>    switches over the peer-link (this is a link directly connecting the
>    two switches).
> 2. An MLAG election process designates one of the switches as a primary
>    (for e.g. Switch0 is primary and Switch1 is secondary). 
> 3. The peer link plays a critical role in allowing Switch0-Switch1 to
>    function as a single LAG partner to the downstream dual-connected
>    servers. When the peer-link between the switches goes down we have a
>    split-brain situation. Switch0 and Switch1 are no longer in sync and
>    are acting independently. This can result in traffic loops and
>    traffic black-holing in the network. 
> 4. To prevent these problems the MLAG application on the secondary
>    switch phy-downs the MLAG ports on detecting the peer-link down.
>    This will be seen as a carrier down on servers that are
>    dual-connected to Switch0 and Switch1.
> 5. Specifically a dual-connected server will see a carrier-down on the
>    port connected to the MLAG secondary, Switch1, and will stop using
>    that port for traffic TX. So traffic black holing is prevented.
> 
> v2 to v3:
>    In response to Dave’s comments I have tried to make IFF_PROTODOWN
>    more easily consumable by providing switchdev APIs to control the
>    phy state of the switch port. The use case is relevant primarily to
>    switch drivers at this point. That is the reason for making the
>    change in rocker (commonly used switch driver example).
> 
>    One other change that could be done is to bring back the net-core
>    change to hold the oper state down in response to IFF_PROTO_DOWN.
>    This would be a driver agnostic change and the phy-down could be done
>    in addition by interested switch drivers.
> 
> v1 to v2:
>    Based on Dave's suggestion I have moved out aggregating of error bits
>    across applications to a user space framework. This patch now simply
>    notifies an aggregated error bit to drivers enabling them to handle
>    the error gracefully.
> 
> 
> Anuradha Karuppiah (4):
>   net core: Add IFF_PROTO_DOWN support.
>   switchdev: APIs for setting physical state of the switch port.
>   rocker: Handle IFF_PROTODOWN by doing a PHYS-DOWN on the switch port.
>   ip link: Config and display IFF_PROTO_DOWN flag.
> 
> Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
> 
>  drivers/net/ethernet/rocker/rocker.c |   16 +++++++++++++++-
>  include/net/switchdev.h              |   12 ++++++++++++
>  include/uapi/linux/if.h              |    4 ++++
>  net/8021q/vlan_dev.c                 |    3 ++-
>  net/core/dev.c                       |    8 +++++++-
>  net/switchdev/switchdev.c            |   23 +++++++++++++++++++++++
>  6 files changed, 63 insertions(+), 3 deletions(-)
> 

How does this interact with operstate?
It seems RFC2863 operstate (Documentation/networking/operstates.txt) already
has the concept of LOWERLAYERDOWN.


* Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
  2015-04-29 22:08 ` Stephen Hemminger
@ 2015-04-29 22:58   ` Anuradha Karuppiah
  0 siblings, 0 replies; 9+ messages in thread
From: Anuradha Karuppiah @ 2015-04-29 22:58 UTC
  To: Stephen Hemminger
  Cc: David S. Miller, Scott Feldman, netdev, Roopa Prabhu,
	Andy Gospodarek, Wilson Kok

On Wed, Apr 29, 2015 at 3:08 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Mon, 27 Apr 2015 10:38:20 -0700
> anuradhak@cumulusnetworks.com wrote:
>
>> From: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
>>
>> User space daemons can detect errors in the network that need to be
>> notified to the switch device drivers.
>>
>> Drivers can react to this error state by doing a phy-down on the
>> switch-port which would result in a carrier-off locally and on the
>> directly connected switch. Doing that would prevent loops and
>> black-holes in the network.
>>
>> One such use case is the multi-chassis LAG application -
>> 1. The MLAG application runs on peer switches (say Switch0 and Switch1)
>>    synchronizing states, forwarding entries etc. between the two
>>    switches over the peer-link (this is a link directly connecting the
>>    two switches).
>> 2. An MLAG election process designates one of the switches as a primary
>>    (for e.g. Switch0 is primary and Switch1 is secondary).
>> 3. The peer link plays a critical role in allowing Switch0-Switch1 to
>>    function as a single LAG partner to the downstream dual-connected
>>    servers. When the peer-link between the switches goes down we have a
>>    split-brain situation. Switch0 and Switch1 are no longer in sync and
>>    are acting independently. This can result in traffic loops and
>>    traffic black-holing in the network.
>> 4. To prevent these problems the MLAG application on the secondary
>>    switch phy-downs the MLAG ports on detecting the peer-link down.
>>    This will be seen as a carrier down on servers that are
>>    dual-connected to Switch0 and Switch1.
>> 5. Specifically a dual-connected server will see a carrier-down on the
>>    port connected to the MLAG secondary, Switch1, and will stop using
>>    that port for traffic TX. So traffic black holing is prevented.
>>
>> v2 to v3:
>>    In response to Dave’s comments I have tried to make IFF_PROTODOWN
>>    more easily consumable by providing switchdev APIs to control the
>>    phy state of the switch port. The use case is relevant primarily to
>>    switch drivers at this point. That is the reason for making the
>>    change in rocker (commonly used switch driver example).
>>
>>    One other change that could be done is to bring back the net-core
>>    change to hold the oper state down in response to IFF_PROTO_DOWN.
>>    This would be a driver agnostic change and the phy-down could be done
>>    in addition by interested switch drivers.
>>
>> v1 to v2:
>>    Based on Dave's suggestion I have moved out aggregating of error bits
>>    across applications to a user space framework. This patch now simply
>>    notifies an aggregated error bit to drivers enabling them to handle
>>    the error gracefully.
>>
>>
>> Anuradha Karuppiah (4):
>>   net core: Add IFF_PROTO_DOWN support.
>>   switchdev: APIs for setting physical state of the switch port.
>>   rocker: Handle IFF_PROTODOWN by doing a PHYS-DOWN on the switch port.
>>   ip link: Config and display IFF_PROTO_DOWN flag.
>>
>> Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
>> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>
>>  drivers/net/ethernet/rocker/rocker.c |   16 +++++++++++++++-
>>  include/net/switchdev.h              |   12 ++++++++++++
>>  include/uapi/linux/if.h              |    4 ++++
>>  net/8021q/vlan_dev.c                 |    3 ++-
>>  net/core/dev.c                       |    8 +++++++-
>>  net/switchdev/switchdev.c            |   23 +++++++++++++++++++++++
>>  6 files changed, 63 insertions(+), 3 deletions(-)
>>
>
> How does this interact with operstate?
> It seems RFC2863 operstate (Documentation/networking/operstates.txt) already
> has concept of LOWERLAYERDOWN
>

IFF_PROTO_DOWN doesn't directly change (or interact with) the RFC2863 defined
oper state machine.

It is intended to notify the switch driver that an APP detected errors on a
switch port (say swp1). The switch driver is expected to react to the
notification by doing a phy down on the switch port (if it has phy control
of the port via the corresponding SDK).
1. Doing that would result in a carrier down locally and on the directly
connected remote switch.
2. This in turn will be reported as a netif_carrier_off(swp1) by the switch
driver and result in an IF_OPER_LOWERLAYERDOWN on swp1 via the existing
link_watch/rfc2863_policy processing itself.
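
One way to observe the resulting state from user space is to read the
standard sysfs attributes of the port. The sketch below assumes the usual
/sys/class/net/<ifname>/operstate and carrier files; the helper name
read_link_state and the sysfs_root parameter are hypothetical and only
there to keep the example self-contained.

```python
import os


def read_link_state(ifname, sysfs_root="/sys/class/net"):
    """Return (operstate, carrier) for an interface as reported by sysfs.

    operstate is the lower-case RFC2863 state string (e.g. "up",
    "lowerlayerdown", "dormant"). carrier is True/False, or None when the
    attribute cannot be read (e.g. while the interface is admin-down).
    """
    base = os.path.join(sysfs_root, ifname)
    with open(os.path.join(base, "operstate")) as f:
        operstate = f.read().strip()
    try:
        with open(os.path.join(base, "carrier")) as f:
            carrier = f.read().strip() == "1"
    except OSError:
        carrier = None
    return operstate, carrier


if __name__ == "__main__":
    # "lo" exists on most Linux systems; substitute a switch port such as swp1.
    print(read_link_state("lo"))
```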

