Re: [RFC net-next] net: track locally triggered link loss

From: Jakub Kicinski <kuba@kernel.org>
To: Saeed Mahameed <saeedm@nvidia.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
	netdev@vger.kernel.org, linux@armlinux.org.uk, olteanv@gmail.com,
	hkallweit1@gmail.com, f.fainelli@gmail.com,
	michael.chan@broadcom.com
Subject: Re: [RFC net-next] net: track locally triggered link loss
Date: Fri, 20 May 2022 16:03:19 -0700	[thread overview]
Message-ID: <20220520160319.15ed87b9@kernel.org> (raw)
In-Reply-To: <20220520220832.kh4lndzy7hvyus6f@sx1>

On Fri, 20 May 2022 15:08:32 -0700 Saeed Mahameed wrote:
> >I'm not sure this is a good example. If the PHY is doing an autoneg,
> >the link really is down for around a second. The link peer will also
> >so the link go down and come back up. So this seems like a legitimate
> >time to set the carrier off and then back on again.
> >  
> +1
> 
> also looks very racy, what happens if a real phy link event happens in
> between carrier_change_start() and carrier_change_end() ?

But physical world is racy. What if the link flaps twice while the IRQs
are masked? There will always be cases where we miss or miscount events.

> I think you shouldn't treat configuration flows where the driver actually
> toggles the phy link as a special case, they should be counted as a real
> link flap.. because they are.

That's not the direction of the patch at all - I'm counting locally
generated events, I don't care as much if the link went down or not.

I believe that creating a system which would at scale try to correlate
events between peers is impractical.

> It's impossible from the driver level to know if a FW link event is
> due to configuration causes or external forces !

You mean because FW or another entity (other than local host) asked for
the link to be reset? How is that different from switch taking it down?
Either way the host has lost link due to a non-local event. (3a) or (3b)

> the new 3 APIs are going to be a heavy burden on drivers to maintain. if
> you agree with the above and treat all phy link events as one, then we end
> up with one new API drivers has to maintain "net_if_carrier_admin_off()"
> which is manageable.

I really don't think it's that hard...

> But what about SW netdevices, should all of them change to use the "admin"
> version ?

The new statistic is an opt-in (via netdev->has_carrier_down_local)
I think the same rules as to real devices should apply to SW devices
but I don't intend to implement the changes for any.

> We should keep current carrier logic as is and add new state/counter
> to count real phy link state.
> 
> netif_phy_link_down(netdev) {
>     set_bit(__LINK_STATE_NOPHYLINK, &dev->state);
>     atomic_inc(netdev->phy_link_down);
>     netif_carrier_off(ndetdev);
> }
> 
> netif_phy_link_up(netdev) {...}
> 
> such API should be maintained by real HW device drivers. 

"phy_link_down" has a ring of "API v2 this time we'll get it right".

Does this differentiate between locally vs non-locally generated events?

PTAL at the categorization in the commit message. There are three
classes of events, we need three counters. local vs non-local and
link went down vs flow was paused by SW are independent and overlapping.
Doesn't matter what the counters are called, translating between them 
is basic math.