All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Link down reasons
       [not found] <AM0PR0502MB38261D4F4F7A3BB5E0FDCD10D7B10@AM0PR0502MB3826.eurprd05.prod.outlook.com>
@ 2020-05-27 21:38 ` Andrew Lunn
  2020-05-28  5:56   ` Amit Cohen
  2020-05-28  8:40 ` Oleksij Rempel
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2020-05-27 21:38 UTC (permalink / raw)
  To: Amit Cohen; +Cc: mlxsw, netdev, o.rempel

On Wed, May 27, 2020 at 03:41:22PM +0000, Amit Cohen wrote:
> Hi Andrew,
> 
> We are planning to send a set that exposes link-down reason in ethtool.
> 
> It seems that the ability of your set “Ethernet cable test support” can be
> integrated with link-down reason.
> 
>  
> 
> The idea is to expose reason and subreason (if there is):
> 
> $ ethtool ethX
> 
> …
> 
> Link detected: no (No cable) // No sub reason
> 
>  
> 
> $ ethtool ethY
> 
> Link detected: no (Autoneg failure, No partner detected)
> 
>  
> 
> Currently we have reason “cable issue” and subreasons “unsupported cable” and
> “shorted cable”.
> 
> The mechanism of cable test can be integrated and allow us report “cable issue”
> reason and “shorted cable” subreason.

Hi Amit

I don't really see them being combinable. First off, your API seems
too limiting. How do you say which pair is broken, or at what
distance? What about open cable, as opposed to shorted cable?

So i would suggest:

Link detected: no (cable issue)

And then recommend the user uses ethtool --cable-test to get all the
details, and you have a much more flexible API to provide as much or
as little information as you have.

   Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Link down reasons
  2020-05-27 21:38 ` Link down reasons Andrew Lunn
@ 2020-05-28  5:56   ` Amit Cohen
  2020-05-28  9:12     ` Petr Machata
  0 siblings, 1 reply; 11+ messages in thread
From: Amit Cohen @ 2020-05-28  5:56 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: mlxsw, netdev, o.rempel

Andrew Lunn <andrew@lunn.ch> writes:

>On Wed, May 27, 2020 at 03:41:22PM +0000, Amit Cohen wrote:
>> Hi Andrew,
>> 
>> We are planning to send a set that exposes link-down reason in ethtool.
>> 
>> It seems that the ability of your set “Ethernet cable test support” 
>> can be integrated with link-down reason.
>> 
>>  
>> 
>> The idea is to expose reason and subreason (if there is):
>> 
>> $ ethtool ethX
>> 
>> …
>> 
>> Link detected: no (No cable) // No sub reason
>> 
>>  
>> 
>> $ ethtool ethY
>> 
>> Link detected: no (Autoneg failure, No partner detected)
>> 
>>  
>> 
>> Currently we have reason “cable issue” and subreasons “unsupported 
>> cable” and “shorted cable”.
>> 
>> The mechanism of cable test can be integrated and allow us report “cable issue”
>> reason and “shorted cable” subreason.
>
>Hi Amit
>
>I don't really see them being combinable. First off, your API seems too limiting. How do you say which pair is broken, or at what distance? What about open cable, as opposed to shorted cable?
>
>So i would suggest:
>
>Link detected: no (cable issue)
>
>And then recommend the user uses ethtool --cable-test to get all the details, and you have a much more flexible API to provide as much or as little information as you have.
>
>   Andrew

Thanks!
Link-down reason has to consider cable-test or not? In order to report "cable issue", we assume that the driver implemented link-down reason in addition to cable-test?
I'm asking about PHY driver for example that implemented cable-test and not link-down reason, so according to cable-test we should report "cable issue" as a link-down reason or do not expose reason here?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
       [not found] <AM0PR0502MB38261D4F4F7A3BB5E0FDCD10D7B10@AM0PR0502MB3826.eurprd05.prod.outlook.com>
  2020-05-27 21:38 ` Link down reasons Andrew Lunn
@ 2020-05-28  8:40 ` Oleksij Rempel
  2020-05-28  9:22   ` Petr Machata
  1 sibling, 1 reply; 11+ messages in thread
From: Oleksij Rempel @ 2020-05-28  8:40 UTC (permalink / raw)
  To: Amit Cohen; +Cc: andrew, mlxsw, netdev

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

On Wed, May 27, 2020 at 03:41:22PM +0000, Amit Cohen wrote:
> Hi Andrew,
> We are planning to send a set that exposes link-down reason in ethtool.
> It seems that the ability of your set "Ethernet cable test support" can be integrated with link-down reason.
> 
> The idea is to expose reason and subreason (if there is):
> $ ethtool ethX
> ...
> Link detected: no (No cable) // No sub reason
> 
> $ ethtool ethY
> Link detected: no (Autoneg failure, No partner detected)
> 
> Currently we have reason "cable issue" and subreasons "unsupported cable" and "shorted cable".
> The mechanism of cable test can be integrated and allow us report "cable issue" reason and "shorted cable" subreason.
> 
> However, the way the kernel reports the results of the cable test (cable-test, some seconds, get-result) may be problematic with link-down reason concept.
> We can ignore this information when reporting link-down reason, or report only if testing cable have a result,
> or maybe start testing cable and meanwhile report something like "discovery in progress". But all the options are not perfect.
> 
> We would like to know your opinion about it.

I would add some more reasons:
- master slave resolution issues: both link partners are master or
  slave.
- signal quality drop. In this case driver should be extended to notify
  the system if SQI is under some configurable limit.


Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
  2020-05-28  5:56   ` Amit Cohen
@ 2020-05-28  9:12     ` Petr Machata
  2020-05-28 15:40       ` Andrew Lunn
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Machata @ 2020-05-28  9:12 UTC (permalink / raw)
  To: Amit Cohen; +Cc: Andrew Lunn, mlxsw, netdev, o.rempel


Amit Cohen <amitc@mellanox.com> writes:

> Andrew Lunn <andrew@lunn.ch> writes:
>
>>On Wed, May 27, 2020 at 03:41:22PM +0000, Amit Cohen wrote:
>>> Hi Andrew,
>>>
>>> We are planning to send a set that exposes link-down reason in ethtool.
>>>
>>> It seems that the ability of your set “Ethernet cable test support”
>>> can be integrated with link-down reason.
>>>
>>>
>>>
>>> The idea is to expose reason and subreason (if there is):
>>>
>>> $ ethtool ethX
>>>
>>> …
>>>
>>> Link detected: no (No cable) // No sub reason
>>>
>>>
>>>
>>> $ ethtool ethY
>>>
>>> Link detected: no (Autoneg failure, No partner detected)
>>>
>>>
>>>
>>> Currently we have reason “cable issue” and subreasons “unsupported
>>> cable” and “shorted cable”.
>>>
>>> The mechanism of cable test can be integrated and allow us report “cable issue”
>>> reason and “shorted cable” subreason.
>>
>>I don't really see them being combinable. First off, your API seems
>>too limiting. How do you say which pair is broken, or at what
>>distance? What about open cable, as opposed to shorted cable?

Under the proposed API, open cable and shorted cable would be two
different subreasons of "cable issue" reason.

>>So i would suggest:
>>
>>Link detected: no (cable issue)
>>
>>And then recommend the user uses ethtool --cable-test to get all the
>>details, and you have a much more flexible API to provide as much or
>>as little information as you have.

Yeah, the API that we are thinking of makes this possible.

Andrew, pardon my ignorance in these matters, can a PHY driver in
general determine that the issue is with the cable, even without running
the fairly expensive cable test? I.e. is it reasonable to expect that
the driver can tell us, yeah, the cable is baked (=> cable issue), but
please run the cable test to learn the details?

Ideally we would just use the cable test when trying to determine the
link-down reason, but the peculiar reporting mechanism and the fact that
it is a lengthy operation prevent this.

> Link-down reason has to consider cable-test or not? In order to report
> "cable issue", we assume that the driver implemented link-down reason
> in addition to cable-test?

Note that the "driver" here can refer both to a MAC driver as well as a
PHY driver. Ethtool can in principle do a cascade of callbacks to
different subsystems when trying to figure out why the link is down.

> I'm asking about PHY driver for example that implemented cable-test
> and not link-down reason, so according to cable-test we should report
> "cable issue" as a link-down reason or do not expose reason here?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
  2020-05-28  8:40 ` Oleksij Rempel
@ 2020-05-28  9:22   ` Petr Machata
  2020-05-28 10:28     ` Oleksij Rempel
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Machata @ 2020-05-28  9:22 UTC (permalink / raw)
  To: Oleksij Rempel; +Cc: Amit Cohen, andrew, mlxsw, netdev


Oleksij Rempel <o.rempel@pengutronix.de> writes:

> I would add some more reasons:
> - master slave resolution issues: both link partners are master or
>   slave.

I guess we should send the RFC, so that we can talk particulars. We
currently don't have anything like master/slave mismatch in the API, but
that's just because our FW does not report this. The idea is that if MAC
and/or PHY driver can't express some fail using the existing codes, it
creates a new one.

> - signal quality drop. In this case driver should be extended to notify
>   the system if SQI is under some configurable limit.

As SQI goes down, will the PHY driver eventually shut down the port?

Because if yes, that's exactly the situation when it would later report,
yeah, the link is down because SQI was rubbish. In the proposed API, we
would model this as "signal integrity issue", with a possible subreason
of "low SQI", or something along those lines.

E.g., mlxsw can report module temperatures. But whether the port goes
down is a separate mechanism. So when a port is down, the driver can
tell you, yeah, it is down, because it was overheated. And separate from
that you can check the module temperatures. SQI might be a similar
issue.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
  2020-05-28  9:22   ` Petr Machata
@ 2020-05-28 10:28     ` Oleksij Rempel
  0 siblings, 0 replies; 11+ messages in thread
From: Oleksij Rempel @ 2020-05-28 10:28 UTC (permalink / raw)
  To: Petr Machata; +Cc: Amit Cohen, andrew, mlxsw, netdev

On Thu, May 28, 2020 at 11:22:47AM +0200, Petr Machata wrote:
> 
> Oleksij Rempel <o.rempel@pengutronix.de> writes:
> 
> > I would add some more reasons:
> > - master slave resolution issues: both link partners are master or
> >   slave.
> 
> I guess we should send the RFC, so that we can talk particulars. We
> currently don't have anything like master/slave mismatch in the API, but
> that's just because our FW does not report this. The idea is that if MAC
> and/or PHY driver can't express some fail using the existing codes, it
> creates a new one.

ok

> > - signal quality drop. In this case driver should be extended to notify
> >   the system if SQI is under some configurable limit.
> 
> As SQI goes down, will the PHY driver eventually shut down the port?

Not in current implementation. But it is possible at least with
nxp_tja11xx PHY.

> Because if yes, that's exactly the situation when it would later report,
> yeah, the link is down because SQI was rubbish. In the proposed API, we
> would model this as "signal integrity issue", with a possible subreason
> of "low SQI", or something along those lines.
> 
> E.g., mlxsw can report module temperatures. But whether the port goes
> down is a separate mechanism. So when a port is down, the driver can
> tell you, yeah, it is down, because it was overheated. And separate from
> that you can check the module temperatures. SQI might be a similar
> issue.

nxp_tja11xx can go down or can't go up on under vlotage or over temerature
error. So, I assume this are two more possible link-down reasons.


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
  2020-05-28  9:12     ` Petr Machata
@ 2020-05-28 15:40       ` Andrew Lunn
  2020-05-28 16:54         ` Petr Machata
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2020-05-28 15:40 UTC (permalink / raw)
  To: Petr Machata; +Cc: Amit Cohen, mlxsw, netdev, o.rempel

> Andrew, pardon my ignorance in these matters, can a PHY driver in
> general determine that the issue is with the cable, even without running
> the fairly expensive cable test?

No. To diagnose a problem, you need the link to be idle. If the link
peer is sending frames, they interfere with TDR. So all the cable
testing i've seen first manipulates the auto-negotiation to make the
link peer go quiet. That takes 1 1/2 seconds. There are some
optimizations possible, e.g. if the cable is so broken it never
establishes link, you can skip this. But Ethernet tends to be robust,
it drops back to 100Mbps only using two pairs if one of the four pairs
is broken, for example.

   Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
  2020-05-28 15:40       ` Andrew Lunn
@ 2020-05-28 16:54         ` Petr Machata
  2020-05-28 17:17           ` Michal Kubecek
  2020-05-28 18:37           ` Andrew Lunn
  0 siblings, 2 replies; 11+ messages in thread
From: Petr Machata @ 2020-05-28 16:54 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Amit Cohen, mlxsw, netdev, o.rempel


Andrew Lunn <andrew@lunn.ch> writes:

>> Andrew, pardon my ignorance in these matters, can a PHY driver in
>> general determine that the issue is with the cable, even without running
>> the fairly expensive cable test?
>
> No. To diagnose a problem, you need the link to be idle. If the link
> peer is sending frames, they interfere with TDR. So all the cable
> testing i've seen first manipulates the auto-negotiation to make the
> link peer go quiet. That takes 1 1/2 seconds. There are some
> optimizations possible, e.g. if the cable is so broken it never
> establishes link, you can skip this. But Ethernet tends to be robust,
> it drops back to 100Mbps only using two pairs if one of the four pairs
> is broken, for example.

OK, thanks. I suspect our FW is doing this behind the scenes, because it
can report a shorted cable.

In another e-mail you suggested this:

    Link detected: no (cable issue)

But if the link just silently falls back to 100Mbps, there would never
be an opportunity for phy to actually report a down reason. So there
probably is no way for the phy layer to make use of this particular
down reason.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
  2020-05-28 16:54         ` Petr Machata
@ 2020-05-28 17:17           ` Michal Kubecek
  2020-05-28 18:37           ` Andrew Lunn
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Kubecek @ 2020-05-28 17:17 UTC (permalink / raw)
  To: netdev; +Cc: Petr Machata, Andrew Lunn, Amit Cohen, mlxsw, o.rempel

On Thu, May 28, 2020 at 06:54:24PM +0200, Petr Machata wrote:
> In another e-mail you suggested this:
> 
>     Link detected: no (cable issue)
> 
> But if the link just silently falls back to 100Mbps, there would never
> be an opportunity for phy to actually report a down reason. So there
> probably is no way for the phy layer to make use of this particular
> down reason.

Perhaps we could use more general name than "link down reason", e.g.
"extended state", and it could be reported even if the link is still up
(if there is something to report).

Michal

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
  2020-05-28 16:54         ` Petr Machata
  2020-05-28 17:17           ` Michal Kubecek
@ 2020-05-28 18:37           ` Andrew Lunn
  2020-05-29  9:03             ` Petr Machata
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2020-05-28 18:37 UTC (permalink / raw)
  To: Petr Machata; +Cc: Amit Cohen, mlxsw, netdev, o.rempel

On Thu, May 28, 2020 at 06:54:24PM +0200, Petr Machata wrote:
> 
> Andrew Lunn <andrew@lunn.ch> writes:
> 
> >> Andrew, pardon my ignorance in these matters, can a PHY driver in
> >> general determine that the issue is with the cable, even without running
> >> the fairly expensive cable test?
> >
> > No. To diagnose a problem, you need the link to be idle. If the link
> > peer is sending frames, they interfere with TDR. So all the cable
> > testing i've seen first manipulates the auto-negotiation to make the
> > link peer go quiet. That takes 1 1/2 seconds. There are some
> > optimizations possible, e.g. if the cable is so broken it never
> > establishes link, you can skip this. But Ethernet tends to be robust,
> > it drops back to 100Mbps only using two pairs if one of the four pairs
> > is broken, for example.
> 
> OK, thanks. I suspect our FW is doing this behind the scenes, because it
> can report a shorted cable.
> 
> In another e-mail you suggested this:
> 
>     Link detected: no (cable issue)
> 
> But if the link just silently falls back to 100Mbps, there would never
> be an opportunity for phy to actually report a down reason. So there
> probably is no way for the phy layer to make use of this particular
> down reason.

It is called downshift. And we have support for it in the phylib core,
if the PHY has the needed vendor register.

https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/net/phy/phy-core.c#L341
https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/net/phy/phy.c#L95

There are also standard phylib/ethtool ways to configure it, how many
times the PHY should try to establish a 1G link before downshifting to
100M.

So in theory we could report:

Link detected: yes (downshifted)

Assuming your proposed API support a reason why it is up, not just a
reason why it is down?

	Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Link down reasons
  2020-05-28 18:37           ` Andrew Lunn
@ 2020-05-29  9:03             ` Petr Machata
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Machata @ 2020-05-29  9:03 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Amit Cohen, mlxsw, netdev, o.rempel, Michal Kubecek


Andrew Lunn <andrew@lunn.ch> writes:

> It is called downshift. And we have support for it in the phylib core,
> if the PHY has the needed vendor register.
>
> https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/net/phy/phy-core.c#L341
> https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/net/phy/phy.c#L95

Thanks for the references.

> So in theory we could report:
>
> Link detected: yes (downshifted)
>
> Assuming your proposed API support a reason why it is up, not just a
> reason why it is down?

Michal Kubecek <mkubecek@suse.cz> writes:

> Perhaps we could use more general name than "link down reason", e.g.
> "extended state", and it could be reported even if the link is still up

All right, that makes sense to me. Let's make it extended state.

Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-05-29  9:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <AM0PR0502MB38261D4F4F7A3BB5E0FDCD10D7B10@AM0PR0502MB3826.eurprd05.prod.outlook.com>
2020-05-27 21:38 ` Link down reasons Andrew Lunn
2020-05-28  5:56   ` Amit Cohen
2020-05-28  9:12     ` Petr Machata
2020-05-28 15:40       ` Andrew Lunn
2020-05-28 16:54         ` Petr Machata
2020-05-28 17:17           ` Michal Kubecek
2020-05-28 18:37           ` Andrew Lunn
2020-05-29  9:03             ` Petr Machata
2020-05-28  8:40 ` Oleksij Rempel
2020-05-28  9:22   ` Petr Machata
2020-05-28 10:28     ` Oleksij Rempel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.