* Re: Link down reasons [not found] <AM0PR0502MB38261D4F4F7A3BB5E0FDCD10D7B10@AM0PR0502MB3826.eurprd05.prod.outlook.com> @ 2020-05-27 21:38 ` Andrew Lunn 2020-05-28 5:56 ` Amit Cohen 2020-05-28 8:40 ` Oleksij Rempel 1 sibling, 1 reply; 11+ messages in thread From: Andrew Lunn @ 2020-05-27 21:38 UTC (permalink / raw) To: Amit Cohen; +Cc: mlxsw, netdev, o.rempel On Wed, May 27, 2020 at 03:41:22PM +0000, Amit Cohen wrote: > Hi Andrew, > > We are planning to send a set that exposes link-down reason in ethtool. > > It seems that the ability of your set “Ethernet cable test support” can be > integrated with link-down reason. > > > > The idea is to expose reason and subreason (if there is): > > $ ethtool ethX > > … > > Link detected: no (No cable) // No sub reason > > > > $ ethtool ethY > > Link detected: no (Autoneg failure, No partner detected) > > > > Currently we have reason “cable issue” and subreasons “unsupported cable” and > “shorted cable”. > > The mechanism of cable test can be integrated and allow us report “cable issue” > reason and “shorted cable” subreason. Hi Amit I don't really see them being combinable. First off, your API seems too limiting. How do you say which pair is broken, or at what distance? What about open cable, as opposed to shorted cable? So i would suggest: Link detected: no (cable issue) And then recommend the user uses ethtool --cable-test to get all the details, and you have a much more flexible API to provide as much or as little information as you have. Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Link down reasons 2020-05-27 21:38 ` Link down reasons Andrew Lunn @ 2020-05-28 5:56 ` Amit Cohen 2020-05-28 9:12 ` Petr Machata 0 siblings, 1 reply; 11+ messages in thread From: Amit Cohen @ 2020-05-28 5:56 UTC (permalink / raw) To: Andrew Lunn; +Cc: mlxsw, netdev, o.rempel Andrew Lunn <andrew@lunn.ch> writes: >On Wed, May 27, 2020 at 03:41:22PM +0000, Amit Cohen wrote: >> Hi Andrew, >> >> We are planning to send a set that exposes link-down reason in ethtool. >> >> It seems that the ability of your set “Ethernet cable test support” >> can be integrated with link-down reason. >> >> >> >> The idea is to expose reason and subreason (if there is): >> >> $ ethtool ethX >> >> … >> >> Link detected: no (No cable) // No sub reason >> >> >> >> $ ethtool ethY >> >> Link detected: no (Autoneg failure, No partner detected) >> >> >> >> Currently we have reason “cable issue” and subreasons “unsupported >> cable” and “shorted cable”. >> >> The mechanism of cable test can be integrated and allow us report “cable issue” >> reason and “shorted cable” subreason. > >Hi Amit > >I don't really see them being combinable. First off, your API seems too limiting. How do you say which pair is broken, or at what distance? What about open cable, as opposed to shorted cable? > >So i would suggest: > >Link detected: no (cable issue) > >And then recommend the user uses ethtool --cable-test to get all the details, and you have a much more flexible API to provide as much or as little information as you have. > > Andrew Thanks! Link-down reason has to consider cable-test or not? In order to report "cable issue", we assume that the driver implemented link-down reason in addition to cable-test? I'm asking about PHY driver for example that implemented cable-test and not link-down reason, so according to cable-test we should report "cable issue" as a link-down reason or do not expose reason here? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons 2020-05-28 5:56 ` Amit Cohen @ 2020-05-28 9:12 ` Petr Machata 2020-05-28 15:40 ` Andrew Lunn 0 siblings, 1 reply; 11+ messages in thread From: Petr Machata @ 2020-05-28 9:12 UTC (permalink / raw) To: Amit Cohen; +Cc: Andrew Lunn, mlxsw, netdev, o.rempel Amit Cohen <amitc@mellanox.com> writes: > Andrew Lunn <andrew@lunn.ch> writes: > >>On Wed, May 27, 2020 at 03:41:22PM +0000, Amit Cohen wrote: >>> Hi Andrew, >>> >>> We are planning to send a set that exposes link-down reason in ethtool. >>> >>> It seems that the ability of your set “Ethernet cable test support” >>> can be integrated with link-down reason. >>> >>> >>> >>> The idea is to expose reason and subreason (if there is): >>> >>> $ ethtool ethX >>> >>> … >>> >>> Link detected: no (No cable) // No sub reason >>> >>> >>> >>> $ ethtool ethY >>> >>> Link detected: no (Autoneg failure, No partner detected) >>> >>> >>> >>> Currently we have reason “cable issue” and subreasons “unsupported >>> cable” and “shorted cable”. >>> >>> The mechanism of cable test can be integrated and allow us report “cable issue” >>> reason and “shorted cable” subreason. >> >>I don't really see them being combinable. First off, your API seems >>too limiting. How do you say which pair is broken, or at what >>distance? What about open cable, as opposed to shorted cable? Under the proposed API, open cable and shorted cable would be two different subreasons of "cable issue" reason. >>So i would suggest: >> >>Link detected: no (cable issue) >> >>And then recommend the user uses ethtool --cable-test to get all the >>details, and you have a much more flexible API to provide as much or >>as little information as you have. Yeah, the API that we are thinking of makes this possible. Andrew, pardon my ignorance in these matters, can a PHY driver in general determine that the issue is with the cable, even without running the fairly expensive cable test? I.e. is it reasonable to expect that the driver can tell us, yeah, the cable is baked (=> cable issue), but please run the cable test to learn the details? Ideally we would just use the cable test when trying to determine the link-down reason, but the peculiar reporting mechanism and the fact that it is a lengthy operation prevent this. > Link-down reason has to consider cable-test or not? In order to report > "cable issue", we assume that the driver implemented link-down reason > in addition to cable-test? Note that the "driver" here can refer both to a MAC driver as well as a PHY driver. Ethtool can in principle do a cascade of callbacks to different subsystems when trying to figure out why the link is down. > I'm asking about PHY driver for example that implemented cable-test > and not link-down reason, so according to cable-test we should report > "cable issue" as a link-down reason or do not expose reason here? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons 2020-05-28 9:12 ` Petr Machata @ 2020-05-28 15:40 ` Andrew Lunn 2020-05-28 16:54 ` Petr Machata 0 siblings, 1 reply; 11+ messages in thread From: Andrew Lunn @ 2020-05-28 15:40 UTC (permalink / raw) To: Petr Machata; +Cc: Amit Cohen, mlxsw, netdev, o.rempel > Andrew, pardon my ignorance in these matters, can a PHY driver in > general determine that the issue is with the cable, even without running > the fairly expensive cable test? No. To diagnose a problem, you need the link to be idle. If the link peer is sending frames, they interfere with TDR. So all the cable testing i've seen first manipulates the auto-negotiation to make the link peer go quiet. That takes 1 1/2 seconds. There are some optimizations possible, e.g. if the cable is so broken it never establishes link, you can skip this. But Ethernet tends to be robust, it drops back to 100Mbps only using two pairs if one of the four pairs is broken, for example. Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons 2020-05-28 15:40 ` Andrew Lunn @ 2020-05-28 16:54 ` Petr Machata 2020-05-28 17:17 ` Michal Kubecek 2020-05-28 18:37 ` Andrew Lunn 0 siblings, 2 replies; 11+ messages in thread From: Petr Machata @ 2020-05-28 16:54 UTC (permalink / raw) To: Andrew Lunn; +Cc: Amit Cohen, mlxsw, netdev, o.rempel Andrew Lunn <andrew@lunn.ch> writes: >> Andrew, pardon my ignorance in these matters, can a PHY driver in >> general determine that the issue is with the cable, even without running >> the fairly expensive cable test? > > No. To diagnose a problem, you need the link to be idle. If the link > peer is sending frames, they interfere with TDR. So all the cable > testing i've seen first manipulates the auto-negotiation to make the > link peer go quiet. That takes 1 1/2 seconds. There are some > optimizations possible, e.g. if the cable is so broken it never > establishes link, you can skip this. But Ethernet tends to be robust, > it drops back to 100Mbps only using two pairs if one of the four pairs > is broken, for example. OK, thanks. I suspect our FW is doing this behind the scenes, because it can report a shorted cable. In another e-mail you suggested this: Link detected: no (cable issue) But if the link just silently falls back to 100Mbps, there would never be an opportunity for phy to actually report a down reason. So there probably is no way for the phy layer to make use of this particular down reason. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons 2020-05-28 16:54 ` Petr Machata @ 2020-05-28 17:17 ` Michal Kubecek 2020-05-28 18:37 ` Andrew Lunn 1 sibling, 0 replies; 11+ messages in thread From: Michal Kubecek @ 2020-05-28 17:17 UTC (permalink / raw) To: netdev; +Cc: Petr Machata, Andrew Lunn, Amit Cohen, mlxsw, o.rempel On Thu, May 28, 2020 at 06:54:24PM +0200, Petr Machata wrote: > In another e-mail you suggested this: > > Link detected: no (cable issue) > > But if the link just silently falls back to 100Mbps, there would never > be an opportunity for phy to actually report a down reason. So there > probably is no way for the phy layer to make use of this particular > down reason. Perhaps we could use more general name than "link down reason", e.g. "extended state", and it could be reported even if the link is still up (if there is something to report). Michal ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons 2020-05-28 16:54 ` Petr Machata 2020-05-28 17:17 ` Michal Kubecek @ 2020-05-28 18:37 ` Andrew Lunn 2020-05-29 9:03 ` Petr Machata 1 sibling, 1 reply; 11+ messages in thread From: Andrew Lunn @ 2020-05-28 18:37 UTC (permalink / raw) To: Petr Machata; +Cc: Amit Cohen, mlxsw, netdev, o.rempel On Thu, May 28, 2020 at 06:54:24PM +0200, Petr Machata wrote: > > Andrew Lunn <andrew@lunn.ch> writes: > > >> Andrew, pardon my ignorance in these matters, can a PHY driver in > >> general determine that the issue is with the cable, even without running > >> the fairly expensive cable test? > > > > No. To diagnose a problem, you need the link to be idle. If the link > > peer is sending frames, they interfere with TDR. So all the cable > > testing i've seen first manipulates the auto-negotiation to make the > > link peer go quiet. That takes 1 1/2 seconds. There are some > > optimizations possible, e.g. if the cable is so broken it never > > establishes link, you can skip this. But Ethernet tends to be robust, > > it drops back to 100Mbps only using two pairs if one of the four pairs > > is broken, for example. > > OK, thanks. I suspect our FW is doing this behind the scenes, because it > can report a shorted cable. > > In another e-mail you suggested this: > > Link detected: no (cable issue) > > But if the link just silently falls back to 100Mbps, there would never > be an opportunity for phy to actually report a down reason. So there > probably is no way for the phy layer to make use of this particular > down reason. It is called downshift. And we have support for it in the phylib core, if the PHY has the needed vendor register. https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/net/phy/phy-core.c#L341 https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/net/phy/phy.c#L95 There are also standard phylib/ethtool ways to configure it, how many times the PHY should try to establish a 1G link before downshifting to 100M. So in theory we could report: Link detected: yes (downshifted) Assuming your proposed API support a reason why it is up, not just a reason why it is down? Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons 2020-05-28 18:37 ` Andrew Lunn @ 2020-05-29 9:03 ` Petr Machata 0 siblings, 0 replies; 11+ messages in thread From: Petr Machata @ 2020-05-29 9:03 UTC (permalink / raw) To: Andrew Lunn; +Cc: Amit Cohen, mlxsw, netdev, o.rempel, Michal Kubecek Andrew Lunn <andrew@lunn.ch> writes: > It is called downshift. And we have support for it in the phylib core, > if the PHY has the needed vendor register. > > https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/net/phy/phy-core.c#L341 > https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/net/phy/phy.c#L95 Thanks for the references. > So in theory we could report: > > Link detected: yes (downshifted) > > Assuming your proposed API support a reason why it is up, not just a > reason why it is down? Michal Kubecek <mkubecek@suse.cz> writes: > Perhaps we could use more general name than "link down reason", e.g. > "extended state", and it could be reported even if the link is still up All right, that makes sense to me. Let's make it extended state. Thanks! ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons [not found] <AM0PR0502MB38261D4F4F7A3BB5E0FDCD10D7B10@AM0PR0502MB3826.eurprd05.prod.outlook.com> 2020-05-27 21:38 ` Link down reasons Andrew Lunn @ 2020-05-28 8:40 ` Oleksij Rempel 2020-05-28 9:22 ` Petr Machata 1 sibling, 1 reply; 11+ messages in thread From: Oleksij Rempel @ 2020-05-28 8:40 UTC (permalink / raw) To: Amit Cohen; +Cc: andrew, mlxsw, netdev [-- Attachment #1: Type: text/plain, Size: 1736 bytes --] On Wed, May 27, 2020 at 03:41:22PM +0000, Amit Cohen wrote: > Hi Andrew, > We are planning to send a set that exposes link-down reason in ethtool. > It seems that the ability of your set "Ethernet cable test support" can be integrated with link-down reason. > > The idea is to expose reason and subreason (if there is): > $ ethtool ethX > ... > Link detected: no (No cable) // No sub reason > > $ ethtool ethY > Link detected: no (Autoneg failure, No partner detected) > > Currently we have reason "cable issue" and subreasons "unsupported cable" and "shorted cable". > The mechanism of cable test can be integrated and allow us report "cable issue" reason and "shorted cable" subreason. > > However, the way the kernel reports the results of the cable test (cable-test, some seconds, get-result) may be problematic with link-down reason concept. > We can ignore this information when reporting link-down reason, or report only if testing cable have a result, > or maybe start testing cable and meanwhile report something like "discovery in progress". But all the options are not perfect. > > We would like to know your opinion about it. I would add some more reasons: - master slave resolution issues: both link partners are master or slave. - signal quality drop. In this case driver should be extended to notify the system if SQI is under some configurable limit. Regards, Oleksij -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons 2020-05-28 8:40 ` Oleksij Rempel @ 2020-05-28 9:22 ` Petr Machata 2020-05-28 10:28 ` Oleksij Rempel 0 siblings, 1 reply; 11+ messages in thread From: Petr Machata @ 2020-05-28 9:22 UTC (permalink / raw) To: Oleksij Rempel; +Cc: Amit Cohen, andrew, mlxsw, netdev Oleksij Rempel <o.rempel@pengutronix.de> writes: > I would add some more reasons: > - master slave resolution issues: both link partners are master or > slave. I guess we should send the RFC, so that we can talk particulars. We currently don't have anything like master/slave mismatch in the API, but that's just because our FW does not report this. The idea is that if MAC and/or PHY driver can't express some fail using the existing codes, it creates a new one. > - signal quality drop. In this case driver should be extended to notify > the system if SQI is under some configurable limit. As SQI goes down, will the PHY driver eventually shut down the port? Because if yes, that's exactly the situation when it would later report, yeah, the link is down because SQI was rubbish. In the proposed API, we would model this as "signal integrity issue", with a possible subreason of "low SQI", or something along those lines. E.g., mlxsw can report module temperatures. But whether the port goes down is a separate mechanism. So when a port is down, the driver can tell you, yeah, it is down, because it was overheated. And separate from that you can check the module temperatures. SQI might be a similar issue. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Link down reasons 2020-05-28 9:22 ` Petr Machata @ 2020-05-28 10:28 ` Oleksij Rempel 0 siblings, 0 replies; 11+ messages in thread From: Oleksij Rempel @ 2020-05-28 10:28 UTC (permalink / raw) To: Petr Machata; +Cc: Amit Cohen, andrew, mlxsw, netdev On Thu, May 28, 2020 at 11:22:47AM +0200, Petr Machata wrote: > > Oleksij Rempel <o.rempel@pengutronix.de> writes: > > > I would add some more reasons: > > - master slave resolution issues: both link partners are master or > > slave. > > I guess we should send the RFC, so that we can talk particulars. We > currently don't have anything like master/slave mismatch in the API, but > that's just because our FW does not report this. The idea is that if MAC > and/or PHY driver can't express some fail using the existing codes, it > creates a new one. ok > > - signal quality drop. In this case driver should be extended to notify > > the system if SQI is under some configurable limit. > > As SQI goes down, will the PHY driver eventually shut down the port? Not in current implementation. But it is possible at least with nxp_tja11xx PHY. > Because if yes, that's exactly the situation when it would later report, > yeah, the link is down because SQI was rubbish. In the proposed API, we > would model this as "signal integrity issue", with a possible subreason > of "low SQI", or something along those lines. > > E.g., mlxsw can report module temperatures. But whether the port goes > down is a separate mechanism. So when a port is down, the driver can > tell you, yeah, it is down, because it was overheated. And separate from > that you can check the module temperatures. SQI might be a similar > issue. nxp_tja11xx can go down or can't go up on under vlotage or over temerature error. So, I assume this are two more possible link-down reasons. -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-05-29 9:03 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <AM0PR0502MB38261D4F4F7A3BB5E0FDCD10D7B10@AM0PR0502MB3826.eurprd05.prod.outlook.com> 2020-05-27 21:38 ` Link down reasons Andrew Lunn 2020-05-28 5:56 ` Amit Cohen 2020-05-28 9:12 ` Petr Machata 2020-05-28 15:40 ` Andrew Lunn 2020-05-28 16:54 ` Petr Machata 2020-05-28 17:17 ` Michal Kubecek 2020-05-28 18:37 ` Andrew Lunn 2020-05-29 9:03 ` Petr Machata 2020-05-28 8:40 ` Oleksij Rempel 2020-05-28 9:22 ` Petr Machata 2020-05-28 10:28 ` Oleksij Rempel
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.