netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* realtek PHY commit bbc4d71d63549 causes regression
@ 2020-10-17 14:20 Ard Biesheuvel
  2020-10-17 14:44 ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-17 14:20 UTC (permalink / raw)
  To: open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Ilias Apalodimas, Willy Liu, David S. Miller, Jakub Kicinski,
	Sasha Levin, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Masahisa Kojima

Hello all,

I just upgraded my arm64 SynQuacer box to 5.8.16 and lost all network
connectivity. This box has a on-SoC socionext 'netsec' network
controller wired to a Realtek 80211e PHY, and this was working without
problems until the following commit was merged

commit bbc4d71d63549bcd003a430de18a72a742d8c91e
Author: Willy Liu <willy.liu@realtek.com>
Date:   Tue Sep 29 10:10:49 2020 +0800

net: phy: realtek: fix rtl8211e rx/tx delay config

It was backported to v5.8 as b9f0dcfbfc07719be7cc732cda4e609280704605

The commit log in question is a bit vague, as it does not explain what
is wrong with the commit it aims to fix, or what the changes are meant
to achieve.

Since it breaks h/w in the field, can we please revert it? (and
backport the revert?)

Regards,
Ard.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 14:20 realtek PHY commit bbc4d71d63549 causes regression Ard Biesheuvel
@ 2020-10-17 14:44 ` Andrew Lunn
  2020-10-17 14:46   ` Ard Biesheuvel
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-17 14:44 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Ilias Apalodimas, Willy Liu, David S. Miller, Jakub Kicinski,
	Sasha Levin, Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, Oct 17, 2020 at 04:20:36PM +0200, Ard Biesheuvel wrote:
> Hello all,
> 
> I just upgraded my arm64 SynQuacer box to 5.8.16 and lost all network
> connectivity.

Hi Ard

Please could you point me at the DT files.

> This box has a on-SoC socionext 'netsec' network controller wired to
> a Realtek 80211e PHY, and this was working without problems until
> the following commit was merged

It could be this fix has uncovered a bug in the DT file. Before this
fix, if there is an phy-mode property in DT, it could of been ignored.
Now the phy-handle property is correctly implemented. So it could be
the DT has the wrong value, e.g. it has rgmii-rxid when maybe it
should have rgmii-id.

       Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 14:44 ` Andrew Lunn
@ 2020-10-17 14:46   ` Ard Biesheuvel
  2020-10-17 15:11     ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-17 14:46 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Ilias Apalodimas, Willy Liu, David S. Miller, Jakub Kicinski,
	Sasha Levin, Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, 17 Oct 2020 at 16:44, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sat, Oct 17, 2020 at 04:20:36PM +0200, Ard Biesheuvel wrote:
> > Hello all,
> >
> > I just upgraded my arm64 SynQuacer box to 5.8.16 and lost all network
> > connectivity.
>
> Hi Ard
>
> Please could you point me at the DT files.
>
> > This box has a on-SoC socionext 'netsec' network controller wired to
> > a Realtek 80211e PHY, and this was working without problems until
> > the following commit was merged
>
> It could be this fix has uncovered a bug in the DT file. Before this
> fix, if there is an phy-mode property in DT, it could of been ignored.
> Now the phy-handle property is correctly implemented. So it could be
> the DT has the wrong value, e.g. it has rgmii-rxid when maybe it
> should have rgmii-id.
>

This is an ACPI system. The phy-mode device property is set to 'rgmii'

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 14:46   ` Ard Biesheuvel
@ 2020-10-17 15:11     ` Andrew Lunn
  2020-10-17 15:18       ` Ard Biesheuvel
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-17 15:11 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Ilias Apalodimas, Willy Liu, David S. Miller, Jakub Kicinski,
	Sasha Levin, Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, Oct 17, 2020 at 04:46:23PM +0200, Ard Biesheuvel wrote:
> On Sat, 17 Oct 2020 at 16:44, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Sat, Oct 17, 2020 at 04:20:36PM +0200, Ard Biesheuvel wrote:
> > > Hello all,
> > >
> > > I just upgraded my arm64 SynQuacer box to 5.8.16 and lost all network
> > > connectivity.
> >
> > Hi Ard
> >
> > Please could you point me at the DT files.
> >
> > > This box has a on-SoC socionext 'netsec' network controller wired to
> > > a Realtek 80211e PHY, and this was working without problems until
> > > the following commit was merged
> >
> > It could be this fix has uncovered a bug in the DT file. Before this
> > fix, if there is an phy-mode property in DT, it could of been ignored.
> > Now the phy-handle property is correctly implemented. So it could be
> > the DT has the wrong value, e.g. it has rgmii-rxid when maybe it
> > should have rgmii-id.
> >
> 
> This is an ACPI system. The phy-mode device property is set to 'rgmii'

Hi Ard

Please try rgmii-id.

Also, do you have the schematic? Can you see if there are any
strapping resistors? It could be, there are strapping resistors to put
it into rgmii-id. Now that the phy-mode properties is respected, the
reset defaults are being over-written to rgmii, which breaks the link.
Or the bootloader has already set the PHY mode to rgmii-id.

You can also use '' as the phy-mode, which results in
PHY_INTERFACE_MODE_NA, which effectively means, don't touch the PHY
mode, something else has already set it up. This might actually be the
correct way to go for ACPI. In the DT world, we tend to assume the
bootloader has done the absolute minimum and Linux should configure
everything. The ACPI takes the opposite view, the firmware will do the
basic hardware configuration, and Linux should not touch it, or ask
ACPI to modify it.

      Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 15:11     ` Andrew Lunn
@ 2020-10-17 15:18       ` Ard Biesheuvel
  2020-10-17 16:14         ` Ilias Apalodimas
  0 siblings, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-17 15:18 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Ilias Apalodimas, Willy Liu, David S. Miller, Jakub Kicinski,
	Sasha Levin, Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, 17 Oct 2020 at 17:11, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sat, Oct 17, 2020 at 04:46:23PM +0200, Ard Biesheuvel wrote:
> > On Sat, 17 Oct 2020 at 16:44, Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > On Sat, Oct 17, 2020 at 04:20:36PM +0200, Ard Biesheuvel wrote:
> > > > Hello all,
> > > >
> > > > I just upgraded my arm64 SynQuacer box to 5.8.16 and lost all network
> > > > connectivity.
> > >
> > > Hi Ard
> > >
> > > Please could you point me at the DT files.
> > >
> > > > This box has a on-SoC socionext 'netsec' network controller wired to
> > > > a Realtek 80211e PHY, and this was working without problems until
> > > > the following commit was merged
> > >
> > > It could be this fix has uncovered a bug in the DT file. Before this
> > > fix, if there is an phy-mode property in DT, it could of been ignored.
> > > Now the phy-handle property is correctly implemented. So it could be
> > > the DT has the wrong value, e.g. it has rgmii-rxid when maybe it
> > > should have rgmii-id.
> > >
> >
> > This is an ACPI system. The phy-mode device property is set to 'rgmii'
>
> Hi Ard
>
> Please try rgmii-id.
>
> Also, do you have the schematic? Can you see if there are any
> strapping resistors? It could be, there are strapping resistors to put
> it into rgmii-id. Now that the phy-mode properties is respected, the
> reset defaults are being over-written to rgmii, which breaks the link.
> Or the bootloader has already set the PHY mode to rgmii-id.
>
> You can also use '' as the phy-mode, which results in
> PHY_INTERFACE_MODE_NA, which effectively means, don't touch the PHY
> mode, something else has already set it up. This might actually be the
> correct way to go for ACPI. In the DT world, we tend to assume the
> bootloader has done the absolute minimum and Linux should configure
> everything. The ACPI takes the opposite view, the firmware will do the
> basic hardware configuration, and Linux should not touch it, or ask
> ACPI to modify it.
>

Indeed, the firmware should have set this up. This would mean we could
do this in the driver: it currently uses

priv->phy_interface = device_get_phy_mode(&pdev->dev);

Can we just assign that to PHY_INTERFACE_MODE_NA instead?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 15:18       ` Ard Biesheuvel
@ 2020-10-17 16:14         ` Ilias Apalodimas
  2020-10-17 16:17           ` Ard Biesheuvel
  2020-10-17 18:01           ` Andrew Lunn
  0 siblings, 2 replies; 41+ messages in thread
From: Ilias Apalodimas @ 2020-10-17 16:14 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andrew Lunn, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

Hi Ard, 

On Sat, Oct 17, 2020 at 05:18:16PM +0200, Ard Biesheuvel wrote:
> On Sat, 17 Oct 2020 at 17:11, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Sat, Oct 17, 2020 at 04:46:23PM +0200, Ard Biesheuvel wrote:
> > > On Sat, 17 Oct 2020 at 16:44, Andrew Lunn <andrew@lunn.ch> wrote:
> > > >
> > > > On Sat, Oct 17, 2020 at 04:20:36PM +0200, Ard Biesheuvel wrote:
> > > > > Hello all,
> > > > >
> > > > > I just upgraded my arm64 SynQuacer box to 5.8.16 and lost all network
> > > > > connectivity.
> > > >
> > > > Hi Ard
> > > >
> > > > Please could you point me at the DT files.
> > > >
> > > > > This box has a on-SoC socionext 'netsec' network controller wired to
> > > > > a Realtek 80211e PHY, and this was working without problems until
> > > > > the following commit was merged
> > > >
> > > > It could be this fix has uncovered a bug in the DT file. Before this
> > > > fix, if there is an phy-mode property in DT, it could of been ignored.
> > > > Now the phy-handle property is correctly implemented. So it could be
> > > > the DT has the wrong value, e.g. it has rgmii-rxid when maybe it
> > > > should have rgmii-id.
> > > >
> > >
> > > This is an ACPI system. The phy-mode device property is set to 'rgmii'
> >
> > Hi Ard
> >
> > Please try rgmii-id.
> >
> > Also, do you have the schematic? Can you see if there are any
> > strapping resistors? It could be, there are strapping resistors to put
> > it into rgmii-id. Now that the phy-mode properties is respected, the
> > reset defaults are being over-written to rgmii, which breaks the link.
> > Or the bootloader has already set the PHY mode to rgmii-id.
> >
> > You can also use '' as the phy-mode, which results in
> > PHY_INTERFACE_MODE_NA, which effectively means, don't touch the PHY
> > mode, something else has already set it up. This might actually be the
> > correct way to go for ACPI. In the DT world, we tend to assume the
> > bootloader has done the absolute minimum and Linux should configure
> > everything. The ACPI takes the opposite view, the firmware will do the
> > basic hardware configuration, and Linux should not touch it, or ask
> > ACPI to modify it.
> >
> 
> Indeed, the firmware should have set this up.

Would EDK2 take care of the RGMII Rx/Tx delays even when configured to 
use a DT instead of ACPI?

> This would mean we could > do this in the driver: it currently uses
> 
> priv->phy_interface = device_get_phy_mode(&pdev->dev);
> 
> Can we just assign that to PHY_INTERFACE_MODE_NA instead?

Thanks
/Ilias

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 16:14         ` Ilias Apalodimas
@ 2020-10-17 16:17           ` Ard Biesheuvel
  2020-10-17 16:21             ` Ilias Apalodimas
  2020-10-17 18:04             ` Andrew Lunn
  2020-10-17 18:01           ` Andrew Lunn
  1 sibling, 2 replies; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-17 16:17 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Andrew Lunn, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, 17 Oct 2020 at 18:14, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> Hi Ard,
>
> On Sat, Oct 17, 2020 at 05:18:16PM +0200, Ard Biesheuvel wrote:
> > On Sat, 17 Oct 2020 at 17:11, Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > On Sat, Oct 17, 2020 at 04:46:23PM +0200, Ard Biesheuvel wrote:
> > > > On Sat, 17 Oct 2020 at 16:44, Andrew Lunn <andrew@lunn.ch> wrote:
> > > > >
> > > > > On Sat, Oct 17, 2020 at 04:20:36PM +0200, Ard Biesheuvel wrote:
> > > > > > Hello all,
> > > > > >
> > > > > > I just upgraded my arm64 SynQuacer box to 5.8.16 and lost all network
> > > > > > connectivity.
> > > > >
> > > > > Hi Ard
> > > > >
> > > > > Please could you point me at the DT files.
> > > > >
> > > > > > This box has a on-SoC socionext 'netsec' network controller wired to
> > > > > > a Realtek 80211e PHY, and this was working without problems until
> > > > > > the following commit was merged
> > > > >
> > > > > It could be this fix has uncovered a bug in the DT file. Before this
> > > > > fix, if there is an phy-mode property in DT, it could of been ignored.
> > > > > Now the phy-handle property is correctly implemented. So it could be
> > > > > the DT has the wrong value, e.g. it has rgmii-rxid when maybe it
> > > > > should have rgmii-id.
> > > > >
> > > >
> > > > This is an ACPI system. The phy-mode device property is set to 'rgmii'
> > >
> > > Hi Ard
> > >
> > > Please try rgmii-id.
> > >
> > > Also, do you have the schematic? Can you see if there are any
> > > strapping resistors? It could be, there are strapping resistors to put
> > > it into rgmii-id. Now that the phy-mode properties is respected, the
> > > reset defaults are being over-written to rgmii, which breaks the link.
> > > Or the bootloader has already set the PHY mode to rgmii-id.
> > >
> > > You can also use '' as the phy-mode, which results in
> > > PHY_INTERFACE_MODE_NA, which effectively means, don't touch the PHY
> > > mode, something else has already set it up. This might actually be the
> > > correct way to go for ACPI. In the DT world, we tend to assume the
> > > bootloader has done the absolute minimum and Linux should configure
> > > everything. The ACPI takes the opposite view, the firmware will do the
> > > basic hardware configuration, and Linux should not touch it, or ask
> > > ACPI to modify it.
> > >
> >
> > Indeed, the firmware should have set this up.
>
> Would EDK2 take care of the RGMII Rx/Tx delays even when configured to
> use a DT instead of ACPI?
>

Yes. The network driver has no awareness whatsoever which h/w
description is being provided to the OS.


> > This would mean we could > do this in the driver: it currently uses
> >
> > priv->phy_interface = device_get_phy_mode(&pdev->dev);
> >
> > Can we just assign that to PHY_INTERFACE_MODE_NA instead?
>

I have tried this, and it seems to fix the issue. I will send out a
patch against the netsec driver.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 16:17           ` Ard Biesheuvel
@ 2020-10-17 16:21             ` Ilias Apalodimas
  2020-10-17 18:04             ` Andrew Lunn
  1 sibling, 0 replies; 41+ messages in thread
From: Ilias Apalodimas @ 2020-10-17 16:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andrew Lunn, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

Hi Ard,

[...]
> > > > You can also use '' as the phy-mode, which results in
> > > > PHY_INTERFACE_MODE_NA, which effectively means, don't touch the PHY
> > > > mode, something else has already set it up. This might actually be the
> > > > correct way to go for ACPI. In the DT world, we tend to assume the
> > > > bootloader has done the absolute minimum and Linux should configure
> > > > everything. The ACPI takes the opposite view, the firmware will do the
> > > > basic hardware configuration, and Linux should not touch it, or ask
> > > > ACPI to modify it.
> > > >
> > >
> > > Indeed, the firmware should have set this up.
> >
> > Would EDK2 take care of the RGMII Rx/Tx delays even when configured to
> > use a DT instead of ACPI?
> >
>
> Yes. The network driver has no awareness whatsoever which h/w
> description is being provided to the OS.
>
>
> > > This would mean we could > do this in the driver: it currently uses
> > >
> > > priv->phy_interface = device_get_phy_mode(&pdev->dev);
> > >
> > > Can we just assign that to PHY_INTERFACE_MODE_NA instead?
> >
>
> I have tried this, and it seems to fix the issue. I will send out a
> patch against the netsec driver.

Great thanks!

Cheers
/Ilias

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 16:14         ` Ilias Apalodimas
  2020-10-17 16:17           ` Ard Biesheuvel
@ 2020-10-17 18:01           ` Andrew Lunn
  1 sibling, 0 replies; 41+ messages in thread
From: Andrew Lunn @ 2020-10-17 18:01 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ard Biesheuvel, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

> Would EDK2 take care of the RGMII Rx/Tx delays even when configured to 
> use a DT instead of ACPI?

But what about those users using u-boot and DT, not EDK2?

    Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 16:17           ` Ard Biesheuvel
  2020-10-17 16:21             ` Ilias Apalodimas
@ 2020-10-17 18:04             ` Andrew Lunn
  2020-10-17 18:11               ` Ard Biesheuvel
  1 sibling, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-17 18:04 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ilias Apalodimas, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

> I have tried this, and it seems to fix the issue. I will send out a
> patch against the netsec driver.

Please also fix the firmware so it does not pass rgmii.

If there are pure DT systems, which do require phy-mode to be used, we
will need to revert your proposed change in order to make the MAC
driver work as it should, rather than work around the broken firmware.

	Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 18:04             ` Andrew Lunn
@ 2020-10-17 18:11               ` Ard Biesheuvel
  2020-10-17 18:17                 ` Ard Biesheuvel
  2020-10-17 18:27                 ` Andrew Lunn
  0 siblings, 2 replies; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-17 18:11 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, 17 Oct 2020 at 20:04, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > I have tried this, and it seems to fix the issue. I will send out a
> > patch against the netsec driver.
>
> Please also fix the firmware so it does not pass rgmii.
>
> If there are pure DT systems, which do require phy-mode to be used, we
> will need to revert your proposed change in order to make the MAC
> driver work as it should, rather than work around the broken firmware.
>

What do you mean by 'pure' DT system? Only EDK2 based firmware exists
for this platform, and it can be configured to boot either in DT or in
ACPI mode. In both cases, it will pass the same, incorrect PHY mode,
and in both cases, the firmware will already have configured the PHY
correctly.

So what I propose to do is drop the handling of the [mandatory]
phy-mode device property from the netsec driver (which is really only
used by this board). As we don't really need a phy-mode to begin with,
and given that firmware exists in the field that passes the wrong
value, the only option I see for passing a value here is to use a
different, *optional* DT property (force-phy-mode or
phy-mode-override) that takes the place of phy-mode. For ACPI boot,
you will just need to fix your firmware if you are using a different
PHY configuration.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 18:11               ` Ard Biesheuvel
@ 2020-10-17 18:17                 ` Ard Biesheuvel
  2020-10-17 18:27                 ` Andrew Lunn
  1 sibling, 0 replies; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-17 18:17 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, 17 Oct 2020 at 20:11, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sat, 17 Oct 2020 at 20:04, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > I have tried this, and it seems to fix the issue. I will send out a
> > > patch against the netsec driver.
> >
> > Please also fix the firmware so it does not pass rgmii.
> >
> > If there are pure DT systems, which do require phy-mode to be used, we
> > will need to revert your proposed change in order to make the MAC
> > driver work as it should, rather than work around the broken firmware.
> >
>
> What do you mean by 'pure' DT system? Only EDK2 based firmware exists
> for this platform, and it can be configured to boot either in DT or in
> ACPI mode. In both cases, it will pass the same, incorrect PHY mode,
> and in both cases, the firmware will already have configured the PHY
> correctly.
>
> So what I propose to do is drop the handling of the [mandatory]
> phy-mode device property from the netsec driver (which is really only
> used by this board). As we don't really need a phy-mode to begin with,
> and given that firmware exists in the field that passes the wrong
> value, the only option I see for passing a value here is to use a
> different, *optional* DT property (force-phy-mode or
> phy-mode-override) that takes the place of phy-mode. For ACPI boot,
> you will just need to fix your firmware if you are using a different
> PHY configuration.

... or add a property that makes 'phy-mode' take effect, and default
to ignoring it.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 18:11               ` Ard Biesheuvel
  2020-10-17 18:17                 ` Ard Biesheuvel
@ 2020-10-17 18:27                 ` Andrew Lunn
  2020-10-17 18:55                   ` Ard Biesheuvel
  1 sibling, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-17 18:27 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ilias Apalodimas, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, Oct 17, 2020 at 08:11:24PM +0200, Ard Biesheuvel wrote:
> On Sat, 17 Oct 2020 at 20:04, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > I have tried this, and it seems to fix the issue. I will send out a
> > > patch against the netsec driver.
> >
> > Please also fix the firmware so it does not pass rgmii.
> >
> > If there are pure DT systems, which do require phy-mode to be used, we
> > will need to revert your proposed change in order to make the MAC
> > driver work as it should, rather than work around the broken firmware.
> >
> 
> What do you mean by 'pure' DT system? Only EDK2 based firmware exists
> for this platform

Currently, only EDK2 based firmware exists. Is there anything stopping
somebody using u-boot? ACPI is aimed for server class systems, on
ARM. If anybody wants to use this SoC in am embedded setting, not
server, then they are more likely to use DT, especially when you need
a complex network, eg. an Ethernet switch. It seems like ACPI is too
simple to support complex network hardware found in some embedded
systems.

> So what I propose to do is drop the handling of the [mandatory]
> phy-mode device property from the netsec driver (which is really only
> used by this board). As we don't really need a phy-mode to begin with,
> and given that firmware exists in the field that passes the wrong
> value, the only option I see for passing a value here is to use a
> different, *optional* DT property (force-phy-mode or
> phy-mode-override) that takes the place of phy-mode.

No, sorry, this is an ACPI problem, not a DT problem. I don't want to
accept DT hacks because of broken ACPI.

We have been through this before, when the Atheros PHY fixed is RGMII
delay support, and lots of platforms broke. Everybody just updated
their DT and were happy. I see no reason why ACPI should be different.

      Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 18:27                 ` Andrew Lunn
@ 2020-10-17 18:55                   ` Ard Biesheuvel
  2020-10-17 19:49                     ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-17 18:55 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sat, 17 Oct 2020 at 20:27, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sat, Oct 17, 2020 at 08:11:24PM +0200, Ard Biesheuvel wrote:
> > On Sat, 17 Oct 2020 at 20:04, Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > I have tried this, and it seems to fix the issue. I will send out a
> > > > patch against the netsec driver.
> > >
> > > Please also fix the firmware so it does not pass rgmii.
> > >
> > > If there are pure DT systems, which do require phy-mode to be used, we
> > > will need to revert your proposed change in order to make the MAC
> > > driver work as it should, rather than work around the broken firmware.
> > >
> >
> > What do you mean by 'pure' DT system? Only EDK2 based firmware exists
> > for this platform
>
> Currently, only EDK2 based firmware exists. Is there anything stopping
> somebody using u-boot? ACPI is aimed for server class systems, on
> ARM. If anybody wants to use this SoC in am embedded setting, not
> server, then they are more likely to use DT, especially when you need
> a complex network, eg. an Ethernet switch. It seems like ACPI is too
> simple to support complex network hardware found in some embedded
> systems.
>
> > So what I propose to do is drop the handling of the [mandatory]
> > phy-mode device property from the netsec driver (which is really only
> > used by this board). As we don't really need a phy-mode to begin with,
> > and given that firmware exists in the field that passes the wrong
> > value, the only option I see for passing a value here is to use a
> > different, *optional* DT property (force-phy-mode or
> > phy-mode-override) that takes the place of phy-mode.
>
> No, sorry, this is an ACPI problem, not a DT problem. I don't want to
> accept DT hacks because of broken ACPI.
>
> We have been through this before, when the Atheros PHY fixed is RGMII
> delay support, and lots of platforms broke. Everybody just updated
> their DT and were happy. I see no reason why ACPI should be different.
>

I don't understand why you insist on framing this as a ACPI vs DT
issue. AFAICT, the only meaningful distinction here is between
firmware that configures the PHY and firmware that doesn't.

Broken firmware exists for this platform, and it provides incorrect DT
data as well as incorrect ACPI data, but it does configure the PHY.
Fixing that firmware involves fixing both, and it is easily updatable
on this platform, so it is almost as simple as dropping a new DT file
in your /boot partition. But you still need to do that.

So we can fix this firmware by just setting phy-mode to the empty string, right?

So the only question is how we deal with broken firmware. Again, I
don't see much point in distinguishing between DT and ACPI here, as in
both cases, the same action is required on the part of the user to
change something on their system before they can upgrade their kernel.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 18:55                   ` Ard Biesheuvel
@ 2020-10-17 19:49                     ` Andrew Lunn
  2020-10-17 22:19                       ` Ard Biesheuvel
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-17 19:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ilias Apalodimas, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

> So we can fix this firmware by just setting phy-mode to the empty
> string, right?

I've never actually tried it, but i think so. There are no DT files
actually doing that, so you really do need to test it and see. And
there might be some differences between device_get_phy_mode() and
of_get_phy_mode().

	Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 19:49                     ` Andrew Lunn
@ 2020-10-17 22:19                       ` Ard Biesheuvel
  2020-10-17 23:02                         ` Andrew Lunn
  2020-10-18 14:20                         ` Masami Hiramatsu
  0 siblings, 2 replies; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-17 22:19 UTC (permalink / raw)
  To: Andrew Lunn, Daniel Thompson, Sumit Garg, Alex Bennée,
	Masami Hiramatsu, steve
  Cc: Ilias Apalodimas, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

(cc'ing some folks who may care about functional networking on SynQuacer)

On Sat, 17 Oct 2020 at 21:49, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > So we can fix this firmware by just setting phy-mode to the empty
> > string, right?
>
> I've never actually tried it, but i think so. There are no DT files
> actually doing that, so you really do need to test it and see. And
> there might be some differences between device_get_phy_mode() and
> of_get_phy_mode().
>

Yes, that works fine. Fixed now in the latest firmware build [0]

But that still leaves the question whether and how to work around this
for units in the field. Ignoring the PHY mode in the driver would
help, as all known hardware ships with firmware that configures the
PHY with the correct settings, but we will lose the ability to use
other PHY modes in the future, in case the SoC is ever used with DT
based minimal firmware that does not configure networking.

Since ACPI implies rich firmware, we could make ACPI probing of the
driver ignore the phy-mode setting in the DSDT. But if we don't do the
same for DT, it would mean DT users are forced to upgrade their
firmware, and hopefully do so before upgrading to a kernel that breaks
networking. (These boxes are often used headless, so this can be
annoying)


[0] http://snapshots.linaro.org/components/kernel/leg-96boards-developerbox-edk2/83/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 22:19                       ` Ard Biesheuvel
@ 2020-10-17 23:02                         ` Andrew Lunn
  2020-10-18 10:35                           ` Ard Biesheuvel
  2020-10-18 14:20                         ` Masami Hiramatsu
  1 sibling, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-17 23:02 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	steve, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sun, Oct 18, 2020 at 12:19:25AM +0200, Ard Biesheuvel wrote:
> (cc'ing some folks who may care about functional networking on SynQuacer)
> 
> On Sat, 17 Oct 2020 at 21:49, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > So we can fix this firmware by just setting phy-mode to the empty
> > > string, right?
> >
> > I've never actually tried it, but i think so. There are no DT files
> > actually doing that, so you really do need to test it and see. And
> > there might be some differences between device_get_phy_mode() and
> > of_get_phy_mode().
> >
> 
> Yes, that works fine. Fixed now in the latest firmware build [0]

Thanks for reporting back on that. It probably needs to be something
we always recommend for ACPI systems, either use "", or preferably no
phy-mode at all, and let the driver default to NA. ACPI and networking
is at a very early stage at the moment, since UEFA says nothing about
it. It will take a while before we figure out the best practices, and
some vendor gets something added to the ACPI specifications.

> But that still leaves the question whether and how to work around this
> for units in the field. Ignoring the PHY mode in the driver would
> help, as all known hardware ships with firmware that configures the
> PHY with the correct settings, but we will lose the ability to use
> other PHY modes in the future, in case the SoC is ever used with DT
> based minimal firmware that does not configure networking.
> 
> Since ACPI implies rich firmware, we could make ACPI probing of the
> driver ignore the phy-mode setting in the DSDT.

I'm O.K. with that, for this driver, but please add a comment in the
code about why ACPI ignores DSDT, because of older broken firmware.

> But if we don't do the same for DT, it would mean DT users are
> forced to upgrade their firmware, and hopefully do so before
> upgrading to a kernel that breaks networking.

Nothing new there. As i said, we have been through this before with
the Atheros PHY and others.

One option would be to move the DT into the kernel and fix it. Most
distributions already use the DT version shipped with the kernel, so
they would automatically get the fixed DT at the same time as the
kernel which needs the fix.

       Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 23:02                         ` Andrew Lunn
@ 2020-10-18 10:35                           ` Ard Biesheuvel
  2020-10-18 10:56                             ` Ard Biesheuvel
  2020-10-18 15:38                             ` Andrew Lunn
  0 siblings, 2 replies; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-18 10:35 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	steve, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sun, 18 Oct 2020 at 01:02, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sun, Oct 18, 2020 at 12:19:25AM +0200, Ard Biesheuvel wrote:
> > (cc'ing some folks who may care about functional networking on SynQuacer)
> >
> > On Sat, 17 Oct 2020 at 21:49, Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > So we can fix this firmware by just setting phy-mode to the empty
> > > > string, right?
> > >
> > > I've never actually tried it, but i think so. There are no DT files
> > > actually doing that, so you really do need to test it and see. And
> > > there might be some differences between device_get_phy_mode() and
> > > of_get_phy_mode().
> > >
> >
> > Yes, that works fine. Fixed now in the latest firmware build [0]
>
> Thanks for reporting back on that. It probably needs to be something
> we always recommend for ACPI systems, either use "", or preferably no
> phy-mode at all, and let the driver default to NA. ACPI and networking
> is at a very early stage at the moment, since UEFA says nothing about
> it. It will take a while before we figure out the best practices, and
> some vendor gets something added to the ACPI specifications.
>

You mean MDIO topology, right? Every x86 PC and server in the world
uses ACPI, and networking doesn't seem to be a problem there, so it is
simply a matter of choosing the right abstraction level. I know this
is much easier to achieve when all the network controllers are on PCIe
cards, but the point remains valid: exhaustively describing the entire
SoC like we do for DT is definitely not the right choice for ACPI
systems. This also means that ACPI is simply not the right fit for all
designs, and we should push back harder against the tick-the-box
exercises that are going on to use ACPI for describing unusual designs
that are never going to boot RHEL or other ACPI-only distros anyway.

> > But that still leaves the question whether and how to work around this
> > for units in the field. Ignoring the PHY mode in the driver would
> > help, as all known hardware ships with firmware that configures the
> > PHY with the correct settings, but we will lose the ability to use
> > other PHY modes in the future, in case the SoC is ever used with DT
> > based minimal firmware that does not configure networking.
> >
> > Since ACPI implies rich firmware, we could make ACPI probing of the
> > driver ignore the phy-mode setting in the DSDT.
>
> I'm O.K. with that, for this driver, but please add a comment in the
> code about why ACPI ignores DSDT, because of older broken firmware.
>

Sure.

> > But if we don't do the same for DT, it would mean DT users are
> > forced to upgrade their firmware, and hopefully do so before
> > upgrading to a kernel that breaks networking.
>
> Nothing new there. As i said, we have been through this before with
> the Atheros PHY and others.
>
> One option would be to move the DT into the kernel and fix it. Most
> distributions already use the DT version shipped with the kernel, so
> they would automatically get the fixed DT at the same time as the
> kernel which needs the fix.
>

The DT is not a flat file that you can simply source from elsewhere -
it is constructed at boot using firmware settings and other data that
is different between systems.


I do have a question about the history here, btw. As I understand it,
before commit f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx
delays config"), the phy-mode setting did not have any effect on
Realtek PHYs in the first place, right? Since otherwise, this platform
would never have worked from the beginning.

So commit f81dadbcf7fd was backported to -stable, even though it
didn't actually work, and had to be fixed in bbc4d71d63549bcd ("net:
phy: realtek: fix rtl8211e rx/tx delay config"), which is the commit
that causes the regression on these boards.

So why was commit f81dadbcf7fd sent to -stable in the first place? It
doesn't have a fixes tag or cc:stable, and given that it is not
actually correct to begin with, there seems to be little justification
for this. Surely, no platform exists in the field that requires this
commit (as it is broken) or the followup fix (since only was never
taken into account before)

In summary, I think that - even if we agree that the way forward is to
fix the firmware and make the driver do the right thing without any
quirks - these patches do not belong in -stable, and should be
reverted, unless anyone can point out a system that was working
before, got broken and was fixed again by f81dadbcf7fd (i.e., a true
regression)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-18 10:35                           ` Ard Biesheuvel
@ 2020-10-18 10:56                             ` Ard Biesheuvel
  2020-10-18 15:45                               ` Andrew Lunn
  2020-10-18 15:38                             ` Andrew Lunn
  1 sibling, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-18 10:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	steve, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sun, 18 Oct 2020 at 12:35, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sun, 18 Oct 2020 at 01:02, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Sun, Oct 18, 2020 at 12:19:25AM +0200, Ard Biesheuvel wrote:
> > > (cc'ing some folks who may care about functional networking on SynQuacer)
> > >
> > > On Sat, 17 Oct 2020 at 21:49, Andrew Lunn <andrew@lunn.ch> wrote:
> > > >
> > > > > So we can fix this firmware by just setting phy-mode to the empty
> > > > > string, right?
> > > >
> > > > I've never actually tried it, but i think so. There are no DT files
> > > > actually doing that, so you really do need to test it and see. And
> > > > there might be some differences between device_get_phy_mode() and
> > > > of_get_phy_mode().
> > > >
> > >
> > > Yes, that works fine. Fixed now in the latest firmware build [0]
> >
> > Thanks for reporting back on that. It probably needs to be something
> > we always recommend for ACPI systems, either use "", or preferably no
> > phy-mode at all, and let the driver default to NA. ACPI and networking
> > is at a very early stage at the moment, since UEFA says nothing about
> > it. It will take a while before we figure out the best practices, and
> > some vendor gets something added to the ACPI specifications.
> >
>
> You mean MDIO topology, right? Every x86 PC and server in the world
> uses ACPI, and networking doesn't seem to be a problem there, so it is
> simply a matter of choosing the right abstraction level. I know this
> is much easier to achieve when all the network controllers are on PCIe
> cards, but the point remains valid: exhaustively describing the entire
> SoC like we do for DT is definitely not the right choice for ACPI
> systems. This also means that ACPI is simply not the right fit for all
> designs, and we should push back harder against the tick-the-box
> exercises that are going on to use ACPI for describing unusual designs
> that are never going to boot RHEL or other ACPI-only distros anyway.
>
> > > But that still leaves the question whether and how to work around this
> > > for units in the field. Ignoring the PHY mode in the driver would
> > > help, as all known hardware ships with firmware that configures the
> > > PHY with the correct settings, but we will lose the ability to use
> > > other PHY modes in the future, in case the SoC is ever used with DT
> > > based minimal firmware that does not configure networking.
> > >
> > > Since ACPI implies rich firmware, we could make ACPI probing of the
> > > driver ignore the phy-mode setting in the DSDT.
> >
> > I'm O.K. with that, for this driver, but please add a comment in the
> > code about why ACPI ignores DSDT, because of older broken firmware.
> >
>
> Sure.
>
> > > But if we don't do the same for DT, it would mean DT users are
> > > forced to upgrade their firmware, and hopefully do so before
> > > upgrading to a kernel that breaks networking.
> >
> > Nothing new there. As i said, we have been through this before with
> > the Atheros PHY and others.
> >
> > One option would be to move the DT into the kernel and fix it. Most
> > distributions already use the DT version shipped with the kernel, so
> > they would automatically get the fixed DT at the same time as the
> > kernel which needs the fix.
> >
>
> The DT is not a flat file that you can simply source from elsewhere -
> it is constructed at boot using firmware settings and other data that
> is different between systems.
>
>
> I do have a question about the history here, btw. As I understand it,
> before commit f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx
> delays config"), the phy-mode setting did not have any effect on
> Realtek PHYs in the first place, right? Since otherwise, this platform
> would never have worked from the beginning.
>
> So commit f81dadbcf7fd was backported to -stable, even though it
> didn't actually work, and had to be fixed in bbc4d71d63549bcd ("net:
> phy: realtek: fix rtl8211e rx/tx delay config"), which is the commit
> that causes the regression on these boards.
>
> So why was commit f81dadbcf7fd sent to -stable in the first place? It
> doesn't have a fixes tag or cc:stable, and given that it is not
> actually correct to begin with, there seems to be little justification
> for this. Surely, no platform exists in the field that requires this
> commit (as it is broken) or the followup fix (since only was never
> taken into account before)
>
> In summary, I think that - even if we agree that the way forward is to
> fix the firmware and make the driver do the right thing without any
> quirks - these patches do not belong in -stable, and should be
> reverted, unless anyone can point out a system that was working
> before, got broken and was fixed again by f81dadbcf7fd (i.e., a true
> regression)

Oops - f81dadbcf7fd was never backported, only the fix was. Apologies
for mixing that up.

However, that leaves the question why bbc4d71d63549bcd was backported,
although I understand why the discussion is a bit trickier there. But
if it did not fix a regression, only broken code that never worked in
the first place, I am not convinced it belongs in -stable.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-17 22:19                       ` Ard Biesheuvel
  2020-10-17 23:02                         ` Andrew Lunn
@ 2020-10-18 14:20                         ` Masami Hiramatsu
  1 sibling, 0 replies; 41+ messages in thread
From: Masami Hiramatsu @ 2020-10-18 14:20 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andrew Lunn, Daniel Thompson, Sumit Garg, Alex Bennée,
	steve, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sun, 18 Oct 2020 00:19:25 +0200
Ard Biesheuvel <ardb@kernel.org> wrote:

> (cc'ing some folks who may care about functional networking on SynQuacer)
> 
> On Sat, 17 Oct 2020 at 21:49, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > So we can fix this firmware by just setting phy-mode to the empty
> > > string, right?
> >
> > I've never actually tried it, but i think so. There are no DT files
> > actually doing that, so you really do need to test it and see. And
> > there might be some differences between device_get_phy_mode() and
> > of_get_phy_mode().
> >
> 
> Yes, that works fine. Fixed now in the latest firmware build [0]

Great! I've just started chasing that bug this Friday.
Thanks!

> But that still leaves the question whether and how to work around this
> for units in the field. Ignoring the PHY mode in the driver would
> help, as all known hardware ships with firmware that configures the
> PHY with the correct settings, but we will lose the ability to use
> other PHY modes in the future, in case the SoC is ever used with DT
> based minimal firmware that does not configure networking.
> 
> Since ACPI implies rich firmware, we could make ACPI probing of the
> driver ignore the phy-mode setting in the DSDT. But if we don't do the
> same for DT, it would mean DT users are forced to upgrade their
> firmware, and hopefully do so before upgrading to a kernel that breaks
> networking. (These boxes are often used headless, so this can be
> annoying)
> 
> 
> [0] http://snapshots.linaro.org/components/kernel/leg-96boards-developerbox-edk2/83/


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-18 10:35                           ` Ard Biesheuvel
  2020-10-18 10:56                             ` Ard Biesheuvel
@ 2020-10-18 15:38                             ` Andrew Lunn
  1 sibling, 0 replies; 41+ messages in thread
From: Andrew Lunn @ 2020-10-18 15:38 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	steve, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

> > Thanks for reporting back on that. It probably needs to be something
> > we always recommend for ACPI systems, either use "", or preferably no
> > phy-mode at all, and let the driver default to NA. ACPI and networking
> > is at a very early stage at the moment, since UEFA says nothing about
> > it. It will take a while before we figure out the best practices, and
> > some vendor gets something added to the ACPI specifications.
> >
> 
> You mean MDIO topology, right?

That is part of it. I2C for SFPs, ethernet switches, etc.  I think
SFPs are going to be the real sticking point, since when you get above
10Gbps, you are into the land of SFPs, and server platforms are those
which will be going above 10G first.

> Every x86 PC and server in the world uses ACPI, and networking
> doesn't seem to be a problem there, so it is simply a matter of
> choosing the right abstraction level. I know this is much easier to
> achieve when all the network controllers are on PCIe cards, but the
> point remains valid: exhaustively describing the entire SoC like we
> do for DT is definitely not the right choice for ACPI systems.

Agreed. And i am pushing back. But we also have the conflict that some
SoC IP is used in systems that will boot RHEL, Debian, etc, and in
systems that are DT based. Do you want to write two drivers? One
assuming ACPI is doing all the work, and another where DT describes
the system and drivers and network core does all the work configuring
the hardware. For the same piece of hardware?

> I do have a question about the history here, btw. As I understand it,
> before commit f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx
> delays config"), the phy-mode setting did not have any effect on
> Realtek PHYs in the first place, right? Since otherwise, this platform
> would never have worked from the beginning.

I suspect it did work to some extent, but not fully. So it could be,
it worked for whatever platform Serge was using, but failed in some
other cases, e.g. you board, and left the delays alone.

Later the vendor came along and said the code was wrong, and provided
some bits of information from the datasheet which is not publicly
available. As a result f81dadbcf7fd happened. Since that fixed a
previous patch, it was given a Fixes: tag and i assume back ported.

> So commit f81dadbcf7fd was backported to -stable, even though it
> didn't actually work, and had to be fixed in bbc4d71d63549bcd ("net:
> phy: realtek: fix rtl8211e rx/tx delay config"), which is the commit
> that causes the regression on these boards.
> 
> So why was commit f81dadbcf7fd sent to -stable in the first place?

f81dadbcf7fd first appears in v5.2. I don't see it in 4.19.152, the
LTS kernel older than that. So it is not in -stable.

    Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-18 10:56                             ` Ard Biesheuvel
@ 2020-10-18 15:45                               ` Andrew Lunn
  2020-10-25 14:16                                 ` Ard Biesheuvel
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-18 15:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	steve, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

> However, that leaves the question why bbc4d71d63549bcd was backported,
> although I understand why the discussion is a bit trickier there. But
> if it did not fix a regression, only broken code that never worked in
> the first place, I am not convinced it belongs in -stable.

Please ask Serge Semin what platform he tested on. I kind of expect it
worked for him, in some limited way, enough that it passed his
testing.

	Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-18 15:45                               ` Andrew Lunn
@ 2020-10-25 14:16                                 ` Ard Biesheuvel
  2020-10-25 14:28                                   ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-25 14:16 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	Steve McIntyre, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sun, 18 Oct 2020 at 17:45, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > However, that leaves the question why bbc4d71d63549bcd was backported,
> > although I understand why the discussion is a bit trickier there. But
> > if it did not fix a regression, only broken code that never worked in
> > the first place, I am not convinced it belongs in -stable.
>
> Please ask Serge Semin what platform he tested on. I kind of expect it
> worked for him, in some limited way, enough that it passed his
> testing.
>

I'll make a note here that a rather large number of platforms got
broken by the same fix for the Realtek PHY driver:

https://lore.kernel.org/lkml/?q=bbc4d71d6354

I seriously doubt whether disabling TX/RX delay when it is enabled by
h/w straps is the right thing to do here.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-25 14:16                                 ` Ard Biesheuvel
@ 2020-10-25 14:28                                   ` Andrew Lunn
  2020-10-25 14:34                                     ` Ard Biesheuvel
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-25 14:28 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	Steve McIntyre, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sun, Oct 25, 2020 at 03:16:36PM +0100, Ard Biesheuvel wrote:
> On Sun, 18 Oct 2020 at 17:45, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > However, that leaves the question why bbc4d71d63549bcd was backported,
> > > although I understand why the discussion is a bit trickier there. But
> > > if it did not fix a regression, only broken code that never worked in
> > > the first place, I am not convinced it belongs in -stable.
> >
> > Please ask Serge Semin what platform he tested on. I kind of expect it
> > worked for him, in some limited way, enough that it passed his
> > testing.
> >
> 
> I'll make a note here that a rather large number of platforms got
> broken by the same fix for the Realtek PHY driver:
> 
> https://lore.kernel.org/lkml/?q=bbc4d71d6354
> 
> I seriously doubt whether disabling TX/RX delay when it is enabled by
> h/w straps is the right thing to do here.

The device tree is explicitly asking for rgmii. If it wanted the
hardware left alone, it should of used PHY_INTERFACE_MODE_NA.

But we might be able to compromise for a cycle or two. As far as i
understand the hardware, we can read the strapping. If we find the
strapping resisters are present, but rgmii is in DT, print a warning
that the device tree needs upgrading, and ignore the DT mode. We can
add this to stable, but not net-next.

    Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-25 14:28                                   ` Andrew Lunn
@ 2020-10-25 14:34                                     ` Ard Biesheuvel
  2020-10-25 14:42                                       ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-25 14:34 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	Steve McIntyre, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sun, 25 Oct 2020 at 15:29, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sun, Oct 25, 2020 at 03:16:36PM +0100, Ard Biesheuvel wrote:
> > On Sun, 18 Oct 2020 at 17:45, Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > However, that leaves the question why bbc4d71d63549bcd was backported,
> > > > although I understand why the discussion is a bit trickier there. But
> > > > if it did not fix a regression, only broken code that never worked in
> > > > the first place, I am not convinced it belongs in -stable.
> > >
> > > Please ask Serge Semin what platform he tested on. I kind of expect it
> > > worked for him, in some limited way, enough that it passed his
> > > testing.
> > >
> >
> > I'll make a note here that a rather large number of platforms got
> > broken by the same fix for the Realtek PHY driver:
> >
> > https://lore.kernel.org/lkml/?q=bbc4d71d6354
> >
> > I seriously doubt whether disabling TX/RX delay when it is enabled by
> > h/w straps is the right thing to do here.
>
> The device tree is explicitly asking for rgmii. If it wanted the
> hardware left alone, it should of used PHY_INTERFACE_MODE_NA.
>

Would you suggest that these DTs remove the phy-mode instead? As I
don't see anyone proposing that.

> But we might be able to compromise for a cycle or two. As far as i
> understand the hardware, we can read the strapping. If we find the
> strapping resisters are present, but rgmii is in DT, print a warning
> that the device tree needs upgrading, and ignore the DT mode. We can
> add this to stable, but not net-next.
>

That sounds reasonable, given how many different platforms seem to be
affected, and production ones may be running stable distro kernels,
and not expecting their Ethernet to fail without warning.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-25 14:34                                     ` Ard Biesheuvel
@ 2020-10-25 14:42                                       ` Andrew Lunn
  2020-10-29 14:21                                         ` Ilias Apalodimas
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-10-25 14:42 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Daniel Thompson, Sumit Garg, Alex Bennée, Masami Hiramatsu,
	Steve McIntyre, Ilias Apalodimas,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Sun, Oct 25, 2020 at 03:34:06PM +0100, Ard Biesheuvel wrote:
> On Sun, 25 Oct 2020 at 15:29, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Sun, Oct 25, 2020 at 03:16:36PM +0100, Ard Biesheuvel wrote:
> > > On Sun, 18 Oct 2020 at 17:45, Andrew Lunn <andrew@lunn.ch> wrote:
> > > >
> > > > > However, that leaves the question why bbc4d71d63549bcd was backported,
> > > > > although I understand why the discussion is a bit trickier there. But
> > > > > if it did not fix a regression, only broken code that never worked in
> > > > > the first place, I am not convinced it belongs in -stable.
> > > >
> > > > Please ask Serge Semin what platform he tested on. I kind of expect it
> > > > worked for him, in some limited way, enough that it passed his
> > > > testing.
> > > >
> > >
> > > I'll make a note here that a rather large number of platforms got
> > > broken by the same fix for the Realtek PHY driver:
> > >
> > > https://lore.kernel.org/lkml/?q=bbc4d71d6354
> > >
> > > I seriously doubt whether disabling TX/RX delay when it is enabled by
> > > h/w straps is the right thing to do here.
> >
> > The device tree is explicitly asking for rgmii. If it wanted the
> > hardware left alone, it should of used PHY_INTERFACE_MODE_NA.
> >
> 
> Would you suggest that these DTs remove the phy-mode instead? As I
> don't see anyone proposing that.

What is also O.K, for most MAC drivers. Some might enforce it is
present, in which case, you can set it to "", which will get parsed as
PHY_INTERFACE_MODE_NA. But a few MAC drivers might configure there MII
bus depending on the PHY mode, RGMII vs GMII.

    Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-25 14:42                                       ` Andrew Lunn
@ 2020-10-29 14:21                                         ` Ilias Apalodimas
  2020-10-29 14:39                                           ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Ilias Apalodimas @ 2020-10-29 14:21 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ard Biesheuvel, Daniel Thompson, Sumit Garg, Alex Bennée,
	Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

Hi Andrew

On Sun, Oct 25, 2020 at 03:42:58PM +0100, Andrew Lunn wrote:
> On Sun, Oct 25, 2020 at 03:34:06PM +0100, Ard Biesheuvel wrote:
> > On Sun, 25 Oct 2020 at 15:29, Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > On Sun, Oct 25, 2020 at 03:16:36PM +0100, Ard Biesheuvel wrote:
> > > > On Sun, 18 Oct 2020 at 17:45, Andrew Lunn <andrew@lunn.ch> wrote:
> > > > >
> > > > > > However, that leaves the question why bbc4d71d63549bcd was backported,
> > > > > > although I understand why the discussion is a bit trickier there. But
> > > > > > if it did not fix a regression, only broken code that never worked in
> > > > > > the first place, I am not convinced it belongs in -stable.
> > > > >
> > > > > Please ask Serge Semin what platform he tested on. I kind of expect it
> > > > > worked for him, in some limited way, enough that it passed his
> > > > > testing.
> > > > >
> > > >
> > > > I'll make a note here that a rather large number of platforms got
> > > > broken by the same fix for the Realtek PHY driver:
> > > >
> > > > https://lore.kernel.org/lkml/?q=bbc4d71d6354
> > > >
> > > > I seriously doubt whether disabling TX/RX delay when it is enabled by
> > > > h/w straps is the right thing to do here.
> > >
> > > The device tree is explicitly asking for rgmii. If it wanted the
> > > hardware left alone, it should of used PHY_INTERFACE_MODE_NA.
> > >
> > 
> > Would you suggest that these DTs remove the phy-mode instead? As I
> > don't see anyone proposing that.
> 
> What is also O.K, for most MAC drivers. Some might enforce it is
> present, in which case, you can set it to "", which will get parsed as
> PHY_INTERFACE_MODE_NA. But a few MAC drivers might configure there MII
> bus depending on the PHY mode, RGMII vs GMII.
> 
>     Andrew

What about reverting the realtek PHY commit from stable?
As Ard said it doesn't really fix anything (usage wise) and causes a bunch of
problems.

If I understand correctly we have 3 options:
1. 'Hack' the  drivers in stable to fix it (and most of those hacks will take 
   a long time to remove)
2. Update DTE of all affected devices, backport it to stable and force users to
update
3. Revert the PHY commit

imho [3] is the least painful solution.


Thanks
/Ilias

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-29 14:21                                         ` Ilias Apalodimas
@ 2020-10-29 14:39                                           ` Andrew Lunn
  2020-10-29 14:42                                             ` Ard Biesheuvel
  2020-10-29 14:46                                             ` Ilias Apalodimas
  0 siblings, 2 replies; 41+ messages in thread
From: Andrew Lunn @ 2020-10-29 14:39 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ard Biesheuvel, Daniel Thompson, Sumit Garg, Alex Bennée,
	Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

> What about reverting the realtek PHY commit from stable?
> As Ard said it doesn't really fix anything (usage wise) and causes a bunch of
> problems.
> 
> If I understand correctly we have 3 options:
> 1. 'Hack' the  drivers in stable to fix it (and most of those hacks will take 
>    a long time to remove)
> 2. Update DTE of all affected devices, backport it to stable and force users to
> update
> 3. Revert the PHY commit
> 
> imho [3] is the least painful solution.

The PHY commit is correct, in that it fixes a bug. So i don't want to
remove it.

Backporting it to stable is what is causing most of the issues today,
combined with a number of broken DT descriptions. So i would be happy
for stable to get a patch which looks at the strapping, sees ID is
enabled via strapping, warns the DT blob is FUBAR, and then ignores
the requested PHY-mode. That gives developers time to fix their broken
DT.

	  Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-29 14:39                                           ` Andrew Lunn
@ 2020-10-29 14:42                                             ` Ard Biesheuvel
  2020-10-29 14:50                                               ` Andrew Lunn
  2020-10-29 14:46                                             ` Ilias Apalodimas
  1 sibling, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-10-29 14:42 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Daniel Thompson, Sumit Garg, Alex Bennée,
	Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Thu, 29 Oct 2020 at 15:39, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > What about reverting the realtek PHY commit from stable?
> > As Ard said it doesn't really fix anything (usage wise) and causes a bunch of
> > problems.
> >
> > If I understand correctly we have 3 options:
> > 1. 'Hack' the  drivers in stable to fix it (and most of those hacks will take
> >    a long time to remove)
> > 2. Update DTE of all affected devices, backport it to stable and force users to
> > update
> > 3. Revert the PHY commit
> >
> > imho [3] is the least painful solution.
>
> The PHY commit is correct, in that it fixes a bug. So i don't want to
> remove it.
>
> Backporting it to stable is what is causing most of the issues today,
> combined with a number of broken DT descriptions. So i would be happy
> for stable to get a patch which looks at the strapping, sees ID is
> enabled via strapping, warns the DT blob is FUBAR, and then ignores
> the requested PHY-mode. That gives developers time to fix their broken
> DT.
>

IIRC there is no public documentation for this PHY, right? So most
people that are affected by this are not actually able to implement
this workaround, even if they wanted to.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-29 14:39                                           ` Andrew Lunn
  2020-10-29 14:42                                             ` Ard Biesheuvel
@ 2020-10-29 14:46                                             ` Ilias Apalodimas
  2020-11-05 17:31                                               ` Jernej Škrabec
  1 sibling, 1 reply; 41+ messages in thread
From: Ilias Apalodimas @ 2020-10-29 14:46 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ard Biesheuvel, Daniel Thompson, Sumit Garg, Alex Bennée,
	Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

On Thu, Oct 29, 2020 at 03:39:34PM +0100, Andrew Lunn wrote:
> > What about reverting the realtek PHY commit from stable?
> > As Ard said it doesn't really fix anything (usage wise) and causes a bunch of
> > problems.
> > 
> > If I understand correctly we have 3 options:
> > 1. 'Hack' the  drivers in stable to fix it (and most of those hacks will take 
> >    a long time to remove)
> > 2. Update DTE of all affected devices, backport it to stable and force users to
> > update
> > 3. Revert the PHY commit
> > 
> > imho [3] is the least painful solution.
> 
> The PHY commit is correct, in that it fixes a bug. So i don't want to
> remove it.

Yea I meant revert the commit from were ever it was backported, not on current 
upstream. I agree it's correct from a coding point of view, but it never 
actually fixes anything functionality wise of the affected platforms. 
On the contrary, it breaks platforms without warning.

> 
> Backporting it to stable is what is causing most of the issues today,
> combined with a number of broken DT descriptions. So i would be happy
> for stable to get a patch which looks at the strapping, sees ID is
> enabled via strapping, warns the DT blob is FUBAR, and then ignores
> the requested PHY-mode. That gives developers time to fix their broken
> DT.

(Ard replied on this while I was typing)

> 
> 	  Andrew

Cheers
/Ilias

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-29 14:42                                             ` Ard Biesheuvel
@ 2020-10-29 14:50                                               ` Andrew Lunn
  0 siblings, 0 replies; 41+ messages in thread
From: Andrew Lunn @ 2020-10-29 14:50 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ilias Apalodimas, Daniel Thompson, Sumit Garg, Alex Bennée,
	Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima

> IIRC there is no public documentation for this PHY, right? So most
> people that are affected by this are not actually able to implement
> this workaround, even if they wanted to.

The information you need is in the mailing list, where a realtek
person describes what the bits do.

       Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-10-29 14:46                                             ` Ilias Apalodimas
@ 2020-11-05 17:31                                               ` Jernej Škrabec
  2020-11-13 13:50                                                 ` Arnd Bergmann
  0 siblings, 1 reply; 41+ messages in thread
From: Jernej Škrabec @ 2020-11-05 17:31 UTC (permalink / raw)
  To: Andrew Lunn, netdev
  Cc: Ard Biesheuvel, Daniel Thompson, Sumit Garg, Alex Bennée,
	Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

Hi!

Dne četrtek, 29. oktober 2020 ob 15:46:44 CET je Ilias Apalodimas napisal(a):
> On Thu, Oct 29, 2020 at 03:39:34PM +0100, Andrew Lunn wrote:
> > > What about reverting the realtek PHY commit from stable?
> > > As Ard said it doesn't really fix anything (usage wise) and causes a 
bunch of
> > > problems.
> > > 
> > > If I understand correctly we have 3 options:
> > > 1. 'Hack' the  drivers in stable to fix it (and most of those hacks will 
take 
> > >    a long time to remove)
> > > 2. Update DTE of all affected devices, backport it to stable and force 
users to
> > > update
> > > 3. Revert the PHY commit
> > > 
> > > imho [3] is the least painful solution.
> > 
> > The PHY commit is correct, in that it fixes a bug. So i don't want to
> > remove it.
> 
> Yea I meant revert the commit from were ever it was backported, not on 
current 
> upstream. I agree it's correct from a coding point of view, but it never 
> actually fixes anything functionality wise of the affected platforms. 

Sadly, there is one board - Pine64 Plus - where HW settings are wrong and it 
actually needs SW override. Until this Realtek PHY driver fix was merged, it 
was unclear what magic value provided by Realtek to board manufacturer does.

Reference:
https://lore.kernel.org/netdev/20191001082912.12905-3-icenowy@aosc.io/

Best regards,
Jernej

> On the contrary, it breaks platforms without warning.
> 
> > 
> > Backporting it to stable is what is causing most of the issues today,
> > combined with a number of broken DT descriptions. So i would be happy
> > for stable to get a patch which looks at the strapping, sees ID is
> > enabled via strapping, warns the DT blob is FUBAR, and then ignores
> > the requested PHY-mode. That gives developers time to fix their broken
> > DT.
> 
> (Ard replied on this while I was typing)
> 
> > 
> > 	  Andrew
> 
> Cheers
> /Ilias
> 
> 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-05 17:31                                               ` Jernej Škrabec
@ 2020-11-13 13:50                                                 ` Arnd Bergmann
  2020-11-13 14:44                                                   ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Arnd Bergmann @ 2020-11-13 13:50 UTC (permalink / raw)
  To: Jernej Škrabec
  Cc: Andrew Lunn, Ard Biesheuvel, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

On Thu, Nov 5, 2020 at 6:28 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
> Dne četrtek, 29. oktober 2020 ob 15:46:44 CET je Ilias Apalodimas napisal(a):
> > On Thu, Oct 29, 2020 at 03:39:34PM +0100, Andrew Lunn wrote:
> > > > What about reverting the realtek PHY commit from stable?
> > > > As Ard said it doesn't really fix anything (usage wise) and causes a
> bunch of
> > > > problems.
> > > >
> > > > If I understand correctly we have 3 options:
> > > > 1. 'Hack' the  drivers in stable to fix it (and most of those hacks will
> take
> > > >    a long time to remove)
> > > > 2. Update DTE of all affected devices, backport it to stable and force
> users to
> > > > update
> > > > 3. Revert the PHY commit
> > > >
> > > > imho [3] is the least painful solution.

There is also the option of patching the dtb in memory while booting one
of the affected machines.

> > > The PHY commit is correct, in that it fixes a bug. So i don't want to
> > > remove it.
> >
> > Yea I meant revert the commit from were ever it was backported, not on
> current
> > upstream. I agree it's correct from a coding point of view, but it never
> > actually fixes anything functionality wise of the affected platforms.
>
> Sadly, there is one board - Pine64 Plus - where HW settings are wrong and it
> actually needs SW override. Until this Realtek PHY driver fix was merged, it
> was unclear what magic value provided by Realtek to board manufacturer does.
>
> Reference:
> https://lore.kernel.org/netdev/20191001082912.12905-3-icenowy@aosc.io/

I have merged the fixes from the allwinner tree now, but I still think we
need something better than this, as the current state breaks any existing
dtb file that has the incorrect values, and this really should not have been
considered for backporting to stable kernels.

       Arnd

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-13 13:50                                                 ` Arnd Bergmann
@ 2020-11-13 14:44                                                   ` Andrew Lunn
  2020-11-13 15:33                                                     ` Arnd Bergmann
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-11-13 14:44 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jernej Škrabec, Ard Biesheuvel, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

> > Sadly, there is one board - Pine64 Plus - where HW settings are wrong and it
> > actually needs SW override. Until this Realtek PHY driver fix was merged, it
> > was unclear what magic value provided by Realtek to board manufacturer does.
> >
> > Reference:
> > https://lore.kernel.org/netdev/20191001082912.12905-3-icenowy@aosc.io/
> 
> I have merged the fixes from the allwinner tree now, but I still think we
> need something better than this, as the current state breaks any existing
> dtb file that has the incorrect values, and this really should not have been
> considered for backporting to stable kernels.

Hi Arnd

This PHY driver bug hiding DT bug is always hard to handle. We have
been though it once before with the Atheros PHY. All the buggy DT
files were fixed in about one cycle.

Now that we know there is a board which really does want rgmii when it
says rgmii, we cannot simply ignore it in the PHY driver.

But the whole story is muddy because of the backport to stable.  It
might make sense to revert the stable change, and just leave HEAD
broken. That then gives people more time to fix DT blobs. But we have
to consider Pine64 Plus, are we happy breaking that for a while, when
it is actually doing everything correct, and is bug free?

	 Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-13 14:44                                                   ` Andrew Lunn
@ 2020-11-13 15:33                                                     ` Arnd Bergmann
  2020-11-13 16:56                                                       ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Arnd Bergmann @ 2020-11-13 15:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jernej Škrabec, Ard Biesheuvel, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

On Fri, Nov 13, 2020 at 3:45 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > > Sadly, there is one board - Pine64 Plus - where HW settings are wrong and it
> > > actually needs SW override. Until this Realtek PHY driver fix was merged, it
> > > was unclear what magic value provided by Realtek to board manufacturer does.
> > >
> > > Reference:
> > > https://lore.kernel.org/netdev/20191001082912.12905-3-icenowy@aosc.io/
> >
> > I have merged the fixes from the allwinner tree now, but I still think we
> > need something better than this, as the current state breaks any existing
> > dtb file that has the incorrect values, and this really should not have been
> > considered for backporting to stable kernels.
>
> Hi Arnd
>
> This PHY driver bug hiding DT bug is always hard to handle. We have
> been though it once before with the Atheros PHY. All the buggy DT
> files were fixed in about one cycle.

Do you have a link to the problem for the Atheros PHY?

I'm generally skeptical about the idea of being able to fix all DTBs,
some of the problems with that being:

- There is no way to identify which of of the 2019 dts files in the
  kernel actually have this particular phy, because it does not
  have a device node in the dt. Looking only at files that set
  phy-mode="rgmii" limits it to 235 files, but that is still more than
  anyone can test.

- if there was a way to automate identifying the dts files that
  need to be modified, we should also be able to do it at runtime

- I really want to encourage stable dtb files that users can ship
  with their boards in the same way that we treat firmware
  on PCs and classic Open Firmware machines as stable.
  Updating a kernel should never require updating the firmware
  first.

- The model advocated by EBBR [1] actually requires this in
  order to deal with distros updating kernels. When you use a
  distro with grub (or similar) as the bootloader, you can choose
  between multiple kernels through a boot menu, but a single
  DTB is provided by the firmware to the bootloader.
  There is an ongoing discussion on how to deal with picking
  the right DTB for a kernel being picked by the boot loader, while
  also having the firmware modify an arbitrary set of properties
  (memory size, boot arguments, mac address, random seed,
  ...) in there, but it would be best to just not have to rely on that
  kind of mechanism.

- Not every custom board should need to have its dts file
  in the kernel. I'm less concerned about incorrect properties
  in out-of-tree dts files, but would strongly discourage
  other incompatible binding changes.

> Now that we know there is a board which really does want rgmii when it
> says rgmii, we cannot simply ignore it in the PHY driver.
>
> But the whole story is muddy because of the backport to stable.  It
> might make sense to revert the stable change, and just leave HEAD
> broken. That then gives people more time to fix DT blobs. But we have
> to consider Pine64 Plus, are we happy breaking that for a while, when
> it is actually doing everything correct, and is bug free?

I agree this makes the problem harder, but I have still hope that
we can come up with a code solution that can deal with this
one board that needs to have the correct settings applied as well
as the others on which we have traditionally ignored them.

As I understand it so far, the reason this board needs a different
setting is that the strapping pins are wired incorrectly, while all
other boards set them right and work correctly by default. I would
much prefer a way to identify this behavior in dts and have the phy
driver just warn when it finds a mismatch between the internal
delay setting in DT and the strapping pins but keep using the
setting from the strapping pins when there is a conflict.

There are surely other ways to achieve the same state of having
it all working without requiring the fixed dts.

     Arnd

[1] https://arm-software.github.io/ebbr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-13 15:33                                                     ` Arnd Bergmann
@ 2020-11-13 16:56                                                       ` Andrew Lunn
  2020-11-13 21:26                                                         ` Arnd Bergmann
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-11-13 16:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jernej Škrabec, Ard Biesheuvel, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

> > Hi Arnd
> >
> > This PHY driver bug hiding DT bug is always hard to handle. We have
> > been though it once before with the Atheros PHY. All the buggy DT
> > files were fixed in about one cycle.
> 
> Do you have a link to the problem for the Atheros PHY?

commit cd28d1d6e52e740130745429b3ff0af7cbba7b2c
Author: Vinod Koul <vkoul@kernel.org>
Date:   Mon Jan 21 14:43:17 2019 +0530

    net: phy: at803x: Disable phy delay for RGMII mode
    
    For RGMII mode, phy delay should be disabled. Add this case along
    with disable delay routines.
    
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

and

commit 6d4cd041f0af5b4c8fc742b4a68eac22e420e28c
Author: Vinod Koul <vkoul@kernel.org>
Date:   Thu Feb 21 15:53:15 2019 +0530

    net: phy: at803x: disable delay only for RGMII mode
    
    Per "Documentation/devicetree/bindings/net/ethernet.txt" RGMII mode
    should not have delay in PHY whereas RGMII_ID and RGMII_RXID/RGMII_TXID
    can have delay in PHY.
    
    So disable the delay only for RGMII mode and enable for other modes.
    Also treat the default case as disabled delays.
    
    Fixes: cd28d1d6e52e: ("net: phy: at803x: Disable phy delay for RGMII mode")

Looking at the git history, it seems like it also took two attempts to
get it working correctly, but the time between the two patches was
much shorted for the atheros PHY.

You will find DT patches converting rgmii to rgmii-id started soon
afterwards.

> I'm generally skeptical about the idea of being able to fix all DTBs,
> some of the problems with that being:
> 
> - There is no way to identify which of of the 2019 dts files in the
>   kernel actually have this particular phy, because it does not
>   have a device node in the dt. Looking only at files that set
>   phy-mode="rgmii" limits it to 235 files, but that is still more than
>   anyone can test.

You can narrow it down a bit. The rtl8211e was added
2014-06-10. Anything older than that, is unlikely to be a problem.
And you can ignore marvell, broadcom, etc boards. They are unlikely to
use a realtek PHY.

But i agree, we cannot test them all. We probably need to look at what
boards we know are broken, and get siblings tested.

> - if there was a way to automate identifying the dts files that
>   need to be modified, we should also be able to do it at runtime

We can get a hint, that there might be a problem, but we can get false
positives. These DT blobs are broken because they rely on strapping
resisters to put the PHY into the correct RGMII mode. We can read
these strapping resistors and compare them against what the software
is asking for. If they differ, it could be the DT blob is buggy. But
there are cases where the DT blob is correct, the strapping is wrong,
eg Pine64 Plus. It is doing everything correctly in DT.

> I agree this makes the problem harder, but I have still hope that
> we can come up with a code solution that can deal with this
> one board that needs to have the correct settings applied as well
> as the others on which we have traditionally ignored them.
> 
> As I understand it so far, the reason this board needs a different
> setting is that the strapping pins are wired incorrectly, while all
> other boards set them right and work correctly by default. I would
> much prefer a way to identify this behavior in dts and have the phy
> driver just warn when it finds a mismatch between the internal
> delay setting in DT and the strapping pins but keep using the
> setting from the strapping pins when there is a conflict.

So what you are suggesting is that the pine board, and any other board
which comes along in the future using this PHY which really wants
RGMII, needs a boolean DT property:

"realtek,IRealyDoWantRGMII_IAmNotBroken"

in the PHY node?

And if it is missing, we ignore when the MAC asks for RGMII and
actually do RGMII_ID?

We might also need to talk to the FreeBSD folks.

https://reviews.freebsd.org/D13591

Do we need to ask them to be bug compatible to Linux? Are the same DT
file being used?

That still leaves ACPI systems. Do we want to stuff this DT property
into an ACPI table? That seems to go against what ACPI people say
saying, ACPI is not DT with an extra wrapper around it.

   Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-13 16:56                                                       ` Andrew Lunn
@ 2020-11-13 21:26                                                         ` Arnd Bergmann
  2020-11-13 22:43                                                           ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Arnd Bergmann @ 2020-11-13 21:26 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jernej Škrabec, Ard Biesheuvel, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

On Fri, Nov 13, 2020 at 5:57 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > > Hi Arnd
> > >
> > > This PHY driver bug hiding DT bug is always hard to handle. We have
> > > been though it once before with the Atheros PHY. All the buggy DT
> > > files were fixed in about one cycle.
> >
> > Do you have a link to the problem for the Atheros PHY?
>
> commit cd28d1d6e52e740130745429b3ff0af7cbba7b2c
> Author: Vinod Koul <vkoul@kernel.org>
> Date:   Mon Jan 21 14:43:17 2019 +0530
>
>     net: phy: at803x: Disable phy delay for RGMII mode
>
>     For RGMII mode, phy delay should be disabled. Add this case along
>     with disable delay routines.
>
>     Signed-off-by: Vinod Koul <vkoul@kernel.org>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> and
>
> commit 6d4cd041f0af5b4c8fc742b4a68eac22e420e28c
> Author: Vinod Koul <vkoul@kernel.org>
> Date:   Thu Feb 21 15:53:15 2019 +0530
>
>     net: phy: at803x: disable delay only for RGMII mode
>
>     Per "Documentation/devicetree/bindings/net/ethernet.txt" RGMII mode
>     should not have delay in PHY whereas RGMII_ID and RGMII_RXID/RGMII_TXID
>     can have delay in PHY.
>
>     So disable the delay only for RGMII mode and enable for other modes.
>     Also treat the default case as disabled delays.
>
>     Fixes: cd28d1d6e52e: ("net: phy: at803x: Disable phy delay for RGMII mode")
>
> Looking at the git history, it seems like it also took two attempts to
> get it working correctly, but the time between the two patches was
> much shorted for the atheros PHY.

ok, thanks.

> > I agree this makes the problem harder, but I have still hope that
> > we can come up with a code solution that can deal with this
> > one board that needs to have the correct settings applied as well
> > as the others on which we have traditionally ignored them.
> >
> > As I understand it so far, the reason this board needs a different
> > setting is that the strapping pins are wired incorrectly, while all
> > other boards set them right and work correctly by default. I would
> > much prefer a way to identify this behavior in dts and have the phy
> > driver just warn when it finds a mismatch between the internal
> > delay setting in DT and the strapping pins but keep using the
> > setting from the strapping pins when there is a conflict.
>
> So what you are suggesting is that the pine board, and any other board
> which comes along in the future using this PHY which really wants
> RGMII, needs a boolean DT property:
>
> "realtek,IRealyDoWantRGMII_IAmNotBroken"
>
> in the PHY node?
>
> And if it is missing, we ignore when the MAC asks for RGMII and
> actually do RGMII_ID?

Something of that sort. I also see a similar patch in KSZ9031
now, see 7dd8f0ba88fc ("arm: dts: imx6qdl-udoo: fix rgmii phy-mode
for ksz9031 phy")

As this exact mismatch between rgmii and rgmii-id mode is apparently
a more widespread problem, the best workaround I can think of
is that we redefine the phy-mode="rgmii" property to actually mean
"use rgmii mode and let the phy driver decide the delay configuration",
with a new string to mean specifically rgmii mode with no delay.

That way, the existing behavior of all phy drivers would become
conformant, and we could change the one board that needs to
disable the internal delay to use the new property.

If we have three drivers that confused the meaning of
phy-mode="rgmii", there may still be others.

> We might also need to talk to the FreeBSD folks.
>
> https://reviews.freebsd.org/D13591
>
> Do we need to ask them to be bug compatible to Linux? Are the same DT
> file being used?

We are moving towards sharing the dts files better, yes.
In a lot of cases this does mean bug compatibility, the same
way that the DT support on Linux still has some quirks that
go back to bugs in MacOS or AIX on PowerPC.

> That still leaves ACPI systems. Do we want to stuff this DT property
> into an ACPI table? That seems to go against what ACPI people say
> saying, ACPI is not DT with an extra wrapper around it.

As far as I know, the only affected ACPI system has the same
issue and it should be handled the same way, also not requiring
a firmware update.

      Arnd

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-13 21:26                                                         ` Arnd Bergmann
@ 2020-11-13 22:43                                                           ` Andrew Lunn
  2020-11-13 22:49                                                             ` Ard Biesheuvel
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-11-13 22:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jernej Škrabec, Ard Biesheuvel, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

Hi Arnd

> Something of that sort. I also see a similar patch in KSZ9031
> now, see 7dd8f0ba88fc ("arm: dts: imx6qdl-udoo: fix rgmii phy-mode
> for ksz9031 phy")
> 
> As this exact mismatch between rgmii and rgmii-id mode is apparently
> a more widespread problem, the best workaround I can think of
> is that we redefine the phy-mode="rgmii" property to actually mean
> "use rgmii mode and let the phy driver decide the delay configuration",

The problem is, the PHY driver has no idea what the delay
configuration should be. That is the whole point of the DT property.

The MAC and the PHY have to work together to ensure one of them
inserts the delay. In most cases, the MAC driver reads the property
and passes it unmodified to the PHY. The PHY then does what it is
told. In some cases, the MAC decides to add the delay, it changes the
rgmii-id to rgmii before passing it onto the PHY. The PHY does as it
is told, and it works. And a very small number of boards simply have
longer clock lines than signal lines, so the PCB adds the delay. It is
not clearly defined how that should be described in DT, but it works
so far because most MAC drivers don't add delays, pass the 'rgmii'
from DT to the PHY and it does as it is told and does not add delays.

There is one more case, which is not used very often. The PHY is
passed the NA values, which means, don't touch, something else has set
it up. So when the straps are doing the correct thing, you could pass
NA. However, some MAC drivers look at the phy mode, see it is one of
the 4 rgmii modes, and configure their end to rgmii, instead of gmii,
mii, sgmii, etc. How networking does ACPI is still very undefined, but
i think we need to push for ACPI to pass NA, and the firmware does all
the setup. That seems to be ACPI way.

> with a new string to mean specifically rgmii mode with no delay.

As you said, we have phy-mode="rgmii" 235 times. How many of those are
going to break when you change the definition of rgmii?  I have no
idea, but my gut feeling is more than the number of boards which are
currently broken because of the problem with this PHY.

And, as i said above, some MAC drivers look for one of the 4 RGMII
modes in order to configure their side. If you add a new string, you
need to review all the MAC drivers and make sure they check for all 5
strings, not 4. Some of that is easy, modify
phy_interface_mode_is_rgmii(), but not all MAC use it, and it is no
help in a switch statement.

And we are potentially going to get into the same problem
again. History has shown, we cannot get 4 properties right. Do you
think we will do any better getting 5 properties right? Especially
when phy-mode="rgmii" does not mean rgmii, but do whatever you think
might be correct?

Having suffered the pain from the Atheros PHY, this is something i
review much more closely, so hopefully we are getting better at
this. But PHY drivers live for a long time, ksz9031 was added 7 years
ago, well before we started looking closely at delays. I expect more
similar problems will keep being found over the next decade.

To some extent, we actually need DT writers to properly test their
DT. If both rgmii and rgmii-id works, there is a 50% chance whatever
they pick is wrong. And it would be nice if they told the networking
people so we can fix the PHY.

       Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-13 22:43                                                           ` Andrew Lunn
@ 2020-11-13 22:49                                                             ` Ard Biesheuvel
  2020-11-14  0:40                                                               ` Andrew Lunn
  0 siblings, 1 reply; 41+ messages in thread
From: Ard Biesheuvel @ 2020-11-13 22:49 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Arnd Bergmann, Jernej Škrabec, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

On Fri, 13 Nov 2020 at 23:43, Andrew Lunn <andrew@lunn.ch> wrote:
>
> Hi Arnd
>
> > Something of that sort. I also see a similar patch in KSZ9031
> > now, see 7dd8f0ba88fc ("arm: dts: imx6qdl-udoo: fix rgmii phy-mode
> > for ksz9031 phy")
> >
> > As this exact mismatch between rgmii and rgmii-id mode is apparently
> > a more widespread problem, the best workaround I can think of
> > is that we redefine the phy-mode="rgmii" property to actually mean
> > "use rgmii mode and let the phy driver decide the delay configuration",
>
> The problem is, the PHY driver has no idea what the delay
> configuration should be. That is the whole point of the DT property.
>
> The MAC and the PHY have to work together to ensure one of them
> inserts the delay. In most cases, the MAC driver reads the property
> and passes it unmodified to the PHY. The PHY then does what it is
> told. In some cases, the MAC decides to add the delay, it changes the
> rgmii-id to rgmii before passing it onto the PHY. The PHY does as it
> is told, and it works. And a very small number of boards simply have
> longer clock lines than signal lines, so the PCB adds the delay. It is
> not clearly defined how that should be described in DT, but it works
> so far because most MAC drivers don't add delays, pass the 'rgmii'
> from DT to the PHY and it does as it is told and does not add delays.
>
> There is one more case, which is not used very often. The PHY is
> passed the NA values, which means, don't touch, something else has set
> it up. So when the straps are doing the correct thing, you could pass
> NA. However, some MAC drivers look at the phy mode, see it is one of
> the 4 rgmii modes, and configure their end to rgmii, instead of gmii,
> mii, sgmii, etc. How networking does ACPI is still very undefined, but
> i think we need to push for ACPI to pass NA, and the firmware does all
> the setup. That seems to be ACPI way.
>
> > with a new string to mean specifically rgmii mode with no delay.
>
> As you said, we have phy-mode="rgmii" 235 times. How many of those are
> going to break when you change the definition of rgmii?  I have no
> idea, but my gut feeling is more than the number of boards which are
> currently broken because of the problem with this PHY.
>
> And, as i said above, some MAC drivers look for one of the 4 RGMII
> modes in order to configure their side. If you add a new string, you
> need to review all the MAC drivers and make sure they check for all 5
> strings, not 4. Some of that is easy, modify
> phy_interface_mode_is_rgmii(), but not all MAC use it, and it is no
> help in a switch statement.
>
> And we are potentially going to get into the same problem
> again. History has shown, we cannot get 4 properties right. Do you
> think we will do any better getting 5 properties right? Especially
> when phy-mode="rgmii" does not mean rgmii, but do whatever you think
> might be correct?
>
> Having suffered the pain from the Atheros PHY, this is something i
> review much more closely, so hopefully we are getting better at
> this. But PHY drivers live for a long time, ksz9031 was added 7 years
> ago, well before we started looking closely at delays. I expect more
> similar problems will keep being found over the next decade.
>
> To some extent, we actually need DT writers to properly test their
> DT. If both rgmii and rgmii-id works, there is a 50% chance whatever
> they pick is wrong. And it would be nice if they told the networking
> people so we can fix the PHY.
>

One question that still has not been answered is how many actual
platforms were fixed by backporting Realtek's follow up fix to
-stable. My suspicion is none. That by itself should be enough
justification to revert the backport of that change.

I do agree that we should fix this properly going forward, and if we
do manage to fix this in a backwards compatible way, we should
backport that fix. But letting the current situation exist because
nobody can be bothered to fix it properly is not the right solution
IMHO.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-13 22:49                                                             ` Ard Biesheuvel
@ 2020-11-14  0:40                                                               ` Andrew Lunn
  2020-11-14 10:09                                                                 ` Ard Biesheuvel
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Lunn @ 2020-11-14  0:40 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arnd Bergmann, Jernej Škrabec, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

> One question that still has not been answered is how many actual
> platforms were fixed by backporting Realtek's follow up fix to
> -stable. My suspicion is none. That by itself should be enough
> justification to revert the backport of that change.
 
I think i've already said that would be a good idea. It makes the
problem less critical. But the problem is still there, we are just
kicking the can down the road. I've not seen much activity actually
fixing the broken DT. So i suspect when we catch up with the can, we
will mostly still be in the same place. Actually, maybe worse, because
broken DTs have been copy/pasted for new boards?

	Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: realtek PHY commit bbc4d71d63549 causes regression
  2020-11-14  0:40                                                               ` Andrew Lunn
@ 2020-11-14 10:09                                                                 ` Ard Biesheuvel
  0 siblings, 0 replies; 41+ messages in thread
From: Ard Biesheuvel @ 2020-11-14 10:09 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Arnd Bergmann, Jernej Škrabec, Daniel Thompson, Sumit Garg,
	Alex Bennée, Masami Hiramatsu, Steve McIntyre,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Willy Liu, David S. Miller, Jakub Kicinski, Sasha Levin,
	Florian Fainelli, Heiner Kallweit, Masahisa Kojima,
	Ilias Apalodimas

On Sat, 14 Nov 2020 at 01:40, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > One question that still has not been answered is how many actual
> > platforms were fixed by backporting Realtek's follow up fix to
> > -stable. My suspicion is none. That by itself should be enough
> > justification to revert the backport of that change.
>
> I think i've already said that would be a good idea. It makes the
> problem less critical. But the problem is still there, we are just
> kicking the can down the road. I've not seen much activity actually
> fixing the broken DT. So i suspect when we catch up with the can, we
> will mostly still be in the same place. Actually, maybe worse, because
> broken DTs have been copy/pasted for new boards?
>

I don't see how that matters. If the new board ships with a stable
kernel, things should simply work as they did before. If the new board
ships with a new kernel, things won't work in the first place, so it
is unlikely to cause a regression in the field.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2020-11-14 10:10 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-17 14:20 realtek PHY commit bbc4d71d63549 causes regression Ard Biesheuvel
2020-10-17 14:44 ` Andrew Lunn
2020-10-17 14:46   ` Ard Biesheuvel
2020-10-17 15:11     ` Andrew Lunn
2020-10-17 15:18       ` Ard Biesheuvel
2020-10-17 16:14         ` Ilias Apalodimas
2020-10-17 16:17           ` Ard Biesheuvel
2020-10-17 16:21             ` Ilias Apalodimas
2020-10-17 18:04             ` Andrew Lunn
2020-10-17 18:11               ` Ard Biesheuvel
2020-10-17 18:17                 ` Ard Biesheuvel
2020-10-17 18:27                 ` Andrew Lunn
2020-10-17 18:55                   ` Ard Biesheuvel
2020-10-17 19:49                     ` Andrew Lunn
2020-10-17 22:19                       ` Ard Biesheuvel
2020-10-17 23:02                         ` Andrew Lunn
2020-10-18 10:35                           ` Ard Biesheuvel
2020-10-18 10:56                             ` Ard Biesheuvel
2020-10-18 15:45                               ` Andrew Lunn
2020-10-25 14:16                                 ` Ard Biesheuvel
2020-10-25 14:28                                   ` Andrew Lunn
2020-10-25 14:34                                     ` Ard Biesheuvel
2020-10-25 14:42                                       ` Andrew Lunn
2020-10-29 14:21                                         ` Ilias Apalodimas
2020-10-29 14:39                                           ` Andrew Lunn
2020-10-29 14:42                                             ` Ard Biesheuvel
2020-10-29 14:50                                               ` Andrew Lunn
2020-10-29 14:46                                             ` Ilias Apalodimas
2020-11-05 17:31                                               ` Jernej Škrabec
2020-11-13 13:50                                                 ` Arnd Bergmann
2020-11-13 14:44                                                   ` Andrew Lunn
2020-11-13 15:33                                                     ` Arnd Bergmann
2020-11-13 16:56                                                       ` Andrew Lunn
2020-11-13 21:26                                                         ` Arnd Bergmann
2020-11-13 22:43                                                           ` Andrew Lunn
2020-11-13 22:49                                                             ` Ard Biesheuvel
2020-11-14  0:40                                                               ` Andrew Lunn
2020-11-14 10:09                                                                 ` Ard Biesheuvel
2020-10-18 15:38                             ` Andrew Lunn
2020-10-18 14:20                         ` Masami Hiramatsu
2020-10-17 18:01           ` Andrew Lunn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).