linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Issue with RTL8111 NIC after upgrade to kernel 4.19
@ 2018-11-21 10:51 Norbert Jurkeit
  2018-11-21 19:57 ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Norbert Jurkeit @ 2018-11-21 10:51 UTC (permalink / raw)
  To: nic_swsd, davem, netdev, linux-kernel
  Cc: michael.wiktowy, jcline, marc.c.dionne

Dear Linux developers,

after upgrade from distro kernel 4.18.18 to 4.19.2 some Fedora users 
(including me) encounter issues with RTL8111 based Ethernet interfaces. 
The interface is not started correctly after boot with kernel 4.19.2 
(i.e. is reported by ifconfig as up, but not running, the LEDs besides 
the connector stay off and no data is transferred).

At least for me the issue seems to occur only after reboot from a 
previous session with kernel 4.18.18, but not after power-on, which 
means the hardware is not fully initialized by the new r8169 driver. 
Marc Dionne, another affected Fedora user, thinks that module "realtek" 
needs to be loaded, which does not happen automatically. Please see 
https://bugzilla.redhat.com/show_bug.cgi?id=1650984 for more details.

As Fedora kernel maintainers usually don't apply many changes to the 
upstream kernel, users of other distros will likely be affected as well. 
We therefore kindly ask you to investigate the issue and fix it in 
future upstream kernels.

Many thanks and best regards,

Norbert Jurkeit


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 10:51 Issue with RTL8111 NIC after upgrade to kernel 4.19 Norbert Jurkeit
@ 2018-11-21 19:57 ` Heiner Kallweit
  2018-11-21 20:20   ` Andrew Lunn
  0 siblings, 1 reply; 14+ messages in thread
From: Heiner Kallweit @ 2018-11-21 19:57 UTC (permalink / raw)
  To: Norbert Jurkeit, nic_swsd, Andrew Lunn, Florian Fainelli
  Cc: davem, netdev, linux-kernel, michael.wiktowy, jcline, marc.c.dionne

On 21.11.2018 11:51, Norbert Jurkeit wrote:
> Dear Linux developers,
> 
> after upgrade from distro kernel 4.18.18 to 4.19.2 some Fedora users (including me) encounter issues with RTL8111 based Ethernet interfaces. The interface is not started correctly after boot with kernel 4.19.2 (i.e. is reported by ifconfig as up, but not running, the LEDs besides the connector stay off and no data is transferred).
> 
> At least for me the issue seems to occur only after reboot from a previous session with kernel 4.18.18, but not after power-on, which means the hardware is not fully initialized by the new r8169 driver. Marc Dionne, another affected Fedora user, thinks that module "realtek" needs to be loaded, which does not happen automatically. Please see https://bugzilla.redhat.com/show_bug.cgi?id=1650984 for more details.
> 

Thanks for reporting the issue. I can't reproduce the error on my
system and obviously it affects only some users, so it seems to be
chip-version- or system-dependent. Having to load the realtek PHY
driver module manually is weird as this should be done by
request_module() in phy_device_create().

request_module() is supposed to be synchronous, however after some
reading this may not be 100% guaranteed. Maybe the module init
function on some systems isn't finished yet when request_module()
returns. As a result the genphy driver may be used instead of
the PHY version-specific driver.

That the issue occurs after the switch to 4.19 isn't a big surprise
because with this version the r8169 driver switched to phylib.

First I'd like to get an overview of the affected systems. I'll
comment in the linked bug ticket to address the affected users.


> As Fedora kernel maintainers usually don't apply many changes to the upstream kernel, users of other distros will likely be affected as well. We therefore kindly ask you to investigate the issue and fix it in future upstream kernels.
> 
> Many thanks and best regards,
> 
> Norbert Jurkeit
> 
> 
Heiner

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 19:57 ` Heiner Kallweit
@ 2018-11-21 20:20   ` Andrew Lunn
  2018-11-21 20:32     ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Lunn @ 2018-11-21 20:20 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Norbert Jurkeit, nic_swsd, Florian Fainelli, davem, netdev,
	linux-kernel, michael.wiktowy, jcline, marc.c.dionne

> request_module() is supposed to be synchronous, however after some
> reading this may not be 100% guaranteed. Maybe the module init
> function on some systems isn't finished yet when request_module()
> returns. As a result the genphy driver may be used instead of
> the PHY version-specific driver.

Hi Heiner

That would be true for all PHYs i think. We would of noticed this
problem with other systems using other PHY drivers.

	Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 20:20   ` Andrew Lunn
@ 2018-11-21 20:32     ` Heiner Kallweit
  2018-11-21 20:49       ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Heiner Kallweit @ 2018-11-21 20:32 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Norbert Jurkeit, nic_swsd, Florian Fainelli, davem, netdev,
	linux-kernel, michael.wiktowy, jcline, marc.c.dionne

On 21.11.2018 21:20, Andrew Lunn wrote:
>> request_module() is supposed to be synchronous, however after some
>> reading this may not be 100% guaranteed. Maybe the module init
>> function on some systems isn't finished yet when request_module()
>> returns. As a result the genphy driver may be used instead of
>> the PHY version-specific driver.
> 
> Hi Heiner
> 
> That would be true for all PHYs i think. We would of noticed this
> problem with other systems using other PHY drivers.
> 
> 	Andrew
> 
It could be a timing issue affecting certain systems only. At least
for now I don't have a good explanation why loading the module via
request_module() and loading it upfront manually makes a difference.

One affected user just reported the PHY to be a RTL8211B. This is
what I expected, because this PHY crashes when writing to the MMD
registers (the MMD registers are used otherwise by this PHY).
See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy
stubs for MMD register access for rtl8211b").

Let's see whether the other affected systems use the same PHY
version.

Heiner


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 20:32     ` Heiner Kallweit
@ 2018-11-21 20:49       ` Heiner Kallweit
  2018-11-21 20:52         ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Heiner Kallweit @ 2018-11-21 20:49 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Norbert Jurkeit, nic_swsd, Florian Fainelli, davem, netdev,
	linux-kernel, michael.wiktowy, jcline, marc.c.dionne

On 21.11.2018 21:32, Heiner Kallweit wrote:
> On 21.11.2018 21:20, Andrew Lunn wrote:
>>> request_module() is supposed to be synchronous, however after some
>>> reading this may not be 100% guaranteed. Maybe the module init
>>> function on some systems isn't finished yet when request_module()
>>> returns. As a result the genphy driver may be used instead of
>>> the PHY version-specific driver.
>>
>> Hi Heiner
>>
>> That would be true for all PHYs i think. We would of noticed this
>> problem with other systems using other PHY drivers.
>>
>> 	Andrew
>>
> It could be a timing issue affecting certain systems only. At least
> for now I don't have a good explanation why loading the module via
> request_module() and loading it upfront manually makes a difference.
> 
> One affected user just reported the PHY to be a RTL8211B. This is
> what I expected, because this PHY crashes when writing to the MMD
> registers (the MMD registers are used otherwise by this PHY).
> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy
> stubs for MMD register access for rtl8211b").
> 
> Let's see whether the other affected systems use the same PHY
> version.
> 
Next report is also about a RTL8211B and as I assumed:
- W/o manually loading the realtek module the genphy driver is used
  and network fails.
- W/ manually loading the realtek module the proper RTL8211B PHY
  driver is used and network works.

So it seems that even after request_module() the PHY driver isn't
yet available when device and driver are matched.

If further reports support this (pre-)analysis, then indeed it
seems to be a timing issue and a proper fix most likely is
difficult. As a workaround I could imagine to add a delay loop
after request_module() checking for a Realtek PHY driver via
driver_find(). When adding one small delay after this we should
be sufficiently sure that all Realtek PHY drivers are registered.

> Heiner
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 20:49       ` Heiner Kallweit
@ 2018-11-21 20:52         ` Heiner Kallweit
  2018-11-21 21:53           ` Marc Dionne
  0 siblings, 1 reply; 14+ messages in thread
From: Heiner Kallweit @ 2018-11-21 20:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Norbert Jurkeit, nic_swsd, Florian Fainelli, davem, netdev,
	linux-kernel, michael.wiktowy, jcline, marc.c.dionne

On 21.11.2018 21:49, Heiner Kallweit wrote:
> On 21.11.2018 21:32, Heiner Kallweit wrote:
>> On 21.11.2018 21:20, Andrew Lunn wrote:
>>>> request_module() is supposed to be synchronous, however after some
>>>> reading this may not be 100% guaranteed. Maybe the module init
>>>> function on some systems isn't finished yet when request_module()
>>>> returns. As a result the genphy driver may be used instead of
>>>> the PHY version-specific driver.
>>>
>>> Hi Heiner
>>>
>>> That would be true for all PHYs i think. We would of noticed this
>>> problem with other systems using other PHY drivers.
>>>
>>> 	Andrew
>>>
>> It could be a timing issue affecting certain systems only. At least
>> for now I don't have a good explanation why loading the module via
>> request_module() and loading it upfront manually makes a difference.
>>
>> One affected user just reported the PHY to be a RTL8211B. This is
>> what I expected, because this PHY crashes when writing to the MMD
>> registers (the MMD registers are used otherwise by this PHY).
>> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy
>> stubs for MMD register access for rtl8211b").
>>
>> Let's see whether the other affected systems use the same PHY
>> version.
>>
> Next report is also about a RTL8211B and as I assumed:
> - W/o manually loading the realtek module the genphy driver is used
>   and network fails.
> - W/ manually loading the realtek module the proper RTL8211B PHY
>   driver is used and network works.
> 
> So it seems that even after request_module() the PHY driver isn't
> yet available when device and driver are matched.
> 
> If further reports support this (pre-)analysis, then indeed it
> seems to be a timing issue and a proper fix most likely is
> difficult. As a workaround I could imagine to add a delay loop
> after request_module() checking for a Realtek PHY driver via
> driver_find(). When adding one small delay after this we should
> be sufficiently sure that all Realtek PHY drivers are registered.
> 
Uups, no. We talk about phylib here, not about the r8169 driver.
So we need a different solution.

>> Heiner
>>
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 20:52         ` Heiner Kallweit
@ 2018-11-21 21:53           ` Marc Dionne
  2018-11-21 22:28             ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Dionne @ 2018-11-21 21:53 UTC (permalink / raw)
  To: hkallweit1
  Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli,
	David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy,
	jcline

On Wed, Nov 21, 2018 at 4:52 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 21.11.2018 21:49, Heiner Kallweit wrote:
> > On 21.11.2018 21:32, Heiner Kallweit wrote:
> >> On 21.11.2018 21:20, Andrew Lunn wrote:
> >>>> request_module() is supposed to be synchronous, however after some
> >>>> reading this may not be 100% guaranteed. Maybe the module init
> >>>> function on some systems isn't finished yet when request_module()
> >>>> returns. As a result the genphy driver may be used instead of
> >>>> the PHY version-specific driver.
> >>>
> >>> Hi Heiner
> >>>
> >>> That would be true for all PHYs i think. We would of noticed this
> >>> problem with other systems using other PHY drivers.
> >>>
> >>>     Andrew
> >>>
> >> It could be a timing issue affecting certain systems only. At least
> >> for now I don't have a good explanation why loading the module via
> >> request_module() and loading it upfront manually makes a difference.
> >>
> >> One affected user just reported the PHY to be a RTL8211B. This is
> >> what I expected, because this PHY crashes when writing to the MMD
> >> registers (the MMD registers are used otherwise by this PHY).
> >> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy
> >> stubs for MMD register access for rtl8211b").
> >>
> >> Let's see whether the other affected systems use the same PHY
> >> version.
> >>
> > Next report is also about a RTL8211B and as I assumed:
> > - W/o manually loading the realtek module the genphy driver is used
> >   and network fails.
> > - W/ manually loading the realtek module the proper RTL8211B PHY
> >   driver is used and network works.
> >
> > So it seems that even after request_module() the PHY driver isn't
> > yet available when device and driver are matched.
> >
> > If further reports support this (pre-)analysis, then indeed it
> > seems to be a timing issue and a proper fix most likely is
> > difficult. As a workaround I could imagine to add a delay loop
> > after request_module() checking for a Realtek PHY driver via
> > driver_find(). When adding one small delay after this we should
> > be sufficiently sure that all Realtek PHY drivers are registered.
> >
> Uups, no. We talk about phylib here, not about the r8169 driver.
> So we need a different solution.
>
> >> Heiner

Thanks for the explanation, better than my crude attempt at
understanding what was going on.

If you have any proposed fixes or diagnostic patches based on current
mainline I can quickly compile and test them here on an affected
system.  It doesn't fail consistently for me (as others have
reported), but that could be because it depends on the timing.

Marc

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 21:53           ` Marc Dionne
@ 2018-11-21 22:28             ` Heiner Kallweit
  2018-11-21 23:13               ` Marc Dionne
  0 siblings, 1 reply; 14+ messages in thread
From: Heiner Kallweit @ 2018-11-21 22:28 UTC (permalink / raw)
  To: Marc Dionne
  Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli,
	David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy,
	jcline

On 21.11.2018 22:53, Marc Dionne wrote:
> On Wed, Nov 21, 2018 at 4:52 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 21.11.2018 21:49, Heiner Kallweit wrote:
>>> On 21.11.2018 21:32, Heiner Kallweit wrote:
>>>> On 21.11.2018 21:20, Andrew Lunn wrote:
>>>>>> request_module() is supposed to be synchronous, however after some
>>>>>> reading this may not be 100% guaranteed. Maybe the module init
>>>>>> function on some systems isn't finished yet when request_module()
>>>>>> returns. As a result the genphy driver may be used instead of
>>>>>> the PHY version-specific driver.
>>>>>
>>>>> Hi Heiner
>>>>>
>>>>> That would be true for all PHYs i think. We would of noticed this
>>>>> problem with other systems using other PHY drivers.
>>>>>
>>>>>     Andrew
>>>>>
>>>> It could be a timing issue affecting certain systems only. At least
>>>> for now I don't have a good explanation why loading the module via
>>>> request_module() and loading it upfront manually makes a difference.
>>>>
>>>> One affected user just reported the PHY to be a RTL8211B. This is
>>>> what I expected, because this PHY crashes when writing to the MMD
>>>> registers (the MMD registers are used otherwise by this PHY).
>>>> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy
>>>> stubs for MMD register access for rtl8211b").
>>>>
>>>> Let's see whether the other affected systems use the same PHY
>>>> version.
>>>>
>>> Next report is also about a RTL8211B and as I assumed:
>>> - W/o manually loading the realtek module the genphy driver is used
>>>   and network fails.
>>> - W/ manually loading the realtek module the proper RTL8211B PHY
>>>   driver is used and network works.
>>>
>>> So it seems that even after request_module() the PHY driver isn't
>>> yet available when device and driver are matched.
>>>
>>> If further reports support this (pre-)analysis, then indeed it
>>> seems to be a timing issue and a proper fix most likely is
>>> difficult. As a workaround I could imagine to add a delay loop
>>> after request_module() checking for a Realtek PHY driver via
>>> driver_find(). When adding one small delay after this we should
>>> be sufficiently sure that all Realtek PHY drivers are registered.
>>>
>> Uups, no. We talk about phylib here, not about the r8169 driver.
>> So we need a different solution.
>>
>>>> Heiner
> 
> Thanks for the explanation, better than my crude attempt at
> understanding what was going on.
> 
> If you have any proposed fixes or diagnostic patches based on current
> mainline I can quickly compile and test them here on an affected
> system.  It doesn't fail consistently for me (as others have
> reported), but that could be because it depends on the timing.
> 

Thanks for the offer. Can you try the following diagnostic patch
and check whether it helps?

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 55202a0ac..84f417f8b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -607,6 +607,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
         */
        request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id));

+       msleep(1000);
+
        device_initialize(&mdiodev->dev);

        return dev;


> Marc
> 


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 22:28             ` Heiner Kallweit
@ 2018-11-21 23:13               ` Marc Dionne
  2018-11-22 18:17                 ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Dionne @ 2018-11-21 23:13 UTC (permalink / raw)
  To: hkallweit1
  Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli,
	David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy,
	jcline

On Wed, Nov 21, 2018 at 6:28 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 21.11.2018 22:53, Marc Dionne wrote:
> > On Wed, Nov 21, 2018 at 4:52 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> On 21.11.2018 21:49, Heiner Kallweit wrote:
> >>> On 21.11.2018 21:32, Heiner Kallweit wrote:
> >>>> On 21.11.2018 21:20, Andrew Lunn wrote:
> >>>>>> request_module() is supposed to be synchronous, however after some
> >>>>>> reading this may not be 100% guaranteed. Maybe the module init
> >>>>>> function on some systems isn't finished yet when request_module()
> >>>>>> returns. As a result the genphy driver may be used instead of
> >>>>>> the PHY version-specific driver.
> >>>>>
> >>>>> Hi Heiner
> >>>>>
> >>>>> That would be true for all PHYs i think. We would of noticed this
> >>>>> problem with other systems using other PHY drivers.
> >>>>>
> >>>>>     Andrew
> >>>>>
> >>>> It could be a timing issue affecting certain systems only. At least
> >>>> for now I don't have a good explanation why loading the module via
> >>>> request_module() and loading it upfront manually makes a difference.
> >>>>
> >>>> One affected user just reported the PHY to be a RTL8211B. This is
> >>>> what I expected, because this PHY crashes when writing to the MMD
> >>>> registers (the MMD registers are used otherwise by this PHY).
> >>>> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy
> >>>> stubs for MMD register access for rtl8211b").
> >>>>
> >>>> Let's see whether the other affected systems use the same PHY
> >>>> version.
> >>>>
> >>> Next report is also about a RTL8211B and as I assumed:
> >>> - W/o manually loading the realtek module the genphy driver is used
> >>>   and network fails.
> >>> - W/ manually loading the realtek module the proper RTL8211B PHY
> >>>   driver is used and network works.
> >>>
> >>> So it seems that even after request_module() the PHY driver isn't
> >>> yet available when device and driver are matched.
> >>>
> >>> If further reports support this (pre-)analysis, then indeed it
> >>> seems to be a timing issue and a proper fix most likely is
> >>> difficult. As a workaround I could imagine to add a delay loop
> >>> after request_module() checking for a Realtek PHY driver via
> >>> driver_find(). When adding one small delay after this we should
> >>> be sufficiently sure that all Realtek PHY drivers are registered.
> >>>
> >> Uups, no. We talk about phylib here, not about the r8169 driver.
> >> So we need a different solution.
> >>
> >>>> Heiner
> >
> > Thanks for the explanation, better than my crude attempt at
> > understanding what was going on.
> >
> > If you have any proposed fixes or diagnostic patches based on current
> > mainline I can quickly compile and test them here on an affected
> > system.  It doesn't fail consistently for me (as others have
> > reported), but that could be because it depends on the timing.
> >
>
> Thanks for the offer. Can you try the following diagnostic patch
> and check whether it helps?
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 55202a0ac..84f417f8b 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -607,6 +607,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
>          */
>         request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id));
>
> +       msleep(1000);
> +
>         device_initialize(&mdiodev->dev);
>
>         return dev;

I was not able to get it to fail with that extra delay; hard to be
100% sure but the network starts even when switching over from the
distro kernel after a boot where network failed to start.

There's a side issue that network startup is taking a full minute
longer than it should, but that's possibly unrelated.

Marc

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-21 23:13               ` Marc Dionne
@ 2018-11-22 18:17                 ` Heiner Kallweit
  2018-11-22 18:57                   ` Andrew Lunn
  2018-11-22 19:29                   ` Marc Dionne
  0 siblings, 2 replies; 14+ messages in thread
From: Heiner Kallweit @ 2018-11-22 18:17 UTC (permalink / raw)
  To: Marc Dionne
  Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli,
	David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy,
	jcline

On 22.11.2018 00:13, Marc Dionne wrote:
> On Wed, Nov 21, 2018 at 6:28 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 21.11.2018 22:53, Marc Dionne wrote:
>>> On Wed, Nov 21, 2018 at 4:52 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> On 21.11.2018 21:49, Heiner Kallweit wrote:
>>>>> On 21.11.2018 21:32, Heiner Kallweit wrote:
>>>>>> On 21.11.2018 21:20, Andrew Lunn wrote:
>>>>>>>> request_module() is supposed to be synchronous, however after some
>>>>>>>> reading this may not be 100% guaranteed. Maybe the module init
>>>>>>>> function on some systems isn't finished yet when request_module()
>>>>>>>> returns. As a result the genphy driver may be used instead of
>>>>>>>> the PHY version-specific driver.
>>>>>>>
>>>>>>> Hi Heiner
>>>>>>>
>>>>>>> That would be true for all PHYs i think. We would of noticed this
>>>>>>> problem with other systems using other PHY drivers.
>>>>>>>
>>>>>>>     Andrew
>>>>>>>
>>>>>> It could be a timing issue affecting certain systems only. At least
>>>>>> for now I don't have a good explanation why loading the module via
>>>>>> request_module() and loading it upfront manually makes a difference.
>>>>>>
>>>>>> One affected user just reported the PHY to be a RTL8211B. This is
>>>>>> what I expected, because this PHY crashes when writing to the MMD
>>>>>> registers (the MMD registers are used otherwise by this PHY).
>>>>>> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy
>>>>>> stubs for MMD register access for rtl8211b").
>>>>>>
>>>>>> Let's see whether the other affected systems use the same PHY
>>>>>> version.
>>>>>>
>>>>> Next report is also about a RTL8211B and as I assumed:
>>>>> - W/o manually loading the realtek module the genphy driver is used
>>>>>   and network fails.
>>>>> - W/ manually loading the realtek module the proper RTL8211B PHY
>>>>>   driver is used and network works.
>>>>>
>>>>> So it seems that even after request_module() the PHY driver isn't
>>>>> yet available when device and driver are matched.
>>>>>
>>>>> If further reports support this (pre-)analysis, then indeed it
>>>>> seems to be a timing issue and a proper fix most likely is
>>>>> difficult. As a workaround I could imagine to add a delay loop
>>>>> after request_module() checking for a Realtek PHY driver via
>>>>> driver_find(). When adding one small delay after this we should
>>>>> be sufficiently sure that all Realtek PHY drivers are registered.
>>>>>
>>>> Uups, no. We talk about phylib here, not about the r8169 driver.
>>>> So we need a different solution.
>>>>
>>>>>> Heiner
>>>
>>> Thanks for the explanation, better than my crude attempt at
>>> understanding what was going on.
>>>
>>> If you have any proposed fixes or diagnostic patches based on current
>>> mainline I can quickly compile and test them here on an affected
>>> system.  It doesn't fail consistently for me (as others have
>>> reported), but that could be because it depends on the timing.
>>>
>>
>> Thanks for the offer. Can you try the following diagnostic patch
>> and check whether it helps?
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 55202a0ac..84f417f8b 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -607,6 +607,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
>>          */
>>         request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id));
>>
>> +       msleep(1000);
>> +
>>         device_initialize(&mdiodev->dev);
>>
>>         return dev;
> 
> I was not able to get it to fail with that extra delay; hard to be
> 100% sure but the network starts even when switching over from the
> distro kernel after a boot where network failed to start.
> 
Thanks a lot for testing. Could you please test also the following
as an alternative to the delay?

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 55202a0ac..aeccb2323 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
        new_driver->mdiodrv.driver.probe = phy_probe;
        new_driver->mdiodrv.driver.remove = phy_remove;
        new_driver->mdiodrv.driver.owner = owner;
+       new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;

        retval = driver_register(&new_driver->mdiodrv.driver);
        if (retval) {

> There's a side issue that network startup is taking a full minute
> longer than it should, but that's possibly unrelated.
> 
> Marc
> 
Thanks, Heiner

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-22 18:17                 ` Heiner Kallweit
@ 2018-11-22 18:57                   ` Andrew Lunn
  2018-11-22 19:48                     ` Heiner Kallweit
  2018-11-22 19:29                   ` Marc Dionne
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Lunn @ 2018-11-22 18:57 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Marc Dionne, norbert.jurkeit, nic_swsd, Florian Fainelli,
	David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy,
	jcline

> Thanks a lot for testing. Could you please test also the following
> as an alternative to the delay?
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 55202a0ac..aeccb2323 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>         new_driver->mdiodrv.driver.probe = phy_probe;
>         new_driver->mdiodrv.driver.remove = phy_remove;
>         new_driver->mdiodrv.driver.owner = owner;
> +       new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
> 
>         retval = driver_register(&new_driver->mdiodrv.driver);
>         if (retval) {


Humm, maybe i don't understand the issue correctly.

When the MDIO bus is registered, we scan the bus looking for PHYs.
When we find a PHY, we call phy_device_create(). That will then
trigger the loading of the kernel module which should driver this phy.

Sometime later, the PHY driver module gets loaded and calls
phy_drivers_register() to register the list of IDs it supports.  The
driver core will then call phy_bus_match() to see if the newly loaded
driver matches to a device we have created. If so, the driver will be
associated to the device.

Sometime later, the MAC tries to attach to the phy using
phy_attach_direct(). If there is no driver associated to the device,
we use the generic PHY driver.

I thought the issue was the race condition between loading the module
and the MAC attaching to it? We are getting the generic driver because
the specific driver is still loading?

If that really is the issue, i think phy_attach_direct() should try
loading the module again, doing it synchronously. Only when it fails
should the generic driver be associated to the device.

       Andrew


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-22 18:17                 ` Heiner Kallweit
  2018-11-22 18:57                   ` Andrew Lunn
@ 2018-11-22 19:29                   ` Marc Dionne
  2018-11-22 19:53                     ` Heiner Kallweit
  1 sibling, 1 reply; 14+ messages in thread
From: Marc Dionne @ 2018-11-22 19:29 UTC (permalink / raw)
  To: hkallweit1
  Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli,
	David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy,
	jcline

On Thu, Nov 22, 2018 at 2:17 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> Thanks a lot for testing. Could you please test also the following
> as an alternative to the delay?
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 55202a0ac..aeccb2323 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>         new_driver->mdiodrv.driver.probe = phy_probe;
>         new_driver->mdiodrv.driver.remove = phy_remove;
>         new_driver->mdiodrv.driver.owner = owner;
> +       new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
>
>         retval = driver_register(&new_driver->mdiodrv.driver);
>         if (retval) {

That also gets me network reliably, switching between a kernel where
it fails (distro 4.19 kernel) and the custom kernel with the patch.

> > There's a side issue that network startup is taking a full minute
> > longer than it should, but that's possibly unrelated.

BTW that's an unrelated rng issue, for which I have a workaround.

> >
> Thanks, Heiner

Marc

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-22 18:57                   ` Andrew Lunn
@ 2018-11-22 19:48                     ` Heiner Kallweit
  0 siblings, 0 replies; 14+ messages in thread
From: Heiner Kallweit @ 2018-11-22 19:48 UTC (permalink / raw)
  To: Andrew Lunn, Greg Kroah-Hartman, Rafael J. Wysocki
  Cc: Marc Dionne, norbert.jurkeit, nic_swsd, Florian Fainelli,
	David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy,
	jcline

On 22.11.2018 19:57, Andrew Lunn wrote:
>> Thanks a lot for testing. Could you please test also the following
>> as an alternative to the delay?
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 55202a0ac..aeccb2323 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>>         new_driver->mdiodrv.driver.probe = phy_probe;
>>         new_driver->mdiodrv.driver.remove = phy_remove;
>>         new_driver->mdiodrv.driver.owner = owner;
>> +       new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
>>
>>         retval = driver_register(&new_driver->mdiodrv.driver);
>>         if (retval) {
> 
> 
> Humm, maybe i don't understand the issue correctly.
> 
> When the MDIO bus is registered, we scan the bus looking for PHYs.
> When we find a PHY, we call phy_device_create(). That will then
> trigger the loading of the kernel module which should driver this phy.
> 
> Sometime later, the PHY driver module gets loaded and calls
> phy_drivers_register() to register the list of IDs it supports.  The
> driver core will then call phy_bus_match() to see if the newly loaded
> driver matches to a device we have created. If so, the driver will be
> associated to the device.
> 
request_module() is synchronous and should return only once the init
function of the loaded module has been executed. Means in case of
phylib: all PHY drivers of the module have been registered

> Sometime later, the MAC tries to attach to the phy using
> phy_attach_direct(). If there is no driver associated to the device,
> we use the generic PHY driver.
> 
> I thought the issue was the race condition between loading the module
> and the MAC attaching to it? We are getting the generic driver because
> the specific driver is still loading?
> 
No. The issue happens before: For whatever reason the PHY driver doesn't
bind to the PHY device, even though they match. Therefore
phy_attach_direct() then binds the genphy driver.

In theory it shouldn't make a difference whether device or driver is
registered first. When a driver is registered it checks all unbound
devices on the same bus for a match.

Standard is asynchronous driver probing, therefore in case of phylib
(in at least a lot of cases) the driver probes the devices on the bus
once the PHY device was registered. This seems to fail and I have no
idea why.

It has been reported that loading the PHY driver module upfront fixes
the issue. Therefore I assumed it may help to let the PHY driver probe
for devices earlier (even though the device isn't even registered yet).
And indeed (see mail just sent by Marc) using synchronous probing
fixes the issue.
It seems to me that probing triggered from device registering and
probing triggered from driver registering somehow interfere.

Of course it's not very satisfying to have a fix but to not understand
the root cause of the issue. I add Greg and Rafael as maintainers of
the driver core to the discussion. Maybe they can shed some light on
the situation.

> If that really is the issue, i think phy_attach_direct() should try
> loading the module again, doing it synchronously. Only when it fails
> should the generic driver be associated to the device.
> 
>        Andrew
> 
> 
Heiner


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19
  2018-11-22 19:29                   ` Marc Dionne
@ 2018-11-22 19:53                     ` Heiner Kallweit
  0 siblings, 0 replies; 14+ messages in thread
From: Heiner Kallweit @ 2018-11-22 19:53 UTC (permalink / raw)
  To: Marc Dionne
  Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli,
	David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy,
	jcline

On 22.11.2018 20:29, Marc Dionne wrote:
> On Thu, Nov 22, 2018 at 2:17 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> Thanks a lot for testing. Could you please test also the following
>> as an alternative to the delay?
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 55202a0ac..aeccb2323 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>>         new_driver->mdiodrv.driver.probe = phy_probe;
>>         new_driver->mdiodrv.driver.remove = phy_remove;
>>         new_driver->mdiodrv.driver.owner = owner;
>> +       new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
>>
>>         retval = driver_register(&new_driver->mdiodrv.driver);
>>         if (retval) {
> 
> That also gets me network reliably, switching between a kernel where
> it fails (distro 4.19 kernel) and the custom kernel with the patch.
> 
Thanks again for the very quick response. The result is good and bad
news at once: good because it seems we have a fix, bad because we
don't understand the root cause of the issue yet.

>>> There's a side issue that network startup is taking a full minute
>>> longer than it should, but that's possibly unrelated.
> 
> BTW that's an unrelated rng issue, for which I have a workaround.
> 
Indeed totally unrelated, but sounds familiar. I had the same issue
on a headless system which therefore generates very little entropy.
I fixed it by setting CONFIG_RANDOM_TRUST_CPU=y

>>>
>> Thanks, Heiner
> 
> Marc
> 
Heiner

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-11-22 20:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-21 10:51 Issue with RTL8111 NIC after upgrade to kernel 4.19 Norbert Jurkeit
2018-11-21 19:57 ` Heiner Kallweit
2018-11-21 20:20   ` Andrew Lunn
2018-11-21 20:32     ` Heiner Kallweit
2018-11-21 20:49       ` Heiner Kallweit
2018-11-21 20:52         ` Heiner Kallweit
2018-11-21 21:53           ` Marc Dionne
2018-11-21 22:28             ` Heiner Kallweit
2018-11-21 23:13               ` Marc Dionne
2018-11-22 18:17                 ` Heiner Kallweit
2018-11-22 18:57                   ` Andrew Lunn
2018-11-22 19:48                     ` Heiner Kallweit
2018-11-22 19:29                   ` Marc Dionne
2018-11-22 19:53                     ` Heiner Kallweit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).