* Issue with RTL8111 NIC after upgrade to kernel 4.19 @ 2018-11-21 10:51 Norbert Jurkeit 2018-11-21 19:57 ` Heiner Kallweit 0 siblings, 1 reply; 14+ messages in thread From: Norbert Jurkeit @ 2018-11-21 10:51 UTC (permalink / raw) To: nic_swsd, davem, netdev, linux-kernel Cc: michael.wiktowy, jcline, marc.c.dionne Dear Linux developers, after upgrade from distro kernel 4.18.18 to 4.19.2 some Fedora users (including me) encounter issues with RTL8111 based Ethernet interfaces. The interface is not started correctly after boot with kernel 4.19.2 (i.e. is reported by ifconfig as up, but not running, the LEDs besides the connector stay off and no data is transferred). At least for me the issue seems to occur only after reboot from a previous session with kernel 4.18.18, but not after power-on, which means the hardware is not fully initialized by the new r8169 driver. Marc Dionne, another affected Fedora user, thinks that module "realtek" needs to be loaded, which does not happen automatically. Please see https://bugzilla.redhat.com/show_bug.cgi?id=1650984 for more details. As Fedora kernel maintainers usually don't apply many changes to the upstream kernel, users of other distros will likely be affected as well. We therefore kindly ask you to investigate the issue and fix it in future upstream kernels. Many thanks and best regards, Norbert Jurkeit ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 10:51 Issue with RTL8111 NIC after upgrade to kernel 4.19 Norbert Jurkeit @ 2018-11-21 19:57 ` Heiner Kallweit 2018-11-21 20:20 ` Andrew Lunn 0 siblings, 1 reply; 14+ messages in thread From: Heiner Kallweit @ 2018-11-21 19:57 UTC (permalink / raw) To: Norbert Jurkeit, nic_swsd, Andrew Lunn, Florian Fainelli Cc: davem, netdev, linux-kernel, michael.wiktowy, jcline, marc.c.dionne On 21.11.2018 11:51, Norbert Jurkeit wrote: > Dear Linux developers, > > after upgrade from distro kernel 4.18.18 to 4.19.2 some Fedora users (including me) encounter issues with RTL8111 based Ethernet interfaces. The interface is not started correctly after boot with kernel 4.19.2 (i.e. is reported by ifconfig as up, but not running, the LEDs besides the connector stay off and no data is transferred). > > At least for me the issue seems to occur only after reboot from a previous session with kernel 4.18.18, but not after power-on, which means the hardware is not fully initialized by the new r8169 driver. Marc Dionne, another affected Fedora user, thinks that module "realtek" needs to be loaded, which does not happen automatically. Please see https://bugzilla.redhat.com/show_bug.cgi?id=1650984 for more details. > Thanks for reporting the issue. I can't reproduce the error on my system and obviously it affects only some users, so it seems to be chip-version- or system-dependent. Having to load the realtek PHY driver module manually is weird as this should be done by request_module() in phy_device_create(). request_module() is supposed to be synchronous, however after some reading this may not be 100% guaranteed. Maybe the module init function on some systems isn't finished yet when request_module() returns. As a result the genphy driver may be used instead of the PHY version-specific driver. That the issue occurs after the switch to 4.19 isn't a big surprise because with this version the r8169 driver switched to phylib. First I'd like to get an overview of the affected systems. I'll comment in the linked bug ticket to address the affected users. > As Fedora kernel maintainers usually don't apply many changes to the upstream kernel, users of other distros will likely be affected as well. We therefore kindly ask you to investigate the issue and fix it in future upstream kernels. > > Many thanks and best regards, > > Norbert Jurkeit > > Heiner ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 19:57 ` Heiner Kallweit @ 2018-11-21 20:20 ` Andrew Lunn 2018-11-21 20:32 ` Heiner Kallweit 0 siblings, 1 reply; 14+ messages in thread From: Andrew Lunn @ 2018-11-21 20:20 UTC (permalink / raw) To: Heiner Kallweit Cc: Norbert Jurkeit, nic_swsd, Florian Fainelli, davem, netdev, linux-kernel, michael.wiktowy, jcline, marc.c.dionne > request_module() is supposed to be synchronous, however after some > reading this may not be 100% guaranteed. Maybe the module init > function on some systems isn't finished yet when request_module() > returns. As a result the genphy driver may be used instead of > the PHY version-specific driver. Hi Heiner That would be true for all PHYs i think. We would of noticed this problem with other systems using other PHY drivers. Andrew ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 20:20 ` Andrew Lunn @ 2018-11-21 20:32 ` Heiner Kallweit 2018-11-21 20:49 ` Heiner Kallweit 0 siblings, 1 reply; 14+ messages in thread From: Heiner Kallweit @ 2018-11-21 20:32 UTC (permalink / raw) To: Andrew Lunn Cc: Norbert Jurkeit, nic_swsd, Florian Fainelli, davem, netdev, linux-kernel, michael.wiktowy, jcline, marc.c.dionne On 21.11.2018 21:20, Andrew Lunn wrote: >> request_module() is supposed to be synchronous, however after some >> reading this may not be 100% guaranteed. Maybe the module init >> function on some systems isn't finished yet when request_module() >> returns. As a result the genphy driver may be used instead of >> the PHY version-specific driver. > > Hi Heiner > > That would be true for all PHYs i think. We would of noticed this > problem with other systems using other PHY drivers. > > Andrew > It could be a timing issue affecting certain systems only. At least for now I don't have a good explanation why loading the module via request_module() and loading it upfront manually makes a difference. One affected user just reported the PHY to be a RTL8211B. This is what I expected, because this PHY crashes when writing to the MMD registers (the MMD registers are used otherwise by this PHY). See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b"). Let's see whether the other affected systems use the same PHY version. Heiner ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 20:32 ` Heiner Kallweit @ 2018-11-21 20:49 ` Heiner Kallweit 2018-11-21 20:52 ` Heiner Kallweit 0 siblings, 1 reply; 14+ messages in thread From: Heiner Kallweit @ 2018-11-21 20:49 UTC (permalink / raw) To: Andrew Lunn Cc: Norbert Jurkeit, nic_swsd, Florian Fainelli, davem, netdev, linux-kernel, michael.wiktowy, jcline, marc.c.dionne On 21.11.2018 21:32, Heiner Kallweit wrote: > On 21.11.2018 21:20, Andrew Lunn wrote: >>> request_module() is supposed to be synchronous, however after some >>> reading this may not be 100% guaranteed. Maybe the module init >>> function on some systems isn't finished yet when request_module() >>> returns. As a result the genphy driver may be used instead of >>> the PHY version-specific driver. >> >> Hi Heiner >> >> That would be true for all PHYs i think. We would of noticed this >> problem with other systems using other PHY drivers. >> >> Andrew >> > It could be a timing issue affecting certain systems only. At least > for now I don't have a good explanation why loading the module via > request_module() and loading it upfront manually makes a difference. > > One affected user just reported the PHY to be a RTL8211B. This is > what I expected, because this PHY crashes when writing to the MMD > registers (the MMD registers are used otherwise by this PHY). > See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy > stubs for MMD register access for rtl8211b"). > > Let's see whether the other affected systems use the same PHY > version. > Next report is also about a RTL8211B and as I assumed: - W/o manually loading the realtek module the genphy driver is used and network fails. - W/ manually loading the realtek module the proper RTL8211B PHY driver is used and network works. So it seems that even after request_module() the PHY driver isn't yet available when device and driver are matched. If further reports support this (pre-)analysis, then indeed it seems to be a timing issue and a proper fix most likely is difficult. As a workaround I could imagine to add a delay loop after request_module() checking for a Realtek PHY driver via driver_find(). When adding one small delay after this we should be sufficiently sure that all Realtek PHY drivers are registered. > Heiner > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 20:49 ` Heiner Kallweit @ 2018-11-21 20:52 ` Heiner Kallweit 2018-11-21 21:53 ` Marc Dionne 0 siblings, 1 reply; 14+ messages in thread From: Heiner Kallweit @ 2018-11-21 20:52 UTC (permalink / raw) To: Andrew Lunn Cc: Norbert Jurkeit, nic_swsd, Florian Fainelli, davem, netdev, linux-kernel, michael.wiktowy, jcline, marc.c.dionne On 21.11.2018 21:49, Heiner Kallweit wrote: > On 21.11.2018 21:32, Heiner Kallweit wrote: >> On 21.11.2018 21:20, Andrew Lunn wrote: >>>> request_module() is supposed to be synchronous, however after some >>>> reading this may not be 100% guaranteed. Maybe the module init >>>> function on some systems isn't finished yet when request_module() >>>> returns. As a result the genphy driver may be used instead of >>>> the PHY version-specific driver. >>> >>> Hi Heiner >>> >>> That would be true for all PHYs i think. We would of noticed this >>> problem with other systems using other PHY drivers. >>> >>> Andrew >>> >> It could be a timing issue affecting certain systems only. At least >> for now I don't have a good explanation why loading the module via >> request_module() and loading it upfront manually makes a difference. >> >> One affected user just reported the PHY to be a RTL8211B. This is >> what I expected, because this PHY crashes when writing to the MMD >> registers (the MMD registers are used otherwise by this PHY). >> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy >> stubs for MMD register access for rtl8211b"). >> >> Let's see whether the other affected systems use the same PHY >> version. >> > Next report is also about a RTL8211B and as I assumed: > - W/o manually loading the realtek module the genphy driver is used > and network fails. > - W/ manually loading the realtek module the proper RTL8211B PHY > driver is used and network works. > > So it seems that even after request_module() the PHY driver isn't > yet available when device and driver are matched. > > If further reports support this (pre-)analysis, then indeed it > seems to be a timing issue and a proper fix most likely is > difficult. As a workaround I could imagine to add a delay loop > after request_module() checking for a Realtek PHY driver via > driver_find(). When adding one small delay after this we should > be sufficiently sure that all Realtek PHY drivers are registered. > Uups, no. We talk about phylib here, not about the r8169 driver. So we need a different solution. >> Heiner >> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 20:52 ` Heiner Kallweit @ 2018-11-21 21:53 ` Marc Dionne 2018-11-21 22:28 ` Heiner Kallweit 0 siblings, 1 reply; 14+ messages in thread From: Marc Dionne @ 2018-11-21 21:53 UTC (permalink / raw) To: hkallweit1 Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli, David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy, jcline On Wed, Nov 21, 2018 at 4:52 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > On 21.11.2018 21:49, Heiner Kallweit wrote: > > On 21.11.2018 21:32, Heiner Kallweit wrote: > >> On 21.11.2018 21:20, Andrew Lunn wrote: > >>>> request_module() is supposed to be synchronous, however after some > >>>> reading this may not be 100% guaranteed. Maybe the module init > >>>> function on some systems isn't finished yet when request_module() > >>>> returns. As a result the genphy driver may be used instead of > >>>> the PHY version-specific driver. > >>> > >>> Hi Heiner > >>> > >>> That would be true for all PHYs i think. We would of noticed this > >>> problem with other systems using other PHY drivers. > >>> > >>> Andrew > >>> > >> It could be a timing issue affecting certain systems only. At least > >> for now I don't have a good explanation why loading the module via > >> request_module() and loading it upfront manually makes a difference. > >> > >> One affected user just reported the PHY to be a RTL8211B. This is > >> what I expected, because this PHY crashes when writing to the MMD > >> registers (the MMD registers are used otherwise by this PHY). > >> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy > >> stubs for MMD register access for rtl8211b"). > >> > >> Let's see whether the other affected systems use the same PHY > >> version. > >> > > Next report is also about a RTL8211B and as I assumed: > > - W/o manually loading the realtek module the genphy driver is used > > and network fails. > > - W/ manually loading the realtek module the proper RTL8211B PHY > > driver is used and network works. > > > > So it seems that even after request_module() the PHY driver isn't > > yet available when device and driver are matched. > > > > If further reports support this (pre-)analysis, then indeed it > > seems to be a timing issue and a proper fix most likely is > > difficult. As a workaround I could imagine to add a delay loop > > after request_module() checking for a Realtek PHY driver via > > driver_find(). When adding one small delay after this we should > > be sufficiently sure that all Realtek PHY drivers are registered. > > > Uups, no. We talk about phylib here, not about the r8169 driver. > So we need a different solution. > > >> Heiner Thanks for the explanation, better than my crude attempt at understanding what was going on. If you have any proposed fixes or diagnostic patches based on current mainline I can quickly compile and test them here on an affected system. It doesn't fail consistently for me (as others have reported), but that could be because it depends on the timing. Marc ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 21:53 ` Marc Dionne @ 2018-11-21 22:28 ` Heiner Kallweit 2018-11-21 23:13 ` Marc Dionne 0 siblings, 1 reply; 14+ messages in thread From: Heiner Kallweit @ 2018-11-21 22:28 UTC (permalink / raw) To: Marc Dionne Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli, David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy, jcline On 21.11.2018 22:53, Marc Dionne wrote: > On Wed, Nov 21, 2018 at 4:52 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >> >> On 21.11.2018 21:49, Heiner Kallweit wrote: >>> On 21.11.2018 21:32, Heiner Kallweit wrote: >>>> On 21.11.2018 21:20, Andrew Lunn wrote: >>>>>> request_module() is supposed to be synchronous, however after some >>>>>> reading this may not be 100% guaranteed. Maybe the module init >>>>>> function on some systems isn't finished yet when request_module() >>>>>> returns. As a result the genphy driver may be used instead of >>>>>> the PHY version-specific driver. >>>>> >>>>> Hi Heiner >>>>> >>>>> That would be true for all PHYs i think. We would of noticed this >>>>> problem with other systems using other PHY drivers. >>>>> >>>>> Andrew >>>>> >>>> It could be a timing issue affecting certain systems only. At least >>>> for now I don't have a good explanation why loading the module via >>>> request_module() and loading it upfront manually makes a difference. >>>> >>>> One affected user just reported the PHY to be a RTL8211B. This is >>>> what I expected, because this PHY crashes when writing to the MMD >>>> registers (the MMD registers are used otherwise by this PHY). >>>> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy >>>> stubs for MMD register access for rtl8211b"). >>>> >>>> Let's see whether the other affected systems use the same PHY >>>> version. >>>> >>> Next report is also about a RTL8211B and as I assumed: >>> - W/o manually loading the realtek module the genphy driver is used >>> and network fails. >>> - W/ manually loading the realtek module the proper RTL8211B PHY >>> driver is used and network works. >>> >>> So it seems that even after request_module() the PHY driver isn't >>> yet available when device and driver are matched. >>> >>> If further reports support this (pre-)analysis, then indeed it >>> seems to be a timing issue and a proper fix most likely is >>> difficult. As a workaround I could imagine to add a delay loop >>> after request_module() checking for a Realtek PHY driver via >>> driver_find(). When adding one small delay after this we should >>> be sufficiently sure that all Realtek PHY drivers are registered. >>> >> Uups, no. We talk about phylib here, not about the r8169 driver. >> So we need a different solution. >> >>>> Heiner > > Thanks for the explanation, better than my crude attempt at > understanding what was going on. > > If you have any proposed fixes or diagnostic patches based on current > mainline I can quickly compile and test them here on an affected > system. It doesn't fail consistently for me (as others have > reported), but that could be because it depends on the timing. > Thanks for the offer. Can you try the following diagnostic patch and check whether it helps? diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 55202a0ac..84f417f8b 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -607,6 +607,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id, */ request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id)); + msleep(1000); + device_initialize(&mdiodev->dev); return dev; > Marc > ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 22:28 ` Heiner Kallweit @ 2018-11-21 23:13 ` Marc Dionne 2018-11-22 18:17 ` Heiner Kallweit 0 siblings, 1 reply; 14+ messages in thread From: Marc Dionne @ 2018-11-21 23:13 UTC (permalink / raw) To: hkallweit1 Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli, David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy, jcline On Wed, Nov 21, 2018 at 6:28 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > On 21.11.2018 22:53, Marc Dionne wrote: > > On Wed, Nov 21, 2018 at 4:52 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > >> > >> On 21.11.2018 21:49, Heiner Kallweit wrote: > >>> On 21.11.2018 21:32, Heiner Kallweit wrote: > >>>> On 21.11.2018 21:20, Andrew Lunn wrote: > >>>>>> request_module() is supposed to be synchronous, however after some > >>>>>> reading this may not be 100% guaranteed. Maybe the module init > >>>>>> function on some systems isn't finished yet when request_module() > >>>>>> returns. As a result the genphy driver may be used instead of > >>>>>> the PHY version-specific driver. > >>>>> > >>>>> Hi Heiner > >>>>> > >>>>> That would be true for all PHYs i think. We would of noticed this > >>>>> problem with other systems using other PHY drivers. > >>>>> > >>>>> Andrew > >>>>> > >>>> It could be a timing issue affecting certain systems only. At least > >>>> for now I don't have a good explanation why loading the module via > >>>> request_module() and loading it upfront manually makes a difference. > >>>> > >>>> One affected user just reported the PHY to be a RTL8211B. This is > >>>> what I expected, because this PHY crashes when writing to the MMD > >>>> registers (the MMD registers are used otherwise by this PHY). > >>>> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy > >>>> stubs for MMD register access for rtl8211b"). > >>>> > >>>> Let's see whether the other affected systems use the same PHY > >>>> version. > >>>> > >>> Next report is also about a RTL8211B and as I assumed: > >>> - W/o manually loading the realtek module the genphy driver is used > >>> and network fails. > >>> - W/ manually loading the realtek module the proper RTL8211B PHY > >>> driver is used and network works. > >>> > >>> So it seems that even after request_module() the PHY driver isn't > >>> yet available when device and driver are matched. > >>> > >>> If further reports support this (pre-)analysis, then indeed it > >>> seems to be a timing issue and a proper fix most likely is > >>> difficult. As a workaround I could imagine to add a delay loop > >>> after request_module() checking for a Realtek PHY driver via > >>> driver_find(). When adding one small delay after this we should > >>> be sufficiently sure that all Realtek PHY drivers are registered. > >>> > >> Uups, no. We talk about phylib here, not about the r8169 driver. > >> So we need a different solution. > >> > >>>> Heiner > > > > Thanks for the explanation, better than my crude attempt at > > understanding what was going on. > > > > If you have any proposed fixes or diagnostic patches based on current > > mainline I can quickly compile and test them here on an affected > > system. It doesn't fail consistently for me (as others have > > reported), but that could be because it depends on the timing. > > > > Thanks for the offer. Can you try the following diagnostic patch > and check whether it helps? > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 55202a0ac..84f417f8b 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -607,6 +607,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id, > */ > request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id)); > > + msleep(1000); > + > device_initialize(&mdiodev->dev); > > return dev; I was not able to get it to fail with that extra delay; hard to be 100% sure but the network starts even when switching over from the distro kernel after a boot where network failed to start. There's a side issue that network startup is taking a full minute longer than it should, but that's possibly unrelated. Marc ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-21 23:13 ` Marc Dionne @ 2018-11-22 18:17 ` Heiner Kallweit 2018-11-22 18:57 ` Andrew Lunn 2018-11-22 19:29 ` Marc Dionne 0 siblings, 2 replies; 14+ messages in thread From: Heiner Kallweit @ 2018-11-22 18:17 UTC (permalink / raw) To: Marc Dionne Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli, David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy, jcline On 22.11.2018 00:13, Marc Dionne wrote: > On Wed, Nov 21, 2018 at 6:28 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >> >> On 21.11.2018 22:53, Marc Dionne wrote: >>> On Wed, Nov 21, 2018 at 4:52 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >>>> >>>> On 21.11.2018 21:49, Heiner Kallweit wrote: >>>>> On 21.11.2018 21:32, Heiner Kallweit wrote: >>>>>> On 21.11.2018 21:20, Andrew Lunn wrote: >>>>>>>> request_module() is supposed to be synchronous, however after some >>>>>>>> reading this may not be 100% guaranteed. Maybe the module init >>>>>>>> function on some systems isn't finished yet when request_module() >>>>>>>> returns. As a result the genphy driver may be used instead of >>>>>>>> the PHY version-specific driver. >>>>>>> >>>>>>> Hi Heiner >>>>>>> >>>>>>> That would be true for all PHYs i think. We would of noticed this >>>>>>> problem with other systems using other PHY drivers. >>>>>>> >>>>>>> Andrew >>>>>>> >>>>>> It could be a timing issue affecting certain systems only. At least >>>>>> for now I don't have a good explanation why loading the module via >>>>>> request_module() and loading it upfront manually makes a difference. >>>>>> >>>>>> One affected user just reported the PHY to be a RTL8211B. This is >>>>>> what I expected, because this PHY crashes when writing to the MMD >>>>>> registers (the MMD registers are used otherwise by this PHY). >>>>>> See also commit 0231b1a074c6 ("net: phy: realtek: Use the dummy >>>>>> stubs for MMD register access for rtl8211b"). >>>>>> >>>>>> Let's see whether the other affected systems use the same PHY >>>>>> version. >>>>>> >>>>> Next report is also about a RTL8211B and as I assumed: >>>>> - W/o manually loading the realtek module the genphy driver is used >>>>> and network fails. >>>>> - W/ manually loading the realtek module the proper RTL8211B PHY >>>>> driver is used and network works. >>>>> >>>>> So it seems that even after request_module() the PHY driver isn't >>>>> yet available when device and driver are matched. >>>>> >>>>> If further reports support this (pre-)analysis, then indeed it >>>>> seems to be a timing issue and a proper fix most likely is >>>>> difficult. As a workaround I could imagine to add a delay loop >>>>> after request_module() checking for a Realtek PHY driver via >>>>> driver_find(). When adding one small delay after this we should >>>>> be sufficiently sure that all Realtek PHY drivers are registered. >>>>> >>>> Uups, no. We talk about phylib here, not about the r8169 driver. >>>> So we need a different solution. >>>> >>>>>> Heiner >>> >>> Thanks for the explanation, better than my crude attempt at >>> understanding what was going on. >>> >>> If you have any proposed fixes or diagnostic patches based on current >>> mainline I can quickly compile and test them here on an affected >>> system. It doesn't fail consistently for me (as others have >>> reported), but that could be because it depends on the timing. >>> >> >> Thanks for the offer. Can you try the following diagnostic patch >> and check whether it helps? >> >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >> index 55202a0ac..84f417f8b 100644 >> --- a/drivers/net/phy/phy_device.c >> +++ b/drivers/net/phy/phy_device.c >> @@ -607,6 +607,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id, >> */ >> request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id)); >> >> + msleep(1000); >> + >> device_initialize(&mdiodev->dev); >> >> return dev; > > I was not able to get it to fail with that extra delay; hard to be > 100% sure but the network starts even when switching over from the > distro kernel after a boot where network failed to start. > Thanks a lot for testing. Could you please test also the following as an alternative to the delay? diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 55202a0ac..aeccb2323 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) new_driver->mdiodrv.driver.probe = phy_probe; new_driver->mdiodrv.driver.remove = phy_remove; new_driver->mdiodrv.driver.owner = owner; + new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS; retval = driver_register(&new_driver->mdiodrv.driver); if (retval) { > There's a side issue that network startup is taking a full minute > longer than it should, but that's possibly unrelated. > > Marc > Thanks, Heiner ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-22 18:17 ` Heiner Kallweit @ 2018-11-22 18:57 ` Andrew Lunn 2018-11-22 19:48 ` Heiner Kallweit 2018-11-22 19:29 ` Marc Dionne 1 sibling, 1 reply; 14+ messages in thread From: Andrew Lunn @ 2018-11-22 18:57 UTC (permalink / raw) To: Heiner Kallweit Cc: Marc Dionne, norbert.jurkeit, nic_swsd, Florian Fainelli, David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy, jcline > Thanks a lot for testing. Could you please test also the following > as an alternative to the delay? > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 55202a0ac..aeccb2323 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) > new_driver->mdiodrv.driver.probe = phy_probe; > new_driver->mdiodrv.driver.remove = phy_remove; > new_driver->mdiodrv.driver.owner = owner; > + new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS; > > retval = driver_register(&new_driver->mdiodrv.driver); > if (retval) { Humm, maybe i don't understand the issue correctly. When the MDIO bus is registered, we scan the bus looking for PHYs. When we find a PHY, we call phy_device_create(). That will then trigger the loading of the kernel module which should driver this phy. Sometime later, the PHY driver module gets loaded and calls phy_drivers_register() to register the list of IDs it supports. The driver core will then call phy_bus_match() to see if the newly loaded driver matches to a device we have created. If so, the driver will be associated to the device. Sometime later, the MAC tries to attach to the phy using phy_attach_direct(). If there is no driver associated to the device, we use the generic PHY driver. I thought the issue was the race condition between loading the module and the MAC attaching to it? We are getting the generic driver because the specific driver is still loading? If that really is the issue, i think phy_attach_direct() should try loading the module again, doing it synchronously. Only when it fails should the generic driver be associated to the device. Andrew ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-22 18:57 ` Andrew Lunn @ 2018-11-22 19:48 ` Heiner Kallweit 0 siblings, 0 replies; 14+ messages in thread From: Heiner Kallweit @ 2018-11-22 19:48 UTC (permalink / raw) To: Andrew Lunn, Greg Kroah-Hartman, Rafael J. Wysocki Cc: Marc Dionne, norbert.jurkeit, nic_swsd, Florian Fainelli, David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy, jcline On 22.11.2018 19:57, Andrew Lunn wrote: >> Thanks a lot for testing. Could you please test also the following >> as an alternative to the delay? >> >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >> index 55202a0ac..aeccb2323 100644 >> --- a/drivers/net/phy/phy_device.c >> +++ b/drivers/net/phy/phy_device.c >> @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) >> new_driver->mdiodrv.driver.probe = phy_probe; >> new_driver->mdiodrv.driver.remove = phy_remove; >> new_driver->mdiodrv.driver.owner = owner; >> + new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS; >> >> retval = driver_register(&new_driver->mdiodrv.driver); >> if (retval) { > > > Humm, maybe i don't understand the issue correctly. > > When the MDIO bus is registered, we scan the bus looking for PHYs. > When we find a PHY, we call phy_device_create(). That will then > trigger the loading of the kernel module which should driver this phy. > > Sometime later, the PHY driver module gets loaded and calls > phy_drivers_register() to register the list of IDs it supports. The > driver core will then call phy_bus_match() to see if the newly loaded > driver matches to a device we have created. If so, the driver will be > associated to the device. > request_module() is synchronous and should return only once the init function of the loaded module has been executed. Means in case of phylib: all PHY drivers of the module have been registered > Sometime later, the MAC tries to attach to the phy using > phy_attach_direct(). If there is no driver associated to the device, > we use the generic PHY driver. > > I thought the issue was the race condition between loading the module > and the MAC attaching to it? We are getting the generic driver because > the specific driver is still loading? > No. The issue happens before: For whatever reason the PHY driver doesn't bind to the PHY device, even though they match. Therefore phy_attach_direct() then binds the genphy driver. In theory it shouldn't make a difference whether device or driver is registered first. When a driver is registered it checks all unbound devices on the same bus for a match. Standard is asynchronous driver probing, therefore in case of phylib (in at least a lot of cases) the driver probes the devices on the bus once the PHY device was registered. This seems to fail and I have no idea why. It has been reported that loading the PHY driver module upfront fixes the issue. Therefore I assumed it may help to let the PHY driver probe for devices earlier (even though the device isn't even registered yet). And indeed (see mail just sent by Marc) using synchronous probing fixes the issue. It seems to me that probing triggered from device registering and probing triggered from driver registering somehow interfere. Of course it's not very satisfying to have a fix but to not understand the root cause of the issue. I add Greg and Rafael as maintainers of the driver core to the discussion. Maybe they can shed some light on the situation. > If that really is the issue, i think phy_attach_direct() should try > loading the module again, doing it synchronously. Only when it fails > should the generic driver be associated to the device. > > Andrew > > Heiner ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-22 18:17 ` Heiner Kallweit 2018-11-22 18:57 ` Andrew Lunn @ 2018-11-22 19:29 ` Marc Dionne 2018-11-22 19:53 ` Heiner Kallweit 1 sibling, 1 reply; 14+ messages in thread From: Marc Dionne @ 2018-11-22 19:29 UTC (permalink / raw) To: hkallweit1 Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli, David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy, jcline On Thu, Nov 22, 2018 at 2:17 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > Thanks a lot for testing. Could you please test also the following > as an alternative to the delay? > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 55202a0ac..aeccb2323 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) > new_driver->mdiodrv.driver.probe = phy_probe; > new_driver->mdiodrv.driver.remove = phy_remove; > new_driver->mdiodrv.driver.owner = owner; > + new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS; > > retval = driver_register(&new_driver->mdiodrv.driver); > if (retval) { That also gets me network reliably, switching between a kernel where it fails (distro 4.19 kernel) and the custom kernel with the patch. > > There's a side issue that network startup is taking a full minute > > longer than it should, but that's possibly unrelated. BTW that's an unrelated rng issue, for which I have a workaround. > > > Thanks, Heiner Marc ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Issue with RTL8111 NIC after upgrade to kernel 4.19 2018-11-22 19:29 ` Marc Dionne @ 2018-11-22 19:53 ` Heiner Kallweit 0 siblings, 0 replies; 14+ messages in thread From: Heiner Kallweit @ 2018-11-22 19:53 UTC (permalink / raw) To: Marc Dionne Cc: andrew, norbert.jurkeit, nic_swsd, Florian Fainelli, David Miller, netdev, Linux Kernel Mailing List, michael.wiktowy, jcline On 22.11.2018 20:29, Marc Dionne wrote: > On Thu, Nov 22, 2018 at 2:17 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >> Thanks a lot for testing. Could you please test also the following >> as an alternative to the delay? >> >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >> index 55202a0ac..aeccb2323 100644 >> --- a/drivers/net/phy/phy_device.c >> +++ b/drivers/net/phy/phy_device.c >> @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) >> new_driver->mdiodrv.driver.probe = phy_probe; >> new_driver->mdiodrv.driver.remove = phy_remove; >> new_driver->mdiodrv.driver.owner = owner; >> + new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS; >> >> retval = driver_register(&new_driver->mdiodrv.driver); >> if (retval) { > > That also gets me network reliably, switching between a kernel where > it fails (distro 4.19 kernel) and the custom kernel with the patch. > Thanks again for the very quick response. The result is good and bad news at once: good because it seems we have a fix, bad because we don't understand the root cause of the issue yet. >>> There's a side issue that network startup is taking a full minute >>> longer than it should, but that's possibly unrelated. > > BTW that's an unrelated rng issue, for which I have a workaround. > Indeed totally unrelated, but sounds familiar. I had the same issue on a headless system which therefore generates very little entropy. I fixed it by setting CONFIG_RANDOM_TRUST_CPU=y >>> >> Thanks, Heiner > > Marc > Heiner ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2018-11-22 20:00 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-11-21 10:51 Issue with RTL8111 NIC after upgrade to kernel 4.19 Norbert Jurkeit 2018-11-21 19:57 ` Heiner Kallweit 2018-11-21 20:20 ` Andrew Lunn 2018-11-21 20:32 ` Heiner Kallweit 2018-11-21 20:49 ` Heiner Kallweit 2018-11-21 20:52 ` Heiner Kallweit 2018-11-21 21:53 ` Marc Dionne 2018-11-21 22:28 ` Heiner Kallweit 2018-11-21 23:13 ` Marc Dionne 2018-11-22 18:17 ` Heiner Kallweit 2018-11-22 18:57 ` Andrew Lunn 2018-11-22 19:48 ` Heiner Kallweit 2018-11-22 19:29 ` Marc Dionne 2018-11-22 19:53 ` Heiner Kallweit
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).