All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
@ 2018-12-24 11:21 Heiner Kallweit
  2018-12-25 15:17 ` Heiner Kallweit
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Heiner Kallweit @ 2018-12-24 11:21 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: netdev, Norbert Jurkeit, Frank Crawford

phy_device_create() uses request_module() to load the PHY driver module
based on the PHY ID of the device. There is some timing issue which
sometimes prevents the PHY driver to bind to the device. In such cases
the genphy driver is used what can cause problems if genphy isn't
compatible with the respective PHY.
It turned out that the first fix can fix the issue in some but not all
cases. Moving the call to device_initialize() before the call to
request_module() was reported to fix the issue.
I can't explain where the root cause of the issue is and why this fix
works. AFAICS device_initialize() just initializes the device struct
w/o doing anything that could interfere with e.g. bus_add_driver().
This patch removes the first preliminary fix attempt.

Reference:
https://bugzilla.redhat.com/show_bug.cgi?id=1650984

Fixes: c85ddecae6e5 ("net: phy: add workaround for issue where PHY driver doesn't bind to the device")
Tested-by: Norbert Jurkeit <norbert.jurkeit@web.de>
Tested-by: Frank Crawford <frank@crawford.emu.id.au>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 26c41ede5..ac0a83c7d 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -579,6 +579,7 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
 		dev->c45_ids = *c45_ids;
 	dev->irq = bus->irq[addr];
 	dev_set_name(&mdiodev->dev, PHY_ID_FMT, bus->id, addr);
+	device_initialize(&mdiodev->dev);
 
 	dev->state = PHY_DOWN;
 
@@ -598,8 +599,6 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
 	 */
 	request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id));
 
-	device_initialize(&mdiodev->dev);
-
 	return dev;
 }
 EXPORT_SYMBOL(phy_device_create);
@@ -2191,14 +2190,6 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
 	new_driver->mdiodrv.driver.remove = phy_remove;
 	new_driver->mdiodrv.driver.owner = owner;
 
-	/* The following works around an issue where the PHY driver doesn't bind
-	 * to the device, resulting in the genphy driver being used instead of
-	 * the dedicated driver. The root cause of the issue isn't known yet
-	 * and seems to be in the base driver core. Once this is fixed we may
-	 * remove this workaround.
-	 */
-	new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
-
 	retval = driver_register(&new_driver->mdiodrv.driver);
 	if (retval) {
 		pr_err("%s: Error %d in registering driver\n",
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-24 11:21 [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device Heiner Kallweit
@ 2018-12-25 15:17 ` Heiner Kallweit
  2018-12-28 21:02 ` David Miller
  2018-12-29  2:42 ` Florian Fainelli
  2 siblings, 0 replies; 17+ messages in thread
From: Heiner Kallweit @ 2018-12-25 15:17 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: netdev, Norbert Jurkeit, Frank Crawford

On 24.12.2018 12:21, Heiner Kallweit wrote:
> phy_device_create() uses request_module() to load the PHY driver module
> based on the PHY ID of the device. There is some timing issue which
> sometimes prevents the PHY driver to bind to the device. In such cases
> the genphy driver is used what can cause problems if genphy isn't
> compatible with the respective PHY.
> It turned out that the first fix can fix the issue in some but not all
> cases. Moving the call to device_initialize() before the call to
> request_module() was reported to fix the issue.
> I can't explain where the root cause of the issue is and why this fix
> works. AFAICS device_initialize() just initializes the device struct
> w/o doing anything that could interfere with e.g. bus_add_driver().
> This patch removes the first preliminary fix attempt.
> 
> Reference:
> https://bugzilla.redhat.com/show_bug.cgi?id=1650984
> 
> Fixes: c85ddecae6e5 ("net: phy: add workaround for issue where PHY driver doesn't bind to the device")
> Tested-by: Norbert Jurkeit <norbert.jurkeit@web.de>
> Tested-by: Frank Crawford <frank@crawford.emu.id.au>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
[..]

I got a report from a user that for him the issue still isn't fixed.
So we need to check alternative approaches. Still I think the root cause
must be in the driver core, therefore set Greg and Rafael on the list.
Could you please wait with applying this patch?

Thanks, Heiner

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-24 11:21 [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device Heiner Kallweit
  2018-12-25 15:17 ` Heiner Kallweit
@ 2018-12-28 21:02 ` David Miller
  2018-12-28 21:56   ` Heiner Kallweit
  2018-12-29  2:42 ` Florian Fainelli
  2 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2018-12-28 21:02 UTC (permalink / raw)
  To: hkallweit1
  Cc: andrew, f.fainelli, gregkh, rafael, netdev, norbert.jurkeit, frank

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Mon, 24 Dec 2018 12:21:12 +0100

> phy_device_create() uses request_module() to load the PHY driver module
> based on the PHY ID of the device. There is some timing issue which
> sometimes prevents the PHY driver to bind to the device. In such cases
> the genphy driver is used what can cause problems if genphy isn't
> compatible with the respective PHY.
> It turned out that the first fix can fix the issue in some but not all
> cases. Moving the call to device_initialize() before the call to
> request_module() was reported to fix the issue.
> I can't explain where the root cause of the issue is and why this fix
> works. AFAICS device_initialize() just initializes the device struct
> w/o doing anything that could interfere with e.g. bus_add_driver().
> This patch removes the first preliminary fix attempt.
> 
> Reference:
> https://bugzilla.redhat.com/show_bug.cgi?id=1650984
> 
> Fixes: c85ddecae6e5 ("net: phy: add workaround for issue where PHY driver doesn't bind to the device")
> Tested-by: Norbert Jurkeit <norbert.jurkeit@web.de>
> Tested-by: Frank Crawford <frank@crawford.emu.id.au>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

I can't see any way this can make a difference.  I'm really stumped
and even if it did fix all of the issues people are seeing I wouldn't
want to apply this patch.

We need to know why, and put the why into the commit log message.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-28 21:02 ` David Miller
@ 2018-12-28 21:56   ` Heiner Kallweit
  0 siblings, 0 replies; 17+ messages in thread
From: Heiner Kallweit @ 2018-12-28 21:56 UTC (permalink / raw)
  To: David Miller
  Cc: andrew, f.fainelli, gregkh, rafael, netdev, norbert.jurkeit, frank

On 28.12.2018 22:02, David Miller wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
> Date: Mon, 24 Dec 2018 12:21:12 +0100
> 
>> phy_device_create() uses request_module() to load the PHY driver module
>> based on the PHY ID of the device. There is some timing issue which
>> sometimes prevents the PHY driver to bind to the device. In such cases
>> the genphy driver is used what can cause problems if genphy isn't
>> compatible with the respective PHY.
>> It turned out that the first fix can fix the issue in some but not all
>> cases. Moving the call to device_initialize() before the call to
>> request_module() was reported to fix the issue.
>> I can't explain where the root cause of the issue is and why this fix
>> works. AFAICS device_initialize() just initializes the device struct
>> w/o doing anything that could interfere with e.g. bus_add_driver().
>> This patch removes the first preliminary fix attempt.
>>
>> Reference:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1650984
>>
>> Fixes: c85ddecae6e5 ("net: phy: add workaround for issue where PHY driver doesn't bind to the device")
>> Tested-by: Norbert Jurkeit <norbert.jurkeit@web.de>
>> Tested-by: Frank Crawford <frank@crawford.emu.id.au>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> 
> I can't see any way this can make a difference.  I'm really stumped
> and even if it did fix all of the issues people are seeing I wouldn't
> want to apply this patch.
> 
> We need to know why, and put the why into the commit log message.
> 
> Thanks.
> 
IIRC I sent a mail to you already asking not to apply the patch because
again I had users complaining that the fix doesn't work for them.
Root cause seems to be a tricky race in the drive base core, therefore
I included Greg and Rafael. Due to this race even small code changes
which doesn't make sense from a functional perspective help for
few users.

I can't reproduce the issue on my systems, therefore I requested more
debug info from requested users. Once I have more input I'll go on
with the root cause analysis.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-24 11:21 [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device Heiner Kallweit
  2018-12-25 15:17 ` Heiner Kallweit
  2018-12-28 21:02 ` David Miller
@ 2018-12-29  2:42 ` Florian Fainelli
  2018-12-29  2:48   ` Florian Fainelli
  2018-12-29  9:33   ` Heiner Kallweit
  2 siblings, 2 replies; 17+ messages in thread
From: Florian Fainelli @ 2018-12-29  2:42 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, David Miller, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: netdev, Norbert Jurkeit, Frank Crawford

Le 12/24/18 à 3:21 AM, Heiner Kallweit a écrit :
> phy_device_create() uses request_module() to load the PHY driver module
> based on the PHY ID of the device. There is some timing issue which
> sometimes prevents the PHY driver to bind to the device. In such cases
> the genphy driver is used what can cause problems if genphy isn't
> compatible with the respective PHY.
> It turned out that the first fix can fix the issue in some but not all
> cases. Moving the call to device_initialize() before the call to
> request_module() was reported to fix the issue.
> I can't explain where the root cause of the issue is and why this fix
> works. AFAICS device_initialize() just initializes the device struct
> w/o doing anything that could interfere with e.g. bus_add_driver().
> This patch removes the first preliminary fix attempt.

Humm but phy_device is comprised of a mdio_device on which the actual
matching is done, so you do have to call device_initialize() first in
order for the phy_device instance to have its companion mdio_device's
kobject to be properly initialized.

Out of curiosity, do any of the people who tested that change have the
ability to run a kernel with list/kobject debugging enabled so we can
learn a bit more about the problematic code path?
> 
> Reference:
> https://bugzilla.redhat.com/show_bug.cgi?id=1650984
> 
> Fixes: c85ddecae6e5 ("net: phy: add workaround for issue where PHY driver doesn't bind to the device")
> Tested-by: Norbert Jurkeit <norbert.jurkeit@web.de>
> Tested-by: Frank Crawford <frank@crawford.emu.id.au>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/net/phy/phy_device.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 26c41ede5..ac0a83c7d 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -579,6 +579,7 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
>  		dev->c45_ids = *c45_ids;
>  	dev->irq = bus->irq[addr];
>  	dev_set_name(&mdiodev->dev, PHY_ID_FMT, bus->id, addr);
> +	device_initialize(&mdiodev->dev);
>  
>  	dev->state = PHY_DOWN;
>  
> @@ -598,8 +599,6 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
>  	 */
>  	request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id));
>  
> -	device_initialize(&mdiodev->dev);
> -
>  	return dev;
>  }
>  EXPORT_SYMBOL(phy_device_create);
> @@ -2191,14 +2190,6 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>  	new_driver->mdiodrv.driver.remove = phy_remove;
>  	new_driver->mdiodrv.driver.owner = owner;
>  
> -	/* The following works around an issue where the PHY driver doesn't bind
> -	 * to the device, resulting in the genphy driver being used instead of
> -	 * the dedicated driver. The root cause of the issue isn't known yet
> -	 * and seems to be in the base driver core. Once this is fixed we may
> -	 * remove this workaround.
> -	 */
> -	new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
> -
>  	retval = driver_register(&new_driver->mdiodrv.driver);
>  	if (retval) {
>  		pr_err("%s: Error %d in registering driver\n",
> 


-- 
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29  2:42 ` Florian Fainelli
@ 2018-12-29  2:48   ` Florian Fainelli
  2018-12-29  9:45     ` Heiner Kallweit
  2018-12-29 11:46     ` Norbert Jurkeit
  2018-12-29  9:33   ` Heiner Kallweit
  1 sibling, 2 replies; 17+ messages in thread
From: Florian Fainelli @ 2018-12-29  2:48 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, David Miller, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: netdev, Norbert Jurkeit, Frank Crawford

Le 12/28/18 à 6:42 PM, Florian Fainelli a écrit :
> Le 12/24/18 à 3:21 AM, Heiner Kallweit a écrit :
>> phy_device_create() uses request_module() to load the PHY driver module
>> based on the PHY ID of the device. There is some timing issue which
>> sometimes prevents the PHY driver to bind to the device. In such cases
>> the genphy driver is used what can cause problems if genphy isn't
>> compatible with the respective PHY.
>> It turned out that the first fix can fix the issue in some but not all
>> cases. Moving the call to device_initialize() before the call to
>> request_module() was reported to fix the issue.
>> I can't explain where the root cause of the issue is and why this fix
>> works. AFAICS device_initialize() just initializes the device struct
>> w/o doing anything that could interfere with e.g. bus_add_driver().
>> This patch removes the first preliminary fix attempt.
> 
> Humm but phy_device is comprised of a mdio_device on which the actual
> matching is done, so you do have to call device_initialize() first in
> order for the phy_device instance to have its companion mdio_device's
> kobject to be properly initialized.
> 
> Out of curiosity, do any of the people who tested that change have the
> ability to run a kernel with list/kobject debugging enabled so we can
> learn a bit more about the problematic code path?

Heiner; are you positive that the PHY is not in a power down mode
(BMCR.PDOWN = 1) at the time the r8169 probe is done? Because if that
was the case, there is no guarantee (per 802.3 clause 22 spec) that the
PHY must correctly respond to MDIO operations other than read/write to
BMCR register and this could explain that problem, as well as the
general lack of connectivity for these users.

Do you have any way to check that by any chance? We should probably take
care of that in the PHY library at some point.

>>
>> Reference:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1650984
>>
>> Fixes: c85ddecae6e5 ("net: phy: add workaround for issue where PHY driver doesn't bind to the device")
>> Tested-by: Norbert Jurkeit <norbert.jurkeit@web.de>
>> Tested-by: Frank Crawford <frank@crawford.emu.id.au>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/net/phy/phy_device.c | 11 +----------
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 26c41ede5..ac0a83c7d 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -579,6 +579,7 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
>>  		dev->c45_ids = *c45_ids;
>>  	dev->irq = bus->irq[addr];
>>  	dev_set_name(&mdiodev->dev, PHY_ID_FMT, bus->id, addr);
>> +	device_initialize(&mdiodev->dev);
>>  
>>  	dev->state = PHY_DOWN;
>>  
>> @@ -598,8 +599,6 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
>>  	 */
>>  	request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id));
>>  
>> -	device_initialize(&mdiodev->dev);
>> -
>>  	return dev;
>>  }
>>  EXPORT_SYMBOL(phy_device_create);
>> @@ -2191,14 +2190,6 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>>  	new_driver->mdiodrv.driver.remove = phy_remove;
>>  	new_driver->mdiodrv.driver.owner = owner;
>>  
>> -	/* The following works around an issue where the PHY driver doesn't bind
>> -	 * to the device, resulting in the genphy driver being used instead of
>> -	 * the dedicated driver. The root cause of the issue isn't known yet
>> -	 * and seems to be in the base driver core. Once this is fixed we may
>> -	 * remove this workaround.
>> -	 */
>> -	new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
>> -
>>  	retval = driver_register(&new_driver->mdiodrv.driver);
>>  	if (retval) {
>>  		pr_err("%s: Error %d in registering driver\n",
>>
> 
> 


-- 
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29  2:42 ` Florian Fainelli
  2018-12-29  2:48   ` Florian Fainelli
@ 2018-12-29  9:33   ` Heiner Kallweit
  1 sibling, 0 replies; 17+ messages in thread
From: Heiner Kallweit @ 2018-12-29  9:33 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, David Miller, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: netdev, Norbert Jurkeit, Frank Crawford

On 29.12.2018 03:42, Florian Fainelli wrote:
> Le 12/24/18 à 3:21 AM, Heiner Kallweit a écrit :
>> phy_device_create() uses request_module() to load the PHY driver module
>> based on the PHY ID of the device. There is some timing issue which
>> sometimes prevents the PHY driver to bind to the device. In such cases
>> the genphy driver is used what can cause problems if genphy isn't
>> compatible with the respective PHY.
>> It turned out that the first fix can fix the issue in some but not all
>> cases. Moving the call to device_initialize() before the call to
>> request_module() was reported to fix the issue.
>> I can't explain where the root cause of the issue is and why this fix
>> works. AFAICS device_initialize() just initializes the device struct
>> w/o doing anything that could interfere with e.g. bus_add_driver().
>> This patch removes the first preliminary fix attempt.
> 
> Humm but phy_device is comprised of a mdio_device on which the actual
> matching is done, so you do have to call device_initialize() first in
> order for the phy_device instance to have its companion mdio_device's
> kobject to be properly initialized.
> 
> Out of curiosity, do any of the people who tested that change have the
> ability to run a kernel with list/kobject debugging enabled so we can
> learn a bit more about the problematic code path?
>>
One or two can build a kernel and test if they are given a patch and
instructions. The other option is to ask Hans from Redhat to build a
test kernel and distribute it as rpm to users who want to test.
See also history of the bug ticket.

Last status is that I provided a patch which creates some debug output
and Hans built a test kernel. However most likely we will get feedback
only beginning of January when more people are back from holidays.

>> Reference:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1650984
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29  2:48   ` Florian Fainelli
@ 2018-12-29  9:45     ` Heiner Kallweit
  2018-12-29 11:46     ` Norbert Jurkeit
  1 sibling, 0 replies; 17+ messages in thread
From: Heiner Kallweit @ 2018-12-29  9:45 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, David Miller, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: netdev, Norbert Jurkeit, Frank Crawford

On 29.12.2018 03:48, Florian Fainelli wrote:
> Le 12/28/18 à 6:42 PM, Florian Fainelli a écrit :
>> Le 12/24/18 à 3:21 AM, Heiner Kallweit a écrit :
>>> phy_device_create() uses request_module() to load the PHY driver module
>>> based on the PHY ID of the device. There is some timing issue which
>>> sometimes prevents the PHY driver to bind to the device. In such cases
>>> the genphy driver is used what can cause problems if genphy isn't
>>> compatible with the respective PHY.
>>> It turned out that the first fix can fix the issue in some but not all
>>> cases. Moving the call to device_initialize() before the call to
>>> request_module() was reported to fix the issue.
>>> I can't explain where the root cause of the issue is and why this fix
>>> works. AFAICS device_initialize() just initializes the device struct
>>> w/o doing anything that could interfere with e.g. bus_add_driver().
>>> This patch removes the first preliminary fix attempt.
>>
>> Humm but phy_device is comprised of a mdio_device on which the actual
>> matching is done, so you do have to call device_initialize() first in
>> order for the phy_device instance to have its companion mdio_device's
>> kobject to be properly initialized.
>>
>> Out of curiosity, do any of the people who tested that change have the
>> ability to run a kernel with list/kobject debugging enabled so we can
>> learn a bit more about the problematic code path?
> 
> Heiner; are you positive that the PHY is not in a power down mode
> (BMCR.PDOWN = 1) at the time the r8169 probe is done? Because if that
> was the case, there is no guarantee (per 802.3 clause 22 spec) that the
> PHY must correctly respond to MDIO operations other than read/write to
> BMCR register and this could explain that problem, as well as the
> general lack of connectivity for these users.
> 
Some users facing the issue reported that even when the dedicated PHY
driver isn't bound still module "realtek" shows up in lsmod.
This seems to prove that the correct PHYID was read from the chip.

It was also reported that building the PHY driver in instead of
making it a module avoids the issue. All that makes me think that
we have a nasty timing issue, not in phylib but in driver base core.

All that doesn't seem to be specific to the Realtek stuff, so I was
wondering why no other user reported such an issue before.
My assumption:
Different versions of Realtek network chips have been used on a lot of
(most?) consumer mainboards for more than 10 years. So there is a
huge base and wide variety of systems now running phylib, thus the
chance of triggering issues like the one we discuss now has massively
increased.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29  2:48   ` Florian Fainelli
  2018-12-29  9:45     ` Heiner Kallweit
@ 2018-12-29 11:46     ` Norbert Jurkeit
  2018-12-29 11:54       ` Heiner Kallweit
  2019-01-03 20:13       ` Heiner Kallweit
  1 sibling, 2 replies; 17+ messages in thread
From: Norbert Jurkeit @ 2018-12-29 11:46 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Heiner Kallweit, Andrew Lunn, David Miller, Greg Kroah-Hartman,
	Rafael J. Wysocki, netdev, Frank Crawford

Am 29.12.18 um 03:48 schrieb Florian Fainelli:
> Heiner; are you positive that the PHY is not in a power down mode
> (BMCR.PDOWN = 1) at the time the r8169 probe is done? Because if that
> was the case, there is no guarantee (per 802.3 clause 22 spec) that the
> PHY must correctly respond to MDIO operations other than read/write to
> BMCR register and this could explain that problem, as well as the
> general lack of connectivity for these users.

This is a good point, at least in my case. My RTL8111 interface is 
connected to a Wifi router which is normally located out of sight when I 
sit in front of my PC, otherwise I might have noticed this earlier:

When I reboot from kernel 4.18.18 the LAN port status LED turns off at 
the end of the shutdown sequence and remains off with most 4.19.* 
kernels until "rmmod r8169; modprobe r8169" is issued.

When I shutdown&poweroff from kernel 4.18.18 the LED of course also 
turns off but lights up immediately after the PC power button is 
pressed. The network then comes up successfully with all tested 4.19.* 
kernels.

When I reboot from kernel 4.19.* the LED does NOT turn off during 
shutdown (only for about 1s during boot shortly before the login screen 
appears) and the network also comes up successfully with all 4.19.* kernels.


@Heiner: I hope that helps, otherwise please let me know if I should do 
more testing.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29 11:46     ` Norbert Jurkeit
@ 2018-12-29 11:54       ` Heiner Kallweit
  2018-12-29 13:27         ` Norbert Jurkeit
  2019-01-03 20:13       ` Heiner Kallweit
  1 sibling, 1 reply; 17+ messages in thread
From: Heiner Kallweit @ 2018-12-29 11:54 UTC (permalink / raw)
  To: Norbert Jurkeit, Florian Fainelli
  Cc: Andrew Lunn, David Miller, Greg Kroah-Hartman, Rafael J. Wysocki,
	netdev, Frank Crawford

On 29.12.2018 12:46, Norbert Jurkeit wrote:
> Am 29.12.18 um 03:48 schrieb Florian Fainelli:
>> Heiner; are you positive that the PHY is not in a power down mode
>> (BMCR.PDOWN = 1) at the time the r8169 probe is done? Because if that
>> was the case, there is no guarantee (per 802.3 clause 22 spec) that the
>> PHY must correctly respond to MDIO operations other than read/write to
>> BMCR register and this could explain that problem, as well as the
>> general lack of connectivity for these users.
> 
> This is a good point, at least in my case. My RTL8111 interface is connected to a Wifi router which is normally located out of sight when I sit in front of my PC, otherwise I might have noticed this earlier:
> 
> When I reboot from kernel 4.18.18 the LAN port status LED turns off at the end of the shutdown sequence and remains off with most 4.19.* kernels until "rmmod r8169; modprobe r8169" is issued.
> 
> When I shutdown&poweroff from kernel 4.18.18 the LED of course also turns off but lights up immediately after the PC power button is pressed. The network then comes up successfully with all tested 4.19.* kernels.
> 
> When I reboot from kernel 4.19.* the LED does NOT turn off during shutdown (only for about 1s during boot shortly before the login screen appears) and the network also comes up successfully with all 4.19.* kernels.
> 
> 
> @Heiner: I hope that helps, otherwise please let me know if I should do more testing.
> 
Good to know, I'll check. In this context, do you have Wake-on-LAN enabled?
IOW, what does "ethtool <if>" state in line "Wake-on:" ?

Heiner

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29 11:54       ` Heiner Kallweit
@ 2018-12-29 13:27         ` Norbert Jurkeit
  2018-12-29 13:55           ` Heiner Kallweit
  0 siblings, 1 reply; 17+ messages in thread
From: Norbert Jurkeit @ 2018-12-29 13:27 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, Andrew Lunn, David Miller, Greg Kroah-Hartman,
	Rafael J. Wysocki, netdev, Frank Crawford

Am 29.12.18 um 12:54 schrieb Heiner Kallweit:
>
> Good to know, I'll check. In this context, do you have Wake-on-LAN enabled?
> IOW, what does "ethtool <if>" state in line "Wake-on:" ?

"ethtool enp1s0" yields
     Supports Wake-on: pumbg
     Wake-on: d

Unfortunately I could not find any BIOS switch to enable WOL, but 
although not persistent, "ethtool -s enp1s0 wol g" works for 1 reboot. I 
issued that command in a session with kernel 4.18.18, rebooted and 
noticed that between shutdown and new boot the LAN port status LED 
turned off for 1s, then turn on again and network came up with a 4.19 
kernel which normally fails in this szenario.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29 13:27         ` Norbert Jurkeit
@ 2018-12-29 13:55           ` Heiner Kallweit
  2018-12-29 15:31             ` Norbert Jurkeit
  0 siblings, 1 reply; 17+ messages in thread
From: Heiner Kallweit @ 2018-12-29 13:55 UTC (permalink / raw)
  To: Norbert Jurkeit; +Cc: Florian Fainelli, Andrew Lunn, netdev, Frank Crawford

On 29.12.2018 14:27, Norbert Jurkeit wrote:
> Am 29.12.18 um 12:54 schrieb Heiner Kallweit:
>>
>> Good to know, I'll check. In this context, do you have Wake-on-LAN enabled?
>> IOW, what does "ethtool <if>" state in line "Wake-on:" ?
> 
> "ethtool enp1s0" yields
>     Supports Wake-on: pumbg
>     Wake-on: d
> 
> Unfortunately I could not find any BIOS switch to enable WOL, but although not persistent, "ethtool -s enp1s0 wol g" works for 1 reboot. I issued that command in a session with kernel 4.18.18, rebooted and noticed that between shutdown and new boot the LAN port status LED turned off for 1s, then turn on again and network came up with a 4.19 kernel which normally fails in this szenario.
> 
> 
Great, then let's go for one more test. Could you apply the following to 4.19 and start in a fail scenario?
I would be interested in the additional dmesg line, just grep for "hk:".


diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 99bc3de90..20457c4e6 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7048,6 +7048,7 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
 	new_bus->name = "r8169";
 	new_bus->priv = tp;
 	new_bus->parent = &pdev->dev;
+	new_bus->phy_mask = GENMASK(31, 1);
 	new_bus->irq[0] = PHY_IGNORE_INTERRUPT;
 	snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x",
 		 PCI_DEVID(pdev->bus->number, pdev->devfn));
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 51990002d..b92699397 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -793,6 +793,8 @@ struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
 	if (r)
 		return ERR_PTR(r);
 
+	pr_info("hk: addr = %d, phyid = 0x%08x, bmcr = 0x%x\n", addr, phy_id, mdiobus_read(bus, addr, 0));
+
 	/* If the phy_id is mostly Fs, there is no device there */
 	if ((phy_id & 0x1fffffff) == 0x1fffffff)
 		return ERR_PTR(-ENODEV);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29 13:55           ` Heiner Kallweit
@ 2018-12-29 15:31             ` Norbert Jurkeit
  2018-12-29 15:44               ` Heiner Kallweit
  0 siblings, 1 reply; 17+ messages in thread
From: Norbert Jurkeit @ 2018-12-29 15:31 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, Andrew Lunn, netdev, Frank Crawford

Am 29.12.18 um 14:55 schrieb Heiner Kallweit:
> Great, then let's go for one more test. Could you apply the following to 4.19 and start in a fail scenario?
> I would be interested in the additional dmesg line, just grep for "hk:".
>
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 99bc3de90..20457c4e6 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -7048,6 +7048,7 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
>   	new_bus->name = "r8169";
>   	new_bus->priv = tp;
>   	new_bus->parent = &pdev->dev;
> +	new_bus->phy_mask = GENMASK(31, 1);
>   	new_bus->irq[0] = PHY_IGNORE_INTERRUPT;
>   	snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x",
>   		 PCI_DEVID(pdev->bus->number, pdev->devfn));
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 51990002d..b92699397 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -793,6 +793,8 @@ struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
>   	if (r)
>   		return ERR_PTR(r);
>   
> +	pr_info("hk: addr = %d, phyid = 0x%08x, bmcr = 0x%x\n", addr, phy_id, mdiobus_read(bus, addr, 0));
> +
>   	/* If the phy_id is mostly Fs, there is no device there */
>   	if ((phy_id & 0x1fffffff) == 0x1fffffff)
>   		return ERR_PTR(-ENODEV);

Damn. I applied the patch to mainline kernel 4.19.10 because most object 
files already exist and this kernel is known to fail without patches. 
Although not intended the above patch makes it work, i.e. the link comes 
up at the end of the boot process. FWIW, the dmesg lines are:

[    2.164120] libphy: hk: addr = 0, phyid = 0xffffffff, bmcr = 0xffff
[    2.164122] libphy: hk: addr = 1, phyid = 0xffffffff, bmcr = 0xffff
[    2.164123] libphy: hk: addr = 2, phyid = 0xffffffff, bmcr = 0xffff
[    2.164123] libphy: hk: addr = 3, phyid = 0xffffffff, bmcr = 0xffff
[    2.164124] libphy: hk: addr = 4, phyid = 0xffffffff, bmcr = 0xffff
[    2.164125] libphy: hk: addr = 5, phyid = 0xffffffff, bmcr = 0xffff
[    2.164126] libphy: hk: addr = 6, phyid = 0xffffffff, bmcr = 0xffff
[    2.164127] libphy: hk: addr = 7, phyid = 0xffffffff, bmcr = 0xffff
[    2.164128] libphy: hk: addr = 8, phyid = 0xffffffff, bmcr = 0xffff
[    2.164129] libphy: hk: addr = 9, phyid = 0xffffffff, bmcr = 0xffff
[    2.164130] libphy: hk: addr = 10, phyid = 0xffffffff, bmcr = 0xffff
[    2.164130] libphy: hk: addr = 11, phyid = 0xffffffff, bmcr = 0xffff
[    2.164131] libphy: hk: addr = 12, phyid = 0xffffffff, bmcr = 0xffff
[    2.164132] libphy: hk: addr = 13, phyid = 0xffffffff, bmcr = 0xffff
[    2.164133] libphy: hk: addr = 14, phyid = 0xffffffff, bmcr = 0xffff
[    2.164134] libphy: hk: addr = 15, phyid = 0xffffffff, bmcr = 0xffff
[    2.164135] libphy: hk: addr = 16, phyid = 0xffffffff, bmcr = 0xffff
[    2.164136] libphy: hk: addr = 17, phyid = 0xffffffff, bmcr = 0xffff
[    2.164136] libphy: hk: addr = 18, phyid = 0xffffffff, bmcr = 0xffff
[    2.164137] libphy: hk: addr = 19, phyid = 0xffffffff, bmcr = 0xffff
[    2.164138] libphy: hk: addr = 20, phyid = 0xffffffff, bmcr = 0xffff
[    2.164139] libphy: hk: addr = 21, phyid = 0xffffffff, bmcr = 0xffff
[    2.164140] libphy: hk: addr = 22, phyid = 0xffffffff, bmcr = 0xffff
[    2.164141] libphy: hk: addr = 23, phyid = 0xffffffff, bmcr = 0xffff
[    2.164142] libphy: hk: addr = 24, phyid = 0xffffffff, bmcr = 0xffff
[    2.164142] libphy: hk: addr = 25, phyid = 0xffffffff, bmcr = 0xffff
[    2.164143] libphy: hk: addr = 26, phyid = 0xffffffff, bmcr = 0xffff
[    2.164144] libphy: hk: addr = 27, phyid = 0xffffffff, bmcr = 0xffff
[    2.164145] libphy: hk: addr = 28, phyid = 0xffffffff, bmcr = 0xffff
[    2.164146] libphy: hk: addr = 29, phyid = 0xffffffff, bmcr = 0xffff
[    2.164147] libphy: hk: addr = 30, phyid = 0xffffffff, bmcr = 0xffff
[    2.164148] libphy: hk: addr = 31, phyid = 0xffffffff, bmcr = 0xffff
[    4.674485] libphy: hk: addr = 0, phyid = 0x001cc912, bmcr = 0x800
[    4.682743] libphy: hk: addr = 0, phyid = 0x001cc913, bmcr = 0x1000

Please remember that my PC has also a 2. NIC of type RTL8169 which works 
fine.

I will try again with newly released kernel 4.19.13, but this will take 
longer and I can't promise better results.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29 15:31             ` Norbert Jurkeit
@ 2018-12-29 15:44               ` Heiner Kallweit
  2018-12-29 16:59                 ` Norbert Jurkeit
  0 siblings, 1 reply; 17+ messages in thread
From: Heiner Kallweit @ 2018-12-29 15:44 UTC (permalink / raw)
  To: Norbert Jurkeit; +Cc: Florian Fainelli, Andrew Lunn, netdev, Frank Crawford

On 29.12.2018 16:31, Norbert Jurkeit wrote:
> Am 29.12.18 um 14:55 schrieb Heiner Kallweit:
>> Great, then let's go for one more test. Could you apply the following to 4.19 and start in a fail scenario?
>> I would be interested in the additional dmesg line, just grep for "hk:".
>>
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>> index 99bc3de90..20457c4e6 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -7048,6 +7048,7 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
>>       new_bus->name = "r8169";
>>       new_bus->priv = tp;
>>       new_bus->parent = &pdev->dev;
>> +    new_bus->phy_mask = GENMASK(31, 1);
>>       new_bus->irq[0] = PHY_IGNORE_INTERRUPT;
>>       snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x",
>>            PCI_DEVID(pdev->bus->number, pdev->devfn));
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 51990002d..b92699397 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -793,6 +793,8 @@ struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
>>       if (r)
>>           return ERR_PTR(r);
>>   +    pr_info("hk: addr = %d, phyid = 0x%08x, bmcr = 0x%x\n", addr, phy_id, mdiobus_read(bus, addr, 0));
>> +
>>       /* If the phy_id is mostly Fs, there is no device there */
>>       if ((phy_id & 0x1fffffff) == 0x1fffffff)
>>           return ERR_PTR(-ENODEV);
> 
> Damn. I applied the patch to mainline kernel 4.19.10 because most object files already exist and this kernel is known to fail without patches. Although not intended the above patch makes it work, i.e. the link comes up at the end of the boot process. FWIW, the dmesg lines are:
> 
I don't think this patch can have any impact on the issue. Maybe WoL is still active from previous test?
Manual WoL settings may survive a reboot, you can disable WoL by "ethtool -s <if> wol d".

> [    4.674485] libphy: hk: addr = 0, phyid = 0x001cc912, bmcr = 0x800

That's interesting. bmcr value states that PHY is powered down, but still the correct phyid is read.
Having said that it should not be the cause of the issue.

> [    4.682743] libphy: hk: addr = 0, phyid = 0x001cc913, bmcr = 0x1000
> 
> Please remember that my PC has also a 2. NIC of type RTL8169 which works fine.
> 
> I will try again with newly released kernel 4.19.13, but this will take longer and I can't promise better results.
> 
> 
It should make no difference whether you go with 4.10 or 4.13.
What could be helpful in addition: I provided a patch with some debug output in comment 106
in the bug ticket (https://bugzilla.redhat.com/show_bug.cgi?id=1650984).
If you could apply this, trigger a fail scenario, and attach the full dmesg to the bug ticket.

Thanks a lot!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29 15:44               ` Heiner Kallweit
@ 2018-12-29 16:59                 ` Norbert Jurkeit
  2019-01-06 15:45                   ` Heiner Kallweit
  0 siblings, 1 reply; 17+ messages in thread
From: Norbert Jurkeit @ 2018-12-29 16:59 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, Andrew Lunn, netdev, Frank Crawford

Am 29.12.18 um 16:44 schrieb Heiner Kallweit:
>
> I don't think this patch can have any impact on the issue. Maybe WoL is still active from previous test?
> Manual WoL settings may survive a reboot, you can disable WoL by "ethtool -s <if> wol d".

In theory I agree, but we have seen before that it can not be predicted 
or logically explained which kernel build suffers from the issue or does 
not. WoL is definitely off. When it was enabled, the LED already turned 
on with the BIOS diagnostics screen and not at the end of the boot 
process as observed with the patched kernel.

>
> What could be helpful in addition: I provided a patch with some debug output in comment 106
> in the bug ticket (https://bugzilla.redhat.com/show_bug.cgi?id=1650984).
> If you could apply this, trigger a fail scenario, and attach the full dmesg to the bug ticket.
I just tried the Fedora kernel provided in comment 107. Unfortunately 
the fault neither shows up with this kernel nor with the stock Fedora 
kernel 4.19.12 it is based on. I will further try to find a kernel which 
fails to bring up the link AND provides some useful debug information 
but can't anticipate if and when.
> Thanks a lot!
You are welcome ;-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29 11:46     ` Norbert Jurkeit
  2018-12-29 11:54       ` Heiner Kallweit
@ 2019-01-03 20:13       ` Heiner Kallweit
  1 sibling, 0 replies; 17+ messages in thread
From: Heiner Kallweit @ 2019-01-03 20:13 UTC (permalink / raw)
  To: Norbert Jurkeit, Florian Fainelli
  Cc: Andrew Lunn, David Miller, netdev, Frank Crawford

On 29.12.2018 12:46, Norbert Jurkeit wrote:
> Am 29.12.18 um 03:48 schrieb Florian Fainelli:
>> Heiner; are you positive that the PHY is not in a power down mode
>> (BMCR.PDOWN = 1) at the time the r8169 probe is done? Because if that
>> was the case, there is no guarantee (per 802.3 clause 22 spec) that the
>> PHY must correctly respond to MDIO operations other than read/write to
>> BMCR register and this could explain that problem, as well as the
>> general lack of connectivity for these users.
> 
> This is a good point, at least in my case. My RTL8111 interface is connected to a Wifi router which is normally located out of sight when I sit in front of my PC, otherwise I might have noticed this earlier:
> 
> When I reboot from kernel 4.18.18 the LAN port status LED turns off at the end of the shutdown sequence and remains off with most 4.19.* kernels until "rmmod r8169; modprobe r8169" is issued.
> 
> When I shutdown&poweroff from kernel 4.18.18 the LED of course also turns off but lights up immediately after the PC power button is pressed. The network then comes up successfully with all tested 4.19.* kernels.
> 
> When I reboot from kernel 4.19.* the LED does NOT turn off during shutdown (only for about 1s during boot shortly before the login screen appears) and the network also comes up successfully with all 4.19.* kernels.
> 
I'd like to come back to these observations. I had a few German beers
to relax my mind and get fresh ideas. And it helped ..

IMO the following explains why the LED stays off when booting from 4.18
to 4.19: phylib support was added to r8169 with 4.19. Up to 4.18 the
driver had its own implementation for powering down and up the PHY.
When powering down the PHY it disabled autonegotiation. From 4.19
genphy_resume() is used to power up the PHY and this function doesn't
touch the "aneg enabled" bit in the PHY (register BMCR).
Therefore aneg is disabled during boot and the LED stays off.
genphy_config_aneg() sets the "aneg enabled" bit later.

A cold boot initializes the PHY and enables aneg.

Rebooting from 4.19 doesn't disable aneg, therefore LED stays on.

I think this effect and the other issue (falsely loading genphy driver)
are overlapping and most likely not related (even if certain symptoms
seem to indicate this). At least I have no explanation for how
disabled aneg could prevent the dedicated PHY driver from binding to
the PHY device.

Heiner

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device
  2018-12-29 16:59                 ` Norbert Jurkeit
@ 2019-01-06 15:45                   ` Heiner Kallweit
  0 siblings, 0 replies; 17+ messages in thread
From: Heiner Kallweit @ 2019-01-06 15:45 UTC (permalink / raw)
  To: Norbert Jurkeit; +Cc: Florian Fainelli, Andrew Lunn, netdev, Frank Crawford

On 29.12.2018 17:59, Norbert Jurkeit wrote:
> Am 29.12.18 um 16:44 schrieb Heiner Kallweit:
>>
>> I don't think this patch can have any impact on the issue. Maybe WoL is still active from previous test?
>> Manual WoL settings may survive a reboot, you can disable WoL by "ethtool -s <if> wol d".
> 
> In theory I agree, but we have seen before that it can not be predicted or logically explained which kernel build suffers from the issue or does not. WoL is definitely off. When it was enabled, the LED already turned on with the BIOS diagnostics screen and not at the end of the boot process as observed with the patched kernel.
> 
>>
>> What could be helpful in addition: I provided a patch with some debug output in comment 106
>> in the bug ticket (https://bugzilla.redhat.com/show_bug.cgi?id=1650984).
>> If you could apply this, trigger a fail scenario, and attach the full dmesg to the bug ticket.
> I just tried the Fedora kernel provided in comment 107. Unfortunately the fault neither shows up with this kernel nor with the stock Fedora kernel 4.19.12 it is based on. I will further try to find a kernel which fails to bring up the link AND provides some useful debug information but can't anticipate if and when.
>> Thanks a lot!
> You are welcome ;-)
> 
> 

Just by chance I came across the concept of MODULE_SOFTDEP. This is
basically in driver code what people did as a workaround manually in
the modprobe config files. It ensures that the PHY driver module is
loaded before the network driver. It's still a workaround but the
most elegant I can think of. I'm pretty sure this reliably avoids
the issue, but: could you please test?

If it works, then what I list below as one patch would be splitted:
1. add MODULE_SOFTDEP to r8169
2. remove preliminary fix

Thanks, Heiner


diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 298930d39..4c1485a42 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -706,6 +706,7 @@ module_param(use_dac, int, 0);
 MODULE_PARM_DESC(use_dac, "Enable PCI DAC. Unsafe on 32 bit PCI slot.");
 module_param_named(debug, debug.msg_enable, int, 0);
 MODULE_PARM_DESC(debug, "Debug verbosity level (0=none, ..., 16=all)");
+MODULE_SOFTDEP("pre: realtek");
 MODULE_LICENSE("GPL");
 MODULE_FIRMWARE(FIRMWARE_8168D_1);
 MODULE_FIRMWARE(FIRMWARE_8168D_2);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 9560a2b84..80be72844 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2259,14 +2259,6 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
 	new_driver->mdiodrv.driver.remove = phy_remove;
 	new_driver->mdiodrv.driver.owner = owner;
 
-	/* The following works around an issue where the PHY driver doesn't bind
-	 * to the device, resulting in the genphy driver being used instead of
-	 * the dedicated driver. The root cause of the issue isn't known yet
-	 * and seems to be in the base driver core. Once this is fixed we may
-	 * remove this workaround.
-	 */
-	new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
-
 	retval = driver_register(&new_driver->mdiodrv.driver);
 	if (retval) {
 		pr_err("%s: Error %d in registering driver\n",
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-01-06 15:45 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-24 11:21 [PATCH net] net: phy: replace preliminary fix for PHY driver sometimes not binding to the device Heiner Kallweit
2018-12-25 15:17 ` Heiner Kallweit
2018-12-28 21:02 ` David Miller
2018-12-28 21:56   ` Heiner Kallweit
2018-12-29  2:42 ` Florian Fainelli
2018-12-29  2:48   ` Florian Fainelli
2018-12-29  9:45     ` Heiner Kallweit
2018-12-29 11:46     ` Norbert Jurkeit
2018-12-29 11:54       ` Heiner Kallweit
2018-12-29 13:27         ` Norbert Jurkeit
2018-12-29 13:55           ` Heiner Kallweit
2018-12-29 15:31             ` Norbert Jurkeit
2018-12-29 15:44               ` Heiner Kallweit
2018-12-29 16:59                 ` Norbert Jurkeit
2019-01-06 15:45                   ` Heiner Kallweit
2019-01-03 20:13       ` Heiner Kallweit
2018-12-29  9:33   ` Heiner Kallweit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.