linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] net: phy: read link status twice when phy_check_link_status()
@ 2019-07-26  9:53 Yonglong Liu
  2019-07-26 18:14 ` Heiner Kallweit
  0 siblings, 1 reply; 12+ messages in thread
From: Yonglong Liu @ 2019-07-26  9:53 UTC (permalink / raw)
  To: andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose

According to the datasheet of Marvell phy and Realtek phy, the
copper link status should read twice, or it may get a fake link
up status, and cause up->down->up at the first time when link up.
This happens more oftem at Realtek phy.

I add a fake status read, and can solve this problem.

I also see that in genphy_update_link(), had delete the fake
read in polling mode, so I don't know whether my solution is
correct.

Or provide a phydev->drv->read_status functions for the phy I
used is more acceptable?

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
---
 drivers/net/phy/phy.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ef7aa73..0c03edc 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1,4 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0+
+	err = phy_read_status(phydev);
+	if (err)
+		return err;
 /* Framework for configuring and reading PHY devices
  * Based on code in sungem_phy.c and gianfar_phy.c
  *
@@ -525,6 +528,11 @@ static int phy_check_link_status(struct phy_device *phydev)
 
 	WARN_ON(!mutex_is_locked(&phydev->lock));
 
+	/* Do a fake read */
+	err = phy_read(phydev, MII_BMSR);
+	if (err < 0)
+		return err;
+
 	err = phy_read_status(phydev);
 	if (err)
 		return err;
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-26  9:53 [RFC] net: phy: read link status twice when phy_check_link_status() Yonglong Liu
@ 2019-07-26 18:14 ` Heiner Kallweit
  2019-07-29  3:59   ` liuyonglong
  0 siblings, 1 reply; 12+ messages in thread
From: Heiner Kallweit @ 2019-07-26 18:14 UTC (permalink / raw)
  To: Yonglong Liu, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose

On 26.07.2019 11:53, Yonglong Liu wrote:
> According to the datasheet of Marvell phy and Realtek phy, the
> copper link status should read twice, or it may get a fake link
> up status, and cause up->down->up at the first time when link up.
> This happens more oftem at Realtek phy.
> 
This is not correct, there is no fake link up status.
Read the comment in genphy_update_link, only link-down events
are latched. Means if the first read returns link up, then there
is no need for a second read. And in polling mode we don't do a
second read because we want to detect also short link drops.

It would be helpful if you could describe your actual problem
and whether you use polling or interrupt mode.

> I add a fake status read, and can solve this problem.
> 
> I also see that in genphy_update_link(), had delete the fake
> read in polling mode, so I don't know whether my solution is
> correct.
> 
> Or provide a phydev->drv->read_status functions for the phy I
> used is more acceptable?
> 
> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
> ---
>  drivers/net/phy/phy.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index ef7aa73..0c03edc 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -1,4 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0+
> +	err = phy_read_status(phydev);
> +	if (err)
> +		return err;

This seems to be completely wrong at that place.

>  /* Framework for configuring and reading PHY devices
>   * Based on code in sungem_phy.c and gianfar_phy.c
>   *
> @@ -525,6 +528,11 @@ static int phy_check_link_status(struct phy_device *phydev)
>  
>  	WARN_ON(!mutex_is_locked(&phydev->lock));
>  
> +	/* Do a fake read */
> +	err = phy_read(phydev, MII_BMSR);
> +	if (err < 0)
> +		return err;
> +
>  	err = phy_read_status(phydev);
>  	if (err)
>  		return err;
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-26 18:14 ` Heiner Kallweit
@ 2019-07-29  3:59   ` liuyonglong
  2019-07-29 20:57     ` Heiner Kallweit
  0 siblings, 1 reply; 12+ messages in thread
From: liuyonglong @ 2019-07-29  3:59 UTC (permalink / raw)
  To: Heiner Kallweit, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose



On 2019/7/27 2:14, Heiner Kallweit wrote:
> On 26.07.2019 11:53, Yonglong Liu wrote:
>> According to the datasheet of Marvell phy and Realtek phy, the
>> copper link status should read twice, or it may get a fake link
>> up status, and cause up->down->up at the first time when link up.
>> This happens more oftem at Realtek phy.
>>
> This is not correct, there is no fake link up status.
> Read the comment in genphy_update_link, only link-down events
> are latched. Means if the first read returns link up, then there
> is no need for a second read. And in polling mode we don't do a
> second read because we want to detect also short link drops.
> 
> It would be helpful if you could describe your actual problem
> and whether you use polling or interrupt mode.
> 

[   44.498633] hns3 0000:bd:00.1 eth5: net open
[   44.504273] hns3 0000:bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg
[   44.532348] hns3 0000:bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link.

According to the datasheet:
reg 1.5=0 now, means copper auto-negotiation not complete
reg 1.2=1 now, means link is up

We can see that, when we read the link up, the auto-negotiation
is not complete yet, so the speed is invalid.

I don't know why this happen, maybe this state is keep from bios?
Or we may do something else in the phy initialize to fix it?
And also confuse that why read twice can fix it?

[   44.554063] hns3 0000:bd:00.1: invalid speed (-1)
[   44.560412] hns3 0000:bd:00.1 eth5: failed to adjust link.
[   45.194870] hns3 0000:bd:00.1 eth5: link up
[   45.574095] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
[   46.150051] hns3 0000:bd:00.1 eth5: link down
[   46.598074] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
[   47.622075] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79a9
[   48.646077] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
[   48.934050] hns3 0000:bd:00.1 eth5: link up
[   49.702140] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad

>> I add a fake status read, and can solve this problem.
>>
>> I also see that in genphy_update_link(), had delete the fake
>> read in polling mode, so I don't know whether my solution is
>> correct.
>>
>> Or provide a phydev->drv->read_status functions for the phy I
>> used is more acceptable?
>>
>> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
>> ---
>>  drivers/net/phy/phy.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index ef7aa73..0c03edc 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -1,4 +1,7 @@
>>  // SPDX-License-Identifier: GPL-2.0+
>> +	err = phy_read_status(phydev);
>> +	if (err)
>> +		return err;
> 
> This seems to be completely wrong at that place.
> 

Sorry, this can be ignore.

>>  /* Framework for configuring and reading PHY devices
>>   * Based on code in sungem_phy.c and gianfar_phy.c
>>   *
>> @@ -525,6 +528,11 @@ static int phy_check_link_status(struct phy_device *phydev)
>>  
>>  	WARN_ON(!mutex_is_locked(&phydev->lock));
>>  
>> +	/* Do a fake read */
>> +	err = phy_read(phydev, MII_BMSR);
>> +	if (err < 0)
>> +		return err;
>> +
>>  	err = phy_read_status(phydev);
>>  	if (err)
>>  		return err;
>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-29  3:59   ` liuyonglong
@ 2019-07-29 20:57     ` Heiner Kallweit
  2019-07-30  4:03       ` liuyonglong
  0 siblings, 1 reply; 12+ messages in thread
From: Heiner Kallweit @ 2019-07-29 20:57 UTC (permalink / raw)
  To: liuyonglong, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose

On 29.07.2019 05:59, liuyonglong wrote:
> 
> 
> On 2019/7/27 2:14, Heiner Kallweit wrote:
>> On 26.07.2019 11:53, Yonglong Liu wrote:
>>> According to the datasheet of Marvell phy and Realtek phy, the
>>> copper link status should read twice, or it may get a fake link
>>> up status, and cause up->down->up at the first time when link up.
>>> This happens more oftem at Realtek phy.
>>>
>> This is not correct, there is no fake link up status.
>> Read the comment in genphy_update_link, only link-down events
>> are latched. Means if the first read returns link up, then there
>> is no need for a second read. And in polling mode we don't do a
>> second read because we want to detect also short link drops.
>>
>> It would be helpful if you could describe your actual problem
>> and whether you use polling or interrupt mode.
>>
> 
> [   44.498633] hns3 0000:bd:00.1 eth5: net open
> [   44.504273] hns3 0000:bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg
> [   44.532348] hns3 0000:bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link.

This should not happen. The PHY indicates link up w/o having aneg finished.

> 
> According to the datasheet:
> reg 1.5=0 now, means copper auto-negotiation not complete
> reg 1.2=1 now, means link is up
> 
> We can see that, when we read the link up, the auto-negotiation
> is not complete yet, so the speed is invalid.
> 
> I don't know why this happen, maybe this state is keep from bios?
> Or we may do something else in the phy initialize to fix it?
> And also confuse that why read twice can fix it?
> 
I suppose that basically any delay would do.

> [   44.554063] hns3 0000:bd:00.1: invalid speed (-1)
> [   44.560412] hns3 0000:bd:00.1 eth5: failed to adjust link.
> [   45.194870] hns3 0000:bd:00.1 eth5: link up
> [   45.574095] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
> [   46.150051] hns3 0000:bd:00.1 eth5: link down
> [   46.598074] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
> [   47.622075] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79a9
> [   48.646077] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
> [   48.934050] hns3 0000:bd:00.1 eth5: link up
> [   49.702140] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
> 
>>> I add a fake status read, and can solve this problem.
>>>
>>> I also see that in genphy_update_link(), had delete the fake
>>> read in polling mode, so I don't know whether my solution is
>>> correct.
>>>

Can you test whether the following fixes the issue for you?
Also it would be interesting which exact PHY models you tested
and whether you built the respective PHY drivers or whether you
rely on the genphy driver. Best use the second patch to get the
needed info. It may make sense anyway to add the call to
phy_attached_info() to the hns3 driver.


diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 6b5cb87f3..fbecfe210 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1807,7 +1807,8 @@ int genphy_read_status(struct phy_device *phydev)
 
 	linkmode_zero(phydev->lp_advertising);
 
-	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
+	if (phydev->autoneg == AUTONEG_ENABLE &&
+	    (phydev->autoneg_complete || phydev->link)) {
 		if (phydev->is_gigabit_capable) {
 			lpagb = phy_read(phydev, MII_STAT1000);
 			if (lpagb < 0)
-- 
2.22.0


diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index abb1b4385..dc4dfd460 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -231,6 +231,8 @@ int hclge_mac_connect_phy(struct hnae3_handle *handle)
 	linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
 			   phydev->advertising);
 
+	phy_attached_info(phydev);
+
 	return 0;
 }
 
-- 
2.22.0




>>> Or provide a phydev->drv->read_status functions for the phy I
>>> used is more acceptable?
>>>
>>> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
>>> ---
>>>  drivers/net/phy/phy.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>>> index ef7aa73..0c03edc 100644
>>> --- a/drivers/net/phy/phy.c
>>> +++ b/drivers/net/phy/phy.c
>>> @@ -1,4 +1,7 @@
>>>  // SPDX-License-Identifier: GPL-2.0+
>>> +	err = phy_read_status(phydev);
>>> +	if (err)
>>> +		return err;
>>
>> This seems to be completely wrong at that place.
>>
> 
> Sorry, this can be ignore.
> 
>>>  /* Framework for configuring and reading PHY devices
>>>   * Based on code in sungem_phy.c and gianfar_phy.c
>>>   *
>>> @@ -525,6 +528,11 @@ static int phy_check_link_status(struct phy_device *phydev)
>>>  
>>>  	WARN_ON(!mutex_is_locked(&phydev->lock));
>>>  
>>> +	/* Do a fake read */
>>> +	err = phy_read(phydev, MII_BMSR);
>>> +	if (err < 0)
>>> +		return err;
>>> +
>>>  	err = phy_read_status(phydev);
>>>  	if (err)
>>>  		return err;
>>>
>>
>>
>> .
>>
> 
> 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-29 20:57     ` Heiner Kallweit
@ 2019-07-30  4:03       ` liuyonglong
  2019-07-30  6:08         ` Heiner Kallweit
  0 siblings, 1 reply; 12+ messages in thread
From: liuyonglong @ 2019-07-30  4:03 UTC (permalink / raw)
  To: Heiner Kallweit, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose



On 2019/7/30 4:57, Heiner Kallweit wrote:
> On 29.07.2019 05:59, liuyonglong wrote:
>>
>>
>> On 2019/7/27 2:14, Heiner Kallweit wrote:
>>> On 26.07.2019 11:53, Yonglong Liu wrote:
>>>> According to the datasheet of Marvell phy and Realtek phy, the
>>>> copper link status should read twice, or it may get a fake link
>>>> up status, and cause up->down->up at the first time when link up.
>>>> This happens more oftem at Realtek phy.
>>>>
>>> This is not correct, there is no fake link up status.
>>> Read the comment in genphy_update_link, only link-down events
>>> are latched. Means if the first read returns link up, then there
>>> is no need for a second read. And in polling mode we don't do a
>>> second read because we want to detect also short link drops.
>>>
>>> It would be helpful if you could describe your actual problem
>>> and whether you use polling or interrupt mode.
>>>
>>
>> [   44.498633] hns3 0000:bd:00.1 eth5: net open
>> [   44.504273] hns3 0000:bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg
>> [   44.532348] hns3 0000:bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link.
> 
> This should not happen. The PHY indicates link up w/o having aneg finished.
> 
>>
>> According to the datasheet:
>> reg 1.5=0 now, means copper auto-negotiation not complete
>> reg 1.2=1 now, means link is up
>>
>> We can see that, when we read the link up, the auto-negotiation
>> is not complete yet, so the speed is invalid.
>>
>> I don't know why this happen, maybe this state is keep from bios?
>> Or we may do something else in the phy initialize to fix it?
>> And also confuse that why read twice can fix it?
>>
> I suppose that basically any delay would do.
> 
>> [   44.554063] hns3 0000:bd:00.1: invalid speed (-1)
>> [   44.560412] hns3 0000:bd:00.1 eth5: failed to adjust link.
>> [   45.194870] hns3 0000:bd:00.1 eth5: link up
>> [   45.574095] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
>> [   46.150051] hns3 0000:bd:00.1 eth5: link down
>> [   46.598074] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
>> [   47.622075] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79a9
>> [   48.646077] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
>> [   48.934050] hns3 0000:bd:00.1 eth5: link up
>> [   49.702140] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
>>
>>>> I add a fake status read, and can solve this problem.
>>>>
>>>> I also see that in genphy_update_link(), had delete the fake
>>>> read in polling mode, so I don't know whether my solution is
>>>> correct.
>>>>
> 
> Can you test whether the following fixes the issue for you?
> Also it would be interesting which exact PHY models you tested
> and whether you built the respective PHY drivers or whether you
> rely on the genphy driver. Best use the second patch to get the
> needed info. It may make sense anyway to add the call to
> phy_attached_info() to the hns3 driver.
> 

[   40.100716] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: attached PHY driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=mii-0000:bd:00.3:07, irq=POLL)
[   40.932133] hns3 0000:bd:00.3 eth7: net open
[   40.932458] hns3 0000:bd:00.3: invalid speed (-1)
[   40.932541] 8021q: adding VLAN 0 to HW filter on device eth7
[   40.937149] hns3 0000:bd:00.3 eth7: failed to adjust link.

[   40.662050] Generic PHY mii-0000:bd:00.2:05: attached PHY driver [Generic PHY] (mii_bus:phy_addr=mii-0000:bd:00.2:05, irq=POLL)
[   41.563511] hns3 0000:bd:00.2 eth6: net open
[   41.563853] hns3 0000:bd:00.2: invalid speed (-1)
[   41.563943] 8021q: adding VLAN 0 to HW filter on device eth6
[   41.568578] IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
[   41.568898] hns3 0000:bd:00.2 eth6: failed to adjust link.

I am using RTL8211F, but you can see that, both genphy driver and
RTL8211F driver have the same issue.

> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 6b5cb87f3..fbecfe210 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1807,7 +1807,8 @@ int genphy_read_status(struct phy_device *phydev)
>  
>  	linkmode_zero(phydev->lp_advertising);
>  
> -	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
> +	if (phydev->autoneg == AUTONEG_ENABLE &&
> +	    (phydev->autoneg_complete || phydev->link)) {
>  		if (phydev->is_gigabit_capable) {
>  			lpagb = phy_read(phydev, MII_STAT1000);
>  			if (lpagb < 0)
> 

I have try this patch, have no effect. I suppose that at this time,
the autoneg actually not complete yet.

Maybe the wrong phy state is passed from bios? Invalid speed just
happen at the first time when ethX up, after that, repeat the
ifconfig down/ifconfig up command can not see that again.

So I think the bios should power off the phy(writing reg 1.11 to 1)
before it starts the OS? Or any other way to fix this in the OS?




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-30  4:03       ` liuyonglong
@ 2019-07-30  6:08         ` Heiner Kallweit
  2019-07-30  6:35           ` liuyonglong
  0 siblings, 1 reply; 12+ messages in thread
From: Heiner Kallweit @ 2019-07-30  6:08 UTC (permalink / raw)
  To: liuyonglong, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose

On 30.07.2019 06:03, liuyonglong wrote:
> 
> 
> On 2019/7/30 4:57, Heiner Kallweit wrote:
>> On 29.07.2019 05:59, liuyonglong wrote:
>>>
>>>
>>> On 2019/7/27 2:14, Heiner Kallweit wrote:
>>>> On 26.07.2019 11:53, Yonglong Liu wrote:
>>>>> According to the datasheet of Marvell phy and Realtek phy, the
>>>>> copper link status should read twice, or it may get a fake link
>>>>> up status, and cause up->down->up at the first time when link up.
>>>>> This happens more oftem at Realtek phy.
>>>>>
>>>> This is not correct, there is no fake link up status.
>>>> Read the comment in genphy_update_link, only link-down events
>>>> are latched. Means if the first read returns link up, then there
>>>> is no need for a second read. And in polling mode we don't do a
>>>> second read because we want to detect also short link drops.
>>>>
>>>> It would be helpful if you could describe your actual problem
>>>> and whether you use polling or interrupt mode.
>>>>
>>>
>>> [   44.498633] hns3 0000:bd:00.1 eth5: net open
>>> [   44.504273] hns3 0000:bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg
>>> [   44.532348] hns3 0000:bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link.
>>
>> This should not happen. The PHY indicates link up w/o having aneg finished.
>>
>>>
>>> According to the datasheet:
>>> reg 1.5=0 now, means copper auto-negotiation not complete
>>> reg 1.2=1 now, means link is up
>>>
>>> We can see that, when we read the link up, the auto-negotiation
>>> is not complete yet, so the speed is invalid.
>>>
>>> I don't know why this happen, maybe this state is keep from bios?
>>> Or we may do something else in the phy initialize to fix it?
>>> And also confuse that why read twice can fix it?
>>>
>> I suppose that basically any delay would do.
>>
>>> [   44.554063] hns3 0000:bd:00.1: invalid speed (-1)
>>> [   44.560412] hns3 0000:bd:00.1 eth5: failed to adjust link.
>>> [   45.194870] hns3 0000:bd:00.1 eth5: link up
>>> [   45.574095] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
>>> [   46.150051] hns3 0000:bd:00.1 eth5: link down
>>> [   46.598074] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
>>> [   47.622075] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79a9
>>> [   48.646077] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
>>> [   48.934050] hns3 0000:bd:00.1 eth5: link up
>>> [   49.702140] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
>>>
>>>>> I add a fake status read, and can solve this problem.
>>>>>
>>>>> I also see that in genphy_update_link(), had delete the fake
>>>>> read in polling mode, so I don't know whether my solution is
>>>>> correct.
>>>>>
>>
>> Can you test whether the following fixes the issue for you?
>> Also it would be interesting which exact PHY models you tested
>> and whether you built the respective PHY drivers or whether you
>> rely on the genphy driver. Best use the second patch to get the
>> needed info. It may make sense anyway to add the call to
>> phy_attached_info() to the hns3 driver.
>>
> 
> [   40.100716] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: attached PHY driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=mii-0000:bd:00.3:07, irq=POLL)
> [   40.932133] hns3 0000:bd:00.3 eth7: net open
> [   40.932458] hns3 0000:bd:00.3: invalid speed (-1)
> [   40.932541] 8021q: adding VLAN 0 to HW filter on device eth7
> [   40.937149] hns3 0000:bd:00.3 eth7: failed to adjust link.
> 
> [   40.662050] Generic PHY mii-0000:bd:00.2:05: attached PHY driver [Generic PHY] (mii_bus:phy_addr=mii-0000:bd:00.2:05, irq=POLL)
> [   41.563511] hns3 0000:bd:00.2 eth6: net open
> [   41.563853] hns3 0000:bd:00.2: invalid speed (-1)
> [   41.563943] 8021q: adding VLAN 0 to HW filter on device eth6
> [   41.568578] IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
> [   41.568898] hns3 0000:bd:00.2 eth6: failed to adjust link.
> 
> I am using RTL8211F, but you can see that, both genphy driver and
> RTL8211F driver have the same issue.
> 
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 6b5cb87f3..fbecfe210 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -1807,7 +1807,8 @@ int genphy_read_status(struct phy_device *phydev)
>>  
>>  	linkmode_zero(phydev->lp_advertising);
>>  
>> -	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
>> +	if (phydev->autoneg == AUTONEG_ENABLE &&
>> +	    (phydev->autoneg_complete || phydev->link)) {
>>  		if (phydev->is_gigabit_capable) {
>>  			lpagb = phy_read(phydev, MII_STAT1000);
>>  			if (lpagb < 0)
>>
> 
> I have try this patch, have no effect. I suppose that at this time,
> the autoneg actually not complete yet.
> 
> Maybe the wrong phy state is passed from bios? Invalid speed just
> happen at the first time when ethX up, after that, repeat the
> ifconfig down/ifconfig up command can not see that again.
> 
> So I think the bios should power off the phy(writing reg 1.11 to 1)
> before it starts the OS? Or any other way to fix this in the OS?
> 
To get a better idea of whats going on it would be good to see a full
MDIO trace. Can you enable MDIO tracing via the following sysctl file
/sys/kernel/debug/tracing/events/mdio/enable
and provide the generated trace?

Due to polling mode each second entries will be generated, so you
better stop network after the issue occurred.

> 
> 
> 

Heiner

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-30  6:08         ` Heiner Kallweit
@ 2019-07-30  6:35           ` liuyonglong
  2019-07-30  6:39             ` liuyonglong
  2019-07-30 19:04             ` Heiner Kallweit
  0 siblings, 2 replies; 12+ messages in thread
From: liuyonglong @ 2019-07-30  6:35 UTC (permalink / raw)
  To: Heiner Kallweit, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose

:/sys/kernel/debug/tracing$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 45/45   #P:128
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
    kworker/64:2-1028  [064] ....   172.295687: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x02 val:0x001c
    kworker/64:2-1028  [064] ....   172.295726: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x03 val:0xc916
    kworker/64:2-1028  [064] ....   172.296902: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x01 val:0x79ad
    kworker/64:2-1028  [064] ....   172.296938: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x0f val:0x2000
    kworker/64:2-1028  [064] ....   172.321213: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x00 val:0x1040
    kworker/64:2-1028  [064] ....   172.343209: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x02 val:0x001c
    kworker/64:2-1028  [064] ....   172.343245: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x03 val:0xc916
    kworker/64:2-1028  [064] ....   172.343882: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
    kworker/64:2-1028  [064] ....   172.343918: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0f val:0x2000
    kworker/64:2-1028  [064] ....   172.362658: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
    kworker/64:2-1028  [064] ....   172.385961: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x02 val:0x001c
    kworker/64:2-1028  [064] ....   172.385996: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x03 val:0xc916
    kworker/64:2-1028  [064] ....   172.386646: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x01 val:0x79ad
    kworker/64:2-1028  [064] ....   172.386681: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x0f val:0x2000
    kworker/64:2-1028  [064] ....   172.411286: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x00 val:0x1040
    kworker/64:2-1028  [064] ....   172.433225: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x02 val:0x001c
    kworker/64:2-1028  [064] ....   172.433260: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x03 val:0xc916
    kworker/64:2-1028  [064] ....   172.433887: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
    kworker/64:2-1028  [064] ....   172.433922: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0f val:0x2000
    kworker/64:2-1028  [064] ....   172.452862: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
        ifconfig-1324  [011] ....   177.325585: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
  kworker/u257:0-8     [012] ....   177.325642: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x04 val:0x01e1
  kworker/u257:0-8     [012] ....   177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
  kworker/u257:0-8     [012] ....   177.325708: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
  kworker/u257:0-8     [012] ....   177.325744: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
  kworker/u257:0-8     [012] ....   177.325779: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
  kworker/u257:0-8     [012] ....   177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
  kworker/u257:0-8     [012] ....   177.325843: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x798d
  kworker/u257:0-8     [003] ....   178.360488: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
  kworker/u257:0-8     [000] ....   179.384479: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
  kworker/u257:0-8     [000] ....   180.408477: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
  kworker/u257:0-8     [000] ....   181.432474: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79a9
  kworker/u257:0-8     [000] ....   181.432510: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0a val:0x7800
  kworker/u257:0-8     [000] ....   181.432546: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
  kworker/u257:0-8     [000] ....   181.432582: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x05 val:0xc1e1
  kworker/u257:0-8     [000] ....   182.456510: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
  kworker/u257:0-8     [000] ....   182.456546: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0a val:0x4800
  kworker/u257:0-8     [000] ....   182.456582: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
  kworker/u257:0-8     [000] ....   182.456618: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x05 val:0xc1e1
  kworker/u257:0-8     [001] ....   183.480476: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
  kworker/u257:0-8     [000] ....   184.504478: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
  kworker/u257:0-8     [000] ....   185.528486: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
  kworker/u257:0-8     [000] ....   186.552475: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
        ifconfig-1327  [011] ....   187.196036: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
        ifconfig-1327  [011] ....   187.196046: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1840



On 2019/7/30 14:08, Heiner Kallweit wrote:
> On 30.07.2019 06:03, liuyonglong wrote:
>>
>>
>> On 2019/7/30 4:57, Heiner Kallweit wrote:
>>> On 29.07.2019 05:59, liuyonglong wrote:
>>>>
>>>>
>>>> On 2019/7/27 2:14, Heiner Kallweit wrote:
>>>>> On 26.07.2019 11:53, Yonglong Liu wrote:
>>>>>> According to the datasheet of Marvell phy and Realtek phy, the
>>>>>> copper link status should read twice, or it may get a fake link
>>>>>> up status, and cause up->down->up at the first time when link up.
>>>>>> This happens more oftem at Realtek phy.
>>>>>>
>>>>> This is not correct, there is no fake link up status.
>>>>> Read the comment in genphy_update_link, only link-down events
>>>>> are latched. Means if the first read returns link up, then there
>>>>> is no need for a second read. And in polling mode we don't do a
>>>>> second read because we want to detect also short link drops.
>>>>>
>>>>> It would be helpful if you could describe your actual problem
>>>>> and whether you use polling or interrupt mode.
>>>>>
>>>>
>>>> [   44.498633] hns3 0000:bd:00.1 eth5: net open
>>>> [   44.504273] hns3 0000:bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg
>>>> [   44.532348] hns3 0000:bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link.
>>>
>>> This should not happen. The PHY indicates link up w/o having aneg finished.
>>>
>>>>
>>>> According to the datasheet:
>>>> reg 1.5=0 now, means copper auto-negotiation not complete
>>>> reg 1.2=1 now, means link is up
>>>>
>>>> We can see that, when we read the link up, the auto-negotiation
>>>> is not complete yet, so the speed is invalid.
>>>>
>>>> I don't know why this happen, maybe this state is keep from bios?
>>>> Or we may do something else in the phy initialize to fix it?
>>>> And also confuse that why read twice can fix it?
>>>>
>>> I suppose that basically any delay would do.
>>>
>>>> [   44.554063] hns3 0000:bd:00.1: invalid speed (-1)
>>>> [   44.560412] hns3 0000:bd:00.1 eth5: failed to adjust link.
>>>> [   45.194870] hns3 0000:bd:00.1 eth5: link up
>>>> [   45.574095] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
>>>> [   46.150051] hns3 0000:bd:00.1 eth5: link down
>>>> [   46.598074] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
>>>> [   47.622075] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79a9
>>>> [   48.646077] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
>>>> [   48.934050] hns3 0000:bd:00.1 eth5: link up
>>>> [   49.702140] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
>>>>
>>>>>> I add a fake status read, and can solve this problem.
>>>>>>
>>>>>> I also see that in genphy_update_link(), had delete the fake
>>>>>> read in polling mode, so I don't know whether my solution is
>>>>>> correct.
>>>>>>
>>>
>>> Can you test whether the following fixes the issue for you?
>>> Also it would be interesting which exact PHY models you tested
>>> and whether you built the respective PHY drivers or whether you
>>> rely on the genphy driver. Best use the second patch to get the
>>> needed info. It may make sense anyway to add the call to
>>> phy_attached_info() to the hns3 driver.
>>>
>>
>> [   40.100716] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: attached PHY driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=mii-0000:bd:00.3:07, irq=POLL)
>> [   40.932133] hns3 0000:bd:00.3 eth7: net open
>> [   40.932458] hns3 0000:bd:00.3: invalid speed (-1)
>> [   40.932541] 8021q: adding VLAN 0 to HW filter on device eth7
>> [   40.937149] hns3 0000:bd:00.3 eth7: failed to adjust link.
>>
>> [   40.662050] Generic PHY mii-0000:bd:00.2:05: attached PHY driver [Generic PHY] (mii_bus:phy_addr=mii-0000:bd:00.2:05, irq=POLL)
>> [   41.563511] hns3 0000:bd:00.2 eth6: net open
>> [   41.563853] hns3 0000:bd:00.2: invalid speed (-1)
>> [   41.563943] 8021q: adding VLAN 0 to HW filter on device eth6
>> [   41.568578] IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
>> [   41.568898] hns3 0000:bd:00.2 eth6: failed to adjust link.
>>
>> I am using RTL8211F, but you can see that, both genphy driver and
>> RTL8211F driver have the same issue.
>>
>>>
>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>> index 6b5cb87f3..fbecfe210 100644
>>> --- a/drivers/net/phy/phy_device.c
>>> +++ b/drivers/net/phy/phy_device.c
>>> @@ -1807,7 +1807,8 @@ int genphy_read_status(struct phy_device *phydev)
>>>  
>>>  	linkmode_zero(phydev->lp_advertising);
>>>  
>>> -	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
>>> +	if (phydev->autoneg == AUTONEG_ENABLE &&
>>> +	    (phydev->autoneg_complete || phydev->link)) {
>>>  		if (phydev->is_gigabit_capable) {
>>>  			lpagb = phy_read(phydev, MII_STAT1000);
>>>  			if (lpagb < 0)
>>>
>>
>> I have try this patch, have no effect. I suppose that at this time,
>> the autoneg actually not complete yet.
>>
>> Maybe the wrong phy state is passed from bios? Invalid speed just
>> happen at the first time when ethX up, after that, repeat the
>> ifconfig down/ifconfig up command can not see that again.
>>
>> So I think the bios should power off the phy(writing reg 1.11 to 1)
>> before it starts the OS? Or any other way to fix this in the OS?
>>
> To get a better idea of whats going on it would be good to see a full
> MDIO trace. Can you enable MDIO tracing via the following sysctl file
> /sys/kernel/debug/tracing/events/mdio/enable
> and provide the generated trace?
> 
> Due to polling mode each second entries will be generated, so you
> better stop network after the issue occurred.
> 
>>
>>
>>
> 
> Heiner
> 
> .
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-30  6:35           ` liuyonglong
@ 2019-07-30  6:39             ` liuyonglong
  2019-07-30 19:04             ` Heiner Kallweit
  1 sibling, 0 replies; 12+ messages in thread
From: liuyonglong @ 2019-07-30  6:39 UTC (permalink / raw)
  To: Heiner Kallweit, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose



On 2019/7/30 14:35, liuyonglong wrote:
> :/sys/kernel/debug/tracing$ cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 45/45   #P:128
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>     kworker/64:2-1028  [064] ....   172.295687: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.295726: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.296902: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.296938: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.321213: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.343209: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.343245: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.343882: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.343918: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.362658: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.385961: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.385996: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.386646: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.386681: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.411286: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.433225: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.433260: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.433887: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.433922: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.452862: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>         ifconfig-1324  [011] ....   177.325585: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>   kworker/u257:0-8     [012] ....   177.325642: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x04 val:0x01e1
>   kworker/u257:0-8     [012] ....   177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
>   kworker/u257:0-8     [012] ....   177.325708: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [012] ....   177.325744: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [012] ....   177.325779: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>   kworker/u257:0-8     [012] ....   177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
>   kworker/u257:0-8     [012] ....   177.325843: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x798d
>   kworker/u257:0-8     [003] ....   178.360488: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   179.384479: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   180.408477: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   181.432474: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79a9
>   kworker/u257:0-8     [000] ....   181.432510: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0a val:0x7800
>   kworker/u257:0-8     [000] ....   181.432546: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [000] ....   181.432582: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x05 val:0xc1e1
>   kworker/u257:0-8     [000] ....   182.456510: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   182.456546: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0a val:0x4800
>   kworker/u257:0-8     [000] ....   182.456582: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [000] ....   182.456618: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x05 val:0xc1e1
>   kworker/u257:0-8     [001] ....   183.480476: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   184.504478: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   185.528486: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   186.552475: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>         ifconfig-1327  [011] ....   187.196036: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>         ifconfig-1327  [011] ....   187.196046: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1840
> 

Add dmesg below:

[  177.325619] hns3 0000:bd:00.1 eth5: net open
[  177.325859] hns3 0000:bd:00.1: invalid speed (-1)
[  177.330180] 8021q: adding VLAN 0 to HW filter on device eth5
[  177.334569] hns3 0000:bd:00.1 eth5: failed to adjust link.
[  177.345674] IPv6: ADDRCONF(NETDEV_CHANGE): eth5: link becomes ready
[  178.021616] hns3 0000:bd:00.1 eth5: link up
[  178.808459] hns3 0000:bd:00.1 eth5: link down
[  182.808452] hns3 0000:bd:00.1 eth5: link up
[  187.191563] hns3 0000:bd:00.1 eth5: net stop
[  187.196049] hns3 0000:bd:00.1 eth5: link down


> 
> 
> On 2019/7/30 14:08, Heiner Kallweit wrote:
>> On 30.07.2019 06:03, liuyonglong wrote:
>>>
>>>
>>> On 2019/7/30 4:57, Heiner Kallweit wrote:
>>>> On 29.07.2019 05:59, liuyonglong wrote:
>>>>>
>>>>>
>>>>> On 2019/7/27 2:14, Heiner Kallweit wrote:
>>>>>> On 26.07.2019 11:53, Yonglong Liu wrote:
>>>>>>> According to the datasheet of Marvell phy and Realtek phy, the
>>>>>>> copper link status should read twice, or it may get a fake link
>>>>>>> up status, and cause up->down->up at the first time when link up.
>>>>>>> This happens more oftem at Realtek phy.
>>>>>>>
>>>>>> This is not correct, there is no fake link up status.
>>>>>> Read the comment in genphy_update_link, only link-down events
>>>>>> are latched. Means if the first read returns link up, then there
>>>>>> is no need for a second read. And in polling mode we don't do a
>>>>>> second read because we want to detect also short link drops.
>>>>>>
>>>>>> It would be helpful if you could describe your actual problem
>>>>>> and whether you use polling or interrupt mode.
>>>>>>
>>>>>
>>>>> [   44.498633] hns3 0000:bd:00.1 eth5: net open
>>>>> [   44.504273] hns3 0000:bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg
>>>>> [   44.532348] hns3 0000:bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link.
>>>>
>>>> This should not happen. The PHY indicates link up w/o having aneg finished.
>>>>
>>>>>
>>>>> According to the datasheet:
>>>>> reg 1.5=0 now, means copper auto-negotiation not complete
>>>>> reg 1.2=1 now, means link is up
>>>>>
>>>>> We can see that, when we read the link up, the auto-negotiation
>>>>> is not complete yet, so the speed is invalid.
>>>>>
>>>>> I don't know why this happen, maybe this state is keep from bios?
>>>>> Or we may do something else in the phy initialize to fix it?
>>>>> And also confuse that why read twice can fix it?
>>>>>
>>>> I suppose that basically any delay would do.
>>>>
>>>>> [   44.554063] hns3 0000:bd:00.1: invalid speed (-1)
>>>>> [   44.560412] hns3 0000:bd:00.1 eth5: failed to adjust link.
>>>>> [   45.194870] hns3 0000:bd:00.1 eth5: link up
>>>>> [   45.574095] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
>>>>> [   46.150051] hns3 0000:bd:00.1 eth5: link down
>>>>> [   46.598074] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
>>>>> [   47.622075] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79a9
>>>>> [   48.646077] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
>>>>> [   48.934050] hns3 0000:bd:00.1 eth5: link up
>>>>> [   49.702140] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
>>>>>
>>>>>>> I add a fake status read, and can solve this problem.
>>>>>>>
>>>>>>> I also see that in genphy_update_link(), had delete the fake
>>>>>>> read in polling mode, so I don't know whether my solution is
>>>>>>> correct.
>>>>>>>
>>>>
>>>> Can you test whether the following fixes the issue for you?
>>>> Also it would be interesting which exact PHY models you tested
>>>> and whether you built the respective PHY drivers or whether you
>>>> rely on the genphy driver. Best use the second patch to get the
>>>> needed info. It may make sense anyway to add the call to
>>>> phy_attached_info() to the hns3 driver.
>>>>
>>>
>>> [   40.100716] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: attached PHY driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=mii-0000:bd:00.3:07, irq=POLL)
>>> [   40.932133] hns3 0000:bd:00.3 eth7: net open
>>> [   40.932458] hns3 0000:bd:00.3: invalid speed (-1)
>>> [   40.932541] 8021q: adding VLAN 0 to HW filter on device eth7
>>> [   40.937149] hns3 0000:bd:00.3 eth7: failed to adjust link.
>>>
>>> [   40.662050] Generic PHY mii-0000:bd:00.2:05: attached PHY driver [Generic PHY] (mii_bus:phy_addr=mii-0000:bd:00.2:05, irq=POLL)
>>> [   41.563511] hns3 0000:bd:00.2 eth6: net open
>>> [   41.563853] hns3 0000:bd:00.2: invalid speed (-1)
>>> [   41.563943] 8021q: adding VLAN 0 to HW filter on device eth6
>>> [   41.568578] IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
>>> [   41.568898] hns3 0000:bd:00.2 eth6: failed to adjust link.
>>>
>>> I am using RTL8211F, but you can see that, both genphy driver and
>>> RTL8211F driver have the same issue.
>>>
>>>>
>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>> index 6b5cb87f3..fbecfe210 100644
>>>> --- a/drivers/net/phy/phy_device.c
>>>> +++ b/drivers/net/phy/phy_device.c
>>>> @@ -1807,7 +1807,8 @@ int genphy_read_status(struct phy_device *phydev)
>>>>  
>>>>  	linkmode_zero(phydev->lp_advertising);
>>>>  
>>>> -	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
>>>> +	if (phydev->autoneg == AUTONEG_ENABLE &&
>>>> +	    (phydev->autoneg_complete || phydev->link)) {
>>>>  		if (phydev->is_gigabit_capable) {
>>>>  			lpagb = phy_read(phydev, MII_STAT1000);
>>>>  			if (lpagb < 0)
>>>>
>>>
>>> I have try this patch, have no effect. I suppose that at this time,
>>> the autoneg actually not complete yet.
>>>
>>> Maybe the wrong phy state is passed from bios? Invalid speed just
>>> happen at the first time when ethX up, after that, repeat the
>>> ifconfig down/ifconfig up command can not see that again.
>>>
>>> So I think the bios should power off the phy(writing reg 1.11 to 1)
>>> before it starts the OS? Or any other way to fix this in the OS?
>>>
>> To get a better idea of whats going on it would be good to see a full
>> MDIO trace. Can you enable MDIO tracing via the following sysctl file
>> /sys/kernel/debug/tracing/events/mdio/enable
>> and provide the generated trace?
>>
>> Due to polling mode each second entries will be generated, so you
>> better stop network after the issue occurred.
>>
>>>
>>>
>>>
>>
>> Heiner
>>
>> .
>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-30  6:35           ` liuyonglong
  2019-07-30  6:39             ` liuyonglong
@ 2019-07-30 19:04             ` Heiner Kallweit
  2019-07-31  3:33               ` liuyonglong
  1 sibling, 1 reply; 12+ messages in thread
From: Heiner Kallweit @ 2019-07-30 19:04 UTC (permalink / raw)
  To: liuyonglong, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose

On 30.07.2019 08:35, liuyonglong wrote:
> :/sys/kernel/debug/tracing$ cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 45/45   #P:128
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>     kworker/64:2-1028  [064] ....   172.295687: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.295726: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.296902: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.296938: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.321213: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.343209: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.343245: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.343882: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.343918: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.362658: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.385961: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.385996: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.386646: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.386681: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.411286: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.433225: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.433260: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.433887: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.433922: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.452862: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>         ifconfig-1324  [011] ....   177.325585: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>   kworker/u257:0-8     [012] ....   177.325642: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x04 val:0x01e1
>   kworker/u257:0-8     [012] ....   177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
>   kworker/u257:0-8     [012] ....   177.325708: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [012] ....   177.325744: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [012] ....   177.325779: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>   kworker/u257:0-8     [012] ....   177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
>   kworker/u257:0-8     [012] ....   177.325843: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x798d

What I think that happens here:
Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that then the PHY seems to have cleared
the "aneg complete" bit already, but not yet the "link up" bit. This results in the false "link up" notification.
The following patch is based on the fact that in case of enabled aneg we can't have a valid link if aneg isn't
finished. Could you please test whether this works for you?

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 6b5cb87f3..7ddd91df9 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
 	phydev->link = status & BMSR_LSTATUS ? 1 : 0;
 	phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
 
+	/* Consider the case that autoneg was started and "aneg complete"
+	 * bit has been reset, but "link up" bit not yet.
+	 */
+	if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
+		phydev->link = 0;
+
 	return 0;
 }
 EXPORT_SYMBOL(genphy_update_link);
-- 
2.22.0





>   kworker/u257:0-8     [003] ....   178.360488: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   179.384479: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   180.408477: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   181.432474: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79a9
>   kworker/u257:0-8     [000] ....   181.432510: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0a val:0x7800
>   kworker/u257:0-8     [000] ....   181.432546: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [000] ....   181.432582: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x05 val:0xc1e1
>   kworker/u257:0-8     [000] ....   182.456510: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   182.456546: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0a val:0x4800
>   kworker/u257:0-8     [000] ....   182.456582: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [000] ....   182.456618: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x05 val:0xc1e1
>   kworker/u257:0-8     [001] ....   183.480476: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   184.504478: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   185.528486: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   186.552475: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>         ifconfig-1327  [011] ....   187.196036: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>         ifconfig-1327  [011] ....   187.196046: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1840
> 
> 
[...]
>>
>>>
>>>
>>>
>>
>> Heiner
>>
>> .
>>
> 
> .
> 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-30 19:04             ` Heiner Kallweit
@ 2019-07-31  3:33               ` liuyonglong
  2019-07-31  5:44                 ` Heiner Kallweit
  0 siblings, 1 reply; 12+ messages in thread
From: liuyonglong @ 2019-07-31  3:33 UTC (permalink / raw)
  To: Heiner Kallweit, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose



On 2019/7/31 3:04, Heiner Kallweit wrote:
> On 30.07.2019 08:35, liuyonglong wrote:
>> :/sys/kernel/debug/tracing$ cat trace
>> # tracer: nop
>> #
>> # entries-in-buffer/entries-written: 45/45   #P:128
>> #
>> #                              _-----=> irqs-off
>> #                             / _----=> need-resched
>> #                            | / _---=> hardirq/softirq
>> #                            || / _--=> preempt-depth
>> #                            ||| /     delay
>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> #              | |       |   ||||       |         |
>>     kworker/64:2-1028  [064] ....   172.295687: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x02 val:0x001c
>>     kworker/64:2-1028  [064] ....   172.295726: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x03 val:0xc916
>>     kworker/64:2-1028  [064] ....   172.296902: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x01 val:0x79ad
>>     kworker/64:2-1028  [064] ....   172.296938: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x0f val:0x2000
>>     kworker/64:2-1028  [064] ....   172.321213: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x00 val:0x1040
>>     kworker/64:2-1028  [064] ....   172.343209: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x02 val:0x001c
>>     kworker/64:2-1028  [064] ....   172.343245: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x03 val:0xc916
>>     kworker/64:2-1028  [064] ....   172.343882: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>>     kworker/64:2-1028  [064] ....   172.343918: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0f val:0x2000
>>     kworker/64:2-1028  [064] ....   172.362658: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>     kworker/64:2-1028  [064] ....   172.385961: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x02 val:0x001c
>>     kworker/64:2-1028  [064] ....   172.385996: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x03 val:0xc916
>>     kworker/64:2-1028  [064] ....   172.386646: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x01 val:0x79ad
>>     kworker/64:2-1028  [064] ....   172.386681: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x0f val:0x2000
>>     kworker/64:2-1028  [064] ....   172.411286: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x00 val:0x1040
>>     kworker/64:2-1028  [064] ....   172.433225: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x02 val:0x001c
>>     kworker/64:2-1028  [064] ....   172.433260: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x03 val:0xc916
>>     kworker/64:2-1028  [064] ....   172.433887: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>>     kworker/64:2-1028  [064] ....   172.433922: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0f val:0x2000
>>     kworker/64:2-1028  [064] ....   172.452862: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>>         ifconfig-1324  [011] ....   177.325585: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>   kworker/u257:0-8     [012] ....   177.325642: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x04 val:0x01e1
>>   kworker/u257:0-8     [012] ....   177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
>>   kworker/u257:0-8     [012] ....   177.325708: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>>   kworker/u257:0-8     [012] ....   177.325744: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>>   kworker/u257:0-8     [012] ....   177.325779: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>   kworker/u257:0-8     [012] ....   177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
>>   kworker/u257:0-8     [012] ....   177.325843: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x798d
> 
> What I think that happens here:
> Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that then the PHY seems to have cleared
> the "aneg complete" bit already, but not yet the "link up" bit. This results in the false "link up" notification.
> The following patch is based on the fact that in case of enabled aneg we can't have a valid link if aneg isn't
> finished. Could you please test whether this works for you?
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 6b5cb87f3..7ddd91df9 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
>  	phydev->link = status & BMSR_LSTATUS ? 1 : 0;
>  	phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
>  
> +	/* Consider the case that autoneg was started and "aneg complete"
> +	 * bit has been reset, but "link up" bit not yet.
> +	 */
> +	if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
> +		phydev->link = 0;
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL(genphy_update_link);
> 

This patch can solve the issue! Will it be upstream?

So it's nothing to do with the bios, and just the PHY's own behavior,
the "link up" bit can not reset immediately,right?

ps: It will take 1 second more to make the link up for RTL8211F when 0x798d happend.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-31  3:33               ` liuyonglong
@ 2019-07-31  5:44                 ` Heiner Kallweit
  2019-07-31  5:58                   ` liuyonglong
  0 siblings, 1 reply; 12+ messages in thread
From: Heiner Kallweit @ 2019-07-31  5:44 UTC (permalink / raw)
  To: liuyonglong, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose

On 31.07.2019 05:33, liuyonglong wrote:
> 
> 
> On 2019/7/31 3:04, Heiner Kallweit wrote:
>> On 30.07.2019 08:35, liuyonglong wrote:
>>> :/sys/kernel/debug/tracing$ cat trace
>>> # tracer: nop
>>> #
>>> # entries-in-buffer/entries-written: 45/45   #P:128
>>> #
>>> #                              _-----=> irqs-off
>>> #                             / _----=> need-resched
>>> #                            | / _---=> hardirq/softirq
>>> #                            || / _--=> preempt-depth
>>> #                            ||| /     delay
>>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>> #              | |       |   ||||       |         |
>>>     kworker/64:2-1028  [064] ....   172.295687: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x02 val:0x001c
>>>     kworker/64:2-1028  [064] ....   172.295726: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x03 val:0xc916
>>>     kworker/64:2-1028  [064] ....   172.296902: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x01 val:0x79ad
>>>     kworker/64:2-1028  [064] ....   172.296938: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x0f val:0x2000
>>>     kworker/64:2-1028  [064] ....   172.321213: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x00 val:0x1040
>>>     kworker/64:2-1028  [064] ....   172.343209: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x02 val:0x001c
>>>     kworker/64:2-1028  [064] ....   172.343245: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x03 val:0xc916
>>>     kworker/64:2-1028  [064] ....   172.343882: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>>>     kworker/64:2-1028  [064] ....   172.343918: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0f val:0x2000
>>>     kworker/64:2-1028  [064] ....   172.362658: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>>     kworker/64:2-1028  [064] ....   172.385961: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x02 val:0x001c
>>>     kworker/64:2-1028  [064] ....   172.385996: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x03 val:0xc916
>>>     kworker/64:2-1028  [064] ....   172.386646: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x01 val:0x79ad
>>>     kworker/64:2-1028  [064] ....   172.386681: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x0f val:0x2000
>>>     kworker/64:2-1028  [064] ....   172.411286: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x00 val:0x1040
>>>     kworker/64:2-1028  [064] ....   172.433225: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x02 val:0x001c
>>>     kworker/64:2-1028  [064] ....   172.433260: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x03 val:0xc916
>>>     kworker/64:2-1028  [064] ....   172.433887: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>>>     kworker/64:2-1028  [064] ....   172.433922: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0f val:0x2000
>>>     kworker/64:2-1028  [064] ....   172.452862: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>>>         ifconfig-1324  [011] ....   177.325585: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>>   kworker/u257:0-8     [012] ....   177.325642: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x04 val:0x01e1
>>>   kworker/u257:0-8     [012] ....   177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
>>>   kworker/u257:0-8     [012] ....   177.325708: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>>>   kworker/u257:0-8     [012] ....   177.325744: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>>>   kworker/u257:0-8     [012] ....   177.325779: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>>   kworker/u257:0-8     [012] ....   177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
>>>   kworker/u257:0-8     [012] ....   177.325843: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x798d
>>
>> What I think that happens here:
>> Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that then the PHY seems to have cleared
>> the "aneg complete" bit already, but not yet the "link up" bit. This results in the false "link up" notification.
>> The following patch is based on the fact that in case of enabled aneg we can't have a valid link if aneg isn't
>> finished. Could you please test whether this works for you?
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 6b5cb87f3..7ddd91df9 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
>>  	phydev->link = status & BMSR_LSTATUS ? 1 : 0;
>>  	phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
>>  
>> +	/* Consider the case that autoneg was started and "aneg complete"
>> +	 * bit has been reset, but "link up" bit not yet.
>> +	 */
>> +	if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
>> +		phydev->link = 0;
>> +
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL(genphy_update_link);
>>
> 
> This patch can solve the issue! Will it be upstream?
> 
I'll check for side effects, but in general: yes.

> So it's nothing to do with the bios, and just the PHY's own behavior,
> the "link up" bit can not reset immediately,right?
> 
Yes, it's the PHY's own behavior, and to a certain extent it may depend on speed
of the MDIO bus. At least few network chips require a delay of several microseconds
after each MDIO bus access. This may be sufficient for the PHY to reset the
link-up bit in time.

> ps: It will take 1 second more to make the link up for RTL8211F when 0x798d happend.
> 
In polling mode link-up is detected up to 1s after it happened.
You could switch to interrupt mode to reduce the aneg time.

> 
> 
Heiner

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
  2019-07-31  5:44                 ` Heiner Kallweit
@ 2019-07-31  5:58                   ` liuyonglong
  0 siblings, 0 replies; 12+ messages in thread
From: liuyonglong @ 2019-07-31  5:58 UTC (permalink / raw)
  To: Heiner Kallweit, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang, shiju.jose



On 2019/7/31 13:44, Heiner Kallweit wrote:
> On 31.07.2019 05:33, liuyonglong wrote:
>>
>>
>> On 2019/7/31 3:04, Heiner Kallweit wrote:
>>> On 30.07.2019 08:35, liuyonglong wrote:
>>>> :/sys/kernel/debug/tracing$ cat trace
>>>> # tracer: nop
>>>> #
>>>> # entries-in-buffer/entries-written: 45/45   #P:128
>>>> #
>>>> #                              _-----=> irqs-off
>>>> #                             / _----=> need-resched
>>>> #                            | / _---=> hardirq/softirq
>>>> #                            || / _--=> preempt-depth
>>>> #                            ||| /     delay
>>>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>>> #              | |       |   ||||       |         |
>>>>     kworker/64:2-1028  [064] ....   172.295687: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x02 val:0x001c
>>>>     kworker/64:2-1028  [064] ....   172.295726: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x03 val:0xc916
>>>>     kworker/64:2-1028  [064] ....   172.296902: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x01 val:0x79ad
>>>>     kworker/64:2-1028  [064] ....   172.296938: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x0f val:0x2000
>>>>     kworker/64:2-1028  [064] ....   172.321213: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x00 val:0x1040
>>>>     kworker/64:2-1028  [064] ....   172.343209: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x02 val:0x001c
>>>>     kworker/64:2-1028  [064] ....   172.343245: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x03 val:0xc916
>>>>     kworker/64:2-1028  [064] ....   172.343882: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>>>>     kworker/64:2-1028  [064] ....   172.343918: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0f val:0x2000
>>>>     kworker/64:2-1028  [064] ....   172.362658: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>>>     kworker/64:2-1028  [064] ....   172.385961: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x02 val:0x001c
>>>>     kworker/64:2-1028  [064] ....   172.385996: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x03 val:0xc916
>>>>     kworker/64:2-1028  [064] ....   172.386646: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x01 val:0x79ad
>>>>     kworker/64:2-1028  [064] ....   172.386681: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x0f val:0x2000
>>>>     kworker/64:2-1028  [064] ....   172.411286: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x00 val:0x1040
>>>>     kworker/64:2-1028  [064] ....   172.433225: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x02 val:0x001c
>>>>     kworker/64:2-1028  [064] ....   172.433260: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x03 val:0xc916
>>>>     kworker/64:2-1028  [064] ....   172.433887: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>>>>     kworker/64:2-1028  [064] ....   172.433922: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0f val:0x2000
>>>>     kworker/64:2-1028  [064] ....   172.452862: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>>>>         ifconfig-1324  [011] ....   177.325585: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>>>   kworker/u257:0-8     [012] ....   177.325642: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x04 val:0x01e1
>>>>   kworker/u257:0-8     [012] ....   177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
>>>>   kworker/u257:0-8     [012] ....   177.325708: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>>>>   kworker/u257:0-8     [012] ....   177.325744: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>>>>   kworker/u257:0-8     [012] ....   177.325779: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>>>>   kworker/u257:0-8     [012] ....   177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
>>>>   kworker/u257:0-8     [012] ....   177.325843: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x798d
>>>
>>> What I think that happens here:
>>> Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that then the PHY seems to have cleared
>>> the "aneg complete" bit already, but not yet the "link up" bit. This results in the false "link up" notification.
>>> The following patch is based on the fact that in case of enabled aneg we can't have a valid link if aneg isn't
>>> finished. Could you please test whether this works for you?
>>>
>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>> index 6b5cb87f3..7ddd91df9 100644
>>> --- a/drivers/net/phy/phy_device.c
>>> +++ b/drivers/net/phy/phy_device.c
>>> @@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
>>>  	phydev->link = status & BMSR_LSTATUS ? 1 : 0;
>>>  	phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
>>>  
>>> +	/* Consider the case that autoneg was started and "aneg complete"
>>> +	 * bit has been reset, but "link up" bit not yet.
>>> +	 */
>>> +	if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
>>> +		phydev->link = 0;
>>> +
>>>  	return 0;
>>>  }
>>>  EXPORT_SYMBOL(genphy_update_link);
>>>
>>
>> This patch can solve the issue! Will it be upstream?
>>
> I'll check for side effects, but in general: yes.
> 
>> So it's nothing to do with the bios, and just the PHY's own behavior,
>> the "link up" bit can not reset immediately,right?
>>
> Yes, it's the PHY's own behavior, and to a certain extent it may depend on speed
> of the MDIO bus. At least few network chips require a delay of several microseconds
> after each MDIO bus access. This may be sufficient for the PHY to reset the
> link-up bit in time.
> 
>> ps: It will take 1 second more to make the link up for RTL8211F when 0x798d happend.
>>
> In polling mode link-up is detected up to 1s after it happened.
> You could switch to interrupt mode to reduce the aneg time.
> 
>>
>>
> Heiner
> 
> .
> 

Thanks very much!


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-07-31  5:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-26  9:53 [RFC] net: phy: read link status twice when phy_check_link_status() Yonglong Liu
2019-07-26 18:14 ` Heiner Kallweit
2019-07-29  3:59   ` liuyonglong
2019-07-29 20:57     ` Heiner Kallweit
2019-07-30  4:03       ` liuyonglong
2019-07-30  6:08         ` Heiner Kallweit
2019-07-30  6:35           ` liuyonglong
2019-07-30  6:39             ` liuyonglong
2019-07-30 19:04             ` Heiner Kallweit
2019-07-31  3:33               ` liuyonglong
2019-07-31  5:44                 ` Heiner Kallweit
2019-07-31  5:58                   ` liuyonglong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).