On Sat, 20 Jul 2019, Heiner Kallweit wrote: > On 19.07.2019 23:12, Thomas Voegtle wrote: >> On Fri, 19 Jul 2019, Heiner Kallweit wrote: >> >>> On 18.07.2019 20:50, Thomas Voegtle wrote: >>>> >>>> Hello, >>>> >>>> I'm having network problems with the commits on r8169 since v5.2. There are ping packet loss, sometimes 100%, sometimes 50%. In the end network is unusable. >>>> >>>> v5.2 is fine, I bisected it down to: >>>> >>>> a2928d28643e3c064ff41397281d20c445525032 is the first bad commit >>>> commit a2928d28643e3c064ff41397281d20c445525032 >>>> Author: Heiner Kallweit >>>> Date:   Sun Jun 2 10:53:49 2019 +0200 >>>> >>>>     r8169: use paged versions of phylib MDIO access functions >>>> >>>>     Use paged versions of phylib MDIO access functions to simplify >>>>     the code. >>>> >>>>     Signed-off-by: Heiner Kallweit >>>>     Signed-off-by: David S. Miller >>>> >>>> >>>> Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem >>>> for me (had to adjust the renaming to r8169_main.c). >>>> >>>> I have a: >>>> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. >>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev >>>> 0c) >>>>         Subsystem: Biostar Microtech Int'l Corp Device [1565:2400] >>>>         Kernel driver in use: r8169 >>>> >>>> on a BIOSTAR H81MG motherboard. >>>> >>> Interesting. I have the same chip version (RTL8168g) and can't reproduce >>> the issue. Can you provide a full dmesg output and test the patch below >>> on top of linux-next? I'd be interested in the WARN_ON stack traces >>> (if any) and would like to know whether the experimental change to >>> __phy_modify_changed helps. >>> >>>> >>>> greetings, >>>> >>>>   Thomas >>>> >>>> >>> Heiner >>> >>> >>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c >>> index 8d7dd4c5f..26be73000 100644 >>> --- a/drivers/net/ethernet/realtek/r8169_main.c >>> +++ b/drivers/net/ethernet/realtek/r8169_main.c >>> @@ -1934,6 +1934,8 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp) >>>     struct phy_device *phydev = tp->phydev; >>>     int ret; >>> >>> +    WARN_ON(phy_read(phydev, 0x1f)); >>> + >>>     switch (tp->mac_version) { >>>     case RTL_GIGA_MAC_VER_34: >>>     case RTL_GIGA_MAC_VER_35: >>> @@ -1957,6 +1959,8 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp) >>>     struct phy_device *phydev = tp->phydev; >>>     int ret; >>> >>> +    WARN_ON(phy_read(phydev, 0x1f)); >>> + >>>     switch (tp->mac_version) { >>>     case RTL_GIGA_MAC_VER_34: >>>     case RTL_GIGA_MAC_VER_35: >>> @@ -1980,6 +1984,8 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp) >>>     struct phy_device *phydev = tp->phydev; >>>     int ret; >>> >>> +    WARN_ON(phy_read(phydev, 0x1f)); >>> + >>>     switch (tp->mac_version) { >>>     case RTL_GIGA_MAC_VER_34: >>>     case RTL_GIGA_MAC_VER_35: >>> @@ -2003,6 +2009,8 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val) >>>     struct phy_device *phydev = tp->phydev; >>>     int ret = 0; >>> >>> +    WARN_ON(phy_read(phydev, 0x1f)); >>> + >>>     switch (tp->mac_version) { >>>     case RTL_GIGA_MAC_VER_34: >>>     case RTL_GIGA_MAC_VER_35: >>> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c >>> index 16667fbac..1aa1142b8 100644 >>> --- a/drivers/net/phy/phy-core.c >>> +++ b/drivers/net/phy/phy-core.c >>> @@ -463,12 +463,10 @@ int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask, >>>         return ret; >>> >>>     new = (ret & ~mask) | set; >>> -    if (new == ret) >>> -        return 0; >>> >>> -    ret = __phy_write(phydev, regnum, new); >>> +    __phy_write(phydev, regnum, new); >>> >>> -    return ret < 0 ? ret : 1; >>> +    return new != ret; >>> } >>> EXPORT_SYMBOL_GPL(__phy_modify_changed); >>> >>> >> >> Took your patch on top of next-20190719. >> See attached dmesg. >> It didn't work. Same thing, lots of ping drops, no usable network. >> >> like that: >> 44 packets transmitted, 2 received, 95% packet loss, time 44005ms >> >> >> Maybe important: >> I build a kernel with no modules. >> >> I have to power off when I booted a kernel which doesn't work, a (soft) reboot into a older kernel (e.g. 4.9.y)  doesn't >> fix the problem. Powering off and on does. >> > > Then, what you could do is reversing the hunks of the patch step by step. > Or make them separate patches and bisect. > Relevant are the hunks from point 1 and 2. > > 1. first 5 hunks (I don't think you have to reverse them individually) > EEE-related > > 2. rtl8168g_disable_aldps, rtl8168g_phy_adjust_10m_aldps, rtl8168g_1_hw_phy_config > all of these hunks are in the path for RTL8168g > > 3. rtl8168h_1_hw_phy_config, rtl8168h_2_hw_phy_config, rtl8168ep_1_hw_phy_config, > rtl8168ep_2_hw_phy_config > not in the path for RTL8168g > this is the minimal revert: diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index efef5453b94f..267995a614b5 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -3249,12 +3249,14 @@ static void rtl8168g_1_hw_phy_config(struct rtl8169_private *tp) else phy_modify_paged(tp->phydev, 0x0bcc, 0x12, 0, BIT(15)); - ret = phy_read_paged(tp->phydev, 0x0a46, 0x13); - if (ret & BIT(8)) - phy_modify_paged(tp->phydev, 0x0c41, 0x12, 0, BIT(1)); - else - phy_modify_paged(tp->phydev, 0x0c41, 0x12, BIT(1), 0); - + rtl_writephy(tp, 0x1f, 0x0a46); + if (rtl_readphy(tp, 0x13) & 0x0100) { + rtl_writephy(tp, 0x1f, 0x0c41); + rtl_w0w1_phy(tp, 0x15, 0x0002, 0x0000); + } else { + rtl_writephy(tp, 0x1f, 0x0c41); + rtl_w0w1_phy(tp, 0x15, 0x0000, 0x0002); + } /* Enable PHY auto speed down */ phy_modify_paged(tp->phydev, 0x0a44, 0x11, 0, BIT(3) | BIT(2)); Could it be, that there is just a typo? if (ret & BIT(8)) - phy_modify_paged(tp->phydev, 0x0c41, 0x12, 0, BIT(1)); + phy_modify_paged(tp->phydev, 0x0c41, 0x15, 0, BIT(1)); else - phy_modify_paged(tp->phydev, 0x0c41, 0x12, BIT(1), 0); + phy_modify_paged(tp->phydev, 0x0c41, 0x15, BIT(1), 0); greetings, Thomas