* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
@ 2018-09-28 15:54 Maciej S. Szmigiero
2018-09-28 22:00 ` Chris Clayton
0 siblings, 1 reply; 22+ messages in thread
From: Maciej S. Szmigiero @ 2018-09-28 15:54 UTC (permalink / raw)
To: Chris Clayton
Cc: David S. Miller, Azat Khuzhin, Greg Kroah-Hartman,
Heiner Kallweit, Realtek linux nic maintainers, linux-kernel
Hi,
> Hi,
>
> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a
> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>
> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that
> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I
> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the
> nameserver fail with the message "Destination Host Unreachable". Often, I can revive the network by stopping it with
> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 module and load it again.
Please have a look at the following thread:
https://lkml.org/lkml/2018/9/25/1118
Maciej
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-09-28 15:54 R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) Maciej S. Szmigiero
@ 2018-09-28 22:00 ` Chris Clayton
2018-09-28 22:13 ` Heiner Kallweit
0 siblings, 1 reply; 22+ messages in thread
From: Chris Clayton @ 2018-09-28 22:00 UTC (permalink / raw)
To: Maciej S. Szmigiero
Cc: David S. Miller, Azat Khuzhin, Greg Kroah-Hartman,
Heiner Kallweit, Realtek linux nic maintainers, linux-kernel

Thanks Maciej.

On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
> Hi,
>
>> Hi,
>>
>> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a
>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>
>> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that
>> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I
>> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the
>> nameserver fail with the message "Destination Host Unreachable". Often, I can revive the network by stopping it with
>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 module and load it again.
>
> Please have a look at the following thread:
> https://lkml.org/lkml/2018/9/25/1118
>

I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied
Heiner's patch to the 4.19, but again the problem is not solved.

> Maciej
>

Chris
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-09-28 22:00 ` Chris Clayton @ 2018-09-28 22:13 ` Heiner Kallweit 2018-09-29 7:25 ` Chris Clayton 2018-10-04 8:41 ` Chris Clayton 0 siblings, 2 replies; 22+ messages in thread From: Heiner Kallweit @ 2018-09-28 22:13 UTC (permalink / raw) To: Chris Clayton, Maciej S. Szmigiero Cc: David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel On 29.09.2018 00:00, Chris Clayton wrote: > Thanks Maciej. > > On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >> Hi, >> >>> Hi, >>> >>> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a >>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>> >>> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that >>> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I >>> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the >>> nameserver fail with the message "Destination Host Unreachable". Often, I can revive the network by stopping it with >>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 module and load it again. >> >> Please have a look at the following thread: >> https://lkml.org/lkml/2018/9/25/1118 >> > > I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied > Heiner's patch to the 4.19, but again the problem is not solved. > I think we talk about two different issues here. The one the fix is for has no link to suspend/resume. Chris, the lspci output doesn't provide enough detail to determine the exact chip version. Can you provide the dmesg part with the XID? According to your lspci output neither MSI nor MSI-X is active. Do you have to use nomsi for whatever reason? 
Heiner >> Maciej >> > Chris > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-09-28 22:13 ` Heiner Kallweit @ 2018-09-29 7:25 ` Chris Clayton 2018-09-29 7:38 ` Chris Clayton 2018-10-04 8:41 ` Chris Clayton 1 sibling, 1 reply; 22+ messages in thread From: Chris Clayton @ 2018-09-29 7:25 UTC (permalink / raw) To: Heiner Kallweit, Maciej S. Szmigiero Cc: David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel On 28/09/2018 23:13, Heiner Kallweit wrote: > On 29.09.2018 00:00, Chris Clayton wrote: >> Thanks Maciej. >> >> On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >>> Hi, >>> >>>> Hi, >>>> >>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a >>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>>> >>>> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that >>>> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I >>>> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the >>>> nameserver fail with the message "Destination Host Unreachable". Often, I can revive the network by stopping it with >>>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 module and load it again. >>> >>> Please have a look at the following thread: >>> https://lkml.org/lkml/2018/9/25/1118 >>> >> >> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied >> Heiner's patch to the 4.19, but again the problem is not solved. >> > I think we talk about two different issues here. The one the fix is for has no link to suspend/resume. > > Chris, the lspci output doesn't provide enough detail to determine the exact chip version. > Can you provide the dmesg part with the XID? 
$ dmesg | grep -i r8169 [ 5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded [ 5.321432] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control [ 5.322892] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 19 [ 5.323786] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko] [ 10.232077] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI [ 10.235218] r8169 0000:05:00.2 eth0: link down [ 11.717460] r8169 0000:05:00.2 eth0: link up $ dmesg | grep -i r8169 [ 5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded [ 5.208677] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control [ 5.210066] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29 [ 5.210676] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko] [ 10.456081] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI [ 10.459217] r8169 0000:05:00.2 eth0: link down [ 10.459880] r8169 0000:05:00.2 eth0: link down [ 12.015158] r8169 0000:05:00.2 eth0: link up > According to your lspci output neither MSI nor MSI-X is active. > Do you have to use nomsi for whatever reason? No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% sure that it used to be - I've no idea how it got dropped. If I'm not sure about an option, I start by taking the recommendation in the kconfig help. Help on MSI has a very clear "say Y". > > Heiner > >>> Maciej >>> >> Chris >> > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-09-29 7:25 ` Chris Clayton @ 2018-09-29 7:38 ` Chris Clayton 0 siblings, 0 replies; 22+ messages in thread From: Chris Clayton @ 2018-09-29 7:38 UTC (permalink / raw) To: Heiner Kallweit, Maciej S. Szmigiero Cc: David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel Sorry, sent by accident. Note to self - don't attempt email until after second cup of coffee. On 29/09/2018 08:25, Chris Clayton wrote: > > > On 28/09/2018 23:13, Heiner Kallweit wrote: >> On 29.09.2018 00:00, Chris Clayton wrote: >>> Thanks Maciej. >>> >>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >>>> Hi, >>>> >>>>> Hi, >>>>> >>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a >>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>>>> >>>>> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that >>>>> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I >>>>> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the >>>>> nameserver fail with the message "Destination Host Unreachable". Often, I can revive the network by stopping it with >>>>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 module and load it again. >>>> >>>> Please have a look at the following thread: >>>> https://lkml.org/lkml/2018/9/25/1118 >>>> >>> >>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied >>> Heiner's patch to the 4.19, but again the problem is not solved. >>> >> I think we talk about two different issues here. The one the fix is for has no link to suspend/resume. 
>> >> Chris, the lspci output doesn't provide enough detail to determine the exact chip version. >> Can you provide the dmesg part with the XID? I meant to say that I have now re-enabled MSI in 4.18.7 - the latest stable series kernel in which eth0 continues to function reliably after a suspend/resume cycle. The second dmesg output below is taken from that kernel. The first one was from an up-to-date 4.19 kernel > > $ dmesg | grep -i r8169 > [ 5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded > [ 5.321432] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control > [ 5.322892] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 19 > [ 5.323786] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko] > [ 10.232077] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI > [ 10.235218] r8169 0000:05:00.2 eth0: link down > [ 11.717460] r8169 0000:05:00.2 eth0: link up > > $ dmesg | grep -i r8169 > [ 5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded > [ 5.208677] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control > [ 5.210066] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29 > [ 5.210676] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko] > [ 10.456081] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI > [ 10.459217] r8169 0000:05:00.2 eth0: link down > [ 10.459880] r8169 0000:05:00.2 eth0: link down > [ 12.015158] r8169 0000:05:00.2 eth0: link up > > >> According to your lspci output neither MSI nor MSI-X is active. >> Do you have to use nomsi for whatever reason? > > No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% sure that it used to be - I've no idea how > it got dropped. If I'm not sure about an option, I start by taking the recommendation in the kconfig help. Help on MSI > has a very clear "say Y". 
As I said above I have re-enabled MSI. > >> >> Heiner >> >>>> Maciej >>>> >>> Chris >>> >> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-09-28 22:13 ` Heiner Kallweit 2018-09-29 7:25 ` Chris Clayton @ 2018-10-04 8:41 ` Chris Clayton 2018-10-07 19:36 ` Chris Clayton 1 sibling, 1 reply; 22+ messages in thread From: Chris Clayton @ 2018-10-04 8:41 UTC (permalink / raw) To: Heiner Kallweit, Maciej S. Szmigiero Cc: David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel Hi Heiner, Here's the reply to your questions. Sorry for the delay. On 28/09/2018 23:13, Heiner Kallweit wrote: > On 29.09.2018 00:00, Chris Clayton wrote: >> Thanks Maciej. >> >> On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >>> Hi, >>> >>>> Hi, >>>> >>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a >>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>>> >>>> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that >>>> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I >>>> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the >>>> nameserver fail with the message "Destination Host Unreachable". Often, I can revive the network by stopping it with >>>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 module and load it again. >>> >>> Please have a look at the following thread: >>> https://lkml.org/lkml/2018/9/25/1118 >>> >> >> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied >> Heiner's patch to the 4.19, but again the problem is not solved. >> > I think we talk about two different issues here. The one the fix is for has no link to suspend/resume. > > Chris, the lspci output doesn't provide enough detail to determine the exact chip version. 
> Can you provide the dmesg part with the XID? $ dmesg | grep r8169 [ 5.274938] libphy: r8169: probed [ 5.276563] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29 [ 5.278158] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko] [ 9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-502:00, irq=IGNORE) [ 9.460876] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI [ 11.005336] r8169 0000:05:00.2 eth0: Link is Up - 100Mbps/Full - flow control rx/tx > According to your lspci output neither MSI nor MSI-X is active. > Do you have to use nomsi for whatever reason? > No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% sure that it used to be - I've no idea how it got dropped. If I'm not sure about an option, I start by taking the recommendation in the kconfig help. Help on MSI has a very clear "say Y". I've re-enabled it now. Chris > Heiner > >>> Maciej >>> >> Chris >> > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-10-04 8:41 ` Chris Clayton @ 2018-10-07 19:36 ` Chris Clayton 2018-10-09 12:32 ` Maciej S. Szmigiero 0 siblings, 1 reply; 22+ messages in thread From: Chris Clayton @ 2018-10-07 19:36 UTC (permalink / raw) To: Heiner Kallweit, Maciej S. Szmigiero Cc: David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel Hi again, I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from 14-15ms to more than 1000ms. Chris On 04/10/2018 09:41, Chris Clayton wrote: > Hi Heiner, > > Here's the reply to your questions. Sorry for the delay. > > On 28/09/2018 23:13, Heiner Kallweit wrote: >> On 29.09.2018 00:00, Chris Clayton wrote: >>> Thanks Maciej. >>> >>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >>>> Hi, >>>> >>>>> Hi, >>>>> >>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a >>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>>>> >>>>> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that >>>>> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I >>>>> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the >>>>> nameserver fail with the message "Destination Host Unreachable". 
Often, I can revive the network by stopping it with >>>>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 module and load it again. >>>> >>>> Please have a look at the following thread: >>>> https://lkml.org/lkml/2018/9/25/1118 >>>> >>> >>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied >>> Heiner's patch to the 4.19, but again the problem is not solved. >>> >> I think we talk about two different issues here. The one the fix is for has no link to suspend/resume. >> >> Chris, the lspci output doesn't provide enough detail to determine the exact chip version. >> Can you provide the dmesg part with the XID? > > $ dmesg | grep r8169 > [ 5.274938] libphy: r8169: probed > [ 5.276563] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29 > [ 5.278158] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko] > [ 9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver [RTL8211E Gigabit Ethernet] > (mii_bus:phy_addr=r8169-502:00, irq=IGNORE) > [ 9.460876] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI > [ 11.005336] r8169 0000:05:00.2 eth0: Link is Up - 100Mbps/Full - flow control rx/tx > >> According to your lspci output neither MSI nor MSI-X is active. >> Do you have to use nomsi for whatever reason? >> > > No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% sure that it used to be - I've no idea how > it got dropped. If I'm not sure about an option, I start by taking the recommendation in the kconfig help. Help on MSI > has a very clear "say Y". I've re-enabled it now. > > Chris > >> Heiner >> >>>> Maciej >>>> >>> Chris >>> >> >> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-10-07 19:36 ` Chris Clayton
@ 2018-10-09 12:32 ` Maciej S. Szmigiero
2018-10-09 14:40 ` Chris Clayton
0 siblings, 1 reply; 22+ messages in thread
From: Maciej S. Szmigiero @ 2018-10-09 12:32 UTC (permalink / raw)
To: Chris Clayton
Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin,
Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel

On 07.10.2018 21:36, Chris Clayton wrote:
> Hi again,
>
> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the
> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my
> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed
> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from
> 14-15ms to more than 1000ms.

You can try comparing chip registers (ethtool -d eth0) in the working
state (before a suspend) and in the broken state (after a resume).
Maybe there will be something obvious in the difference.

The same goes for the PCI configuration (lspci -d :8168 -vv).

> Chris

Maciej
^ permalink raw reply [flat|nested] 22+ messages in thread
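The register comparison suggested in the message above can be done by eye or with diff(1); as a rough illustration, the sketch below does the same thing in Python for two saved dumps. It is not part of the original thread. The function names parse_register_dump and diff_register_dumps are hypothetical, and the parser assumes each register sits on one line of the form "0xNN: name value", as in the ethtool -d output posted later in the thread. Continuation lines (for example the flag names under "Interrupt Mask") are ignored.

```python
import re

# Matches single-line register entries such as
# "0x44: Rx Configuration 0x0002870e" (assumed dump layout).
_REG_LINE = re.compile(r"^\s*(0x[0-9A-Fa-f]{2}):\s+(.*\S)\s*$")


def parse_register_dump(text):
    """Map register offset -> remainder of the line for an ethtool -d style dump."""
    registers = {}
    for line in text.splitlines():
        match = _REG_LINE.match(line)
        if match:
            registers[match.group(1).lower()] = match.group(2)
    return registers


def diff_register_dumps(before, after):
    """Return {offset: (before_entry, after_entry)} for registers that differ.

    An offset present in only one dump is reported with None on the other side.
    """
    old = parse_register_dump(before)
    new = parse_register_dump(after)
    return {
        offset: (old.get(offset), new.get(offset))
        for offset in sorted(set(old) | set(new))
        if old.get(offset) != new.get(offset)
    }
```

To use it, save the output of "ethtool -d eth0" to one file before suspending and to another after resuming, read both files, and pass their contents to diff_register_dumps(). An empty result means no single-line register entry changed.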
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-10-09 12:32 ` Maciej S. Szmigiero @ 2018-10-09 14:40 ` Chris Clayton 2018-10-09 20:36 ` Heiner Kallweit 2018-10-09 21:39 ` Heiner Kallweit 0 siblings, 2 replies; 22+ messages in thread From: Chris Clayton @ 2018-10-09 14:40 UTC (permalink / raw) To: Maciej S. Szmigiero Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1635 bytes --] Thanks to Maciej and Heiner for their replies. On 09/10/2018 13:32, Maciej S. Szmigiero wrote: > On 07.10.2018 21:36, Chris Clayton wrote: >> Hi again, >> >> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the >> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my >> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed >> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from >> 14-15ms to more than 1000ms. > > You can try comparing chip registers (ethtool -d eth0) in the working > state (before a suspend) and in the broken state (after a resume). > Maybe there will be some obvious in the difference. > > The same goes for the PCI configuration (lspci -d :8168 -vv). > Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical. Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical. Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend. I've attached files I redirected the outputs to. Please don't hesitate to ask for any other information needed to solve this problem. 
In the meantime, I've now got scripts that stop the network during suspend and restart it during resume. (Those scripts were removed whilst I gathered the diagnostics shown in the attachments.) Chris >> Chris > > Maciej > [-- Attachment #2: r8169-post-suspend --] [-- Type: text/plain, Size: 5653 bytes --] ethtool -d eth0 =============== RealTek RTL8411 registers: -------------------------------------------------------- 0x00: MAC Address 80:fa:5b:08:d0:3d 0x08: Multicast Address Filter 0x00000000 0x00000080 0x10: Dump Tally Counter Command 0x0c2ec000 0x00000004 0x20: Tx Normal Priority Ring Addr 0x07a0a000 0x00000004 0x28: Tx High Priority Ring Addr 0x00000000 0x00000000 0x30: Flash memory read/write 0x00000000 0x34: Early Rx Byte Count 0 0x36: Early Rx Status 0x00 0x37: Command 0x0c Rx on, Tx on 0x3C: Interrupt Mask 0x803f SERR LinkChg RxNoBuf TxErr TxOK RxErr RxOK 0x3E: Interrupt Status 0x0000 0x40: Tx Configuration 0x4b800f80 0x44: Rx Configuration 0x0002870e 0x48: Timer count 0x00000000 0x4C: Missed packet counter 0x000000 0x50: EEPROM Command 0x10 0x51: Config 0 0x00 0x52: Config 1 0xcf 0x53: Config 2 0x3c 0x54: Config 3 0x60 0x55: Config 4 0x10 0x56: Config 5 0x02 0x58: Timer interrupt 0x00000000 0x5C: Multiple Interrupt Select 0x0000 0x60: PHY access 0x80040de1 0x64: TBI control and status 0x27ffff01 0x68: TBI Autonegotiation advertisement (ANAR) 0xf70c 0x6A: TBI Link partner ability (LPAR) 0x0002 0x6C: PHY status 0xeb 0x84: PM wakeup frame 0 0x00000000 0x00000000 0x8C: PM wakeup frame 1 0x00000000 0x00000000 0x94: PM wakeup frame 2 (low) 0x00000000 0x00000000 0x9C: PM wakeup frame 2 (high) 0x00000000 0x00000000 0xA4: PM wakeup frame 3 (low) 0x00000000 0x00000000 0xAC: PM wakeup frame 3 (high) 0x00000000 0x00000000 0xB4: PM wakeup frame 4 (low) 0xffffffff 0xffffffff 0xBC: PM wakeup frame 4 (high) 0x00000000 0x00000000 0xC4: Wakeup frame 0 CRC 0x0000 0xC6: Wakeup frame 1 CRC 0x0000 0xC8: Wakeup frame 2 CRC 0x0000 0xCA: Wakeup frame 3 CRC 0x0000 0xCC: Wakeup 
frame 4 CRC 0x0000 0xDA: RX packet maximum size 0x4000 0xE0: C+ Command 0x20e1 VLAN de-tagging RX checksumming 0xE2: Interrupt Mitigation 0x5151 TxTimer: 5 TxPackets: 1 RxTimer: 5 RxPackets: 1 0xE4: Rx Ring Addr 0x07935000 0x00000004 0xEC: Early Tx threshold 0x27 0xF0: Func Event 0x0040003f 0xF4: Func Event Mask 0x00000000 0xF8: Func Preset State 0x00031eff 0xFC: Func Force Event 0x00000000 lspci -d :8168 -vv ================== pcilib: sysfs_read_vpd: read failed: Input/output error 05:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0a) Subsystem: CLEVO/KAPOK Computer RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 19 Region 0: I/O ports at e000 [size=256] Region 2: Memory at f0004000 (64-bit, prefetchable) [size=4K] Region 4: Memory at f0000000 (64-bit, prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [70] Express (v2) Endpoint, MSI 01 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 4096 bytes DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes 
Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s (ok), Width x1 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled AtomicOpsCtl: ReqEn- LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [b0] MSI-X: Enable+ Count=4 Masked- Vector table: BAR=4 offset=00000000 PBA: BAR=4 offset=00000800 Capabilities: [d0] Vital Product Data Not readable Kernel driver in use: r8169 Kernel modules: r8169 [-- Attachment #3: r8169-pre-suspend --] [-- Type: text/plain, Size: 5653 bytes --] ethtool -d eth0 =============== RealTek RTL8411 registers: -------------------------------------------------------- 0x00: MAC Address 80:fa:5b:08:d0:3d 0x08: Multicast Address Filter 0x00000000 0x00000080 0x10: Dump Tally Counter Command 0x0c2ec000 0x00000004 0x20: Tx Normal Priority Ring Addr 0x07a0a000 0x00000004 0x28: Tx High Priority Ring Addr 0x00000000 0x00000000 0x30: Flash memory read/write 0x00000000 0x34: Early Rx Byte Count 0 0x36: Early Rx Status 0x00 0x37: Command 0x0c Rx on, Tx on 0x3C: Interrupt Mask 0x803f SERR LinkChg RxNoBuf TxErr TxOK RxErr RxOK 0x3E: Interrupt Status 0x0000 0x40: Tx Configuration 0x4b800f80 0x44: Rx Configuration 0x0002870e 0x48: Timer count 0x00000000 0x4C: Missed packet counter 0x000000 0x50: EEPROM Command 0x10 0x51: Config 0 0x00 0x52: Config 1 0xcf 0x53: Config 2 0x3c 0x54: Config 3 0x60 0x55: Config 4 0x10 0x56: Config 5 0x02 0x58: Timer interrupt 0x00000000 0x5C: Multiple Interrupt Select 0x0000 0x60: PHY access 0x80040de1 0x64: TBI control and status 0x27ffff01 0x68: TBI Autonegotiation advertisement (ANAR) 0xf70c 0x6A: TBI Link partner ability (LPAR) 0x0002 0x6C: PHY status 0xeb 0x84: PM wakeup frame 0 0x00000000 
0x00000000 0x8C: PM wakeup frame 1 0x00000000 0x00000000 0x94: PM wakeup frame 2 (low) 0x00000000 0x00000000 0x9C: PM wakeup frame 2 (high) 0x00000000 0x00000000 0xA4: PM wakeup frame 3 (low) 0x00000000 0x00000000 0xAC: PM wakeup frame 3 (high) 0x00000000 0x00000000 0xB4: PM wakeup frame 4 (low) 0xffffffff 0xffffffff 0xBC: PM wakeup frame 4 (high) 0x00000000 0x00000000 0xC4: Wakeup frame 0 CRC 0x0000 0xC6: Wakeup frame 1 CRC 0x0000 0xC8: Wakeup frame 2 CRC 0x0000 0xCA: Wakeup frame 3 CRC 0x0000 0xCC: Wakeup frame 4 CRC 0x0000 0xDA: RX packet maximum size 0x4000 0xE0: C+ Command 0x20e1 VLAN de-tagging RX checksumming 0xE2: Interrupt Mitigation 0x5151 TxTimer: 5 TxPackets: 1 RxTimer: 5 RxPackets: 1 0xE4: Rx Ring Addr 0x07935000 0x00000004 0xEC: Early Tx threshold 0x27 0xF0: Func Event 0x0040003f 0xF4: Func Event Mask 0x00000000 0xF8: Func Preset State 0x00031eff 0xFC: Func Force Event 0x00000000 lspci -d :8168 -vv ================== pcilib: sysfs_read_vpd: read failed: Input/output error 05:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0a) Subsystem: CLEVO/KAPOK Computer RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 19 Region 0: I/O ports at e000 [size=256] Region 2: Memory at f0004000 (64-bit, prefetchable) [size=4K] Region 4: Memory at f0000000 (64-bit, prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [70] Express (v2) Endpoint, MSI 01 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 4096 bytes DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s (ok), Width x1 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled AtomicOpsCtl: ReqEn- LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [b0] MSI-X: 
Enable+ Count=4 Masked- Vector table: BAR=4 offset=00000000 PBA: BAR=4 offset=00000800 Capabilities: [d0] Vital Product Data Not readable Kernel driver in use: r8169 Kernel modules: r8169 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-10-09 14:40 ` Chris Clayton @ 2018-10-09 20:36 ` Heiner Kallweit 2018-10-10 0:24 ` Maciej S. Szmigiero 2018-10-09 21:39 ` Heiner Kallweit 1 sibling, 1 reply; 22+ messages in thread From: Heiner Kallweit @ 2018-10-09 20:36 UTC (permalink / raw) To: Chris Clayton, Maciej S. Szmigiero Cc: David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel On 09.10.2018 16:40, Chris Clayton wrote: > Thanks to Maciej and Heiner for their replies. > > On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >> On 07.10.2018 21:36, Chris Clayton wrote: >>> Hi again, >>> >>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the >>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my >>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed >>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from >>> 14-15ms to more than 1000ms. >> >> You can try comparing chip registers (ethtool -d eth0) in the working >> state (before a suspend) and in the broken state (after a resume). >> Maybe there will be some obvious in the difference. >> >> The same goes for the PCI configuration (lspci -d :8168 -vv). >> > Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical. > > Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical. > Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend. > Hmm, this is very weird, especially taking into account that in your original report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start() fixes the issue. 
rtl_init_rxcfg() deals with the RxConfig register only and
register values seem to be the same before and after resume. So how can the
chip behave differently?
So far my best guess is that some chip quirk causes it to accept writes to
register RxConfig, but to misinterpret or ignore the written value.
So far your report is the only one (affecting RTL8411), but we don't know
whether other chip versions are affected too.
One option could be to call rtl_init_rxcfg() for chip versions <= 06 only
because for them we know that they need this call.

> I've attached files I redirected the outputs to.
>
> Please don't hesitate to ask for any other information needed to solve this problem. In the meantime, I've now got
> scripts that stop the network during suspend and restart it during resume. (Those scripts were removed whilst I gathered
> the diagnostics shown in the attachments.)
>
> Chris
>
>>> Chris
>>
>> Maciej
>>

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-10-09 20:36 ` Heiner Kallweit @ 2018-10-10 0:24 ` Maciej S. Szmigiero 2018-10-10 8:09 ` Chris Clayton ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Maciej S. Szmigiero @ 2018-10-10 0:24 UTC (permalink / raw) To: Chris Clayton Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel On 09.10.2018 22:36, Heiner Kallweit wrote: > On 09.10.2018 16:40, Chris Clayton wrote: >> Thanks to Maciej and Heiner for their replies. >> >> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>> On 07.10.2018 21:36, Chris Clayton wrote: >>>> Hi again, >>>> >>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the >>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my >>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed >>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from >>>> 14-15ms to more than 1000ms. >>> >>> You can try comparing chip registers (ethtool -d eth0) in the working >>> state (before a suspend) and in the broken state (after a resume). >>> Maybe there will be some obvious in the difference. >>> >>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>> >> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical. >> >> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical. >> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend. 
>>
> Hmm, this is very weird, especially taking into account that in your original
> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
> register values seem to be the same before and after resume. So how can the
> chip behave differently?
> So far my best guess is that some chip quirk causes it to accept writes to
> register RxConfig, but to misinterpret or ignore the written value.
> So far your report is the only one (affecting RTL8411), but we don't know
> whether other chip versions are affected too.

Also, it is interesting that even if one removes a call to
rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
written to moments later by rtl_set_rx_mode().

The only chip accesses in the meantime seems to be a write to TxConfig by
rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
to MAR0 earlier in rtl_set_rx_mode().

My proposals are:
1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
in rtl_hw_start().
Maybe the chip does not like sometimes that RxConfig is written before
TxConfig.

2) Check the original value of RxConfig (after a resume) before
rtl_init_rxcfg() overwrites it (compile tested only):
--- r8169.c.ori
+++ r8169.c
@@ -5155,6 +5155,9 @@
 	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
 	RTL_R8(tp, IntrMask);
 	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
+
+	pr_notice("RxConfig before init was %.8x\n",
+		  (unsigned int)RTL_R32(tp, RxConfig));
 	rtl_init_rxcfg(tp);
 	rtl_set_tx_config_registers(tp);


This should be the value that you got when you removed the call to
rtl_init_rxcfg() for testing.
Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
writes (under the "default:" label for your NIC model).

Hope this helps,
Maciej

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-10-10 0:24 ` Maciej S. Szmigiero @ 2018-10-10 8:09 ` Chris Clayton 2018-10-10 8:51 ` Chris Clayton 2018-10-10 22:30 ` Chris Clayton 2018-10-10 22:49 ` Chris Clayton 2 siblings, 1 reply; 22+ messages in thread From: Chris Clayton @ 2018-10-10 8:09 UTC (permalink / raw) To: Maciej S. Szmigiero Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel On 10/10/2018 01:24, Maciej S. Szmigiero wrote: > On 09.10.2018 22:36, Heiner Kallweit wrote: >> On 09.10.2018 16:40, Chris Clayton wrote: >>> Thanks to Maciej and Heiner for their replies. >>> >>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>> Hi again, >>>>> >>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the >>>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my >>>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed >>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from >>>>> 14-15ms to more than 1000ms. >>>> >>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>> state (before a suspend) and in the broken state (after a resume). >>>> Maybe there will be some obvious in the difference. >>>> >>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>> >>> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical. >>> >>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical. >>> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend. 
>>> >> Hmm, this is very weird, especially taking into account that in your original >> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start() >> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and >> register values seem to be the same before and after resume. So how can the >> chip behave differently? >> So far my best guess is that some chip quirk causes it to accept writes to >> register RxConfig, but to misinterpret or ignore the written value. >> So far your report is the only one (affecting RTL8411), but we don't know >> whether other chip versions are affected too. > > Also, it is interesting that even if one removes a call to > rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get > written to moments later by rtl_set_rx_mode(). > > The only chip accesses in the meantime seems to be a write to TxConfig by > rtl_set_tx_config_registers() and then a read of RxConfig plus two writes > to MAR0 earlier in rtl_set_rx_mode(). > > My proposals are: > 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" > in rtl_hw_start(). > Maybe the chip does not like sometimes that RxConfig is written before > TxConfig. 
>
After testing your first proposal, which made no difference, I found the following in the output from dmesg:

[ 761.999468] ------------[ cut here ]------------
[ 761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 dev_watchdog+0x1e9/0x1f0
[ 761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_via videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common usbhid realtek coretemp snd_hda_intel hwmon snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last unloaded: btintel]
[ 761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328
[ 761.999504] Hardware name: Notebook W65_67SZ /W65_67SZ , BIOS 1.03.05 02/26/2014
[ 761.999508] Workqueue: events rtl_task [r8169]
[ 761.999510] RIP: 0010:dev_watchdog+0x1e9/0x1f0
[ 761.999512] Code: 00 48 63 4d e8 eb 99 4c 89 ef c6 05 b6 13 a6 00 01 e8 1b c7 fd ff 89 d9 4c 89 ee 48 c7 c7 40 53 e1 81 48 89 c2 e8 ae f4 a3 ff <0f> 0b eb c0 0f 1f 00 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00
[ 761.999513] RSP: 0018:ffff88040f803e98 EFLAGS: 00010282
[ 761.999514] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 761.999516] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff88040f8153d0
[ 761.999517] RBP: ffff88040ca9a3b8 R08: ffffffff813565f0 R09: 000000000000034e
[ 761.999517] R10: 0000000000000007 R11: 0000000000000000 R12: ffff88040ca9a39c
[ 761.999518] R13: ffff88040ca9a000 R14: 0000000000000001 R15: ffff8803ea17cc80
[ 761.999520] FS: 0000000000000000(0000) GS:ffff88040f800000(0000) knlGS:0000000000000000
[ 761.999521] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 761.999522] CR2: 00007f67280206b8 CR3: 000000000200a002 CR4: 00000000001606f0
[ 761.999523] Call Trace:
[ 761.999525] <IRQ>
[ 761.999527] ? qdisc_reset+0xe0/0xe0
[ 761.999529] ? qdisc_reset+0xe0/0xe0
[ 761.999532] call_timer_fn+0x11/0x70
[ 761.999534] expire_timers+0x8e/0xa0
[ 761.999535] run_timer_softirq+0x7e/0x150
[ 761.999538] ? __hrtimer_run_queues+0x12b/0x1a0
[ 761.999541] ? recalibrate_cpu_khz+0x10/0x10
[ 761.999543] ? ktime_get+0x32/0x90
[ 761.999546] ? lapic_next_event+0x20/0x20
[ 761.999549] __do_softirq+0xcc/0x1fc
[ 761.999552] irq_exit+0x82/0xb0
[ 761.999554] smp_apic_timer_interrupt+0x61/0x90
[ 761.999556] apic_timer_interrupt+0xf/0x20
[ 761.999557] </IRQ>
[ 761.999560] RIP: 0010:rtl_slow_event_work+0x2a/0x1f0 [r8169]
[ 761.999562] Code: 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 10 4c 8b 67 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 48 8b 07 66 8b 68 3e <66> 23 af da 0d 00 00 48 8b 07 66 89 68 3e 40 f6 c5 40 0f 85 3b 01
[ 761.999563] RSP: 0018:ffffc900014d7e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 761.999564] RAX: ffffc900000b9000 RBX: ffff88040ca9a7c0 RCX: ffff88040f81f160
[ 761.999565] RDX: ffff8803ea21b300 RSI: 0000000000000000 RDI: ffff88040ca9a7c0
[ 761.999566] RBP: ffff88040ca90050 R08: 0000000000000000 R09: 000073746e657665
[ 761.999567] R10: 8080808080808080 R11: ffff88040f81ea68 R12: ffff88040ca9a000
[ 761.999568] R13: ffff88040ca9a000 R14: ffff88040f81f140 R15: 0000000000000000
[ 761.999571] ? __switch_to_asm+0x34/0x70
[ 761.999573] rtl_task+0x4f/0x70 [r8169]
[ 761.999576] process_one_work+0x1bc/0x2f0
[ 761.999577] worker_thread+0x28/0x3c0
[ 761.999579] ? process_one_work+0x2f0/0x2f0
[ 761.999581] kthread+0x109/0x120
[ 761.999583] ? kthread_park+0x80/0x80
[ 761.999585] ret_from_fork+0x35/0x40
[ 761.999586] ---[ end trace fd5800440feffc06 ]---

I haven't seen this before, but maybe it's a consequence of swapping the order of the two function calls.

I'll work on the second proposal later today.

Chris

> 2) Check the original value of RxConfig (after a resume) before
> rtl_init_rxcfg() overwrites it (compile tested only):
> --- r8169.c.ori
> +++ r8169.c
> @@ -5155,6 +5155,9 @@
>  	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>  	RTL_R8(tp, IntrMask);
>  	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +
> +	pr_notice("RxConfig before init was %.8x\n",
> +		  (unsigned int)RTL_R32(tp, RxConfig));
>  	rtl_init_rxcfg(tp);
>  	rtl_set_tx_config_registers(tp);
>
>
> This should be the value that you got when you removed the call to
> rtl_init_rxcfg() for testing.
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).
>
> Hope this helps,
> Maciej
>

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-10-10 8:09 ` Chris Clayton @ 2018-10-10 8:51 ` Chris Clayton 0 siblings, 0 replies; 22+ messages in thread From: Chris Clayton @ 2018-10-10 8:51 UTC (permalink / raw) To: Maciej S. Szmigiero Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel Sorry, I forgot that editing r8169.c and rebuilding would result in rc7+, so I tested the wrong kernel/module to get the results I provided below. That, however, may make the results more interesting because they happened with a virgin rc7 kernel/module. I'll test your proposals properly later. Chris On 10/10/2018 09:09, Chris Clayton wrote: > > > On 10/10/2018 01:24, Maciej S. Szmigiero wrote: >> On 09.10.2018 22:36, Heiner Kallweit wrote: >>> On 09.10.2018 16:40, Chris Clayton wrote: >>>> Thanks to Maciej and Heiner for their replies. >>>> >>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>>> Hi again, >>>>>> >>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the >>>>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my >>>>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed >>>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from >>>>>> 14-15ms to more than 1000ms. >>>>> >>>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>>> state (before a suspend) and in the broken state (after a resume). >>>>> Maybe there will be some obvious in the difference. >>>>> >>>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>>> >>>> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical. 
>>>> >>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical. >>>> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend. >>>> >>> Hmm, this is very weird, especially taking into account that in your original >>> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start() >>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and >>> register values seem to be the same before and after resume. So how can the >>> chip behave differently? >>> So far my best guess is that some chip quirk causes it to accept writes to >>> register RxConfig, but to misinterpret or ignore the written value. >>> So far your report is the only one (affecting RTL8411), but we don't know >>> whether other chip versions are affected too. >> >> Also, it is interesting that even if one removes a call to >> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get >> written to moments later by rtl_set_rx_mode(). >> >> The only chip accesses in the meantime seems to be a write to TxConfig by >> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes >> to MAR0 earlier in rtl_set_rx_mode(). >> >> My proposals are: >> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" >> in rtl_hw_start(). >> Maybe the chip does not like sometimes that RxConfig is written before >> TxConfig. 
>> > After testing your first proposal, which made no difference, I founf the following in dmesg in the output from dmesg: > > [ 761.999468] ------------[ cut here ]------------ > [ 761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out > [ 761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 dev_watchdog+0x1e9/0x1f0 > [ 761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE > nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_via > videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common usbhid realtek coretemp snd_hda_intel hwmon > snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last unloaded: btintel] > [ 761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328 > [ 761.999504] Hardware name: Notebook W65_67SZ /W65_67SZ > , BIOS 1.03.05 02/26/2014 > [ 761.999508] Workqueue: events rtl_task [r8169] > [ 761.999510] RIP: 0010:dev_watchdog+0x1e9/0x1f0 > [ 761.999512] Code: 00 48 63 4d e8 eb 99 4c 89 ef c6 05 b6 13 a6 00 01 e8 1b c7 fd ff 89 d9 4c 89 ee 48 c7 c7 40 53 e1 > 81 48 89 c2 e8 ae f4 a3 ff <0f> 0b eb c0 0f 1f 00 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 > [ 761.999513] RSP: 0018:ffff88040f803e98 EFLAGS: 00010282 > [ 761.999514] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006 > [ 761.999516] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff88040f8153d0 > [ 761.999517] RBP: ffff88040ca9a3b8 R08: ffffffff813565f0 R09: 000000000000034e > [ 761.999517] R10: 0000000000000007 R11: 0000000000000000 R12: ffff88040ca9a39c > [ 761.999518] R13: ffff88040ca9a000 R14: 0000000000000001 R15: ffff8803ea17cc80 > [ 761.999520] FS: 0000000000000000(0000) GS:ffff88040f800000(0000) knlGS:0000000000000000 > [ 761.999521] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 761.999522] CR2: 00007f67280206b8 CR3: 000000000200a002 CR4: 00000000001606f0 > [ 
761.999523] Call Trace: > [ 761.999525] <IRQ> > [ 761.999527] ? qdisc_reset+0xe0/0xe0 > [ 761.999529] ? qdisc_reset+0xe0/0xe0 > [ 761.999532] call_timer_fn+0x11/0x70 > [ 761.999534] expire_timers+0x8e/0xa0 > [ 761.999535] run_timer_softirq+0x7e/0x150 > [ 761.999538] ? __hrtimer_run_queues+0x12b/0x1a0 > [ 761.999541] ? recalibrate_cpu_khz+0x10/0x10 > [ 761.999543] ? ktime_get+0x32/0x90 > [ 761.999546] ? lapic_next_event+0x20/0x20 > [ 761.999549] __do_softirq+0xcc/0x1fc > [ 761.999552] irq_exit+0x82/0xb0 > [ 761.999554] smp_apic_timer_interrupt+0x61/0x90 > [ 761.999556] apic_timer_interrupt+0xf/0x20 > [ 761.999557] </IRQ> > [ 761.999560] RIP: 0010:rtl_slow_event_work+0x2a/0x1f0 [r8169] > [ 761.999562] Code: 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 10 4c 8b 67 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 > 31 c0 48 8b 07 66 8b 68 3e <66> 23 af da 0d 00 00 48 8b 07 66 89 68 3e 40 f6 c5 40 0f 85 3b 01 > [ 761.999563] RSP: 0018:ffffc900014d7e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 > [ 761.999564] RAX: ffffc900000b9000 RBX: ffff88040ca9a7c0 RCX: ffff88040f81f160 > [ 761.999565] RDX: ffff8803ea21b300 RSI: 0000000000000000 RDI: ffff88040ca9a7c0 > [ 761.999566] RBP: ffff88040ca90050 R08: 0000000000000000 R09: 000073746e657665 > [ 761.999567] R10: 8080808080808080 R11: ffff88040f81ea68 R12: ffff88040ca9a000 > [ 761.999568] R13: ffff88040ca9a000 R14: ffff88040f81f140 R15: 0000000000000000 > [ 761.999571] ? __switch_to_asm+0x34/0x70 > [ 761.999573] rtl_task+0x4f/0x70 [r8169] > [ 761.999576] process_one_work+0x1bc/0x2f0 > [ 761.999577] worker_thread+0x28/0x3c0 > [ 761.999579] ? process_one_work+0x2f0/0x2f0 > [ 761.999581] kthread+0x109/0x120 > [ 761.999583] ? kthread_park+0x80/0x80 > [ 761.999585] ret_from_fork+0x35/0x40 > [ 761.999586] ---[ end trace fd5800440feffc06 ]--- > > I haven't seen this before, but maybe it's a consequence of swapping the order of the two functions calls. > > I'll work on the second proposal later today. 
> > Chris >> 2) Check the original value of RxConfig (after a resume) before >> rtl_init_rxcfg() overwrites it (compile tested only): >> --- r8169.c.ori >> +++ r8169.c >> @@ -5155,6 +5155,9 @@ >> /* Initially a 10 us delay. Turned it into a PCI commit. - FR */ >> RTL_R8(tp, IntrMask); >> RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb); >> + >> + pr_notice("RxConfig before init was %.8x\n", >> + (unsigned int)RTL_R32(tp, RxConfig)); >> rtl_init_rxcfg(tp); >> rtl_set_tx_config_registers(tp); >> >> >> This should be the value that you got when you removed the call to >> rtl_init_rxcfg() for testing. >> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg() >> writes (under the "default:" label for your NIC model). >> >> Hope this helps, >> Maciej >> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-10-10 0:24 ` Maciej S. Szmigiero 2018-10-10 8:09 ` Chris Clayton @ 2018-10-10 22:30 ` Chris Clayton 2018-10-10 22:32 ` Chris Clayton 2018-10-10 22:49 ` Chris Clayton 2 siblings, 1 reply; 22+ messages in thread From: Chris Clayton @ 2018-10-10 22:30 UTC (permalink / raw) To: Maciej S. Szmigiero Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel OK, right kernel/module used this time. Please see findings below. On 10/10/2018 01:24, Maciej S. Szmigiero wrote: > On 09.10.2018 22:36, Heiner Kallweit wrote: >> On 09.10.2018 16:40, Chris Clayton wrote: >>> Thanks to Maciej and Heiner for their replies. >>> >>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>> Hi again, >>>>> >>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the >>>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my >>>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed >>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from >>>>> 14-15ms to more than 1000ms. >>>> >>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>> state (before a suspend) and in the broken state (after a resume). >>>> Maybe there will be some obvious in the difference. >>>> >>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>> >>> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical. >>> >>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical. >>> Heiner specifically suggested looking at the RxConfig. 
The value of that is 0x0002870e both pre and post suspend. >>> >> Hmm, this is very weird, especially taking into account that in your original >> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start() >> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and >> register values seem to be the same before and after resume. So how can the >> chip behave differently? >> So far my best guess is that some chip quirk causes it to accept writes to >> register RxConfig, but to misinterpret or ignore the written value. >> So far your report is the only one (affecting RTL8411), but we don't know >> whether other chip versions are affected too. > > Also, it is interesting that even if one removes a call to > rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get > written to moments later by rtl_set_rx_mode(). > > The only chip accesses in the meantime seems to be a write to TxConfig by > rtl_set_tx_config_registers() and then a read of RxConfig plus two writes > to MAR0 earlier in rtl_set_rx_mode(). > > My proposals are: > 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" > in rtl_hw_start(). > Maybe the chip does not like sometimes that RxConfig is written before > TxConfig. > This change made no difference. Networking still dies if I open a browser or leave ping running long enough. > 2) Check the original value of RxConfig (after a resume) before > rtl_init_rxcfg() overwrites it (compile tested only): > --- r8169.c.ori > +++ r8169.c > @@ -5155,6 +5155,9 @@ > /* Initially a 10 us delay. Turned it into a PCI commit. - FR */ > RTL_R8(tp, IntrMask); > RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb); > + > + pr_notice("RxConfig before init was %.8x\n", > + (unsigned int)RTL_R32(tp, RxConfig)); > rtl_init_rxcfg(tp); > rtl_set_tx_config_registers(tp); > > > This should be the value that you got when you removed the call to > rtl_init_rxcfg() for testing. 
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).

This might be more interesting. Through combination of viewing the output from pr_notice() and the output from "ethtool -d", I can see RxConfig with the following values

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt, installed and rebooted. Now I see the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

>
> Hope this helps,
> Maciej
>

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) 2018-10-10 22:30 ` Chris Clayton @ 2018-10-10 22:32 ` Chris Clayton 0 siblings, 0 replies; 22+ messages in thread From: Chris Clayton @ 2018-10-10 22:32 UTC (permalink / raw) To: Maciej S. Szmigiero Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman, Realtek linux nic maintainers, linux-kernel Too late at night to be doing this stuff. Clicked send instead of saving a draft. Sorry, please ignore. On 10/10/2018 23:30, Chris Clayton wrote: > OK, right kernel/module used this time. Please see findings below. > > On 10/10/2018 01:24, Maciej S. Szmigiero wrote: >> On 09.10.2018 22:36, Heiner Kallweit wrote: >>> On 09.10.2018 16:40, Chris Clayton wrote: >>>> Thanks to Maciej and Heiner for their replies. >>>> >>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>>> Hi again, >>>>>> >>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the >>>>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my >>>>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed >>>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from >>>>>> 14-15ms to more than 1000ms. >>>>> >>>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>>> state (before a suspend) and in the broken state (after a resume). >>>>> Maybe there will be some obvious in the difference. >>>>> >>>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>>> >>>> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical. >>>> >>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical. 
>>>> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend. >>>> >>> Hmm, this is very weird, especially taking into account that in your original >>> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start() >>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and >>> register values seem to be the same before and after resume. So how can the >>> chip behave differently? >>> So far my best guess is that some chip quirk causes it to accept writes to >>> register RxConfig, but to misinterpret or ignore the written value. >>> So far your report is the only one (affecting RTL8411), but we don't know >>> whether other chip versions are affected too. >> >> Also, it is interesting that even if one removes a call to >> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get >> written to moments later by rtl_set_rx_mode(). >> >> The only chip accesses in the meantime seems to be a write to TxConfig by >> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes >> to MAR0 earlier in rtl_set_rx_mode(). >> >> My proposals are: >> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" >> in rtl_hw_start(). >> Maybe the chip does not like sometimes that RxConfig is written before >> TxConfig. >> > > This change made no difference. Networking still dies if I open a browser or leave ping running long enough. > >> 2) Check the original value of RxConfig (after a resume) before >> rtl_init_rxcfg() overwrites it (compile tested only): >> --- r8169.c.ori >> +++ r8169.c >> @@ -5155,6 +5155,9 @@ >> /* Initially a 10 us delay. Turned it into a PCI commit. 
- FR */ >> RTL_R8(tp, IntrMask); >> RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb); >> + >> + pr_notice("RxConfig before init was %.8x\n", >> + (unsigned int)RTL_R32(tp, RxConfig)); >> rtl_init_rxcfg(tp); >> rtl_set_tx_config_registers(tp); >> >> >> This should be the value that you got when you removed the call to >> rtl_init_rxcfg() for testing. >> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg() >> writes (under the "default:" label for your NIC model). > > This might be more interesting. Through combination of viewing the output from pr_notice() and the output from "ethtool > -d", I can see RxConfig with the following values > > During boot: 0x00028700 > Before suspend: 0x0002870e > During resume: 0x00024000 > Post resume: 0x0002870e > > I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt, installed and rebooted. Now I see the > following values: > > During boot: 0x00028700 > Before suspend: 0x0002870e > During resume: 0x00024000 > Post resume: 0x0002870e > >> >> Hope this helps, >> Maciej >> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-10-10 0:24 ` Maciej S. Szmigiero
2018-10-10 8:09 ` Chris Clayton
2018-10-10 22:30 ` Chris Clayton
@ 2018-10-10 22:49 ` Chris Clayton
2018-10-11 0:12 ` Maciej S. Szmigiero
2 siblings, 1 reply; 22+ messages in thread
From: Chris Clayton @ 2018-10-10 22:49 UTC (permalink / raw)
To: Maciej S. Szmigiero
Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman,
Realtek linux nic maintainers, linux-kernel

OK, right kernel/module used this time. Please see findings below.

On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>>> Hi again,
>>>>>
>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the
>>>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my
>>>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed
>>>>> in the browser. Pinging one of my ISP's name servers doesn't fail quite so quickly but the reported time increases from
>>>>> 14-15ms to more than 1000ms.
>>>>
>>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>>> state (before a suspend) and in the broken state (after a resume).
>>>> Maybe there will be something obvious in the difference.
>>>>
>>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>>
>>> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical.
>>> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>> register values seem to be the same before and after resume. So how can the
>> chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
>
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
>
> The only chip accesses in the meantime seem to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
>
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip does not like sometimes that RxConfig is written before
> TxConfig.
>

This change made no difference. Networking still dies if I open a browser or leave ping running long enough.

> 2) Check the original value of RxConfig (after a resume) before
> rtl_init_rxcfg() overwrites it (compile tested only):
> --- r8169.c.ori
> +++ r8169.c
> @@ -5155,6 +5155,9 @@
>          /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>          RTL_R8(tp, IntrMask);
>          RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +
> +        pr_notice("RxConfig before init was %.8x\n",
> +                  (unsigned int)RTL_R32(tp, RxConfig));
>          rtl_init_rxcfg(tp);
>          rtl_set_tx_config_registers(tp);
>
>
> This should be the value that you got when you removed the call to
> rtl_init_rxcfg() for testing.
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).
>

This might be more interesting. Through a combination of viewing the output from pr_notice() and the output from
"ethtool -d", I can see RxConfig with the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

As I did with 4.18.10 early on in the process, I removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
installed and rebooted. Now I see the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002400e

As with 4.18.10, networking now appears to be stable after the resume. Starting a browser results in my homepage being
displayed and I've spent a few minutes surfing with no interruptions. Similarly, ping runs without stopping.

I simply don't know enough to know what might now be enabled or disabled by this change in value, but hopefully it will
provide a clue to someone as to what is going on.

Chris

> Hope this helps,
> Maciej
>
^ permalink raw reply [flat|nested] 22+ messages in thread
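The two post-resume readings reported above differ in only a few bits, which is easier to see with an XOR. The following is a minimal standalone C sketch of that comparison; the helper names `rxconfig_diff` and `print_changed_bits` are hypothetical and are not part of the r8169 driver. Only the two register values are taken from the report.

```c
#include <stdint.h>
#include <stdio.h>

/* Post-resume RxConfig readings quoted in the report above. */
#define RXCFG_WITH_INIT    0x0002870eu /* rtl_init_rxcfg() called: network fails */
#define RXCFG_WITHOUT_INIT 0x0002400eu /* call removed: network works */

/* Hypothetical helper: mask of the bits that differ between two snapshots. */
static uint32_t rxconfig_diff(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/* Hypothetical helper: print each differing bit and which snapshot has it set. */
static void print_changed_bits(uint32_t a, uint32_t b)
{
    uint32_t diff = rxconfig_diff(a, b);
    int bit;

    for (bit = 31; bit >= 0; bit--) {
        uint32_t mask = (uint32_t)1 << bit;

        if (diff & mask)
            printf("bit %2d (0x%08x): set only in %s value\n",
                   bit, (unsigned int)mask,
                   (a & mask) ? "first" : "second");
    }
}
```

Calling `print_changed_bits(RXCFG_WITH_INIT, RXCFG_WITHOUT_INIT)` from a `main()` reports bits 15, 10, 9 and 8 as set only in the first value and bit 14 as set only in the second; the XOR of the two readings is 0x0000c700.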
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-10-10 22:49 ` Chris Clayton
@ 2018-10-11 0:12 ` Maciej S. Szmigiero
2018-10-11 8:24 ` Chris Clayton
0 siblings, 1 reply; 22+ messages in thread
From: Maciej S. Szmigiero @ 2018-10-11 0:12 UTC (permalink / raw)
To: Chris Clayton
Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman,
Realtek linux nic maintainers, linux-kernel

On 11.10.2018 00:49, Chris Clayton wrote:
>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>> writes (under the "default:" label for your NIC model).
>>
>
> This might be more interesting. Through a combination of viewing the output from pr_notice() and the output from
> "ethtool -d", I can see RxConfig with the following values
>
> During boot:    0x00028700
> Before suspend: 0x0002870e
> During resume:  0x00024000
> Post resume:    0x0002870e
>
> As I did with 4.18.10 early on in the process, I removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
> installed and rebooted. Now I see the following values:
>
> During boot:    0x00028700
> Before suspend: 0x0002870e
> During resume:  0x00024000
> Post resume:    0x0002400e
>

Now we can finally see some difference...
Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
(bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
is kind of expected - one can see that the working configuration
post-resume has bit 14 (or 0x4000) set, too.

This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.

RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
family as your RTL_GIGA_MAC_VER_38, so can you please try the following
change:
--- r8169.c
+++ r8169.c
@@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
         case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
         case RTL_GIGA_MAC_VER_34:
         case RTL_GIGA_MAC_VER_35:
+        case RTL_GIGA_MAC_VER_38:
                 RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
                 break;
         case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:

This will add RX_MULTI_EN also for your chip model (you need to add back
the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).

If this does not help then I would try other values in the above write:
1) RTL_W32(tp, RxConfig, 0x00024000);
2) RTL_W32(tp, RxConfig, 0x00004000);
3) RTL_W32(tp, RxConfig, RX_DMA_BURST);
4) RTL_W32(tp, RxConfig, RX128_INT_EN);

> Chris

Maciej
^ permalink raw reply [flat|nested] 22+ messages in thread
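The bit arithmetic in the message above can be checked with a small standalone C sketch. The three masks use the bit positions stated in that message (bit 15, bit 14, bits 8-10); the struct `rxcfg_fields` and the functions `rxcfg_decode` and `rxcfg_proposed_init` are hypothetical names introduced only for this illustration, and the remaining bits of the register are deliberately left undecoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as stated in the message above. */
#define RX128_INT_EN (UINT32_C(1) << 15) /* 0x8000 */
#define RX_MULTI_EN  (UINT32_C(1) << 14) /* 0x4000 */
#define RX_DMA_BURST (UINT32_C(7) << 8)  /* 0x0700, bits 8-10 */

/* Hypothetical container for the three fields discussed in the thread. */
struct rxcfg_fields {
    bool rx128_int_en;
    bool rx_multi_en;
    unsigned int dma_burst; /* raw 3-bit field value, 0..7 */
};

/* Extract the three fields from a raw RxConfig reading. */
static struct rxcfg_fields rxcfg_decode(uint32_t val)
{
    struct rxcfg_fields f;

    f.rx128_int_en = (val & RX128_INT_EN) != 0;
    f.rx_multi_en = (val & RX_MULTI_EN) != 0;
    f.dma_burst = (unsigned int)((val & RX_DMA_BURST) >> 8);
    return f;
}

/* Value that the proposed RTL_GIGA_MAC_VER_38 case would write. */
static uint32_t rxcfg_proposed_init(void)
{
    return RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST;
}
```

Decoding 0x0002870e (the failing post-resume value) gives RX128_INT_EN set, RX_MULTI_EN clear and a burst field of 7, while 0x0002400e (the working value) has only RX_MULTI_EN set. The proposed write ORs all three masks, which is 0xc700.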
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-10-11 0:12 ` Maciej S. Szmigiero
@ 2018-10-11 8:24 ` Chris Clayton
2018-10-11 12:23 ` Maciej S. Szmigiero
0 siblings, 1 reply; 22+ messages in thread
From: Chris Clayton @ 2018-10-11 8:24 UTC (permalink / raw)
To: Maciej S. Szmigiero
Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman,
Realtek linux nic maintainers, linux-kernel

On 11/10/2018 01:12, Maciej S. Szmigiero wrote:
> On 11.10.2018 00:49, Chris Clayton wrote:
>>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>>> writes (under the "default:" label for your NIC model).
>>>
>>
>> This might be more interesting. Through a combination of viewing the output from pr_notice() and the output from
>> "ethtool -d", I can see RxConfig with the following values
>>
>> During boot:    0x00028700
>> Before suspend: 0x0002870e
>> During resume:  0x00024000
>> Post resume:    0x0002870e
>>
>> As I did with 4.18.10 early on in the process, I removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
>> installed and rebooted. Now I see the following values:
>>
>> During boot:    0x00028700
>> Before suspend: 0x0002870e
>> During resume:  0x00024000
>> Post resume:    0x0002400e
>>
>
> Now we can finally see some difference...
> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
> is kind of expected - one can see that the working configuration
> post-resume has bit 14 (or 0x4000) set, too.
>
> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.
>
> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
> family as your RTL_GIGA_MAC_VER_38, so can you please try the following
> change:
> --- r8169.c
> +++ r8169.c
> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
>          case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
>          case RTL_GIGA_MAC_VER_34:
>          case RTL_GIGA_MAC_VER_35:
> +        case RTL_GIGA_MAC_VER_38:
>                  RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>                  break;
>          case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
>
> This will add RX_MULTI_EN also for your chip model (you need to add back
> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).
>

That's done the trick. With the above change applied, my network runs fine after a suspend/resume cycle and the
ping times are back in the 14-15ms range.

Chris

> If this does not help then I would try other values in the above write:
> 1) RTL_W32(tp, RxConfig, 0x00024000);
> 2) RTL_W32(tp, RxConfig, 0x00004000);
> 3) RTL_W32(tp, RxConfig, RX_DMA_BURST);
> 4) RTL_W32(tp, RxConfig, RX128_INT_EN);
>
>> Chris
>
> Maciej
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-10-11 8:24 ` Chris Clayton
@ 2018-10-11 12:23 ` Maciej S. Szmigiero
2018-10-11 13:34 ` Chris Clayton
0 siblings, 1 reply; 22+ messages in thread
From: Maciej S. Szmigiero @ 2018-10-11 12:23 UTC (permalink / raw)
To: Chris Clayton
Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman,
Realtek linux nic maintainers, linux-kernel

On 11.10.2018 10:24, Chris Clayton wrote:
> On 11/10/2018 01:12, Maciej S. Szmigiero wrote:
>> On 11.10.2018 00:49, Chris Clayton wrote:
>>>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>>>> writes (under the "default:" label for your NIC model).
>>>>
>>>
>>> This might be more interesting. Through a combination of viewing the output from pr_notice() and the output from
>>> "ethtool -d", I can see RxConfig with the following values
>>>
>>> During boot:    0x00028700
>>> Before suspend: 0x0002870e
>>> During resume:  0x00024000
>>> Post resume:    0x0002870e
>>>
>>> As I did with 4.18.10 early on in the process, I removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
>>> installed and rebooted. Now I see the following values:
>>>
>>> During boot:    0x00028700
>>> Before suspend: 0x0002870e
>>> During resume:  0x00024000
>>> Post resume:    0x0002400e
>>>
>>
>> Now we can finally see some difference...
>> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
>> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
>> is kind of expected - one can see that the working configuration
>> post-resume has bit 14 (or 0x4000) set, too.
>>
>> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
>> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.
>>
>> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
>> family as your RTL_GIGA_MAC_VER_38, so can you please try the following
>> change:
>> --- r8169.c
>> +++ r8169.c
>> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
>>          case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
>>          case RTL_GIGA_MAC_VER_34:
>>          case RTL_GIGA_MAC_VER_35:
>> +        case RTL_GIGA_MAC_VER_38:
>>                  RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>>                  break;
>>          case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
>>
>> This will add RX_MULTI_EN also for your chip model (you need to add back
>> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).
>>
>
> That's done the trick. With the above change applied, my network runs fine after a suspend/resume cycle and the
> ping times are back in the 14-15ms range.

Nice!

I will submit a patch, it would be great if you could test it and then
add a "Tested-by:" tag.

> Chris

Maciej
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-10-11 12:23 ` Maciej S. Szmigiero
@ 2018-10-11 13:34 ` Chris Clayton
0 siblings, 0 replies; 22+ messages in thread
From: Chris Clayton @ 2018-10-11 13:34 UTC (permalink / raw)
To: Maciej S. Szmigiero
Cc: Heiner Kallweit, David S. Miller, Azat Khuzhin, Greg Kroah-Hartman,
Realtek linux nic maintainers, linux-kernel

On 11/10/2018 13:23, Maciej S. Szmigiero wrote:
> On 11.10.2018 10:24, Chris Clayton wrote:
>> On 11/10/2018 01:12, Maciej S. Szmigiero wrote:
>>> On 11.10.2018 00:49, Chris Clayton wrote:
>>>>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>>>>> writes (under the "default:" label for your NIC model).
>>>>>
>>>>
>>>> This might be more interesting. Through a combination of viewing the output from pr_notice() and the output from
>>>> "ethtool -d", I can see RxConfig with the following values
>>>>
>>>> During boot:    0x00028700
>>>> Before suspend: 0x0002870e
>>>> During resume:  0x00024000
>>>> Post resume:    0x0002870e
>>>>
>>>> As I did with 4.18.10 early on in the process, I removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
>>>> installed and rebooted. Now I see the following values:
>>>>
>>>> During boot:    0x00028700
>>>> Before suspend: 0x0002870e
>>>> During resume:  0x00024000
>>>> Post resume:    0x0002400e
>>>>
>>>
>>> Now we can finally see some difference...
>>> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
>>> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
>>> is kind of expected - one can see that the working configuration
>>> post-resume has bit 14 (or 0x4000) set, too.
>>>
>>> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
>>> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.
>>>
>>> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
>>> family as your RTL_GIGA_MAC_VER_38, so can you please try the following
>>> change:
>>> --- r8169.c
>>> +++ r8169.c
>>> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
>>>          case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
>>>          case RTL_GIGA_MAC_VER_34:
>>>          case RTL_GIGA_MAC_VER_35:
>>> +        case RTL_GIGA_MAC_VER_38:
>>>                  RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>>>                  break;
>>>          case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
>>>
>>> This will add RX_MULTI_EN also for your chip model (you need to add back
>>> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).
>>>
>>
>> That's done the trick. With the above change applied, my network runs fine after a suspend/resume cycle and the
>> ping times are back in the 14-15ms range.
>
> Nice!
>
> I will submit a patch, it would be great if you could test it and then
> add a "Tested-by:" tag.
>

Will do, Maciej. Thanks for solving this.

>> Chris
>
> Maciej
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-10-09 14:40 ` Chris Clayton
2018-10-09 20:36 ` Heiner Kallweit
@ 2018-10-09 21:39 ` Heiner Kallweit
2018-10-09 23:32 ` Chris Clayton
1 sibling, 1 reply; 22+ messages in thread
From: Heiner Kallweit @ 2018-10-09 21:39 UTC (permalink / raw)
To: Chris Clayton, Maciej S. Szmigiero
Cc: Azat Khuzhin, Realtek linux nic maintainers, linux-kernel

On 09.10.2018 16:40, Chris Clayton wrote:
> Thanks to Maciej and Heiner for their replies.
>
> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>> On 07.10.2018 21:36, Chris Clayton wrote:
>>> Hi again,
>>>
>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the
>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my
>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed
>>> in the browser. Pinging one of my ISP's name servers doesn't fail quite so quickly but the reported time increases from
>>> 14-15ms to more than 1000ms.
>>
>> You can try comparing chip registers (ethtool -d eth0) in the working
>> state (before a suspend) and in the broken state (after a resume).
>> Maybe there will be something obvious in the difference.
>>
>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>
> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical.
>
> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical.
> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend.
>
> I've attached files I redirected the outputs to.
>
> Please don't hesitate to ask for any other information needed to solve this problem. In the meantime, I've now got
> scripts that stop the network during suspend and restart it during resume. (Those scripts were removed whilst I gathered
> the diagnostics shown in the attachments.)
>

I'd like to check whether it may be a timing issue. The following experimental patch
adds a PCI commit after writing register ChipCmd. Could you please check whether
it changes anything?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7d3f671e1..f3c359492 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4641,6 +4641,7 @@ static void rtl_hw_start(struct rtl8169_private *tp)
         /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
         RTL_R8(tp, IntrMask);
         RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
+        RTL_R8(tp, ChipCmd);
         rtl_init_rxcfg(tp);
         rtl_set_tx_config_registers(tp);

--
2.19.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
2018-10-09 21:39 ` Heiner Kallweit
@ 2018-10-09 23:32 ` Chris Clayton
0 siblings, 0 replies; 22+ messages in thread
From: Chris Clayton @ 2018-10-09 23:32 UTC (permalink / raw)
To: Heiner Kallweit, Maciej S. Szmigiero
Cc: Azat Khuzhin, Realtek linux nic maintainers, linux-kernel

On 09/10/2018 22:39, Heiner Kallweit wrote:
> On 09.10.2018 16:40, Chris Clayton wrote:
>> Thanks to Maciej and Heiner for their replies.
>>
>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>> Hi again,
>>>>
>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the
>>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my
>>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed
>>>> in the browser. Pinging one of my ISP's name servers doesn't fail quite so quickly but the reported time increases from
>>>> 14-15ms to more than 1000ms.
>>>
>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>> state (before a suspend) and in the broken state (after a resume).
>>> Maybe there will be something obvious in the difference.
>>>
>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>
>> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical.
>>
>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical.
>> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend.
>>
>> I've attached files I redirected the outputs to.
>>
>> Please don't hesitate to ask for any other information needed to solve this problem. In the meantime, I've now got
>> scripts that stop the network during suspend and restart it during resume. (Those scripts were removed whilst I gathered
>> the diagnostics shown in the attachments.)
>>
> I'd like to check whether it may be a timing issue. The following experimental patch
> adds a PCI commit after writing register ChipCmd. Could you please check whether
> it changes anything?
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 7d3f671e1..f3c359492 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -4641,6 +4641,7 @@ static void rtl_hw_start(struct rtl8169_private *tp)
>          /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>          RTL_R8(tp, IntrMask);
>          RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +        RTL_R8(tp, ChipCmd);
>          rtl_init_rxcfg(tp);
>          rtl_set_tx_config_registers(tp);
>
>

Sorry, this patch doesn't make any difference - my network still fails. After a suspend/resume my browsers (chromium and
firefox) both fail to open my home page (https://www.google.co.uk). The ping time for one of my ISP's name servers
increases from 14-15ms to more than 1000ms, although after a few pings it does reduce. As the screen grab below shows,
the network does eventually fail:

$ ping NS1
PING ns1 (90.207.238.97): 56 data bytes
64 bytes from 90.207.238.97: icmp_seq=0 ttl=251 time=1017.289 ms
64 bytes from 90.207.238.97: icmp_seq=1 ttl=251 time=1018.051 ms
64 bytes from 90.207.238.97: icmp_seq=2 ttl=251 time=1015.271 ms
64 bytes from 90.207.238.97: icmp_seq=3 ttl=251 time=1015.495 ms
64 bytes from 90.207.238.97: icmp_seq=6 ttl=251 time=1015.646 ms
64 bytes from 90.207.238.97: icmp_seq=7 ttl=251 time=1022.609 ms
64 bytes from 90.207.238.97: icmp_seq=8 ttl=251 time=1015.612 ms
64 bytes from 90.207.238.97: icmp_seq=10 ttl=251 time=1015.551 ms
64 bytes from 90.207.238.97: icmp_seq=12 ttl=251 time=1015.446 ms
64 bytes from 90.207.238.97: icmp_seq=13 ttl=251 time=1015.657 ms
64 bytes from 90.207.238.97: icmp_seq=14 ttl=251 time=1015.614 ms
64 bytes from 90.207.238.97: icmp_seq=15 ttl=251 time=1015.651 ms
64 bytes from 90.207.238.97: icmp_seq=17 ttl=251 time=1015.459 ms
64 bytes from 90.207.238.97: icmp_seq=18 ttl=251 time=1015.443 ms
64 bytes from 90.207.238.97: icmp_seq=19 ttl=251 time=1015.936 ms
64 bytes from 90.207.238.97: icmp_seq=20 ttl=251 time=1015.681 ms
64 bytes from 90.207.238.97: icmp_seq=22 ttl=251 time=1015.410 ms
64 bytes from 90.207.238.97: icmp_seq=23 ttl=251 time=1015.487 ms
64 bytes from 90.207.238.97: icmp_seq=24 ttl=251 time=1016.169 ms
64 bytes from 90.207.238.97: icmp_seq=25 ttl=251 time=1015.659 ms
64 bytes from 90.207.238.97: icmp_seq=26 ttl=251 time=14.606 ms
64 bytes from 90.207.238.97: icmp_seq=30 ttl=251 time=32.765 ms
64 bytes from 90.207.238.97: icmp_seq=31 ttl=251 time=115.052 ms
64 bytes from 90.207.238.97: icmp_seq=33 ttl=251 time=757.115 ms
64 bytes from 90.207.238.97: icmp_seq=34 ttl=251 time=176.696 ms
64 bytes from 90.207.238.97: icmp_seq=35 ttl=251 time=1017.462 ms
64 bytes from 90.207.238.97: icmp_seq=36 ttl=251 time=16.394 ms
64 bytes from 90.207.238.97: icmp_seq=37 ttl=251 time=20.402 ms
64 bytes from 90.207.238.97: icmp_seq=38 ttl=251 time=37.795 ms
64 bytes from 90.207.238.97: icmp_seq=39 ttl=251 time=141.997 ms
92 bytes from laptop.local.lan (192.168.0.20): Destination Host Unreachable
92 bytes from laptop.local.lan (192.168.0.20): Destination Host Unreachable
...

Chris
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2018-10-11 13:34 UTC | newest]

Thread overview: 22+ messages
2018-09-28 15:54 R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) Maciej S. Szmigiero
2018-09-28 22:00 ` Chris Clayton
2018-09-28 22:13 ` Heiner Kallweit
2018-09-29 7:25 ` Chris Clayton
2018-09-29 7:38 ` Chris Clayton
2018-10-04 8:41 ` Chris Clayton
2018-10-07 19:36 ` Chris Clayton
2018-10-09 12:32 ` Maciej S. Szmigiero
2018-10-09 14:40 ` Chris Clayton
2018-10-09 20:36 ` Heiner Kallweit
2018-10-10 0:24 ` Maciej S. Szmigiero
2018-10-10 8:09 ` Chris Clayton
2018-10-10 8:51 ` Chris Clayton
2018-10-10 22:30 ` Chris Clayton
2018-10-10 22:32 ` Chris Clayton
2018-10-10 22:49 ` Chris Clayton
2018-10-11 0:12 ` Maciej S. Szmigiero
2018-10-11 8:24 ` Chris Clayton
2018-10-11 12:23 ` Maciej S. Szmigiero
2018-10-11 13:34 ` Chris Clayton
2018-10-09 21:39 ` Heiner Kallweit
2018-10-09 23:32 ` Chris Clayton