All of lore.kernel.org
 help / color / mirror / Atom feed
* r8169 driver from kernel 5.0 crashing
@ 2019-03-10 19:02 VDR User
  2019-03-10 19:35 ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-10 19:02 UTC (permalink / raw)
  To: netdev

Hi, after updating to kernel 5.0, the nic driver (r8169) has been
crashing whenever I start using heavy traffic on it (for example,
xferring large files to the box across my lan). The destination
harddrive may be sleeping and need to spin-up, or not, but the box
itself does not suspend/hibernate. The nic becomes completely
unresponsive and all connections to the box drop. After what I think
is several minutes, the connection comes back to life. The problem
happens consistently but seemingly not consistently at the same point.
For example, I can xfer a few 4gb files and it will crash at around
2-3gb on the first file. The next time it might not crash until 2-3gb
on the second file.Prior to kernel 5.0 I was using 4.19.12 and this
problem didn't occur. I have since downgraded back to 4.19.12 pending
what response this post gets.

Thanks for any help or assistance on how to proceed!
-Derek

NOTE: I'm not subscribed to this mailing list so please CC me on any replies!

The nic is on-board ECS A75F-A:
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
        Subsystem: Elitegroup Computer Systems RTL8111/8168/8411 PCI
Express Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at c000 [size=256]
        Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
        Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: r8169

I see the following in the syslog:

[164572.785517] ------------[ cut here ]------------
[164572.785595] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
[164572.785677] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
dev_watchdog+0x1bb/0x1e0
[164572.785730] Modules linked in: snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi ohci_pci snd_hda_intel
snd_hda_codec snd_hwdep xhci_pci ehci_pci ohci_hcd xhci_hcd ehci_hcd
usbcore usb_common snd_hda_core snd_pcm snd_timer snd soundcore nfsd
auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
[164572.785807] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #2
[164572.785842] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
[164572.785878] EIP: dev_watchdog+0x1bb/0x1e0
[164572.785912] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
00 00
[164572.785963] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
[164572.785998] ESI: f4cbc264 EDI: f4cbc000 EBP: f4c99f74 ESP: f4c99f4c
[164572.786033] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
[164572.786067] CR0: 80050033 CR2: 020dc000 CR3: 2b80b000 CR4: 00000690
[164572.786102] Call Trace:
[164572.786135]  <SOFTIRQ>
[164572.786168]  ? qdisc_put_unlocked+0x40/0x40
[164572.786203]  call_timer_fn+0x19/0xa0
[164572.786237]  run_timer_softirq+0x337/0x380
[164572.786270]  ? qdisc_put_unlocked+0x40/0x40
[164572.786302]  ? rcu_process_callbacks+0xcb/0x380
[164572.786337]  __do_softirq+0xd6/0x21c
[164572.786370]  ? __irqentry_text_end+0x18/0x18
[164572.786404]  call_on_stack+0x10/0x60
[164572.786435]  </SOFTIRQ>
[164572.786467]  ? irq_exit+0x91/0xc0
[164572.786495]  ? smp_apic_timer_interrupt+0x56/0xa0
[164572.786525]  ? apic_timer_interrupt+0xd5/0xdc
[164572.786555]  ? acpi_idle_enter_s2idle+0x60/0x60
[164572.786584]  ? cpuidle_enter_state+0x122/0x360
[164572.786614]  ? tick_nohz_idle_stop_tick+0x27b/0x2a0
[164572.786644]  ? cpuidle_enter+0xf/0x20
[164572.786673]  ? call_cpuidle+0x1c/0x40
[164572.786701]  ? do_idle+0x1e6/0x220
[164572.786730]  ? cpu_startup_entry+0x25/0x40
[164572.786758]  ? start_secondary+0x1a5/0x220
[164572.786787]  ? startup_32_smp+0x15f/0x164
[164572.786816] ---[ end trace 662a5195537dbad8 ]---

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing
  2019-03-10 19:02 r8169 driver from kernel 5.0 crashing VDR User
@ 2019-03-10 19:35 ` Heiner Kallweit
  2019-03-14  3:04   ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-10 19:35 UTC (permalink / raw)
  To: VDR User, netdev

On 10.03.2019 20:02, VDR User wrote:
> Hi, after updating to kernel 5.0, the nic driver (r8169) has been
> crashing whenever I start using heavy traffic on it (for example,
> xferring large files to the box across my lan). The destination
> harddrive may be sleeping and need to spin-up, or not, but the box
> itself does not suspend/hibernate. The nic becomes completely
> unresponsive and all connections to the box drop. After what I think
> is several minutes, the connection comes back to life. The problem
> happens consistently but seemingly not consistently at the same point.
> For example, I can xfer a few 4gb files and it will crash at around
> 2-3gb on the first file. The next time it might not crash until 2-3gb
> on the second file.Prior to kernel 5.0 I was using 4.19.12 and this
> problem didn't occur. I have since downgraded back to 4.19.12 pending
> what response this post gets.
> 
> Thanks for any help or assistance on how to proceed!
> -Derek
> 
Thanks for the report. Helpful would be:
- full dmesg output
- "lspci -vv" output (as root) for the network card
- ethtool -S <if> output

Can you test a recent 4.20 kernel? This would narrow down the number
of potentially problematic patches.

Heiner



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing
  2019-03-10 19:35 ` Heiner Kallweit
@ 2019-03-14  3:04   ` VDR User
  2019-03-14  6:43     ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-14  3:04 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 6268 bytes --]

Hi Heiner,

Thanks for your response. Request info follows..

> > Hi, after updating to kernel 5.0, the nic driver (r8169) has been
> > crashing whenever I start using heavy traffic on it (for example,
> > xferring large files to the box across my lan). The destination
> > harddrive may be sleeping and need to spin-up, or not, but the box
> > itself does not suspend/hibernate. The nic becomes completely
> > unresponsive and all connections to the box drop. After what I think
> > is several minutes, the connection comes back to life. The problem
> > happens consistently but seemingly not consistently at the same point.
> > For example, I can xfer a few 4gb files and it will crash at around
> > 2-3gb on the first file. The next time it might not crash until 2-3gb
> > on the second file.Prior to kernel 5.0 I was using 4.19.12 and this
> > problem didn't occur. I have since downgraded back to 4.19.12 pending
> > what response this post gets.
> >
> Thanks for the report. Helpful would be:
> - full dmesg output

Added as attachment.

> - "lspci -vv" output (as root) for the network card

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
        Subsystem: Elitegroup Computer Systems RTL8111/8168/8411 PCI
Express Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at c000 [size=256]
        Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
        Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+
FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1,
Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+,
LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
                Not readable
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 40-01-00-00-68-4c-e0-00
        Kernel driver in use: r8169

> - ethtool -S <if> output

Unfortunately I just realized I did this _after_ ifdown/ifup'ing the
nic to get it back online so this is probably useless but I'll include
it anyway. If I get it to crash again I'll try to remember to get this
before restarting the nic:

NIC statistics:
     tx_packets: 8844844
     rx_packets: 23550316
     tx_errors: 0
     rx_errors: 0
     rx_missed: 13
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 23544796
     broadcast: 4420
     multicast: 1100
     tx_aborted: 0
     tx_underrun: 0

> Can you test a recent 4.20 kernel? This would narrow down the number
> of potentially problematic patches.

I compiled and test 4.20.15 and didn't experience any crashing. I then
switched back to 5.0.0 and this time I had to transfer significantly
more until the crash occured. I'm not sure but it seems like the
crashes happen when there's both outgoing & incoming traffic
simultaneously. Is the dmesg crash info helpful at all?

Thanks,
Derek

[-- Attachment #2: crash.dmesg.log --]
[-- Type: application/octet-stream, Size: 57368 bytes --]

[    0.000000] Linux version 5.0.0-amd (user@debian) (gcc version 8.2.0 (Debian 8.2.0-21)) #2 SMP PREEMPT Mon Mar 4 10:49:13 PST 2019
[    0.000000] x86/fpu: x87 FPU will use FXSAVE
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ebff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009ec00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf468fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf469000-0x00000000bf4abfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf4ac000-0x00000000bf4b6fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bf4b7000-0x00000000bf4b9fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf4ba000-0x00000000bf4bafff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf4bb000-0x00000000bf4befff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf4bf000-0x00000000bf4c4fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf4c5000-0x00000000bf4c8fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf4c9000-0x00000000bf4f6fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf4f7000-0x00000000bf4f7fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf4f8000-0x00000000bf508fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf509000-0x00000000bf510fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf511000-0x00000000bf53bfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf53c000-0x00000000bf73efff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf73f000-0x00000000bfb12fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bfb13000-0x00000000bfef3fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bfef4000-0x00000000bfefffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000011effffff] usable
[    0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
[    0.000000] SMBIOS 2.7 present.
[    0.000000] DMI: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2500.112 MHz processor
[    0.002845] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.002847] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.002858] last_pfn = 0xbff00 max_arch_pfn = 0x100000
[    0.002862] MTRR default type: uncachable
[    0.002863] MTRR fixed ranges enabled:
[    0.002864]   00000-9FFFF write-back
[    0.002865]   A0000-BFFFF write-through
[    0.002866]   C0000-CEFFF write-protect
[    0.002866]   CF000-E7FFF uncachable
[    0.002867]   E8000-FFFFF write-protect
[    0.002868] MTRR variable ranges enabled:
[    0.002869]   0 base 0000000000 mask FF80000000 write-back
[    0.002870]   1 base 0080000000 mask FFC0000000 write-back
[    0.002871]   2 base 00BFF00000 mask FFFFF00000 uncachable
[    0.002872]   3 disabled
[    0.002872]   4 disabled
[    0.002873]   5 disabled
[    0.002873]   6 disabled
[    0.002874]   7 disabled
[    0.002875] TOM2: 000000013f000000 aka 5104M
[    0.003023] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[    0.003135] e820: update [mem 0xbff00000-0xffffffff] usable ==> reserved
[    0.003150] initial memory mapped: [mem 0x00000000-0x01bfffff]
[    0.003154] Base memory trampoline at [(ptrval)] 9a000 size 16384
[    0.003178] BRK [0x017b2000, 0x017b2fff] PGTABLE
[    0.003191] RAMDISK: [mem 0x3773d000-0x37b95fff]
[    0.003194] Allocated new RAMDISK: [mem 0x36ba5000-0x36ffdfff]
[    0.005179] Move RAMDISK from [mem 0x3773d000-0x37b95fff] to [mem 0x36ba5000-0x36ffdfff]
[    0.005185] ACPI: Early table checksum verification disabled
[    0.009593] ACPI: RSDP 0x00000000000F0450 000024 (v02 ALASKA)
[    0.009598] ACPI: XSDT 0x00000000BF4AC070 00005C (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.009605] ACPI: FACP 0x00000000BF4B57B8 0000F4 (v04 ALASKA A M I    01072009 AMI  00010013)
[    0.009610] ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20181213/tbfadt-624)
[    0.009615] ACPI: DSDT 0x00000000BF4AC158 00965A (v02 ALASKA A M I    00000000 INTL 20051117)
[    0.009619] ACPI: FACS 0x00000000BF510F80 000040
[    0.009621] ACPI: APIC 0x00000000BF4B58B0 000072 (v03 ALASKA A M I    01072009 AMI  00010013)
[    0.009625] ACPI: MCFG 0x00000000BF4B5928 00003C (v01 A M I  GMCH945. 01072009 MSFT 00000097)
[    0.009628] ACPI: HPET 0x00000000BF4B5968 000038 (v01 ALASKA A M I    01072009 AMI  00000005)
[    0.009631] ACPI: BGRT 0x00000000BF4B59A0 00003C (v00 ALASKA A M I    01072009 AMI  00010013)
[    0.009634] ACPI: SSDT 0x00000000BF4B59E0 0007FE (v01 AMD    POWERNOW 00000001 AMD  00000001)
[    0.009638] ACPI: SSDT 0x00000000BF4B61E0 000695 (v02 AMD    ALIB     00000001 MSFT 04000000)
[    0.009647] ACPI: Local APIC address 0xfee00000
[    0.009650] 2191MB HIGHMEM available.
[    0.009652] 879MB LOWMEM available.
[    0.009653]   mapped low ram: 0 - 36ffe000
[    0.009654]   low ram: 0 - 36ffe000
[    0.009656] BRK [0x017b3000, 0x017b3fff] PGTABLE
[    0.009761] Zone ranges:
[    0.009762]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.009764]   Normal   [mem 0x0000000001000000-0x0000000036ffdfff]
[    0.009765]   HighMem  [mem 0x0000000036ffe000-0x00000000bfefffff]
[    0.009767] Movable zone start for each node
[    0.009767] Early memory node ranges
[    0.009769]   node   0: [mem 0x0000000000001000-0x000000000009dfff]
[    0.009770]   node   0: [mem 0x0000000000100000-0x00000000bf468fff]
[    0.009771]   node   0: [mem 0x00000000bf4ba000-0x00000000bf4bafff]
[    0.009773]   node   0: [mem 0x00000000bf4bf000-0x00000000bf4c4fff]
[    0.009774]   node   0: [mem 0x00000000bf4f7000-0x00000000bf4f7fff]
[    0.009775]   node   0: [mem 0x00000000bf73f000-0x00000000bfb12fff]
[    0.009776]   node   0: [mem 0x00000000bfef4000-0x00000000bfefffff]
[    0.009797] Zeroed struct page in unavailable ranges: 1810 pages
[    0.009798] Initmem setup node 0 [mem 0x0000000000001000-0x00000000bfefffff]
[    0.009801] On node 0 totalpages: 784366
[    0.009802]   DMA zone: 32 pages used for memmap
[    0.009803]   DMA zone: 0 pages reserved
[    0.009804]   DMA zone: 3997 pages, LIFO batch:0
[    0.009872]   Normal zone: 1728 pages used for memmap
[    0.009873]   Normal zone: 221182 pages, LIFO batch:63
[    0.013703]   HighMem zone: 559187 pages, LIFO batch:63
[    0.023434] Using APIC driver default
[    0.023717] ACPI: PM-Timer IO Port: 0x808
[    0.023719] ACPI: Local APIC address 0xfee00000
[    0.023724] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[    0.023733] IOAPIC[0]: apic_id 3, version 33, address 0xfec00000, GSI 0-23
[    0.023735] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.023737] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[    0.023739] ACPI: IRQ0 used by override.
[    0.023740] ACPI: IRQ9 used by override.
[    0.023742] Using ACPI (MADT) for SMP configuration information
[    0.023744] ACPI: HPET id: 0xffffffff base: 0xfed00000
[    0.023748] smpboot: Allowing 4 CPUs, 2 hotplug CPUs
[    0.023774] [mem 0xbff00000-0xdfffffff] available for PCI devices
[    0.023777] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.023788] random: get_random_bytes called from start_kernel+0x78/0x3d2 with crng_init=0
[    0.023794] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:4 nr_node_ids:1
[    0.023966] percpu: Embedded 28 pages/cpu @(ptrval) s83084 r0 d31604 u114688
[    0.023972] pcpu-alloc: s83084 r0 d31604 u114688 alloc=28*4096
[    0.023973] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.023988] Built 1 zonelists, mobility grouping on.  Total pages: 782606
[    0.023990] Kernel command line: BOOT_IMAGE=/vmlinuz-5.0.0-amd root=UUID=5ecbae99-ac82-4d0b-be32-a70c1d07d9e3 ro
[    0.024146] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.024211] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.024213] BRK [0x017b4000, 0x017b4fff] PGTABLE
[    0.024220] Initializing CPU#0
[    0.024225] Initializing HighMem for node 0 (00036ffe:000bff00)
[    0.124766] Initializing Movable for node 0 (00000000:00000000)
[    0.127491] Memory: 3098864K/3137464K available (5402K kernel code, 251K rwdata, 1332K rodata, 484K init, 288K bss, 38600K reserved, 0K cma-reserved, 2236748K highmem)
[    0.127497] virtual kernel memory layout:
                   fixmap  : 0xfff16000 - 0xfffff000   ( 932 kB)
                 cpu_entry : 0xff800000 - 0xff931000   (1220 kB)
                   pkmap   : 0xff000000 - 0xff400000   (4096 kB)
                   vmalloc : 0xf77fe000 - 0xfeffe000   ( 120 MB)
                   lowmem  : 0xc0000000 - 0xf6ffe000   ( 879 MB)
                     .init : 0xc16e4000 - 0xc175d000   ( 484 kB)
                     .data : 0xc1546898 - 0xc16d6d40   (1601 kB)
                     .text : 0xc1000000 - 0xc1546898   (5402 kB)
[    0.127499] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[    0.127638] rcu: Preemptible hierarchical RCU implementation.
[    0.127640] rcu: 	RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
[    0.127642] 	Tasks RCU enabled.
[    0.127643] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[    0.127644] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.127656] NR_IRQS: 2304, nr_irqs: 456, preallocated irqs: 16
[    0.127933] CPU 0 irqstacks, hard=(ptrval) soft=(ptrval)
[    0.127971] spurious 8259A interrupt: IRQ7.
[    0.129581] Console: colour VGA+ 80x25
[    0.132630] printk: console [tty0] enabled
[    0.132662] ACPI: Core revision 20181213
[    0.132902] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[    0.132946] hpet clockevent registered
[    0.132951] APIC: Switch to symmetric I/O mode setup
[    0.132979] Enabling APIC mode:  Flat.  Using 1 I/O APICs
[    0.133417] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.137951] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2409a385f90, max_idle_ns: 440795263298 ns
[    0.138000] Calibrating delay loop (skipped), value calculated using timer frequency.. 5000.22 BogoMIPS (lpj=2500112)
[    0.138041] pid_max: default: 32768 minimum: 301
[    0.138111] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.138148] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.138350] Last level iTLB entries: 4KB 512, 2MB 16, 4MB 8
[    0.138383] Last level dTLB entries: 4KB 1024, 2MB 128, 4MB 64, 1GB 0
[    0.138418] Spectre V2 : Mitigation: Full AMD retpoline
[    0.138450] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.138608] Freeing SMP alternatives memory: 20K
[    0.139991] smpboot: CPU0: AMD A4-3300 APU with Radeon(tm) HD Graphics (family: 0x12, model: 0x1, stepping: 0x0)
[    0.145018] Performance Events: AMD PMU driver.
[    0.145079] ... version:                0
[    0.145121] ... bit width:              48
[    0.145162] ... generic registers:      4
[    0.145205] ... value mask:             0000ffffffffffff
[    0.145249] ... max period:             00007fffffffffff
[    0.145285] ... fixed-purpose events:   0
[    0.145316] ... event mask:             000000000000000f
[    0.147023] rcu: Hierarchical SRCU implementation.
[    0.151002] smp: Bringing up secondary CPUs ...
[    0.159102] CPU 1 irqstacks, hard=(ptrval) soft=(ptrval)
[    0.159107] x86: Booting SMP configuration:
[    0.159158] .... node  #0, CPUs:      #1
[    0.006040] Initializing CPU#1
[    0.160034] smp: Brought up 1 node, 2 CPUs
[    0.160105] smpboot: Max logical packages: 2
[    0.160138] smpboot: Total of 2 processors activated (10000.44 BogoMIPS)
[    0.161139] devtmpfs: initialized
[    0.161139] PM: Registering ACPI NVS region [mem 0xbf469000-0xbf4abfff] (274432 bytes)
[    0.161139] PM: Registering ACPI NVS region [mem 0xbf4b7000-0xbf4b9fff] (12288 bytes)
[    0.161139] PM: Registering ACPI NVS region [mem 0xbf4c5000-0xbf4c8fff] (16384 bytes)
[    0.161139] PM: Registering ACPI NVS region [mem 0xbf509000-0xbf510fff] (32768 bytes)
[    0.161174] PM: Registering ACPI NVS region [mem 0xbf53c000-0xbf73efff] (2109440 bytes)
[    0.162018] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.162052] futex hash table entries: 1024 (order: 4, 65536 bytes)
[    0.162205] NET: Registered protocol family 16
[    0.163015] cpuidle: using governor ladder
[    0.164008] cpuidle: using governor menu
[    0.164094] ACPI: bus type PCI registered
[    0.164185] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
[    0.164997] PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved in E820
[    0.165031] PCI: Using MMCONFIG for extended config space
[    0.165064] PCI: Using configuration type 1 for base access
[    0.165165] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.165995] mtrr: probably your BIOS does not setup all CPUs.
[    0.166028] mtrr: corrected configuration.
[    0.168093] ACPI: Added _OSI(Module Device)
[    0.168093] ACPI: Added _OSI(Processor Device)
[    0.168102] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.168134] ACPI: Added _OSI(Processor Aggregator Device)
[    0.168167] ACPI: Added _OSI(Linux-Dell-Video)
[    0.168199] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    0.168233] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[    0.173166] ACPI: 3 ACPI AML tables successfully acquired and loaded
[    0.174967] ACPI: Interpreter enabled
[    0.175016] ACPI: (supports S0 S3 S5)
[    0.175048] ACPI: Using IOAPIC for interrupt routing
[    0.175226] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.175573] ACPI: Enabled 9 GPEs in block 00 to 1F
[    0.183345] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.183390] acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.183597] acpi PNP0A03:00: _OSC: platform does not support [PCIeCapability LTR]
[    0.183715] acpi PNP0A03:00: _OSC: not requesting control; platform does not support [PCIeCapability]
[    0.183755] acpi PNP0A03:00: _OSC: OS requested [PME AER PCIeCapability LTR]
[    0.183789] acpi PNP0A03:00: _OSC: platform willing to grant [PME AER]
[    0.183825] acpi PNP0A03:00: _OSC failed (AE_SUPPORT); disabling ASPM
[    0.184002] PCI host bridge to bus 0000:00
[    0.184036] pci_bus 0000:00: root bus resource [io  0x0000-0x03af window]
[    0.184070] pci_bus 0000:00: root bus resource [io  0x03e0-0x0cf7 window]
[    0.184106] pci_bus 0000:00: root bus resource [io  0x03b0-0x03df window]
[    0.184140] pci_bus 0000:00: root bus resource [io  0x1778-0xf19f window]
[    0.184175] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.184212] pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000dffff window]
[    0.184250] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfeb51fff window]
[    0.184288] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.184330] pci 0000:00:00.0: [1022:1705] type 00 class 0x060000
[    0.184447] pci 0000:00:01.0: [1002:9645] type 00 class 0x030000
[    0.184462] pci 0000:00:01.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
[    0.184468] pci 0000:00:01.0: reg 0x14: [io  0xf000-0xf0ff]
[    0.184475] pci 0000:00:01.0: reg 0x18: [mem 0xfeb00000-0xfeb3ffff]
[    0.184497] pci 0000:00:01.0: enabling Extended Tags
[    0.184556] pci 0000:00:01.0: supports D1 D2
[    0.184636] pci 0000:00:01.1: [1002:1714] type 00 class 0x040300
[    0.184649] pci 0000:00:01.1: reg 0x10: [mem 0xfeb44000-0xfeb47fff]
[    0.184677] pci 0000:00:01.1: enabling Extended Tags
[    0.184730] pci 0000:00:01.1: supports D1 D2
[    0.184813] pci 0000:00:04.0: [1022:1709] type 01 class 0x060400
[    0.184839] pci 0000:00:04.0: enabling Extended Tags
[    0.184895] pci 0000:00:04.0: PME# supported from D0 D3hot D3cold
[    0.185013] pci 0000:00:10.0: [1022:7812] type 00 class 0x0c0330
[    0.185041] pci 0000:00:10.0: reg 0x10: [mem 0xfeb4a000-0xfeb4bfff 64bit]
[    0.185140] pci 0000:00:10.0: PME# supported from D0 D3hot D3cold
[    0.185251] pci 0000:00:10.1: [1022:7812] type 00 class 0x0c0330
[    0.185279] pci 0000:00:10.1: reg 0x10: [mem 0xfeb48000-0xfeb49fff 64bit]
[    0.185378] pci 0000:00:10.1: PME# supported from D0 D3hot D3cold
[    0.185487] pci 0000:00:11.0: [1022:7800] type 00 class 0x01018f
[    0.185505] pci 0000:00:11.0: reg 0x10: [io  0xf190-0xf197]
[    0.185513] pci 0000:00:11.0: reg 0x14: [io  0xf180-0xf183]
[    0.185522] pci 0000:00:11.0: reg 0x18: [io  0xf170-0xf177]
[    0.185530] pci 0000:00:11.0: reg 0x1c: [io  0xf160-0xf163]
[    0.185539] pci 0000:00:11.0: reg 0x20: [io  0xf150-0xf15f]
[    0.185547] pci 0000:00:11.0: reg 0x24: [mem 0xfeb51000-0xfeb517ff]
[    0.185567] pci 0000:00:11.0: set SATA to AHCI mode
[    0.185694] pci 0000:00:12.0: [1022:7807] type 00 class 0x0c0310
[    0.185710] pci 0000:00:12.0: reg 0x10: [mem 0xfeb50000-0xfeb50fff]
[    0.185845] pci 0000:00:12.2: [1022:7808] type 00 class 0x0c0320
[    0.185865] pci 0000:00:12.2: reg 0x10: [mem 0xfeb4f000-0xfeb4f0ff]
[    0.185937] pci 0000:00:12.2: supports D1 D2
[    0.185938] pci 0000:00:12.2: PME# supported from D0 D1 D2 D3hot
[    0.186036] pci 0000:00:13.0: [1022:7807] type 00 class 0x0c0310
[    0.186052] pci 0000:00:13.0: reg 0x10: [mem 0xfeb4e000-0xfeb4efff]
[    0.186188] pci 0000:00:13.2: [1022:7808] type 00 class 0x0c0320
[    0.186207] pci 0000:00:13.2: reg 0x10: [mem 0xfeb4d000-0xfeb4d0ff]
[    0.186279] pci 0000:00:13.2: supports D1 D2
[    0.186281] pci 0000:00:13.2: PME# supported from D0 D1 D2 D3hot
[    0.186373] pci 0000:00:14.0: [1022:780b] type 00 class 0x0c0500
[    0.186505] pci 0000:00:14.1: [1022:780c] type 00 class 0x01018a
[    0.186522] pci 0000:00:14.1: reg 0x10: [io  0xf140-0xf147]
[    0.186531] pci 0000:00:14.1: reg 0x14: [io  0xf130-0xf133]
[    0.186539] pci 0000:00:14.1: reg 0x18: [io  0xf120-0xf127]
[    0.186548] pci 0000:00:14.1: reg 0x1c: [io  0xf110-0xf113]
[    0.186556] pci 0000:00:14.1: reg 0x20: [io  0xf100-0xf10f]
[    0.186575] pci 0000:00:14.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.186605] pci 0000:00:14.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.186636] pci 0000:00:14.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.186666] pci 0000:00:14.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.186779] pci 0000:00:14.2: [1022:780d] type 00 class 0x040300
[    0.186802] pci 0000:00:14.2: reg 0x10: [mem 0xfeb40000-0xfeb43fff 64bit]
[    0.186862] pci 0000:00:14.2: PME# supported from D0 D3hot D3cold
[    0.186946] pci 0000:00:14.3: [1022:780e] type 00 class 0x060100
[    0.187091] pci 0000:00:14.4: [1022:780f] type 01 class 0x060401
[    0.187205] pci 0000:00:14.5: [1022:7809] type 00 class 0x0c0310
[    0.187221] pci 0000:00:14.5: reg 0x10: [mem 0xfeb4c000-0xfeb4cfff]
[    0.187357] pci 0000:00:15.0: [1022:43a0] type 01 class 0x060400
[    0.187393] pci 0000:00:15.0: enabling Extended Tags
[    0.187460] pci 0000:00:15.0: supports D1 D2
[    0.187556] pci 0000:00:15.1: [1022:43a1] type 01 class 0x060400
[    0.187593] pci 0000:00:15.1: enabling Extended Tags
[    0.187659] pci 0000:00:15.1: supports D1 D2
[    0.187755] pci 0000:00:15.2: [1022:43a2] type 01 class 0x060400
[    0.187791] pci 0000:00:15.2: enabling Extended Tags
[    0.187857] pci 0000:00:15.2: supports D1 D2
[    0.187954] pci 0000:00:15.3: [1022:43a3] type 01 class 0x060400
[    0.187994] pci 0000:00:15.3: enabling Extended Tags
[    0.188060] pci 0000:00:15.3: supports D1 D2
[    0.188161] pci 0000:00:18.0: [1022:1700] type 00 class 0x060000
[    0.188264] pci 0000:00:18.1: [1022:1701] type 00 class 0x060000
[    0.188361] pci 0000:00:18.2: [1022:1702] type 00 class 0x060000
[    0.188458] pci 0000:00:18.3: [1022:1703] type 00 class 0x060000
[    0.188564] pci 0000:00:18.4: [1022:1704] type 00 class 0x060000
[    0.188661] pci 0000:00:18.5: [1022:1718] type 00 class 0x060000
[    0.188757] pci 0000:00:18.6: [1022:1716] type 00 class 0x060000
[    0.188854] pci 0000:00:18.7: [1022:1719] type 00 class 0x060000
[    0.188999] pci 0000:01:00.0: [1000:0058] type 00 class 0x010000
[    0.189017] pci 0000:01:00.0: reg 0x10: [io  0xe000-0xe0ff]
[    0.189029] pci 0000:01:00.0: reg 0x14: [mem 0xfe710000-0xfe713fff 64bit]
[    0.189041] pci 0000:01:00.0: reg 0x1c: [mem 0xfe700000-0xfe70ffff 64bit]
[    0.189056] pci 0000:01:00.0: reg 0x30: [mem 0xfe600000-0xfe6fffff pref]
[    0.189062] pci 0000:01:00.0: enabling Extended Tags
[    0.189133] pci 0000:01:00.0: supports D1 D2
[    0.189151] pci 0000:01:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:00:04.0 (capable of 16.000 Gb/s with 2.5 GT/s x8 link)
[    0.189220] pci 0000:01:00.0: disabling ASPM on pre-1.1 PCIe device.  You can enable it with 'pcie_aspm=force'
[    0.189261] pci 0000:00:04.0: PCI bridge to [bus 01]
[    0.189293] pci 0000:00:04.0:   bridge window [io  0xe000-0xefff]
[    0.189296] pci 0000:00:04.0:   bridge window [mem 0xfe600000-0xfe7fffff]
[    0.189309] pci_bus 0000:02: extended config space not accessible
[    0.189360] pci 0000:02:00.0: [105a:3d17] type 00 class 0x018000
[    0.189380] pci 0000:02:00.0: reg 0x10: [io  0xd400-0xd47f]
[    0.189401] pci 0000:02:00.0: reg 0x18: [io  0xd200-0xd2ff]
[    0.189412] pci 0000:02:00.0: reg 0x1c: [mem 0xfea7a000-0xfea7afff]
[    0.189423] pci 0000:02:00.0: reg 0x20: [mem 0xfea40000-0xfea5ffff]
[    0.189444] pci 0000:02:00.0: reg 0x30: [mem 0xfea70000-0xfea77fff pref]
[    0.189480] pci 0000:02:00.0: supports D1
[    0.189522] pci 0000:02:01.0: [105a:3d17] type 00 class 0x018000
[    0.189542] pci 0000:02:01.0: reg 0x10: [io  0xd380-0xd3ff]
[    0.189563] pci 0000:02:01.0: reg 0x18: [io  0xd100-0xd1ff]
[    0.189574] pci 0000:02:01.0: reg 0x1c: [mem 0xfea79000-0xfea79fff]
[    0.189586] pci 0000:02:01.0: reg 0x20: [mem 0xfea20000-0xfea3ffff]
[    0.189607] pci 0000:02:01.0: reg 0x30: [mem 0xfea68000-0xfea6ffff pref]
[    0.189642] pci 0000:02:01.0: supports D1
[    0.189686] pci 0000:02:02.0: [105a:3d17] type 00 class 0x018000
[    0.189706] pci 0000:02:02.0: reg 0x10: [io  0xd300-0xd37f]
[    0.189727] pci 0000:02:02.0: reg 0x18: [io  0xd000-0xd0ff]
[    0.189738] pci 0000:02:02.0: reg 0x1c: [mem 0xfea78000-0xfea78fff]
[    0.189750] pci 0000:02:02.0: reg 0x20: [mem 0xfea00000-0xfea1ffff]
[    0.189771] pci 0000:02:02.0: reg 0x30: [mem 0xfea60000-0xfea67fff pref]
[    0.189806] pci 0000:02:02.0: supports D1
[    0.189877] pci 0000:00:14.4: PCI bridge to [bus 02] (subtractive decode)
[    0.189909] pci 0000:00:14.4:   bridge window [io  0xd000-0xdfff]
[    0.189913] pci 0000:00:14.4:   bridge window [mem 0xfea00000-0xfeafffff]
[    0.189916] pci 0000:00:14.4:   bridge window [io  0x0000-0x03af window] (subtractive decode)
[    0.189918] pci 0000:00:14.4:   bridge window [io  0x03e0-0x0cf7 window] (subtractive decode)
[    0.189919] pci 0000:00:14.4:   bridge window [io  0x03b0-0x03df window] (subtractive decode)
[    0.189921] pci 0000:00:14.4:   bridge window [io  0x1778-0xf19f window] (subtractive decode)
[    0.189923] pci 0000:00:14.4:   bridge window [mem 0x000a0000-0x000bffff window] (subtractive decode)
[    0.189924] pci 0000:00:14.4:   bridge window [mem 0x000c0000-0x000dffff window] (subtractive decode)
[    0.189926] pci 0000:00:14.4:   bridge window [mem 0xc0000000-0xfeb51fff window] (subtractive decode)
[    0.189967] pci 0000:00:15.0: PCI bridge to [bus 03]
[    0.190055] pci 0000:04:00.0: [10ec:8168] type 00 class 0x020000
[    0.190084] pci 0000:04:00.0: reg 0x10: [io  0xc000-0xc0ff]
[    0.190113] pci 0000:04:00.0: reg 0x18: [mem 0xd0004000-0xd0004fff 64bit pref]
[    0.190130] pci 0000:04:00.0: reg 0x20: [mem 0xd0000000-0xd0003fff 64bit pref]
[    0.190223] pci 0000:04:00.0: supports D1 D2
[    0.190225] pci 0000:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.190329] pci 0000:00:15.1: PCI bridge to [bus 04]
[    0.190361] pci 0000:00:15.1:   bridge window [io  0xc000-0xcfff]
[    0.190368] pci 0000:00:15.1:   bridge window [mem 0xd0000000-0xd00fffff 64bit pref]
[    0.190420] pci 0000:05:00.0: [1106:3483] type 00 class 0x0c0330
[    0.190447] pci 0000:05:00.0: reg 0x10: [mem 0xfe900000-0xfe900fff 64bit]
[    0.190541] pci 0000:05:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.190619] pci 0000:00:15.2: PCI bridge to [bus 05]
[    0.190654] pci 0000:00:15.2:   bridge window [mem 0xfe900000-0xfe9fffff]
[    0.190716] pci 0000:06:00.0: [1095:3132] type 00 class 0x010400
[    0.190755] pci 0000:06:00.0: reg 0x10: [mem 0xfe884000-0xfe88407f 64bit]
[    0.190776] pci 0000:06:00.0: reg 0x18: [mem 0xfe880000-0xfe883fff 64bit]
[    0.190789] pci 0000:06:00.0: reg 0x20: [io  0xb000-0xb07f]
[    0.190815] pci 0000:06:00.0: reg 0x30: [mem 0xfe800000-0xfe87ffff pref]
[    0.190889] pci 0000:06:00.0: supports D1 D2
[    0.190953] pci 0000:06:00.0: disabling ASPM on pre-1.1 PCIe device.  You can enable it with 'pcie_aspm=force'
[    0.191002] pci 0000:00:15.3: PCI bridge to [bus 06]
[    0.191034] pci 0000:00:15.3:   bridge window [io  0xb000-0xbfff]
[    0.191037] pci 0000:00:15.3:   bridge window [mem 0xfe800000-0xfe8fffff]
[    0.191075] pci_bus 0000:00: on NUMA node 0
[    0.191508] ACPI: PCI Interrupt Link [LN24] (IRQs *24)
[    0.191576] ACPI: PCI Interrupt Link [LN25] (IRQs *25)
[    0.191642] ACPI: PCI Interrupt Link [LN26] (IRQs *26)
[    0.191706] ACPI: PCI Interrupt Link [LN27] (IRQs *27)
[    0.191771] ACPI: PCI Interrupt Link [LN28] (IRQs *28)
[    0.191836] ACPI: PCI Interrupt Link [LN29] (IRQs *29)
[    0.191900] ACPI: PCI Interrupt Link [LN30] (IRQs *30)
[    0.191965] ACPI: PCI Interrupt Link [LN31] (IRQs *31)
[    0.192031] ACPI: PCI Interrupt Link [LN32] (IRQs *32)
[    0.192096] ACPI: PCI Interrupt Link [LN33] (IRQs *33)
[    0.192161] ACPI: PCI Interrupt Link [LN34] (IRQs *34)
[    0.192227] ACPI: PCI Interrupt Link [LN35] (IRQs *35)
[    0.192292] ACPI: PCI Interrupt Link [LN36] (IRQs *36)
[    0.192356] ACPI: PCI Interrupt Link [LN37] (IRQs *37)
[    0.192421] ACPI: PCI Interrupt Link [LN38] (IRQs *38)
[    0.192486] ACPI: PCI Interrupt Link [LN39] (IRQs *39)
[    0.192551] ACPI: PCI Interrupt Link [LN40] (IRQs *40)
[    0.192616] ACPI: PCI Interrupt Link [LN41] (IRQs *41)
[    0.192681] ACPI: PCI Interrupt Link [LN42] (IRQs *42)
[    0.192746] ACPI: PCI Interrupt Link [LN43] (IRQs *43)
[    0.192811] ACPI: PCI Interrupt Link [LN44] (IRQs *44)
[    0.192877] ACPI: PCI Interrupt Link [LN45] (IRQs *45)
[    0.192942] ACPI: PCI Interrupt Link [LN46] (IRQs *46)
[    0.193021] ACPI: PCI Interrupt Link [LN47] (IRQs *47)
[    0.193086] ACPI: PCI Interrupt Link [LN48] (IRQs *48)
[    0.193151] ACPI: PCI Interrupt Link [LN49] (IRQs *49)
[    0.193216] ACPI: PCI Interrupt Link [LN50] (IRQs *50)
[    0.193281] ACPI: PCI Interrupt Link [LN51] (IRQs *51)
[    0.193346] ACPI: PCI Interrupt Link [LN52] (IRQs *52)
[    0.193411] ACPI: PCI Interrupt Link [LN53] (IRQs *53)
[    0.193475] ACPI: PCI Interrupt Link [LN54] (IRQs *54)
[    0.193540] ACPI: PCI Interrupt Link [LN55] (IRQs *55)
[    0.193634] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 11 14 15) *0
[    0.193731] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11 14 15) *0
[    0.193830] ACPI: PCI Interrupt Link [LNKC] (IRQs 4 5 7 10 11 14 15) *0
[    0.193927] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 6 7 10 11 14 15) *0
[    0.194018] ACPI: PCI Interrupt Link [LNKE] (IRQs 4 5 7 10 11 14 15) *0
[    0.194093] ACPI: PCI Interrupt Link [LNKF] (IRQs 4 5 7 10 11 14 15) *0
[    0.194167] ACPI: PCI Interrupt Link [LNKG] (IRQs 4 5 7 10 11 14 15) *0
[    0.194241] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 5 7 10 11 14 15) *0
[    0.194458] pci 0000:00:01.0: vgaarb: setting as boot VGA device
[    0.194458] pci 0000:00:01.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.194458] pci 0000:00:01.0: vgaarb: bridge control possible
[    0.194458] vgaarb: loaded
[    0.194458] SCSI subsystem initialized
[    0.194458] libata version 3.00 loaded.
[    0.194458] PCI: Using ACPI for IRQ routing
[    0.202598] PCI: pci_cache_line_size set to 64 bytes
[    0.202686] e820: reserve RAM buffer [mem 0x0009ec00-0x0009ffff]
[    0.202688] e820: reserve RAM buffer [mem 0xbf469000-0xbfffffff]
[    0.202689] e820: reserve RAM buffer [mem 0xbf4bb000-0xbfffffff]
[    0.202691] e820: reserve RAM buffer [mem 0xbf4c5000-0xbfffffff]
[    0.202693] e820: reserve RAM buffer [mem 0xbf4f8000-0xbfffffff]
[    0.202694] e820: reserve RAM buffer [mem 0xbfb13000-0xbfffffff]
[    0.202695] e820: reserve RAM buffer [mem 0xbff00000-0xbfffffff]
[    0.202816] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.202847] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
[    0.204040] clocksource: Switched to clocksource tsc-early
[    0.204075] pnp: PnP ACPI init
[    0.204220] system 00:00: [mem 0xe0000000-0xefffffff] has been reserved
[    0.204256] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
[    0.204549] system 00:01: [io  0x04d0-0x04d1] has been reserved
[    0.204580] system 00:01: [io  0x040b] has been reserved
[    0.204609] system 00:01: [io  0x04d6] has been reserved
[    0.204638] system 00:01: [io  0x0c00-0x0c01] has been reserved
[    0.204668] system 00:01: [io  0x0c14] has been reserved
[    0.204697] system 00:01: [io  0x0c50-0x0c51] has been reserved
[    0.204727] system 00:01: [io  0x0c52] has been reserved
[    0.204756] system 00:01: [io  0x0c6c] has been reserved
[    0.204786] system 00:01: [io  0x0c6f] has been reserved
[    0.204815] system 00:01: [io  0x0cd0-0x0cd1] has been reserved
[    0.204844] system 00:01: [io  0x0cd2-0x0cd3] has been reserved
[    0.204874] system 00:01: [io  0x0cd4-0x0cd5] has been reserved
[    0.204904] system 00:01: [io  0x0cd6-0x0cd7] has been reserved
[    0.204934] system 00:01: [io  0x0cd8-0x0cdf] has been reserved
[    0.204963] system 00:01: [io  0x0800-0x089f] has been reserved
[    0.204993] system 00:01: [io  0x0b20-0x0b3f] has been reserved
[    0.205023] system 00:01: [io  0x0900-0x090f] has been reserved
[    0.205058] system 00:01: [io  0x0910-0x091f] has been reserved
[    0.205088] system 00:01: [io  0xfe00-0xfefe] has been reserved
[    0.205119] system 00:01: [mem 0xfec00000-0xfec00fff] could not be reserved
[    0.205150] system 00:01: [mem 0xfee00000-0xfee00fff] has been reserved
[    0.205180] system 00:01: [mem 0xfed80000-0xfed8ffff] has been reserved
[    0.205211] system 00:01: [mem 0xfed61000-0xfed70fff] has been reserved
[    0.205241] system 00:01: [mem 0xfec10000-0xfec10fff] has been reserved
[    0.205272] system 00:01: [mem 0xfed00000-0xfed00fff] could not be reserved
[    0.205303] system 00:01: [mem 0xff000000-0xffffffff] has been reserved
[    0.205335] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.205450] system 00:02: [io  0x0a00-0x0a1f] has been reserved
[    0.205481] system 00:02: [io  0x0e80-0x0e8f] has been reserved
[    0.205512] system 00:02: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.205556] pnp 00:03: Plug and Play ACPI device, IDs PNP0303 PNP030b (active)
[    0.205775] pnp 00:04: [dma 0 disabled]
[    0.205819] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (active)
[    0.206080] pnp 00:05: [dma 3]
[    0.206175] pnp 00:05: Plug and Play ACPI device, IDs PNP0400 (active)
[    0.206199] pnp 00:06: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.206265] system 00:07: [io  0x04d0-0x04d1] has been reserved
[    0.206298] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.206369] system 00:08: [io  0x1770-0x1777] has been reserved
[    0.206402] system 00:08: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.206753] pnp: PnP ACPI: found 9 devices
[    0.243125] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.243221] pci 0000:00:04.0: PCI bridge to [bus 01]
[    0.243256] pci 0000:00:04.0:   bridge window [io  0xe000-0xefff]
[    0.243293] pci 0000:00:04.0:   bridge window [mem 0xfe600000-0xfe7fffff]
[    0.243333] pci 0000:00:14.4: PCI bridge to [bus 02]
[    0.243367] pci 0000:00:14.4:   bridge window [io  0xd000-0xdfff]
[    0.243404] pci 0000:00:14.4:   bridge window [mem 0xfea00000-0xfeafffff]
[    0.243446] pci 0000:00:15.0: PCI bridge to [bus 03]
[    0.243488] pci 0000:00:15.1: PCI bridge to [bus 04]
[    0.243523] pci 0000:00:15.1:   bridge window [io  0xc000-0xcfff]
[    0.243562] pci 0000:00:15.1:   bridge window [mem 0xd0000000-0xd00fffff 64bit pref]
[    0.243603] pci 0000:00:15.2: PCI bridge to [bus 05]
[    0.243640] pci 0000:00:15.2:   bridge window [mem 0xfe900000-0xfe9fffff]
[    0.243680] pci 0000:00:15.3: PCI bridge to [bus 06]
[    0.243714] pci 0000:00:15.3:   bridge window [io  0xb000-0xbfff]
[    0.243747] pci 0000:00:15.3:   bridge window [mem 0xfe800000-0xfe8fffff]
[    0.243785] pci_bus 0000:00: resource 4 [io  0x0000-0x03af window]
[    0.243786] pci_bus 0000:00: resource 5 [io  0x03e0-0x0cf7 window]
[    0.243788] pci_bus 0000:00: resource 6 [io  0x03b0-0x03df window]
[    0.243789] pci_bus 0000:00: resource 7 [io  0x1778-0xf19f window]
[    0.243791] pci_bus 0000:00: resource 8 [mem 0x000a0000-0x000bffff window]
[    0.243792] pci_bus 0000:00: resource 9 [mem 0x000c0000-0x000dffff window]
[    0.243794] pci_bus 0000:00: resource 10 [mem 0xc0000000-0xfeb51fff window]
[    0.243795] pci_bus 0000:01: resource 0 [io  0xe000-0xefff]
[    0.243797] pci_bus 0000:01: resource 1 [mem 0xfe600000-0xfe7fffff]
[    0.243798] pci_bus 0000:02: resource 0 [io  0xd000-0xdfff]
[    0.243800] pci_bus 0000:02: resource 1 [mem 0xfea00000-0xfeafffff]
[    0.243801] pci_bus 0000:02: resource 4 [io  0x0000-0x03af window]
[    0.243803] pci_bus 0000:02: resource 5 [io  0x03e0-0x0cf7 window]
[    0.243804] pci_bus 0000:02: resource 6 [io  0x03b0-0x03df window]
[    0.243805] pci_bus 0000:02: resource 7 [io  0x1778-0xf19f window]
[    0.243807] pci_bus 0000:02: resource 8 [mem 0x000a0000-0x000bffff window]
[    0.243808] pci_bus 0000:02: resource 9 [mem 0x000c0000-0x000dffff window]
[    0.243810] pci_bus 0000:02: resource 10 [mem 0xc0000000-0xfeb51fff window]
[    0.243811] pci_bus 0000:04: resource 0 [io  0xc000-0xcfff]
[    0.243813] pci_bus 0000:04: resource 2 [mem 0xd0000000-0xd00fffff 64bit pref]
[    0.243814] pci_bus 0000:05: resource 1 [mem 0xfe900000-0xfe9fffff]
[    0.243816] pci_bus 0000:06: resource 0 [io  0xb000-0xbfff]
[    0.243817] pci_bus 0000:06: resource 1 [mem 0xfe800000-0xfe8fffff]
[    0.243902] NET: Registered protocol family 2
[    0.244087] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes)
[    0.244125] TCP established hash table entries: 8192 (order: 3, 32768 bytes)
[    0.244171] TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
[    0.244228] TCP: Hash tables configured (established 8192 bind 8192)
[    0.244283] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    0.244319] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[    0.244390] NET: Registered protocol family 1
[    0.244433] pci 0000:00:01.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    0.244471] pci 0000:00:01.1: Linked as a consumer to 0000:00:01.0
[    0.917248] pci 0000:00:12.0: quirk_usb_early_handoff+0x0/0x685 took 656483 usecs
[    0.917479] pci 0000:00:12.2: PME# does not work under D3, disabling it
[    0.943241] pci 0000:00:13.0: quirk_usb_early_handoff+0x0/0x685 took 23466 usecs
[    0.943469] pci 0000:00:13.2: PME# does not work under D3, disabling it
[    0.968238] pci 0000:00:14.5: quirk_usb_early_handoff+0x0/0x685 took 24138 usecs
[    0.968477] PCI: CLS 64 bytes, default 64
[    0.968527] Unpacking initramfs...
[    1.522635] Freeing initrd memory: 4452K
[    1.522702] LVT offset 0 assigned for vector 0x400
[    1.522783] perf: AMD IBS detected (0x000000ff)
[    1.523477] workingset: timestamp_bits=30 max_order=20 bucket_order=0
[    1.523655] fuse init (API version 7.28)
[    1.523738] SGI XFS with security attributes, realtime, no debug enabled
[    1.542913] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    1.542958] io scheduler mq-deadline registered
[    1.542991] io scheduler kyber registered
[    1.543044] io scheduler bfq registered
[    1.543851] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[    1.543926] ACPI: Power Button [PWRB]
[    1.544023] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[    1.544080] ACPI: Power Button [PWRF]
[    1.544465] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    1.565032] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    1.565614] ahci 0000:00:11.0: version 3.0
[    1.565818] ahci 0000:00:11.0: AHCI 0001.0300 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[    1.565861] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part 
[    1.566383] scsi host0: ahci
[    1.566568] scsi host1: ahci
[    1.566756] scsi host2: ahci
[    1.566923] scsi host3: ahci
[    1.567033] ata1: SATA max UDMA/133 abar m2048@0xfeb51000 port 0xfeb51100 irq 19
[    1.567072] ata2: SATA max UDMA/133 abar m2048@0xfeb51000 port 0xfeb51180 irq 19
[    1.567111] ata3: SATA max UDMA/133 abar m2048@0xfeb51000 port 0xfeb51200 irq 19
[    1.567149] ata4: SATA max UDMA/133 abar m2048@0xfeb51000 port 0xfeb51280 irq 19
[    1.567293] sata_sil24 0000:06:00.0: version 1.1
[    1.567692] scsi host4: sata_sil24
[    1.567869] scsi host5: sata_sil24
[    1.568004] ata5: SATA max UDMA/100 host m128@0xfe884000 port 0xfe880000 irq 19
[    1.568042] ata6: SATA max UDMA/100 host m128@0xfe884000 port 0xfe882000 irq 19
[    1.568143] sata_promise 0000:02:00.0: version 2.12
[    1.568628] scsi host6: sata_promise
[    1.568753] scsi host7: sata_promise
[    1.568873] scsi host8: sata_promise
[    1.568998] scsi host9: sata_promise
[    1.569065] ata7: SATA max UDMA/133 mmio m4096@0xfea7a000 ata 0xfea7a380 irq 20
[    1.569105] ata8: SATA max UDMA/133 mmio m4096@0xfea7a000 ata 0xfea7a280 irq 20
[    1.569144] ata9: SATA max UDMA/133 mmio m4096@0xfea7a000 ata 0xfea7a200 irq 20
[    1.569184] ata10: SATA max UDMA/133 mmio m4096@0xfea7a000 ata 0xfea7a300 irq 20
[    1.570938] scsi host10: sata_promise
[    1.571214] scsi host11: sata_promise
[    1.571385] scsi host12: sata_promise
[    1.571497] scsi host13: sata_promise
[    1.571562] ata11: SATA max UDMA/133 mmio m4096@0xfea79000 ata 0xfea79380 irq 21
[    1.571601] ata12: SATA max UDMA/133 mmio m4096@0xfea79000 ata 0xfea79280 irq 21
[    1.571640] ata13: SATA max UDMA/133 mmio m4096@0xfea79000 ata 0xfea79200 irq 21
[    1.571678] ata14: SATA max UDMA/133 mmio m4096@0xfea79000 ata 0xfea79300 irq 21
[    1.572246] scsi host14: sata_promise
[    1.572395] scsi host15: sata_promise
[    1.572515] scsi host16: sata_promise
[    1.572880] scsi host17: sata_promise
[    1.572952] ata15: SATA max UDMA/133 mmio m4096@0xfea78000 ata 0xfea78380 irq 22
[    1.572992] ata16: SATA max UDMA/133 mmio m4096@0xfea78000 ata 0xfea78280 irq 22
[    1.573030] ata17: SATA max UDMA/133 mmio m4096@0xfea78000 ata 0xfea78200 irq 22
[    1.573069] ata18: SATA max UDMA/133 mmio m4096@0xfea78000 ata 0xfea78300 irq 22
[    1.578444] libphy: r8169: probed
[    1.578905] r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, 10:78:d2:2d:5e:52, XID 2c9, IRQ 24
[    1.578945] r8169 0000:04:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[    1.579003] Fusion MPT base driver 3.04.20
[    1.579035] Copyright (c) 1999-2008 LSI Corporation
[    1.579074] Fusion MPT SAS Host driver 3.04.20
[    1.579291] mptbase: ioc0: Initiating bringup
[    1.789538] ioc0: LSISAS1068E B3: Capabilities={Initiator}
[    1.846952] ata3: SATA link down (SStatus 0 SControl 300)
[    1.847661] ata4: SATA link down (SStatus 0 SControl 300)
[    1.848008] ata2: SATA link down (SStatus 0 SControl 300)
[    1.893109] tsc: Refined TSC clocksource calibration: 2500.127 MHz
[    1.893182] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2409b17973e, max_idle_ns: 440795210105 ns
[    1.893293] clocksource: Switched to clocksource tsc
[    2.005350] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.020095] ata1.00: ATA-9: INTEL SSDSC2BW120A4, DC32, max UDMA/133
[    2.020153] ata1.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    2.046409] ata1.00: configured for UDMA/133
[    2.046867] scsi 0:0:0:0: Direct-Access     ATA      INTEL SSDSC2BW12 DC32 PQ: 0 ANSI: 5
[    2.047309] sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[    2.047387] sd 0:0:0:0: [sda] Write Protect is off
[    2.047421] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    2.047540] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.048292]  sda: sda1 sda2
[    2.048931] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.038345] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    3.040248] ata5.00: ATA-8: HGST HMS5C4040ALE640, MPAOA580, max UDMA/133
[    3.040299] ata5.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.042182] ata5.00: configured for UDMA/100
[    3.042625] scsi 4:0:0:0: Direct-Access     ATA      HGST HMS5C4040AL A580 PQ: 0 ANSI: 5
[    3.042921] sd 4:0:0:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    3.042966] sd 4:0:0:0: [sdb] 4096-byte physical blocks
[    3.043031] sd 4:0:0:0: [sdb] Write Protect is off
[    3.043068] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    3.043107] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.086762]  sdb: sdb1
[    3.087495] sd 4:0:0:0: [sdb] Attached SCSI disk
[    5.239344] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    5.367334] ata6.00: ATA-8: TOSHIBA MD04ACA500, FP2A, max UDMA/100
[    5.367393] ata6.00: 9767541168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[    5.368759] ata6.00: configured for UDMA/100
[    5.369116] scsi 5:0:0:0: Direct-Access     ATA      TOSHIBA MD04ACA5 FP2A PQ: 0 ANSI: 5
[    5.369444] sd 5:0:0:0: [sdc] 9767541168 512-byte logical blocks: (5.00 TB/4.55 TiB)
[    5.369484] sd 5:0:0:0: [sdc] 4096-byte physical blocks
[    5.369577] sd 5:0:0:0: [sdc] Write Protect is off
[    5.369617] sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    5.369677] sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.631506] sd 5:0:0:0: [sdc] Attached SCSI disk
[    6.786442] ata12: SATA link down (SStatus 0 SControl 300)
[    7.090077] ata13: SATA link down (SStatus 0 SControl 300)
[    7.394346] ata14: SATA link down (SStatus 0 SControl 300)
[    7.706129] ata16: SATA link down (SStatus 0 SControl 300)
[    8.482064] ata18: SATA link down (SStatus 0 SControl 300)
[   14.529053] scsi host18: ioc0: LSISAS1068E B3, FwRev=01210000h, Ports=1, MaxQ=483, IRQ=16
[   14.564558] Fusion MPT misc device (ioctl) driver 3.04.20
[   14.564682] mptctl: Registered with Fusion MPT base driver
[   14.564728] mptctl: /dev/mptctl @ (major,minor=10,220)
[   14.564800] i8042: PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[   14.564834] i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
[   14.565251] serio: i8042 KBD port at 0x60,0x64 irq 1
[   14.565464] mousedev: PS/2 mouse device common for all mice
[   14.565538] rtc_cmos 00:06: RTC can wake from S4
[   14.565734] rtc_cmos 00:06: registered as rtc0
[   14.565768] rtc_cmos 00:06: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[   14.565816] i2c /dev entries driver
[   14.565881] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
[   14.565918] piix4_smbus 0000:00:14.0: Using register 0x2e for SMBus port selection
[   14.566355] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20
[   14.566500] f71882fg: Found f71869 chip at 0xe80, revision 17
[   14.566546] ACPI Warning: SystemIO range 0x0000000000000E80-0x0000000000000E87 conflicts with OpRegion 0x0000000000000E80-0x0000000000000F7F (\HMIO) (20181213/utaddress-213)
[   14.566598] ACPI Warning: SystemIO range 0x0000000000000E80-0x0000000000000E87 conflicts with OpRegion 0x0000000000000E80-0x0000000000000E8F (\_SB.PCI0.SBRG.SIO1.RNTR) (20181213/utaddress-213)
[   14.566649] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   14.566823] NET: Registered protocol family 17
[   14.566868] Key type dns_resolver registered
[   14.566920] Using IPI No-Shortcut mode
[   14.566965] sched_clock: Marking stable (14561388055, 5040746)->(14670327938, -103899137)
[   14.567243] registered taskstats version 1
[   14.568595] rtc_cmos 00:06: setting system clock to 2019-03-13T01:46:44 UTC (1552441604)
[   14.569037] Freeing unused kernel image memory: 484K
[   14.575209] Write protecting kernel text and read-only data: 6752k
[   14.575258] Run /init as init process
[   14.594292] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[   14.687135] random: fast init done
[   14.691171] r8169 0000:04:00.0 enp4s0: renamed from eth0
[   14.698027] random: crng init done
[   15.533966] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[   15.640122] NET: Registered protocol family 10
[   15.641605] Segment Routing with IPv6
[   15.654639] systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
[   15.666188] systemd[1]: Detected architecture x86.
[   15.668257] systemd[1]: Set hostname to <debian>.
[   15.830403] systemd[1]: /lib/systemd/system/rpc-statd.service:13: PIDFile= references path below legacy directory /var/run/, updating /var/run/rpc.statd.pid → /run/rpc.statd.pid; please update the unit file accordingly.
[   15.847142] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[   15.847495] systemd[1]: Reached target System Time Synchronized.
[   15.847863] systemd[1]: Listening on fsck to fsckd communication Socket.
[   15.848188] systemd[1]: Listening on udev Kernel Socket.
[   15.848472] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[   15.848781] systemd[1]: Listening on udev Control Socket.
[   15.888228] RPC: Registered named UNIX socket transport module.
[   15.888271] RPC: Registered udp transport module.
[   15.888304] RPC: Registered tcp transport module.
[   15.888337] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   15.893560] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   15.907167] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
[   16.110586] systemd-journald[229]: Received request to flush runtime journal from PID 1
[   16.132498] ACPI: bus type USB registered
[   16.132557] usbcore: registered new interface driver usbfs
[   16.132596] usbcore: registered new interface driver hub
[   16.133156] usbcore: registered new device driver usb
[   16.133912] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   16.134564] ehci-pci: EHCI PCI platform driver
[   16.134785] QUIRK: Enable AMD PLL fix
[   16.134820] ehci-pci 0000:00:12.2: EHCI Host Controller
[   16.134857] ehci-pci 0000:00:12.2: new USB bus registered, assigned bus number 1
[   16.134909] ehci-pci 0000:00:12.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[   16.134951] ehci-pci 0000:00:12.2: debug port 1
[   16.135046] ehci-pci 0000:00:12.2: irq 17, io mem 0xfeb4f000
[   16.135162] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   16.142032] ehci-pci 0000:00:12.2: USB 2.0 started, EHCI 1.00
[   16.142317] hub 1-0:1.0: USB hub found
[   16.142369] hub 1-0:1.0: 5 ports detected
[   16.142734] xhci_hcd 0000:00:10.0: xHCI Host Controller
[   16.142775] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 2
[   16.142975] xhci_hcd 0000:00:10.0: hcc params 0x014042c3 hci version 0x96 quirks 0x0000000000000608
[   16.143360] hub 2-0:1.0: USB hub found
[   16.143406] hub 2-0:1.0: 2 ports detected
[   16.143662] xhci_hcd 0000:00:10.0: xHCI Host Controller
[   16.143704] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 3
[   16.143745] xhci_hcd 0000:00:10.0: Host supports USB 3.0  SuperSpeed
[   16.147013] ehci-pci 0000:00:13.2: EHCI Host Controller
[   16.147056] ehci-pci 0000:00:13.2: new USB bus registered, assigned bus number 4
[   16.147094] ehci-pci 0000:00:13.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[   16.147137] ehci-pci 0000:00:13.2: debug port 1
[   16.147199] ehci-pci 0000:00:13.2: irq 17, io mem 0xfeb4d000
[   16.148130] usb usb3: We don't know the algorithms for LPM for this host, disabling LPM.
[   16.148321] hub 3-0:1.0: USB hub found
[   16.148361] hub 3-0:1.0: 2 ports detected
[   16.154021] ehci-pci 0000:00:13.2: USB 2.0 started, EHCI 1.00
[   16.154242] hub 4-0:1.0: USB hub found
[   16.154280] hub 4-0:1.0: 5 ports detected
[   16.154616] xhci_hcd 0000:00:10.1: xHCI Host Controller
[   16.154653] xhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 5
[   16.154852] xhci_hcd 0000:00:10.1: hcc params 0x014042c3 hci version 0x96 quirks 0x0000000000000608
[   16.155214] hub 5-0:1.0: USB hub found
[   16.155255] hub 5-0:1.0: 2 ports detected
[   16.155395] xhci_hcd 0000:00:10.1: xHCI Host Controller
[   16.155434] xhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 6
[   16.155474] xhci_hcd 0000:00:10.1: Host supports USB 3.0  SuperSpeed
[   16.156413] ohci-pci: OHCI PCI platform driver
[   16.156639] ohci-pci 0000:00:12.0: OHCI PCI host controller
[   16.156687] ohci-pci 0000:00:12.0: new USB bus registered, assigned bus number 7
[   16.156774] ohci-pci 0000:00:12.0: irq 18, io mem 0xfeb50000
[   16.161317] usb usb6: We don't know the algorithms for LPM for this host, disabling LPM.
[   16.161516] hub 6-0:1.0: USB hub found
[   16.164284] hub 6-0:1.0: 2 ports detected
[   16.165226] xhci_hcd 0000:05:00.0: xHCI Host Controller
[   16.165271] xhci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 8
[   16.165527] xhci_hcd 0000:05:00.0: hcc params 0x002841eb hci version 0x100 quirks 0x0000000000000090
[   16.182933] hub 8-0:1.0: USB hub found
[   16.182980] hub 8-0:1.0: 1 port detected
[   16.183138] xhci_hcd 0000:05:00.0: xHCI Host Controller
[   16.183171] xhci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 9
[   16.183214] xhci_hcd 0000:05:00.0: Host supports USB 3.0  SuperSpeed
[   16.183266] usb usb9: We don't know the algorithms for LPM for this host, disabling LPM.
[   16.183437] hub 9-0:1.0: USB hub found
[   16.183472] hub 9-0:1.0: 4 ports detected
[   16.184380] input: HD-Audio Generic HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/sound/card0/input3
[   16.189598] snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC892: line_outs=3 (0x14/0x15/0x16/0x0/0x0) type:line
[   16.189646] snd_hda_codec_realtek hdaudioC1D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[   16.189681] snd_hda_codec_realtek hdaudioC1D0:    hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[   16.189714] snd_hda_codec_realtek hdaudioC1D0:    mono: mono_out=0x0
[   16.189744] snd_hda_codec_realtek hdaudioC1D0:    dig-out=0x11/0x1e
[   16.189774] snd_hda_codec_realtek hdaudioC1D0:    inputs:
[   16.189804] snd_hda_codec_realtek hdaudioC1D0:      Front Mic=0x19
[   16.189835] snd_hda_codec_realtek hdaudioC1D0:      Rear Mic=0x18
[   16.189865] snd_hda_codec_realtek hdaudioC1D0:      Line=0x1a
[   16.212277] hub 7-0:1.0: USB hub found
[   16.212335] hub 7-0:1.0: 5 ports detected
[   16.212730] ohci-pci 0000:00:13.0: OHCI PCI host controller
[   16.212772] ohci-pci 0000:00:13.0: new USB bus registered, assigned bus number 10
[   16.212836] ohci-pci 0000:00:13.0: irq 18, io mem 0xfeb4e000
[   16.268336] hub 10-0:1.0: USB hub found
[   16.268382] hub 10-0:1.0: 5 ports detected
[   16.268719] ohci-pci 0000:00:14.5: OHCI PCI host controller
[   16.268757] ohci-pci 0000:00:14.5: new USB bus registered, assigned bus number 11
[   16.268822] ohci-pci 0000:00:14.5: irq 18, io mem 0xfeb4c000
[   16.309168] input: HD-Audio Generic Front Mic as /devices/pci0000:00/0000:00:14.2/sound/card1/input4
[   16.309267] input: HD-Audio Generic Rear Mic as /devices/pci0000:00/0000:00:14.2/sound/card1/input5
[   16.309324] input: HD-Audio Generic Line as /devices/pci0000:00/0000:00:14.2/sound/card1/input6
[   16.309381] input: HD-Audio Generic Line Out Front as /devices/pci0000:00/0000:00:14.2/sound/card1/input7
[   16.309438] input: HD-Audio Generic Line Out Surround as /devices/pci0000:00/0000:00:14.2/sound/card1/input8
[   16.309495] input: HD-Audio Generic Line Out CLFE as /devices/pci0000:00/0000:00:14.2/sound/card1/input9
[   16.309552] input: HD-Audio Generic Front Headphone as /devices/pci0000:00/0000:00:14.2/sound/card1/input10
[   16.332208] hub 11-0:1.0: USB hub found
[   16.332471] hub 11-0:1.0: 2 ports detected
[   16.503025] usb 8-1: new high-speed USB device number 2 using xhci_hcd
[   16.632154] hub 8-1:1.0: USB hub found
[   16.632416] hub 8-1:1.0: 4 ports detected
[   16.703259] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[   16.784077] RTL8211E Gigabit Ethernet r8169-400:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
[   17.059700] r8169 0000:04:00.0 enp4s0: Link is Down
[   17.124857] NFSD: starting 90-second grace period (net f0000098)
[   19.451843] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[   19.734817] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
[   19.738681] r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control rx/tx
[   19.738749] IPv6: ADDRCONF(NETDEV_CHANGE): enp4s0: link becomes ready
[   23.272131] nfsd: last server has exited, flushing export cache
[   23.329379] NFSD: starting 90-second grace period (net f0000098)
[88009.651745] ------------[ cut here ]------------
[88009.651822] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
[88009.651901] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x1bb/0x1e0
[88009.651950] Modules linked in: snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic ohci_pci snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer xhci_pci ohci_hcd ehci_pci xhci_hcd ehci_hcd usbcore usb_common snd soundcore nfsd auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
[88009.652015] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #2
[88009.652047] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
[88009.652080] EIP: dev_watchdog+0x1bb/0x1e0
[88009.652112] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89 f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25 ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00 00 00
[88009.652154] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
[88009.652187] ESI: f4e0e264 EDI: f4e0e000 EBP: f4c99f74 ESP: f4c99f4c
[88009.652220] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
[88009.652252] CR0: 80050033 CR2: b7c594f0 CR3: 333da000 CR4: 00000690
[88009.652284] Call Trace:
[88009.652315]  <SOFTIRQ>
[88009.652347]  ? qdisc_put_unlocked+0x40/0x40
[88009.652383]  call_timer_fn+0x19/0xa0
[88009.652416]  run_timer_softirq+0x337/0x380
[88009.652450]  ? qdisc_put_unlocked+0x40/0x40
[88009.652483]  ? rcu_process_callbacks+0x2e3/0x380
[88009.652518]  __do_softirq+0xd6/0x21c
[88009.652550]  ? __irqentry_text_end+0x18/0x18
[88009.652584]  call_on_stack+0x10/0x60
[88009.652615]  </SOFTIRQ>
[88009.652649]  ? irq_exit+0x91/0xc0
[88009.652682]  ? smp_apic_timer_interrupt+0x56/0xa0
[88009.652716]  ? apic_timer_interrupt+0xd5/0xdc
[88009.652751]  ? acpi_idle_enter_s2idle+0x60/0x60
[88009.652787]  ? cpuidle_enter_state+0x122/0x360
[88009.652821]  ? cpuidle_enter+0xf/0x20
[88009.652854]  ? call_cpuidle+0x1c/0x40
[88009.652887]  ? do_idle+0x1e6/0x220
[88009.652916]  ? cpu_startup_entry+0x25/0x40
[88009.652945]  ? start_secondary+0x1a5/0x220
[88009.652973]  ? startup_32_smp+0x15f/0x164
[88009.653002] ---[ end trace b5906820bd38a07f ]---
[88087.942438] r8169 0000:04:00.0 enp4s0: Link is Down
[88093.059959] RTL8211E Gigabit Ethernet r8169-400:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
[88093.250189] r8169 0000:04:00.0 enp4s0: Link is Down
[88096.087354] r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control rx/tx
[88096.087408] IPv6: ADDRCONF(NETDEV_CHANGE): enp4s0: link becomes ready

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing
  2019-03-14  3:04   ` VDR User
@ 2019-03-14  6:43     ` Heiner Kallweit
  2019-03-14 15:10       ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-14  6:43 UTC (permalink / raw)
  To: VDR User; +Cc: netdev

On 14.03.2019 04:04, VDR User wrote:
> Hi Heiner,
> 
> Thanks for your response. Request info follows..
> 
>>> Hi, after updating to kernel 5.0, the nic driver (r8169) has been
>>> crashing whenever I start using heavy traffic on it (for example,
>>> xferring large files to the box across my lan). The destination
>>> harddrive may be sleeping and need to spin-up, or not, but the box
>>> itself does not suspend/hibernate. The nic becomes completely
>>> unresponsive and all connections to the box drop. After what I think
>>> is several minutes, the connection comes back to life. The problem
>>> happens consistently but seemingly not consistently at the same point.
>>> For example, I can xfer a few 4gb files and it will crash at around
>>> 2-3gb on the first file. The next time it might not crash until 2-3gb
>>> on the second file.Prior to kernel 5.0 I was using 4.19.12 and this
>>> problem didn't occur. I have since downgraded back to 4.19.12 pending
>>> what response this post gets.
>>>
>> Thanks for the report. Helpful would be:
>> - full dmesg output
> 
> Added as attachment.
> 
>> - "lspci -vv" output (as root) for the network card
> 
[...]
>> Can you test a recent 4.20 kernel? This would narrow down the number
>> of potentially problematic patches.
> 
> I compiled and test 4.20.15 and didn't experience any crashing. I then
> switched back to 5.0.0 and this time I had to transfer significantly
> more until the crash occured. I'm not sure but it seems like the
> crashes happen when there's both outgoing & incoming traffic
> simultaneously. Is the dmesg crash info helpful at all?
> 
Thanks for the additional info and for testing 4.20.15.
To rule out that the issue is caused by a regression in network or
some other subsystem: Can you take the r8169.c from 4.20.15 and test
it on top of 5.0?
Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.

> Thanks,
> Derek
> 
Heiner

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing
  2019-03-14  6:43     ` Heiner Kallweit
@ 2019-03-14 15:10       ` VDR User
  2019-03-14 18:08         ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-14 15:10 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: netdev

> Thanks for the additional info and for testing 4.20.15.
> To rule out that the issue is caused by a regression in network or
> some other subsystem: Can you take the r8169.c from 4.20.15 and test
> it on top of 5.0?
> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.

Sure, no problem! I'll copy the driver & recompile now actually.
Hopefully there aren't a ton of changes to r8169.c to sift through and
the cause isn't good at hiding itself!

Thanks,
Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing
  2019-03-14 15:10       ` VDR User
@ 2019-03-14 18:08         ` Heiner Kallweit
  2019-03-14 21:36           ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-14 18:08 UTC (permalink / raw)
  To: VDR User; +Cc: netdev

On 14.03.2019 16:10, VDR User wrote:
>> Thanks for the additional info and for testing 4.20.15.
>> To rule out that the issue is caused by a regression in network or
>> some other subsystem: Can you take the r8169.c from 4.20.15 and test
>> it on top of 5.0?
>> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
> 
> Sure, no problem! I'll copy the driver & recompile now actually.
> Hopefully there aren't a ton of changes to r8169.c to sift through and
> the cause isn't good at hiding itself!
> 
I checked the driver changes new in 5.0 and there are very few
functional changes. You could try to revert the following:

5317d5c6d47e ("r8169: use napi_consume_skb where possible")

> Thanks,
> Derek
> 
Heiner

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing
  2019-03-14 18:08         ` Heiner Kallweit
@ 2019-03-14 21:36           ` VDR User
  2019-03-15 20:09             ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-14 21:36 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: netdev

> >> Thanks for the additional info and for testing 4.20.15.
> >> To rule out that the issue is caused by a regression in network or
> >> some other subsystem: Can you take the r8169.c from 4.20.15 and test
> >> it on top of 5.0?
> >> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
> >
> > Sure, no problem! I'll copy the driver & recompile now actually.
> > Hopefully there aren't a ton of changes to r8169.c to sift through and
> > the cause isn't good at hiding itself!
> >
> I checked the driver changes new in 5.0 and there are very few
> functional changes. You could try to revert the following:
>
> 5317d5c6d47e ("r8169: use napi_consume_skb where possible")

Will do, and fwiw, while I haven't been able to do tons of testing
today, I haven't been able to trigger the crash after replacing
5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
file and revert the change you mentioned, and report back my findings.

Thanks,
Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing
  2019-03-14 21:36           ` VDR User
@ 2019-03-15 20:09             ` VDR User
  2019-03-15 20:26               ` r8169 driver from kernel 5.0 crashing - napi_consume_skb Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-15 20:09 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: netdev

> > >> Thanks for the additional info and for testing 4.20.15.
> > >> To rule out that the issue is caused by a regression in network or
> > >> some other subsystem: Can you take the r8169.c from 4.20.15 and test
> > >> it on top of 5.0?
> > >> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
> > >
> > > Sure, no problem! I'll copy the driver & recompile now actually.
> > > Hopefully there aren't a ton of changes to r8169.c to sift through and
> > > the cause isn't good at hiding itself!
> > >
> > I checked the driver changes new in 5.0 and there are very few
> > functional changes. You could try to revert the following:
> >
> > 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
>
> Will do, and fwiw, while I haven't been able to do tons of testing
> today, I haven't been able to trigger the crash after replacing
> 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
> file and revert the change you mentioned, and report back my findings.

Heiner,

After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
("r8169: use napi_consume_skb where possible"), I so far have not had
any crashes after transferring roughly 30GB back & forth. I'm not
completely confident yet the crash is resolve with that revert and
will continue to do further testing throughout the weekend as well.
What confidence level do you have that 5317d5c6d47e is the culprit at
this point?

Thanks,
Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-15 20:09             ` VDR User
@ 2019-03-15 20:26               ` Heiner Kallweit
  2019-03-15 20:40                 ` Alexander Duyck
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-15 20:26 UTC (permalink / raw)
  To: VDR User, Alexander Duyck; +Cc: netdev

On 15.03.2019 21:09, VDR User wrote:
>>>>> Thanks for the additional info and for testing 4.20.15.
>>>>> To rule out that the issue is caused by a regression in network or
>>>>> some other subsystem: Can you take the r8169.c from 4.20.15 and test
>>>>> it on top of 5.0?
>>>>> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
>>>>
>>>> Sure, no problem! I'll copy the driver & recompile now actually.
>>>> Hopefully there aren't a ton of changes to r8169.c to sift through and
>>>> the cause isn't good at hiding itself!
>>>>
>>> I checked the driver changes new in 5.0 and there are very few
>>> functional changes. You could try to revert the following:
>>>
>>> 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
>>
>> Will do, and fwiw, while I haven't been able to do tons of testing
>> today, I haven't been able to trigger the crash after replacing
>> 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
>> file and revert the change you mentioned, and report back my findings.
> 
> Heiner,
> 
> After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible"), I so far have not had
> any crashes after transferring roughly 30GB back & forth. I'm not
> completely confident yet the crash is resolve with that revert and
> will continue to do further testing throughout the weekend as well.
> What confidence level do you have that 5317d5c6d47e is the culprit at
> this point?
> 
Good, thanks for testing. I simply see no other change since 4.20 that
could cause these symptoms.
Using napi_consume_skb() at this place in r8169.c looks safe to me.
Option 1 is that I miss something, option 2 is that there's an issue
in the NAPI subsystem. However in the latter case I assume at least
the Mellanox and/or Intel guys would have observed the same issue
on their respective CI systems.
Let me add Alexander, maybe he can provide a hint before we go and
revert the change.

> Thanks,
> Derek
> 
Heiner

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-15 20:26               ` r8169 driver from kernel 5.0 crashing - napi_consume_skb Heiner Kallweit
@ 2019-03-15 20:40                 ` Alexander Duyck
  2019-03-15 20:46                   ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Duyck @ 2019-03-15 20:40 UTC (permalink / raw)
  To: Heiner Kallweit, VDR User; +Cc: netdev

On Fri, 2019-03-15 at 21:26 +0100, Heiner Kallweit wrote:
> On 15.03.2019 21:09, VDR User wrote:
> > > > > > Thanks for the additional info and for testing 4.20.15.
> > > > > > To rule out that the issue is caused by a regression in network or
> > > > > > some other subsystem: Can you take the r8169.c from 4.20.15 and test
> > > > > > it on top of 5.0?
> > > > > > Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
> > > > > 
> > > > > Sure, no problem! I'll copy the driver & recompile now actually.
> > > > > Hopefully there aren't a ton of changes to r8169.c to sift through and
> > > > > the cause isn't good at hiding itself!
> > > > > 
> > > > I checked the driver changes new in 5.0 and there are very few
> > > > functional changes. You could try to revert the following:
> > > > 
> > > > 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
> > > 
> > > Will do, and fwiw, while I haven't been able to do tons of testing
> > > today, I haven't been able to trigger the crash after replacing
> > > 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
> > > file and revert the change you mentioned, and report back my findings.
> > 
> > Heiner,
> > 
> > After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
> > ("r8169: use napi_consume_skb where possible"), I so far have not had
> > any crashes after transferring roughly 30GB back & forth. I'm not
> > completely confident yet the crash is resolve with that revert and
> > will continue to do further testing throughout the weekend as well.
> > What confidence level do you have that 5317d5c6d47e is the culprit at
> > this point?
> > 
> Good, thanks for testing. I simply see no other change since 4.20 that
> could cause these symptoms.
> Using napi_consume_skb() at this place in r8169.c looks safe to me.
> Option 1 is that I miss something, option 2 is that there's an issue
> in the NAPI subsystem. However in the latter case I assume at least
> the Mellanox and/or Intel guys would have observed the same issue
> on their respective CI systems.
> Let me add Alexander, maybe he can provide a hint before we go and
> revert the change.

Do you have the crash log? I'd be curious what the issue is we are
seeing.

I agree I can't see anything obvious, but it is possible that we may be
running into something we hadn't seen with the Intel and Mellanox
parts.

- Alex


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-15 20:40                 ` Alexander Duyck
@ 2019-03-15 20:46                   ` Heiner Kallweit
  2019-03-15 22:09                     ` Alexander Duyck
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-15 20:46 UTC (permalink / raw)
  To: Alexander Duyck, VDR User; +Cc: netdev

On 15.03.2019 21:40, Alexander Duyck wrote:
> On Fri, 2019-03-15 at 21:26 +0100, Heiner Kallweit wrote:
>> On 15.03.2019 21:09, VDR User wrote:
>>>>>>> Thanks for the additional info and for testing 4.20.15.
>>>>>>> To rule out that the issue is caused by a regression in network or
>>>>>>> some other subsystem: Can you take the r8169.c from 4.20.15 and test
>>>>>>> it on top of 5.0?
>>>>>>> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
>>>>>>
>>>>>> Sure, no problem! I'll copy the driver & recompile now actually.
>>>>>> Hopefully there aren't a ton of changes to r8169.c to sift through and
>>>>>> the cause isn't good at hiding itself!
>>>>>>
>>>>> I checked the driver changes new in 5.0 and there are very few
>>>>> functional changes. You could try to revert the following:
>>>>>
>>>>> 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
>>>>
>>>> Will do, and fwiw, while I haven't been able to do tons of testing
>>>> today, I haven't been able to trigger the crash after replacing
>>>> 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
>>>> file and revert the change you mentioned, and report back my findings.
>>>
>>> Heiner,
>>>
>>> After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
>>> ("r8169: use napi_consume_skb where possible"), I so far have not had
>>> any crashes after transferring roughly 30GB back & forth. I'm not
>>> completely confident yet the crash is resolve with that revert and
>>> will continue to do further testing throughout the weekend as well.
>>> What confidence level do you have that 5317d5c6d47e is the culprit at
>>> this point?
>>>
>> Good, thanks for testing. I simply see no other change since 4.20 that
>> could cause these symptoms.
>> Using napi_consume_skb() at this place in r8169.c looks safe to me.
>> Option 1 is that I miss something, option 2 is that there's an issue
>> in the NAPI subsystem. However in the latter case I assume at least
>> the Mellanox and/or Intel guys would have observed the same issue
>> on their respective CI systems.
>> Let me add Alexander, maybe he can provide a hint before we go and
>> revert the change.
> 
> Do you have the crash log? I'd be curious what the issue is we are
> seeing.
> 
> I agree I can't see anything obvious, but it is possible that we may be
> running into something we hadn't seen with the Intel and Mellanox
> parts.
> 
> - Alex
> 
> 
Crash may have been the wrong word, network becomes unresponsive
and the user sees the generic transmit queue timeout.
Let me quote his original mail:

------------------------------------------------------------------------

Hi, after updating to kernel 5.0, the nic driver (r8169) has been
crashing whenever I start using heavy traffic on it (for example,
xferring large files to the box across my lan). The destination
harddrive may be sleeping and need to spin-up, or not, but the box
itself does not suspend/hibernate. The nic becomes completely
unresponsive and all connections to the box drop. After what I think
is several minutes, the connection comes back to life. The problem
happens consistently but seemingly not consistently at the same point.
For example, I can xfer a few 4gb files and it will crash at around
2-3gb on the first file. The next time it might not crash until 2-3gb
on the second file.Prior to kernel 5.0 I was using 4.19.12 and this
problem didn't occur. I have since downgraded back to 4.19.12 pending
what response this post gets.

Thanks for any help or assistance on how to proceed!
-Derek

NOTE: I'm not subscribed to this mailing list so please CC me on any replies!

The nic is on-board ECS A75F-A:
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
        Subsystem: Elitegroup Computer Systems RTL8111/8168/8411 PCI
Express Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at c000 [size=256]
        Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
        Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: r8169

I see the following in the syslog:

[164572.785517] ------------[ cut here ]------------
[164572.785595] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
[164572.785677] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
dev_watchdog+0x1bb/0x1e0
[164572.785730] Modules linked in: snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi ohci_pci snd_hda_intel
snd_hda_codec snd_hwdep xhci_pci ehci_pci ohci_hcd xhci_hcd ehci_hcd
usbcore usb_common snd_hda_core snd_pcm snd_timer snd soundcore nfsd
auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
[164572.785807] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #2
[164572.785842] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
[164572.785878] EIP: dev_watchdog+0x1bb/0x1e0
[164572.785912] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
00 00
[164572.785963] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
[164572.785998] ESI: f4cbc264 EDI: f4cbc000 EBP: f4c99f74 ESP: f4c99f4c
[164572.786033] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
[164572.786067] CR0: 80050033 CR2: 020dc000 CR3: 2b80b000 CR4: 00000690
[164572.786102] Call Trace:
[164572.786135]  <SOFTIRQ>
[164572.786168]  ? qdisc_put_unlocked+0x40/0x40
[164572.786203]  call_timer_fn+0x19/0xa0
[164572.786237]  run_timer_softirq+0x337/0x380
[164572.786270]  ? qdisc_put_unlocked+0x40/0x40
[164572.786302]  ? rcu_process_callbacks+0xcb/0x380
[164572.786337]  __do_softirq+0xd6/0x21c
[164572.786370]  ? __irqentry_text_end+0x18/0x18
[164572.786404]  call_on_stack+0x10/0x60
[164572.786435]  </SOFTIRQ>
[164572.786467]  ? irq_exit+0x91/0xc0
[164572.786495]  ? smp_apic_timer_interrupt+0x56/0xa0
[164572.786525]  ? apic_timer_interrupt+0xd5/0xdc
[164572.786555]  ? acpi_idle_enter_s2idle+0x60/0x60
[164572.786584]  ? cpuidle_enter_state+0x122/0x360
[164572.786614]  ? tick_nohz_idle_stop_tick+0x27b/0x2a0
[164572.786644]  ? cpuidle_enter+0xf/0x20
[164572.786673]  ? call_cpuidle+0x1c/0x40
[164572.786701]  ? do_idle+0x1e6/0x220
[164572.786730]  ? cpu_startup_entry+0x25/0x40
[164572.786758]  ? start_secondary+0x1a5/0x220
[164572.786787]  ? startup_32_smp+0x15f/0x164
[164572.786816] ---[ end trace 662a5195537dbad8 ]---


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-15 20:46                   ` Heiner Kallweit
@ 2019-03-15 22:09                     ` Alexander Duyck
  2019-03-15 22:27                       ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Duyck @ 2019-03-15 22:09 UTC (permalink / raw)
  To: Heiner Kallweit, VDR User; +Cc: netdev

On Fri, 2019-03-15 at 21:46 +0100, Heiner Kallweit wrote:
> On 15.03.2019 21:40, Alexander Duyck wrote:
> > On Fri, 2019-03-15 at 21:26 +0100, Heiner Kallweit wrote:
> > > On 15.03.2019 21:09, VDR User wrote:
> > > > > > > > Thanks for the additional info and for testing 4.20.15.
> > > > > > > > To rule out that the issue is caused by a regression in network or
> > > > > > > > some other subsystem: Can you take the r8169.c from 4.20.15 and test
> > > > > > > > it on top of 5.0?
> > > > > > > > Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
> > > > > > > 
> > > > > > > Sure, no problem! I'll copy the driver & recompile now actually.
> > > > > > > Hopefully there aren't a ton of changes to r8169.c to sift through and
> > > > > > > the cause isn't good at hiding itself!
> > > > > > > 
> > > > > > I checked the driver changes new in 5.0 and there are very few
> > > > > > functional changes. You could try to revert the following:
> > > > > > 
> > > > > > 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
> > > > > 
> > > > > Will do, and fwiw, while I haven't been able to do tons of testing
> > > > > today, I haven't been able to trigger the crash after replacing
> > > > > 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
> > > > > file and revert the change you mentioned, and report back my findings.
> > > > 
> > > > Heiner,
> > > > 
> > > > After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
> > > > ("r8169: use napi_consume_skb where possible"), I so far have not had
> > > > any crashes after transferring roughly 30GB back & forth. I'm not
> > > > completely confident yet the crash is resolve with that revert and
> > > > will continue to do further testing throughout the weekend as well.
> > > > What confidence level do you have that 5317d5c6d47e is the culprit at
> > > > this point?
> > > > 
> > > Good, thanks for testing. I simply see no other change since 4.20 that
> > > could cause these symptoms.
> > > Using napi_consume_skb() at this place in r8169.c looks safe to me.
> > > Option 1 is that I miss something, option 2 is that there's an issue
> > > in the NAPI subsystem. However in the latter case I assume at least
> > > the Mellanox and/or Intel guys would have observed the same issue
> > > on their respective CI systems.
> > > Let me add Alexander, maybe he can provide a hint before we go and
> > > revert the change.
> > 
> > Do you have the crash log? I'd be curious what the issue is we are
> > seeing.
> > 
> > I agree I can't see anything obvious, but it is possible that we may be
> > running into something we hadn't seen with the Intel and Mellanox
> > parts.
> > 
> > - Alex
> > 
> > 
> Crash may have been the wrong word, network becomes unresponsive
> and the user sees the generic transmit queue timeout.
> Let me quote his original mail:
> 
> ------------------------------------------------------------------------
> 
> Hi, after updating to kernel 5.0, the nic driver (r8169) has been
> crashing whenever I start using heavy traffic on it (for example,
> xferring large files to the box across my lan). The destination
> harddrive may be sleeping and need to spin-up, or not, but the box
> itself does not suspend/hibernate. The nic becomes completely
> unresponsive and all connections to the box drop. After what I think
> is several minutes, the connection comes back to life. The problem
> happens consistently but seemingly not consistently at the same point.
> For example, I can xfer a few 4gb files and it will crash at around
> 2-3gb on the first file. The next time it might not crash until 2-3gb
> on the second file.Prior to kernel 5.0 I was using 4.19.12 and this
> problem didn't occur. I have since downgraded back to 4.19.12 pending
> what response this post gets.
> 
> Thanks for any help or assistance on how to proceed!
> -Derek
> 
> NOTE: I'm not subscribed to this mailing list so please CC me on any replies!
> 
> The nic is on-board ECS A75F-A:
> 04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
>         Subsystem: Elitegroup Computer Systems RTL8111/8168/8411 PCI
> Express Gigabit Ethernet Controller
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 17
>         Region 0: I/O ports at c000 [size=256]
>         Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
>         Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
>         Capabilities: <access denied>
>         Kernel driver in use: r8169
> 
> I see the following in the syslog:
> 
> [164572.785517] ------------[ cut here ]------------
> [164572.785595] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
> [164572.785677] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
> dev_watchdog+0x1bb/0x1e0
> [164572.785730] Modules linked in: snd_hda_codec_realtek
> snd_hda_codec_generic snd_hda_codec_hdmi ohci_pci snd_hda_intel
> snd_hda_codec snd_hwdep xhci_pci ehci_pci ohci_hcd xhci_hcd ehci_hcd
> usbcore usb_common snd_hda_core snd_pcm snd_timer snd soundcore nfsd
> auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
> [164572.785807] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #2
> [164572.785842] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
> [164572.785878] EIP: dev_watchdog+0x1bb/0x1e0
> [164572.785912] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
> f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
> ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
> 00 00
> [164572.785963] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
> [164572.785998] ESI: f4cbc264 EDI: f4cbc000 EBP: f4c99f74 ESP: f4c99f4c
> [164572.786033] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
> [164572.786067] CR0: 80050033 CR2: 020dc000 CR3: 2b80b000 CR4: 00000690
> [164572.786102] Call Trace:
> [164572.786135]  <SOFTIRQ>
> [164572.786168]  ? qdisc_put_unlocked+0x40/0x40
> [164572.786203]  call_timer_fn+0x19/0xa0
> [164572.786237]  run_timer_softirq+0x337/0x380
> [164572.786270]  ? qdisc_put_unlocked+0x40/0x40
> [164572.786302]  ? rcu_process_callbacks+0xcb/0x380
> [164572.786337]  __do_softirq+0xd6/0x21c
> [164572.786370]  ? __irqentry_text_end+0x18/0x18
> [164572.786404]  call_on_stack+0x10/0x60
> [164572.786435]  </SOFTIRQ>
> [164572.786467]  ? irq_exit+0x91/0xc0
> [164572.786495]  ? smp_apic_timer_interrupt+0x56/0xa0
> [164572.786525]  ? apic_timer_interrupt+0xd5/0xdc
> [164572.786555]  ? acpi_idle_enter_s2idle+0x60/0x60
> [164572.786584]  ? cpuidle_enter_state+0x122/0x360
> [164572.786614]  ? tick_nohz_idle_stop_tick+0x27b/0x2a0
> [164572.786644]  ? cpuidle_enter+0xf/0x20
> [164572.786673]  ? call_cpuidle+0x1c/0x40
> [164572.786701]  ? do_idle+0x1e6/0x220
> [164572.786730]  ? cpu_startup_entry+0x25/0x40
> [164572.786758]  ? start_secondary+0x1a5/0x220
> [164572.786787]  ? startup_32_smp+0x15f/0x164
> [164572.786816] ---[ end trace 662a5195537dbad8 ]---
> 

Okay, the patch suggested doesn't make much sense for the issue
described. I was expecting to see something that at least pointed to
the NAPI logic, and I don't see anything doing as such.

The issue looks like a problem of the ring being stopped at the QDISC
layer for an extended period of time. Basically somebody called
netif_stop_queue and then never got around to calling netif_wake_queue
to let the ring resume transmitting.

My first thoughts would be to look at an interrupt that are getting
masked and lost, or some ring accounting issue where you are flagging
the ring to stop and then not waking it up due to a math or logic error
somewhere. It looks like there were multiple patches messing with these
sort of items between 4.20 and 5.0 so I would recommend looking there.

- Alex



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-15 22:09                     ` Alexander Duyck
@ 2019-03-15 22:27                       ` Heiner Kallweit
       [not found]                         ` <b470f0ee6071bded8e78e15e18df14ffbc32cd18.camel@linux.intel.com>
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-15 22:27 UTC (permalink / raw)
  To: Alexander Duyck, VDR User; +Cc: netdev

On 15.03.2019 23:09, Alexander Duyck wrote:
> On Fri, 2019-03-15 at 21:46 +0100, Heiner Kallweit wrote:
>> On 15.03.2019 21:40, Alexander Duyck wrote:
>>> On Fri, 2019-03-15 at 21:26 +0100, Heiner Kallweit wrote:
>>>> On 15.03.2019 21:09, VDR User wrote:
>>>>>>>>> Thanks for the additional info and for testing 4.20.15.
>>>>>>>>> To rule out that the issue is caused by a regression in network or
>>>>>>>>> some other subsystem: Can you take the r8169.c from 4.20.15 and test
>>>>>>>>> it on top of 5.0?
>>>>>>>>> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
>>>>>>>>
>>>>>>>> Sure, no problem! I'll copy the driver & recompile now actually.
>>>>>>>> Hopefully there aren't a ton of changes to r8169.c to sift through and
>>>>>>>> the cause isn't good at hiding itself!
>>>>>>>>
>>>>>>> I checked the driver changes new in 5.0 and there are very few
>>>>>>> functional changes. You could try to revert the following:
>>>>>>>
>>>>>>> 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
>>>>>>
>>>>>> Will do, and fwiw, while I haven't been able to do tons of testing
>>>>>> today, I haven't been able to trigger the crash after replacing
>>>>>> 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
>>>>>> file and revert the change you mentioned, and report back my findings.
>>>>>
>>>>> Heiner,
>>>>>
>>>>> After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
>>>>> ("r8169: use napi_consume_skb where possible"), I so far have not had
>>>>> any crashes after transferring roughly 30GB back & forth. I'm not
>>>>> completely confident yet the crash is resolve with that revert and
>>>>> will continue to do further testing throughout the weekend as well.
>>>>> What confidence level do you have that 5317d5c6d47e is the culprit at
>>>>> this point?
>>>>>
>>>> Good, thanks for testing. I simply see no other change since 4.20 that
>>>> could cause these symptoms.
>>>> Using napi_consume_skb() at this place in r8169.c looks safe to me.
>>>> Option 1 is that I miss something, option 2 is that there's an issue
>>>> in the NAPI subsystem. However in the latter case I assume at least
>>>> the Mellanox and/or Intel guys would have observed the same issue
>>>> on their respective CI systems.
>>>> Let me add Alexander, maybe he can provide a hint before we go and
>>>> revert the change.
>>>
>>> Do you have the crash log? I'd be curious what the issue is we are
>>> seeing.
>>>
>>> I agree I can't see anything obvious, but it is possible that we may be
>>> running into something we hadn't seen with the Intel and Mellanox
>>> parts.
>>>
>>> - Alex
>>>
>>>
>> Crash may have been the wrong word, network becomes unresponsive
>> and the user sees the generic transmit queue timeout.
>> Let me quote his original mail:
>>
>> ------------------------------------------------------------------------
>>
>> Hi, after updating to kernel 5.0, the nic driver (r8169) has been
>> crashing whenever I start using heavy traffic on it (for example,
>> xferring large files to the box across my lan). The destination
>> harddrive may be sleeping and need to spin-up, or not, but the box
>> itself does not suspend/hibernate. The nic becomes completely
>> unresponsive and all connections to the box drop. After what I think
>> is several minutes, the connection comes back to life. The problem
>> happens consistently but seemingly not consistently at the same point.
>> For example, I can xfer a few 4gb files and it will crash at around
>> 2-3gb on the first file. The next time it might not crash until 2-3gb
>> on the second file.Prior to kernel 5.0 I was using 4.19.12 and this
>> problem didn't occur. I have since downgraded back to 4.19.12 pending
>> what response this post gets.
>>
>> Thanks for any help or assistance on how to proceed!
>> -Derek
>>
>> NOTE: I'm not subscribed to this mailing list so please CC me on any replies!
>>
>> The nic is on-board ECS A75F-A:
>> 04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
>>         Subsystem: Elitegroup Computer Systems RTL8111/8168/8411 PCI
>> Express Gigabit Ethernet Controller
>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>         Latency: 0, Cache Line Size: 64 bytes
>>         Interrupt: pin A routed to IRQ 17
>>         Region 0: I/O ports at c000 [size=256]
>>         Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
>>         Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
>>         Capabilities: <access denied>
>>         Kernel driver in use: r8169
>>
>> I see the following in the syslog:
>>
>> [164572.785517] ------------[ cut here ]------------
>> [164572.785595] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
>> [164572.785677] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
>> dev_watchdog+0x1bb/0x1e0
>> [164572.785730] Modules linked in: snd_hda_codec_realtek
>> snd_hda_codec_generic snd_hda_codec_hdmi ohci_pci snd_hda_intel
>> snd_hda_codec snd_hwdep xhci_pci ehci_pci ohci_hcd xhci_hcd ehci_hcd
>> usbcore usb_common snd_hda_core snd_pcm snd_timer snd soundcore nfsd
>> auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
>> [164572.785807] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #2
>> [164572.785842] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
>> [164572.785878] EIP: dev_watchdog+0x1bb/0x1e0
>> [164572.785912] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
>> f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
>> ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
>> 00 00
>> [164572.785963] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
>> [164572.785998] ESI: f4cbc264 EDI: f4cbc000 EBP: f4c99f74 ESP: f4c99f4c
>> [164572.786033] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
>> [164572.786067] CR0: 80050033 CR2: 020dc000 CR3: 2b80b000 CR4: 00000690
>> [164572.786102] Call Trace:
>> [164572.786135]  <SOFTIRQ>
>> [164572.786168]  ? qdisc_put_unlocked+0x40/0x40
>> [164572.786203]  call_timer_fn+0x19/0xa0
>> [164572.786237]  run_timer_softirq+0x337/0x380
>> [164572.786270]  ? qdisc_put_unlocked+0x40/0x40
>> [164572.786302]  ? rcu_process_callbacks+0xcb/0x380
>> [164572.786337]  __do_softirq+0xd6/0x21c
>> [164572.786370]  ? __irqentry_text_end+0x18/0x18
>> [164572.786404]  call_on_stack+0x10/0x60
>> [164572.786435]  </SOFTIRQ>
>> [164572.786467]  ? irq_exit+0x91/0xc0
>> [164572.786495]  ? smp_apic_timer_interrupt+0x56/0xa0
>> [164572.786525]  ? apic_timer_interrupt+0xd5/0xdc
>> [164572.786555]  ? acpi_idle_enter_s2idle+0x60/0x60
>> [164572.786584]  ? cpuidle_enter_state+0x122/0x360
>> [164572.786614]  ? tick_nohz_idle_stop_tick+0x27b/0x2a0
>> [164572.786644]  ? cpuidle_enter+0xf/0x20
>> [164572.786673]  ? call_cpuidle+0x1c/0x40
>> [164572.786701]  ? do_idle+0x1e6/0x220
>> [164572.786730]  ? cpu_startup_entry+0x25/0x40
>> [164572.786758]  ? start_secondary+0x1a5/0x220
>> [164572.786787]  ? startup_32_smp+0x15f/0x164
>> [164572.786816] ---[ end trace 662a5195537dbad8 ]---
>>
> 
> Okay, the patch suggested doesn't make much sense for the issue
> described. I was expecting to see something that at least pointed to
> the NAPI logic, and I don't see anything doing as such.
> 
According to the tests done so far replacing napi_consume_skb()
with dev_consume_skb_any() seems to reliably avoid the issue.
If there really should be an issue with irq/ring management,
then the question would be why it's not trigger with
dev_consume_skb_any().

> The issue looks like a problem of the ring being stopped at the QDISC
> layer for an extended period of time. Basically somebody called
> netif_stop_queue and then never got around to calling netif_wake_queue
> to let the ring resume transmitting.
> 
> My first thoughts would be to look at an interrupt that are getting
> masked and lost, or some ring accounting issue where you are flagging
> the ring to stop and then not waking it up due to a math or logic error
> somewhere. It looks like there were multiple patches messing with these
> sort of items between 4.20 and 5.0 so I would recommend looking there.
> 
> - Alex
> 
Heiner
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
       [not found]                         ` <b470f0ee6071bded8e78e15e18df14ffbc32cd18.camel@linux.intel.com>
@ 2019-03-16  9:07                           ` Heiner Kallweit
  2019-03-16 14:38                           ` VDR User
  1 sibling, 0 replies; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-16  9:07 UTC (permalink / raw)
  To: Alexander Duyck, VDR User; +Cc: netdev

On 16.03.2019 00:54, Alexander Duyck wrote:
> On Fri, 2019-03-15 at 23:27 +0100, Heiner Kallweit wrote:
>> On 15.03.2019 23:09, Alexander Duyck wrote:
>>> On Fri, 2019-03-15 at 21:46 +0100, Heiner Kallweit wrote:
>>>> On 15.03.2019 21:40, Alexander Duyck wrote:
>>>>> On Fri, 2019-03-15 at 21:26 +0100, Heiner Kallweit wrote:
>>>>>> On 15.03.2019 21:09, VDR User wrote:
>>>>>>>>>>> Thanks for the additional info and for testing 4.20.15.
>>>>>>>>>>> To rule out that the issue is caused by a regression in network or
>>>>>>>>>>> some other subsystem: Can you take the r8169.c from 4.20.15 and test
>>>>>>>>>>> it on top of 5.0?
>>>>>>>>>>> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
>>>>>>>>>>
>>>>>>>>>> Sure, no problem! I'll copy the driver & recompile now actually.
>>>>>>>>>> Hopefully there aren't a ton of changes to r8169.c to sift through and
>>>>>>>>>> the cause isn't good at hiding itself!
>>>>>>>>>>
>>>>>>>>> I checked the driver changes new in 5.0 and there are very few
>>>>>>>>> functional changes. You could try to revert the following:
>>>>>>>>>
>>>>>>>>> 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
>>>>>>>>
>>>>>>>> Will do, and fwiw, while I haven't been able to do tons of testing
>>>>>>>> today, I haven't been able to trigger the crash after replacing
>>>>>>>> 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
>>>>>>>> file and revert the change you mentioned, and report back my findings.
>>>>>>>
>>>>>>> Heiner,
>>>>>>>
>>>>>>> After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
>>>>>>> ("r8169: use napi_consume_skb where possible"), I so far have not had
>>>>>>> any crashes after transferring roughly 30GB back & forth. I'm not
>>>>>>> completely confident yet the crash is resolve with that revert and
>>>>>>> will continue to do further testing throughout the weekend as well.
>>>>>>> What confidence level do you have that 5317d5c6d47e is the culprit at
>>>>>>> this point?
>>>>>>>
>>>>>> Good, thanks for testing. I simply see no other change since 4.20 that
>>>>>> could cause these symptoms.
>>>>>> Using napi_consume_skb() at this place in r8169.c looks safe to me.
>>>>>> Option 1 is that I miss something, option 2 is that there's an issue
>>>>>> in the NAPI subsystem. However in the latter case I assume at least
>>>>>> the Mellanox and/or Intel guys would have observed the same issue
>>>>>> on their respective CI systems.
>>>>>> Let me add Alexander, maybe he can provide a hint before we go and
>>>>>> revert the change.
>>>>>
>>>>> Do you have the crash log? I'd be curious what the issue is we are
>>>>> seeing.
>>>>>
>>>>> I agree I can't see anything obvious, but it is possible that we may be
>>>>> running into something we hadn't seen with the Intel and Mellanox
>>>>> parts.
>>>>>
>>>>> - Alex
>>>>>
>>>>>
>>>> Crash may have been the wrong word, network becomes unresponsive
>>>> and the user sees the generic transmit queue timeout.
>>>> Let me quote his original mail:
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> Hi, after updating to kernel 5.0, the nic driver (r8169) has been
>>>> crashing whenever I start using heavy traffic on it (for example,
>>>> xferring large files to the box across my lan). The destination
>>>> harddrive may be sleeping and need to spin-up, or not, but the box
>>>> itself does not suspend/hibernate. The nic becomes completely
>>>> unresponsive and all connections to the box drop. After what I think
>>>> is several minutes, the connection comes back to life. The problem
>>>> happens consistently but seemingly not consistently at the same point.
>>>> For example, I can xfer a few 4gb files and it will crash at around
>>>> 2-3gb on the first file. The next time it might not crash until 2-3gb
>>>> on the second file.Prior to kernel 5.0 I was using 4.19.12 and this
>>>> problem didn't occur. I have since downgraded back to 4.19.12 pending
>>>> what response this post gets.
>>>>
>>>> Thanks for any help or assistance on how to proceed!
>>>> -Derek
>>>>
>>>> NOTE: I'm not subscribed to this mailing list so please CC me on any replies!
>>>>
>>>> The nic is on-board ECS A75F-A:
>>>> 04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
>>>>         Subsystem: Elitegroup Computer Systems RTL8111/8168/8411 PCI
>>>> Express Gigabit Ethernet Controller
>>>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>         Latency: 0, Cache Line Size: 64 bytes
>>>>         Interrupt: pin A routed to IRQ 17
>>>>         Region 0: I/O ports at c000 [size=256]
>>>>         Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
>>>>         Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
>>>>         Capabilities: <access denied>
>>>>         Kernel driver in use: r8169
>>>>
>>>> I see the following in the syslog:
>>>>
>>>> [164572.785517] ------------[ cut here ]------------
>>>> [164572.785595] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
>>>> [164572.785677] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
>>>> dev_watchdog+0x1bb/0x1e0
>>>> [164572.785730] Modules linked in: snd_hda_codec_realtek
>>>> snd_hda_codec_generic snd_hda_codec_hdmi ohci_pci snd_hda_intel
>>>> snd_hda_codec snd_hwdep xhci_pci ehci_pci ohci_hcd xhci_hcd ehci_hcd
>>>> usbcore usb_common snd_hda_core snd_pcm snd_timer snd soundcore nfsd
>>>> auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
>>>> [164572.785807] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #2
>>>> [164572.785842] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
>>>> [164572.785878] EIP: dev_watchdog+0x1bb/0x1e0
>>>> [164572.785912] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
>>>> f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
>>>> ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
>>>> 00 00
>>>> [164572.785963] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
>>>> [164572.785998] ESI: f4cbc264 EDI: f4cbc000 EBP: f4c99f74 ESP: f4c99f4c
>>>> [164572.786033] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
>>>> [164572.786067] CR0: 80050033 CR2: 020dc000 CR3: 2b80b000 CR4: 00000690
>>>> [164572.786102] Call Trace:
>>>> [164572.786135]  <SOFTIRQ>
>>>> [164572.786168]  ? qdisc_put_unlocked+0x40/0x40
>>>> [164572.786203]  call_timer_fn+0x19/0xa0
>>>> [164572.786237]  run_timer_softirq+0x337/0x380
>>>> [164572.786270]  ? qdisc_put_unlocked+0x40/0x40
>>>> [164572.786302]  ? rcu_process_callbacks+0xcb/0x380
>>>> [164572.786337]  __do_softirq+0xd6/0x21c
>>>> [164572.786370]  ? __irqentry_text_end+0x18/0x18
>>>> [164572.786404]  call_on_stack+0x10/0x60
>>>> [164572.786435]  </SOFTIRQ>
>>>> [164572.786467]  ? irq_exit+0x91/0xc0
>>>> [164572.786495]  ? smp_apic_timer_interrupt+0x56/0xa0
>>>> [164572.786525]  ? apic_timer_interrupt+0xd5/0xdc
>>>> [164572.786555]  ? acpi_idle_enter_s2idle+0x60/0x60
>>>> [164572.786584]  ? cpuidle_enter_state+0x122/0x360
>>>> [164572.786614]  ? tick_nohz_idle_stop_tick+0x27b/0x2a0
>>>> [164572.786644]  ? cpuidle_enter+0xf/0x20
>>>> [164572.786673]  ? call_cpuidle+0x1c/0x40
>>>> [164572.786701]  ? do_idle+0x1e6/0x220
>>>> [164572.786730]  ? cpu_startup_entry+0x25/0x40
>>>> [164572.786758]  ? start_secondary+0x1a5/0x220
>>>> [164572.786787]  ? startup_32_smp+0x15f/0x164
>>>> [164572.786816] ---[ end trace 662a5195537dbad8 ]---
>>>>
>>>
>>> Okay, the patch suggested doesn't make much sense for the issue
>>> described. I was expecting to see something that at least pointed to
>>> the NAPI logic, and I don't see anything doing as such.
>>>
>> According to the tests done so far replacing napi_consume_skb()
>> with dev_consume_skb_any() seems to reliably avoid the issue.
>> If there really should be an issue with irq/ring management,
>> then the question would be why it's not trigger with
>> dev_consume_skb_any().
> 
> Part of the issue though is that we don't know how reliable that test
> was. I believe Derek he hasn't had any crashes, but he wasn't confident
> that it had actually resolved the issue.
> 
> There isn't anything in the napi_consume_skb code that would cause us
> to stop a queue assuming we aren't seeing some sort of bizarre memory
> corruption where the only thing we are corrupting is the Tx queue stop
> bit.
> 
> The dev_watchdog indicates that the Tx queue is stopped. Usually to end
> up in a state there are a few possible causes. The first is some sort
> of mess-up in the BQL or Tx descriptor counting logic. The second
> possible cause is the Tx interrupt just isn't firing due to a lost
> event that somehow leaves it masked.
> 
> A quick pass with "git diff v4.20 v5.0
> drivers/net/ethernet/realtek/r8169.c" shows a number of possible
> candidates that could have caused this in terms of changes. My advise
> would be to look at possible interrupt issues due to a non-spurious
> interrupt getting ignored, or something involving the count of
> descriptor slots available.
> 
> 
Thanks for the feedback!
After looking in more detail at the provided trace, it seems to be
quite weird to me. qdisc_put_unlocked() doesn't have a single user
in 5.0. And start_secondary() I also wouldn't expect to see here.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
       [not found]                         ` <b470f0ee6071bded8e78e15e18df14ffbc32cd18.camel@linux.intel.com>
  2019-03-16  9:07                           ` Heiner Kallweit
@ 2019-03-16 14:38                           ` VDR User
  2019-03-16 15:10                             ` Heiner Kallweit
  2019-03-17 11:35                             ` Heiner Kallweit
  1 sibling, 2 replies; 30+ messages in thread
From: VDR User @ 2019-03-16 14:38 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Heiner Kallweit, netdev

> Part of the issue though is that we don't know how reliable that test
> was. I believe Derek he hasn't had any crashes, but he wasn't confident
> that it had actually resolved the issue.

Previously I thought I could easily & consistently reproduce the crash
but the more testing I did, the more I realized that wasn't the case.
That's why my confidence was low in that reversing commit 5317d5c6d47e
("r8169: use napi_consume_skb where possible") fixed it. I felt like I
needed to do a lot more testing over the weekend to be sure. But, I
can now confirm that reversing that commit did not solve the problem.
I didn't ifdown/ifup after the crash so the nic eventually recovered
on its own I guess. The `ethtool -S` output is:

NIC statistics:
     tx_packets: 5370650
     rx_packets: 57340787
     tx_errors: 0
     rx_errors: 0
     rx_missed: 26
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 57332905
     broadcast: 6409
     multicast: 1473
     tx_aborted: 0
     tx_underrun: 0

The dmesg log looks the same as before:

[95579.984062] ------------[ cut here ]------------
[95579.984142] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
[95579.984224] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
dev_watchdog+0x1bb/0x1e0
[95579.984276] Modules linked in: snd_hda_codec_hdmi
snd_hda_codec_realtek snd_hda_codec_generic ohci_pci snd_hda_intel
snd_hda_codec snd_hwdep xhci_pci ohci_hcd ehci_pci xhci_hcd ehci_hcd
usbcore snd_hda_core usb_common snd_pcm snd_timer snd soundcore nfsd
auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
[95579.984354] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #1
[95579.984387] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
[95579.984422] EIP: dev_watchdog+0x1bb/0x1e0
[95579.984454] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
00 00
[95579.986189] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
[95579.986224] ESI: f4cc2264 EDI: f4cc2000 EBP: f4c99f74 ESP: f4c99f4c
[95579.986259] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
[95579.986292] CR0: 80050033 CR2: b7c644f0 CR3: 0dfd2000 CR4: 00000690
[95579.986325] Call Trace:
[95579.986356]  <SOFTIRQ>
[95579.986389]  ? qdisc_put_unlocked+0x40/0x40
[95579.986423]  call_timer_fn+0x19/0xa0
[95579.986456]  run_timer_softirq+0x337/0x380
[95579.986488]  ? qdisc_put_unlocked+0x40/0x40
[95579.986521]  ? rcu_process_callbacks+0xcb/0x380
[95579.986555]  __do_softirq+0xd6/0x21c
[95579.986586]  ? __irqentry_text_end+0x18/0x18
[95579.986619]  call_on_stack+0x10/0x60
[95579.986646]  </SOFTIRQ>
[95579.986674]  ? irq_exit+0x91/0xc0
[95579.986701]  ? smp_apic_timer_interrupt+0x56/0xa0
[95579.986731]  ? apic_timer_interrupt+0xd5/0xdc
[95579.986761]  ? acpi_idle_enter_s2idle+0x60/0x60
[95579.986790]  ? cpuidle_enter_state+0x122/0x360
[95579.986821]  ? cpuidle_enter+0xf/0x20
[95579.986850]  ? call_cpuidle+0x1c/0x40
[95579.986878]  ? do_idle+0x1e6/0x220
[95579.986906]  ? cpu_startup_entry+0x25/0x40
[95579.986934]  ? start_secondary+0x1a5/0x220
[95579.986963]  ? startup_32_smp+0x15f/0x164
[95579.986991] ---[ end trace 2e8d77bb3c9d2d45 ]---

Please let me know if there's anything I can do to help.
Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-16 14:38                           ` VDR User
@ 2019-03-16 15:10                             ` Heiner Kallweit
  2019-03-16 17:08                               ` Alexander Duyck
  2019-03-17 11:35                             ` Heiner Kallweit
  1 sibling, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-16 15:10 UTC (permalink / raw)
  To: VDR User, Alexander Duyck; +Cc: netdev

On 16.03.2019 15:38, VDR User wrote:
>> Part of the issue though is that we don't know how reliable that test
>> was. I believe Derek he hasn't had any crashes, but he wasn't confident
>> that it had actually resolved the issue.
> 
> Previously I thought I could easily & consistently reproduce the crash
> but the more testing I did, the more I realized that wasn't the case.
> That's why my confidence was low in that reversing commit 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
> needed to do a lot more testing over the weekend to be sure. But, I
> can now confirm that reversing that commit did not solve the problem.
> I didn't ifdown/ifup after the crash so the nic eventually recovered
> on its own I guess. The `ethtool -S` output is:
> 
> NIC statistics:
>      tx_packets: 5370650
>      rx_packets: 57340787
>      tx_errors: 0
>      rx_errors: 0
>      rx_missed: 26
>      align_errors: 0
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 57332905
>      broadcast: 6409
>      multicast: 1473
>      tx_aborted: 0
>      tx_underrun: 0
> 
> The dmesg log looks the same as before:
> 
> [95579.984062] ------------[ cut here ]------------
> [95579.984142] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
> [95579.984224] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
> dev_watchdog+0x1bb/0x1e0
> [95579.984276] Modules linked in: snd_hda_codec_hdmi
> snd_hda_codec_realtek snd_hda_codec_generic ohci_pci snd_hda_intel
> snd_hda_codec snd_hwdep xhci_pci ohci_hcd ehci_pci xhci_hcd ehci_hcd
> usbcore snd_hda_core usb_common snd_pcm snd_timer snd soundcore nfsd
> auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
> [95579.984354] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #1
> [95579.984387] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
> [95579.984422] EIP: dev_watchdog+0x1bb/0x1e0
> [95579.984454] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
> f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
> ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
> 00 00
> [95579.986189] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
> [95579.986224] ESI: f4cc2264 EDI: f4cc2000 EBP: f4c99f74 ESP: f4c99f4c
> [95579.986259] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
> [95579.986292] CR0: 80050033 CR2: b7c644f0 CR3: 0dfd2000 CR4: 00000690
> [95579.986325] Call Trace:
> [95579.986356]  <SOFTIRQ>
> [95579.986389]  ? qdisc_put_unlocked+0x40/0x40
> [95579.986423]  call_timer_fn+0x19/0xa0
> [95579.986456]  run_timer_softirq+0x337/0x380
> [95579.986488]  ? qdisc_put_unlocked+0x40/0x40
> [95579.986521]  ? rcu_process_callbacks+0xcb/0x380
> [95579.986555]  __do_softirq+0xd6/0x21c
> [95579.986586]  ? __irqentry_text_end+0x18/0x18
> [95579.986619]  call_on_stack+0x10/0x60
> [95579.986646]  </SOFTIRQ>
> [95579.986674]  ? irq_exit+0x91/0xc0
> [95579.986701]  ? smp_apic_timer_interrupt+0x56/0xa0
> [95579.986731]  ? apic_timer_interrupt+0xd5/0xdc
> [95579.986761]  ? acpi_idle_enter_s2idle+0x60/0x60
> [95579.986790]  ? cpuidle_enter_state+0x122/0x360
> [95579.986821]  ? cpuidle_enter+0xf/0x20
> [95579.986850]  ? call_cpuidle+0x1c/0x40
> [95579.986878]  ? do_idle+0x1e6/0x220
> [95579.986906]  ? cpu_startup_entry+0x25/0x40
> [95579.986934]  ? start_secondary+0x1a5/0x220
> [95579.986963]  ? startup_32_smp+0x15f/0x164
> [95579.986991] ---[ end trace 2e8d77bb3c9d2d45 ]---
> 
> Please let me know if there's anything I can do to help.
> Derek
> 
The other changes between 4.20 and 5.0 don't look like they could cause
the issue. And two critical ones have been reverted already.
So what would need to be done is bisecting the issue.

Heiner

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-16 15:10                             ` Heiner Kallweit
@ 2019-03-16 17:08                               ` Alexander Duyck
  2019-03-17  8:25                                 ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Duyck @ 2019-03-16 17:08 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: VDR User, Alexander Duyck, Netdev

On Sat, Mar 16, 2019 at 8:12 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 16.03.2019 15:38, VDR User wrote:
> >> Part of the issue though is that we don't know how reliable that test
> >> was. I believe Derek he hasn't had any crashes, but he wasn't confident
> >> that it had actually resolved the issue.
> >
> > Previously I thought I could easily & consistently reproduce the crash
> > but the more testing I did, the more I realized that wasn't the case.
> > That's why my confidence was low in that reversing commit 5317d5c6d47e
> > ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
> > needed to do a lot more testing over the weekend to be sure. But, I
> > can now confirm that reversing that commit did not solve the problem.
> > I didn't ifdown/ifup after the crash so the nic eventually recovered
> > on its own I guess. The `ethtool -S` output is:
> >
> > NIC statistics:
> >      tx_packets: 5370650
> >      rx_packets: 57340787
> >      tx_errors: 0
> >      rx_errors: 0
> >      rx_missed: 26
> >      align_errors: 0
> >      tx_single_collisions: 0
> >      tx_multi_collisions: 0
> >      unicast: 57332905
> >      broadcast: 6409
> >      multicast: 1473
> >      tx_aborted: 0
> >      tx_underrun: 0
> >
> > The dmesg log looks the same as before:
> >
> > [95579.984062] ------------[ cut here ]------------
> > [95579.984142] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
> > [95579.984224] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
> > dev_watchdog+0x1bb/0x1e0
> > [95579.984276] Modules linked in: snd_hda_codec_hdmi
> > snd_hda_codec_realtek snd_hda_codec_generic ohci_pci snd_hda_intel
> > snd_hda_codec snd_hwdep xhci_pci ohci_hcd ehci_pci xhci_hcd ehci_hcd
> > usbcore snd_hda_core usb_common snd_pcm snd_timer snd soundcore nfsd
> > auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
> > [95579.984354] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #1
> > [95579.984387] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
> > [95579.984422] EIP: dev_watchdog+0x1bb/0x1e0
> > [95579.984454] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
> > f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
> > ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
> > 00 00
> > [95579.986189] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
> > [95579.986224] ESI: f4cc2264 EDI: f4cc2000 EBP: f4c99f74 ESP: f4c99f4c
> > [95579.986259] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
> > [95579.986292] CR0: 80050033 CR2: b7c644f0 CR3: 0dfd2000 CR4: 00000690
> > [95579.986325] Call Trace:
> > [95579.986356]  <SOFTIRQ>
> > [95579.986389]  ? qdisc_put_unlocked+0x40/0x40
> > [95579.986423]  call_timer_fn+0x19/0xa0
> > [95579.986456]  run_timer_softirq+0x337/0x380
> > [95579.986488]  ? qdisc_put_unlocked+0x40/0x40
> > [95579.986521]  ? rcu_process_callbacks+0xcb/0x380
> > [95579.986555]  __do_softirq+0xd6/0x21c
> > [95579.986586]  ? __irqentry_text_end+0x18/0x18
> > [95579.986619]  call_on_stack+0x10/0x60
> > [95579.986646]  </SOFTIRQ>
> > [95579.986674]  ? irq_exit+0x91/0xc0
> > [95579.986701]  ? smp_apic_timer_interrupt+0x56/0xa0
> > [95579.986731]  ? apic_timer_interrupt+0xd5/0xdc
> > [95579.986761]  ? acpi_idle_enter_s2idle+0x60/0x60
> > [95579.986790]  ? cpuidle_enter_state+0x122/0x360
> > [95579.986821]  ? cpuidle_enter+0xf/0x20
> > [95579.986850]  ? call_cpuidle+0x1c/0x40
> > [95579.986878]  ? do_idle+0x1e6/0x220
> > [95579.986906]  ? cpu_startup_entry+0x25/0x40
> > [95579.986934]  ? start_secondary+0x1a5/0x220
> > [95579.986963]  ? startup_32_smp+0x15f/0x164
> > [95579.986991] ---[ end trace 2e8d77bb3c9d2d45 ]---
> >
> > Please let me know if there's anything I can do to help.
> > Derek
> >
> The other changes between 4.20 and 5.0 don't look like they could cause
> the issue. And two critical ones have been reverted already.
> So what would need to be done is bisecting the issue.
>
> Heiner

So I took a quick look through the git log. Have you tried taking a
look at the following commit as a possible culprit?
commit e782410ed2378a2ddac58d944c3cf0c6f96b1ff3
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sat Dec 15 16:25:05 2018 +0100

    r8169: improve spurious interrupt detection

    Improve detection of spurious interrupts by checking against the
    interrupt mask as currently set in the chip.

    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

This patch seems problematic for a few different reasons. The first
being that adding a register read to the interrupt handler will
actually increase the latency for your interrupt handler as a MMIO
read is not a fast operation compared to a memory read. You already
had to read status, and adding a read for the interrupt mask will just
double the overhead as the reads have to be performed sequentially.
Secondly, are we guaranteed that the read of the register will always
be reliable? I know I have seen plenty of hardware issues in the past
where reading a register in the middle of an event can sometimes lead
to the value read being incorrect.

The only other patch in the set between 4.20 and 5.0 that catches my eye is:
commit 76085c9e7eaebf0caa2476f77380efc3d55b050c
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Thu Nov 22 22:03:08 2018 +0100

    r8169: replace macro TX_FRAGS_READY_FOR with a function

    Replace macro TX_FRAGS_READY_FOR with function rtl_tx_slots_avail
    to make code cleaner and type-safe.

    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

However I don't see anything that should be wrong there that is
causing this, but instrumenting the points where we are taking action
based on that with a trace_printk probably wouldn't hurt as it would
help us to verify that we are stopping things in xmit and not getting
around to waking them up for whatever reason. Also a minor
optimization you could do on the patch is to replace the call to
netif_wake_queue in rtl8169_start_xmit with netif_start_queue as you
don't need to actually wake up the queue since you are still in mid
transmit so you just need to reset the bit so it doesn't prevent the
next transmit.

Hope that helps.

- Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-16 17:08                               ` Alexander Duyck
@ 2019-03-17  8:25                                 ` Heiner Kallweit
  0 siblings, 0 replies; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-17  8:25 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: VDR User, Alexander Duyck, Netdev

On 16.03.2019 18:08, Alexander Duyck wrote:
> On Sat, Mar 16, 2019 at 8:12 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 16.03.2019 15:38, VDR User wrote:
>>>> Part of the issue though is that we don't know how reliable that test
>>>> was. I believe Derek he hasn't had any crashes, but he wasn't confident
>>>> that it had actually resolved the issue.
>>>
>>> Previously I thought I could easily & consistently reproduce the crash
>>> but the more testing I did, the more I realized that wasn't the case.
>>> That's why my confidence was low in that reversing commit 5317d5c6d47e
>>> ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
>>> needed to do a lot more testing over the weekend to be sure. But, I
>>> can now confirm that reversing that commit did not solve the problem.
>>> I didn't ifdown/ifup after the crash so the nic eventually recovered
>>> on its own I guess. The `ethtool -S` output is:
>>>
>>> NIC statistics:
>>>      tx_packets: 5370650
>>>      rx_packets: 57340787
>>>      tx_errors: 0
>>>      rx_errors: 0
>>>      rx_missed: 26
>>>      align_errors: 0
>>>      tx_single_collisions: 0
>>>      tx_multi_collisions: 0
>>>      unicast: 57332905
>>>      broadcast: 6409
>>>      multicast: 1473
>>>      tx_aborted: 0
>>>      tx_underrun: 0
>>>
>>> The dmesg log looks the same as before:
>>>
>>> [95579.984062] ------------[ cut here ]------------
>>> [95579.984142] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
>>> [95579.984224] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
>>> dev_watchdog+0x1bb/0x1e0
>>> [95579.984276] Modules linked in: snd_hda_codec_hdmi
>>> snd_hda_codec_realtek snd_hda_codec_generic ohci_pci snd_hda_intel
>>> snd_hda_codec snd_hwdep xhci_pci ohci_hcd ehci_pci xhci_hcd ehci_hcd
>>> usbcore snd_hda_core usb_common snd_pcm snd_timer snd soundcore nfsd
>>> auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
>>> [95579.984354] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #1
>>> [95579.984387] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
>>> [95579.984422] EIP: dev_watchdog+0x1bb/0x1e0
>>> [95579.984454] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
>>> f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
>>> ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
>>> 00 00
>>> [95579.986189] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
>>> [95579.986224] ESI: f4cc2264 EDI: f4cc2000 EBP: f4c99f74 ESP: f4c99f4c
>>> [95579.986259] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
>>> [95579.986292] CR0: 80050033 CR2: b7c644f0 CR3: 0dfd2000 CR4: 00000690
>>> [95579.986325] Call Trace:
>>> [95579.986356]  <SOFTIRQ>
>>> [95579.986389]  ? qdisc_put_unlocked+0x40/0x40
>>> [95579.986423]  call_timer_fn+0x19/0xa0
>>> [95579.986456]  run_timer_softirq+0x337/0x380
>>> [95579.986488]  ? qdisc_put_unlocked+0x40/0x40
>>> [95579.986521]  ? rcu_process_callbacks+0xcb/0x380
>>> [95579.986555]  __do_softirq+0xd6/0x21c
>>> [95579.986586]  ? __irqentry_text_end+0x18/0x18
>>> [95579.986619]  call_on_stack+0x10/0x60
>>> [95579.986646]  </SOFTIRQ>
>>> [95579.986674]  ? irq_exit+0x91/0xc0
>>> [95579.986701]  ? smp_apic_timer_interrupt+0x56/0xa0
>>> [95579.986731]  ? apic_timer_interrupt+0xd5/0xdc
>>> [95579.986761]  ? acpi_idle_enter_s2idle+0x60/0x60
>>> [95579.986790]  ? cpuidle_enter_state+0x122/0x360
>>> [95579.986821]  ? cpuidle_enter+0xf/0x20
>>> [95579.986850]  ? call_cpuidle+0x1c/0x40
>>> [95579.986878]  ? do_idle+0x1e6/0x220
>>> [95579.986906]  ? cpu_startup_entry+0x25/0x40
>>> [95579.986934]  ? start_secondary+0x1a5/0x220
>>> [95579.986963]  ? startup_32_smp+0x15f/0x164
>>> [95579.986991] ---[ end trace 2e8d77bb3c9d2d45 ]---
>>>
>>> Please let me know if there's anything I can do to help.
>>> Derek
>>>
>> The other changes between 4.20 and 5.0 don't look like they could cause
>> the issue. And two critical ones have been reverted already.
>> So what would need to be done is bisecting the issue.
>>
>> Heiner
> 
> So I took a quick look through the git log. Have you tried taking a
> look at the following commit as a possible culprit?
> commit e782410ed2378a2ddac58d944c3cf0c6f96b1ff3
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Sat Dec 15 16:25:05 2018 +0100
> 
>     r8169: improve spurious interrupt detection
> 
>     Improve detection of spurious interrupts by checking against the
>     interrupt mask as currently set in the chip.
> 
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> This patch seems problematic for a few different reasons. The first
> being that adding a register read to the interrupt handler will
> actually increase the latency for your interrupt handler as a MMIO
> read is not a fast operation compared to a memory read. You already
> had to read status, and adding a read for the interrupt mask will just
> double the overhead as the reads have to be performed sequentially.
> Secondly, are we guaranteed that the read of the register will always
> be reliable? I know I have seen plenty of hardware issues in the past
> where reading a register in the middle of an event can sometimes lead
> to the value read being incorrect.
> 
> The only other patch in the set between 4.20 and 5.0 that catches my eye is:
> commit 76085c9e7eaebf0caa2476f77380efc3d55b050c
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Thu Nov 22 22:03:08 2018 +0100
> 
>     r8169: replace macro TX_FRAGS_READY_FOR with a function
> 
>     Replace macro TX_FRAGS_READY_FOR with function rtl_tx_slots_avail
>     to make code cleaner and type-safe.
> 
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> However I don't see anything that should be wrong there that is
> causing this, but instrumenting the points where we are taking action
> based on that with a trace_printk probably wouldn't hurt as it would
> help us to verify that we are stopping things in xmit and not getting
> around to waking them up for whatever reason. Also a minor
> optimization you could do on the patch is to replace the call to
> netif_wake_queue in rtl8169_start_xmit with netif_start_queue as you
> don't need to actually wake up the queue since you are still in mid
> transmit so you just need to reset the bit so it doesn't prevent the
> next transmit.
> 
> Hope that helps.
> 
Definitely. Thanks a lot for the valuable feedback.

> - Alex
> 
Heiner

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-16 14:38                           ` VDR User
  2019-03-16 15:10                             ` Heiner Kallweit
@ 2019-03-17 11:35                             ` Heiner Kallweit
  2019-03-17 15:23                               ` VDR User
  2019-03-17 15:41                               ` VDR User
  1 sibling, 2 replies; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-17 11:35 UTC (permalink / raw)
  To: VDR User; +Cc: Alexander Duyck, netdev

On 16.03.2019 15:38, VDR User wrote:
>> Part of the issue though is that we don't know how reliable that test
>> was. I believe Derek he hasn't had any crashes, but he wasn't confident
>> that it had actually resolved the issue.
> 
> Previously I thought I could easily & consistently reproduce the crash
> but the more testing I did, the more I realized that wasn't the case.
> That's why my confidence was low in that reversing commit 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
> needed to do a lot more testing over the weekend to be sure. But, I
> can now confirm that reversing that commit did not solve the problem.
> I didn't ifdown/ifup after the crash so the nic eventually recovered
> on its own I guess. The `ethtool -S` output is:
> 
> NIC statistics:
>      tx_packets: 5370650
>      rx_packets: 57340787
>      tx_errors: 0
>      rx_errors: 0
>      rx_missed: 26
>      align_errors: 0
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 57332905
>      broadcast: 6409
>      multicast: 1473
>      tx_aborted: 0
>      tx_underrun: 0
> 
[...]
> 
> Please let me know if there's anything I can do to help.
> Derek
> 
Below are two patches. The first removes an extra PCI register read in the
interrupt handler, the second one just adds some tracing for debugging.

Interesting would be whether patch 1 has an impact on the issue, and the
trace output of patch 2 after the issue occurred (w/ or w/o patch 1).
Trace output you find in /sys/kernel/debug/tracing/trace

Heiner

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 761097710..46a4dc888 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -679,6 +679,7 @@ struct rtl8169_private {
 		struct work_struct work;
 	} wk;
 
+	unsigned irq_enabled:1;
 	unsigned supports_gmii:1;
 	dma_addr_t counters_phys_addr;
 	struct rtl8169_counters *counters;
@@ -1294,6 +1295,7 @@ static void rtl_ack_events(struct rtl8169_private *tp, u16 bits)
 static void rtl_irq_disable(struct rtl8169_private *tp)
 {
 	RTL_W16(tp, IntrMask, 0);
+	tp->irq_enabled = 0;
 }
 
 #define RTL_EVENT_NAPI_RX	(RxOK | RxErr)
@@ -1302,6 +1304,7 @@ static void rtl_irq_disable(struct rtl8169_private *tp)
 
 static void rtl_irq_enable(struct rtl8169_private *tp)
 {
+	tp->irq_enabled = 1;
 	RTL_W16(tp, IntrMask, tp->irq_mask);
 }
 
@@ -6521,9 +6524,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 {
 	struct rtl8169_private *tp = dev_instance;
 	u16 status = RTL_R16(tp, IntrStatus);
-	u16 irq_mask = RTL_R16(tp, IntrMask);
 
-	if (status == 0xffff || !(status & irq_mask))
+	if (!tp->irq_enabled || status == 0xffff || !(status & tp->irq_mask))
 		return IRQ_NONE;
 
 	if (unlikely(status & SYSErr)) {
-- 
2.21.0







diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 46a4dc888..5a40fa6f8 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6258,6 +6258,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 		 * not miss a ring update when it notices a stopped queue.
 		 */
 		smp_wmb();
+		trace_printk("stopping tx queue\n");
 		netif_stop_queue(dev);
 		/* Sync with rtl_tx:
 		 * - publish queue status and cur_tx ring index (write barrier)
@@ -6267,8 +6268,10 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 		 * can't.
 		 */
 		smp_mb();
-		if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS))
+		if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+			trace_printk("waking tx queue\n");
 			netif_wake_queue(dev);
+		}
 	}
 
 	return NETDEV_TX_OK;
@@ -6376,6 +6379,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
 		smp_mb();
 		if (netif_queue_stopped(dev) &&
 		    rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+			trace_printk("waking tx queue\n");
 			netif_wake_queue(dev);
 		}
 		/*
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-17 11:35                             ` Heiner Kallweit
@ 2019-03-17 15:23                               ` VDR User
  2019-03-17 15:41                               ` VDR User
  1 sibling, 0 replies; 30+ messages in thread
From: VDR User @ 2019-03-17 15:23 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Alexander Duyck, netdev

> Below are two patches. The first removes an extra PCI register read in the
> interrupt handler, the second one just adds some tracing for debugging.
>
> Interesting would be whether patch 1 has an impact on the issue, and the
> trace output of patch 2 after the issue occurred (w/ or w/o patch 1).
> Trace output you find in /sys/kernel/debug/tracing/trace

I'll apply, recompile, and start testing now. Will keep monitoring
this thread and report back once I have something for you.

Best regards,
Derek

> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 761097710..46a4dc888 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -679,6 +679,7 @@ struct rtl8169_private {
>                 struct work_struct work;
>         } wk;
>
> +       unsigned irq_enabled:1;
>         unsigned supports_gmii:1;
>         dma_addr_t counters_phys_addr;
>         struct rtl8169_counters *counters;
> @@ -1294,6 +1295,7 @@ static void rtl_ack_events(struct rtl8169_private *tp, u16 bits)
>  static void rtl_irq_disable(struct rtl8169_private *tp)
>  {
>         RTL_W16(tp, IntrMask, 0);
> +       tp->irq_enabled = 0;
>  }
>
>  #define RTL_EVENT_NAPI_RX      (RxOK | RxErr)
> @@ -1302,6 +1304,7 @@ static void rtl_irq_disable(struct rtl8169_private *tp)
>
>  static void rtl_irq_enable(struct rtl8169_private *tp)
>  {
> +       tp->irq_enabled = 1;
>         RTL_W16(tp, IntrMask, tp->irq_mask);
>  }
>
> @@ -6521,9 +6524,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  {
>         struct rtl8169_private *tp = dev_instance;
>         u16 status = RTL_R16(tp, IntrStatus);
> -       u16 irq_mask = RTL_R16(tp, IntrMask);
>
> -       if (status == 0xffff || !(status & irq_mask))
> +       if (!tp->irq_enabled || status == 0xffff || !(status & tp->irq_mask))
>                 return IRQ_NONE;
>
>         if (unlikely(status & SYSErr)) {
> --
> 2.21.0
>
>
>
>
>
>
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 46a4dc888..5a40fa6f8 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6258,6 +6258,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>                  * not miss a ring update when it notices a stopped queue.
>                  */
>                 smp_wmb();
> +               trace_printk("stopping tx queue\n");
>                 netif_stop_queue(dev);
>                 /* Sync with rtl_tx:
>                  * - publish queue status and cur_tx ring index (write barrier)
> @@ -6267,8 +6268,10 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>                  * can't.
>                  */
>                 smp_mb();
> -               if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS))
> +               if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
> +                       trace_printk("waking tx queue\n");
>                         netif_wake_queue(dev);
> +               }
>         }
>
>         return NETDEV_TX_OK;
> @@ -6376,6 +6379,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
>                 smp_mb();
>                 if (netif_queue_stopped(dev) &&
>                     rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
> +                       trace_printk("waking tx queue\n");
>                         netif_wake_queue(dev);
>                 }
>                 /*
> --
> 2.21.0
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-17 11:35                             ` Heiner Kallweit
  2019-03-17 15:23                               ` VDR User
@ 2019-03-17 15:41                               ` VDR User
  2019-03-17 15:49                                 ` Heiner Kallweit
  1 sibling, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-17 15:41 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Alexander Duyck, netdev

A quick comment on patch 1:

> @@ -1294,6 +1295,7 @@ static void rtl_ack_events(struct rtl8169_private *tp, u16 bits)
>  static void rtl_irq_disable(struct rtl8169_private *tp)
>  {
>         RTL_W16(tp, IntrMask, 0);
> +       tp->irq_enabled = 0;
>  }

This function is slightly different in the vanilla 5.0 driver:

static void rtl_irq_disable(struct rtl8169_private *tp)
{
        RTL_W16(tp, IntrMask, 0);
        mmiowb();
}

Unless that mmiowb() matters, I'll continue.

Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-17 15:41                               ` VDR User
@ 2019-03-17 15:49                                 ` Heiner Kallweit
  2019-03-18 16:17                                   ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-17 15:49 UTC (permalink / raw)
  To: VDR User; +Cc: Alexander Duyck, netdev

On 17.03.2019 16:41, VDR User wrote:
> A quick comment on patch 1:
> 
>> @@ -1294,6 +1295,7 @@ static void rtl_ack_events(struct rtl8169_private *tp, u16 bits)
>>  static void rtl_irq_disable(struct rtl8169_private *tp)
>>  {
>>         RTL_W16(tp, IntrMask, 0);
>> +       tp->irq_enabled = 0;
>>  }
> 
> This function is slightly different in the vanilla 5.0 driver:
> 
Right, the patch is based on latest linux-next.

> static void rtl_irq_disable(struct rtl8169_private *tp)
> {
>         RTL_W16(tp, IntrMask, 0);
>         mmiowb();
> }
> 
> Unless that mmiowb() matters, I'll continue.
> 
No, it doesn't matter.

> Derek
> 
Heiner

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-17 15:49                                 ` Heiner Kallweit
@ 2019-03-18 16:17                                   ` VDR User
  2019-03-18 18:36                                     ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-18 16:17 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Alexander Duyck, netdev

Just a quick check-in...  I was able to do a decent amount of testing
yesterday after applying Heiner's patches that "removes an extra PCI
register read in the
interrupt handler" and adds some debug logging. So far I haven't seen
any crashes, everything has been smooth. It may be worth noting that
the last few crashes I saw occurred with less testing than I did
yesterday. I'm not sure if luck is involved and will continue to test
over the next few days unless otherwise instructed. Just wanted to
keep you guys informed what I've seen so far.

Best regards,
Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-18 16:17                                   ` VDR User
@ 2019-03-18 18:36                                     ` Heiner Kallweit
  2019-03-21 19:35                                       ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-18 18:36 UTC (permalink / raw)
  To: VDR User; +Cc: Alexander Duyck, netdev

On 18.03.2019 17:17, VDR User wrote:
> Just a quick check-in...  I was able to do a decent amount of testing
> yesterday after applying Heiner's patches that "removes an extra PCI
> register read in the
> interrupt handler" and adds some debug logging. So far I haven't seen
> any crashes, everything has been smooth. It may be worth noting that
> the last few crashes I saw occurred with less testing than I did
> yesterday. I'm not sure if luck is involved and will continue to test
> over the next few days unless otherwise instructed. Just wanted to
> keep you guys informed what I've seen so far.
> 
Thanks a lot for the feedback and your efforts! If the issue doesn't
occur again in the next days I'll submit the fix.

> Best regards,
> Derek
> 
Heiner

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-18 18:36                                     ` Heiner Kallweit
@ 2019-03-21 19:35                                       ` VDR User
  2019-03-22 19:03                                         ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-21 19:35 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Alexander Duyck, netdev

> > Just a quick check-in...  I was able to do a decent amount of testing
> > yesterday after applying Heiner's patches that "removes an extra PCI
> > register read in the
> > interrupt handler" and adds some debug logging. So far I haven't seen
> > any crashes, everything has been smooth. It may be worth noting that
> > the last few crashes I saw occurred with less testing than I did
> > yesterday. I'm not sure if luck is involved and will continue to test
> > over the next few days unless otherwise instructed. Just wanted to
> > keep you guys informed what I've seen so far.
> >
> Thanks a lot for the feedback and your efforts! If the issue doesn't
> occur again in the next days I'll submit the fix.

Since my last check-in, I have done a ton more testing, hammering the
nic with both directions of traffic. Not a single crash yet! At this
point I feel very confident that problem has been resolved with your
patch and would be very surprised if I see a crash going forward.
Unless there are reasons unknown to me I'd say the patch/fix is
ACK-worthy.

Thank you to everyone who helped sort this out!
Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-21 19:35                                       ` VDR User
@ 2019-03-22 19:03                                         ` Heiner Kallweit
  2019-03-22 20:44                                           ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-03-22 19:03 UTC (permalink / raw)
  To: VDR User; +Cc: Alexander Duyck, netdev

On 21.03.2019 20:35, VDR User wrote:
>>> Just a quick check-in...  I was able to do a decent amount of testing
>>> yesterday after applying Heiner's patches that "removes an extra PCI
>>> register read in the
>>> interrupt handler" and adds some debug logging. So far I haven't seen
>>> any crashes, everything has been smooth. It may be worth noting that
>>> the last few crashes I saw occurred with less testing than I did
>>> yesterday. I'm not sure if luck is involved and will continue to test
>>> over the next few days unless otherwise instructed. Just wanted to
>>> keep you guys informed what I've seen so far.
>>>
>> Thanks a lot for the feedback and your efforts! If the issue doesn't
>> occur again in the next days I'll submit the fix.
> 
> Since my last check-in, I have done a ton more testing, hammering the
> nic with both directions of traffic. Not a single crash yet! At this
> point I feel very confident that problem has been resolved with your
> patch and would be very surprised if I see a crash going forward.
> Unless there are reasons unknown to me I'd say the patch/fix is
> ACK-worthy.
> 
> Thank you to everyone who helped sort this out!
> Derek
> 
Fix has been applied and should show-up in the stable tree very soon.

Heiner

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-22 19:03                                         ` Heiner Kallweit
@ 2019-03-22 20:44                                           ` VDR User
  2019-04-04 14:28                                             ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-03-22 20:44 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Alexander Duyck, netdev

> >>> Just a quick check-in...  I was able to do a decent amount of testing
> >>> yesterday after applying Heiner's patches that "removes an extra PCI
> >>> register read in the
> >>> interrupt handler" and adds some debug logging. So far I haven't seen
> >>> any crashes, everything has been smooth. It may be worth noting that
> >>> the last few crashes I saw occurred with less testing than I did
> >>> yesterday. I'm not sure if luck is involved and will continue to test
> >>> over the next few days unless otherwise instructed. Just wanted to
> >>> keep you guys informed what I've seen so far.
> >>>
> >> Thanks a lot for the feedback and your efforts! If the issue doesn't
> >> occur again in the next days I'll submit the fix.
> >
> > Since my last check-in, I have done a ton more testing, hammering the
> > nic with both directions of traffic. Not a single crash yet! At this
> > point I feel very confident that problem has been resolved with your
> > patch and would be very surprised if I see a crash going forward.
> > Unless there are reasons unknown to me I'd say the patch/fix is
> > ACK-worthy.
> >
> > Thank you to everyone who helped sort this out!
> > Derek
> >
> Fix has been applied and should show-up in the stable tree very soon.

Thank you for the update, I'll keep an eye out in stable for it.

Cheers,
Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-03-22 20:44                                           ` VDR User
@ 2019-04-04 14:28                                             ` VDR User
  2019-04-04 19:41                                               ` Heiner Kallweit
  0 siblings, 1 reply; 30+ messages in thread
From: VDR User @ 2019-04-04 14:28 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Alexander Duyck, netdev

Hi,

Unfortunately I have to report that the crashing has resurfaced. I'm
currently using kernel 5.0 with Heiner's fix applied. In the last few
days I've had the crash occur 4 times now. I'm not sure how to further
investigate this but I'm guessing a patch that adds more debugging
output will be needed to track down where exactly things go bad and
hopefully what exactly is triggering it? In 2 cases there was
bi-directional traffic, in the other 2 cases there was only incoming
traffic. In all cases the nic became unresponsive, all connections to
the box died/timed out. After roughly 10 minutes the nic came back to
life on its own without my intervention. The odd thing I noticed was
the dmesg log showing the problem only shows up once in dmesg instead
of 4 times to coincide with the 4 crashes/freezes. I don't think the
box reboot itself, and if it did then dmesg should be clean.

Please let me know what you think and how we should proceed. Thanks!
Derek

[1126370.489823] ------------[ cut here ]------------
[1126370.489905] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
[1126370.489989] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
dev_watchdog+0x1bb/0x1e0
[1126370.490040] Modules linked in: snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi ohci_pci snd_hda_intel
snd_hda_codec snd_hwdep snd_hda_core ohci_hcd ehci_pci ehci_hcd
xhci_pci xhci_hcd snd_pcm usbcore snd_timer usb_common snd soundcore
nfsd auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables
ipv6
[1126370.490118] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #3
[1126370.490152] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
[1126370.490187] EIP: dev_watchdog+0x1bb/0x1e0
[1126370.490220] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 e5
ac ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
00 00
[1126370.490271] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
[1126370.490305] ESI: f4cb8264 EDI: f4cb8000 EBP: f4c99f74 ESP: f4c99f4c
[1126370.490339] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
[1126370.490374] CR0: 80050033 CR2: b719b10c CR3: 0d7a6000 CR4: 00000690
[1126370.490407] Call Trace:
[1126370.490439]  <SOFTIRQ>
[1126370.490473]  ? qdisc_put_unlocked+0x40/0x40
[1126370.490508]  call_timer_fn+0x19/0xa0
[1126370.490541]  run_timer_softirq+0x337/0x380
[1126370.490572]  ? qdisc_put_unlocked+0x40/0x40
[1126370.490601]  ? rcu_process_callbacks+0xcb/0x380
[1126370.490630]  ? net_rx_action+0x152/0x220
[1126370.490660]  __do_softirq+0xd6/0x21c
[1126370.490688]  ? __irqentry_text_end+0x18/0x18
[1126370.490718]  call_on_stack+0x10/0x60
[1126370.490746]  </SOFTIRQ>
[1126370.490774]  ? irq_exit+0x91/0xc0
[1126370.490802]  ? smp_apic_timer_interrupt+0x56/0xa0
[1126370.490831]  ? apic_timer_interrupt+0xd5/0xdc
[1126370.490863]  ? acpi_idle_enter_s2idle+0x60/0x60
[1126370.490893]  ? cpuidle_enter_state+0x122/0x360
[1126370.490922]  ? tick_nohz_idle_stop_tick+0x27b/0x2a0
[1126370.490952]  ? cpuidle_enter+0xf/0x20
[1126370.490981]  ? call_cpuidle+0x1c/0x40
[1126370.491011]  ? do_idle+0x1e6/0x220
[1126370.491039]  ? cpu_startup_entry+0x25/0x40
[1126370.491068]  ? start_secondary+0x1a5/0x220
[1126370.491097]  ? startup_32_smp+0x15f/0x164
[1126370.491126] ---[ end trace 8a26c9ba9d20b87b ]---

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-04-04 14:28                                             ` VDR User
@ 2019-04-04 19:41                                               ` Heiner Kallweit
  2019-04-04 22:11                                                 ` VDR User
  0 siblings, 1 reply; 30+ messages in thread
From: Heiner Kallweit @ 2019-04-04 19:41 UTC (permalink / raw)
  To: VDR User; +Cc: netdev

On 04.04.2019 16:28, VDR User wrote:
> Hi,
> 
> Unfortunately I have to report that the crashing has resurfaced. I'm
> currently using kernel 5.0 with Heiner's fix applied. In the last few
> days I've had the crash occur 4 times now. I'm not sure how to further
> investigate this but I'm guessing a patch that adds more debugging
> output will be needed to track down where exactly things go bad and
> hopefully what exactly is triggering it? In 2 cases there was
> bi-directional traffic, in the other 2 cases there was only incoming
> traffic. In all cases the nic became unresponsive, all connections to
> the box died/timed out. After roughly 10 minutes the nic came back to
> life on its own without my intervention. The odd thing I noticed was
> the dmesg log showing the problem only shows up once in dmesg instead
> of 4 times to coincide with the 4 crashes/freezes. I don't think the
> box reboot itself, and if it did then dmesg should be clean.
> 
> Please let me know what you think and how we should proceed. Thanks!
> Derek
> 
Instead of having to wait for the issue: Can you trigger it with iperf3
in either direction?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
  2019-04-04 19:41                                               ` Heiner Kallweit
@ 2019-04-04 22:11                                                 ` VDR User
  0 siblings, 0 replies; 30+ messages in thread
From: VDR User @ 2019-04-04 22:11 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: netdev

> > Unfortunately I have to report that the crashing has resurfaced. I'm
> > currently using kernel 5.0 with Heiner's fix applied. In the last few
> > days I've had the crash occur 4 times now. I'm not sure how to further
> > investigate this but I'm guessing a patch that adds more debugging
> > output will be needed to track down where exactly things go bad and
> > hopefully what exactly is triggering it? In 2 cases there was
> > bi-directional traffic, in the other 2 cases there was only incoming
> > traffic. In all cases the nic became unresponsive, all connections to
> > the box died/timed out. After roughly 10 minutes the nic came back to
> > life on its own without my intervention. The odd thing I noticed was
> > the dmesg log showing the problem only shows up once in dmesg instead
> > of 4 times to coincide with the 4 crashes/freezes. I don't think the
> > box reboot itself, and if it did then dmesg should be clean.
> >
> Instead of having to wait for the issue: Can you trigger it with iperf3
> in either direction?

I'm not familiar with iperf3 but installed it and looked up a quick
example. I started the iperf3 server on the server and then ran the
test on the client a bunch of times. And then also tried the test with
the `-d` bi-directional flag. I couldn't get the crash/freeze to
trigger though. Is there a certain command line I should run the test
with, or anything specific I should do?

-Derek

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-04-04 22:11 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-10 19:02 r8169 driver from kernel 5.0 crashing VDR User
2019-03-10 19:35 ` Heiner Kallweit
2019-03-14  3:04   ` VDR User
2019-03-14  6:43     ` Heiner Kallweit
2019-03-14 15:10       ` VDR User
2019-03-14 18:08         ` Heiner Kallweit
2019-03-14 21:36           ` VDR User
2019-03-15 20:09             ` VDR User
2019-03-15 20:26               ` r8169 driver from kernel 5.0 crashing - napi_consume_skb Heiner Kallweit
2019-03-15 20:40                 ` Alexander Duyck
2019-03-15 20:46                   ` Heiner Kallweit
2019-03-15 22:09                     ` Alexander Duyck
2019-03-15 22:27                       ` Heiner Kallweit
     [not found]                         ` <b470f0ee6071bded8e78e15e18df14ffbc32cd18.camel@linux.intel.com>
2019-03-16  9:07                           ` Heiner Kallweit
2019-03-16 14:38                           ` VDR User
2019-03-16 15:10                             ` Heiner Kallweit
2019-03-16 17:08                               ` Alexander Duyck
2019-03-17  8:25                                 ` Heiner Kallweit
2019-03-17 11:35                             ` Heiner Kallweit
2019-03-17 15:23                               ` VDR User
2019-03-17 15:41                               ` VDR User
2019-03-17 15:49                                 ` Heiner Kallweit
2019-03-18 16:17                                   ` VDR User
2019-03-18 18:36                                     ` Heiner Kallweit
2019-03-21 19:35                                       ` VDR User
2019-03-22 19:03                                         ` Heiner Kallweit
2019-03-22 20:44                                           ` VDR User
2019-04-04 14:28                                             ` VDR User
2019-04-04 19:41                                               ` Heiner Kallweit
2019-04-04 22:11                                                 ` VDR User

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.