* [BUG] igb: reconnecting of cable not always detected
From: Holger Schurig @ 2018-04-24 15:14 UTC
To: jeffrey.t.kirsher, intel-wired-lan, linux-kernel

Hi all,

I'm on kernel 4.16.4 and have an issue with eth0, driver is igb. When I
remove the Ethernet cable, this is always detected:

[ 2.772360] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[ 2.772363] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 3.023707] igb 0000:02:00.0: added PHC on eth0
[ 3.023710] igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 3.023713] igb 0000:02:00.0: eth0: (PCIe:2.5Gb/s:Width x1) 00:13:95:1a:54:33
[ 3.023758] igb 0000:02:00.0: eth0: PBA No: 000300-000
[ 3.023762] igb 0000:02:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[ 7.984921] igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 11.184593] igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Down

Sometimes, plugging the cable back in is detected ...

[ 43.736922] igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

... but sometimes this is *NOT* detected. I can put the cable in and
even after two minutes nothing has been detected.

But when I run "rmmod igb" followed by "modprobe igb", the link is
detected again:

[ 100.528609] igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Down
[ 2336.583244] igb 0000:02:00.0: removed PHC on eth0
[ 2339.693521] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[ 2339.693524] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 2339.990553] pps pps0: new PPS source ptp0
[ 2339.990561] igb 0000:02:00.0: added PHC on eth0
[ 2339.990565] igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 2339.990569] igb 0000:02:00.0: eth0: (PCIe:2.5Gb/s:Width x1) 00:13:95:1a:54:33
[ 2339.990611] igb 0000:02:00.0: eth0: PBA No: 000300-000
[ 2339.990615] igb 0000:02:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[ 2343.001114] igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

(In the above dmesg snippet, the Ethernet cable was inserted the whole time.)

Any tips on how I can debug this further?

PS: I already tried a different switch and also a direct device-to-device
connection, without a switch.
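To separate "the driver never noticed the carrier" from "the PHY never brought the link up", it can help to log what the kernel itself reports, independent of dmesg. A minimal sketch, assuming the standard sysfs attributes /sys/class/net/<iface>/operstate and /sys/class/net/<iface>/carrier; the function names are hypothetical and the polling interval is an arbitrary choice:

```python
import time
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return (operstate, carrier) as the kernel currently reports them.

    carrier is True/False, or None if the attribute cannot be read
    (the kernel refuses to report carrier while the interface is down).
    """
    base = Path(sysfs_root) / iface
    operstate = (base / "operstate").read_text().strip()
    try:
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        carrier = None
    return operstate, carrier


def watch_link(iface, interval=0.5, sysfs_root="/sys/class/net"):
    """Print a timestamped line every time (operstate, carrier) changes."""
    last = None
    while True:
        state = read_link_state(iface, sysfs_root)
        if state != last:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} {iface}: operstate={state[0]} carrier={state[1]}")
            last = state
        time.sleep(interval)
```

Running watch_link("eth0") while unplugging and replugging the cable shows whether carrier ever returns to True in the failing case; stop it with Ctrl-C.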
* Re: [BUG] igb: reconnecting of cable not always detected
From: Alexander Duyck @ 2018-04-24 18:09 UTC
To: Holger Schurig
Cc: Jeff Kirsher, intel-wired-lan, LKML

On Tue, Apr 24, 2018 at 8:14 AM, Holger Schurig <holgerschurig@gmail.com> wrote:
> Hi all,
>
> I'm on kernel 4.16.4 and have an issue with eth0, driver is igb. When I
> remove the Ethernet cable, this is always detected:
>
> [...]
>
> Sometimes, plugging the cable back in is detected ...
>
> [...]
>
> ... but sometimes this is *NOT* detected. I can put the cable in and
> even after two minutes nothing has been detected.
>
> But when I run "rmmod igb" followed by "modprobe igb", the link is
> detected again:
>
> [...]
>
> (In the above dmesg snippet, the Ethernet cable was inserted the whole time.)
>
> Any tips on how I can debug this further?
>
> PS: I already tried a different switch and also a direct device-to-device
> connection, without a switch.

Sounds like the link is failing to re-establish. You might double
check a few things. One is to verify if the link partner is
recognizing the link as coming up or not. That would help to tell us
whether this is a problem of the driver detecting the link, or of the
link itself not being re-established.

Another thing you could try is running "ethtool -r eth0" after
plugging the cable in, to see if that re-establishes the link or not.
It should be easier anyway than having to unload and reload the driver.

If you could also provide an "lspci -vvv" and "ethtool -i" for the
device, it would help us in the debugging process, as it would tell us
which NIC you are using and what firmware is in use on it.

Thanks.

- Alex
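Both "ethtool -r eth0" and the "Link detected:" line of plain "ethtool eth0" go through the SIOCETHTOOL ioctl. If the replug test has to be repeated many times, a minimal Python sketch of the same two requests might look like this. It assumes the ETHTOOL_NWAY_RST and ETHTOOL_GLINK command values and the struct ethtool_value layout from <linux/ethtool.h>; the helper names are hypothetical, and restarting autonegotiation needs CAP_NET_ADMIN:

```python
import array
import fcntl
import socket
import struct

SIOCETHTOOL = 0x8946           # <linux/sockios.h>
ETHTOOL_NWAY_RST = 0x00000009  # restart autonegotiation, as "ethtool -r" does
ETHTOOL_GLINK = 0x0000000A     # read link state ("Link detected:" in "ethtool")


def pack_ifreq(iface, data_addr):
    """struct ifreq: 16-byte interface name, then the ifr_data pointer.

    Padded to 40 bytes, the size of struct ifreq on 64-bit Linux.
    """
    return struct.pack("16sP", iface.encode(), data_addr).ljust(40, b"\0")


def ethtool_value(iface, cmd):
    """Send an ethtool command that uses struct ethtool_value {u32 cmd; u32 data;}."""
    buf = array.array("I", [cmd, 0])  # the kernel writes its reply into buf[1]
    addr, _ = buf.buffer_info()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Raises OSError (for example errno ENODEV) if the kernel rejects the request.
        fcntl.ioctl(s.fileno(), SIOCETHTOOL, pack_ifreq(iface, addr))
    return buf[1]


def link_detected(iface):
    return bool(ethtool_value(iface, ETHTOOL_GLINK))


def restart_autoneg(iface):
    ethtool_value(iface, ETHTOOL_NWAY_RST)
```

For example, after replugging the cable: if link_detected("eth0") stays False, call restart_autoneg("eth0") as root and check again.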
* Re: [BUG] igb: reconnecting of cable not always detected
From: Richard Cochran @ 2018-04-25 3:30 UTC
To: Alexander Duyck
Cc: Holger Schurig, Jeff Kirsher, intel-wired-lan, LKML

On Tue, Apr 24, 2018 at 11:09:02AM -0700, Alexander Duyck wrote:
> On Tue, Apr 24, 2018 at 8:14 AM, Holger Schurig <holgerschurig@gmail.com> wrote:
> > Sometimes, plugging the cable back in is detected ...
> >
> > [ 43.736922] igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> >
> > ... but sometimes this is *NOT* detected. I can put the cable in and
> > even after two minutes nothing has been detected.
> >
> > But when I run "rmmod igb" followed by "modprobe igb", the link is
> > detected again:

FWIW, I have noticed over the past months (or even years?) that my i210
cards (or the igb driver) also fail to detect link changes after a few
physical link interruptions. I never bothered to try to debug this, but
it is super annoying.

Thanks,
Richard
* Re: [BUG] igb: reconnecting of cable not always detected
From: Holger Schurig @ 2018-04-25 9:47 UTC
To: Alexander Duyck
Cc: Jeff Kirsher, intel-wired-lan, LKML

Hi Alex,

(Sent a 2nd time, this time with "Reply to all" and without HTML, so
that it hits the kernel archives as well. Sorry for the noise.)

> Sounds like the link is failing to re-establish. You might double
> check a few things. One is to verify if the link partner is
> recognizing the link as coming up or not.

It lights up differently. Before I removed the cable, the LED on the TP
LINK "TL SG-108" was green. After removing the cable, the LED went off.
After reinserting the cable, it became orange after a while.

Green LED means 1000 Mb/s, orange LED means 10/100 Mb/s.

I have a different, even older switch: "Allnet ALL8039". Same here: the
switch detects a link, but igb does not.
> If you could also provide an "lspci -vvv"

02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 19
    Region 0: Memory at 90600000 (32-bit, non-prefetchable) [size=512K]
    Region 2: I/O ports at d000 [size=32]
    Region 3: Memory at 90680000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-13-95-ff-ff-1a-54-33
    Capabilities: [1a0 v1] Transaction Processing Hints
        Device specific mode supported
        Steering table in TPH capability structure
    Kernel driver in use: igb
    Kernel modules: igb

> and "ethtool -i" for the

driver: igb
version: 5.4.0-k
firmware-version: 3.20, 0x80000553
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

One thing that is interesting is how igb reacts to ethtool inquiries
once it goes into the failed state. You inquired for "ethtool -i eth0",
but in the failed state I only get this:

Cannot restart autonegotiation: No such device

But eth0 is of course still there; "ip -d link show eth0" shows:

2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:13:95:1a:54:33 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535

Other ethtool commands also don't report any information once the link
went bogus.
Here is one output from "ethtool eth0":

Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off (auto)
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
                           drv probe link
    Link detected: yes

... and here is another:

Settings for eth0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
Settings for eth0:
No data available

I'm willing to pepper the source with printk, if this helps :-)

Greetings,
Holger
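The "No such device" errors while "ip link" still lists eth0 fit a netdev that has been detached rather than unregistered: in the kernel's ethtool core (net/core/ethtool.c), dev_ethtool() returns -ENODEV when netif_device_present() is false. A quick, non-invasive check in that state is whether the NIC still answers PCI config reads; a vendor ID of all ones (0xffff) typically means the device is no longer responding on the bus. A minimal triage sketch, assuming the standard sysfs layout and the bus address 0000:02:00.0 from the lspci output above; the helper names and result strings are hypothetical:

```python
import errno
import os


def netdev_listed(iface, sysfs_root="/sys/class/net"):
    """True if the kernel still has a net_device registered under this name."""
    return os.path.isdir(os.path.join(sysfs_root, iface))


def pci_vendor_id(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read the 16-bit little-endian vendor ID at offset 0 of PCI config space."""
    with open(os.path.join(sysfs_root, bdf, "config"), "rb") as f:
        return int.from_bytes(f.read(2), "little")


def triage(listed, ethtool_errno, vendor_id):
    """Rough, heuristic classification of the failed state."""
    if vendor_id == 0xFFFF:
        return "config reads return all ones: device not responding on PCIe"
    if listed and ethtool_errno == errno.ENODEV:
        return "netdev registered but detached"
    if listed:
        return "netdev present and attached"
    return "netdev not registered"
```

In the failed state described above, triage(netdev_listed("eth0"), errno.ENODEV, pci_vendor_id("0000:02:00.0")) separates a detached but still reachable device from one that has dropped off the bus.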
* Re: [BUG] igb: reconnecting of cable not always detected
From: Alexander Duyck @ 2018-04-25 16:01 UTC
To: Holger Schurig
Cc: Jeff Kirsher, intel-wired-lan, LKML

On Wed, Apr 25, 2018 at 2:47 AM, Holger Schurig <holgerschurig@gmail.com> wrote:
> Hi Alex,
>
> (Sent a 2nd time, this time with "Reply to all" and without HTML, so
> that it hits the kernel archives as well. Sorry for the noise.)
>
>> Sounds like the link is failing to re-establish. You might double
>> check a few things. One is to verify if the link partner is
>> recognizing the link as coming up or not.
>
> It lights up differently. Before I removed the cable, the LED on the TP
> LINK "TL SG-108" was green. After removing the cable, the LED went off.
> After reinserting the cable, it became orange after a while.
>
> Green LED means 1000 Mb/s, orange LED means 10/100 Mb/s.

Was the orange LED on the igb NIC or on the TL SG-108? Based on the
comment below, I am assuming it is the switch. Based on that, I am
thinking we probably need to work on the PHY configuration.

> I have a different, even older switch: "Allnet ALL8039". Same here: the
> switch detects a link, but igb does not.
>
>> If you could also provide an "lspci -vvv"
>
> 02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
> Connection (rev 03)

Okay, so we are working with an i210.

> [rest of the lspci -vvv and "ethtool -i" output snipped]
>
> One thing that is interesting is how igb reacts to ethtool inquiries
> once it goes into the failed state. You inquired for "ethtool -i eth0",
> but in the failed state I only get this:
>
> Cannot restart autonegotiation: No such device

I assume you mean "ethtool -r", since that is what is supposed to
restart negotiation. The "ethtool -i" output is what you provided above.

The fact that the device disappears is a bit concerning. I'm wondering
if we are somehow triggering the surprise removal code.

> But eth0 is of course still there; "ip -d link show eth0" shows:
>
> 2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
> mode DEFAULT group default qlen 1000
> link/ether 00:13:95:1a:54:33 brd ff:ff:ff:ff:ff:ff promiscuity 0
> numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535
>
> Other ethtool commands also don't report any information once the link
> went bogus. Here is one output from "ethtool eth0":
>
> [...]
>
> ... and here is another:
>
> Settings for eth0:
> Cannot get device settings: No such device
> Cannot get wake-on-lan settings: No such device
> Cannot get message level: No such device
> Cannot get link status: No such device
> Settings for eth0:
> No data available
>
> I'm willing to pepper the source with printk, if this helps :-)
>
> Greetings,
> Holger

Thanks. I suspect we may need to instrument igb_rd32 at this point. To
trigger what you are seeing, I am assuming the device has been detached
due to a read failure of some sort.

Another thing you could do is narrow down the possible factors
involved. You could go through and limit PHY settings, and possibly
drop features such as EEE if it is enabled on the device.

Thanks.

- Alex
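Alex's suggestion to rule out Energy-Efficient Ethernet can be tried from the command line with "ethtool --show-eee eth0" and "ethtool --set-eee eth0 eee off". For a scripted replug loop, a minimal sketch of the same ETHTOOL_GEEE/ETHTOOL_SEEE ioctls follows. It assumes the struct ethtool_eee layout (eight u32 fields plus u32 reserved[2]) from <linux/ethtool.h> of this kernel era; the helper names are hypothetical, setting requires CAP_NET_ADMIN, and whether EEE plays any role in this bug is not established in the thread:

```python
import array
import fcntl
import socket
import struct

SIOCETHTOOL = 0x8946
ETHTOOL_GEEE = 0x00000044
ETHTOOL_SEEE = 0x00000045
# struct ethtool_eee: these eight u32 fields, followed by u32 reserved[2]
EEE_FIELDS = ("cmd", "supported", "advertised", "lp_advertised",
              "eee_active", "eee_enabled", "tx_lpi_enabled", "tx_lpi_timer")
EEE_ENABLED = EEE_FIELDS.index("eee_enabled")


def _ethtool_ioctl(iface, buf):
    """Pass buf (an array of u32) to the kernel via SIOCETHTOOL; updated in place."""
    addr, _ = buf.buffer_info()
    ifreq = struct.pack("16sP", iface.encode(), addr).ljust(40, b"\0")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        fcntl.ioctl(s.fileno(), SIOCETHTOOL, ifreq)


def eee_as_dict(buf):
    """Name the fields of a struct ethtool_eee buffer (reserved words dropped)."""
    return dict(zip(EEE_FIELDS, buf))


def get_eee(iface):
    buf = array.array("I", [ETHTOOL_GEEE] + [0] * 9)
    _ethtool_ioctl(iface, buf)
    return buf


def disable_eee(iface):
    """Read the current EEE settings, clear eee_enabled, and write them back."""
    buf = get_eee(iface)
    buf[0] = ETHTOOL_SEEE
    buf[EEE_ENABLED] = 0
    _ethtool_ioctl(iface, buf)
```

Calling eee_as_dict(get_eee("eth0")) before and after disable_eee("eth0") shows whether the driver accepted the change; the unplug/replug test can then be repeated with EEE off.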
* Re: [BUG] igb: reconnecting of cable not always detected
  2018-04-25 16:01 ` [Intel-wired-lan] " Alexander Duyck
@ 2018-04-26 7:54 ` Holger Schurig
From: Holger Schurig @ 2018-04-26 7:54 UTC
To: Alexander Duyck; +Cc: Jeff Kirsher, intel-wired-lan, LKML

> Was the orange LED on the igb NIC or on the TL SG-108? Based on the
> comment below I am assuming it is the switch.

The LEDs were on the switch. When everything works, the switch shows
green (1000 Mb/s). When the cable is disconnected, the switch doesn't
light any LED. When the cable is inserted and things fail, the switch
shows orange (100 Mb/s). When the insertion works, the switch of
course goes to green (1000 Mb/s).

I must admit that I hadn't looked at the LEDs on the device itself.
Now I have: the left (green) LED is on. In the failed case (i.e. the
last thing I see in dmesg is "Link is Down"), the device still has its
left (green) LED on. The right (orange) LED on the device seems to
indicate traffic, and it stays off in the failed case.

> I assume you mean "ethtool -r" since that is what is supposed to be
> restarting negotiation. The "ethtool -i" is what you provided above.

Maybe I edited my text too much and moved output around. Anyway, in
the failed case none of "ethtool -r eth0", "ethtool -i eth0" or
"mii-tool eth0" work at all; they all print error messages.

> Thanks. I'm suspecting we may need to instrument igb_rd32 at this
> point. In order to trigger what you are seeing I am assuming the
> device has been detached due to a read failure of some sort.

I'll do that and reply later. First I need to understand this part of
the source :-)

> Another thing you could look at doing is narrowing down the possible
> factors involved. You could go through and limit phy settings and look
> at possibly dropping features such as EEE if it is enabled on the
> device.
I actually tried a driver patch that removes 1000 Mb/s from the
driver, on the assumption that this specific hardware might have a bad
board layout and thus trouble at gigabit speed (I don't really believe
that, because I never observed any data transfer problem). So, is the
following patch (which didn't help) in line with what you suggested?

Index: linux-4.16/drivers/net/ethernet/intel/igb/igb_main.c
===================================================================
--- linux-4.16.orig/drivers/net/ethernet/intel/igb/igb_main.c	2018-04-01 23:20:27.000000000 +0200
+++ linux-4.16/drivers/net/ethernet/intel/igb/igb_main.c	2018-04-24 11:35:17.420760650 +0200
@@ -2080,7 +2080,7 @@
 
 	if ((adapter->flags & IGB_FLAG_EEE) &&
 	    (!hw->dev_spec._82575.eee_disable))
-		adapter->eee_advert = MDIO_EEE_100TX | MDIO_EEE_1000T;
+		adapter->eee_advert = MDIO_EEE_100TX /* | MDIO_EEE_1000T */;
 
 	return 0;
 }
@@ -2908,7 +2908,7 @@
 	/* Initialize link properties that are user-changeable */
 	adapter->fc_autoneg = true;
 	hw->mac.autoneg = true;
-	hw->phy.autoneg_advertised = 0x2f;
+	hw->phy.autoneg_advertised = 0x0f;
 
 	hw->fc.requested_mode = e1000_fc_default;
 	hw->fc.current_mode = e1000_fc_default;
@@ -3099,7 +3099,7 @@
 			if ((!err) &&
 			    (!hw->dev_spec._82575.eee_disable)) {
 				adapter->eee_advert =
-					MDIO_EEE_100TX | MDIO_EEE_1000T;
+					MDIO_EEE_100TX /* | MDIO_EEE_1000T */;
 				adapter->flags |= IGB_FLAG_EEE;
 			}
 			break;
@@ -3110,7 +3110,7 @@
 			if ((!err) &&
 			    (!hw->dev_spec._82575.eee_disable)) {
 				adapter->eee_advert =
-					MDIO_EEE_100TX | MDIO_EEE_1000T;
+					MDIO_EEE_100TX /* | MDIO_EEE_1000T */;
 				adapter->flags |= IGB_FLAG_EEE;
 			}
 		}
Index: linux-4.16/drivers/net/ethernet/intel/igb/igb_ethtool.c
===================================================================
--- linux-4.16.orig/drivers/net/ethernet/intel/igb/igb_ethtool.c	2018-04-01 23:20:27.000000000 +0200
+++ linux-4.16/drivers/net/ethernet/intel/igb/igb_ethtool.c	2018-04-24 11:42:36.737959749 +0200
@@ -170,7 +170,7 @@
 			     SUPPORTED_10baseT_Full |
 			     SUPPORTED_100baseT_Half |
 			     SUPPORTED_100baseT_Full |
-			     SUPPORTED_1000baseT_Full|
+			     /* SUPPORTED_1000baseT_Full| */
 			     SUPPORTED_Autoneg |
 			     SUPPORTED_TP |
 			     SUPPORTED_Pause);
@@ -3003,7 +3003,7 @@
 	    (hw->phy.media_type != e1000_media_type_copper))
 		return -EOPNOTSUPP;
 
-	edata->supported = (SUPPORTED_1000baseT_Full |
+	edata->supported = (/* SUPPORTED_1000baseT_Full | */
 			    SUPPORTED_100baseT_Full);
 	if (!hw->dev_spec._82575.eee_disable)
 		edata->advertised =
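To see what this patch actually changes, it helps to decode the two masks. Assuming the bit values below (they mirror ADVERTISE_* in igb's e1000_defines.h and MDIO_EEE_* in <linux/mdio.h> as far as I know; please check against your tree), the stock autoneg_advertised value of 0x2f advertises 10 and 100 Mb/s at half and full duplex plus 1000 full, never 1000 half, and 0x0f removes only 1000 full. The EEE hunks keep MDIO_EEE_100TX, so EEE is still advertised at 100 Mb/s: the patch narrows EEE rather than dropping it as Alex suggested. As far as I can tell, the two igb_ethtool.c hunks only change what is reported back to ethtool, not what the PHY advertises. The names ADV_*, EEE_*, decode_mask() and the EEE labels are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical local names; values assumed to mirror ADVERTISE_* in
 * igb's e1000_defines.h and MDIO_EEE_* in <linux/mdio.h>. */
#define ADV_10_HALF   0x0001
#define ADV_10_FULL   0x0002
#define ADV_100_HALF  0x0004
#define ADV_100_FULL  0x0008
#define ADV_1000_HALF 0x0010
#define ADV_1000_FULL 0x0020

#define EEE_100TX 0x0002
#define EEE_1000T 0x0004

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

struct bit_name {
	uint16_t bit;
	const char *name;
};

static const struct bit_name adv_names[] = {
	{ ADV_10_HALF, "10baseT/Half" },
	{ ADV_10_FULL, "10baseT/Full" },
	{ ADV_100_HALF, "100baseT/Half" },
	{ ADV_100_FULL, "100baseT/Full" },
	{ ADV_1000_HALF, "1000baseT/Half" },
	{ ADV_1000_FULL, "1000baseT/Full" },
};

static const struct bit_name eee_names[] = {
	{ EEE_100TX, "100baseTX-EEE" },
	{ EEE_1000T, "1000baseT-EEE" },
};

/* Write the names of all bits set in mask into buf, space separated. */
static void decode_mask(uint16_t mask, const struct bit_name *tbl, size_t n,
			char *buf, size_t len)
{
	assert(len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!(mask & tbl[i].bit))
			continue;
		size_t used = strlen(buf);
		snprintf(buf + used, len - used, "%s%s", used ? " " : "",
			 tbl[i].name);
	}
}
```

For example, decode_mask(0x0f, adv_names, ARRAY_LEN(adv_names), buf, sizeof buf) yields the four 10/100 modes only. For runtime experiments without rebuilding the driver, ethtool's "-s eth0 advertise N" should accept the same low six bits for these modes, and "--set-eee eth0 eee off" turns EEE off entirely.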
* Re: [BUG] igb: reconnecting of cable not always detected
  2018-04-25 16:01 ` [Intel-wired-lan] " Alexander Duyck
@ 2018-04-26 9:08 ` Holger Schurig
From: Holger Schurig @ 2018-04-26 9:08 UTC
To: Alexander Duyck; +Cc: Jeff Kirsher, intel-wired-lan, LKML

Hi,

> Thanks. I'm suspecting we may need to instrument igb_rd32 at this
> point. In order to trigger what you are seeing I am assuming the
> device has been detached due to a read failure of some sort.

Okay, I added a printk to igb_rd32. And because no one calls this
function directly (all access goes via the rd32/array_rd32 macros), I
also print the name of the calling function. This should help greatly
in matching each read from the hardware to its consumer.

Finally, I noticed that igb_update_stats() produced a lot of churn that
is most likely unrelated, so I added a helper variable to make the
output from this function go away.

I installed this modified driver, rebooted, and removed / inserted the
LAN cable until the error was present.

As before, "ethtool" and "mii-tool" now said that the device is not
there, while "ip link" showed the device as present.

The full output of "journalctl -fk | grep igb" is 600 kB, so I put the
whole file on Google Drive:

https://drive.google.com/open?id=1p9cCT2d_EHnSHh29oS3AepUgFTKGFSeA

I looked at the output to find patterns, e.g. with

grep -n igb_get_cfg_done_i210 igb.error.txt
grep -n __igb_shutdown igb.error.txt
...

(and almost all other function names), hoping to see a pattern. But to
my untrained eye, nothing looked out of the ordinary.
(For reference, here is the debug patch)

Index: linux-4.16/drivers/net/ethernet/intel/igb/igb_main.c
===================================================================
--- linux-4.16.orig/drivers/net/ethernet/intel/igb/igb_main.c	2018-04-01 23:20:27.000000000 +0200
+++ linux-4.16/drivers/net/ethernet/intel/igb/igb_main.c	2018-04-26 10:36:09.625135952 +0200
@@ -759,7 +759,8 @@
 	}
 }
 
-u32 igb_rd32(struct e1000_hw *hw, u32 reg)
+int igb_rd32_silent = 0;
+u32 igb_rd32(const char *func, struct e1000_hw *hw, u32 reg)
 {
 	struct igb_adapter *igb = container_of(hw, struct igb_adapter, hw);
 	u8 __iomem *hw_addr = READ_ONCE(hw->hw_addr);
@@ -769,6 +770,8 @@
 		return ~value;
 
 	value = readl(&hw_addr[reg]);
+	if (!igb_rd32_silent)
+		printk("rd32 %s %08x %08x\n", func, reg, value);
 
 	/* reads should not return all F's */
 	if (!(~value) && (!reg || !(~readl(hw_addr)))) {
@@ -5935,6 +5938,7 @@
 	if (pci_channel_offline(pdev))
 		return;
 
+	igb_rd32_silent = 1;
 	bytes = 0;
 	packets = 0;
 
@@ -6100,6 +6104,7 @@
 		adapter->stats.b2ospc += rd32(E1000_B2OSPC);
 		adapter->stats.b2ogprc += rd32(E1000_B2OGPRC);
 	}
+	igb_rd32_silent = 0;
 }
 
 static void igb_tsync_interrupt(struct igb_adapter *adapter)
Index: linux-4.16/drivers/net/ethernet/intel/igb/e1000_regs.h
===================================================================
--- linux-4.16.orig/drivers/net/ethernet/intel/igb/e1000_regs.h	2018-04-01 23:20:27.000000000 +0200
+++ linux-4.16/drivers/net/ethernet/intel/igb/e1000_regs.h	2018-04-26 10:34:24.332157000 +0200
@@ -370,7 +370,8 @@
 
 struct e1000_hw;
 
-u32 igb_rd32(struct e1000_hw *hw, u32 reg);
+extern int igb_rd32_silent;
+u32 igb_rd32(const char *fname, struct e1000_hw *hw, u32 reg);
 
 /* write operations, indexed using DWORDS */
 #define wr32(reg, val) \
@@ -380,14 +381,14 @@
 	writel((val), &hw_addr[(reg)]); \
 } while (0)
 
-#define rd32(reg) (igb_rd32(hw, reg))
+#define rd32(reg) (igb_rd32(__func__, hw, reg))
 
 #define wrfl() ((void)rd32(E1000_STATUS))
 
 #define array_wr32(reg, offset, value) \
 	wr32((reg) + ((offset) << 2), (value))
 
-#define array_rd32(reg, offset) (igb_rd32(hw, reg + ((offset) << 2)))
+#define array_rd32(reg, offset) (igb_rd32(__func__, hw, reg + ((offset) << 2)))
 
 /* DMA Coalescing registers */
 #define E1000_PCIEMISC 0x05BB8 /* PCIE misc config register */
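The debug patch can log the caller without touching any call site because __func__ is a predefined identifier of whichever function the rd32() macro expands in, so each call passes its own name. Here is the same pattern stripped down to a stand-alone sketch; trace_rd32(), read_status(), fake_regs and the last_log buffer (standing in for printk) are all hypothetical. One hedged caveat: igb_rd32_silent is a plain global toggled inside igb_update_stats(), so reads that other contexts (for example the interrupt handler) happen to perform while statistics are being updated would presumably be hidden from the trace as well.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t fake_regs[8];  /* hypothetical register file */
static char last_log[128];     /* stands in for the printk() output */
static int trace_silent;       /* models igb_rd32_silent */

static uint32_t trace_rd32(const char *func, uint32_t reg)
{
	assert(reg / 4 < sizeof(fake_regs) / sizeof(fake_regs[0]));
	uint32_t value = fake_regs[reg / 4];

	/* Same format as the debug patch: "rd32 <caller> <reg> <value>". */
	if (!trace_silent)
		snprintf(last_log, sizeof(last_log),
			 "rd32 %s %08" PRIx32 " %08" PRIx32, func, reg, value);
	return value;
}

/* __func__ is evaluated where the macro expands, i.e. in the caller. */
#define rd32(reg) trace_rd32(__func__, (reg))

static uint32_t read_status(void)
{
	return rd32(0x00008); /* offset of E1000_STATUS */
}
```

Calling read_status() records "rd32 read_status 00000008 ..." rather than the name of trace_rd32(), which is what makes the grep-by-function approach in the message above work.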
* Re: [BUG] igb: reconnecting of cable not always detected
  2018-04-26 9:08 ` [Intel-wired-lan] " Holger Schurig
@ 2018-04-26 16:02 ` Alexander Duyck
From: Alexander Duyck @ 2018-04-26 16:02 UTC
To: Holger Schurig; +Cc: Jeff Kirsher, intel-wired-lan, LKML

On Thu, Apr 26, 2018 at 2:08 AM, Holger Schurig <holgerschurig@gmail.com> wrote:
> Hi,
>
>> Thanks. I'm suspecting we may need to instrument igb_rd32 at this
>> point. In order to trigger what you are seeing I am assuming the
>> device has been detached due to a read failure of some sort.
>
> Okay, I added a printk to igb_rd32. And because no one calls this
> function directly (all access goes via the rd32/array_rd32 macros), I
> also print the name of the calling function. This should help greatly
> in matching each read from the hardware to its consumer.
>
> Finally, I noticed that igb_update_stats() produced a lot of churn that
> is most likely unrelated, so I added a helper variable to make the
> output from this function go away.
>
> I installed this modified driver, rebooted, and removed / inserted the
> LAN cable until the error was present.
>
> As before, "ethtool" and "mii-tool" now said that the device is not
> there, while "ip link" showed the device as present.
>
> The full output of "journalctl -fk | grep igb" is 600 kB, so I put the
> whole file on Google Drive:
>
> https://drive.google.com/open?id=1p9cCT2d_EHnSHh29oS3AepUgFTKGFSeA
>
> I looked at the output to find patterns, e.g. with
>
> grep -n igb_get_cfg_done_i210 igb.error.txt
> grep -n __igb_shutdown igb.error.txt
> ...
>
> (and almost all other function names), hoping to see a pattern. But to
> my untrained eye, nothing looked out of the ordinary.

Thanks for the data. It is actually useful. There are a few things
that I see that seem to point to an obvious issue.
The first are the following 2 lines from your dump:
Apr 26 10:42:49 kernel: igb 0000:02:00.0 eth0: igb: eth0 NIC Link is
Up 1000 Mbps Half Duplex, Flow Control: RX
Apr 26 10:42:49 kernel: igb 0000:02:00.0: EEE Disabled: unsupported at
half duplex. Re-enable using ethtool when at full duplex.

In case you aren't aware, 1000 Mbps half duplex is not a valid
combination.

The other bit that catches my attention is:
Apr 26 10:42:51 kernel: igb 0000:02:00.0: exceed max 2 second

This appears to be a timeout error triggered in response to the error
above, which I believe is that the link didn't actually come up at
1000 Mbps.

As time permits I will look into this further. I will have to go
through the MDIC reads to figure out whether something in there is
giving us bad information from the PHY, or whether we are
misinterpreting something.

Thanks.

- Alex
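One way to follow up on this with the trace Holger already has: every read of the STATUS register (offset 0x8) is logged as "rd32 <caller> 00000008 <value>", and, as far as I recall, the driver's copper speed/duplex helper derives the reported link result from exactly that value (the 1000 bit is checked first, then the 100 bit, otherwise 10; duplex comes from the FD bit). This sketch parses such a trace line and flags link-up at 1000 Mb/s without FD, the combination Alex calls invalid. The bit values are my recollection of igb's e1000_defines.h and should be checked against the tree; struct link_state, decode_status(), parse_status_line() and is_suspicious() are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror E1000_STATUS and its bits in the igb headers. */
#define REG_STATUS        0x00008u
#define STATUS_FD         0x00000001u /* full duplex */
#define STATUS_LU         0x00000002u /* link up */
#define STATUS_SPEED_100  0x00000040u
#define STATUS_SPEED_1000 0x00000080u

struct link_state {
	bool link_up;
	unsigned speed;   /* Mb/s */
	bool full_duplex;
};

/* Same precedence as described above: 1000, then 100, else 10. */
static struct link_state decode_status(uint32_t status)
{
	struct link_state s;

	s.link_up = (status & STATUS_LU) != 0;
	if (status & STATUS_SPEED_1000)
		s.speed = 1000;
	else if (status & STATUS_SPEED_100)
		s.speed = 100;
	else
		s.speed = 10;
	s.full_duplex = (status & STATUS_FD) != 0;
	return s;
}

/* Parse one trace line; succeeds only for reads of the STATUS register. */
static bool parse_status_line(const char *line, struct link_state *out)
{
	char func[64];
	uint32_t reg, value;
	const char *p;

	assert(line != NULL && out != NULL);
	p = strstr(line, "rd32 "); /* skip journal prefix like "kernel: " */
	if (p == NULL)
		return false;
	if (sscanf(p, "rd32 %63s %" SCNx32 " %" SCNx32, func, &reg, &value) != 3)
		return false;
	if (reg != REG_STATUS)
		return false;
	*out = decode_status(value);
	return true;
}

/* Link up at 1000 Mb/s without the FD bit: worth a closer look. */
static bool is_suspicious(const struct link_state *s)
{
	return s->link_up && s->speed == 1000 && !s->full_duplex;
}
```

Feeding every line of the captured log through parse_status_line() and printing the ones where is_suspicious() is true would show whether the MAC itself reports 1000/half, or whether the odd result only appears elsewhere.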
* Re: [BUG] igb: reconnecting of cable not always detected
  2018-04-26 16:02 ` [Intel-wired-lan] " Alexander Duyck
@ 2018-04-27 10:39 ` Holger Schurig
From: Holger Schurig @ 2018-04-27 10:39 UTC
To: Alexander Duyck; +Cc: Jeff Kirsher, intel-wired-lan, LKML

Hi Alex,

> The first are the following 2 lines from your dump:
> Apr 26 10:42:49 kernel: igb 0000:02:00.0 eth0: igb: eth0 NIC Link is
> Up 1000 Mbps Half Duplex, Flow Control: RX
> Apr 26 10:42:49 kernel: igb 0000:02:00.0: EEE Disabled: unsupported at
> half duplex. Re-enable using ethtool when at full duplex.

Could this just be a follow-on error? In one of yesterday's mails I
showed you my patch to disable 1000 Mb/s ... and I still had the link
stuck down. The same happened when I used a 10/100 Mb/s-only switch.
Both scenarios ruled out 1000 Mb/s, one more strictly than the other :-)
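A note that may help with Holger's question. As far as I can tell from the 4.16 igb_watchdog_task(), the "exceed max 2 second" message comes from a path that only runs when the driver believes link_speed is 1000: it polls the PHY's 1000BASE-T status register (MII register 0x0A) for the remote receiver status bit (0x1000), sleeping 100 ms between up to 20 retries, and logs the message when the budget runs out. If that reading is right, the message is a consequence of the 1000/half result rather than a separate fault, and it would not be expected at all with 1000 Mb/s removed from the advertisement, while the stuck link evidently still occurs. This sketch models the bounded poll with a fake PHY callback in place of igb_read_phy_reg() and counts simulated milliseconds instead of sleeping; the retry count is an assumption to check against the tree, and wait_remote_rx_ok(), struct fake_phy and fake_phy_read() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PHY_1000T_STATUS 0x0A   /* MII register 10: 1000BASE-T status */
#define REMOTE_RX_OK     0x1000 /* bit 12: remote receiver status OK */
#define POLL_RETRIES     20     /* assumed from 4.16 igb_watchdog_task() */
#define POLL_INTERVAL_MS 100    /* the driver sleeps between tries */

/* PHY accessor: returns 0 on success and stores the register value. */
typedef int (*phy_read_fn)(void *ctx, uint32_t reg, uint16_t *data);

enum poll_result { POLL_OK, POLL_TIMEOUT, POLL_READ_ERROR };

/* Poll until the remote receiver reports OK or the retries run out.
 * *waited_ms accumulates simulated sleep time (no real sleeping). */
static enum poll_result wait_remote_rx_ok(phy_read_fn read_reg, void *ctx,
					  unsigned *waited_ms)
{
	unsigned retries = POLL_RETRIES;
	uint16_t data;

	*waited_ms = 0;
	for (;;) {
		if (read_reg(ctx, PHY_1000T_STATUS, &data) != 0)
			return POLL_READ_ERROR;
		if (data & REMOTE_RX_OK)
			return POLL_OK;
		if (retries == 0)
			return POLL_TIMEOUT; /* where "exceed max 2 second" fits */
		*waited_ms += POLL_INTERVAL_MS;
		retries--;
	}
}

/* Fake PHY: reports OK from read number ok_from onwards (0 = never). */
struct fake_phy {
	unsigned reads;
	unsigned ok_from;
};

static int fake_phy_read(void *ctx, uint32_t reg, uint16_t *data)
{
	struct fake_phy *phy = ctx;

	assert(reg == PHY_1000T_STATUS);
	phy->reads++;
	*data = (phy->ok_from != 0 && phy->reads >= phy->ok_from) ? REMOTE_RX_OK : 0;
	return 0;
}
```

With a PHY that never reports OK, the loop gives up after 21 reads and 20 × 100 ms = 2000 ms of waiting, which is where the "2 second" in the message would come from. Passing the read function as a callback keeps the timing logic testable without hardware.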
* Re: [BUG] igb: reconnecting of cable not always detected
  2018-04-26 16:02 ` [Intel-wired-lan] " Alexander Duyck
@ 2018-05-18 7:35 ` Holger Schurig
From: Holger Schurig @ 2018-05-18 7:35 UTC
To: Alexander Duyck; +Cc: Jeff Kirsher, intel-wired-lan, LKML

Alexander Duyck <alexander.duyck@gmail.com> writes:

> Thanks for the data. It is actually useful. There are a few things
> that I see that seem to point to an obvious issue.

Any news on this?

A colleague of mine states (I have not checked this) that a kernel
4.9.0-6-686 from a Debian Live ISO (debian-live-9.4.0-i386-kde.iso)
didn't show this behavior, so we have some kind of regression perhaps?

Greetings,
Holger
* Re: [Intel-wired-lan] [BUG] igb: reconnecting of cable not always detected 2018-05-18 7:35 ` [Intel-wired-lan] " Holger Schurig @ 2019-01-17 21:55 ` Jeff Kirsher -1 siblings, 0 replies; 23+ messages in thread From: Jeff Kirsher @ 2019-01-17 21:55 UTC To: Holger Schurig; +Cc: Alexander Duyck, intel-wired-lan, LKML On Fri, May 18, 2018 at 12:36 AM Holger Schurig <holgerschurig@gmail.com> wrote: > > Alexander Duyck <alexander.duyck@gmail.com> writes: > > Thanks for the data. It is actually useful. There are a few things > > that I see that seem to point to an obvious issue. > > Any news on this? > > A colleague of mine states (I have not checked this) that a kernel > 4.9.0-6-686 from a Debian Live ISO (debian-live-9.4.0-i386-kde.iso) > didn't show this behavior, so we have some kind of regression perhaps? Our validation team was able to reproduce this only once, and has not been able to reproduce it again, let alone consistently enough to debug it adequately. Are you still seeing the issue with the latest upstream kernel from either David Miller's net-next tree or Linus's tree? -- Cheers, Jeff
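Because the issue has been reproduced only once, a repeatable test needs an objective "link came back" signal after each replug. A minimal sketch, assuming a Linux host where the kernel exposes the per-interface carrier state as `/sys/class/net/<iface>/carrier`; `wait_for_carrier` and `read_carrier` are hypothetical names, and the 120 s default timeout only mirrors the "two minutes" from the original report.

```python
import time
from pathlib import Path
from typing import Optional


def read_carrier(path: Path) -> bool:
    """Return True if the sysfs carrier file reports '1'.

    Reading carrier on an interface that is administratively down fails
    with an OSError (EINVAL); a missing file also raises OSError. Both
    cases are treated as "no carrier" in this sketch.
    """
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return False


def wait_for_carrier(iface: str = "eth0", timeout_s: float = 120.0,
                     poll_s: float = 0.5,
                     sysfs_root: str = "/sys/class/net") -> Optional[float]:
    """Poll until carrier appears; return seconds waited, or None on timeout."""
    path = Path(sysfs_root) / iface / "carrier"
    start = time.monotonic()
    while True:
        if read_carrier(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(poll_s)
```

One way to use it: replug the cable, call `wait_for_carrier("eth0")`, and record how often it returns `None` across many attempts, which gives a failure rate rather than a single anecdote.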
* [BUG] igb: reconnecting of cable not always detected @ 2018-06-09 17:15 Thomas Netousek 0 siblings, 0 replies; 23+ messages in thread From: Thomas Netousek @ 2018-06-09 17:15 UTC To: linux-kernel I have a similar problem. If I disconnect and reconnect the Ethernet cable on an Intel Ethernet card, the device does not come up again. For me this happens every time, on the first pull of the LAN cable. It is reproducible on Supermicro X8, X9 and X10 dual-CPU mainboards whose onboard networking provides two PHY interfaces using Intel 82576 and I350 chips. It is not reproducible on a Supermicro X10SLL single-CPU mainboard with an onboard I210 chip providing one PHY for eth0 (tested) and one I217-LM driven by the e1000e driver (not connected, not tested). It is reproducible with kernels 4.9.107 and 4.17.0. It is not reproducible with kernels 4.1.48 and 4.4.136. So it might be related to the igb changes between driver versions 5.3.0-k (good) and 5.4.0-k (bad). After pulling and re-plugging the cable, with the bad driver I get: # ip -d link show eth0 2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:69:9d:3e brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 # ethtool -i eth0 Cannot get driver information: No such device The last lines in the dmesg output are: [ 13.127730] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX [ 13.747735] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX [ 147.760943] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Down [ 608.211864] igb 0000:01:00.0 eth0: PCIe link lost, device now detached Please note that the "PCIe link lost" message arrives 8 minutes after re-plugging the LAN cable. I hope this information helps pin down and fix this bug.
Kind regards Thomas
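Thomas's timing can be checked against the dmesg timestamps, which count seconds since boot. The log records the link-down event but not the re-plug, so what it actually measures is the time from link-down to "PCIe link lost". A minimal sketch, assuming the `[ seconds.microseconds]` prefix shown in his log; `first_event_time` and `seconds_between` are hypothetical helper names.

```python
import re
from typing import Iterable, Optional

# Assumed dmesg prefix: "[  147.760943] message text"
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def first_event_time(lines: Iterable[str], needle: str) -> Optional[float]:
    """Return the timestamp of the first line whose message contains needle."""
    for line in lines:
        m = TS_RE.match(line.strip())
        if m and needle in m.group(2):
            return float(m.group(1))
    return None


def seconds_between(lines: Iterable[str], first: str, second: str) -> Optional[float]:
    """Seconds from the first `first` event to the first `second` event, if both exist."""
    lines = list(lines)  # allow two passes over a generator
    t1 = first_event_time(lines, first)
    t2 = first_event_time(lines, second)
    if t1 is None or t2 is None:
        return None
    return t2 - t1
```

For the two eth0 lines in Thomas's log, `seconds_between(log, "eth0 NIC Link is Down", "PCIe link lost")` gives about 460.45 s, roughly 7.7 minutes after link-down, consistent with his estimate once the unlogged re-plug time is taken into account.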
Thread overview: 23+ messages
2018-04-24 15:14 [BUG] igb: reconnecting of cable not always detected Holger Schurig
2018-04-24 15:14 ` [Intel-wired-lan] " Holger Schurig
2018-04-24 18:09 ` Alexander Duyck
2018-04-24 18:09 ` [Intel-wired-lan] " Alexander Duyck
2018-04-25 3:30 ` Richard Cochran
2018-04-25 3:30 ` [Intel-wired-lan] " Richard Cochran
2018-04-25 9:47 ` Holger Schurig
2018-04-25 9:47 ` [Intel-wired-lan] " Holger Schurig
2018-04-25 16:01 ` Alexander Duyck
2018-04-25 16:01 ` [Intel-wired-lan] " Alexander Duyck
2018-04-26 7:54 ` Holger Schurig
2018-04-26 7:54 ` [Intel-wired-lan] " Holger Schurig
2018-04-26 9:08 ` Holger Schurig
2018-04-26 9:08 ` [Intel-wired-lan] " Holger Schurig
2018-04-26 16:02 ` Alexander Duyck
2018-04-26 16:02 ` [Intel-wired-lan] " Alexander Duyck
2018-04-27 10:39 ` Holger Schurig
2018-04-27 10:39 ` [Intel-wired-lan] " Holger Schurig
2018-05-18 7:35 ` Holger Schurig
2018-05-18 7:35 ` [Intel-wired-lan] " Holger Schurig
2019-01-17 21:55 ` Jeff Kirsher
2019-01-17 21:55 ` Jeff Kirsher
2018-06-09 17:15 Thomas Netousek