* 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
@ 2012-12-15 15:49 Justin Piszcz
2012-12-17 17:56 ` Tantilov, Emil S
2012-12-17 18:38 ` devendra.aaru
0 siblings, 2 replies; 4+ messages in thread
From: Justin Piszcz @ 2012-12-15 15:49 UTC (permalink / raw)
To: linux-kernel
Hello,
Kernel 3.6.10. This is the first time I can remember seeing this (on 10GbE,
anyway). Is this a known issue with 3.6.10?
The link went down when I rebooted the remote host attached on the other
end.
I have not changed anything physically with the hardware. I have been on
3.6.0-3.6.9 and noticed this when I moved to 3.6.10.
[10270.229200] ixgbe 0000:01:00.0 eth4: NIC Link is Down
[10276.124937] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
Control: RX/TX
[24529.430997] ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
[24529.430997] Tx Queue <10>
[24529.430997] TDH, TDT <4e>, <51>
[24529.430997] next_to_use <51>
[24529.430997] next_to_clean <4e>
[24529.430997] tx_buffer_info[next_to_clean]
[24529.430997] time_stamp <10172668f>
[24529.430997] jiffies <101726ea4>
[24529.431011] ixgbe 0000:01:00.0 eth4: tx hang 1 detected on queue 10,
resetting adapter
[24529.431028] ixgbe 0000:01:00.0 eth4: Reset adapter
Thoughts?
lspci -vvxx
01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server
Adapter (rev 01)
Subsystem: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 26
Region 0: Memory at fbe40000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at fbe00000 (32-bit, non-prefetchable) [size=256K]
Region 2: I/O ports at e000 [size=32]
Region 3: Memory at fbe60000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Latency L0 <4us,
L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-,
Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Device Serial Number XX-XX-XX-XX-XX-XX-XX-XX
(masked)
Kernel driver in use: ixgbe
00: 86 80 0b 15 07 04 10 00 01 00 00 02 10 00 00 00
10: 00 00 e4 fb 00 00 e0 fb 01 e0 00 00 00 00 e6 fb
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 2c a1
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
Justin.
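For readers decoding the "Detected Tx Unit Hang" dump above, here is a minimal sketch of the arithmetic. It assumes TDH/TDT are the hardware head and tail indices of the Tx descriptor ring, that all dumped values are hexadecimal, and that time_stamp is the jiffies value recorded when the buffer at next_to_clean was queued. RING_SIZE and HZ are not stated anywhere in the report, so both are assumptions used only for illustration.

```python
# Values copied from the hang dump in the report; all fields are hexadecimal.
dump = {
    "TDH": "4e",
    "TDT": "51",
    "next_to_use": "51",
    "next_to_clean": "4e",
    "time_stamp": "10172668f",
    "jiffies": "101726ea4",
}
vals = {name: int(text, 16) for name, text in dump.items()}

RING_SIZE = 512  # assumption: ring size is not given in the report
HZ = 1000        # assumption: kernel tick rate is not given in the report


def pending_descriptors(head, tail, ring_size):
    """Descriptors queued but not yet completed, allowing for ring wrap."""
    return (tail - head) % ring_size


pending = pending_descriptors(vals["TDH"], vals["TDT"], RING_SIZE)
stalled_jiffies = vals["jiffies"] - vals["time_stamp"]
stalled_seconds = stalled_jiffies / HZ  # only meaningful if HZ is right

print(f"pending descriptors: {pending}")
print(f"oldest buffer waiting: {stalled_jiffies} jiffies "
      f"(~{stalled_seconds:.2f} s if HZ={HZ})")
```

Under these assumptions the dump shows three descriptors outstanding on queue 10, with the oldest one waiting 2069 jiffies. The software indices (next_to_clean, next_to_use) match TDH and TDT.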
* RE: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
2012-12-15 15:49 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang Justin Piszcz
@ 2012-12-17 17:56 ` Tantilov, Emil S
2012-12-17 18:38 ` devendra.aaru
1 sibling, 0 replies; 4+ messages in thread
From: Tantilov, Emil S @ 2012-12-17 17:56 UTC (permalink / raw)
To: Justin Piszcz, linux-kernel
>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>owner@vger.kernel.org] On Behalf Of Justin Piszcz
>Sent: Saturday, December 15, 2012 7:49 AM
>To: linux-kernel@vger.kernel.org
>Subject: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
>
>Hello,
>
>Kernel 3.6.10, first time I have seen this that I can remember (on 10GbE)
>anyway, is this a known issue with 3.6.10?
>
>When the link went down is when I rebooted/etc the remote host attached on
>the other end.
>I've not changed anything physically with the hardware and have been on
>3.6.0-3.6.9 and noticed this when I moved to 3.6.10.
>
>[10270.229200] ixgbe 0000:01:00.0 eth4: NIC Link is Down
>[10276.124937] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
>Control: RX/TX
>[24529.430997] ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
>[24529.430997] Tx Queue <10>
>[24529.430997] TDH, TDT <4e>, <51>
>[24529.430997] next_to_use <51>
>[24529.430997] next_to_clean <4e>
>[24529.430997] tx_buffer_info[next_to_clean]
>[24529.430997] time_stamp <10172668f>
>[24529.430997] jiffies <101726ea4>
>[24529.431011] ixgbe 0000:01:00.0 eth4: tx hang 1 detected on queue 10,
>resetting adapter
>[24529.431028] ixgbe 0000:01:00.0 eth4: Reset adapter
>
>Thoughts?
I don't believe we have seen Tx hangs in validation. If you could narrow down the conditions that lead to the Tx hang that would help a lot. Also the output of ethtool -S eth4 after the Tx hang occurs can be useful to get an idea of the load on the interface.
Thanks,
Emil
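One possible way to act on the request above is to save `ethtool -S eth4` output to a file before and after a hang and compare the counters. The sketch below assumes the usual "name: value" layout of `ethtool -S` output. The sample text and counter values are made up for illustration and are not taken from this system. Comparing two snapshots goes slightly beyond what was asked (only the post-hang output was requested).

```python
def parse_ethtool_stats(text):
    """Parse 'ethtool -S' style output into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and name and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed_counters(before, after):
    """Return the counters whose value differs, mapped to the delta."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Hypothetical sample snapshots (not real data from eth4).
BEFORE = """NIC statistics:
     rx_packets: 1000
     tx_packets: 900
     tx_errors: 0
"""
AFTER = """NIC statistics:
     rx_packets: 1000
     tx_packets: 950
     tx_errors: 0
"""

delta = changed_counters(parse_ethtool_stats(BEFORE), parse_ethtool_stats(AFTER))
for name, diff in sorted(delta.items()):
    print(f"{name}: {diff:+d}")
```

In practice the two snapshots would come from files such as the output of `ethtool -S eth4 > before.txt`, read with `open(...).read()`.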
* Re: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
2012-12-15 15:49 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang Justin Piszcz
2012-12-17 17:56 ` Tantilov, Emil S
@ 2012-12-17 18:38 ` devendra.aaru
2012-12-17 21:06 ` Justin Piszcz
1 sibling, 1 reply; 4+ messages in thread
From: devendra.aaru @ 2012-12-17 18:38 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-kernel, netdev
Ccing netdev
On Sat, Dec 15, 2012 at 10:49 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Hello,
>
> Kernel 3.6.10, first time I have seen this that I can remember (on 10GbE)
> anyway, is this a known issue with 3.6.10?
>
> When the link went down is when I rebooted/etc the remote host attached on
> the other end.
> I've not changed anything physically with the hardware and have been on
> 3.6.0-3.6.9 and noticed this when I moved to 3.6.10.
>
> [10270.229200] ixgbe 0000:01:00.0 eth4: NIC Link is Down
> [10276.124937] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [24529.430997] ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
> [24529.430997] Tx Queue <10>
> [24529.430997] TDH, TDT <4e>, <51>
> [24529.430997] next_to_use <51>
> [24529.430997] next_to_clean <4e>
> [24529.430997] tx_buffer_info[next_to_clean]
> [24529.430997] time_stamp <10172668f>
> [24529.430997] jiffies <101726ea4>
> [24529.431011] ixgbe 0000:01:00.0 eth4: tx hang 1 detected on queue 10,
> resetting adapter
> [24529.431028] ixgbe 0000:01:00.0 eth4: Reset adapter
>
> Thoughts?
>
> lspci -vvxx
>
> 01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server
> Adapter (rev 01)
> Subsystem: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 26
> Region 0: Memory at fbe40000 (32-bit, non-prefetchable) [size=128K]
> Region 1: Memory at fbe00000 (32-bit, non-prefetchable) [size=256K]
> Region 2: I/O ports at e000 [size=32]
> Region 3: Memory at fbe60000 (32-bit, non-prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold-)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> Address: 0000000000000000 Data: 0000
> Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
> Vector table: BAR=3 offset=00000000
> PBA: BAR=3 offset=00002000
> Capabilities: [a0] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 256 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Latency L0 <4us,
> L1 <64us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive-
> BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
> DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-,
> Selectable De-emphasis: -6dB
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
> ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-,
> EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [100 v1] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
> MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
> MalfTLP- ECRC- UnsupReq+ ACSViol-
> UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
> MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
> Capabilities: [140 v1] Device Serial Number XX-XX-XX-XX-XX-XX-XX-XX
> (masked)
> Kernel driver in use: ixgbe
> 00: 86 80 0b 15 07 04 10 00 01 00 00 02 10 00 00 00
> 10: 00 00 e4 fb 00 00 e0 fb 01 e0 00 00 00 00 e6 fb
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 2c a1
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
>
> Justin.
* RE: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
2012-12-17 18:38 ` devendra.aaru
@ 2012-12-17 21:06 ` Justin Piszcz
0 siblings, 0 replies; 4+ messages in thread
From: Justin Piszcz @ 2012-12-17 21:06 UTC (permalink / raw)
To: 'devendra.aaru', 'Tantilov, Emil S'; +Cc: linux-kernel, netdev
-----Original Message-----
From: devendra.aaru [mailto:devendra.aaru@gmail.com]
Sent: Monday, December 17, 2012 1:39 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org
Subject: Re: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
Ccing netdev
On Sat, Dec 15, 2012 at 10:49 AM, Justin Piszcz <jpiszcz@lucidpixels.com>
wrote:
> Hello,
>
> Kernel 3.6.10, first time I have seen this that I can remember (on 10GbE)
> anyway, is this a known issue with 3.6.10?
>
> When the link went down is when I rebooted/etc the remote host attached on
> the other end.
> I've not changed anything physically with the hardware and have been on
> 3.6.0-3.6.9 and noticed this when I moved to 3.6.10.
--
> I don't believe we have seen Tx hangs in validation. If you could narrow
> down the conditions that lead to the Tx hang that would help a lot. Also
> the output of ethtool -S eth4 after the Tx hang occurs can be useful to
> get an idea of the load on the interface.
> Thanks,
> Emil
--
In this case I only have two servers that mount each other's NFS volumes,
and both were idle at the time. I rebooted one of the systems and that is
when I saw this. If I can find a repeatable pattern and/or capture the
ethtool output, I will update this thread. Thank you.
Justin.