linux-kernel.vger.kernel.org archive mirror
* 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
@ 2012-12-15 15:49 Justin Piszcz
  2012-12-17 17:56 ` Tantilov, Emil S
  2012-12-17 18:38 ` devendra.aaru
  0 siblings, 2 replies; 4+ messages in thread
From: Justin Piszcz @ 2012-12-15 15:49 UTC (permalink / raw)
  To: linux-kernel

Hello,

I am running kernel 3.6.10, and this is the first time I can remember seeing
this on 10GbE. Is this a known issue with 3.6.10?

The link went down when I rebooted the remote host attached on the other end.
I have not changed anything physically with the hardware. I had been running
3.6.0 through 3.6.9 and noticed this only after moving to 3.6.10.

[10270.229200] ixgbe 0000:01:00.0 eth4: NIC Link is Down
[10276.124937] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[24529.430997] ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
[24529.430997]   Tx Queue             <10>
[24529.430997]   TDH, TDT             <4e>, <51>
[24529.430997]   next_to_use          <51>
[24529.430997]   next_to_clean        <4e>
[24529.430997] tx_buffer_info[next_to_clean]
[24529.430997]   time_stamp           <10172668f>
[24529.430997]   jiffies              <101726ea4>
[24529.431011] ixgbe 0000:01:00.0 eth4: tx hang 1 detected on queue 10, resetting adapter
[24529.431028] ixgbe 0000:01:00.0 eth4: Reset adapter

Thoughts?
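
For anyone reading the hang report above, the sketch below pulls the numbers out of it. It is only a reading aid and assumes that the ring indices and timestamps are printed in hexadecimal (the values contain hex digits) and that the queue number is decimal (the follow-up line says "queue 10"). The helper name `parse_hang_report` is made up for this example and is not part of the driver.

```python
import re

# Hang report copied from the dmesg output above.
REPORT = """\
[24529.430997] ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
[24529.430997]   Tx Queue             <10>
[24529.430997]   TDH, TDT             <4e>, <51>
[24529.430997]   next_to_use          <51>
[24529.430997]   next_to_clean        <4e>
[24529.430997] tx_buffer_info[next_to_clean]
[24529.430997]   time_stamp           <10172668f>
[24529.430997]   jiffies              <101726ea4>
"""


def parse_hang_report(text):
    """Return {field name: integer value} for every line that carries <...> values."""
    fields = {}
    for line in text.splitlines():
        values = re.findall(r"<([0-9a-fA-F]+)>", line)
        if not values:
            continue  # header lines carry no bracketed values
        # Drop the "[seconds.micros]" dmesg prefix, keep the label before the first "<".
        label = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line).split("<", 1)[0]
        names = [n.strip() for n in label.split(",")]
        for name, value in zip(names, values):
            # Assumption: only the queue number is decimal, everything else is hex.
            base = 10 if name == "Tx Queue" else 16
            fields[name] = int(value, base)
    return fields


fields = parse_hang_report(REPORT)
outstanding = fields["TDT"] - fields["TDH"]            # no ring wrap in this report
age_ticks = fields["jiffies"] - fields["time_stamp"]   # age of the oldest uncleaned buffer

print(f"queue {fields['Tx Queue']}: {outstanding} descriptors between TDH and TDT, "
      f"oldest buffer is {age_ticks} jiffies old")
```

On this report TDT - TDH is 3 descriptors and jiffies - time_stamp is 0x815 = 2069 ticks. Converting ticks to seconds needs the kernel's HZ value, which is not stated in this thread and would have to be checked on the affected host.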

lspci -vvxx

01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter (rev 01)
  Subsystem: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
  Latency: 0, Cache Line Size: 64 bytes
  Interrupt: pin A routed to IRQ 26
  Region 0: Memory at fbe40000 (32-bit, non-prefetchable) [size=128K]
  Region 1: Memory at fbe00000 (32-bit, non-prefetchable) [size=256K]
  Region 2: I/O ports at e000 [size=32]
  Region 3: Memory at fbe60000 (32-bit, non-prefetchable) [size=16K]
  Capabilities: [40] Power Management version 3
    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
  Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
    Address: 0000000000000000  Data: 0000
  Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
    Vector table: BAR=3 offset=00000000
    PBA: BAR=3 offset=00002000
  Capabilities: [a0] Express (v2) Endpoint, MSI 00
    DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
      ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
    DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
      RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
      MaxPayload 256 bytes, MaxReadReq 512 bytes
    DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
    LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Latency L0 <4us, L1 <64us
      ClockPM- Surprise- LLActRep- BwNot-
    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
      ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
    LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
    DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-
    LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
       Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
       Compliance De-emphasis: -6dB
    LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
       EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
  Capabilities: [100 v1] Advanced Error Reporting
    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
    UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
    AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
  Capabilities: [140 v1] Device Serial Number XX-XX-XX-XX-XX-XX-XX-XX (masked)
  Kernel driver in use: ixgbe
00: 86 80 0b 15 07 04 10 00 01 00 00 02 10 00 00 00
10: 00 00 e4 fb 00 00 e0 fb 01 e0 00 00 00 00 e6 fb
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 2c a1
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
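
The four hex rows at the end of the lspci output are the first 64 bytes of the device's PCI configuration space. As a rough cross-check against the decoded text, the sketch below unpacks a few fields using the standard type 0 header offsets. The function and variable names are invented for illustration.

```python
import struct

# Hex rows copied from the end of the lspci -vvxx output above.
HEXDUMP = """\
00: 86 80 0b 15 07 04 10 00 01 00 00 02 10 00 00 00
10: 00 00 e4 fb 00 00 e0 fb 01 e0 00 00 00 00 e6 fb
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 2c a1
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
"""


def parse_hexdump(text):
    """Turn 'offset: b0 b1 ...' rows into one bytes object."""
    data = bytearray()
    for line in text.splitlines():
        _, _, rest = line.partition(":")
        data += bytes.fromhex(rest)  # fromhex ignores the spaces between bytes
    return bytes(data)


cfg = parse_hexdump(HEXDUMP)

# Config space is little-endian: vendor, device, command, status at 0x00..0x07.
vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0x00)
revision = cfg[0x08]
class_code = int.from_bytes(cfg[0x09:0x0C], "little")   # prog-if, subclass, class
bars = struct.unpack_from("<6I", cfg, 0x10)             # BAR0..BAR5
subsys_vendor, subsys_id = struct.unpack_from("<HH", cfg, 0x2C)
cap_ptr = cfg[0x34]
int_pin = cfg[0x3D]

print(f"vendor {vendor:04x} device {device:04x} rev {revision:02x} class {class_code:06x}")
for i, bar in enumerate(bars[:4]):
    kind = "I/O" if bar & 1 else "Mem"   # bit 0 set marks an I/O BAR
    base = bar & (~0x3 if bar & 1 else ~0xF) & 0xFFFFFFFF
    print(f"BAR{i}: {kind} at {base:x}")
print(f"INTx disabled: {bool(command & 0x400)}, first capability at {cap_ptr:#x}")
```

The decoded values line up with the text lspci printed: vendor 8086, memory regions at fbe40000, fbe00000 and fbe60000, I/O ports at e000, the first capability at offset 0x40, and the DisINTx bit set in the command register.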

Justin.



* RE: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
  2012-12-15 15:49 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang Justin Piszcz
@ 2012-12-17 17:56 ` Tantilov, Emil S
  2012-12-17 18:38 ` devendra.aaru
  1 sibling, 0 replies; 4+ messages in thread
From: Tantilov, Emil S @ 2012-12-17 17:56 UTC (permalink / raw)
  To: Justin Piszcz, linux-kernel

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>owner@vger.kernel.org] On Behalf Of Justin Piszcz
>Sent: Saturday, December 15, 2012 7:49 AM
>To: linux-kernel@vger.kernel.org
>Subject: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
>
>Hello,
>
>Kernel 3.6.10, first time I have seen this that I can remember (on 10GbE)
>anyway, is this a known issue with 3.6.10?
>
>When the link went down is when I rebooted/etc the remote host attached on
>the other end.
>I've not changed anything physically with the hardware and have been on
>3.6.0-3.6.9 and noticed this when I moved to 3.6.10.
>
>[10270.229200] ixgbe 0000:01:00.0 eth4: NIC Link is Down
>[10276.124937] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
>Control: RX/TX
>[24529.430997] ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
>[24529.430997]   Tx Queue             <10>
>[24529.430997]   TDH, TDT             <4e>, <51>
>[24529.430997]   next_to_use          <51>
>[24529.430997]   next_to_clean        <4e>
>[24529.430997] tx_buffer_info[next_to_clean]
>[24529.430997]   time_stamp           <10172668f>
>[24529.430997]   jiffies              <101726ea4>
>[24529.431011] ixgbe 0000:01:00.0 eth4: tx hang 1 detected on queue 10,
>resetting adapter
>[24529.431028] ixgbe 0000:01:00.0 eth4: Reset adapter
>
>Thoughts?

I don't believe we have seen Tx hangs in validation. If you could narrow down the conditions that lead to the Tx hang, that would help a lot. Also, the output of ethtool -S eth4 after the Tx hang occurs would be useful for getting an idea of the load on the interface.
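
One possible way to capture what is being asked for here is to snapshot the counters and compare them after the hang. The sketch below assumes the usual `name: value` layout that `ethtool -S` prints. The helper names and the sample counter values are invented for illustration and are not output from the affected machine.

```python
import subprocess


def read_stats(iface):
    """Run `ethtool -S <iface>` and return its raw text (requires ethtool on the host)."""
    result = subprocess.run(
        ["ethtool", "-S", iface], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_stats(text):
    """Parse 'name: value' lines into a dict, skipping the 'NIC statistics:' header."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed(before, after):
    """Return only the counters whose value differs between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Made-up snapshots standing in for read_stats("eth4") before and after a hang.
BEFORE = """NIC statistics:
     rx_packets: 1000
     tx_packets: 900
     tx_restart_queue: 0
"""
AFTER = """NIC statistics:
     rx_packets: 1000
     tx_packets: 905
     tx_restart_queue: 2
"""

delta = changed(parse_stats(BEFORE), parse_stats(AFTER))
for name, diff in delta.items():
    print(f"{name}: {diff:+d}")
```

On a live system the two snapshots would come from `read_stats("eth4")`, taken once while the interface is healthy and once right after the "Detected Tx Unit Hang" message appears.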

Thanks,
Emil




* Re: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
  2012-12-15 15:49 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang Justin Piszcz
  2012-12-17 17:56 ` Tantilov, Emil S
@ 2012-12-17 18:38 ` devendra.aaru
  2012-12-17 21:06   ` Justin Piszcz
  1 sibling, 1 reply; 4+ messages in thread
From: devendra.aaru @ 2012-12-17 18:38 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, netdev

Ccing netdev
On Sat, Dec 15, 2012 at 10:49 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Hello,
>
> Kernel 3.6.10, first time I have seen this that I can remember (on 10GbE)
> anyway, is this a known issue with 3.6.10?
>
> When the link went down is when I rebooted/etc the remote host attached on
> the other end.
> I've not changed anything physically with the hardware and have been on
> 3.6.0-3.6.9 and noticed this when I moved to 3.6.10.
>
> [10270.229200] ixgbe 0000:01:00.0 eth4: NIC Link is Down
> [10276.124937] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [24529.430997] ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
> [24529.430997]   Tx Queue             <10>
> [24529.430997]   TDH, TDT             <4e>, <51>
> [24529.430997]   next_to_use          <51>
> [24529.430997]   next_to_clean        <4e>
> [24529.430997] tx_buffer_info[next_to_clean]
> [24529.430997]   time_stamp           <10172668f>
> [24529.430997]   jiffies              <101726ea4>
> [24529.431011] ixgbe 0000:01:00.0 eth4: tx hang 1 detected on queue 10,
> resetting adapter
> [24529.431028] ixgbe 0000:01:00.0 eth4: Reset adapter
>
> Thoughts?
>
> Justin.


* RE: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
  2012-12-17 18:38 ` devendra.aaru
@ 2012-12-17 21:06   ` Justin Piszcz
  0 siblings, 0 replies; 4+ messages in thread
From: Justin Piszcz @ 2012-12-17 21:06 UTC (permalink / raw)
  To: 'devendra.aaru', 'Tantilov, Emil S'; +Cc: linux-kernel, netdev



-----Original Message-----
From: devendra.aaru [mailto:devendra.aaru@gmail.com] 
Sent: Monday, December 17, 2012 1:39 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org
Subject: Re: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang

Ccing netdev
On Sat, Dec 15, 2012 at 10:49 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Hello,
>
> Kernel 3.6.10, first time I have seen this that I can remember (on 10GbE)
> anyway, is this a known issue with 3.6.10?
>
> When the link went down is when I rebooted/etc the remote host attached on
> the other end.
> I've not changed anything physically with the hardware and have been on
> 3.6.0-3.6.9 and noticed this when I moved to 3.6.10.

--

> I don't believe we have seen Tx hangs in validation. If you could narrow
> down the conditions that lead to the Tx hang that would help a lot. Also
> the output of ethtool -S eth4 after the Tx hang occurs can be useful to
> get an idea of the load on the interface.
>
> Thanks,
> Emil

--

In this case I only have two servers, which mount each other's NFS volumes
and were idle at the time. I rebooted one of the systems, and that is when I
saw this. If I can find a repeatable pattern and/or capture the ethtool
output, I will update this thread. Thank you.

Justin.


