* e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
@ 2012-04-19 23:27 Ben Greear
  2012-04-20  2:39 ` Tom Herbert
  2012-04-20  6:46 ` Dave, Tushar N
  0 siblings, 2 replies; 17+ messages in thread
From: Ben Greear @ 2012-04-19 23:27 UTC (permalink / raw)
  To: netdev, e1000-devel list, therbert

Test case:

Run full-duplex UDP traffic (900 Mbps rx, 400 Mbps tx).
(Moderate traffic rates trigger the problem as well, though it may not be as easy to reproduce.)
Reset the peer interface.
----> tx queue timeout


Apr 19 16:12:48 localhost kernel: e1000e: eth2 NIC Link is Down
Apr 19 16:12:48 localhost kernel: e1000e 0000:08:00.0: eth2: Reset adapter
Apr 19 16:12:48 localhost kernel: e1000e: eth3 NIC Link is Down
Apr 19 16:12:50 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Apr 19 16:12:50 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Apr 19 16:12:50 localhost kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Apr 19 16:12:50 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
Apr 19 16:12:54 localhost /usr/sbin/irqbalance: Load average increasing, re-enabling all cpus for irq balancing
Apr 19 16:12:55 localhost kernel: ------------[ cut here ]------------
Apr 19 16:12:55 localhost kernel: WARNING: at /home/greearb/git/linux-3.3.dev.y/net/sched/sch_generic.c:256 dev_watchdog+0xf4/0x154()
Apr 19 16:12:55 localhost kernel: Hardware name: X7DBU
Apr 19 16:12:55 localhost kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
Apr 19 16:12:55 localhost kernel: Modules linked in: xt_CT iptable_raw 8021q garp stp llc veth ppdev parport_pc lp parport fuse macvlan pktgen iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi lockd w83793 w83627hf hwmon_vid coretemp iTCO_wdt microcode iTCO_vendor_support pcspkr i5k_amb ioatdma i2c_i801 i5000_edac dca edac_core e1000e shpchp uinput sunrpc ipv6 autofs4 floppy radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core [last unloaded: nf_nat]
Apr 19 16:12:55 localhost kernel: Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #36
Apr 19 16:12:55 localhost kernel: Call Trace:
Apr 19 16:12:55 localhost kernel: <IRQ>  [<ffffffff81042902>] warn_slowpath_common+0x80/0x98
Apr 19 16:12:55 localhost kernel: [<ffffffff810429ae>] warn_slowpath_fmt+0x41/0x43
Apr 19 16:12:55 localhost kernel: [<ffffffff8139f8a3>] dev_watchdog+0xf4/0x154
Apr 19 16:12:55 localhost kernel: [<ffffffff8104d371>] run_timer_softirq+0x16f/0x201
Apr 19 16:12:55 localhost kernel: [<ffffffff8139f7af>] ? netif_tx_unlock+0x57/0x57
Apr 19 16:12:55 localhost kernel: [<ffffffff81047e47>] __do_softirq+0x86/0x12f
Apr 19 16:12:55 localhost kernel: [<ffffffff8105d54e>] ? hrtimer_interrupt+0x12b/0x1bd
Apr 19 16:12:55 localhost kernel: [<ffffffff8144296c>] call_softirq+0x1c/0x30
Apr 19 16:12:55 localhost kernel: [<ffffffff8100bb75>] do_softirq+0x41/0x7e
Apr 19 16:12:55 localhost kernel: [<ffffffff81047c26>] irq_exit+0x3f/0xbb
Apr 19 16:12:55 localhost kernel: [<ffffffff81021df5>] smp_apic_timer_interrupt+0x85/0x93
Apr 19 16:12:55 localhost kernel: [<ffffffff814411de>] apic_timer_interrupt+0x6e/0x80
Apr 19 16:12:55 localhost kernel: <EOI>  [<ffffffff81010b8c>] ? mwait_idle+0x6e/0x8c
Apr 19 16:12:55 localhost kernel: [<ffffffff81010b7f>] ? mwait_idle+0x61/0x8c
Apr 19 16:12:55 localhost kernel: [<ffffffff81009e72>] cpu_idle+0x67/0xbe
Apr 19 16:12:55 localhost kernel: [<ffffffff81435477>] start_secondary+0x194/0x199
Apr 19 16:12:55 localhost kernel: ---[ end trace e3ca12fc1a8b85da ]---
Apr 19 16:12:55 localhost kernel: e1000e 0000:08:00.0: eth2: Reset adapter
Apr 19 16:12:57 localhost abrt-dump-oops[898]: abrt-dump-oops: Found oopses: 1
Apr 19 16:12:57 localhost abrt-dump-oops[898]: abrt-dump-oops: Creating dump directories
Apr 19 16:12:57 localhost abrtd: Directory 'oops-2012-04-19-16:12:57-898-0' creation detected
Apr 19 16:12:57 localhost abrt-dump-oops: Reported 1 kernel oopses to Abrt
Apr 19 16:12:57 localhost abrtd: Can't open file '/var/spool/abrt/oops-2012-04-19-16:12:57-898-0/uid': No such file or directory
Apr 19 16:12:57 localhost abrtd: DUP_OF_DIR: /var/spool/abrt/oops-2012-04-19-15:02:13-862-0
Apr 19 16:12:57 localhost abrtd: Dump directory is a duplicate of /var/spool/abrt/oops-2012-04-19-15:02:13-862-0
Apr 19 16:12:57 localhost abrtd: Deleting dump directory oops-2012-04-19-16:12:57-898-0 (dup of oops-2012-04-19-15:02:13-862-0), sending dbus signal
Apr 19 16:12:58 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Apr 19 16:12:58 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Apr 19 16:13:03 localhost /usr/sbin/irqbalance: Load average increasing, re-enabling all cpus for irq balancing
Apr 19 16:13:04 localhost kernel: e1000e 0000:08:00.0: eth2: Reset adapter
Apr 19 16:13:05 localhost chronyd[1003]: Selected source 108.59.2.194
Apr 19 16:13:07 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Apr 19 16:13:07 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
....

lspci:

08:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
	Subsystem: Intel Corporation PRO/1000 PT Dual Port Server Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 74
	Region 0: Memory at d8300000 (32-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at 3000 [size=32]
	[virtual] Expansion ROM at d8d00000 [disabled] [size=128K]
	Capabilities: [c8] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000feeff00c  Data: 41a3
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 2.5GT/s, Width x2, ASPM L0s L1, Latency L0 <4us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap:	First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Device Serial Number 00-e0-ed-ff-ff-0c-11-6e
	Kernel driver in use: e1000e
	Kernel modules: e1000e


3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c is the first bad commit
commit 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c
Author: Tom Herbert <therbert@google.com>
Date:   Mon Nov 28 16:33:16 2011 +0000

     e1000e: Support for byte queue limits

     Changes to e1000e to use byte queue limits.

     Signed-off-by: Tom Herbert <therbert@google.com>
     Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 bf3e2ec64fd74253563e1ab39797b27a5f2df3fe 51914e221547b95a989b5c7e9b037c9370fd734e M	drivers


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-19 23:27 e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e) Ben Greear
@ 2012-04-20  2:39 ` Tom Herbert
  2012-04-20  6:44   ` Ying Cai
  2012-04-20 19:00   ` Ben Greear
  2012-04-20  6:46 ` Dave, Tushar N
  1 sibling, 2 replies; 17+ messages in thread
From: Tom Herbert @ 2012-04-20  2:39 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, e1000-devel list

Thanks, will try to reproduce.

Tom

On Thu, Apr 19, 2012 at 4:27 PM, Ben Greear <greearb@candelatech.com> wrote:
> Test case:
>
> Run full duplex traffic (900Mbps rx, 400Mbps tx) UDP traffic
> (moderate speeds of traffic has issues as well, maybe not as easy to
> reproduce)
> reset peer interface
> ----> tx queue timeout

* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20  2:39 ` Tom Herbert
@ 2012-04-20  6:44   ` Ying Cai
  2012-04-20 19:00   ` Ben Greear
  1 sibling, 0 replies; 17+ messages in thread
From: Ying Cai @ 2012-04-20  6:44 UTC (permalink / raw)
  To: Tom Herbert; +Cc: e1000-devel list, netdev


No, we have not seen it in our tests. I just checked /var/log/messages on all
of our Ilium machines back to earlier this month. We got numerous "tg3
transmit timed out" warnings on the Ilium GLAG machines only, and no e1000e
"transmit queue timed out" warnings.

The "tg3 transmit queue timed out" does not seem to be caused by BQL, since
I can see it happen even in our 420 kernel (2.6.34-smp-420.16)!

Apr 13 18:36:08 lpb77 kernel: [   51.684212] ------------[ cut here ]------------
Apr 13 18:36:08 lpb77 kernel: [   51.684223] WARNING: at net/sched/sch_generic.c:263 dev_watchdog+0x1c8/0x1f0()
Apr 13 18:36:08 lpb77 kernel: [   51.684226] Hardware name: Greencreek,ESB2
Apr 13 18:36:08 lpb77 kernel: [   51.684228] NETDEV WATCHDOG: eth4 (tg3): transmit queue 0 timed out
Apr 13 18:36:08 lpb77 kernel: [   51.684230] Modules linked in: bonding sata_mv i2c_imch pca954x i2c_virtual uhci_hcd msr cpuid i2c_dev i2c_i801 i2c_core i2c_debug tg3 e1000e ipv6 genrtc
Apr 13 18:36:08 lpb77 kernel: [   51.684246] Pid: 0, comm: swapper Not tainted 2.6.34-smp-420.16 #1
Apr 13 18:36:08 lpb77 kernel: [   51.684248] Call Trace:
Apr 13 18:36:08 lpb77 kernel: [   51.684251]  <IRQ>  [<ffffffff811dc939>] warn_slowpath_common+0x7c/0xce
Apr 13 18:36:08 lpb77 kernel: [   51.684261]  [<ffffffff811dc9e2>] warn_slowpath_fmt+0x41/0x43
Apr 13 18:36:08 lpb77 kernel: [   51.684265]  [<ffffffff811494d8>] dev_watchdog+0x1c8/0x1f0
Apr 13 18:36:08 lpb77 kernel: [   51.684270]  [<ffffffff811f14a5>] ? sched_clock_local+0x1c/0x82
Apr 13 18:36:08 lpb77 kernel: [   51.684275]  [<ffffffff8101826c>] run_timer_softirq+0x34c/0x360
Apr 13 18:36:08 lpb77 kernel: [   51.684278]  [<ffffffff81149310>] ? dev_watchdog+0x0/0x1f0
Apr 13 18:36:08 lpb77 kernel: [   51.684282]  [<ffffffff8101780e>] __do_softirq+0x39e/0x490
Apr 13 18:36:08 lpb77 kernel: [   51.684316]  [<ffffffff81020360>] ? tick_program_event+0x60/0x120
Apr 13 18:36:08 lpb77 kernel: [   51.684319]  [<ffffffff811f14a5>] ? sched_clock_local+0x1c/0x82
Apr 13 18:36:08 lpb77 kernel: [   51.684323]  [<ffffffff8107770c>] call_softirq+0x1c/0x30
Apr 13 18:36:08 lpb77 kernel: [   51.684327]  [<ffffffff810009a2>] do_softirq+0x42/0x80
Apr 13 18:36:08 lpb77 kernel: [   51.684329]  [<ffffffff81017159>] irq_exit+0x49/0xa0
Apr 13 18:36:08 lpb77 kernel: [   51.684334]  [<ffffffff814ac184>] smp_apic_timer_interrupt+0x84/0xf4
Apr 13 18:36:08 lpb77 kernel: [   51.684338]  [<ffffffff810771d3>] apic_timer_interrupt+0x13/0x20
Apr 13 18:36:08 lpb77 kernel: [   51.684339]  <EOI>  [<ffffffff811b5709>] ? mwait_idle+0x7a/0x81
Apr 13 18:36:08 lpb77 kernel: [   51.684355]  [<ffffffff811b56bb>] ? mwait_idle+0x2c/0x81
Apr 13 18:36:08 lpb77 kernel: [   51.684358]  [<ffffffff810005c2>] cpu_idle+0x92/0x140
Apr 13 18:36:08 lpb77 kernel: [   51.684363]  [<ffffffff81b80a53>] start_secondary+0x1c3/0x1c7
Apr 13 18:36:08 lpb77 kernel: [   51.684373] ---[ end trace fc8464e86676a1b7 ]---


On Thu, Apr 19, 2012 at 7:39 PM, Tom Herbert <therbert@google.com> wrote:

> Thanks, will try to reproduce.
>
> Tom
>
> On Thu, Apr 19, 2012 at 4:27 PM, Ben Greear <greearb@candelatech.com>
> wrote:
> > Test case:
> >
> > Run full duplex traffic (900Mbps rx, 400Mbps tx) UDP traffic
> > (moderate speeds of traffic has issues as well, maybe not as easy to
> > reproduce)
> > reset peer interface
> > ----> tx queue timeout
> >
> >
> > Apr 19 16:12:48 localhost kernel: e1000e: eth2 NIC Link is Down
> > Apr 19 16:12:48 localhost kernel: e1000e 0000:08:00.0: eth2: Reset
> adapter
> > Apr 19 16:12:48 localhost kernel: e1000e: eth3 NIC Link is Down
> > Apr 19 16:12:50 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps
> Full
> > Duplex, Flow Control: Rx/Tx
> > Apr 19 16:12:50 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link
> > becomes ready
> > Apr 19 16:12:50 localhost kernel: e1000e: eth3 NIC Link is Up 1000 Mbps
> Full
> > Duplex, Flow Control: Rx/Tx
> > Apr 19 16:12:50 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth3: link
> > becomes ready
> > Apr 19 16:12:54 localhost /usr/sbin/irqbalance: Load average increasing,
> > re-enabling all cpus for irq balancing
> > Apr 19 16:12:55 localhost kernel: ------------[ cut here ]------------
> > Apr 19 16:12:55 localhost kernel: WARNING: at
> > /home/greearb/git/linux-3.3.dev.y/net/sched/sch_generic.c:256
> > dev_watchdog+0xf4/0x154()
> > Apr 19 16:12:55 localhost kernel: Hardware name: X7DBU
> > Apr 19 16:12:55 localhost kernel: NETDEV WATCHDOG: eth2 (e1000e):
> transmit
> > queue 0 timed out
> > Apr 19 16:12:55 localhost kernel: Modules linked in: xt_CT iptable_raw
> 8021q
> > garp stp llc veth ppdev parport_pc lp parport fuse macvlan pktgen
> iscsi_tcp
> > libiscsi_tcp libiscsi scsi_transport_iscsi lockd w83793 w83627hf
> hwmon_vid
> > coretemp iTCO_wdt microcode iTCO_vendor_support pcspkr i5k_amb ioatdma
> > i2c_i801
> > i5000_edac dca edac_core e1000e shpchp uinput sunrpc ipv6 autofs4 floppy
> > radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core [last unloaded:
> > nf_nat]
> > Apr 19 16:12:55 localhost kernel: Pid: 0, comm: kworker/0:1 Not tainted
> > 3.2.0-rc2+ #36
> > Apr 19 16:12:55 localhost kernel: Call Trace:
> > Apr 19 16:12:55 localhost kernel: <IRQ>  [<ffffffff81042902>]
> > warn_slowpath_common+0x80/0x98
> > Apr 19 16:12:55 localhost kernel: [<ffffffff810429ae>]
> > warn_slowpath_fmt+0x41/0x43
> > Apr 19 16:12:55 localhost kernel: [<ffffffff8139f8a3>]
> > dev_watchdog+0xf4/0x154
> > Apr 19 16:12:55 localhost kernel: [<ffffffff8104d371>]
> > run_timer_softirq+0x16f/0x201
> > Apr 19 16:12:55 localhost kernel: [<ffffffff8139f7af>] ?
> > netif_tx_unlock+0x57/0x57
> > Apr 19 16:12:55 localhost kernel: [<ffffffff81047e47>]
> > __do_softirq+0x86/0x12f
> > Apr 19 16:12:55 localhost kernel: [<ffffffff8105d54e>] ?
> > hrtimer_interrupt+0x12b/0x1bd
> > Apr 19 16:12:55 localhost kernel: [<ffffffff8144296c>]
> > call_softirq+0x1c/0x30
> > Apr 19 16:12:55 localhost kernel: [<ffffffff8100bb75>]
> do_softirq+0x41/0x7e
> > Apr 19 16:12:55 localhost kernel: [<ffffffff81047c26>] irq_exit+0x3f/0xbb
> > Apr 19 16:12:55 localhost kernel: [<ffffffff81021df5>]
> > smp_apic_timer_interrupt+0x85/0x93
> > Apr 19 16:12:55 localhost kernel: [<ffffffff814411de>]
> > apic_timer_interrupt+0x6e/0x80
> > Apr 19 16:12:55 localhost kernel: <EOI>  [<ffffffff81010b8c>] ?
> > mwait_idle+0x6e/0x8c
> > Apr 19 16:12:55 localhost kernel: [<ffffffff81010b7f>] ?
> > mwait_idle+0x61/0x8c
> > Apr 19 16:12:55 localhost kernel: [<ffffffff81009e72>] cpu_idle+0x67/0xbe
> > Apr 19 16:12:55 localhost kernel: [<ffffffff81435477>]
> > start_secondary+0x194/0x199
> > Apr 19 16:12:55 localhost kernel: ---[ end trace e3ca12fc1a8b85da ]---
> > Apr 19 16:12:55 localhost kernel: e1000e 0000:08:00.0: eth2: Reset
> adapter
> > Apr 19 16:12:57 localhost abrt-dump-oops[898]: abrt-dump-oops: Found
> oopses:
> > 1
> > Apr 19 16:12:57 localhost abrt-dump-oops[898]: abrt-dump-oops: Creating
> dump
> > directories
> > Apr 19 16:12:57 localhost abrtd: Directory
> 'oops-2012-04-19-16:12:57-898-0'
> > creation detected
> > Apr 19 16:12:57 localhost abrt-dump-oops: Reported 1 kernel oopses to
> Abrt
> > Apr 19 16:12:57 localhost abrtd: Can't open file
> > '/var/spool/abrt/oops-2012-04-19-16:12:57-898-0/uid': No such file or
> > directory
> > Apr 19 16:12:57 localhost abrtd: DUP_OF_DIR:
> > /var/spool/abrt/oops-2012-04-19-15:02:13-862-0
> > Apr 19 16:12:57 localhost abrtd: Dump directory is a duplicate of
> > /var/spool/abrt/oops-2012-04-19-15:02:13-862-0
> > Apr 19 16:12:57 localhost abrtd: Deleting dump directory
> > oops-2012-04-19-16:12:57-898-0 (dup of oops-2012-04-19-15:02:13-862-0),
> > sending dbus signal
> > Apr 19 16:12:58 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps
> Full
> > Duplex, Flow Control: Rx/Tx
> > Apr 19 16:12:58 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link
> > becomes ready
> > Apr 19 16:13:03 localhost /usr/sbin/irqbalance: Load average increasing,
> > re-enabling all cpus for irq balancing
> > Apr 19 16:13:04 localhost kernel: e1000e 0000:08:00.0: eth2: Reset
> adapter
> > Apr 19 16:13:05 localhost chronyd[1003]: Selected source 108.59.2.194
> > Apr 19 16:13:07 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps
> Full
> > Duplex, Flow Control: Rx/Tx
> > Apr 19 16:13:07 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link
> > becomes ready
> > ....
> >
> > lspci:
> >
> > 08:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet
> > Controller (rev 06)
> >        Subsystem: Intel Corporation PRO/1000 PT Dual Port Server Adapter
> >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr+
> > Stepping- SERR+ FastB2B- DisINTx+
> >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- >SERR- <PERR- INTx-
> >        Latency: 0, Cache Line Size: 32 bytes
> >        Interrupt: pin A routed to IRQ 74
> >        Region 0: Memory at d8300000 (32-bit, non-prefetchable)
> [size=128K]
> >        Region 2: I/O ports at 3000 [size=32]
> >        [virtual] Expansion ROM at d8d00000 [disabled] [size=128K]
> >        Capabilities: [c8] Power Management version 2
> >                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold-)
> >                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
> >        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >                Address: 00000000feeff00c  Data: 41a3
> >        Capabilities: [e0] Express (v1) Endpoint, MSI 00
> >                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
> > <512ns, L1 <64us
> >                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> >                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
> > Unsupported+
> >                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> >                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr-
> > TransPend-
> >                LnkCap: Port #1, Speed 2.5GT/s, Width x2, ASPM L0s L1,
> > Latency L0 <4us, L1 <64us
> >                        ClockPM- Surprise- LLActRep- BwNot-
> >                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain-
> > CommClk-
> >                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                LnkSta: Speed 2.5GT/s, Width x2, TrErr- Train- SlotClk+
> > DLActive- BWMgmt- ABWMgmt-
> >        Capabilities: [100 v1] Advanced Error Reporting
> >                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
> >                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> > RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> > NonFatalErr-
> >                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> > NonFatalErr-
> >                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap-
> > ChkEn-
> >        Capabilities: [140 v1] Device Serial Number
> 00-e0-ed-ff-ff-0c-11-6e
> >        Kernel driver in use: e1000e
> >        Kernel modules: e1000e
> >
> >
> > 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c is the first bad commit
> > commit 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c
> > Author: Tom Herbert <therbert@google.com>
> > Date:   Mon Nov 28 16:33:16 2011 +0000
> >
> >    e1000e: Support for byte queue limits
> >
> >    Changes to e1000e to use byte queue limits.
> >
> >    Signed-off-by: Tom Herbert <therbert@google.com>
> >    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> >    Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > :040000 040000 bf3e2ec64fd74253563e1ab39797b27a5f2df3fe
> > 51914e221547b95a989b5c7e9b037c9370fd734e M      drivers
> >
> >
> > Thanks,
> > Ben
> >
> > --
> > Ben Greear <greearb@candelatech.com>
> > Candela Technologies Inc  http://www.candelatech.com
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

* RE: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-19 23:27 e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e) Ben Greear
  2012-04-20  2:39 ` Tom Herbert
@ 2012-04-20  6:46 ` Dave, Tushar N
  1 sibling, 0 replies; 17+ messages in thread
From: Dave, Tushar N @ 2012-04-20  6:46 UTC (permalink / raw)
  To: Ben Greear, netdev, e1000-devel list, therbert

I had done some work on this, and to me it looks like this can only happen if the driver does not report the bytes_compl and pkts_compl stats correctly.

I will experiment more tomorrow.

-Tushar

>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Ben Greear
>Sent: Thursday, April 19, 2012 4:27 PM
>To: netdev; e1000-devel list; therbert@google.com
>Subject: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for
>e1000e)
>
>Test case:
>
>Run full duplex traffic (900Mbps rx, 400Mbps tx) UDP traffic (moderate
>speeds of traffic has issues as well, maybe not as easy to reproduce)
>reset peer interface
>----> tx queue timeout
>
>[... kernel log and lspci output snipped ...]
>
>
>3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c is the first bad commit
>commit 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c
>Author: Tom Herbert <therbert@google.com>
>Date:   Mon Nov 28 16:33:16 2011 +0000
>
>     e1000e: Support for byte queue limits
>
>     Changes to e1000e to use byte queue limits.
>
>     Signed-off-by: Tom Herbert <therbert@google.com>
>     Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
>:040000 040000 bf3e2ec64fd74253563e1ab39797b27a5f2df3fe
>51914e221547b95a989b5c7e9b037c9370fd734e M	drivers
>
>
>Thanks,
>Ben
>
>--
>Ben Greear <greearb@candelatech.com>
>Candela Technologies Inc  http://www.candelatech.com


* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20  2:39 ` Tom Herbert
  2012-04-20  6:44   ` Ying Cai
@ 2012-04-20 19:00   ` Ben Greear
  2012-04-20 19:05     ` Tom Herbert
  1 sibling, 1 reply; 17+ messages in thread
From: Ben Greear @ 2012-04-20 19:00 UTC (permalink / raw)
  To: Tom Herbert; +Cc: netdev, e1000-devel list, Eric Dumazet

On 04/19/2012 07:39 PM, Tom Herbert wrote:
> Thanks, will try to reproduce.

I am seeing something similar with the 'igb' driver, though this
NIC also involves a side-driver that does bypass.  When I enable/disable
bypass, the links bounce (as expected), and igb reports the same
timeout that I was seeing with e1000e.

If I revert the igb BQL patch, then it works fine (kernel 3.3.2+).

I did not do an exhaustive bisect, nor have I tested non-bypass
igb hardware, but since reverting the patch fixes the problem....

Maybe there is some fundamental issue with BQL when a NIC
resets itself (in this case, due to the remote port doing
a reset). Maybe we are not properly accounting for pkts cleared
from the xmit queue on reset, or something of that nature?

I'm happy to test patches if someone has suggestions...

Thanks,
Ben

>
> Tom
>
> On Thu, Apr 19, 2012 at 4:27 PM, Ben Greear <greearb@candelatech.com> wrote:
>> Test case:
>>
>> Run full duplex traffic (900Mbps rx, 400Mbps tx) UDP traffic
>> (moderate speeds of traffic has issues as well, maybe not as easy to
>> reproduce)
>> reset peer interface
>> ---->  tx queue timeout
>>
>>
>> [... kernel log, lspci output and bisect result snipped ...]
>>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20 19:00   ` Ben Greear
@ 2012-04-20 19:05     ` Tom Herbert
  2012-04-20 19:13       ` Ben Greear
  0 siblings, 1 reply; 17+ messages in thread
From: Tom Herbert @ 2012-04-20 19:05 UTC (permalink / raw)
  To: Ben Greear; +Cc: e1000-devel list, netdev

> I am seeing something similar with the 'igb' driver, though this
> NIC also involves a side-driver that does bypass.  When I enable/disable
> bypass, the links bounce (as expected), and igb reports the same
> timeout that I was seeing with e1000e.
>

Hi Ben,

Are the circumstances the same between igb and e1000e for the issue?
That is, in both cases, are links being bounced or reset?

Thanks,
Tom

> If I revert the igb BQL patch, then it works fine (kernel 3.3.2+).
>
> I did not do an exhaustive bisect, nor have I tested non-bypass
> igb hardware, but since reverting the patch fixes the problem....
>
> Maybe there is some fundamental issue with BQL when a NIC
> resets itself (in this case, due to the remote port doing
> a reset). Maybe we are not properly accounting for pkts cleared
> from the xmit queue on reset, or something of that nature?
>
> I'm happy to test patches if someone has suggestions...
>
> Thanks,
> Ben
>
>
>>
>> Tom
>>
>> On Thu, Apr 19, 2012 at 4:27 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>
>>> Test case:
>>>
>>> Run full duplex traffic (900Mbps rx, 400Mbps tx) UDP traffic
>>> (moderate speeds of traffic has issues as well, maybe not as easy to
>>> reproduce)
>>> reset peer interface
>>> ---->  tx queue timeout
>>>
>>>
>>> [... kernel log, lspci output and bisect result snipped ...]
>>>
>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>



* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20 19:05     ` Tom Herbert
@ 2012-04-20 19:13       ` Ben Greear
  2012-04-20 19:44         ` John Fastabend
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Greear @ 2012-04-20 19:13 UTC (permalink / raw)
  To: Tom Herbert; +Cc: e1000-devel list, netdev

On 04/20/2012 12:05 PM, Tom Herbert wrote:
>> I am seeing something similar with the 'igb' driver, though this
>> NIC also involves a side-driver that does bypass.  When I enable/disable
>> bypass, the links bounce (as expected), and igb reports the same
>> timeout that I was seeing with e1000e.
>>
>
> Hi Ben,
>
> Are the circumstances the same between igb and e1000e for the issue?
> That is, in both cases, are links being bounced or reset?

Yes, peer interface is being bounced in both cases.  The igb NIC is
in a different machine, running Fedora 14 instead of F16 for the
e1000e.

I have the bypass NIC cabled to itself: 2 ports sending/receiving
traffic, and 2 doing bypass (in hardware, or bridging in software).

When I twiddle to/from hardware/software bridging on the two
bypass ports, link is re-negotiated on the traffic-generating ports,
and the traffic generating ports have the tx queue timeout
issue.

If you have an igb or e1000e system that can generate traffic
while resetting the peer interface, I think you should easily
see the problem.  If you DON'T see the problem, please let me
know and I will try to provide a more concise test case using
open-source traffic generators (we're using our own
traffic generator, but at least on the e1000e bisect, I was
using standard upstream kernels and pure user-space API to
generate traffic).

Thanks,
Ben

>
> Thanks,
> Tom
>
>> If I revert the igb BQL patch, then it works fine (kernel 3.3.2+).
>>
>> I did not do an exhaustive bisect, nor have I tested non-bypass
>> igb hardware, but since reverting the patch fixes the problem....
>>
>> Maybe there is some fundamental issue with BQL when a NIC
>> resets itself (in this case, due to the remote port doing
>> a reset). Maybe we are not properly accounting for pkts cleared
>> from the xmit queue on reset, or something of that nature?
>>
>> I'm happy to test patches if someone has suggestions...
>>
>> Thanks,
>> Ben
>>
>>
>>>
>>> Tom
>>>
>>> On Thu, Apr 19, 2012 at 4:27 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>>
>>>> Test case:
>>>>
>>>> Run full duplex traffic (900Mbps rx, 400Mbps tx) UDP traffic
>>>> (moderate speeds of traffic has issues as well, maybe not as easy to
>>>> reproduce)
>>>> reset peer interface
>>>> ---->    tx queue timeout
>>>>
>>>>
>>>> Apr 19 16:12:48 localhost kernel: e1000e: eth2 NIC Link is Down
>>>> Apr 19 16:12:48 localhost kernel: e1000e 0000:08:00.0: eth2: Reset adapter
>>>> Apr 19 16:12:48 localhost kernel: e1000e: eth3 NIC Link is Down
>>>> Apr 19 16:12:50 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>>>> Apr 19 16:12:50 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
>>>> Apr 19 16:12:50 localhost kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>>>> Apr 19 16:12:50 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
>>>> Apr 19 16:12:54 localhost /usr/sbin/irqbalance: Load average increasing, re-enabling all cpus for irq balancing
>>>> Apr 19 16:12:55 localhost kernel: ------------[ cut here ]------------
>>>> Apr 19 16:12:55 localhost kernel: WARNING: at /home/greearb/git/linux-3.3.dev.y/net/sched/sch_generic.c:256 dev_watchdog+0xf4/0x154()
>>>> Apr 19 16:12:55 localhost kernel: Hardware name: X7DBU
>>>> Apr 19 16:12:55 localhost kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
>>>> Apr 19 16:12:55 localhost kernel: Modules linked in: xt_CT iptable_raw 8021q garp stp llc veth ppdev parport_pc lp parport fuse macvlan pktgen iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi lockd w83793 w83627hf hwmon_vid coretemp iTCO_wdt microcode iTCO_vendor_support pcspkr i5k_amb ioatdma i2c_i801 i5000_edac dca edac_core e1000e shpchp uinput sunrpc ipv6 autofs4 floppy radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core [last unloaded: nf_nat]
>>>> Apr 19 16:12:55 localhost kernel: Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #36
>>>> Apr 19 16:12:55 localhost kernel: Call Trace:
>>>> Apr 19 16:12:55 localhost kernel: <IRQ>  [<ffffffff81042902>] warn_slowpath_common+0x80/0x98
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff810429ae>] warn_slowpath_fmt+0x41/0x43
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff8139f8a3>] dev_watchdog+0xf4/0x154
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff8104d371>] run_timer_softirq+0x16f/0x201
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff8139f7af>] ? netif_tx_unlock+0x57/0x57
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff81047e47>] __do_softirq+0x86/0x12f
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff8105d54e>] ? hrtimer_interrupt+0x12b/0x1bd
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff8144296c>] call_softirq+0x1c/0x30
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff8100bb75>] do_softirq+0x41/0x7e
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff81047c26>] irq_exit+0x3f/0xbb
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff81021df5>] smp_apic_timer_interrupt+0x85/0x93
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff814411de>] apic_timer_interrupt+0x6e/0x80
>>>> Apr 19 16:12:55 localhost kernel: <EOI>  [<ffffffff81010b8c>] ? mwait_idle+0x6e/0x8c
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff81010b7f>] ? mwait_idle+0x61/0x8c
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff81009e72>] cpu_idle+0x67/0xbe
>>>> Apr 19 16:12:55 localhost kernel: [<ffffffff81435477>] start_secondary+0x194/0x199
>>>> Apr 19 16:12:55 localhost kernel: ---[ end trace e3ca12fc1a8b85da ]---
>>>> Apr 19 16:12:55 localhost kernel: e1000e 0000:08:00.0: eth2: Reset adapter
>>>> Apr 19 16:12:57 localhost abrt-dump-oops[898]: abrt-dump-oops: Found oopses: 1
>>>> Apr 19 16:12:57 localhost abrt-dump-oops[898]: abrt-dump-oops: Creating dump directories
>>>> Apr 19 16:12:57 localhost abrtd: Directory 'oops-2012-04-19-16:12:57-898-0' creation detected
>>>> Apr 19 16:12:57 localhost abrt-dump-oops: Reported 1 kernel oopses to Abrt
>>>> Apr 19 16:12:57 localhost abrtd: Can't open file '/var/spool/abrt/oops-2012-04-19-16:12:57-898-0/uid': No such file or directory
>>>> Apr 19 16:12:57 localhost abrtd: DUP_OF_DIR: /var/spool/abrt/oops-2012-04-19-15:02:13-862-0
>>>> Apr 19 16:12:57 localhost abrtd: Dump directory is a duplicate of /var/spool/abrt/oops-2012-04-19-15:02:13-862-0
>>>> Apr 19 16:12:57 localhost abrtd: Deleting dump directory oops-2012-04-19-16:12:57-898-0 (dup of oops-2012-04-19-15:02:13-862-0), sending dbus signal
>>>> Apr 19 16:12:58 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>>>> Apr 19 16:12:58 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
>>>> Apr 19 16:13:03 localhost /usr/sbin/irqbalance: Load average increasing, re-enabling all cpus for irq balancing
>>>> Apr 19 16:13:04 localhost kernel: e1000e 0000:08:00.0: eth2: Reset adapter
>>>> Apr 19 16:13:05 localhost chronyd[1003]: Selected source 108.59.2.194
>>>> Apr 19 16:13:07 localhost kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>>>> Apr 19 16:13:07 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
>>>> ....
>>>>
>>>> lspci:
>>>>
>>>> 08:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>>>>         Subsystem: Intel Corporation PRO/1000 PT Dual Port Server Adapter
>>>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
>>>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>         Latency: 0, Cache Line Size: 32 bytes
>>>>         Interrupt: pin A routed to IRQ 74
>>>>         Region 0: Memory at d8300000 (32-bit, non-prefetchable) [size=128K]
>>>>         Region 2: I/O ports at 3000 [size=32]
>>>>         [virtual] Expansion ROM at d8d00000 [disabled] [size=128K]
>>>>         Capabilities: [c8] Power Management version 2
>>>>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
>>>>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>>>>         Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>>                 Address: 00000000feeff00c  Data: 41a3
>>>>         Capabilities: [e0] Express (v1) Endpoint, MSI 00
>>>>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>>>>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>>>>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>>>>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>>>                         MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>                 DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
>>>>                 LnkCap: Port #1, Speed 2.5GT/s, Width x2, ASPM L0s L1, Latency L0 <4us, L1 <64us
>>>>                         ClockPM- Surprise- LLActRep- BwNot-
>>>>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
>>>>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>                 LnkSta: Speed 2.5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>         Capabilities: [100 v1] Advanced Error Reporting
>>>>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>>>>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>                 UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>>>>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>>>>                 AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
>>>>         Capabilities: [140 v1] Device Serial Number 00-e0-ed-ff-ff-0c-11-6e
>>>>         Kernel driver in use: e1000e
>>>>         Kernel modules: e1000e
>>>>
>>>>
>>>> 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c is the first bad commit
>>>> commit 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c
>>>> Author: Tom Herbert<therbert@google.com>
>>>> Date:   Mon Nov 28 16:33:16 2011 +0000
>>>>
>>>>     e1000e: Support for byte queue limits
>>>>
>>>>     Changes to e1000e to use byte queue limits.
>>>>
>>>>     Signed-off-by: Tom Herbert<therbert@google.com>
>>>>     Acked-by: Eric Dumazet<eric.dumazet@gmail.com>
>>>>     Signed-off-by: David S. Miller<davem@davemloft.net>
>>>>
>>>> :040000 040000 bf3e2ec64fd74253563e1ab39797b27a5f2df3fe 51914e221547b95a989b5c7e9b037c9370fd734e M      drivers
>>>>
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>> --
>>>> Ben Greear<greearb@candelatech.com>
>>>> Candela Technologies Inc  http://www.candelatech.com
>>>>
>>
>>
>> --
>> Ben Greear<greearb@candelatech.com>
>> Candela Technologies Inc  http://www.candelatech.com
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired

* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20 19:13       ` Ben Greear
@ 2012-04-20 19:44         ` John Fastabend
  2012-04-20 21:21           ` Tom Herbert
  0 siblings, 1 reply; 17+ messages in thread
From: John Fastabend @ 2012-04-20 19:44 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Ben Greear, netdev, e1000-devel list, Eric Dumazet

On 4/20/2012 12:13 PM, Ben Greear wrote:
> On 04/20/2012 12:05 PM, Tom Herbert wrote:
>>> I am seeing something similar with the 'igb' driver, though this
>>> NIC also involves a side-driver that does bypass.  When I enable/disable
>>> bypass, the links bounce (as expected), and igb reports the same
>>> timeout that I was seeing with e1000e.
>>>
>>


[...]

>>>>>
>>>>> 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c is the first bad commit
>>>>> commit 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c
>>>>> Author: Tom Herbert<therbert@google.com>
>>>>> Date:   Mon Nov 28 16:33:16 2011 +0000
>>>>>
>>>>>     e1000e: Support for byte queue limits
>>>>>
>>>>>     Changes to e1000e to use byte queue limits.
>>>>>
>>>>>     Signed-off-by: Tom Herbert<therbert@google.com>
>>>>>     Acked-by: Eric Dumazet<eric.dumazet@gmail.com>
>>>>>     Signed-off-by: David S. Miller<davem@davemloft.net>
>>>>>
>>>>> :040000 040000 bf3e2ec64fd74253563e1ab39797b27a5f2df3fe 51914e221547b95a989b5c7e9b037c9370fd734e M      drivers
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Ben
>>>>>


Tom, did you see these two patches? Maybe this is resolved by
the second patch.

We needed these to fix up ixgbe and igb (I didn't test e1000e).
It looks like we might want to push these to stable. I don't
believe they are in 3.3.

commit b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
Author: Alexander Duyck <alexander.h.duyck@intel.com>
Date:   Tue Feb 7 02:29:06 2012 +0000

    net: Add memory barriers to prevent possible race in byte queue limits

    This change adds a memory barrier to the byte queue limit code to address a
    possible race as has been seen in the past with the
    netif_stop_queue/netif_wake_queue logic.

    Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
    Tested-by: Stephen Ko <stephen.s.ko@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
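
The race b37c0fbe refers to is the same lost-wakeup pattern as netif_stop_queue()/netif_wake_queue(): the xmit path sets the XOFF bit at the moment the tx-cleanup path frees room and checks that bit, and without a full barrier on both sides each CPU can miss the other's update, leaving the queue stopped for good. The sketch below is a hypothetical userspace model using C11 atomics, not kernel code; `struct race_model`, `race_model_sent()` and `race_model_completed()` are illustrative names, and `atomic_thread_fence(memory_order_seq_cst)` merely stands in for smp_mb(). It assumes the fix has roughly this shape: set XOFF, fence, re-check available space on the xmit side; record the completion, fence, then test-and-clear XOFF on the cleanup side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the BQL stop/wake handshake.
 * Names are illustrative only; this is NOT the kernel's dql API. */
struct race_model {
	int limit;              /* byte budget allowed in flight */
	atomic_int queued;      /* bumped by the xmit path */
	atomic_int completed;   /* bumped by the tx-cleanup path */
	atomic_bool xoff;       /* models __QUEUE_STATE_STACK_XOFF */
};

static void race_model_init(struct race_model *q, int limit)
{
	assert(limit > 0);      /* the model assumes a positive budget */
	q->limit = limit;
	atomic_init(&q->queued, 0);
	atomic_init(&q->completed, 0);
	atomic_init(&q->xoff, false);
}

/* Remaining budget; negative means "over the limit". */
static int race_model_avail(struct race_model *q)
{
	int queued = atomic_load_explicit(&q->queued, memory_order_relaxed);
	int completed = atomic_load_explicit(&q->completed, memory_order_relaxed);

	return q->limit - (queued - completed);
}

/* xmit path: stop when over budget, then re-check after a full fence in
 * case a concurrent completion freed room before it could see XOFF. */
static void race_model_sent(struct race_model *q, int bytes)
{
	atomic_fetch_add_explicit(&q->queued, bytes, memory_order_relaxed);
	if (race_model_avail(q) >= 0)
		return;

	atomic_store_explicit(&q->xoff, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);   /* stands in for smp_mb() */
	if (race_model_avail(q) >= 0)
		atomic_store_explicit(&q->xoff, false, memory_order_relaxed);
}

/* Cleanup path: publish the completion, full fence, then test-and-clear
 * XOFF. Returns true if this call restarted a stopped queue. */
static bool race_model_completed(struct race_model *q, int bytes)
{
	if (bytes == 0)
		return false;

	atomic_fetch_add_explicit(&q->completed, bytes, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);   /* stands in for smp_mb() */
	if (race_model_avail(q) < 0)
		return false;
	return atomic_exchange_explicit(&q->xoff, false, memory_order_relaxed);
}
```

With a seq_cst fence on both sides this is the classic store-buffering case: at least one side is guaranteed to observe the other's store, so either the sender clears its own XOFF on the re-check or the completer sees XOFF and restarts the queue. Dropping either fence re-opens the window in which both miss each other.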


commit 5c4903549c05bbb373479e0ce2992573c120654a
Author: Alexander Duyck <alexander.h.duyck@intel.com>
Date:   Tue Feb 7 02:29:01 2012 +0000

    net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state

    We are seeing dev_watchdog hangs on several drivers.  I suspect this is due
    to the __QUEUE_STATE_STACK_XOFF bit being set prior to a reset for link
    change, and then not being cleared by netdev_tx_reset_queue.  This change
    corrects that.

    In addition we were seeing dev_watchdog hangs on igb after running the
    ethtool tests.  We found this to be due to the fact that the ethtool test
    runs the same logic as ndo_start_xmit, but we were never clearing the XOFF
    flag since the loopback test in ethtool does not do byte queue accounting.

    Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
    Tested-by: Stephen Ko <stephen.s.ko@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5c4903549c05bbb373479e0ce2992573c120654a
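
The first failure mode described in 5c4903549c05 can be seen in a toy model. The sketch below is a hypothetical, single-threaded userspace model of one BQL-managed tx queue; `struct txq_model`, `txq_model_sent()`, `txq_model_completed()` and the two reset functions are illustrative names, not the kernel's `struct netdev_queue`/`struct dql` API. It assumes, as the commit message suggests, that a link-change reset cleans the tx ring without ever reporting the dropped descriptors as completed, so the reset itself is the only thing left that could restart a stopped queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of one BQL-managed tx queue.
 * Names are illustrative; this is NOT the kernel's netdev_queue/dql API. */
struct txq_model {
	int limit;              /* byte budget allowed in flight */
	unsigned int queued;    /* bytes handed to the hardware ring */
	unsigned int completed; /* bytes reported done by tx cleanup */
	bool stack_xoff;        /* models __QUEUE_STATE_STACK_XOFF */
};

/* Remaining budget; negative means "over the limit". */
static int txq_model_avail(const struct txq_model *q)
{
	return q->limit - (int)(q->queued - q->completed);
}

/* xmit path: account the bytes and stop the queue once over budget. */
static void txq_model_sent(struct txq_model *q, unsigned int bytes)
{
	q->queued += bytes;
	if (txq_model_avail(q) < 0)
		q->stack_xoff = true;
}

/* Cleanup path: account completions and restart the queue once there is
 * room again. Completions can never exceed what was queued. */
static void txq_model_completed(struct txq_model *q, unsigned int bytes)
{
	q->completed += bytes;
	assert(q->completed <= q->queued);
	if (q->stack_xoff && txq_model_avail(q) >= 0)
		q->stack_xoff = false;
}

/* Reset as described before the fix: counters cleared, XOFF left alone. */
static void txq_model_reset_old(struct txq_model *q)
{
	q->queued = 0;
	q->completed = 0;
}

/* Reset after the fix: XOFF is cleared along with the counters. */
static void txq_model_reset_fixed(struct txq_model *q)
{
	q->stack_xoff = false;
	q->queued = 0;
	q->completed = 0;
}

/* Fill the queue past its limit, then bounce the link: the ring is
 * cleaned without completions and the given reset routine runs. Returns
 * true if the queue is left stopped with nothing in flight to wake it. */
static bool txq_model_stuck_after_reset(void (*reset)(struct txq_model *))
{
	struct txq_model q = { .limit = 2000 };

	txq_model_sent(&q, 1500);
	txq_model_sent(&q, 1500);   /* 3000 in flight > 2000: queue stopped */
	reset(&q);                  /* link change: no completions reported */
	return q.stack_xoff && q.queued == q.completed;
}
```

Under these assumptions `txq_model_stuck_after_reset(txq_model_reset_old)` returns true: the stale XOFF bit keeps the stack from handing the driver any more frames, and nothing in flight remains to clear it, which is consistent with the dev_watchdog transmit timeouts in the logs above. With `txq_model_reset_fixed` it returns false. Per the commit message, the ethtool loopback case ends up in the same state from the other direction: the test runs the ndo_start_xmit logic without the matching byte-queue completion accounting, so XOFF is never cleared.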

* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20 19:44         ` John Fastabend
@ 2012-04-20 21:21           ` Tom Herbert
  2012-04-20 21:24             ` Ben Greear
  2012-04-20 21:56             ` Ben Greear
  0 siblings, 2 replies; 17+ messages in thread
From: Tom Herbert @ 2012-04-20 21:21 UTC (permalink / raw)
  To: John Fastabend; +Cc: Ben Greear, netdev, e1000-devel list, Eric Dumazet

Thanks John for pointers to those.  Ben, are you running a kernel with
these patches?

Tom

>
> Tom, did you see these two patches? Maybe this is resolved by
> the second patch.
>
> We needed these to fix up ixgbe and igb (I didn't test e1000e).
> It looks like we might want to push these to stable. I don't
> believe they are in 3.3.
>
> commit b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
> Author: Alexander Duyck <alexander.h.duyck@intel.com>
> Date:   Tue Feb 7 02:29:06 2012 +0000
>
>    net: Add memory barriers to prevent possible race in byte queue limits
>
>    This change adds a memory barrier to the byte queue limit code to address a
>    possible race as has been seen in the past with the
>    netif_stop_queue/netif_wake_queue logic.
>
>    Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>    Tested-by: Stephen Ko <stephen.s.ko@intel.com>
>    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>
>
> commit 5c4903549c05bbb373479e0ce2992573c120654a
> Author: Alexander Duyck <alexander.h.duyck@intel.com>
> Date:   Tue Feb 7 02:29:01 2012 +0000
>
>    net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state
>
>    We are seeing dev_watchdog hangs on several drivers.  I suspect this is due
>    to the __QUEUE_STATE_STACK_XOFF bit being set prior to a reset for link
>    change, and then not being cleared by netdev_tx_reset_queue.  This change
>    corrects that.
>
>    In addition we were seeing dev_watchdog hangs on igb after running the
>    ethtool tests.  We found this to be due to the fact that the ethtool test
>    runs the same logic as ndo_start_xmit, but we were never clearing the XOFF
>    flag since the loopback test in ethtool does not do byte queue accounting.
>
>    Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>    Tested-by: Stephen Ko <stephen.s.ko@intel.com>
>    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5c4903549c05bbb373479e0ce2992573c120654a
>
>

* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20 21:21           ` Tom Herbert
@ 2012-04-20 21:24             ` Ben Greear
  2012-04-20 21:56             ` Ben Greear
  1 sibling, 0 replies; 17+ messages in thread
From: Ben Greear @ 2012-04-20 21:24 UTC (permalink / raw)
  To: Tom Herbert; +Cc: John Fastabend, netdev, e1000-devel list

On 04/20/2012 02:21 PM, Tom Herbert wrote:
> Thanks John for pointers to those.  Ben, are you running a kernel with
> these patches?

I don't think so.  I'll add them and re-test.

Thanks,
Ben

>
> Tom
>
>>
>> Tom, did you see these two patches? Maybe this is resolved by
>> the second patch.
>>
>> We needed these to fix up ixgbe and igb (I didn't test e1000e).
>> It looks like we might want to push these to stable. I don't
>> believe they are in 3.3.
>>
>> commit b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>> Author: Alexander Duyck<alexander.h.duyck@intel.com>
>> Date:   Tue Feb 7 02:29:06 2012 +0000
>>
>>     net: Add memory barriers to prevent possible race in byte queue limits
>>
>>     This change adds a memory barrier to the byte queue limit code to address a
>>     possible race as has been seen in the past with the
>>     netif_stop_queue/netif_wake_queue logic.
>>
>>     Signed-off-by: Alexander Duyck<alexander.h.duyck@intel.com>
>>     Tested-by: Stephen Ko<stephen.s.ko@intel.com>
>>     Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>>
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>>
>>
>> commit 5c4903549c05bbb373479e0ce2992573c120654a
>> Author: Alexander Duyck<alexander.h.duyck@intel.com>
>> Date:   Tue Feb 7 02:29:01 2012 +0000
>>
>>     net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state
>>
>>     We are seeing dev_watchdog hangs on several drivers.  I suspect this is due
>>     to the __QUEUE_STATE_STACK_XOFF bit being set prior to a reset for link
>>     change, and then not being cleared by netdev_tx_reset_queue.  This change
>>     corrects that.
>>
>>     In addition we were seeing dev_watchdog hangs on igb after running the
>>     ethtool tests.  We found this to be due to the fact that the ethtool test
>>     runs the same logic as ndo_start_xmit, but we were never clearing the XOFF
>>     flag since the loopback test in ethtool does not do byte queue accounting.
>>
>>     Signed-off-by: Alexander Duyck<alexander.h.duyck@intel.com>
>>     Tested-by: Stephen Ko<stephen.s.ko@intel.com>
>>     Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5c4903549c05bbb373479e0ce2992573c120654a
>>
>>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20 21:21           ` Tom Herbert
  2012-04-20 21:24             ` Ben Greear
@ 2012-04-20 21:56             ` Ben Greear
  2012-05-01 21:10               ` [E1000-devel] " Ben Greear
  1 sibling, 1 reply; 17+ messages in thread
From: Ben Greear @ 2012-04-20 21:56 UTC (permalink / raw)
  To: Tom Herbert; +Cc: John Fastabend, netdev, e1000-devel list

On 04/20/2012 02:21 PM, Tom Herbert wrote:
> Thanks John for pointers to those.  Ben, are you running a kernel with
> these patches?

I just tested this on my e1000e and igb machine.  With these patches,
I can no longer reproduce the problem.

So, please make sure those are queued up for 3.3 stable!

Thanks,
Ben

>
> Tom
>
>>
>> Tom, did you see these two patches? Maybe this is resolved by
>> the second patch.
>>
>> We needed these to fix up ixgbe and igb (I didn't test e1000e).
>> It looks like we might want to push these to stable. I don't
>> believe they are in 3.3.
>>
>> commit b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>> Author: Alexander Duyck<alexander.h.duyck@intel.com>
>> Date:   Tue Feb 7 02:29:06 2012 +0000
>>
>>     net: Add memory barriers to prevent possible race in byte queue limits
>>
>>     This change adds a memory barrier to the byte queue limit code to address a
>>     possible race as has been seen in the past with the
>>     netif_stop_queue/netif_wake_queue logic.
>>
>>     Signed-off-by: Alexander Duyck<alexander.h.duyck@intel.com>
>>     Tested-by: Stephen Ko<stephen.s.ko@intel.com>
>>     Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>>
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>>
>>
>> commit 5c4903549c05bbb373479e0ce2992573c120654a
>> Author: Alexander Duyck<alexander.h.duyck@intel.com>
>> Date:   Tue Feb 7 02:29:01 2012 +0000
>>
>>     net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state
>>
>>     We are seeing dev_watchdog hangs on several drivers.  I suspect this is due
>>     to the __QUEUE_STATE_STACK_XOFF bit being set prior to a reset for link
>>     change, and then not being cleared by netdev_tx_reset_queue.  This change
>>     corrects that.
>>
>>     In addition we were seeing dev_watchdog hangs on igb after running the
>>     ethtool tests.  We found this to be due to the fact that the ethtool test
>>     runs the same logic as ndo_start_xmit, but we were never clearing the XOFF
>>     flag since the loopback test in ethtool does not do byte queue accounting.
>>
>>     Signed-off-by: Alexander Duyck<alexander.h.duyck@intel.com>
>>     Tested-by: Stephen Ko<stephen.s.ko@intel.com>
>>     Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5c4903549c05bbb373479e0ce2992573c120654a
>>
>>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [E1000-devel] e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-04-20 21:56             ` Ben Greear
@ 2012-05-01 21:10               ` Ben Greear
  2012-05-01 21:49                 ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Greear @ 2012-05-01 21:10 UTC (permalink / raw)
  To: Tom Herbert, David Miller; +Cc: John Fastabend, netdev, e1000-devel list

On 04/20/2012 02:56 PM, Ben Greear wrote:
> On 04/20/2012 02:21 PM, Tom Herbert wrote:
>> Thanks John for pointers to those.  Ben, are you running a kernel with
>> these patches?
>
> I just tested this on my e1000e and igb machine.  With these patches,
> I can no longer reproduce the problem.
>
> So, please make sure those are queued up for 3.3 stable!

Dave:  I think these patches below should go to 3.3 stable.

They are not queued for stable yet as far as I can tell.

Thanks,
Ben

>
> Thanks,
> Ben
>
>>
>> Tom
>>
>>>
>>> Tom, did you see these two patches? Maybe this is resolved by
>>> the second patch.
>>>
>>> We needed these to fix up ixgbe and igb (I didn't test e1000e).
>>> It looks like we might want to push these to stable. I don't
>>> believe they are in 3.3.
>>>
>>> commit b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>>> Author: Alexander Duyck<alexander.h.duyck@intel.com>
>>> Date:   Tue Feb 7 02:29:06 2012 +0000
>>>
>>>      net: Add memory barriers to prevent possible race in byte queue limits
>>>
>>>      This change adds a memory barrier to the byte queue limit code to address a
>>>      possible race as has been seen in the past with the
>>>      netif_stop_queue/netif_wake_queue logic.
>>>
>>>      Signed-off-by: Alexander Duyck<alexander.h.duyck@intel.com>
>>>      Tested-by: Stephen Ko<stephen.s.ko@intel.com>
>>>      Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>>>
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>>>
>>>
>>> commit 5c4903549c05bbb373479e0ce2992573c120654a
>>> Author: Alexander Duyck<alexander.h.duyck@intel.com>
>>> Date:   Tue Feb 7 02:29:01 2012 +0000
>>>
>>>      net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state
>>>
>>>      We are seeing dev_watchdog hangs on several drivers.  I suspect this is due
>>>      to the __QUEUE_STATE_STACK_XOFF bit being set prior to a reset for link
>>>      change, and then not being cleared by netdev_tx_reset_queue.  This change
>>>      corrects that.
>>>
>>>      In addition we were seeing dev_watchdog hangs on igb after running the
>>>      ethtool tests.  We found this to be due to the fact that the ethtool test
>>>      runs the same logic as ndo_start_xmit, but we were never clearing the XOFF
>>>      flag since the loopback test in ethtool does not do byte queue accounting.
>>>
>>>      Signed-off-by: Alexander Duyck<alexander.h.duyck@intel.com>
>>>      Tested-by: Stephen Ko<stephen.s.ko@intel.com>
>>>      Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5c4903549c05bbb373479e0ce2992573c120654a
>>>
>>>
>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-05-01 21:10               ` [E1000-devel] " Ben Greear
@ 2012-05-01 21:49                 ` David Miller
  2012-05-01 22:08                   ` Ben Greear
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2012-05-01 21:49 UTC (permalink / raw)
  To: greearb; +Cc: john.r.fastabend, netdev, e1000-devel, therbert

From: Ben Greear <greearb@candelatech.com>
Date: Tue, 01 May 2012 14:10:43 -0700

> On 04/20/2012 02:56 PM, Ben Greear wrote:
>> On 04/20/2012 02:21 PM, Tom Herbert wrote:
>>> Thanks John for pointers to those.  Ben, are you running a kernel with
>>> these patches?
>>
>> I just tested this on my e1000e and igb machine.  With these patches,
>> I can no longer reproduce the problem.
>>
>> So, please make sure those are queued up for 3.3 stable!
> 
> Dave:  I think these patches below should go to 3.3 stable.
> 
> They are not queued for stable yet as far as I can tell.

I let the Intel developers handle -stable submissions for their
drivers.

* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-05-01 21:49                 ` David Miller
@ 2012-05-01 22:08                   ` Ben Greear
  2012-05-01 22:42                     ` [E1000-devel] " Jeff Kirsher
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Greear @ 2012-05-01 22:08 UTC (permalink / raw)
  To: David Miller; +Cc: john.r.fastabend, netdev, e1000-devel, therbert

On 05/01/2012 02:49 PM, David Miller wrote:
> From: Ben Greear<greearb@candelatech.com>
> Date: Tue, 01 May 2012 14:10:43 -0700
>
>> On 04/20/2012 02:56 PM, Ben Greear wrote:
>>> On 04/20/2012 02:21 PM, Tom Herbert wrote:
>>>> Thanks John for pointers to those.  Ben, are you running a kernel with
>>>> these patches?
>>>
>>> I just tested this on my e1000e and igb machine.  With these patches,
>>> I can no longer reproduce the problem.
>>>
>>> So, please make sure those are queued up for 3.3 stable!
>>
>> Dave:  I think these patches below should go to 3.3 stable.
>>
>> They are not queued for stable yet as far as I can tell.
>
> I let the Intel developers handle -stable submissions for their
> drivers.

I don't think this is specific to their drivers.

But, as long as _someone_ is pushing it to stable,
it is fine by me.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [E1000-devel] e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-05-01 22:08                   ` Ben Greear
@ 2012-05-01 22:42                     ` Jeff Kirsher
  2012-05-01 22:46                       ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Kirsher @ 2012-05-01 22:42 UTC (permalink / raw)
  To: Ben Greear; +Cc: David Miller, john.r.fastabend, netdev, e1000-devel, therbert

On Tue, 2012-05-01 at 15:08 -0700, Ben Greear wrote:
> On 05/01/2012 02:49 PM, David Miller wrote:
> > From: Ben Greear<greearb@candelatech.com>
> > Date: Tue, 01 May 2012 14:10:43 -0700
> >
> >> On 04/20/2012 02:56 PM, Ben Greear wrote:
> >>> On 04/20/2012 02:21 PM, Tom Herbert wrote:
> >>>> Thanks John for pointers to those.  Ben, are you running a kernel with
> >>>> these patches?
> >>>
> >>> I just tested this on my e1000e and igb machine.  With these patches,
> >>> I can no longer reproduce the problem.
> >>>
> >>> So, please make sure those are queued up for 3.3 stable!
> >>
> >> Dave:  I think these patches below should go to 3.3 stable.
> >>
> >> They are not queued for stable yet as far as I can tell.
> >
> > I let the Intel developers handle -stable submissions for their
> > drivers.
> 
> I don't think this is specific to their drivers.
> 
> But, as long as _someone_ is pushing it to stable,
> it is fine by me.
> 
> Thanks,
> Ben 

I will take care of the stable submissions, Ben.

Cheers,
Jeff

* Re: e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-05-01 22:42                     ` [E1000-devel] " Jeff Kirsher
@ 2012-05-01 22:46                       ` David Miller
  2012-05-01 22:52                         ` [E1000-devel] " Jeff Kirsher
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2012-05-01 22:46 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: john.r.fastabend, netdev, therbert, e1000-devel

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 01 May 2012 15:42:28 -0700

> I will take care of the stable submissions, Ben.

Actually, Jeff, hold off on that.

I misread Ben's email and didn't see that these were not
driver-specific changes.

Therefore, I'll take care of the -stable submissions and you don't
need to worry about it.

Thanks everyone.

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired

* Re: [E1000-devel] e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e)
  2012-05-01 22:46                       ` David Miller
@ 2012-05-01 22:52                         ` Jeff Kirsher
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Kirsher @ 2012-05-01 22:52 UTC (permalink / raw)
  To: David Miller; +Cc: greearb, john.r.fastabend, netdev, e1000-devel, therbert

[-- Attachment #1: Type: text/plain, Size: 490 bytes --]

On Tue, 2012-05-01 at 18:46 -0400, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Tue, 01 May 2012 15:42:28 -0700
> 
> > I will take care of the stable submissions, Ben.
> 
> Actually, Jeff, hold off on that.
> 
> I misread Ben's email and didn't see that these were not
> driver-specific changes.
> 
> Therefore, I'll take care of the -stable submissions and you don't
> need to worry about it.
> 
> Thanks everyone.

OK, sounds good, Dave.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

end of thread

Thread overview: 17+ messages
2012-04-19 23:27 e1000e tx queue timeout in 3.3.0 (bisected to BQL support for e1000e) Ben Greear
2012-04-20  2:39 ` Tom Herbert
2012-04-20  6:44   ` Ying Cai
2012-04-20 19:00   ` Ben Greear
2012-04-20 19:05     ` Tom Herbert
2012-04-20 19:13       ` Ben Greear
2012-04-20 19:44         ` John Fastabend
2012-04-20 21:21           ` Tom Herbert
2012-04-20 21:24             ` Ben Greear
2012-04-20 21:56             ` Ben Greear
2012-05-01 21:10               ` [E1000-devel] " Ben Greear
2012-05-01 21:49                 ` David Miller
2012-05-01 22:08                   ` Ben Greear
2012-05-01 22:42                     ` [E1000-devel] " Jeff Kirsher
2012-05-01 22:46                       ` David Miller
2012-05-01 22:52                         ` [E1000-devel] " Jeff Kirsher
2012-04-20  6:46 ` Dave, Tushar N
