linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
@ 2008-01-15  5:25 Frans Pop
  2008-01-15  5:53 ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Frans Pop @ 2008-01-15  5:25 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

After compiling v2.6.24-rc7-163-g1a1b285 (x86_64) yesterday I suddenly see this error
repeatedly:
kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
kernel:   Tx Queue             <0>
kernel:   TDH                  <a>
kernel:   TDT                  <a>
kernel:   next_to_use          <a>
kernel:   next_to_clean        <ff>
kernel: buffer_info[next_to_clean]
kernel:   time_stamp           <10002738a>
kernel:   next_to_watch        <ff>
kernel:   jiffies              <1000275b4>
kernel:   next_to_watch.status <1>

My previous kernel was v2.6.24-rc7 and with that this error did not occur. I
have also never seen it with earlier kernels.

The values for "TX Queue" and "next_to_watch.status" are constant, the
others vary.

My NIC is:
01:00.0 Ethernet controller [0200]: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)

01:00.0 0200: 8086:108c (rev 03)
        Subsystem: 8086:3096
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 1273
        Region 0: Memory at 90200000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at 90100000 (32-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 1000 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee0300c  Data: 41a9
        Capabilities: [e0] Express Endpoint IRQ 0
                Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <512ns, L1 <64us
                Device: AtnBtn- AtnInd- PwrInd-
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM unknown, Port 0
                Link: Latency L0s <128ns, L1 <64us
                Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
                Link: Speed 2.5Gb/s, Width x1

The system is an Intel D945GCZ main board with
Intel(R) Pentium(R) D CPU 3.20GHz (dual core) processor.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-15  5:25 [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang Frans Pop
@ 2008-01-15  5:53 ` David Miller
  2008-01-15  6:17   ` Frans Pop
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: David Miller @ 2008-01-15  5:53 UTC (permalink / raw)
  To: elendil; +Cc: netdev, linux-kernel

From: Frans Pop <elendil@planet.nl>
Date: Tue, 15 Jan 2008 06:25:10 +0100

> kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Does this make the problem go away?

(Note this isn't the final correct patch we should apply.  There
 is no reason why this revert back to the older ->poll() logic
 here should have any effect on the TX hang triggering...)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 13d57b0..cada32c 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
 {
 	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
 	struct net_device *poll_dev = adapter->netdev;
-	int work_done = 0;
+	int tx_work = 0, work_done = 0;
 
 	/* Must NOT use netdev_priv macro here. */
 	adapter = poll_dev->priv;
@@ -3929,8 +3929,8 @@ e1000_clean(struct napi_struct *napi, int budget)
 	 * simultaneously.  A failure obtaining the lock means
 	 * tx_ring[0] is currently being cleaned anyway. */
 	if (spin_trylock(&adapter->tx_queue_lock)) {
-		e1000_clean_tx_irq(adapter,
-				   &adapter->tx_ring[0]);
+		tx_work = e1000_clean_tx_irq(adapter,
+					     &adapter->tx_ring[0]);
 		spin_unlock(&adapter->tx_queue_lock);
 	}
 
@@ -3938,7 +3938,7 @@ e1000_clean(struct napi_struct *napi, int budget)
 	                  &work_done, budget);
 
 	/* If budget not fully consumed, exit the polling mode */
-	if (work_done < budget) {
+	if (!tx_work && (work_done < budget)) {
 		if (likely(adapter->itr_setting & 3))
 			e1000_set_itr(adapter);
 		netif_rx_complete(poll_dev, napi);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-15  5:53 ` David Miller
@ 2008-01-15  6:17   ` Frans Pop
  2008-01-15 14:04   ` Frans Pop
  2008-01-16  9:02   ` Badalian Vyacheslav
  2 siblings, 0 replies; 27+ messages in thread
From: Frans Pop @ 2008-01-15  6:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

Wow. That's fast! :-)

On Tuesday 15 January 2008, David Miller wrote:
> From: Frans Pop <elendil@planet.nl>
>
> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>
> Does this make the problem go away?

I'm compiling a kernel with the patch now. Will let you know the result.
May take a while as I don't know how to trigger the bug, so I'll just have 
to let it run for some time.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-15  5:53 ` David Miller
  2008-01-15  6:17   ` Frans Pop
@ 2008-01-15 14:04   ` Frans Pop
  2008-01-15 16:04     ` slavon
  2008-01-16  9:02   ` Badalian Vyacheslav
  2 siblings, 1 reply; 27+ messages in thread
From: Frans Pop @ 2008-01-15 14:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

On Tuesday 15 January 2008, David Miller wrote:
> From: Frans Pop <elendil@planet.nl>
> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>
> Does this make the problem go away?

Yes, it very much looks like that solves it.
I ran with the patch for 6 hours or so without any errors. I then switched 
back to an unpatched kernel and they reappeared immediately.

> (Note this isn't the final correct patch we should apply.  There
>  is no reason why this revert back to the older ->poll() logic
>  here should have any effect on the TX hang triggering...)

s/no reason/no obvious reason/ ? ;-)

Cheers,
FJP

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-15 14:04   ` Frans Pop
@ 2008-01-15 16:04     ` slavon
  2008-01-15 21:53       ` Brandeburg, Jesse
  0 siblings, 1 reply; 27+ messages in thread
From: slavon @ 2008-01-15 16:04 UTC (permalink / raw)
  To: Frans Pop; +Cc: David Miller, netdev, linux-kernel

Quoting Frans Pop <elendil@planet.nl>:

> On Tuesday 15 January 2008, David Miller wrote:
>> From: Frans Pop <elendil@planet.nl>
>> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>>
>> Does this make the problem go away?
>
> Yes, it very much looks like that solves it.
> I ran with the patch for 6 hours or so without any errors. I then switched
> back to an unpatched kernel and they reappeared immediately.
>
>> (Note this isn't the final correct patch we should apply.  There
>>  is no reason why this revert back to the older ->poll() logic
>>  here should have any effect on the TX hang triggering...)
>
> s/no reason/no obvious reason/ ? ;-)
>
> Cheers,
> FJP
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

Hello.

I also try your patch (apply to 2.6.24-rc7-git2)

I catch this message in dmesg
[ 1771.796954] e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 1771.796957]   Tx Queue             <0>
[ 1771.796958]   TDH                  <54>
[ 1771.796959]   TDT                  <54>
[ 1771.796960]   next_to_use          <54>
[ 1771.796961]   next_to_clean        <a9>
[ 1771.796962] buffer_info[next_to_clean]
[ 1771.796963]   time_stamp           <14d72e>
[ 1771.796964]   next_to_watch        <a9>
[ 1771.796965]   jiffies              <14ddd3>
[ 1771.796966]   next_to_watch.status <1>

Thanks.


----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-15 16:04     ` slavon
@ 2008-01-15 21:53       ` Brandeburg, Jesse
  2008-01-16  5:02         ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Brandeburg, Jesse @ 2008-01-15 21:53 UTC (permalink / raw)
  To: slavon, Frans Pop; +Cc: David Miller, netdev, linux-kernel

slavon@bigtelecom.ru wrote:
> Quoting Frans Pop <elendil@planet.nl>:
>>> (Note this isn't the final correct patch we should apply.  There  is
>>> no reason why this revert back to the older ->poll() logic  here
>>> should have any effect on the TX hang triggering...)
>> 
>> s/no reason/no obvious reason/ ? ;-)

The tx code has an "early exit" that tries to limit the amount of tx
packets handled in a single poll loop and requires napi or interrupt
rescheduling based on the return value from e1000_clean_tx_irq.

see this code in e1000_clean_tx_irq

4005 #ifdef CONFIG_E1000_NAPI
4006 #define E1000_TX_WEIGHT 64
4007 >       >       /* weight of a sort for tx, to avoid endless
transmit cleanup */
4008 >       >       if (count++ == E1000_TX_WEIGHT) break;
4009 #endif

I think that is probably related.  For a test you could apply the
original patch, and remove this "break" just by commenting out line
4008.  This would guarantee all tx work is cleaned at every e1000_clean

Jesse

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-15 21:53       ` Brandeburg, Jesse
@ 2008-01-16  5:02         ` David Miller
  2008-01-16  8:56           ` Frans Pop
  2008-01-17  7:09           ` Brandeburg, Jesse
  0 siblings, 2 replies; 27+ messages in thread
From: David Miller @ 2008-01-16  5:02 UTC (permalink / raw)
  To: jesse.brandeburg; +Cc: slavon, elendil, netdev, linux-kernel

From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Date: Tue, 15 Jan 2008 13:53:43 -0800

> The tx code has an "early exit" that tries to limit the amount of tx
> packets handled in a single poll loop and requires napi or interrupt
> rescheduling based on the return value from e1000_clean_tx_irq.

That explains everything, thanks Jesse.

Ok, here is the patch I'll propose to fix this.  The goal is to make
it as simple as possible without regressing the thing we were trying
to fix.

Something more sophisticated can be done later.

Three of the 5 Intel drivers had the TX breakout logic.  e1000,
e1000e, and ixgbe.  e100 and ixgb did not, so they don't have any
problems we need to fix here.

What the fix does is behave as if the budget was fully consumed if
*_clean_tx_irq() returns true.

The only valid way to return from ->poll() without copleting the NAPI
poll is by returning work_done == budget.  That signals to the caller
that the NAPI instance has not been descheduled and therefore the
caller fully owns the NAPI context.

This does mean that for these drivers any time TX work is done, we'll
loop at least one extra time in the ->poll() loop of net_rx_work() but
that is historically what these drivers have caused to happen for
years.

For 2.6.25 or similar I would suggest investigating courses of action
to bring closure and consistency to this:

1) Determine whether the loop breakout is actually necessary.
   Jesse explained to me that they had seen a case where a
   thread on one cpu feeding the TX ring could keep a thread
   on another cpu constantly running the *_clean_tx_irq() code
   in a loop.

   I find this hard to believe since even the slowest CPU should be
   able to free up TX entries faster than they can be transmitted on
   gigabit links :-)

2) If the investigation in #1 deems the breakout logic is necessary,
   then consistently amongst all the 5 drivers a policy should be
   implemented which is integrated with the NAPI budgetting logic.
   For example, the simplest thing to do is to pass the budget and the
   "work_done" thing down into *_clean_tx_irq() and break out if it is
   exceeded.

   As a further refinement we can say that TX work is about 1/4 the
   expense of RX work and adjust the budget checking logic to match
   that.

[NET]: Fix TX timeout regression in Intel drivers.

This fixes a regression added by changeset
53e52c729cc169db82a6105fac7a166e10c2ec36 ("[NET]: Make ->poll()
breakout consistent in Intel ethernet drivers.")

As pointed out by Jesse Brandeburg, for three of the drivers edited
above there is breakout logic in the *_clean_tx_irq() code to prevent
running TX reclaim forever.  If this occurs, we have to elide NAPI
poll completion or else those TX events will never be serviced.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 13d57b0..0c9a6f7 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
 {
 	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
 	struct net_device *poll_dev = adapter->netdev;
-	int work_done = 0;
+	int tx_cleaned = 0, work_done = 0;
 
 	/* Must NOT use netdev_priv macro here. */
 	adapter = poll_dev->priv;
@@ -3929,14 +3929,17 @@ e1000_clean(struct napi_struct *napi, int budget)
 	 * simultaneously.  A failure obtaining the lock means
 	 * tx_ring[0] is currently being cleaned anyway. */
 	if (spin_trylock(&adapter->tx_queue_lock)) {
-		e1000_clean_tx_irq(adapter,
-				   &adapter->tx_ring[0]);
+		tx_cleaned = e1000_clean_tx_irq(adapter,
+						&adapter->tx_ring[0]);
 		spin_unlock(&adapter->tx_queue_lock);
 	}
 
 	adapter->clean_rx(adapter, &adapter->rx_ring[0],
 	                  &work_done, budget);
 
+	if (tx_cleaned)
+		work_done = budget;
+
 	/* If budget not fully consumed, exit the polling mode */
 	if (work_done < budget) {
 		if (likely(adapter->itr_setting & 3))
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4a6fc74..2ab3bfb 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1384,7 +1384,7 @@ static int e1000_clean(struct napi_struct *napi, int budget)
 {
 	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
 	struct net_device *poll_dev = adapter->netdev;
-	int work_done = 0;
+	int tx_cleaned = 0, work_done = 0;
 
 	/* Must NOT use netdev_priv macro here. */
 	adapter = poll_dev->priv;
@@ -1394,12 +1394,15 @@ static int e1000_clean(struct napi_struct *napi, int budget)
 	 * simultaneously.  A failure obtaining the lock means
 	 * tx_ring is currently being cleaned anyway. */
 	if (spin_trylock(&adapter->tx_queue_lock)) {
-		e1000_clean_tx_irq(adapter);
+		tx_cleaned = e1000_clean_tx_irq(adapter);
 		spin_unlock(&adapter->tx_queue_lock);
 	}
 
 	adapter->clean_rx(adapter, &work_done, budget);
 
+	if (tx_cleaned)
+		work_done = budget;
+
 	/* If budget not fully consumed, exit the polling mode */
 	if (work_done < budget) {
 		if (adapter->itr_setting & 3)
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a564916..de3f45e 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1468,13 +1468,16 @@ static int ixgbe_clean(struct napi_struct *napi, int budget)
 	struct ixgbe_adapter *adapter = container_of(napi,
 					struct ixgbe_adapter, napi);
 	struct net_device *netdev = adapter->netdev;
-	int work_done = 0;
+	int tx_cleaned = 0, work_done = 0;
 
 	/* In non-MSIX case, there is no multi-Tx/Rx queue */
-	ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
+	tx_cleaned = ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
 	ixgbe_clean_rx_irq(adapter, &adapter->rx_ring[0], &work_done,
 			   budget);
 
+	if (tx_cleaned)
+		work_done = budget;
+
 	/* If budget not fully consumed, exit the polling mode */
 	if (work_done < budget) {
 		netif_rx_complete(netdev, napi);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-16  5:02         ` David Miller
@ 2008-01-16  8:56           ` Frans Pop
  2008-01-16 10:29             ` David Miller
  2008-01-17  7:09           ` Brandeburg, Jesse
  1 sibling, 1 reply; 27+ messages in thread
From: Frans Pop @ 2008-01-16  8:56 UTC (permalink / raw)
  To: David Miller; +Cc: jesse.brandeburg, slavon, netdev, linux-kernel

On Wednesday 16 January 2008, David Miller wrote:
> Ok, here is the patch I'll propose to fix this.  The goal is to make
> it as simple as possible without regressing the thing we were trying
> to fix.

Looks good to me. Tested with -rc8.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-15  5:53 ` David Miller
  2008-01-15  6:17   ` Frans Pop
  2008-01-15 14:04   ` Frans Pop
@ 2008-01-16  9:02   ` Badalian Vyacheslav
  2008-01-16 12:25     ` David Miller
  2008-01-16 12:28     ` David Miller
  2 siblings, 2 replies; 27+ messages in thread
From: Badalian Vyacheslav @ 2008-01-16  9:02 UTC (permalink / raw)
  To: David Miller; +Cc: elendil, netdev, linux-kernel

applied to 2.6.24-rc7-git2
Have messages
Also have regression after apply patch.
System may do above 800mbs traffic before patch. After its "exit polling 
mode?" (4 CPU, 1 cpu get 100% si (process ksoftirqd/0), 3 CPU is IDLE)
After patch system was go to "exit polling mode" at above 600mbs.

Thanks.

> From: Frans Pop <elendil@planet.nl>
> Date: Tue, 15 Jan 2008 06:25:10 +0100
>
>   
>> kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>>     
>
> Does this make the problem go away?
>
> (Note this isn't the final correct patch we should apply.  There
>  is no reason why this revert back to the older ->poll() logic
>  here should have any effect on the TX hang triggering...)
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 13d57b0..cada32c 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
>  {
>  	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
>  	struct net_device *poll_dev = adapter->netdev;
> -	int work_done = 0;
> +	int tx_work = 0, work_done = 0;
>  
>  	/* Must NOT use netdev_priv macro here. */
>  	adapter = poll_dev->priv;
> @@ -3929,8 +3929,8 @@ e1000_clean(struct napi_struct *napi, int budget)
>  	 * simultaneously.  A failure obtaining the lock means
>  	 * tx_ring[0] is currently being cleaned anyway. */
>  	if (spin_trylock(&adapter->tx_queue_lock)) {
> -		e1000_clean_tx_irq(adapter,
> -				   &adapter->tx_ring[0]);
> +		tx_work = e1000_clean_tx_irq(adapter,
> +					     &adapter->tx_ring[0]);
>  		spin_unlock(&adapter->tx_queue_lock);
>  	}
>  
> @@ -3938,7 +3938,7 @@ e1000_clean(struct napi_struct *napi, int budget)
>  	                  &work_done, budget);
>  
>  	/* If budget not fully consumed, exit the polling mode */
> -	if (work_done < budget) {
> +	if (!tx_work && (work_done < budget)) {
>  		if (likely(adapter->itr_setting & 3))
>  			e1000_set_itr(adapter);
>  		netif_rx_complete(poll_dev, napi);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>   


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-16  8:56           ` Frans Pop
@ 2008-01-16 10:29             ` David Miller
  2008-01-16 17:07               ` Robert Olsson
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2008-01-16 10:29 UTC (permalink / raw)
  To: elendil; +Cc: jesse.brandeburg, slavon, netdev, linux-kernel

From: Frans Pop <elendil@planet.nl>
Date: Wed, 16 Jan 2008 09:56:08 +0100

> On Wednesday 16 January 2008, David Miller wrote:
> > Ok, here is the patch I'll propose to fix this.  The goal is to make
> > it as simple as possible without regressing the thing we were trying
> > to fix.
> 
> Looks good to me. Tested with -rc8.

Thanks for testing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-16  9:02   ` Badalian Vyacheslav
@ 2008-01-16 12:25     ` David Miller
  2008-01-16 12:28     ` David Miller
  1 sibling, 0 replies; 27+ messages in thread
From: David Miller @ 2008-01-16 12:25 UTC (permalink / raw)
  To: slavon; +Cc: elendil, netdev, linux-kernel

From: Badalian Vyacheslav <slavon@bigtelecom.ru>
Date: Wed, 16 Jan 2008 12:02:28 +0300

> applied to 2.6.24-rc7-git2
> Have messages
> Also have regression after apply patch.
> System may do above 800mbs traffic before patch. After its "exit polling 
> mode?" (4 CPU, 1 cpu get 100% si (process ksoftirqd/0), 3 CPU is IDLE)
> After patch system was go to "exit polling mode" at above 600mbs.

What do you mean by 'system was go to "exit polling mode"'?

Please be more clear about your situation, in particular
provide every detail about what happens so that we can
properly debug this.

THanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-16  9:02   ` Badalian Vyacheslav
  2008-01-16 12:25     ` David Miller
@ 2008-01-16 12:28     ` David Miller
  2008-01-21  6:54       ` Badalian Vyacheslav
  1 sibling, 1 reply; 27+ messages in thread
From: David Miller @ 2008-01-16 12:28 UTC (permalink / raw)
  To: slavon; +Cc: elendil, netdev, linux-kernel

From: Badalian Vyacheslav <slavon@bigtelecom.ru>
Date: Wed, 16 Jan 2008 12:02:28 +0300

> Also have regression after apply patch.

BTW, if you are using the e1000e driver then this initial
patch will not work.

My more recent patch posting for this problem, will.

I include it again below for you:

[NET]: Fix TX timeout regression in Intel drivers.

This fixes a regression added by changeset
53e52c729cc169db82a6105fac7a166e10c2ec36 ("[NET]: Make ->poll()
breakout consistent in Intel ethernet drivers.")

As pointed out by Jesse Brandeburg, for three of the drivers edited
above there is breakout logic in the *_clean_tx_irq() code to prevent
running TX reclaim forever.  If this occurs, we have to elide NAPI
poll completion or else those TX events will never be serviced.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 13d57b0..0c9a6f7 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
 {
 	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
 	struct net_device *poll_dev = adapter->netdev;
-	int work_done = 0;
+	int tx_cleaned = 0, work_done = 0;
 
 	/* Must NOT use netdev_priv macro here. */
 	adapter = poll_dev->priv;
@@ -3929,14 +3929,17 @@ e1000_clean(struct napi_struct *napi, int budget)
 	 * simultaneously.  A failure obtaining the lock means
 	 * tx_ring[0] is currently being cleaned anyway. */
 	if (spin_trylock(&adapter->tx_queue_lock)) {
-		e1000_clean_tx_irq(adapter,
-				   &adapter->tx_ring[0]);
+		tx_cleaned = e1000_clean_tx_irq(adapter,
+						&adapter->tx_ring[0]);
 		spin_unlock(&adapter->tx_queue_lock);
 	}
 
 	adapter->clean_rx(adapter, &adapter->rx_ring[0],
 	                  &work_done, budget);
 
+	if (tx_cleaned)
+		work_done = budget;
+
 	/* If budget not fully consumed, exit the polling mode */
 	if (work_done < budget) {
 		if (likely(adapter->itr_setting & 3))
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4a6fc74..2ab3bfb 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1384,7 +1384,7 @@ static int e1000_clean(struct napi_struct *napi, int budget)
 {
 	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
 	struct net_device *poll_dev = adapter->netdev;
-	int work_done = 0;
+	int tx_cleaned = 0, work_done = 0;
 
 	/* Must NOT use netdev_priv macro here. */
 	adapter = poll_dev->priv;
@@ -1394,12 +1394,15 @@ static int e1000_clean(struct napi_struct *napi, int budget)
 	 * simultaneously.  A failure obtaining the lock means
 	 * tx_ring is currently being cleaned anyway. */
 	if (spin_trylock(&adapter->tx_queue_lock)) {
-		e1000_clean_tx_irq(adapter);
+		tx_cleaned = e1000_clean_tx_irq(adapter);
 		spin_unlock(&adapter->tx_queue_lock);
 	}
 
 	adapter->clean_rx(adapter, &work_done, budget);
 
+	if (tx_cleaned)
+		work_done = budget;
+
 	/* If budget not fully consumed, exit the polling mode */
 	if (work_done < budget) {
 		if (adapter->itr_setting & 3)
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a564916..de3f45e 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1468,13 +1468,16 @@ static int ixgbe_clean(struct napi_struct *napi, int budget)
 	struct ixgbe_adapter *adapter = container_of(napi,
 					struct ixgbe_adapter, napi);
 	struct net_device *netdev = adapter->netdev;
-	int work_done = 0;
+	int tx_cleaned = 0, work_done = 0;
 
 	/* In non-MSIX case, there is no multi-Tx/Rx queue */
-	ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
+	tx_cleaned = ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
 	ixgbe_clean_rx_irq(adapter, &adapter->rx_ring[0], &work_done,
 			   budget);
 
+	if (tx_cleaned)
+		work_done = budget;
+
 	/* If budget not fully consumed, exit the polling mode */
 	if (work_done < budget) {
 		netif_rx_complete(netdev, napi);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-16 10:29             ` David Miller
@ 2008-01-16 17:07               ` Robert Olsson
  2008-01-18 12:11                 ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Robert Olsson @ 2008-01-16 17:07 UTC (permalink / raw)
  To: David Miller; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel


David Miller writes:
 > > On Wednesday 16 January 2008, David Miller wrote:
 > > > Ok, here is the patch I'll propose to fix this.  The goal is to make
 > > > it as simple as possible without regressing the thing we were trying
 > > > to fix.
 > > 
 > > Looks good to me. Tested with -rc8.
 > 
 > Thanks for testing.

 Yes that code looks nice. I'm using the patch but I've noticed another 
 phenomena with the current e1000 driver. There is a race when taking a 
 device down at high traffic loads. I've tracked and instrumented and it 
 seems like occasionly irq_sem can get bump up so interrupts can't be 
 enabled again.


eth0 e1000_irq_enable sem = 1    <- High netload
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1    <- ifconfig eth0 down
eth0 e1000_irq_disable sem = 2

**e1000_open                     <- ifconfig eth0 up
eth0 e1000_irq_disable sem = 3      Dead. irq's can't be enabled
e1000_irq_enable miss
eth0 e1000_irq_enable sem = 2
e1000_irq_enable miss
eth0 e1000_irq_enable sem = 1
ADDRCONF(NETDEV_UP): eth0: link is not ready


Cheers
					--ro

static void
e1000_irq_disable(struct e1000_adapter *adapter)
{
        atomic_inc(&adapter->irq_sem);
        E1000_WRITE_REG(&adapter->hw, IMC, ~0);
        E1000_WRITE_FLUSH(&adapter->hw);
        synchronize_irq(adapter->pdev->irq);

        if(adapter->netdev->ifindex == 3)
        printk("%s e1000_irq_disable sem = %d\n",  adapter->netdev->name,
               atomic_read(&adapter->irq_sem));
}

static void
e1000_irq_enable(struct e1000_adapter *adapter)
{
        if (likely(atomic_dec_and_test(&adapter->irq_sem))) {
                E1000_WRITE_REG(&adapter->hw, IMS, IMS_ENABLE_MASK);
                E1000_WRITE_FLUSH(&adapter->hw);
                }
        else
                printk("e1000_irq_enable miss\n");

        if(adapter->netdev->ifindex == 3)
          printk("%s e1000_irq_enable sem = %d\n",  adapter->netdev->name,
                 atomic_read(&adapter->irq_sem));
}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-16  5:02         ` David Miller
  2008-01-16  8:56           ` Frans Pop
@ 2008-01-17  7:09           ` Brandeburg, Jesse
  2008-01-17  7:20             ` David Miller
  1 sibling, 1 reply; 27+ messages in thread
From: Brandeburg, Jesse @ 2008-01-17  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: slavon, elendil, netdev, linux-kernel

David Miller wrote:
> From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
> Date: Tue, 15 Jan 2008 13:53:43 -0800
> 
>> The tx code has an "early exit" that tries to limit the amount of tx
>> packets handled in a single poll loop and requires napi or interrupt
>> rescheduling based on the return value from e1000_clean_tx_irq.
> 
> That explains everything, thanks Jesse.
> 
> Ok, here is the patch I'll propose to fix this.  The goal is to make
> it as simple as possible without regressing the thing we were trying
> to fix.

We spent Wednesday trying to reproduce (without the patch) these issues
without much luck, and have applied the patch cleanly and will continue
testing it.  Given the simplicity of the changes, and the community
testing, I'll give my ack and we will continue testing.

I think we should fix Robert's (unrelated, but in this thread) reported
issue before 2.6.24 final if we can, and I'll look at that tonight and
tomorrow.

Thanks for your work on this Dave,
 Jesse

Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-17  7:09           ` Brandeburg, Jesse
@ 2008-01-17  7:20             ` David Miller
  2008-01-17  7:51               ` Frans Pop
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2008-01-17  7:20 UTC (permalink / raw)
  To: jesse.brandeburg; +Cc: slavon, elendil, netdev, linux-kernel

From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Date: Wed, 16 Jan 2008 23:09:47 -0800

> We spent Wednesday trying to reproduce (without the patch) these issues
> without much luck, and have applied the patch cleanly and will continue
> testing it.  Given the simplicity of the changes, and the community
> testing, I'll give my ack and we will continue testing.

You need a slow CPU, and you need to make sure you do actually
trigger the TX limiting code there.

I bet your cpus are fast enough that it simply never triggers.
:-)

> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

Thanks for reviewing Jesse.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-17  7:20             ` David Miller
@ 2008-01-17  7:51               ` Frans Pop
  2008-01-17  8:00                 ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Frans Pop @ 2008-01-17  7:51 UTC (permalink / raw)
  To: David Miller; +Cc: jesse.brandeburg, slavon, netdev, linux-kernel

On Thursday 17 January 2008, David Miller wrote:
> From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
>
> > We spent Wednesday trying to reproduce (without the patch) these issues
> > without much luck, and have applied the patch cleanly and will continue
> > testing it.  Given the simplicity of the changes, and the community
> > testing, I'll give my ack and we will continue testing.
>
> You need a slow CPU, and you need to make sure you do actually
> trigger the TX limiting code there.

Hmmm. Is a dual core Pentium D 3.20GHz considered slow these days?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-17  7:51               ` Frans Pop
@ 2008-01-17  8:00                 ` David Miller
  2008-01-17  9:40                   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2008-01-17  8:00 UTC (permalink / raw)
  To: elendil; +Cc: jesse.brandeburg, slavon, netdev, linux-kernel

From: Frans Pop <elendil@planet.nl>
Date: Thu, 17 Jan 2008 08:51:55 +0100

> On Thursday 17 January 2008, David Miller wrote:
> > From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
> >
> > > We spent Wednesday trying to reproduce (without the patch) these issues
> > > without much luck, and have applied the patch cleanly and will continue
> > > testing it.  Given the simplicity of the changes, and the community
> > > testing, I'll give my ack and we will continue testing.
> >
> > You need a slow CPU, and you need to make sure you do actually
> > trigger the TX limiting code there.
> 
> Hmmm. Is a dual core Pentium D 3.20GHz considered slow these days?

No of course :-)  I guess it therefore depends upon the load
as well.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-17  8:00                 ` David Miller
@ 2008-01-17  9:40                   ` Arnaldo Carvalho de Melo
  2008-01-17  9:45                     ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-01-17  9:40 UTC (permalink / raw)
  To: David Miller; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel

Em Thu, Jan 17, 2008 at 12:00:02AM -0800, David Miller escreveu:
> From: Frans Pop <elendil@planet.nl>
> Date: Thu, 17 Jan 2008 08:51:55 +0100
> 
> > On Thursday 17 January 2008, David Miller wrote:
> > > From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
> > >
> > > > We spent Wednesday trying to reproduce (without the patch) these issues
> > > > without much luck, and have applied the patch cleanly and will continue
> > > > testing it.  Given the simplicity of the changes, and the community
> > > > testing, I'll give my ack and we will continue testing.
> > >
> > > You need a slow CPU, and you need to make sure you do actually
> > > trigger the TX limiting code there.
> > 
> > Hmmm. Is a dual core Pentium D 3.20GHz considered slow these days?
> 
> No of course :-)  I guess it therefore depends upon the load
> as well.

I saw it just once, yesterday:

[root@doppio ~]# uname -r
2.6.24-rc5
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <58>
  TDT                  <8f>
  next_to_use          <8f>
  next_to_clean        <55>
buffer_info[next_to_clean]
  time_stamp           <105e973a9>
  next_to_watch        <56>
  jiffies              <105e97992>
  next_to_watch.status <1>
[root@doppio ~]#

on a lenovo T60W, core2duo machine (2GHz), when using it to stress test
another machine, I was using netperf TCP_STREAM ranging from 1 to 8
streams + a ping -f using various packet sizes.

I'll update this machine today to 2.6.24-rc8-git + net-2.6 and try again
to reproduce.

I also applied David's patch while trying some RT experiments on
another, 8 way machine used as a server, but on this machine I didn't
experience the Tx Unit Hang message with or without the patch.

- Arnaldo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-17  9:40                   ` Arnaldo Carvalho de Melo
@ 2008-01-17  9:45                     ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2008-01-17  9:45 UTC (permalink / raw)
  To: acme; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel

From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Thu, 17 Jan 2008 07:40:07 -0200

> I'll update this machine today to 2.6.24-rc8-git + net-2.6 and try again
> to reproduce.

Thanks for the datapoints and testing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-16 17:07               ` Robert Olsson
@ 2008-01-18 12:11                 ` David Miller
  2008-01-18 13:00                   ` Robert Olsson
  2008-01-21 13:27                   ` Robert Olsson
  0 siblings, 2 replies; 27+ messages in thread
From: David Miller @ 2008-01-18 12:11 UTC (permalink / raw)
  To: Robert.Olsson; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel

From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Wed, 16 Jan 2008 18:07:38 +0100

> 
> eth0 e1000_irq_enable sem = 1    <- High netload
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1    <- ifconfig eth0 down
> eth0 e1000_irq_disable sem = 2
> 
> **e1000_open                     <- ifconfig eth0 up
> eth0 e1000_irq_disable sem = 3      Dead. irq's can't be enabled
> e1000_irq_enable miss
> eth0 e1000_irq_enable sem = 2
> e1000_irq_enable miss
> eth0 e1000_irq_enable sem = 1
> ADDRCONF(NETDEV_UP): eth0: link is not ready

Yes, this semaphore thing is highly problematic.  In the most crucial
areas where network driver consistency matters the most for ease of
understanding and debugging, the Intel drivers choose to be different
:-(

The way the napi_disable() logic breaks out from high packet load in
net_rx_action() is it simply returns even leaving interrupts disabled
when a pending napi_disable() is pending.

This is what trips up the semaphore logic.

Robert, give this patch a try.

In the long term this semaphore should be completely eliminated,
there is no justification for it.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 0c9a6f7..76c0fa6 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
 
 #ifdef CONFIG_E1000_NAPI
 	napi_disable(&adapter->napi);
+	atomic_set(&adapter->irq_sem, 0);
 #endif
 	e1000_irq_disable(adapter);
 
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 2ab3bfb..9cc5a6b 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 	msleep(10);
 
 	napi_disable(&adapter->napi);
+	atomic_set(&adapter->irq_sem, 0);
 	e1000_irq_disable(adapter);
 
 	del_timer_sync(&adapter->watchdog_timer);
diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
index d2fb88d..4f63839 100644
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 {
 	struct net_device *netdev = adapter->netdev;
 
+#ifdef CONFIG_IXGB_NAPI
+	napi_disable(&adapter->napi);
+	atomic_set(&adapter->irq_sem, 0);
+#endif
+
 	ixgb_irq_disable(adapter);
 	free_irq(adapter->pdev->irq, netdev);
 
@@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 
 	if(kill_watchdog)
 		del_timer_sync(&adapter->watchdog_timer);
-#ifdef CONFIG_IXGB_NAPI
-	napi_disable(&adapter->napi);
-#endif
+
 	adapter->link_speed = 0;
 	adapter->link_duplex = 0;
 	netif_carrier_off(netdev);
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index de3f45e..a4265bc 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	IXGBE_WRITE_FLUSH(&adapter->hw);
 	msleep(10);
 
+	napi_disable(&adapter->napi);
+	atomic_set(&adapter->irq_sem, 0);
+
 	ixgbe_irq_disable(adapter);
 
-	napi_disable(&adapter->napi);
 	del_timer_sync(&adapter->watchdog_timer);
 
 	netif_carrier_off(netdev);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-18 12:11                 ` David Miller
@ 2008-01-18 13:00                   ` Robert Olsson
  2008-01-18 13:37                     ` David Miller
  2008-01-21 13:27                   ` Robert Olsson
  1 sibling, 1 reply; 27+ messages in thread
From: Robert Olsson @ 2008-01-18 13:00 UTC (permalink / raw)
  To: David Miller
  Cc: Robert.Olsson, elendil, jesse.brandeburg, slavon, netdev, linux-kernel


David Miller writes:

 > > eth0 e1000_irq_enable sem = 1    <- ifconfig eth0 down
 > > eth0 e1000_irq_disable sem = 2
 > > 
 > > **e1000_open                     <- ifconfig eth0 up
 > > eth0 e1000_irq_disable sem = 3      Dead. irq's can't be enabled
 > > e1000_irq_enable miss
 > > eth0 e1000_irq_enable sem = 2
 > > e1000_irq_enable miss
 > > eth0 e1000_irq_enable sem = 1
 > > ADDRCONF(NETDEV_UP): eth0: link is not ready
 > 
 > Yes, this semaphore thing is highly problematic.  In the most crucial
 > areas where network driver consistency matters the most for ease of
 > understanding and debugging, the Intel drivers choose to be different

 I don't understand the idea with semaphore for enabling/disabling 
 irq's either the overall logic must safer/better without it.  
 
 > The way the napi_disable() logic breaks out from high packet load in
 > net_rx_action() is it simply returns even leaving interrupts disabled
 > when a pending napi_disable() is pending.
 > 
 > This is what trips up the semaphore logic.
 > 
 > Robert, give this patch a try.
 > 
 > In the long term this semaphore should be completely eliminated,
 > there is no justification for it.

 It's on the testing list...

 Cheers
					--ro


 > 
 > Signed-off-by: David S. Miller <davem@davemloft.net>
 > 
 > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
 > index 0c9a6f7..76c0fa6 100644
 > --- a/drivers/net/e1000/e1000_main.c
 > +++ b/drivers/net/e1000/e1000_main.c
 > @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
 >  
 >  #ifdef CONFIG_E1000_NAPI
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  #endif
 >  	e1000_irq_disable(adapter);
 >  
 > diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
 > index 2ab3bfb..9cc5a6b 100644
 > --- a/drivers/net/e1000e/netdev.c
 > +++ b/drivers/net/e1000e/netdev.c
 > @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 >  	msleep(10);
 >  
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  	e1000_irq_disable(adapter);
 >  
 >  	del_timer_sync(&adapter->watchdog_timer);
 > diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
 > index d2fb88d..4f63839 100644
 > --- a/drivers/net/ixgb/ixgb_main.c
 > +++ b/drivers/net/ixgb/ixgb_main.c
 > @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  {
 >  	struct net_device *netdev = adapter->netdev;
 >  
 > +#ifdef CONFIG_IXGB_NAPI
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +#endif
 > +
 >  	ixgb_irq_disable(adapter);
 >  	free_irq(adapter->pdev->irq, netdev);
 >  
 > @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  
 >  	if(kill_watchdog)
 >  		del_timer_sync(&adapter->watchdog_timer);
 > -#ifdef CONFIG_IXGB_NAPI
 > -	napi_disable(&adapter->napi);
 > -#endif
 > +
 >  	adapter->link_speed = 0;
 >  	adapter->link_duplex = 0;
 >  	netif_carrier_off(netdev);
 > diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
 > index de3f45e..a4265bc 100644
 > --- a/drivers/net/ixgbe/ixgbe_main.c
 > +++ b/drivers/net/ixgbe/ixgbe_main.c
 > @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 >  	IXGBE_WRITE_FLUSH(&adapter->hw);
 >  	msleep(10);
 >  
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +
 >  	ixgbe_irq_disable(adapter);
 >  
 > -	napi_disable(&adapter->napi);
 >  	del_timer_sync(&adapter->watchdog_timer);
 >  
 >  	netif_carrier_off(netdev);
 > --
 > To unsubscribe from this list: send the line "unsubscribe netdev" in
 > the body of a message to majordomo@vger.kernel.org
 > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-18 13:00                   ` Robert Olsson
@ 2008-01-18 13:37                     ` David Miller
  2008-01-20  9:20                       ` Brandeburg, Jesse
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2008-01-18 13:37 UTC (permalink / raw)
  To: Robert.Olsson; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel

From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Fri, 18 Jan 2008 14:00:57 +0100

>  I don't understand the idea with semaphore for enabling/disabling 
>  irq's either the overall logic must safer/better without it.  

They must have had code paths where they didn't know if IRQs were
enabled or not already, so they tried to create something which
approximates the:

	local_irq_save(flags);
	local_irq_restore(flags);

constructs we have for CPU interrupts, so they could go:

	e1000_irq_disable();
	/* ... */
	e1000_irq_enable();

and this would work even if the caller was running
with e1000 interrupts disabled already.

Or, something like that... it is indeed confusing.

Anyways, yes it's totally bogus and should be removed.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-18 13:37                     ` David Miller
@ 2008-01-20  9:20                       ` Brandeburg, Jesse
  2008-01-20  9:28                         ` Andrey Rahmatullin
  0 siblings, 1 reply; 27+ messages in thread
From: Brandeburg, Jesse @ 2008-01-20  9:20 UTC (permalink / raw)
  To: David Miller, Robert.Olsson; +Cc: elendil, slavon, netdev, linux-kernel

David Miller wrote:
> From: Robert Olsson <Robert.Olsson@data.slu.se>
> Date: Fri, 18 Jan 2008 14:00:57 +0100
> 
>>  I don't understand the idea with semaphore for enabling/disabling
>>  irq's either the overall logic must safer/better without it.
> 
> They must have had code paths where they didn't know if IRQs were
> enabled or not already, so they tried to create something which
> approximates the:
> 
> 	local_irq_save(flags);
> 	local_irq_restore(flags);
> 
> constructs we have for CPU interrupts, so they could go:
> 
> 	e1000_irq_disable();
> 	/* ... */
> 	e1000_irq_enable();
> 
> and this would work even if the caller was running
> with e1000 interrupts disabled already.
> 
> Or, something like that... it is indeed confusing.
> 
> Anyways, yes it's totally bogus and should be removed.

I agree, bogus, and in fact I've already removed it from our development
version of ixgbe.  Right now I wanted to report I can't remove e1000 at
all on 2.6.24-rc8+git

I continually get the
 kernel: unregister_netdevice: waiting for eth2 to become free. Usage
count = 1

Where 2.6.24-rc5 e1000 is okay still.  Seems like maybe we are still
missing a netif_rx_complete or a napi_disable somewhere.

I don't think this problem has anything to do with the irq_sem right
now.  Something is still badly broken.  I am just using the interface
regularly (no heavy load) and I can't unload the module.

Jesse

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-20  9:20                       ` Brandeburg, Jesse
@ 2008-01-20  9:28                         ` Andrey Rahmatullin
  0 siblings, 0 replies; 27+ messages in thread
From: Andrey Rahmatullin @ 2008-01-20  9:28 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: David Miller, Robert.Olsson, elendil, slavon, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 263 bytes --]

On Sun, Jan 20, 2008 at 01:20:11AM -0800, Brandeburg, Jesse wrote:
> I continually get the
>  kernel: unregister_netdevice: waiting for eth2 to become free. Usage
> count = 1
http://bugzilla.kernel.org/show_bug.cgi?id=9778

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-16 12:28     ` David Miller
@ 2008-01-21  6:54       ` Badalian Vyacheslav
  0 siblings, 0 replies; 27+ messages in thread
From: Badalian Vyacheslav @ 2008-01-21  6:54 UTC (permalink / raw)
  To: David Miller; +Cc: elendil, netdev, linux-kernel

Hello. Its work, thanks for resend it!
Sorry, i understand that patch 53e52c729cc169db82a6105fac7a166e10c2ec36 
("[NET]: Make ->poll() breakout consistent in Intel ethernet drivers.") 
have regression and rollback it, i not see your patch.
Sorry again.

Thanks!
> From: Badalian Vyacheslav <slavon@bigtelecom.ru>
> Date: Wed, 16 Jan 2008 12:02:28 +0300
>
>   
>> Also have regression after apply patch.
>>     
>
> BTW, if you are using the e1000e driver then this initial
> patch will not work.
>
> My more recent patch posting for this problem, will.
>
> I include it again below for you:
>
> [NET]: Fix TX timeout regression in Intel drivers.
>
> This fixes a regression added by changeset
> 53e52c729cc169db82a6105fac7a166e10c2ec36 ("[NET]: Make ->poll()
> breakout consistent in Intel ethernet drivers.")
>
> As pointed out by Jesse Brandeburg, for three of the drivers edited
> above there is breakout logic in the *_clean_tx_irq() code to prevent
> running TX reclaim forever.  If this occurs, we have to elide NAPI
> poll completion or else those TX events will never be serviced.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 13d57b0..0c9a6f7 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
>  {
>  	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
>  	struct net_device *poll_dev = adapter->netdev;
> -	int work_done = 0;
> +	int tx_cleaned = 0, work_done = 0;
>  
>  	/* Must NOT use netdev_priv macro here. */
>  	adapter = poll_dev->priv;
> @@ -3929,14 +3929,17 @@ e1000_clean(struct napi_struct *napi, int budget)
>  	 * simultaneously.  A failure obtaining the lock means
>  	 * tx_ring[0] is currently being cleaned anyway. */
>  	if (spin_trylock(&adapter->tx_queue_lock)) {
> -		e1000_clean_tx_irq(adapter,
> -				   &adapter->tx_ring[0]);
> +		tx_cleaned = e1000_clean_tx_irq(adapter,
> +						&adapter->tx_ring[0]);
>  		spin_unlock(&adapter->tx_queue_lock);
>  	}
>  
>  	adapter->clean_rx(adapter, &adapter->rx_ring[0],
>  	                  &work_done, budget);
>  
> +	if (tx_cleaned)
> +		work_done = budget;
> +
>  	/* If budget not fully consumed, exit the polling mode */
>  	if (work_done < budget) {
>  		if (likely(adapter->itr_setting & 3))
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index 4a6fc74..2ab3bfb 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -1384,7 +1384,7 @@ static int e1000_clean(struct napi_struct *napi, int budget)
>  {
>  	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
>  	struct net_device *poll_dev = adapter->netdev;
> -	int work_done = 0;
> +	int tx_cleaned = 0, work_done = 0;
>  
>  	/* Must NOT use netdev_priv macro here. */
>  	adapter = poll_dev->priv;
> @@ -1394,12 +1394,15 @@ static int e1000_clean(struct napi_struct *napi, int budget)
>  	 * simultaneously.  A failure obtaining the lock means
>  	 * tx_ring is currently being cleaned anyway. */
>  	if (spin_trylock(&adapter->tx_queue_lock)) {
> -		e1000_clean_tx_irq(adapter);
> +		tx_cleaned = e1000_clean_tx_irq(adapter);
>  		spin_unlock(&adapter->tx_queue_lock);
>  	}
>  
>  	adapter->clean_rx(adapter, &work_done, budget);
>  
> +	if (tx_cleaned)
> +		work_done = budget;
> +
>  	/* If budget not fully consumed, exit the polling mode */
>  	if (work_done < budget) {
>  		if (adapter->itr_setting & 3)
> diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
> index a564916..de3f45e 100644
> --- a/drivers/net/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ixgbe/ixgbe_main.c
> @@ -1468,13 +1468,16 @@ static int ixgbe_clean(struct napi_struct *napi, int budget)
>  	struct ixgbe_adapter *adapter = container_of(napi,
>  					struct ixgbe_adapter, napi);
>  	struct net_device *netdev = adapter->netdev;
> -	int work_done = 0;
> +	int tx_cleaned = 0, work_done = 0;
>  
>  	/* In non-MSIX case, there is no multi-Tx/Rx queue */
> -	ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
> +	tx_cleaned = ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
>  	ixgbe_clean_rx_irq(adapter, &adapter->rx_ring[0], &work_done,
>  			   budget);
>  
> +	if (tx_cleaned)
> +		work_done = budget;
> +
>  	/* If budget not fully consumed, exit the polling mode */
>  	if (work_done < budget) {
>  		netif_rx_complete(netdev, napi);
>
>   


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-18 12:11                 ` David Miller
  2008-01-18 13:00                   ` Robert Olsson
@ 2008-01-21 13:27                   ` Robert Olsson
  2008-01-21 13:29                     ` David Miller
  1 sibling, 1 reply; 27+ messages in thread
From: Robert Olsson @ 2008-01-21 13:27 UTC (permalink / raw)
  To: David Miller
  Cc: Robert.Olsson, elendil, jesse.brandeburg, slavon, netdev, linux-kernel


David Miller writes:

 > Yes, this semaphore thing is highly problematic.  In the most crucial
 > areas where network driver consistency matters the most for ease of
 > understanding and debugging, the Intel drivers choose to be different
 > :-(
 > 
 > The way the napi_disable() logic breaks out from high packet load in
 > net_rx_action() is it simply returns even leaving interrupts disabled
 > when a pending napi_disable() is pending.
 > 
 > This is what trips up the semaphore logic.
 > 
 > Robert, give this patch a try.


 Yes it works. e1000 tested for ~3 hours with high very high load and 
 interface up/down every 5:th sec. Without the patch the irq's gets 
 disabled within a couple of seconds

 A resolute way of handling the semaphores. :)
   
 Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
 
 Cheers
					--ro


 > In the long term this semaphore should be completely eliminated,
 > there is no justification for it.
 > 
 > Signed-off-by: David S. Miller <davem@davemloft.net>
 > 
 > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
 > index 0c9a6f7..76c0fa6 100644
 > --- a/drivers/net/e1000/e1000_main.c
 > +++ b/drivers/net/e1000/e1000_main.c
 > @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
 >  
 >  #ifdef CONFIG_E1000_NAPI
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  #endif
 >  	e1000_irq_disable(adapter);
 >  
 > diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
 > index 2ab3bfb..9cc5a6b 100644
 > --- a/drivers/net/e1000e/netdev.c
 > +++ b/drivers/net/e1000e/netdev.c
 > @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 >  	msleep(10);
 >  
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  	e1000_irq_disable(adapter);
 >  
 >  	del_timer_sync(&adapter->watchdog_timer);
 > diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
 > index d2fb88d..4f63839 100644
 > --- a/drivers/net/ixgb/ixgb_main.c
 > +++ b/drivers/net/ixgb/ixgb_main.c
 > @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  {
 >  	struct net_device *netdev = adapter->netdev;
 >  
 > +#ifdef CONFIG_IXGB_NAPI
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +#endif
 > +
 >  	ixgb_irq_disable(adapter);
 >  	free_irq(adapter->pdev->irq, netdev);
 >  
 > @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  
 >  	if(kill_watchdog)
 >  		del_timer_sync(&adapter->watchdog_timer);
 > -#ifdef CONFIG_IXGB_NAPI
 > -	napi_disable(&adapter->napi);
 > -#endif
 > +
 >  	adapter->link_speed = 0;
 >  	adapter->link_duplex = 0;
 >  	netif_carrier_off(netdev);
 > diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
 > index de3f45e..a4265bc 100644
 > --- a/drivers/net/ixgbe/ixgbe_main.c
 > +++ b/drivers/net/ixgbe/ixgbe_main.c
 > @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 >  	IXGBE_WRITE_FLUSH(&adapter->hw);
 >  	msleep(10);
 >  
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +
 >  	ixgbe_irq_disable(adapter);
 >  
 > -	napi_disable(&adapter->napi);
 >  	del_timer_sync(&adapter->watchdog_timer);
 >  
 >  	netif_carrier_off(netdev);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
  2008-01-21 13:27                   ` Robert Olsson
@ 2008-01-21 13:29                     ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2008-01-21 13:29 UTC (permalink / raw)
  To: Robert.Olsson; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel

From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Mon, 21 Jan 2008 14:27:13 +0100

>  Yes it works. e1000 tested for ~3 hours with high very high load and 
>  interface up/down every 5:th sec. Without the patch the irq's gets 
>  disabled within a couple of seconds
> 
>  A resolute way of handling the semaphores. :)
>    
>  Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>


Thanks for testing Robert.

I sent off that fix to Linus an hour or so ago, hopefully
he will pick it up some time today.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-01-21 13:29 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-15  5:25 [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang Frans Pop
2008-01-15  5:53 ` David Miller
2008-01-15  6:17   ` Frans Pop
2008-01-15 14:04   ` Frans Pop
2008-01-15 16:04     ` slavon
2008-01-15 21:53       ` Brandeburg, Jesse
2008-01-16  5:02         ` David Miller
2008-01-16  8:56           ` Frans Pop
2008-01-16 10:29             ` David Miller
2008-01-16 17:07               ` Robert Olsson
2008-01-18 12:11                 ` David Miller
2008-01-18 13:00                   ` Robert Olsson
2008-01-18 13:37                     ` David Miller
2008-01-20  9:20                       ` Brandeburg, Jesse
2008-01-20  9:28                         ` Andrey Rahmatullin
2008-01-21 13:27                   ` Robert Olsson
2008-01-21 13:29                     ` David Miller
2008-01-17  7:09           ` Brandeburg, Jesse
2008-01-17  7:20             ` David Miller
2008-01-17  7:51               ` Frans Pop
2008-01-17  8:00                 ` David Miller
2008-01-17  9:40                   ` Arnaldo Carvalho de Melo
2008-01-17  9:45                     ` David Miller
2008-01-16  9:02   ` Badalian Vyacheslav
2008-01-16 12:25     ` David Miller
2008-01-16 12:28     ` David Miller
2008-01-21  6:54       ` Badalian Vyacheslav

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).