linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] [NET]: Fix Ooops of napi net_rx_action.
@ 2007-12-11  9:13 Joonwoo Park
  2007-12-11 10:32 ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Joonwoo Park @ 2007-12-11  9:13 UTC (permalink / raw)
  To: netdev, linux-kernel

[NET]: Fix Ooops of napi net_rx_action.
Before doing list_move_tail napi poll_list, it should be ensured

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 86d6261..74bd5ab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2207,7 +2207,8 @@ static void net_rx_action(struct softirq_action *h)
 		 * still "owns" the NAPI instance and therefore can
 		 * move the instance around on the list at-will.
 		 */
-		if (unlikely(work == weight))
+		if (unlikely((test_bit(NAPI_STATE_SCHED, &n->state))
+				&& (work == weight)))
 			list_move_tail(&n->poll_list, list);
 
 		netpoll_poll_unlock(have);
---

I was able to make it real oops on 2.6.24-rc4 + e1000 napi + smp.
eth0 of my laptop is connected directly to a packet generator.
The packet generator is making unicast udp packets to laptop (approximately 70000pps).
At that time, if the interface is went down, Ooops occurred.

Thanks.
Joonwoo


root@joonwpark-laptop:~# ifconfig eth0 down
BUG: unable to handle kernel paging request at virtual address 00100104
printing eip: c42ecc35 *pdpt = 000000001fc89001 *pde = 0000000000000000
Ooops: 0002 [#1] SMP
Modules linked in:

Pid: 4, comm:ksoftirqd/0 Not tainted (2.6.24-rc4 #27)
EIP: 0060:[<c42ecc35>] EFLAGS: 00010046 CPU: 0
EIP is at net_rx_action+0x1d5/0x1f0
EAX: 00100100 EBX: f77c965c ECX: 00000000 EDX: 00200200
ESI: 00000040 EDI: f77c965c EBP: f7c6df94 ESP: f7c6df64
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 Process ksoftirqd/0 (pid: 4, ti=f7c6c000 task=f7c68ab0 task.ti=f7c6c000)
Stack: 00000002 00000001 c42ecabe c4598f40 c1d0f5cc 0000c7da 000000ac f77c965c
       00000040 00000001 c4543b18 c4598f40 f7c6dfb0 c4032a07 00000004 00000000
       00000246 00000000 c4032f60 f7c6dfbc c4032ad7 c459ba80 f6c6dfcc c4032fb9
Call Trace:
 [<c40053ca>] show_trace_log_lvl+0x1a/0x30
 [<c4005489>] show_stack_log_lvl+0xa9/0xd0
 [<c400557a>] show_registers+0xca/0x1c0
 [<c4005785>] die+0x115/0x230
 [<c4020f0c>] do_page_fault+0x34c/0x7f0
 [<c43c9cea>] error_code+0x72/0x78
 [<c4032a07>] __do_softirq+0x87/0x60
 [<c4032ad7>] do_softirq+0x57/0xe0
 [<c4041ee2>] kthread+0x42/0x70
 [<c4004fc3>] kernel_thread_helper+0x7/0x14
 
gdb outputs
(gdb)  info line *net_rx_action+0x1d5
Line 157 of "include/linux/list.h"
   starts at address 0xc42ecc35 <net_rx_action+469>
   and ends at 0xc42ecc38 <net_rx_action+472>.
(gdb) l include/linux/list.h:157
152      * This is only for internal list manipulation where we know
153      * the prev/next entries already!
154      */
155     static inline void __list_del(struct list_head * prev, struct list_head * next)
156     {
157             next->prev = prev;
158             prev->next = next;
159     }
160
161     /**
(gdb) info line *__do_softirq+0x87
Line 130 of "include/linux/rcupdate.h"
   starts at address 0xc4032a07 <__do_softirq+135>
   and ends at 0xc4032a0a <__do_softirq+138>.
(gdb) l include/linux/rcupdate.h:130
125             struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
126             rdp->passed_quiesc = 1;
127     }
128     static inline void rcu_bh_qsctr_inc(int cpu)
129     {
130             struct rcu_data *rdp = &per_cpu(rcu_bh_data, cpu);
131             rdp->passed_quiesc = 1;
132     }
133
134     extern int rcu_pending(int cpu);
(gdb) info line *do_softirq+0x57
Line 269 of "kernel/softirq.c" starts at address 0xc4032ad2 <do_softirq+82>
   and ends at 0xc4032ae0 <tasklet_kill>.
(gdb) l kernel/softirq.c:269
264             local_irq_save(flags);
265
266             pending = local_softirq_pending();
267
268             if (pending)
269                     __do_softirq();
270
271             local_irq_restore(flags);
272     }
273


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11  9:13 [PATCH] [NET]: Fix Ooops of napi net_rx_action Joonwoo Park
@ 2007-12-11 10:32 ` David Miller
  2007-12-11 12:36   ` Herbert Xu
  2007-12-11 12:57   ` Joonwoo Park
  0 siblings, 2 replies; 12+ messages in thread
From: David Miller @ 2007-12-11 10:32 UTC (permalink / raw)
  To: joonwpark81; +Cc: netdev, linux-kernel

From: "Joonwoo Park" <joonwpark81@gmail.com>
Date: Tue, 11 Dec 2007 18:13:34 +0900

Joonwoo-ssi annyoung haseyo,

> [NET]: Fix Ooops of napi net_rx_action.
> Before doing list_move_tail napi poll_list, it should be ensured
> 
> Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> ---
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 86d6261..74bd5ab 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2207,7 +2207,8 @@ static void net_rx_action(struct softirq_action *h)
>  		 * still "owns" the NAPI instance and therefore can
>  		 * move the instance around on the list at-will.
>  		 */
> -		if (unlikely(work == weight))
> +		if (unlikely((test_bit(NAPI_STATE_SCHED, &n->state))
> +				&& (work == weight)))
>  			list_move_tail(&n->poll_list, list);
>  
>  		netpoll_poll_unlock(have);

How can the NAPI_STATE_SCHED bit be cleared externally yet we take
this list_move_tail() code path?

If NAPI_STATE_SCHED is cleared, work will be zero which will never be
equal to 'weight', and this we'll never attempt the list_move_tail().

If something clears NAPI_STATE_SCHED meanwhile, we have a serious race
and your patch is an incomplete bandaid.  For example, if it can
happen, then a case like:

		if (test_bit(NAPI_STATE_SCHED, &n->state))
	... something clears NAPI_STATE_SCHED right now ...
			work = n->poll(n, weight);

can crash too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11 10:32 ` David Miller
@ 2007-12-11 12:36   ` Herbert Xu
  2007-12-11 12:41     ` David Miller
  2007-12-11 12:57   ` Joonwoo Park
  1 sibling, 1 reply; 12+ messages in thread
From: Herbert Xu @ 2007-12-11 12:36 UTC (permalink / raw)
  To: David Miller; +Cc: joonwpark81, netdev, linux-kernel

David Miller <davem@davemloft.net> wrote:
>
> How can the NAPI_STATE_SCHED bit be cleared externally yet we take
> this list_move_tail() code path?

His driver is probably buggy.  When we had two drivers beginning
with e100 we often forgot to apply fixes to the both of them.  Now
that we have three it's even more confusing.

I just checked and indeed e1000e seems to be missing the NAPI fix
that was applied to e1000.  Of course it doesn't rule out the
possibility of another NAPI bug in e1000.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11 12:36   ` Herbert Xu
@ 2007-12-11 12:41     ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-12-11 12:41 UTC (permalink / raw)
  To: herbert; +Cc: joonwpark81, netdev, linux-kernel

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 11 Dec 2007 20:36:21 +0800

> David Miller <davem@davemloft.net> wrote:
> >
> > How can the NAPI_STATE_SCHED bit be cleared externally yet we take
> > this list_move_tail() code path?
> 
> His driver is probably buggy.  When we had two drivers beginning
> with e100 we often forgot to apply fixes to the both of them.  Now
> that we have three it's even more confusing.
> 
> I just checked and indeed e1000e seems to be missing the NAPI fix
> that was applied to e1000.  Of course it doesn't rule out the
> possibility of another NAPI bug in e1000.

Thanks for checking.

Indeed I stuck the huge comment there in net_rx_action() above the
list move to try and explain things to people, so that if you saw a
crash in the list manipulation, you're go check the driver first.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11 10:32 ` David Miller
  2007-12-11 12:36   ` Herbert Xu
@ 2007-12-11 12:57   ` Joonwoo Park
  2007-12-11 23:12     ` Brandeburg, Jesse
  1 sibling, 1 reply; 12+ messages in thread
From: Joonwoo Park @ 2007-12-11 12:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, herbert

2007/12/11, David Miller <davem@davemloft.net>:
> From: "Joonwoo Park" <joonwpark81@gmail.com>
> Date: Tue, 11 Dec 2007 18:13:34 +0900
>
> Joonwoo-ssi annyoung haseyo,

Wow Great! :-)

> How can the NAPI_STATE_SCHED bit be cleared externally yet we take
> this list_move_tail() code path?
>
> If NAPI_STATE_SCHED is cleared, work will be zero which will never be
> equal to 'weight', and this we'll never attempt the list_move_tail().
>
> If something clears NAPI_STATE_SCHED meanwhile, we have a serious race
> and your patch is an incomplete bandaid.  For example, if it can
> happen, then a case like:
>
>                if (test_bit(NAPI_STATE_SCHED, &n->state))
>        ... something clears NAPI_STATE_SCHED right now ...
>                        work = n->poll(n, weight);
>
> can crash too.
>

David,
With your suggestions, I'm feeling a doubt at e1000.
It's seems to me e1000_clean does call netif_rx_complete and returns
non-zero work_done when !netif_running()
So net_rx_action has (work == weight) as true with NAPI_STATE_SCHED cleared.

Here is the e1000_clean.

static int
e1000_clean(struct napi_struct *napi, int budget)
{
    /* Keep link state information with original netdev */
    if (!netif_carrier_ok(poll_dev))
        goto quit_polling;

...

    adapter->clean_rx(adapter, &adapter->rx_ring[0],
                      &work_done, budget);

    /* If no Tx and not enough Rx work done, exit the polling mode */
    if ((!tx_cleaned && (work_done == 0)) ||
       !netif_running(poll_dev)) {
quit_polling:
        if (likely(adapter->itr_setting & 3))
            e1000_set_itr(adapter);
        netif_rx_complete(poll_dev, napi);
        e1000_irq_enable(adapter);
    }

    return work_done;
}

Thanks.
Joonwoo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11 12:57   ` Joonwoo Park
@ 2007-12-11 23:12     ` Brandeburg, Jesse
  2007-12-11 23:27       ` Joonwoo Park
  2007-12-12  0:12       ` Joonwoo Park
  0 siblings, 2 replies; 12+ messages in thread
From: Brandeburg, Jesse @ 2007-12-11 23:12 UTC (permalink / raw)
  To: Joonwoo Park, David Miller; +Cc: netdev, linux-kernel, herbert, Kok, Auke-jan H

Joonwoo Park wrote:
>     /* If no Tx and not enough Rx work done, exit the polling mode */
>     if ((!tx_cleaned && (work_done == 0)) ||
>        !netif_running(poll_dev)) {
> quit_polling:
>         if (likely(adapter->itr_setting & 3))
>             e1000_set_itr(adapter);
>         netif_rx_complete(poll_dev, napi);
>         e1000_irq_enable(adapter);

all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
a bug.

>     }
> 
>     return work_done;
> }
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11 23:12     ` Brandeburg, Jesse
@ 2007-12-11 23:27       ` Joonwoo Park
  2007-12-11 23:42         ` Stephen Hemminger
  2007-12-12  0:12       ` Joonwoo Park
  1 sibling, 1 reply; 12+ messages in thread
From: Joonwoo Park @ 2007-12-11 23:27 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: David Miller, netdev, linux-kernel, herbert, Kok, Auke-jan H

2007/12/12, Brandeburg, Jesse <jesse.brandeburg@intel.com>:
> Joonwoo Park wrote:
> >     /* If no Tx and not enough Rx work done, exit the polling mode */
> >     if ((!tx_cleaned && (work_done == 0)) ||
> >        !netif_running(poll_dev)) {
> > quit_polling:
> >         if (likely(adapter->itr_setting & 3))
> >             e1000_set_itr(adapter);
> >         netif_rx_complete(poll_dev, napi);
> >         e1000_irq_enable(adapter);
>
> all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
> calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
> a bug.
>
> >     }
> >
> >     return work_done;
> > }
> >
>

I'm working another patch for drivers (maybe patches)

Thanks.
Joonwoo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11 23:27       ` Joonwoo Park
@ 2007-12-11 23:42         ` Stephen Hemminger
  2007-12-16 21:35           ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2007-12-11 23:42 UTC (permalink / raw)
  To: Joonwoo Park
  Cc: Brandeburg, Jesse, David Miller, netdev, linux-kernel, herbert,
	Kok, Auke-jan H

Perhaps we should change the warning to identify the guilty device.


--- a/net/core/dev.c	2007-11-19 09:09:57.000000000 -0800
+++ b/net/core/dev.c	2007-12-07 15:54:03.000000000 -0800
@@ -2196,7 +2196,13 @@ static void net_rx_action(struct softirq
 		if (test_bit(NAPI_STATE_SCHED, &n->state))
 			work = n->poll(n, weight);
 
-		WARN_ON_ONCE(work > weight);
+		if (unlikely(work > weight)) {
+			if (net_ratelimit())
+				printk(KERN_WARNING
+				       "%s: driver poll bug (work=%d weight=%d)\n",
+				       work, weight);
+			work = weight;
+		}
 
 		budget -= work;
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11 23:12     ` Brandeburg, Jesse
  2007-12-11 23:27       ` Joonwoo Park
@ 2007-12-12  0:12       ` Joonwoo Park
  2007-12-12  0:38         ` Brandeburg, Jesse
  1 sibling, 1 reply; 12+ messages in thread
From: Joonwoo Park @ 2007-12-12  0:12 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: David Miller, netdev, linux-kernel, herbert, Kok, Auke-jan H

2007/12/12, Brandeburg, Jesse <jesse.brandeburg@intel.com>:
>
> all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
> calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
> a bug.
>

Brandeburg,
Don't we need to return non-zero work_done after netif_rx_complete if
work_done != weight?

Thanks,
Joonwoo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-12  0:12       ` Joonwoo Park
@ 2007-12-12  0:38         ` Brandeburg, Jesse
  2007-12-12 15:12           ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Brandeburg, Jesse @ 2007-12-12  0:38 UTC (permalink / raw)
  To: Joonwoo Park; +Cc: David Miller, netdev, linux-kernel, herbert, Kok, Auke-jan H

Joonwoo Park wrote:
> 2007/12/12, Brandeburg, Jesse <jesse.brandeburg@intel.com>:
>> 
>> all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here,
>> after calling netif_rx_complete.  netif_rx_complete plus work_done
>> != 0 causes a bug. 
>> 
> 
> Brandeburg,
> Don't we need to return non-zero work_done after netif_rx_complete if
> work_done != weight?

Actually we just need to make sure we don't return work_done == weight
as it is checked on line 2210 of dev.c.
I should also note that I was wrong above, and that the requirement is
that we MUST not return work_done == weight when indicating
netif_rx_complete.

So, returning 0 when we actually did some work, but are being removed
from the poll list because something like !netif_running, is probably
okay, but I'm sure someone will disagree with me.  Maybe returning like
this would be better
diff --git a/drivers/net/e1000/e1000_main.c
b/drivers/net/e1000/e1000_main.c
index 724f067..76d5e3b 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3933,6 +3933,10 @@ quit_polling:
                        e1000_set_itr(adapter);
                netif_rx_complete(poll_dev, napi);
                e1000_irq_enable(adapter);
+               if (work_done == weight)
+                       return work_done - 1;
+               else
+                       return work_done;
        }

        return work_done;

Jesse

 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-12  0:38         ` Brandeburg, Jesse
@ 2007-12-12 15:12           ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-12-12 15:12 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: joonwpark81, netdev, linux-kernel, herbert, auke-jan.h.kok

From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Date: Tue, 11 Dec 2007 16:38:37 -0800

> @@ -3933,6 +3933,10 @@ quit_polling:
>                         e1000_set_itr(adapter);
>                 netif_rx_complete(poll_dev, napi);
>                 e1000_irq_enable(adapter);
> +               if (work_done == weight)
> +                       return work_done - 1;
> +               else
> +                       return work_done;

Don't do this.

If you processed "weight" worth of packets, return that
exact value and do not netif_rx_complete() and do not
re-enable interrupts.

That is the only correct fix.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.
  2007-12-11 23:42         ` Stephen Hemminger
@ 2007-12-16 21:35           ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-12-16 21:35 UTC (permalink / raw)
  To: shemminger
  Cc: joonwpark81, jesse.brandeburg, netdev, linux-kernel, herbert,
	auke-jan.h.kok

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Tue, 11 Dec 2007 15:42:22 -0800

> Perhaps we should change the warning to identify the guilty device.

Applied.

Stephen, you often don't supply a proper signoff line
for one-off changes like this and I find it very irritating.
It doesn't cost you anything to hit one or two keys in your
outgoing email editor to include the signoff line.

So please do so, or I'll silently drop patches without them
in the future.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-12-16 21:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-11  9:13 [PATCH] [NET]: Fix Ooops of napi net_rx_action Joonwoo Park
2007-12-11 10:32 ` David Miller
2007-12-11 12:36   ` Herbert Xu
2007-12-11 12:41     ` David Miller
2007-12-11 12:57   ` Joonwoo Park
2007-12-11 23:12     ` Brandeburg, Jesse
2007-12-11 23:27       ` Joonwoo Park
2007-12-11 23:42         ` Stephen Hemminger
2007-12-16 21:35           ` David Miller
2007-12-12  0:12       ` Joonwoo Park
2007-12-12  0:38         ` Brandeburg, Jesse
2007-12-12 15:12           ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).