linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
@ 2007-12-12  4:01 Joonwoo Park
  2007-12-12  5:39 ` Stephen Hemminger
  2007-12-12 15:18 ` David Miller
  0 siblings, 2 replies; 12+ messages in thread
From: Joonwoo Park @ 2007-12-12  4:01 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: jgarzik, baum, andy

[NETDEV]: tehuti Fix possible causing oops of net_rx_action

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
---
 drivers/net/tehuti.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
index 21230c9..955e749 100644
--- a/drivers/net/tehuti.c
+++ b/drivers/net/tehuti.c
@@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
 
 		netif_rx_complete(dev, napi);
 		bdx_enable_interrupts(priv);
+		if (unlikely(work_done == napi->weight))
+			return work_done - 1;
 	}
 	return work_done;
 }
---


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
  2007-12-12  4:01 [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action Joonwoo Park
@ 2007-12-12  5:39 ` Stephen Hemminger
  2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
                     ` (2 more replies)
  2007-12-12 15:18 ` David Miller
  1 sibling, 3 replies; 12+ messages in thread
From: Stephen Hemminger @ 2007-12-12  5:39 UTC (permalink / raw)
  To: Joonwoo Park; +Cc: netdev, linux-kernel, jgarzik, baum, andy

On Wed, 12 Dec 2007 13:01:27 +0900
"Joonwoo Park" <joonwpark81@gmail.com> wrote:

> [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> 
> Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> ---
>  drivers/net/tehuti.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> index 21230c9..955e749 100644
> --- a/drivers/net/tehuti.c
> +++ b/drivers/net/tehuti.c
> @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
>  
>  		netif_rx_complete(dev, napi);
>  		bdx_enable_interrupts(priv);
> +		if (unlikely(work_done == napi->weight))
> +			return work_done - 1;
>  	}
>  	return work_done;
>  }

A better fix would be not going over budget in the first place.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC] net: napi fix
  2007-12-12  5:39 ` Stephen Hemminger
@ 2007-12-12  5:46   ` Stephen Hemminger
  2007-12-12  6:05     ` Joonwoo Park
  2007-12-12 15:21     ` David Miller
  2007-12-12  5:48   ` [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action Joonwoo Park
  2007-12-12 15:20   ` David Miller
  2 siblings, 2 replies; 12+ messages in thread
From: Stephen Hemminger @ 2007-12-12  5:46 UTC (permalink / raw)
  To: David S. Miller; +Cc: Joonwoo Park, netdev, linux-kernel, jgarzik, baum, andy

Isn't this a better fix for all drivers, rather than peppering every
driver with the special case. This is how the logic worked up until
2.6.24.


--- a/net/core/dev.c	2007-12-11 12:16:20.000000000 -0800
+++ b/net/core/dev.c	2007-12-11 21:43:39.000000000 -0800
@@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
 
 		have = netpoll_poll_lock(n);
 
-		weight = n->weight;
+		weight = min(n->weight, budget);
 
 		/* This NAPI_STATE_SCHED test is for avoiding a race
 		 * with netpoll's poll_napi().  Only the entity which

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
  2007-12-12  5:39 ` Stephen Hemminger
  2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
@ 2007-12-12  5:48   ` Joonwoo Park
  2007-12-12  5:53     ` Stephen Hemminger
  2007-12-12 15:20   ` David Miller
  2 siblings, 1 reply; 12+ messages in thread
From: Joonwoo Park @ 2007-12-12  5:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, linux-kernel, jgarzik, baum, andy

2007/12/12, Stephen Hemminger <shemminger@linux-foundation.org>:
> On Wed, 12 Dec 2007 13:01:27 +0900
> "Joonwoo Park" <joonwpark81@gmail.com> wrote:
>
> > [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> >
> > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> > ---
> >  drivers/net/tehuti.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> > index 21230c9..955e749 100644
> > --- a/drivers/net/tehuti.c
> > +++ b/drivers/net/tehuti.c
> > @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
> >
> >               netif_rx_complete(dev, napi);
> >               bdx_enable_interrupts(priv);
> > +             if (unlikely(work_done == napi->weight))
> > +                     return work_done - 1;
> >       }
> >       return work_done;
> >  }
>
> A better fix would be not going over budget in the first place.
>
> --
> Stephen Hemminger <shemminger@linux-foundation.org>
>

Stephen,
This is code of bd_poll().
Do you mean remove napi_stop stuff?

static int bdx_poll(struct napi_struct *napi, int budget)
{
...
        work_done = bdx_rx_receive(priv, &priv->rxd_fifo0, budget);
        if ((work_done < budget) ||
            (priv->napi_stop++ >= 30)) {
                DBG("rx poll is done. backing to isr-driven\n");

                /* from time to time we exit to let NAPI layer release
                 * device lock and allow waiting tasks (eg rmmod) to advance) */
                priv->napi_stop = 0;

                netif_rx_complete(dev, napi);
                bdx_enable_interrupts(priv);
                if (unlikely(work_done == napi->weight))
                        return work_done - 1;
        }
        return work_done;
}

Thanks,
Joonwoo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
  2007-12-12  5:48   ` [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action Joonwoo Park
@ 2007-12-12  5:53     ` Stephen Hemminger
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2007-12-12  5:53 UTC (permalink / raw)
  To: Joonwoo Park; +Cc: netdev, linux-kernel, jgarzik, baum, andy

On Wed, 12 Dec 2007 14:48:27 +0900
"Joonwoo Park" <joonwpark81@gmail.com> wrote:

> 2007/12/12, Stephen Hemminger <shemminger@linux-foundation.org>:
> > On Wed, 12 Dec 2007 13:01:27 +0900
> > "Joonwoo Park" <joonwpark81@gmail.com> wrote:
> >
> > > [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> > >
> > > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> > > ---
> > >  drivers/net/tehuti.c |    2 ++
> > >  1 files changed, 2 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> > > index 21230c9..955e749 100644
> > > --- a/drivers/net/tehuti.c
> > > +++ b/drivers/net/tehuti.c
> > > @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
> > >
> > >               netif_rx_complete(dev, napi);
> > >               bdx_enable_interrupts(priv);
> > > +             if (unlikely(work_done == napi->weight))
> > > +                     return work_done - 1;
> > >       }
> > >       return work_done;
> > >  }
> >
> > A better fix would be not going over budget in the first place.
> >
> > --
> > Stephen Hemminger <shemminger@linux-foundation.org>
> >
> 
> Stephen,
> This is code of bd_poll().
> Do you mean remove napi_stop stuff?
> 
> static int bdx_poll(struct napi_struct *napi, int budget)
> {
> ...
>         work_done = bdx_rx_receive(priv, &priv->rxd_fifo0, budget);
>         if ((work_done < budget) ||
>             (priv->napi_stop++ >= 30)) {

Yes remove the napi_stop stuff, because current NAPI expects device
to be constrained only by budget. If you need to stop sooner, just
set napi weight to be smaller.

>                 DBG("rx poll is done. backing to isr-driven\n");
> 
>                 /* from time to time we exit to let NAPI layer release
>                  * device lock and allow waiting tasks (eg rmmod) to advance) */
>                 priv->napi_stop = 0;
> 
>                 netif_rx_complete(dev, napi);
>                 bdx_enable_interrupts(priv);

With my posted fix to rx_action the following two lines would not be needed.
>                 if (unlikely(work_done == napi->weight))
>                         return work_done - 1;
>         }
>         return work_done;
> }
> 
> Thanks,
> Joonwoo


-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
@ 2007-12-12  6:05     ` Joonwoo Park
  2007-12-12 15:22       ` David Miller
  2007-12-12 15:21     ` David Miller
  1 sibling, 1 reply; 12+ messages in thread
From: Joonwoo Park @ 2007-12-12  6:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, netdev, linux-kernel, jgarzik, baum, andy

2007/12/12, Stephen Hemminger <shemminger@linux-foundation.org>:
> Isn't this a better fix for all drivers, rather than peppering every
> driver with the special case. This is how the logic worked up until
> 2.6.24.
>
>
> --- a/net/core/dev.c    2007-12-11 12:16:20.000000000 -0800
> +++ b/net/core/dev.c    2007-12-11 21:43:39.000000000 -0800
> @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
>
>                have = netpoll_poll_lock(n);
>
> -               weight = n->weight;
> +               weight = min(n->weight, budget);
>
>                /* This NAPI_STATE_SCHED test is for avoiding a race
>                 * with netpoll's poll_napi().  Only the entity which
>

Stephen,
Could you explain how it fix the problem?
IMHO I think your patch cannot solve the problem.
The drivers can call netif_rx_complete and net_rx_action can do
list_move_tail also.
Am I missing something?

Thanks
Joonwoo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
  2007-12-12  4:01 [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action Joonwoo Park
  2007-12-12  5:39 ` Stephen Hemminger
@ 2007-12-12 15:18 ` David Miller
  2007-12-13  7:07   ` Joonwoo Park
  1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2007-12-12 15:18 UTC (permalink / raw)
  To: joonwpark81; +Cc: netdev, linux-kernel, jgarzik, baum, andy

From: "Joonwoo Park" <joonwpark81@gmail.com>
Date: Wed, 12 Dec 2007 13:01:27 +0900

> @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
>  
>  		netif_rx_complete(dev, napi);
>  		bdx_enable_interrupts(priv);
> +		if (unlikely(work_done == napi->weight))
> +			return work_done - 1;
>  	}
>  	return work_done;
>  }

Any time your trying to make a caller "happy" by adjusting
a return value forcefully, it's a hack.

And I stated this in another reply about this issue.

Please do not fix the problem this way.

The correct way to fix this is, if we did process a full
"weight" or work, we should not netif_rx_complete() and
we should not re-enable chip interrupts.

Instead we should return the true "work_done" value and
allow the caller to thus poll us one more time.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
  2007-12-12  5:39 ` Stephen Hemminger
  2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
  2007-12-12  5:48   ` [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action Joonwoo Park
@ 2007-12-12 15:20   ` David Miller
  2007-12-12 16:36     ` Stephen Hemminger
  2 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-12-12 15:20 UTC (permalink / raw)
  To: shemminger; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, baum, andy

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Tue, 11 Dec 2007 21:39:39 -0800

> On Wed, 12 Dec 2007 13:01:27 +0900
> "Joonwoo Park" <joonwpark81@gmail.com> wrote:
> 
> > [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> > 
> > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> > ---
> >  drivers/net/tehuti.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> > index 21230c9..955e749 100644
> > --- a/drivers/net/tehuti.c
> > +++ b/drivers/net/tehuti.c
> > @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
> >  
> >  		netif_rx_complete(dev, napi);
> >  		bdx_enable_interrupts(priv);
> > +		if (unlikely(work_done == napi->weight))
> > +			return work_done - 1;
> >  	}
> >  	return work_done;
> >  }
> 
> A better fix would be not going over budget in the first place.

That's not the problem.

They are not going over the budget, rather, they are hitting
the budget yet doing netif_rx_complete() as well which is
illegal.

Unless you strictly process less than "weight" packets, you must
not netif_rx_complete() and re-enable chip interrupts.

I can't believe people are trying to fix this bug like this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
  2007-12-12  6:05     ` Joonwoo Park
@ 2007-12-12 15:21     ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2007-12-12 15:21 UTC (permalink / raw)
  To: shemminger; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, baum, andy

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Tue, 11 Dec 2007 21:46:34 -0800

> Isn't this a better fix for all drivers, rather than peppering every
> driver with the special case. This is how the logic worked up until
> 2.6.24.

Stephen this is not the problem.

The problem is that the driver is doing a NAPI completion and
re-enabling chip interrupts with work_done == weight, and that is
illegal.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12  6:05     ` Joonwoo Park
@ 2007-12-12 15:22       ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-12-12 15:22 UTC (permalink / raw)
  To: joonwpark81; +Cc: shemminger, netdev, linux-kernel, jgarzik, baum, andy

From: "Joonwoo Park" <joonwpark81@gmail.com>
Date: Wed, 12 Dec 2007 15:05:26 +0900

> Could you explain how it fix the problem?
> IMHO I think your patch cannot solve the problem.
> The drivers can call netif_rx_complete and net_rx_action can do
> list_move_tail also.

Stephen is confused about what the bug is in these drivers,
that's all.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
  2007-12-12 15:20   ` David Miller
@ 2007-12-12 16:36     ` Stephen Hemminger
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2007-12-12 16:36 UTC (permalink / raw)
  To: David Miller, joonwpark81; +Cc: netdev, linux-kernel, jgarzik, baum, andy

On Wed, 12 Dec 2007 07:20:34 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Tue, 11 Dec 2007 21:39:39 -0800
> 
> > On Wed, 12 Dec 2007 13:01:27 +0900
> > "Joonwoo Park" <joonwpark81@gmail.com> wrote:
> > 
> > > [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> > > 
> > > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> > > ---
> > >  drivers/net/tehuti.c |    2 ++
> > >  1 files changed, 2 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> > > index 21230c9..955e749 100644
> > > --- a/drivers/net/tehuti.c
> > > +++ b/drivers/net/tehuti.c
> > > @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
> > >  
> > >  		netif_rx_complete(dev, napi);
> > >  		bdx_enable_interrupts(priv);
> > > +		if (unlikely(work_done == napi->weight))
> > > +			return work_done - 1;
> > >  	}
> > >  	return work_done;
> > >  }
> > 
> > A better fix would be not going over budget in the first place.
> 
> That's not the problem.
> 
> They are not going over the budget, rather, they are hitting
> the budget yet doing netif_rx_complete() as well which is
> illegal.
> 
> Unless you strictly process less than "weight" packets, you must
> not netif_rx_complete() and re-enable chip interrupts.
> 
> I can't believe people are trying to fix this bug like this.

Sorry, I was looking at a different possible problem. The issue
is that if netdev_budget was set smaller (say 128) but device
weight was set larger (say 256). The new code would still allow
the device to do a full swipe (256) packets rather than only
128 as in earlier NAPI. I guess it is an okay behaviour change, because
we don't really guarantee that case.

The problem with the tehuti driver is the logic around priv->napi_stop.
That whole early stop concept should be removed since it just
duplicates the logic of netdev->weight but breaks the assumptions
in the calling netif_rx_action.



-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
  2007-12-12 15:18 ` David Miller
@ 2007-12-13  7:07   ` Joonwoo Park
  0 siblings, 0 replies; 12+ messages in thread
From: Joonwoo Park @ 2007-12-13  7:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, jgarzik, baum, andy

2007/12/13, David Miller <davem@davemloft.net>:
> From: "Joonwoo Park" <joonwpark81@gmail.com>
> Date: Wed, 12 Dec 2007 13:01:27 +0900
>
>
> Any time your trying to make a caller "happy" by adjusting
> a return value forcefully, it's a hack.
>
> And I stated this in another reply about this issue.
>
> Please do not fix the problem this way.
>
> The correct way to fix this is, if we did process a full
> "weight" or work, we should not netif_rx_complete() and
> we should not re-enable chip interrupts.
>
> Instead we should return the true "work_done" value and
> allow the caller to thus poll us one more time.
>

Thanks so much for your advice.
I agree, returning work_done itself exactly.
I will rework for these drivers.

Thanks.
Joonwoo

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-12-13  7:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-12  4:01 [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action Joonwoo Park
2007-12-12  5:39 ` Stephen Hemminger
2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
2007-12-12  6:05     ` Joonwoo Park
2007-12-12 15:22       ` David Miller
2007-12-12 15:21     ` David Miller
2007-12-12  5:48   ` [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action Joonwoo Park
2007-12-12  5:53     ` Stephen Hemminger
2007-12-12 15:20   ` David Miller
2007-12-12 16:36     ` Stephen Hemminger
2007-12-12 15:18 ` David Miller
2007-12-13  7:07   ` Joonwoo Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).