linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Problem with dev_kfree_skb_any() in 2.6.0
@ 2003-12-27 23:17 Benjamin Herrenschmidt
  2003-12-28  1:07 ` David S. Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2003-12-27 23:17 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linux Kernel list

I'm getting Ooops reports with sungem. The problem is that in the PM
code (and possibly other exceptional code path), it will clean the rings
at non interrupt time but with a spinlock_irq held.

it uses dev_kfree_skb_any() which does:

static inline void dev_kfree_skb_any(struct sk_buff *skb)
{
        if (in_irq())
                dev_kfree_skb_irq(skb);
        else
                dev_kfree_skb(skb);
}

However, in_irq() seem to only catch real IRQ time, so I end up calling
dev_kfree_skb(), which triggers the following BUG() in local_bh_enable()
        WARN_ON(irqs_disabled());

We should probably fix dev_kfree_skb_any() ? Still ugly imho though...

-        if (in_irq())
+        if (in_irq() || irqs_disabled())


Ben.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-27 23:17 Problem with dev_kfree_skb_any() in 2.6.0 Benjamin Herrenschmidt
@ 2003-12-28  1:07 ` David S. Miller
  2003-12-28  5:44   ` Benjamin Herrenschmidt
  2003-12-30  4:09   ` Jeff Garzik
  0 siblings, 2 replies; 14+ messages in thread
From: David S. Miller @ 2003-12-28  1:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: jgarzik, linux-kernel

On Sun, 28 Dec 2003 10:17:34 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> We should probably fix dev_kfree_skb_any() ? Still ugly imho though...
> 
> -        if (in_irq())
> +        if (in_irq() || irqs_disabled())
> 

That's not the right fix, the sungem PM code path TX queue
packet freeing should be instead done outside of IRQ spinlocks.

The easiest and safest way to do this is to have a local stack
SKB list whose pointer gets passed down into the chip reset and
thus the TX queue liberation code, the TX queue liberation code
works inside the lock but does not actually free the SKBs, instead
it tacks the SKBs onto the SKB list, then at the top level when the
IRQ lock is released the SKBs on the list are actually freed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-28  1:07 ` David S. Miller
@ 2003-12-28  5:44   ` Benjamin Herrenschmidt
  2003-12-30  4:09   ` Jeff Garzik
  1 sibling, 0 replies; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2003-12-28  5:44 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jeff Garzik, Linux Kernel list

On Sun, 2003-12-28 at 12:07, David S. Miller wrote:
> On Sun, 28 Dec 2003 10:17:34 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > We should probably fix dev_kfree_skb_any() ? Still ugly imho though...
> > 
> > -        if (in_irq())
> > +        if (in_irq() || irqs_disabled())
> > 
> 
> That's not the right fix, the sungem PM code path TX queue
> packet freeing should be instead done outside of IRQ spinlocks.
>
> .../...

The "workaround" is a bit complicated, but I'll look into it,
I could probably get the whole clean ring thing out of the
spinlock instead (I need to reduce time spent in those locks
anyway).

Though it's really inconsistent imho, to have that routine that
can be called at task time, interrupt time, but not task time with
a spinlock held... especially since it's called *kfree*, I would
have expected kfree-like semantics...

Note that among the drivers broken with that same bug (typically
in the close() path) are :

 - sunhme
 - b44
 - tg3
 - ...

Almost all drivers calling dev_kfree_skb_any() do that within a
spinlock_irq ...

Ben.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-28  1:07 ` David S. Miller
  2003-12-28  5:44   ` Benjamin Herrenschmidt
@ 2003-12-30  4:09   ` Jeff Garzik
  2003-12-30  4:51     ` David S. Miller
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2003-12-30  4:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: Benjamin Herrenschmidt, linux-kernel

David S. Miller wrote:
> On Sun, 28 Dec 2003 10:17:34 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> 
>>We should probably fix dev_kfree_skb_any() ? Still ugly imho though...
>>
>>-        if (in_irq())
>>+        if (in_irq() || irqs_disabled())
>>
> 
> 
> That's not the right fix, the sungem PM code path TX queue
> packet freeing should be instead done outside of IRQ spinlocks.


Not really...  pretty much _all_ TX queue packet freeing occurs inside 
an irq handler and inside the driver spinlock.  Further, we don't want 
to reinvent some sort of "queue skb for freeing" code in every driver.

Look at what a driver really wants from the net stack:

	if (you can free the skb now)
		free skb
	otherwise
		queue it to be freed later

The driver _shouldn't_ care about the conditions under which an skb can 
be freed.  That's entirely the net stack's domain (and should be)... 
heck, the net stack should even be free to change said conditions, 
without breaking or confusing drivers.

	Jeff




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-30  4:09   ` Jeff Garzik
@ 2003-12-30  4:51     ` David S. Miller
  2003-12-30  5:15       ` Jeff Garzik
  2003-12-30  6:14       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 14+ messages in thread
From: David S. Miller @ 2003-12-30  4:51 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: benh, linux-kernel

On Mon, 29 Dec 2003 23:09:14 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> Not really...  pretty much _all_ TX queue packet freeing occurs inside 
> an irq handler and inside the driver spinlock.  Further, we don't want 
> to reinvent some sort of "queue skb for freeing" code in every driver.

There is one important detail not mentioned.

If we let the TX free occur in cpu IRQ disabled context, the
BH to actually do the work will occur as some indeterminate
time in the future after the top level IRQ spinlock release
occurs.

Unlike local_bh_enable(), local_irq_enable() does not run
softirq work.  Similarly when comparing IRQ handler return
(which also runs softirq work if pending).

This is the most important reason why the suggested change is wrong.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-30  4:51     ` David S. Miller
@ 2003-12-30  5:15       ` Jeff Garzik
  2003-12-30  6:01         ` David S. Miller
  2003-12-30  6:14       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2003-12-30  5:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: benh, linux-kernel

On Mon, Dec 29, 2003 at 08:51:57PM -0800, David S. Miller wrote:
> On Mon, 29 Dec 2003 23:09:14 -0500
> Jeff Garzik <jgarzik@pobox.com> wrote:
> 
> > Not really...  pretty much _all_ TX queue packet freeing occurs inside 
> > an irq handler and inside the driver spinlock.  Further, we don't want 
> > to reinvent some sort of "queue skb for freeing" code in every driver.
> 
> There is one important detail not mentioned.
> 
> If we let the TX free occur in cpu IRQ disabled context, the
> BH to actually do the work will occur as some indeterminate
> time in the future after the top level IRQ spinlock release
> occurs.
> 
> Unlike local_bh_enable(), local_irq_enable() does not run
> softirq work.  Similarly when comparing IRQ handler return
> (which also runs softirq work if pending).
> 
> This is the most important reason why the suggested change is wrong.

OK, agreed.  But fixing it in the driver is still incorrect, also.

We need a single solution in the net stack, not a per-driver solution.

Look at the purpose behind his patch...

	Jeff




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-30  5:15       ` Jeff Garzik
@ 2003-12-30  6:01         ` David S. Miller
  2003-12-30  6:12           ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: David S. Miller @ 2003-12-30  6:01 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: benh, linux-kernel

On Tue, 30 Dec 2003 00:15:19 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> OK, agreed.  But fixing it in the driver is still incorrect, also.
> 
> We need a single solution in the net stack, not a per-driver solution.

I totally disagree.

Let's quickly review, this is illegal:

	local_irq_disable();
	{
		local_bh_disable();
		... do kfree_skb work ...
		local_bh_enable();
	}
	local_irq_enable();

as is this:

	local_irq_disable();
	{
		... queue to softirq TX work ...
	}
	local_irq_enable();
	... oops this won't make softirq TX work get run ...

The driver must therefore recognize that it may only free packets
in it's IRQ handler or in situations where BH protection has occurred
at a higher level or BH protection is the only protection it uses
from base context.

This is similar to how the driver must be aware that
netif_receive_skb() can cause it's ->hard_start_xmit() method to run
and therefore it must prevent deadlocks that might occur as a result
of locks held during the netif_receive_skb() call.

So let's fix the drivers. :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-30  6:01         ` David S. Miller
@ 2003-12-30  6:12           ` Jeff Garzik
  2003-12-30  6:13             ` David S. Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2003-12-30  6:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: benh, linux-kernel

David S. Miller wrote:
> On Tue, 30 Dec 2003 00:15:19 -0500
> Jeff Garzik <jgarzik@pobox.com> wrote:
> 
> 
>>OK, agreed.  But fixing it in the driver is still incorrect, also.
>>
>>We need a single solution in the net stack, not a per-driver solution.
> 
> 
> I totally disagree.
> 
> Let's quickly review, this is illegal:


You're missing the point.

Think about the name of this function:  dev_kfree_skb_any()

If this function cannot be used -anywhere-, then the concept (and the 
net stack) is fundamentally broken for this function.  We must _remove_ 
the function, and thus _I_ have a lot of driver work to do.

[jgarzik@sata linux-2.5]$ find . -name '*.[ch]' -type f | grep -v SCCS | 
xargs grep -wl dev_kfree_skb_any | wc -l
      71

	Jeff




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-30  6:12           ` Jeff Garzik
@ 2003-12-30  6:13             ` David S. Miller
  2003-12-30 17:43               ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: David S. Miller @ 2003-12-30  6:13 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: benh, linux-kernel

On Tue, 30 Dec 2003 01:12:21 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> Think about the name of this function:  dev_kfree_skb_any()
> 
> If this function cannot be used -anywhere-, then the concept (and the 
> net stack) is fundamentally broken for this function.  We must _remove_ 
> the function, and thus _I_ have a lot of driver work to do.

If it makes you happy, change the suffix of the name, I am
not emotionally attached to the name.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-30  4:51     ` David S. Miller
  2003-12-30  5:15       ` Jeff Garzik
@ 2003-12-30  6:14       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2003-12-30  6:14 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jeff Garzik, Linux Kernel list

On Tue, 2003-12-30 at 15:51, David S. Miller wrote:

> There is one important detail not mentioned.
> 
> If we let the TX free occur in cpu IRQ disabled context, the
> BH to actually do the work will occur as some indeterminate
> time in the future after the top level IRQ spinlock release
> occurs.
> 
> Unlike local_bh_enable(), local_irq_enable() does not run
> softirq work.  Similarly when comparing IRQ handler return
> (which also runs softirq work if pending).

Ok, checked that with Rusty and it seems that scheduling the
softirq will wakeup softirqd when done from non-interrupt level,
so it should just work to call dev_kfree_skb_irq() from this
task context.

inline void raise_softirq_irqoff(unsigned int nr)
{
        __raise_softirq_irqoff(nr);
 
        /*
         * If we're in an interrupt or softirq, we're done
         * (this also catches softirq-disabled code). We will
         * actually run the softirq once we return from
         * the irq or softirq.
         *
         * Otherwise we wake up ksoftirqd to make sure we
         * schedule the softirq soon.
         */
        if (!in_interrupt())
                wakeup_softirqd();
}
 
So that should be ok to just call the _irq version in these
cases. Those aren't performance critical code path anyway,
it's power management when the machine is going to sleep in
this specific case in sungem, and close() codepath in
general.

Ben.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-30  6:13             ` David S. Miller
@ 2003-12-30 17:43               ` Jeff Garzik
  2004-01-01 20:42                 ` David S. Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2003-12-30 17:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: benh, linux-kernel, Netdev

[-- Attachment #1: Type: text/plain, Size: 2066 bytes --]

David S. Miller wrote:
> On Tue, 30 Dec 2003 01:12:21 -0500
> Jeff Garzik <jgarzik@pobox.com> wrote:
> 
> 
>>Think about the name of this function:  dev_kfree_skb_any()
>>
>>If this function cannot be used -anywhere-, then the concept (and the 
>>net stack) is fundamentally broken for this function.  We must _remove_ 
>>the function, and thus _I_ have a lot of driver work to do.
> 
> 
> If it makes you happy, change the suffix of the name, I am
> not emotionally attached to the name.


It's not about the name itself.  _Think_ about the name:  what is the 
purpose of the function?  what does it imply?  How have kernel hackers 
used it in practice?  I think you're focusing too closely on 
implementation, rather than purpose.

I humbly request that we expend some brain CPU cycles to think about the 
API here.

To review:
* The base requirement is that __kfree_skb shall not call the skb's 
destructor in hard IRQ context.
* To that end, dev_kfree_skb_irq was created to queue skb's for 
__kfree_skb'ing, when that requirement is not immediately satisfied.
* dev_kfree_skb_any was created for situations that either don't know, 
or don't care, about the context in which skb's are freed.  The function 
name and implementation are less relevant than _purpose_.

As it stands now, dev_kfree_skb_any() does not serve the purpose for 
which it is used in many drivers (not just the short list found in this 
thread).

Luckily, I feel there is an easy solution, as shown in the attached 
patch.  We _already_ queue skbs in dev_kfree_skb_irq().  Therefore, 
dev_kfree_skb_any() can simply use precisely that same solution.  The 
raise-softirq code will immediately proceed to action if we are not in 
hard IRQ context, otherwise it will follow the expected path.

As an aside (tangent warning), we will need to consider further 
queueing, if we are to follow the rest of the kernel:  skb destructor 
should really be in purely task context, i.e. make sure __kfree_skb does 
its work in kernel thread context.  But that's a discussion for another 
day ;-)

	Jeff


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 392 bytes --]

===== include/linux/netdevice.h 1.66 vs edited =====
--- 1.66/include/linux/netdevice.h	Sat Nov  1 17:11:04 2003
+++ edited/include/linux/netdevice.h	Tue Dec 30 12:29:40 2003
@@ -634,10 +634,7 @@
  */
 static inline void dev_kfree_skb_any(struct sk_buff *skb)
 {
-	if (in_irq())
-		dev_kfree_skb_irq(skb);
-	else
-		dev_kfree_skb(skb);
+	dev_kfree_skb_irq(skb);
 }
 
 #define HAVE_NETIF_RX 1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2003-12-30 17:43               ` Jeff Garzik
@ 2004-01-01 20:42                 ` David S. Miller
  2004-01-02  2:58                   ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: David S. Miller @ 2004-01-01 20:42 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: benh, linux-kernel, netdev

On Tue, 30 Dec 2003 12:43:21 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> Luckily, I feel there is an easy solution, as shown in the attached 
> patch.  We _already_ queue skbs in dev_kfree_skb_irq().  Therefore, 
> dev_kfree_skb_any() can simply use precisely that same solution.  The 
> raise-softirq code will immediately proceed to action if we are not in 
> hard IRQ context, otherwise it will follow the expected path.

Ok, this is reasonable and works.

Though, is there any particular reason you don't like adding a
"|| irqs_disabled()" check to the if statement instead?
I prefer that solution better actually.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2004-01-01 20:42                 ` David S. Miller
@ 2004-01-02  2:58                   ` Jeff Garzik
  2004-01-06  3:54                     ` David S. Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2004-01-02  2:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: benh, linux-kernel, netdev

On Thu, Jan 01, 2004 at 12:42:18PM -0800, David S. Miller wrote:
> On Tue, 30 Dec 2003 12:43:21 -0500
> Jeff Garzik <jgarzik@pobox.com> wrote:
> 
> > Luckily, I feel there is an easy solution, as shown in the attached 
> > patch.  We _already_ queue skbs in dev_kfree_skb_irq().  Therefore, 
> > dev_kfree_skb_any() can simply use precisely that same solution.  The 
> > raise-softirq code will immediately proceed to action if we are not in 
> > hard IRQ context, otherwise it will follow the expected path.
> 
> Ok, this is reasonable and works.
> 
> Though, is there any particular reason you don't like adding a
> "|| irqs_disabled()" check to the if statement instead?
> I prefer that solution better actually.

Yep, in fact when I wrote the above message, I came across a couple when I
was pondering...
* the destructor runs in a more predictable context.
* given the problem that started this thread, the 'if' test is a
  potentially problematic area.  Why not eliminate all possibility that
  this problem will occur again?

The only counter argument to this -- to which I have no data to answer --
is that there may be advantage to calling __kfree_skb immediately
instead of deferring it slightly.  I didn't think that disadvantage
outweighted the above, but who knows...  I can possibly be convinced
otherwise.  (and "otherwise" would be using || irqs_disabled())

For the users who don't know/don't care about their context, it just
seemed to me that they were not a hot path like users of dev_kfree_skb()
and dev_kfree_skb_irq() [unconditional] are...

	Jeff




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with dev_kfree_skb_any() in 2.6.0
  2004-01-02  2:58                   ` Jeff Garzik
@ 2004-01-06  3:54                     ` David S. Miller
  0 siblings, 0 replies; 14+ messages in thread
From: David S. Miller @ 2004-01-06  3:54 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: benh, linux-kernel, netdev

On Thu, 1 Jan 2004 21:58:07 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> On Thu, Jan 01, 2004 at 12:42:18PM -0800, David S. Miller wrote:
> > Though, is there any particular reason you don't like adding a
> > "|| irqs_disabled()" check to the if statement instead?
> > I prefer that solution better actually.
> 
> Yep, in fact when I wrote the above message, I came across a couple when I
> was pondering...
> * the destructor runs in a more predictable context.
> * given the problem that started this thread, the 'if' test is a
>   potentially problematic area.  Why not eliminate all possibility that
>   this problem will occur again?

The way I see this, dev_kfree_skb_any() is not used in any performance critical
path, so at worst during device shutdown, reset, or power-down, TX queue
packet freeing work could be delayed by up to one jiffie.

Therefore I've put the "|| irqs_disabled()" version of the fix into my tree.

Thanks for working this out with me Jeff :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2004-01-06  3:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-27 23:17 Problem with dev_kfree_skb_any() in 2.6.0 Benjamin Herrenschmidt
2003-12-28  1:07 ` David S. Miller
2003-12-28  5:44   ` Benjamin Herrenschmidt
2003-12-30  4:09   ` Jeff Garzik
2003-12-30  4:51     ` David S. Miller
2003-12-30  5:15       ` Jeff Garzik
2003-12-30  6:01         ` David S. Miller
2003-12-30  6:12           ` Jeff Garzik
2003-12-30  6:13             ` David S. Miller
2003-12-30 17:43               ` Jeff Garzik
2004-01-01 20:42                 ` David S. Miller
2004-01-02  2:58                   ` Jeff Garzik
2004-01-06  3:54                     ` David S. Miller
2003-12-30  6:14       ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).