* Problem with dev_kfree_skb_any() in 2.6.0 @ 2003-12-27 23:17 Benjamin Herrenschmidt 2003-12-28 1:07 ` David S. Miller 0 siblings, 1 reply; 14+ messages in thread From: Benjamin Herrenschmidt @ 2003-12-27 23:17 UTC (permalink / raw) To: Jeff Garzik; +Cc: Linux Kernel list I'm getting Ooops reports with sungem. The problem is that in the PM code (and possibly other exceptional code path), it will clean the rings at non interrupt time but with a spinlock_irq held. it uses dev_kfree_skb_any() which does: static inline void dev_kfree_skb_any(struct sk_buff *skb) { if (in_irq()) dev_kfree_skb_irq(skb); else dev_kfree_skb(skb); } However, in_irq() seem to only catch real IRQ time, so I end up calling dev_kfree_skb(), which triggers the following BUG() in local_bh_enable() WARN_ON(irqs_disabled()); We should probably fix dev_kfree_skb_any() ? Still ugly imho though... - if (in_irq()) + if (in_irq() || irqs_disabled()) Ben. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-27 23:17 Problem with dev_kfree_skb_any() in 2.6.0 Benjamin Herrenschmidt @ 2003-12-28 1:07 ` David S. Miller 2003-12-28 5:44 ` Benjamin Herrenschmidt 2003-12-30 4:09 ` Jeff Garzik 0 siblings, 2 replies; 14+ messages in thread From: David S. Miller @ 2003-12-28 1:07 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: jgarzik, linux-kernel On Sun, 28 Dec 2003 10:17:34 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > We should probably fix dev_kfree_skb_any() ? Still ugly imho though... > > - if (in_irq()) > + if (in_irq() || irqs_disabled()) > That's not the right fix, the sungem PM code path TX queue packet freeing should be instead done outside of IRQ spinlocks. The easiest and safest way to do this is to have a local stack SKB list whose pointer gets passed down into the chip reset and thus the TX queue liberation code, the TX queue liberation code works inside the lock but does not actually free the SKBs, instead it tacks the SKBs onto the SKB list, then at the top level when the IRQ lock is released the SKBs on the list are actually freed. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-28 1:07 ` David S. Miller @ 2003-12-28 5:44 ` Benjamin Herrenschmidt 2003-12-30 4:09 ` Jeff Garzik 1 sibling, 0 replies; 14+ messages in thread From: Benjamin Herrenschmidt @ 2003-12-28 5:44 UTC (permalink / raw) To: David S. Miller; +Cc: Jeff Garzik, Linux Kernel list On Sun, 2003-12-28 at 12:07, David S. Miller wrote: > On Sun, 28 Dec 2003 10:17:34 +1100 > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > > We should probably fix dev_kfree_skb_any() ? Still ugly imho though... > > > > - if (in_irq()) > > + if (in_irq() || irqs_disabled()) > > > > That's not the right fix, the sungem PM code path TX queue > packet freeing should be instead done outside of IRQ spinlocks. > > .../... The "workaround" is a bit complicated, but I'll look into it, I could probably get the whole clean ring thing out of the spinlock instead (I need to reduce time spent in those locks anyway). Though it's really inconsistent imho, to have that routine that can be called at task time, interrupt time, but not task time with a spinlock held... especially since it's called *kfree*, I would have expected kfree-like semantics... Note that among the drivers broken with that same bug (typically in the close() path) are : - sunhme - b44 - tg3 - ... Almost all drivers calling dev_kfree_skb_any() do that within a spinlock_irq ... Ben. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-28 1:07 ` David S. Miller 2003-12-28 5:44 ` Benjamin Herrenschmidt @ 2003-12-30 4:09 ` Jeff Garzik 2003-12-30 4:51 ` David S. Miller 1 sibling, 1 reply; 14+ messages in thread From: Jeff Garzik @ 2003-12-30 4:09 UTC (permalink / raw) To: David S. Miller; +Cc: Benjamin Herrenschmidt, linux-kernel David S. Miller wrote: > On Sun, 28 Dec 2003 10:17:34 +1100 > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > >>We should probably fix dev_kfree_skb_any() ? Still ugly imho though... >> >>- if (in_irq()) >>+ if (in_irq() || irqs_disabled()) >> > > > That's not the right fix, the sungem PM code path TX queue > packet freeing should be instead done outside of IRQ spinlocks. Not really... pretty much _all_ TX queue packet freeing occurs inside an irq handler and inside the driver spinlock. Further, we don't want to reinvent some sort of "queue skb for freeing" code in every driver. Look at what a driver really wants from the net stack: if (you can free the skb now) free skb otherwise queue it to be freed later The driver _shouldn't_ care about the conditions under which an skb can be freed. That's entirely the net stack's domain (and should be)... heck, the net stack should even be free to change said conditions, without breaking or confusing drivers. Jeff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-30 4:09 ` Jeff Garzik @ 2003-12-30 4:51 ` David S. Miller 2003-12-30 5:15 ` Jeff Garzik 2003-12-30 6:14 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 14+ messages in thread From: David S. Miller @ 2003-12-30 4:51 UTC (permalink / raw) To: Jeff Garzik; +Cc: benh, linux-kernel On Mon, 29 Dec 2003 23:09:14 -0500 Jeff Garzik <jgarzik@pobox.com> wrote: > Not really... pretty much _all_ TX queue packet freeing occurs inside > an irq handler and inside the driver spinlock. Further, we don't want > to reinvent some sort of "queue skb for freeing" code in every driver. There is one important detail not mentioned. If we let the TX free occur in cpu IRQ disabled context, the BH to actually do the work will occur as some indeterminate time in the future after the top level IRQ spinlock release occurs. Unlike local_bh_enable(), local_irq_enable() does not run softirq work. Similarly when comparing IRQ handler return (which also runs softirq work if pending). This is the most important reason why the suggested change is wrong. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-30 4:51 ` David S. Miller @ 2003-12-30 5:15 ` Jeff Garzik 2003-12-30 6:01 ` David S. Miller 2003-12-30 6:14 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 14+ messages in thread From: Jeff Garzik @ 2003-12-30 5:15 UTC (permalink / raw) To: David S. Miller; +Cc: benh, linux-kernel On Mon, Dec 29, 2003 at 08:51:57PM -0800, David S. Miller wrote: > On Mon, 29 Dec 2003 23:09:14 -0500 > Jeff Garzik <jgarzik@pobox.com> wrote: > > > Not really... pretty much _all_ TX queue packet freeing occurs inside > > an irq handler and inside the driver spinlock. Further, we don't want > > to reinvent some sort of "queue skb for freeing" code in every driver. > > There is one important detail not mentioned. > > If we let the TX free occur in cpu IRQ disabled context, the > BH to actually do the work will occur as some indeterminate > time in the future after the top level IRQ spinlock release > occurs. > > Unlike local_bh_enable(), local_irq_enable() does not run > softirq work. Similarly when comparing IRQ handler return > (which also runs softirq work if pending). > > This is the most important reason why the suggested change is wrong. OK, agreed. But fixing it in the driver is still incorrect, also. We need a single solution in the net stack, not a per-driver solution. Look at the purpose behind his patch... Jeff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-30 5:15 ` Jeff Garzik @ 2003-12-30 6:01 ` David S. Miller 2003-12-30 6:12 ` Jeff Garzik 0 siblings, 1 reply; 14+ messages in thread From: David S. Miller @ 2003-12-30 6:01 UTC (permalink / raw) To: Jeff Garzik; +Cc: benh, linux-kernel On Tue, 30 Dec 2003 00:15:19 -0500 Jeff Garzik <jgarzik@pobox.com> wrote: > OK, agreed. But fixing it in the driver is still incorrect, also. > > We need a single solution in the net stack, not a per-driver solution. I totally disagree. Let's quickly review, this is illegal: local_irq_disable(); { local_bh_disable(); ... do kfree_skb work ... local_bh_enable(); } local_irq_enable(); as is this: local_irq_disable(); { ... queue to softirq TX work ... } local_irq_enable(); ... oops this won't make softirq TX work get run ... The driver must therefore recognize that it may only free packets in it's IRQ handler or in situations where BH protection has occurred at a higher level or BH protection is the only protection it uses from base context. This is similar to how the driver must be aware that netif_receive_skb() can cause it's ->hard_start_xmit() method to run and therefore it must prevent deadlocks that might occur as a result of locks held during the netif_receive_skb() call. So let's fix the drivers. :) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-30 6:01 ` David S. Miller @ 2003-12-30 6:12 ` Jeff Garzik 2003-12-30 6:13 ` David S. Miller 0 siblings, 1 reply; 14+ messages in thread From: Jeff Garzik @ 2003-12-30 6:12 UTC (permalink / raw) To: David S. Miller; +Cc: benh, linux-kernel David S. Miller wrote: > On Tue, 30 Dec 2003 00:15:19 -0500 > Jeff Garzik <jgarzik@pobox.com> wrote: > > >>OK, agreed. But fixing it in the driver is still incorrect, also. >> >>We need a single solution in the net stack, not a per-driver solution. > > > I totally disagree. > > Let's quickly review, this is illegal: You're missing the point. Think about the name of this function: dev_kfree_skb_any() If this function cannot be used -anywhere-, then the concept (and the net stack) is fundamentally broken for this function. We must _remove_ the function, and thus _I_ have a lot of driver work to do. [jgarzik@sata linux-2.5]$ find . -name '*.[ch]' -type f | grep -v SCCS | xargs grep -wl dev_kfree_skb_any | wc -l 71 Jeff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-30 6:12 ` Jeff Garzik @ 2003-12-30 6:13 ` David S. Miller 2003-12-30 17:43 ` Jeff Garzik 0 siblings, 1 reply; 14+ messages in thread From: David S. Miller @ 2003-12-30 6:13 UTC (permalink / raw) To: Jeff Garzik; +Cc: benh, linux-kernel On Tue, 30 Dec 2003 01:12:21 -0500 Jeff Garzik <jgarzik@pobox.com> wrote: > Think about the name of this function: dev_kfree_skb_any() > > If this function cannot be used -anywhere-, then the concept (and the > net stack) is fundamentally broken for this function. We must _remove_ > the function, and thus _I_ have a lot of driver work to do. If it makes you happy, change the suffix of the name, I am not emotionally attached to the name. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-30 6:13 ` David S. Miller @ 2003-12-30 17:43 ` Jeff Garzik 2004-01-01 20:42 ` David S. Miller 0 siblings, 1 reply; 14+ messages in thread From: Jeff Garzik @ 2003-12-30 17:43 UTC (permalink / raw) To: David S. Miller; +Cc: benh, linux-kernel, Netdev [-- Attachment #1: Type: text/plain, Size: 2066 bytes --] David S. Miller wrote: > On Tue, 30 Dec 2003 01:12:21 -0500 > Jeff Garzik <jgarzik@pobox.com> wrote: > > >>Think about the name of this function: dev_kfree_skb_any() >> >>If this function cannot be used -anywhere-, then the concept (and the >>net stack) is fundamentally broken for this function. We must _remove_ >>the function, and thus _I_ have a lot of driver work to do. > > > If it makes you happy, change the suffix of the name, I am > not emotionally attached to the name. It's not about the name itself. _Think_ about the name: what is the purpose of the function? what does it imply? How have kernel hackers used it in practice? I think you're focusing too closely on implementation, rather than purpose. I humbly request that we expend some brain CPU cycles to think about the API here. To review: * The base requirement is that __kfree_skb shall not call the skb's destructor in hard IRQ context. * To that end, dev_kfree_skb_irq was created to queue skb's for __kfree_skb'ing, when that requirement is not immediately satisfied. * dev_kfree_skb_any was created for situations that either don't know, or don't care, about the context in which skb's are freed. The function name and implementation are less relevant than _purpose_. As it stands now, dev_kfree_skb_any() does not serve the purpose for which it is used in many drivers (not just the short list found in this thread). Luckily, I feel there is an easy solution, as shown in the attached patch. We _already_ queue skbs in dev_kfree_skb_irq(). Therefore, dev_kfree_skb_any() can simply use precisely that same solution. The raise-softirq code will immediately proceed to action if we are not in hard IRQ context, otherwise it will follow the expected path. As an aside (tangent warning), we will need to consider further queueing, if we are to follow the rest of the kernel: skb destructor should really be in purely task context, i.e. make sure __kfree_skb does its work in kernel thread context. But that's a discussion for another day ;-) Jeff [-- Attachment #2: patch --] [-- Type: text/plain, Size: 392 bytes --] ===== include/linux/netdevice.h 1.66 vs edited ===== --- 1.66/include/linux/netdevice.h Sat Nov 1 17:11:04 2003 +++ edited/include/linux/netdevice.h Tue Dec 30 12:29:40 2003 @@ -634,10 +634,7 @@ */ static inline void dev_kfree_skb_any(struct sk_buff *skb) { - if (in_irq()) - dev_kfree_skb_irq(skb); - else - dev_kfree_skb(skb); + dev_kfree_skb_irq(skb); } #define HAVE_NETIF_RX 1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-30 17:43 ` Jeff Garzik @ 2004-01-01 20:42 ` David S. Miller 2004-01-02 2:58 ` Jeff Garzik 0 siblings, 1 reply; 14+ messages in thread From: David S. Miller @ 2004-01-01 20:42 UTC (permalink / raw) To: Jeff Garzik; +Cc: benh, linux-kernel, netdev On Tue, 30 Dec 2003 12:43:21 -0500 Jeff Garzik <jgarzik@pobox.com> wrote: > Luckily, I feel there is an easy solution, as shown in the attached > patch. We _already_ queue skbs in dev_kfree_skb_irq(). Therefore, > dev_kfree_skb_any() can simply use precisely that same solution. The > raise-softirq code will immediately proceed to action if we are not in > hard IRQ context, otherwise it will follow the expected path. Ok, this is reasonable and works. Though, is there any particular reason you don't like adding a "|| irqs_disabled()" check to the if statement instead? I prefer that solution better actually. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2004-01-01 20:42 ` David S. Miller @ 2004-01-02 2:58 ` Jeff Garzik 2004-01-06 3:54 ` David S. Miller 0 siblings, 1 reply; 14+ messages in thread From: Jeff Garzik @ 2004-01-02 2:58 UTC (permalink / raw) To: David S. Miller; +Cc: benh, linux-kernel, netdev On Thu, Jan 01, 2004 at 12:42:18PM -0800, David S. Miller wrote: > On Tue, 30 Dec 2003 12:43:21 -0500 > Jeff Garzik <jgarzik@pobox.com> wrote: > > > Luckily, I feel there is an easy solution, as shown in the attached > > patch. We _already_ queue skbs in dev_kfree_skb_irq(). Therefore, > > dev_kfree_skb_any() can simply use precisely that same solution. The > > raise-softirq code will immediately proceed to action if we are not in > > hard IRQ context, otherwise it will follow the expected path. > > Ok, this is reasonable and works. > > Though, is there any particular reason you don't like adding a > "|| irqs_disabled()" check to the if statement instead? > I prefer that solution better actually. Yep, in fact when I wrote the above message, I came across a couple when I was pondering... * the destructor runs in a more predictable context. * given the problem that started this thread, the 'if' test is a potentially problematic area. Why not eliminate all possibility that this problem will occur again? The only counter argument to this -- to which I have no data to answer -- is that there may be advantage to calling __kfree_skb immediately instead of deferring it slightly. I didn't think that disadvantage outweighted the above, but who knows... I can possibly be convinced otherwise. (and "otherwise" would be using || irqs_disabled()) For the users who don't know/don't care about their context, it just seemed to me that they were not a hot path like users of dev_kfree_skb() and dev_kfree_skb_irq() [unconditional] are... Jeff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2004-01-02 2:58 ` Jeff Garzik @ 2004-01-06 3:54 ` David S. Miller 0 siblings, 0 replies; 14+ messages in thread From: David S. Miller @ 2004-01-06 3:54 UTC (permalink / raw) To: Jeff Garzik; +Cc: benh, linux-kernel, netdev On Thu, 1 Jan 2004 21:58:07 -0500 Jeff Garzik <jgarzik@pobox.com> wrote: > On Thu, Jan 01, 2004 at 12:42:18PM -0800, David S. Miller wrote: > > Though, is there any particular reason you don't like adding a > > "|| irqs_disabled()" check to the if statement instead? > > I prefer that solution better actually. > > Yep, in fact when I wrote the above message, I came across a couple when I > was pondering... > * the destructor runs in a more predictable context. > * given the problem that started this thread, the 'if' test is a > potentially problematic area. Why not eliminate all possibility that > this problem will occur again? The way I see this, dev_kfree_skb_any() is not used in any performance critical path, so at worst during device shutdown, reset, or power-down, TX queue packet freeing work could be delayed by up to one jiffie. Therefore I've put the "|| irqs_disabled()" version of the fix into my tree. Thanks for working this out with me Jeff :) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Problem with dev_kfree_skb_any() in 2.6.0 2003-12-30 4:51 ` David S. Miller 2003-12-30 5:15 ` Jeff Garzik @ 2003-12-30 6:14 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 14+ messages in thread From: Benjamin Herrenschmidt @ 2003-12-30 6:14 UTC (permalink / raw) To: David S. Miller; +Cc: Jeff Garzik, Linux Kernel list On Tue, 2003-12-30 at 15:51, David S. Miller wrote: > There is one important detail not mentioned. > > If we let the TX free occur in cpu IRQ disabled context, the > BH to actually do the work will occur as some indeterminate > time in the future after the top level IRQ spinlock release > occurs. > > Unlike local_bh_enable(), local_irq_enable() does not run > softirq work. Similarly when comparing IRQ handler return > (which also runs softirq work if pending). Ok, checked that with Rusty and it seems that scheduling the softirq will wakeup softirqd when done from non-interrupt level, so it should just work to call dev_kfree_skb_irq() from this task context. inline void raise_softirq_irqoff(unsigned int nr) { __raise_softirq_irqoff(nr); /* * If we're in an interrupt or softirq, we're done * (this also catches softirq-disabled code). We will * actually run the softirq once we return from * the irq or softirq. * * Otherwise we wake up ksoftirqd to make sure we * schedule the softirq soon. */ if (!in_interrupt()) wakeup_softirqd(); } So that should be ok to just call the _irq version in these cases. Those aren't performance critical code path anyway, it's power management when the machine is going to sleep in this specific case in sungem, and close() codepath in general. Ben. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2004-01-06 3:59 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-12-27 23:17 Problem with dev_kfree_skb_any() in 2.6.0 Benjamin Herrenschmidt 2003-12-28 1:07 ` David S. Miller 2003-12-28 5:44 ` Benjamin Herrenschmidt 2003-12-30 4:09 ` Jeff Garzik 2003-12-30 4:51 ` David S. Miller 2003-12-30 5:15 ` Jeff Garzik 2003-12-30 6:01 ` David S. Miller 2003-12-30 6:12 ` Jeff Garzik 2003-12-30 6:13 ` David S. Miller 2003-12-30 17:43 ` Jeff Garzik 2004-01-01 20:42 ` David S. Miller 2004-01-02 2:58 ` Jeff Garzik 2004-01-06 3:54 ` David S. Miller 2003-12-30 6:14 ` Benjamin Herrenschmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).