All of lore.kernel.org
 help / color / mirror / Atom feed
* softirq oops from b44_poll
@ 2011-11-21 13:58 Xander Hover
  2011-11-21 23:17 ` Peter P Waskiewicz Jr
  0 siblings, 1 reply; 10+ messages in thread
From: Xander Hover @ 2011-11-21 13:58 UTC (permalink / raw)
  To: peter.p.waskiewicz.jr; +Cc: linux-kernel

Hi all,

I noticed the small discussion about the b44_poll OOPS and
I also have a uni-processor PC with a broadcom network device (b44)
that causes similar kernel OOPSes.

Here is a (reproducible) trace that still shows up in kernel 3.1.1:

 ------------[ cut here ]------------
    WARNING: at kernel/softirq.c:159 local_bh_enable+0x32/0x79()
    Hardware name: Dimension 2400
    Modules linked in: snd_seq_midi snd_emu10k1_synth snd_emux_synth
snd_seq_virmidi snd_seq_midi_emul snd_seq_oss snd_seq_midi_event
snd_seq snd_pcm_oss snd_mixer_oss bnep rfcomm cryptd aes_i586
aes_generic ecb btusb bluetooth rfkill ppdev snd_emu10k1 snd_rawmidi
snd_ac97_codec ac97_bus snd_pcm snd_seq_device snd_timer
snd_page_alloc dcdbas snd_util_mem parport_pc snd_hwdep snd parport
emu10k1_gp rtc_cmos gameport i2c_i801
    Pid: 0, comm: swapper Not tainted 3.1.1-gentoo #1
    Call Trace:
     [<c1022970>] warn_slowpath_common+0x65/0x7a
     [<c102699e>] ? local_bh_enable+0x32/0x79
     [<c1022994>] warn_slowpath_null+0xf/0x13
     [<c102699e>] local_bh_enable+0x32/0x79
     [<c134bfd8>] destroy_conntrack+0x7c/0x9b
     [<c134890b>] nf_conntrack_destroy+0x1f/0x26
     [<c132e3a6>] skb_release_head_state+0x74/0x83
     [<c132e286>] __kfree_skb+0xb/0x6b
     [<c132e30a>] consume_skb+0x24/0x26
     [<c127c925>] b44_poll+0xaa/0x449
     [<c1333ca1>] net_rx_action+0x3f/0xea
     [<c1026a44>] __do_softirq+0x5f/0xd5
     [<c10269e5>] ? local_bh_enable+0x79/0x79
     <IRQ>  [<c1026c32>] ? irq_exit+0x34/0x8d
     [<c1003628>] ? do_IRQ+0x74/0x87
     [<c13f5329>] ? common_interrupt+0x29/0x30
     [<c1006e18>] ? default_idle+0x29/0x3e
     [<c10015a7>] ? cpu_idle+0x2f/0x5d
     [<c13e91c5>] ? rest_init+0x79/0x7b
     [<c15c66a9>] ? start_kernel+0x297/0x29c
     [<c15c60b0>] ? i386_start_kernel+0xb0/0xb7
    ---[ end trace 583f33bb1aa207a9 ]---


However if I apply the following patch this error does not show up anymore:


diff --git a/drivers/net/ethernet/broadcom/b44.c
b/drivers/net/ethernet/broadcom/b44.c
index 4cf835d..3fb66d0 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -608,7 +608,7 @@ static void b44_tx(struct b44 *bp)
                                 skb->len,
                                 DMA_TO_DEVICE);
                rp->skb = NULL;
-               dev_kfree_skb(skb);
+               dev_kfree_skb_irq(skb);
        }

        bp->tx_cons = cons;
--
1.7.8.rc3


This is a "works for me..", however it may very well be that this only
suppresses the symptoms instead of solving the root cause.
I got this idea from a previous patch (see commit log below) which did
the exact opposite from this patch.

--------------
commit 4924f44b97a034dbf44c14b709b0b0907ee23f04
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Mon Jul 4 17:57:10 2011 -0700

    b44: use dev_kfree_skb() in b44_tx()

    b44_tx() is run from softirq handler, it can use dev_kfree_skb() instead
    of dev_kfree_skb_irq()
--------------

Any thoughts?

Kind regards,

Xander Hover

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: softirq oops from b44_poll
  2011-11-21 13:58 softirq oops from b44_poll Xander Hover
@ 2011-11-21 23:17 ` Peter P Waskiewicz Jr
  2011-11-22 11:21   ` Xander Hover
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Peter P Waskiewicz Jr @ 2011-11-21 23:17 UTC (permalink / raw)
  To: Xander Hover; +Cc: linux-kernel, netdev

On Mon, 2011-11-21 at 05:58 -0800, Xander Hover wrote:
> Hi all,
> 
> I noticed the small discussion about the b44_poll OOPS and
> I also have a uni-processor PC with a broadcom network device (b44)
> that causes similar kernel OOPSes.
> 
> Here is a (reproducible) trace that still shows up in kernel 3.1.1:
> 
>  ------------[ cut here ]------------
>     WARNING: at kernel/softirq.c:159 local_bh_enable+0x32/0x79()
>     Hardware name: Dimension 2400
>     Modules linked in: snd_seq_midi snd_emu10k1_synth snd_emux_synth
> snd_seq_virmidi snd_seq_midi_emul snd_seq_oss snd_seq_midi_event
> snd_seq snd_pcm_oss snd_mixer_oss bnep rfcomm cryptd aes_i586
> aes_generic ecb btusb bluetooth rfkill ppdev snd_emu10k1 snd_rawmidi
> snd_ac97_codec ac97_bus snd_pcm snd_seq_device snd_timer
> snd_page_alloc dcdbas snd_util_mem parport_pc snd_hwdep snd parport
> emu10k1_gp rtc_cmos gameport i2c_i801
>     Pid: 0, comm: swapper Not tainted 3.1.1-gentoo #1
>     Call Trace:
>      [<c1022970>] warn_slowpath_common+0x65/0x7a
>      [<c102699e>] ? local_bh_enable+0x32/0x79
>      [<c1022994>] warn_slowpath_null+0xf/0x13
>      [<c102699e>] local_bh_enable+0x32/0x79
>      [<c134bfd8>] destroy_conntrack+0x7c/0x9b
>      [<c134890b>] nf_conntrack_destroy+0x1f/0x26
>      [<c132e3a6>] skb_release_head_state+0x74/0x83
>      [<c132e286>] __kfree_skb+0xb/0x6b
>      [<c132e30a>] consume_skb+0x24/0x26
>      [<c127c925>] b44_poll+0xaa/0x449
>      [<c1333ca1>] net_rx_action+0x3f/0xea
>      [<c1026a44>] __do_softirq+0x5f/0xd5
>      [<c10269e5>] ? local_bh_enable+0x79/0x79
>      <IRQ>  [<c1026c32>] ? irq_exit+0x34/0x8d
>      [<c1003628>] ? do_IRQ+0x74/0x87
>      [<c13f5329>] ? common_interrupt+0x29/0x30
>      [<c1006e18>] ? default_idle+0x29/0x3e
>      [<c10015a7>] ? cpu_idle+0x2f/0x5d
>      [<c13e91c5>] ? rest_init+0x79/0x7b
>      [<c15c66a9>] ? start_kernel+0x297/0x29c
>      [<c15c60b0>] ? i386_start_kernel+0xb0/0xb7
>     ---[ end trace 583f33bb1aa207a9 ]---
> 
> 
> However if I apply the following patch this error does not show up anymore:
> 
> 
> diff --git a/drivers/net/ethernet/broadcom/b44.c
> b/drivers/net/ethernet/broadcom/b44.c
> index 4cf835d..3fb66d0 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -608,7 +608,7 @@ static void b44_tx(struct b44 *bp)
>                                  skb->len,
>                                  DMA_TO_DEVICE);
>                 rp->skb = NULL;
> -               dev_kfree_skb(skb);
> +               dev_kfree_skb_irq(skb);

I suspect the "right" way to fix this is to call dev_kfree_skb_any(skb);
instead, since that will handle the in-interrupt case if that's where
we're stuck.

Can you try this patch (compile-tested only) and see if fixes the issue
you're seeing:

commit e36ef2c1a2b6b517ed43254eb89768794a049b1c
Author: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Date:   Mon Nov 21 15:14:18 2011 -0800

b44: Use dev_kfree_skb_any() in b44_tx()

Reported issues when using dev_kfree_skb() on UP systems and
systems with low numbers of cores.  dev_kfree_skb_any() will
properly save IRQ state before freeing the skb, depending on
how b44_tx() is invoked.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---

 drivers/net/ethernet/broadcom/b44.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/drivers/net/ethernet/broadcom/b44.c
b/drivers/net/ethernet/broadcom/b44.c
index 4cf835d..6a7c39b 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -608,7 +608,7 @@ static void b44_tx(struct b44 *bp)
 				 skb->len,
 				 DMA_TO_DEVICE);
 		rp->skb = NULL;
-		dev_kfree_skb(skb);
+		dev_kfree_skb_any(skb);
 	}
 
 	bp->tx_cons = cons;


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: softirq oops from b44_poll
  2011-11-21 23:17 ` Peter P Waskiewicz Jr
@ 2011-11-22 11:21   ` Xander Hover
  2011-11-22 13:43   ` Xander Hover
  2011-11-22 20:54   ` David Miller
  2 siblings, 0 replies; 10+ messages in thread
From: Xander Hover @ 2011-11-22 11:21 UTC (permalink / raw)
  To: Peter P Waskiewicz Jr; +Cc: linux-kernel

This patch fixes the issue also. I have tried it on kernel 3.1.1.
I will also try it on kernel 3.2_rc2  soon.
Using dev_free_skb_any() instead of dev_free_skb_irq()  makes indeed
more sense to me.


On Tue, Nov 22, 2011 at 12:17 AM, Peter P Waskiewicz Jr
<peter.p.waskiewicz.jr@intel.com> wrote:
> On Mon, 2011-11-21 at 05:58 -0800, Xander Hover wrote:
>> Hi all,
>>
>> I noticed the small discussion about the b44_poll OOPS and
>> I also have a uni-processor PC with a broadcom network device (b44)
>> that causes similar kernel OOPSes.
>>
>> Here is a (reproducible) trace that still shows up in kernel 3.1.1:
>>
>>  ------------[ cut here ]------------
>>     WARNING: at kernel/softirq.c:159 local_bh_enable+0x32/0x79()
>>     Hardware name: Dimension 2400
>>     Modules linked in: snd_seq_midi snd_emu10k1_synth snd_emux_synth
>> snd_seq_virmidi snd_seq_midi_emul snd_seq_oss snd_seq_midi_event
>> snd_seq snd_pcm_oss snd_mixer_oss bnep rfcomm cryptd aes_i586
>> aes_generic ecb btusb bluetooth rfkill ppdev snd_emu10k1 snd_rawmidi
>> snd_ac97_codec ac97_bus snd_pcm snd_seq_device snd_timer
>> snd_page_alloc dcdbas snd_util_mem parport_pc snd_hwdep snd parport
>> emu10k1_gp rtc_cmos gameport i2c_i801
>>     Pid: 0, comm: swapper Not tainted 3.1.1-gentoo #1
>>     Call Trace:
>>      [<c1022970>] warn_slowpath_common+0x65/0x7a
>>      [<c102699e>] ? local_bh_enable+0x32/0x79
>>      [<c1022994>] warn_slowpath_null+0xf/0x13
>>      [<c102699e>] local_bh_enable+0x32/0x79
>>      [<c134bfd8>] destroy_conntrack+0x7c/0x9b
>>      [<c134890b>] nf_conntrack_destroy+0x1f/0x26
>>      [<c132e3a6>] skb_release_head_state+0x74/0x83
>>      [<c132e286>] __kfree_skb+0xb/0x6b
>>      [<c132e30a>] consume_skb+0x24/0x26
>>      [<c127c925>] b44_poll+0xaa/0x449
>>      [<c1333ca1>] net_rx_action+0x3f/0xea
>>      [<c1026a44>] __do_softirq+0x5f/0xd5
>>      [<c10269e5>] ? local_bh_enable+0x79/0x79
>>      <IRQ>  [<c1026c32>] ? irq_exit+0x34/0x8d
>>      [<c1003628>] ? do_IRQ+0x74/0x87
>>      [<c13f5329>] ? common_interrupt+0x29/0x30
>>      [<c1006e18>] ? default_idle+0x29/0x3e
>>      [<c10015a7>] ? cpu_idle+0x2f/0x5d
>>      [<c13e91c5>] ? rest_init+0x79/0x7b
>>      [<c15c66a9>] ? start_kernel+0x297/0x29c
>>      [<c15c60b0>] ? i386_start_kernel+0xb0/0xb7
>>     ---[ end trace 583f33bb1aa207a9 ]---
>>
>>
>> However if I apply the following patch this error does not show up anymore:
>>
>>
>> diff --git a/drivers/net/ethernet/broadcom/b44.c
>> b/drivers/net/ethernet/broadcom/b44.c
>> index 4cf835d..3fb66d0 100644
>> --- a/drivers/net/ethernet/broadcom/b44.c
>> +++ b/drivers/net/ethernet/broadcom/b44.c
>> @@ -608,7 +608,7 @@ static void b44_tx(struct b44 *bp)
>>                                  skb->len,
>>                                  DMA_TO_DEVICE);
>>                 rp->skb = NULL;
>> -               dev_kfree_skb(skb);
>> +               dev_kfree_skb_irq(skb);
>
> I suspect the "right" way to fix this is to call dev_kfree_skb_any(skb);
> instead, since that will handle the in-interrupt case if that's where
> we're stuck.
>
> Can you try this patch (compile-tested only) and see if fixes the issue
> you're seeing:
>
> commit e36ef2c1a2b6b517ed43254eb89768794a049b1c
> Author: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Date:   Mon Nov 21 15:14:18 2011 -0800
>
> b44: Use dev_kfree_skb_any() in b44_tx()
>
> Reported issues when using dev_kfree_skb() on UP systems and
> systems with low numbers of cores.  dev_kfree_skb_any() will
> properly save IRQ state before freeing the skb, depending on
> how b44_tx() is invoked.
>
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> ---
>
>  drivers/net/ethernet/broadcom/b44.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/drivers/net/ethernet/broadcom/b44.c
> b/drivers/net/ethernet/broadcom/b44.c
> index 4cf835d..6a7c39b 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -608,7 +608,7 @@ static void b44_tx(struct b44 *bp)
>                                 skb->len,
>                                 DMA_TO_DEVICE);
>                rp->skb = NULL;
> -               dev_kfree_skb(skb);
> +               dev_kfree_skb_any(skb);
>        }
>
>        bp->tx_cons = cons;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: softirq oops from b44_poll
  2011-11-21 23:17 ` Peter P Waskiewicz Jr
  2011-11-22 11:21   ` Xander Hover
@ 2011-11-22 13:43   ` Xander Hover
  2011-11-22 20:54   ` David Miller
  2 siblings, 0 replies; 10+ messages in thread
From: Xander Hover @ 2011-11-22 13:43 UTC (permalink / raw)
  To: Peter P Waskiewicz Jr; +Cc: linux-kernel, netdev

Tested on 3.1.1 and 3.2_rc2+ now.
Fix is working as expected.

Kind regards,

Xander Hover


On Tue, Nov 22, 2011 at 12:17 AM, Peter P Waskiewicz Jr
<peter.p.waskiewicz.jr@intel.com> wrote:
> On Mon, 2011-11-21 at 05:58 -0800, Xander Hover wrote:
>> Hi all,
>>
>> I noticed the small discussion about the b44_poll OOPS and
>> I also have a uni-processor PC with a broadcom network device (b44)
>> that causes similar kernel OOPSes.
>>
>> Here is a (reproducible) trace that still shows up in kernel 3.1.1:
>>
>>  ------------[ cut here ]------------
>>     WARNING: at kernel/softirq.c:159 local_bh_enable+0x32/0x79()
>>     Hardware name: Dimension 2400
>>     Modules linked in: snd_seq_midi snd_emu10k1_synth snd_emux_synth
>> snd_seq_virmidi snd_seq_midi_emul snd_seq_oss snd_seq_midi_event
>> snd_seq snd_pcm_oss snd_mixer_oss bnep rfcomm cryptd aes_i586
>> aes_generic ecb btusb bluetooth rfkill ppdev snd_emu10k1 snd_rawmidi
>> snd_ac97_codec ac97_bus snd_pcm snd_seq_device snd_timer
>> snd_page_alloc dcdbas snd_util_mem parport_pc snd_hwdep snd parport
>> emu10k1_gp rtc_cmos gameport i2c_i801
>>     Pid: 0, comm: swapper Not tainted 3.1.1-gentoo #1
>>     Call Trace:
>>      [<c1022970>] warn_slowpath_common+0x65/0x7a
>>      [<c102699e>] ? local_bh_enable+0x32/0x79
>>      [<c1022994>] warn_slowpath_null+0xf/0x13
>>      [<c102699e>] local_bh_enable+0x32/0x79
>>      [<c134bfd8>] destroy_conntrack+0x7c/0x9b
>>      [<c134890b>] nf_conntrack_destroy+0x1f/0x26
>>      [<c132e3a6>] skb_release_head_state+0x74/0x83
>>      [<c132e286>] __kfree_skb+0xb/0x6b
>>      [<c132e30a>] consume_skb+0x24/0x26
>>      [<c127c925>] b44_poll+0xaa/0x449
>>      [<c1333ca1>] net_rx_action+0x3f/0xea
>>      [<c1026a44>] __do_softirq+0x5f/0xd5
>>      [<c10269e5>] ? local_bh_enable+0x79/0x79
>>      <IRQ>  [<c1026c32>] ? irq_exit+0x34/0x8d
>>      [<c1003628>] ? do_IRQ+0x74/0x87
>>      [<c13f5329>] ? common_interrupt+0x29/0x30
>>      [<c1006e18>] ? default_idle+0x29/0x3e
>>      [<c10015a7>] ? cpu_idle+0x2f/0x5d
>>      [<c13e91c5>] ? rest_init+0x79/0x7b
>>      [<c15c66a9>] ? start_kernel+0x297/0x29c
>>      [<c15c60b0>] ? i386_start_kernel+0xb0/0xb7
>>     ---[ end trace 583f33bb1aa207a9 ]---
>>
>>
>> However if I apply the following patch this error does not show up anymore:
>>
>>
>> diff --git a/drivers/net/ethernet/broadcom/b44.c
>> b/drivers/net/ethernet/broadcom/b44.c
>> index 4cf835d..3fb66d0 100644
>> --- a/drivers/net/ethernet/broadcom/b44.c
>> +++ b/drivers/net/ethernet/broadcom/b44.c
>> @@ -608,7 +608,7 @@ static void b44_tx(struct b44 *bp)
>>                                  skb->len,
>>                                  DMA_TO_DEVICE);
>>                 rp->skb = NULL;
>> -               dev_kfree_skb(skb);
>> +               dev_kfree_skb_irq(skb);
>
> I suspect the "right" way to fix this is to call dev_kfree_skb_any(skb);
> instead, since that will handle the in-interrupt case if that's where
> we're stuck.
>
> Can you try this patch (compile-tested only) and see if fixes the issue
> you're seeing:
>
> commit e36ef2c1a2b6b517ed43254eb89768794a049b1c
> Author: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Date:   Mon Nov 21 15:14:18 2011 -0800
>
> b44: Use dev_kfree_skb_any() in b44_tx()
>
> Reported issues when using dev_kfree_skb() on UP systems and
> systems with low numbers of cores.  dev_kfree_skb_any() will
> properly save IRQ state before freeing the skb, depending on
> how b44_tx() is invoked.
>
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> ---
>
>  drivers/net/ethernet/broadcom/b44.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/drivers/net/ethernet/broadcom/b44.c
> b/drivers/net/ethernet/broadcom/b44.c
> index 4cf835d..6a7c39b 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -608,7 +608,7 @@ static void b44_tx(struct b44 *bp)
>                                 skb->len,
>                                 DMA_TO_DEVICE);
>                rp->skb = NULL;
> -               dev_kfree_skb(skb);
> +               dev_kfree_skb_any(skb);
>        }
>
>        bp->tx_cons = cons;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: softirq oops from b44_poll
  2011-11-21 23:17 ` Peter P Waskiewicz Jr
  2011-11-22 11:21   ` Xander Hover
  2011-11-22 13:43   ` Xander Hover
@ 2011-11-22 20:54   ` David Miller
  2011-11-22 23:16     ` Xander Hover
  2011-11-23  8:13     ` Peter P Waskiewicz Jr
  2 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2011-11-22 20:54 UTC (permalink / raw)
  To: peter.p.waskiewicz.jr; +Cc: lkml, linux-kernel, netdev

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Date: Mon, 21 Nov 2011 15:17:33 -0800

> I suspect the "right" way to fix this is to call dev_kfree_skb_any(skb);
> instead, since that will handle the in-interrupt case if that's where
> we're stuck.

Caller is always b44_poll(), and that caller always does spin_lock_irqsave().

Adding the extra tests implied by dev_kfree_skb_any() therefore doesn't
make any sense, as it will always evaluate to dev_kfree_skb_irq().

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: softirq oops from b44_poll
  2011-11-22 20:54   ` David Miller
@ 2011-11-22 23:16     ` Xander Hover
  2011-11-23  8:14       ` Peter P Waskiewicz Jr
  2011-11-23  8:13     ` Peter P Waskiewicz Jr
  1 sibling, 1 reply; 10+ messages in thread
From: Xander Hover @ 2011-11-22 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: peter.p.waskiewicz.jr, linux-kernel, netdev

Indeed will the in_irq() test will force dev_kfree_skb_any() to call
dev_kfree_skb_irq().
The kernel warning before this patch was applied, was also trigged by
a WARN_ON_ONCE(in_irq()).
I think David is right on this one.


On Tue, Nov 22, 2011 at 9:54 PM, David Miller <davem@davemloft.net> wrote:
> From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Date: Mon, 21 Nov 2011 15:17:33 -0800
>
>> I suspect the "right" way to fix this is to call dev_kfree_skb_any(skb);
>> instead, since that will handle the in-interrupt case if that's where
>> we're stuck.
>
> Caller is always b44_poll(), and that caller always does spin_lock_irqsave().
>
> Adding the extra tests implied by dev_kfree_skb_any() therefore doesn't
> make any sense, as it will always evaluate to dev_kfree_skb_irq().
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: softirq oops from b44_poll
  2011-11-22 20:54   ` David Miller
  2011-11-22 23:16     ` Xander Hover
@ 2011-11-23  8:13     ` Peter P Waskiewicz Jr
  1 sibling, 0 replies; 10+ messages in thread
From: Peter P Waskiewicz Jr @ 2011-11-23  8:13 UTC (permalink / raw)
  To: David Miller; +Cc: lkml, linux-kernel, netdev

[-- Attachment #1: Type: text/plain, Size: 641 bytes --]

On Tue, 2011-11-22 at 12:54 -0800, David Miller wrote:
> From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Date: Mon, 21 Nov 2011 15:17:33 -0800
> 
> > I suspect the "right" way to fix this is to call dev_kfree_skb_any(skb);
> > instead, since that will handle the in-interrupt case if that's where
> > we're stuck.
> 
> Caller is always b44_poll(), and that caller always does spin_lock_irqsave().
> 
> Adding the extra tests implied by dev_kfree_skb_any() therefore doesn't
> make any sense, as it will always evaluate to dev_kfree_skb_irq().

Agreed, I didn't dig enough through the code.  Thanks Dave.

-PJ

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4394 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: softirq oops from b44_poll
  2011-11-22 23:16     ` Xander Hover
@ 2011-11-23  8:14       ` Peter P Waskiewicz Jr
  0 siblings, 0 replies; 10+ messages in thread
From: Peter P Waskiewicz Jr @ 2011-11-23  8:14 UTC (permalink / raw)
  To: Xander Hover; +Cc: David Miller, linux-kernel, netdev

[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]

On Tue, 2011-11-22 at 15:16 -0800, Xander Hover wrote:
> Indeed will the in_irq() test will force dev_kfree_skb_any() to call
> dev_kfree_skb_irq().
> The kernel warning before this patch was applied, was also trigged by
> a WARN_ON_ONCE(in_irq()).
> I think David is right on this one.

Of course he is.  :)

I think your patch should be submitted to fix the warning.  I'd send it
to the netdev list (cc'd) to make sure David and the rest of those folks
see it.

-PJ

> 
> On Tue, Nov 22, 2011 at 9:54 PM, David Miller <davem@davemloft.net> wrote:
> > From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> > Date: Mon, 21 Nov 2011 15:17:33 -0800
> >
> >> I suspect the "right" way to fix this is to call dev_kfree_skb_any(skb);
> >> instead, since that will handle the in-interrupt case if that's where
> >> we're stuck.
> >
> > Caller is always b44_poll(), and that caller always does spin_lock_irqsave().
> >
> > Adding the extra tests implied by dev_kfree_skb_any() therefore doesn't
> > make any sense, as it will always evaluate to dev_kfree_skb_irq().
> >

-- 
Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
LAN Access Division, Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4394 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: softirq oops from b44_poll
  2011-11-07 20:56 Josh Boyer
@ 2011-11-08  6:21 ` Peter P Waskiewicz Jr
  0 siblings, 0 replies; 10+ messages in thread
From: Peter P Waskiewicz Jr @ 2011-11-08  6:21 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Gary Zambrano, netdev, linux-kernel, kernel-team

[-- Attachment #1: Type: text/plain, Size: 3591 bytes --]

On Mon, 2011-11-07 at 12:56 -0800, Josh Boyer wrote:
> Hi all,
> 
> We've had two reports of a WARN_ON being spit out from kernel/softirq.c
> that seem fairly related in symptoms.  Both seem to involved b44_poll
> either during the middle of some disk I/O.  An example of the output is
> here:
> 
> :WARNING: at kernel/softirq.c:159 _local_bh_enable_ip+0x44/0x8e()
> :Hardware name: Vostro 1500                     
> :Modules linked in: fuse lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
> nf_conntrack sunrpc uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec
> snd_hwdep snd_seq snd_seq_device snd_pcm dell_wmi sparse_keymap dell_laptop
> joydev dcdbas microcode r852 sm_common nand nand_ids b44 nand_ecc r592 mtd ssb
> mii memstick arc4 i2c_i801 iTCO_wdt iTCO_vendor_support iwl3945 iwl_legacy
> mac80211 cfg80211 rfkill snd_timer snd soundcore snd_page_alloc firewire_ohci
> firewire_core crc_itu_t uas usb_storage sdhci_pci sdhci mmc_core nouveau ttm
> drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi wmi video [last unloaded:
> scsi_wait_scan]
> :Pid: 1511, comm: nepomukservices Not tainted 3.1.0-1.fc16.x86_64 #1
> :Call Trace:
> : <IRQ>  [<ffffffff81057a56>] warn_slowpath_common+0x83/0x9b
> : [<ffffffff81057a88>] warn_slowpath_null+0x1a/0x1c
> : [<ffffffff8105d462>] _local_bh_enable_ip+0x44/0x8e
> : [<ffffffff8105d4ba>] local_bh_enable_ip+0xe/0x10
> : [<ffffffff814b5af4>] _raw_spin_unlock_bh+0x15/0x17
> : [<ffffffffa03cc969>] destroy_conntrack+0x9d/0xdc [nf_conntrack]
> : [<ffffffff813fa083>] nf_conntrack_destroy+0x19/0x1b
> : [<ffffffff813ce4ed>] skb_release_head_state+0xa7/0xef
> : [<ffffffff813ce2f1>] __kfree_skb+0x13/0x83
> : [<ffffffff813ce3b7>] consume_skb+0x56/0x6b
> : [<ffffffffa02e48c4>] b44_poll+0xaf/0x3ec [b44]
> : [<ffffffff813d8137>] net_rx_action+0xa9/0x1b8
> : [<ffffffffa02e202e>] ? br32+0x19/0x1d [b44]
> : [<ffffffff8105d6b3>] __do_softirq+0xc9/0x1b5
> : [<ffffffff81027719>] ? ack_APIC_irq+0x15/0x17
> : [<ffffffff814be32c>] call_softirq+0x1c/0x30
> : [<ffffffff81010b45>] do_softirq+0x46/0x81
> : [<ffffffff8105d97b>] irq_exit+0x57/0xb1
> : [<ffffffff814bec0e>] do_IRQ+0x8e/0xa5
> : [<ffffffff814b5d2e>] common_interrupt+0x6e/0x6e
> : <EOI>  [<ffffffff814bc1f4>] ? sysret_audit+0x16/0x20
> 
> You can find the original bug reports in the URLs below.  This has happened
> on two different machines, one 32-bit and another 64-bit.  I'm fairly sure
> both reports are the same issue, but I haven't a clue what that issue might
> be at the moment.
> 
> Thoughts?

I don't have the hardware to play with, but from inspection, I suspect a
thread is getting stuck on that CPU from the spin_lock_irqsave() in
b44_poll().  There are some calls that are mapping and unmapping memory,
which could be blocking.  NAPI should be offering protection under
softirq context, so I'm not sure why that spinlock is even there.  And
comparing with a number of other NAPI poll routines in other drivers,
they are also not locking.

This is entirely a theory that I can't test though.

Cheers,
-PJ

> https://bugzilla.redhat.com/show_bug.cgi?id=749856
> https://bugzilla.redhat.com/show_bug.cgi?id=741117
> 
> josh
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
LAN Access Division, Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4394 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* softirq oops from b44_poll
@ 2011-11-07 20:56 Josh Boyer
  2011-11-08  6:21 ` Peter P Waskiewicz Jr
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Boyer @ 2011-11-07 20:56 UTC (permalink / raw)
  To: Gary Zambrano, netdev; +Cc: linux-kernel, kernel-team

Hi all,

We've had two reports of a WARN_ON being spit out from kernel/softirq.c
that seem fairly related in symptoms.  Both seem to involved b44_poll
either during the middle of some disk I/O.  An example of the output is
here:

:WARNING: at kernel/softirq.c:159 _local_bh_enable_ip+0x44/0x8e()
:Hardware name: Vostro 1500                     
:Modules linked in: fuse lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack sunrpc uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec
snd_hwdep snd_seq snd_seq_device snd_pcm dell_wmi sparse_keymap dell_laptop
joydev dcdbas microcode r852 sm_common nand nand_ids b44 nand_ecc r592 mtd ssb
mii memstick arc4 i2c_i801 iTCO_wdt iTCO_vendor_support iwl3945 iwl_legacy
mac80211 cfg80211 rfkill snd_timer snd soundcore snd_page_alloc firewire_ohci
firewire_core crc_itu_t uas usb_storage sdhci_pci sdhci mmc_core nouveau ttm
drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi wmi video [last unloaded:
scsi_wait_scan]
:Pid: 1511, comm: nepomukservices Not tainted 3.1.0-1.fc16.x86_64 #1
:Call Trace:
: <IRQ>  [<ffffffff81057a56>] warn_slowpath_common+0x83/0x9b
: [<ffffffff81057a88>] warn_slowpath_null+0x1a/0x1c
: [<ffffffff8105d462>] _local_bh_enable_ip+0x44/0x8e
: [<ffffffff8105d4ba>] local_bh_enable_ip+0xe/0x10
: [<ffffffff814b5af4>] _raw_spin_unlock_bh+0x15/0x17
: [<ffffffffa03cc969>] destroy_conntrack+0x9d/0xdc [nf_conntrack]
: [<ffffffff813fa083>] nf_conntrack_destroy+0x19/0x1b
: [<ffffffff813ce4ed>] skb_release_head_state+0xa7/0xef
: [<ffffffff813ce2f1>] __kfree_skb+0x13/0x83
: [<ffffffff813ce3b7>] consume_skb+0x56/0x6b
: [<ffffffffa02e48c4>] b44_poll+0xaf/0x3ec [b44]
: [<ffffffff813d8137>] net_rx_action+0xa9/0x1b8
: [<ffffffffa02e202e>] ? br32+0x19/0x1d [b44]
: [<ffffffff8105d6b3>] __do_softirq+0xc9/0x1b5
: [<ffffffff81027719>] ? ack_APIC_irq+0x15/0x17
: [<ffffffff814be32c>] call_softirq+0x1c/0x30
: [<ffffffff81010b45>] do_softirq+0x46/0x81
: [<ffffffff8105d97b>] irq_exit+0x57/0xb1
: [<ffffffff814bec0e>] do_IRQ+0x8e/0xa5
: [<ffffffff814b5d2e>] common_interrupt+0x6e/0x6e
: <EOI>  [<ffffffff814bc1f4>] ? sysret_audit+0x16/0x20

You can find the original bug reports in the URLs below.  This has happened
on two different machines, one 32-bit and another 64-bit.  I'm fairly sure
both reports are the same issue, but I haven't a clue what that issue might
be at the moment.

Thoughts?

https://bugzilla.redhat.com/show_bug.cgi?id=749856
https://bugzilla.redhat.com/show_bug.cgi?id=741117

josh


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-11-23  8:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-21 13:58 softirq oops from b44_poll Xander Hover
2011-11-21 23:17 ` Peter P Waskiewicz Jr
2011-11-22 11:21   ` Xander Hover
2011-11-22 13:43   ` Xander Hover
2011-11-22 20:54   ` David Miller
2011-11-22 23:16     ` Xander Hover
2011-11-23  8:14       ` Peter P Waskiewicz Jr
2011-11-23  8:13     ` Peter P Waskiewicz Jr
  -- strict thread matches above, loose matches on Subject: below --
2011-11-07 20:56 Josh Boyer
2011-11-08  6:21 ` Peter P Waskiewicz Jr

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.