linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* NOHZ: local_softirq_pending 08
@ 2012-05-30 21:48 Dave Jones
  2012-05-31  7:52 ` Thomas Gleixner
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Jones @ 2012-05-30 21:48 UTC (permalink / raw)
  To: tglx; +Cc: Linux Kernel

Thomas,
I've been seeing these messages more and more lately.
When I start a kernel build for eg, it seems to spew regularly..

May 30 17:42:00 [ 1030.651976] NOHZ: local_softirq_pending 08
May 30 17:42:16 [ 1046.579491] NOHZ: local_softirq_pending 08
May 30 17:42:16 [ 1047.230344] NOHZ: local_softirq_pending 08
May 30 17:42:33 [ 1063.751879] NOHZ: local_softirq_pending 08
May 30 17:42:33 [ 1063.796340] NOHZ: local_softirq_pending 08
May 30 17:43:00 [ 1090.776493] NOHZ: local_softirq_pending 08
May 30 17:43:00 [ 1090.836544] NOHZ: local_softirq_pending 08
May 30 17:43:00 [ 1090.880271] NOHZ: local_softirq_pending 08
May 30 17:43:04 [ 1095.291939] NOHZ: local_softirq_pending 08
May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08



According to git, you ratelimited this back in 2007 with the message
"until we found the root cause of that problem."

Is this an old problem that has returned ?

Maybe interesting note: Both machines I see this on regularly are Atom cpus,
and it's always '08'.


	Dave

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-05-30 21:48 NOHZ: local_softirq_pending 08 Dave Jones
@ 2012-05-31  7:52 ` Thomas Gleixner
  2012-05-31  7:56   ` Eric Dumazet
  2012-06-01 22:57   ` Francois Romieu
  0 siblings, 2 replies; 46+ messages in thread
From: Thomas Gleixner @ 2012-05-31  7:52 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel

Dave,

On Wed, 30 May 2012, Dave Jones wrote:

> May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08
> 
> According to git, you ratelimited this back in 2007 with the message
> "until we found the root cause of that problem."
> 
> Is this an old problem that has returned ?

Looks like.
 
> Maybe interesting note: Both machines I see this on regularly are Atom cpus,
> and it's always '08'.

So it goes idle with NET_RX softirq pending. That really should not
happen. I fear you need to fire up the tracer :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-05-31  7:52 ` Thomas Gleixner
@ 2012-05-31  7:56   ` Eric Dumazet
  2012-05-31 14:04     ` Dave Jones
  2012-06-01 22:57   ` Francois Romieu
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Dumazet @ 2012-05-31  7:56 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Dave Jones, Linux Kernel

On Thu, 2012-05-31 at 09:52 +0200, Thomas Gleixner wrote:
> Dave,
> 
> On Wed, 30 May 2012, Dave Jones wrote:
> 
> > May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08
> > 
> > According to git, you ratelimited this back in 2007 with the message
> > "until we found the root cause of that problem."
> > 
> > Is this an old problem that has returned ?
> 
> Looks like.
>  
> > Maybe interesting note: Both machines I see this on regularly are Atom cpus,
> > and it's always '08'.
> 
> So it goes idle with NET_RX softirq pending. That really should not
> happen. I fear you need to fire up the tracer :)

It might be a bug in the network driver, a race with NAPI completion.

Dave, which net drivers run on these machines ?



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-05-31  7:56   ` Eric Dumazet
@ 2012-05-31 14:04     ` Dave Jones
  2012-06-08 16:46       ` Nikos Chantziaras
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Jones @ 2012-05-31 14:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Gleixner, Linux Kernel

On Thu, May 31, 2012 at 09:56:25AM +0200, Eric Dumazet wrote:
 > On Thu, 2012-05-31 at 09:52 +0200, Thomas Gleixner wrote:
 > > Dave,
 > > 
 > > On Wed, 30 May 2012, Dave Jones wrote:
 > > 
 > > > May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08
 > > > 
 > > > According to git, you ratelimited this back in 2007 with the message
 > > > "until we found the root cause of that problem."
 > > > 
 > > > Is this an old problem that has returned ?
 > > 
 > > Looks like.
 > >  
 > > > Maybe interesting note: Both machines I see this on regularly are Atom cpus,
 > > > and it's always '08'.
 > > 
 > > So it goes idle with NET_RX softirq pending. That really should not
 > > happen. I fear you need to fire up the tracer :)
 > 
 > It might be a bug in the network driver, a race with NAPI completion.
 > 
 > Dave, which net drivers run on these machines ?

r8169.

I got an email from someone else asking if I used Realtek drivers, which I thought
was an uncanny guess.

	Dave

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-05-31  7:52 ` Thomas Gleixner
  2012-05-31  7:56   ` Eric Dumazet
@ 2012-06-01 22:57   ` Francois Romieu
  2012-06-02 20:58     ` Marc Dionne
  1 sibling, 1 reply; 46+ messages in thread
From: Francois Romieu @ 2012-06-01 22:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Dave Jones, Linux Kernel

Thomas Gleixner <tglx@linutronix.de> :
[
> So it goes idle with NET_RX softirq pending. That really should not
> happen. I fear you need to fire up the tracer :)

Can you elaborate what "should not happen" mean ? AFAIU nothing prevents
softirq to be interrupted after NET_RX processing and thus NET_RX_SOFTIRQ
to be pending again before tick_nohz_irq_exit() runs.

Dave can you:
- specify if there much traffic when the message appears
- send /proc/interrupts + ethtool -S ethX

Thanks.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-01 22:57   ` Francois Romieu
@ 2012-06-02 20:58     ` Marc Dionne
  2012-06-05 23:15       ` Francois Romieu
  0 siblings, 1 reply; 46+ messages in thread
From: Marc Dionne @ 2012-06-02 20:58 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Thomas Gleixner, Dave Jones, Linux Kernel

If it helps, I've been seeing the same thing here on a system with an
r8169.  The message appears consistently enough on network startup to
be able to track it down to this specific commit:
    98ddf986fca17840e46e070354b7e2cd2169da15: r8169: bh locking redux
and task scheduling.

The softirq pending messages always come right after "link down"
messages for the interface.

Marc

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-02 20:58     ` Marc Dionne
@ 2012-06-05 23:15       ` Francois Romieu
  2012-06-06  1:46         ` Dave Jones
  0 siblings, 1 reply; 46+ messages in thread
From: Francois Romieu @ 2012-06-05 23:15 UTC (permalink / raw)
  To: Marc Dionne; +Cc: Thomas Gleixner, Dave Jones, Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 10928 bytes --]

Marc Dionne <marc.c.dionne@gmail.com> :
> If it helps, I've been seeing the same thing here on a system with an
> r8169.  The message appears consistently enough on network startup to
> be able to track it down to this specific commit:
>     98ddf986fca17840e46e070354b7e2cd2169da15: r8169: bh locking redux
> and task scheduling.
> 
> The softirq pending messages always come right after "link down"
> messages for the interface.

Let's see the traces.

I have added some code to trace the timing and values of the irq in
rtl8169_interrupt(). Patches against current git are attached.

This is the relevant part of kernel messages as seen through the serial
console:
[...]
[ 2441.972077] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 2441.977882] r8169 0000:01:00.0: enabling Mem-Wr-Inval
[ 2441.977939] r8169 0000:01:00.0: enabling bus mastering
[ 2441.978046] r8169 0000:01:00.0: irq 42 for MSI/MSI-X
[ 2441.979412] r8169 0000:01:00.0: eth0: RTL8168b/8111b at 0xffffc90000032000, 00:13:8f:ea:b1:5d, XID 18000000 IRQ 42
[ 2441.990081] r8169 0000:01:00.0: eth0: jumbo features [frames: 4080 bytes, tx checksumming: ko]
[...]
[ 2683.588223] r8169 0000:01:00.0: 8168b-lom: link down
[ 2683.593302] r8169 0000:01:00.0: 8168b-lom: link down
[ 2683.598341] NOHZ: local_softirq_pending 08
[ 2683.599356] IPv6: ADDRCONF(NETDEV_UP): 8168b-lom: link is not ready
[ 2685.779376] r8169 0000:01:00.0: 8168b-lom: link up
[ 2685.784824] IPv6: ADDRCONF(NETDEV_CHANGE): 8168b-lom: link becomes ready

Now the content of the IntrStatus register from rtl8169_interrupt() :

[35882] 0020 (LinkChg, see [ 2683.593302] r8169 ... 8168b-lom: link down)
[57792] 0020 (LinkChg, see [ 2685.779376] r8169 ... 8168b-lom: link up)
[57960] 0084
[64880] 0084
[74880] 0084
[06240] 0084
[14960] 0084
[55040] 0084


'ip link set dev 8168b-lom up' + 'cat /sys/kernel/debug/tracing/trace':

#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
[...]
          <idle>-0     [001] d..1  2681.581946: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
              ip-1437  [000] ...1  2681.582967: rtl8169_get_stats64 <-dev_get_stats
              ip-1437  [000] ...1  2681.582974: rtl8169_get_stats64 <-dev_get_stats
              ip-1437  [000] ...1  2681.582990: rtl8169_get_stats64 <-dev_get_stats
              ip-1437  [000] ...1  2681.582996: rtl8169_get_stats64 <-dev_get_stats
              ip-1437  [000] ...1  2681.583002: rtl8169_get_stats64 <-dev_get_stats
              ip-1437  [000] ....  2681.583146: rtl_device_event <-notifier_call_chain
              ip-1437  [000] ....  2681.583148: rtl_open <-__dev_open
              ip-1437  [000] ....  2681.583153: rtl8169_init_ring <-rtl_open
              ip-1437  [000] ....  2681.583155: rtl8169_alloc_rx_data <-rtl8169_init_ring
[...]
              ip-1437  [000] ....  2681.583497: rtl8169_alloc_rx_data <-rtl8169_init_ring
              ip-1437  [000] ....  2681.583498: rtl_request_uncached_firmware <-rtl_open
              ip-1437  [000] ....  2681.583539: rtl8169_init_phy <-rtl_open
              ip-1437  [000] ....  2681.583540: rtl_hw_phy_config <-rtl8169_init_phy
              ip-1437  [000] ....  2681.583540: rtl_writephy_batch <-rtl_hw_phy_config
              ip-1437  [000] ....  2681.583541: rtl_writephy <-rtl_writephy_batch
              ip-1437  [000] ....  2681.583542: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.583669: rtl_writephy <-rtl_writephy_batch
              ip-1437  [000] ....  2681.583669: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.583795: rtl_writephy <-rtl_writephy_batch
              ip-1437  [000] ....  2681.583796: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.583924: rtl8169_xmii_reset_enable <-rtl8169_init_phy
              ip-1437  [000] ....  2681.583925: rtl_readphy <-rtl8169_xmii_reset_enable
              ip-1437  [000] ....  2681.583925: r8169_mdio_read <-rtl_readphy
              ip-1437  [000] ....  2681.584053: rtl_writephy <-rtl8169_xmii_reset_enable
              ip-1437  [000] ....  2681.584053: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.584180: rtl8169_xmii_reset_pending <-rtl8169_init_phy
              ip-1437  [000] ....  2681.584180: rtl_readphy <-rtl8169_xmii_reset_pending
              ip-1437  [000] ....  2681.584180: r8169_mdio_read <-rtl_readphy
              ip-1437  [000] ....  2681.584308: rtl8169_set_speed <-rtl8169_init_phy
              ip-1437  [000] ....  2681.584309: rtl8169_set_speed_xmii <-rtl8169_set_speed
              ip-1437  [000] ....  2681.584309: rtl_writephy <-rtl8169_set_speed_xmii
              ip-1437  [000] ....  2681.584309: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.584436: rtl_readphy <-rtl8169_set_speed_xmii
              ip-1437  [000] ....  2681.584436: r8169_mdio_read <-rtl_readphy
              ip-1437  [000] ....  2681.584564: rtl_readphy <-rtl8169_set_speed_xmii
              ip-1437  [000] ....  2681.584564: r8169_mdio_read <-rtl_readphy
              ip-1437  [000] ....  2681.584692: rtl_writephy <-rtl8169_set_speed_xmii
              ip-1437  [000] ....  2681.584692: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.584819: rtl_writephy <-rtl8169_set_speed_xmii
              ip-1437  [000] ....  2681.584819: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.584946: rtl_writephy <-rtl8169_set_speed_xmii
              ip-1437  [000] ....  2681.584946: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.585074: rtl_writephy <-r8168_pll_power_up
              ip-1437  [000] ....  2681.585075: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.585201: rtl_writephy <-r8168_pll_power_up
              ip-1437  [000] ....  2681.585201: r8169_mdio_write <-rtl_writephy
          <idle>-0     [001] d..2  2681.585306: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
              ip-1437  [000] ....  2681.585326: rtl_writephy <-r8168_pll_power_up
              ip-1437  [000] ....  2681.585327: r8169_mdio_write <-rtl_writephy
              ip-1437  [000] ....  2681.585453: rtl_hw_start_8168 <-rtl_open
              ip-1437  [000] ....  2681.585456: rtl_set_rx_tx_desc_registers <-rtl_hw_start_8168
              ip-1437  [000] ....  2681.585456: rtl_set_rx_mode <-rtl_hw_start_8168
              ip-1437  [000] ....  2681.585461: rtl_hw_start_8168bb <-rtl_hw_start_8168
              ip-1437  [000] ....  2681.585464: rtl_tx_performance_tweak <-rtl_hw_start_8168bb
              ip-1437  [000] ....  2681.585472: rtl8169_xmii_link_ok <-__rtl8169_check_link_status

[See first LinkChg event above]

              ip-1437  [000] d.h.  2681.585477: rtl8169_interrupt <-handle_irq_event_percpu
              ip-1437  [000] ..s1  2681.585481: rtl8169_poll <-net_rx_action

[r8169 NAPI handler schedules some work for the link event...]

              ip-1437  [000] ..s1  2681.585483: rtl_schedule_task <-rtl8169_poll
          <idle>-0     [001] d..2  2681.589280: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
              ip-1437  [000] .Ns1  2681.590517: rtl_set_rx_mode <-__dev_set_rx_mode
              ip-1437  [000] .Ns1  2681.590523: rtl_set_rx_mode <-__dev_set_rx_mode
              ip-1437  [000] .N..  2681.590533: rtl8169_get_stats64 <-dev_get_stats
              ip-1437  [000] .Ns1  2681.590546: rtl_set_rx_mode <-__dev_set_rx_mode

[... and kworker kicks in]

     kworker/0:2-44    [000] ....  2681.590565: rtl_task <-process_one_work
     kworker/0:2-44    [000] ....  2681.590566: rtl_slow_event_work <-rtl_task

[Here comes the 'link down' message...]

     kworker/0:2-44    [000] ....  2681.590568: rtl8169_xmii_link_ok <-__rtl8169_check_link_status
          <idle>-0     [001] d..1  2681.590774: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
          <idle>-0     [001] d..2  2681.593285: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit

[... and the 'local_softirq_pending' message takes place ~5ms later]

          <idle>-0     [000] d..1  2681.595607: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
              ip-1437  [001] ....  2681.596629: rtl_device_event <-notifier_call_chain

[NAPI was scheduled from 'rtl_slow_event_work'. rtl8169_poll experiences a
 10 ms delay but I really don't care: there should not be much slow events
 and it avoids shared locking between the slow work thread and the NAPI
 handler]

          <idle>-0     [000] .Ns2  2681.606110: rtl8169_poll <-net_rx_action
          <idle>-0     [000] d..1  2681.606377: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
          <idle>-0     [001] d..1  2681.606506: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
          <idle>-0     [000] d..2  2681.606526: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
          <idle>-0     [000] d..2  2681.607842: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
          <idle>-0     [001] d..2  2681.609292: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
          <idle>-0     [001] d..2  2681.613272: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
          <idle>-0     [001] d..2  2681.617268: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
          <idle>-0     [001] d..2  2681.621265: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
          <idle>-0     [001] d..2  2681.625275: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
          <idle>-0     [001] d..2  2681.629262: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
[...]
          <idle>-0     [001] d..1  2683.551714: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
          <idle>-0     [001] d..1  2683.751544: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
          <idle>-0     [000] d.h1  2683.774846: rtl8169_interrupt <-handle_irq_event_percpu
          <idle>-0     [000] ..s2  2683.774850: rtl8169_poll <-net_rx_action
          <idle>-0     [000] ..s2  2683.774852: rtl_schedule_task <-rtl8169_poll
     kworker/0:2-44    [000] ....  2683.774860: rtl_task <-process_one_work
     kworker/0:2-44    [000] ....  2683.774861: rtl_slow_event_work <-rtl_task

[link up message, 2s after the 'link down' one]

     kworker/0:2-44    [000] ....  2683.774863: rtl8169_xmii_link_ok <-__rtl8169_check_link_status
     kworker/0:2-44    [000] ....  2683.774865: rtl_link_chg_patch <-__rtl8169_check_link_status

98ddf986fca17840e46e070354b7e2cd2169da15 removed bh locking, whence the
implicit do_softirq() called from _local_bh_enable_ip.

With my stupid driver maintainer hat on, I can only tell that
it works as designed.

It could be worth checking if Dave's traces look the same or not.

-- 
Ueimor

[-- Attachment #2: 0001-r8169-debugfs-support.patch --]
[-- Type: text/plain, Size: 6487 bytes --]

>From 269c60147b6e7e7ad46b36e7dd1c91bb0c1b0274 Mon Sep 17 00:00:00 2001
Message-Id: <269c60147b6e7e7ad46b36e7dd1c91bb0c1b0274.1338938527.git.romieu@fr.zoreil.com>
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Fri, 25 May 2012 11:18:03 +0200
Subject: [PATCH 1/2] r8169: debugfs support.
X-Organisation: Land of Sunshine Inc.

Mostly copied from Stephen Hemminger's sky2.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
 drivers/net/ethernet/realtek/Kconfig |    9 ++
 drivers/net/ethernet/realtek/r8169.c |  186 ++++++++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)

diff --git a/drivers/net/ethernet/realtek/Kconfig b/drivers/net/ethernet/realtek/Kconfig
index 5821966..4f7930c 100644
--- a/drivers/net/ethernet/realtek/Kconfig
+++ b/drivers/net/ethernet/realtek/Kconfig
@@ -115,4 +115,13 @@ config R8169
 	  To compile this driver as a module, choose M here: the module
 	  will be called r8169.  This is recommended.
 
+config R8169_DEBUGFS
+	bool "Realtek 8169 gigabit ethernet debugfs support"
+	depends on R8169 && DEBUG_FS
+	---help---
+	  This option adds the ability to dump driver state for debugging.
+	  Debug data is available in the /sys/kernel/debug/r8169/ethX file.
+
+	  If unsure, say N.
+
 endif # NET_VENDOR_REALTEK
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 9757ce3..abb028a 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -28,6 +28,7 @@
 #include <linux/firmware.h>
 #include <linux/pci-aspm.h>
 #include <linux/prefetch.h>
+#include <linux/debugfs.h>
 
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -60,6 +61,10 @@
 #define dprintk(fmt, args...)	do {} while (0)
 #endif /* RTL8169_DEBUG */
 
+#ifdef CONFIG_R8169_DEBUGFS
+static struct dentry *rtl_debug;
+#endif
+
 #define R8169_MSG_DEFAULT \
 	(NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN)
 
@@ -774,6 +779,10 @@ struct rtl8169_private {
 		} phy_action;
 	} *rtl_fw;
 #define RTL_FIRMWARE_UNKNOWN	ERR_PTR(-EAGAIN)
+
+#ifdef CONFIG_R8169_DEBUGFS
+	struct dentry *debugfs;
+#endif
 };
 
 MODULE_AUTHOR("Realtek and the Linux r8169 crew <netdev@vger.kernel.org>");
@@ -6692,14 +6701,191 @@ static struct pci_driver rtl8169_pci_driver = {
 	.driver.pm	= RTL8169_PM_OPS,
 };
 
+#ifdef CONFIG_R8169_DEBUGFS
+
+static struct dentry *rtl_debug;
+
+struct Desc {
+	__le32 opts1;
+	__le32 opts2;
+	__le64 addr;
+};
+
+union rtl_ring {
+	struct RxDesc	*rx;
+	struct TxDesc	*tx;
+	struct Desc	*anon;
+};
+
+static void rtl_debug_show_ring(struct seq_file *seq, union rtl_ring u, int n)
+{
+	struct Desc *desc = u.anon;
+	int i;
+
+	for (i = 0; i < n; i++) {
+		seq_printf(seq, "[%03d] %08x %08x %0llx\n", i, desc->opts1,
+			   desc->opts2, (u64)desc->addr);
+		desc++;
+	}
+}
+
+#define DECLARE_RTL_DEBUG_FOPS(name) \
+static const struct file_operations rtl_debug_fops_##name = { \
+	.owner		= THIS_MODULE, \
+	.open		= rtl_debug_open_##name, \
+	.read		= seq_read, \
+	.llseek		= seq_lseek, \
+	.release	= single_release, \
+}
+
+static int rtl_debug_show_rx(struct seq_file *seq, void *v)
+{
+	struct rtl8169_private *tp = seq->private;
+	union rtl_ring u = { .rx = tp->RxDescArray };
+
+	seq_printf(seq, "Rx: dirty=%08x current=%08x\n", tp->dirty_rx,
+		   tp->cur_rx);
+	rtl_debug_show_ring(seq, u, NUM_RX_DESC);
+
+	return 0;
+}
+
+static int rtl_debug_open_rx(struct inode *inode, struct file *file)
+{
+	return single_open(file, rtl_debug_show_rx, inode->i_private);
+}
+
+DECLARE_RTL_DEBUG_FOPS(rx);
+
+static int rtl_debug_show_tx(struct seq_file *seq, void *v)
+{
+	struct rtl8169_private *tp = seq->private;
+	union rtl_ring u = { .tx = tp->TxDescArray };
+
+	seq_printf(seq, "Tx: dirty=%08x current=%08x queue:%s\n", tp->dirty_tx,
+		   tp->cur_tx, netif_queue_stopped(tp->dev) ? "off" : "on");
+	rtl_debug_show_ring(seq, u, NUM_TX_DESC);
+
+	return 0;
+}
+
+static int rtl_debug_open_tx(struct inode *inode, struct file *file)
+{
+	return single_open(file, rtl_debug_show_tx, inode->i_private);
+}
+
+DECLARE_RTL_DEBUG_FOPS(tx);
+
+static void rtl_debug_create_tree(struct net_device *dev)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+	struct dentry *parent;
+	static const struct rtl_debug_ent {
+		const struct file_operations *fops;
+		const char *name;
+	} ent[] = {
+		{ &rtl_debug_fops_rx,	"rx" },
+		{ &rtl_debug_fops_tx,	"tx" },
+		{ NULL, }
+	};
+	int i;
+
+	parent = debugfs_create_dir(dev->name, rtl_debug);
+	if (IS_ERR(parent))
+		goto err;
+
+	for (i = 0; i < ARRAY_SIZE(ent) - 1; i++) {
+		const struct rtl_debug_ent *e = ent + i;
+		struct dentry *d;
+
+		d = debugfs_create_file(e->name, S_IRUGO, parent, tp, e->fops);
+		if (!d)
+			goto err_remove;
+	}
+
+	tp->debugfs = parent;
+	return;
+
+err_remove:
+	debugfs_remove_recursive(parent);
+err:
+	netif_err(tp, drv, dev, "failed to create debugfs tree.\n");
+	return;
+}
+
+
+static int rtl_device_event(struct notifier_block *unused, unsigned long event,
+			    void *ptr)
+{
+	struct net_device *dev = ptr;
+	struct rtl8169_private *tp = netdev_priv(dev);
+
+	if (dev->netdev_ops->ndo_open != rtl_open || !rtl_debug)
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_CHANGENAME:
+		if (tp->debugfs) {
+			tp->debugfs = debugfs_rename(rtl_debug, tp->debugfs,
+						     rtl_debug, dev->name);
+		}
+		break;
+
+	case NETDEV_GOING_DOWN:
+		if (tp->debugfs) {
+			netdev_printk(KERN_DEBUG, dev, "remove debugfs\n");
+			debugfs_remove_recursive(tp->debugfs);
+			tp->debugfs = NULL;
+		}
+		break;
+
+	case NETDEV_UP:
+		rtl_debug_create_tree(dev);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block rtl_debug_notifier = {
+	.notifier_call = rtl_device_event,
+};
+
+static __init void rtl_debug_init(void)
+{
+	struct dentry *ent;
+
+	ent = debugfs_create_dir(MODULENAME, NULL);
+	if (ent && !IS_ERR(ent)) {
+		rtl_debug = ent;
+		register_netdevice_notifier(&rtl_debug_notifier);
+	}
+}
+
+static __exit void rtl_debug_cleanup(void)
+{
+	if (rtl_debug) {
+		unregister_netdevice_notifier(&rtl_debug_notifier);
+		debugfs_remove(rtl_debug);
+		rtl_debug = NULL;
+	}
+}
+
+#else
+#define rtl_debug_init()
+#define rtl_debug_cleanup()
+#endif
+
 static int __init rtl8169_init_module(void)
 {
+	rtl_debug_init();
 	return pci_register_driver(&rtl8169_pci_driver);
 }
 
 static void __exit rtl8169_cleanup_module(void)
 {
 	pci_unregister_driver(&rtl8169_pci_driver);
+	rtl_debug_cleanup();
 }
 
 module_init(rtl8169_init_module);
-- 
1.7.10.2


[-- Attachment #3: 0002-r8169-irq-debug-stuff.patch --]
[-- Type: text/plain, Size: 3249 bytes --]

>From fb5fdae345e4099a8e543401ba36e7588c7c8e4c Mon Sep 17 00:00:00 2001
Message-Id: <fb5fdae345e4099a8e543401ba36e7588c7c8e4c.1338938527.git.romieu@fr.zoreil.com>
In-Reply-To: <269c60147b6e7e7ad46b36e7dd1c91bb0c1b0274.1338938527.git.romieu@fr.zoreil.com>
References: <269c60147b6e7e7ad46b36e7dd1c91bb0c1b0274.1338938527.git.romieu@fr.zoreil.com>
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Sat, 2 Jun 2012 11:21:58 +0200
Subject: [PATCH 2/2] r8169: irq debug stuff.
X-Organisation: Land of Sunshine Inc.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
 drivers/net/ethernet/realtek/r8169.c |   59 ++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index abb028a..4a05b68 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -782,6 +782,15 @@ struct rtl8169_private {
 
 #ifdef CONFIG_R8169_DEBUGFS
 	struct dentry *debugfs;
+
+#define RTL_DEBUG_EVENT_RING_SIZE	1024
+	struct rtl_event_ring {
+		struct rtl_event {
+			u16 ts;
+			u16 evt;
+		} evts[RTL_DEBUG_EVENT_RING_SIZE];
+		int used;
+	} event_ring;
 #endif
 };
 
@@ -5849,6 +5858,31 @@ process_pkt:
 	return count;
 }
 
+#ifdef CONFIG_R8169_DEBUGFS
+static void rtl_trace_event(struct rtl8169_private *tp, u16 status)
+{
+	struct rtl_event_ring *evr = &tp->event_ring;
+	int used = evr->used;
+	struct rtl_event *e = evr->evts + used;
+
+	if (used >= RTL_DEBUG_EVENT_RING_SIZE) {
+		if (net_ratelimit())
+			netif_err(tp, drv, tp->dev, "event ring full.\n");
+		return;
+	}
+
+	/* ms. Well, almost. */
+	e->ts = ((local_clock() / 100000) % 100000) >> 1;
+	e->evt = status;
+	evr->used = used + 1;
+}
+
+#else /* !CONFIG_R8169_DEBUGFS */
+
+#define rtl_trace_event(tp, status)	do { } while (0)
+
+#endif /* !CONFIG_R8169_DEBUGFS */
+
 static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 {
 	struct net_device *dev = dev_instance;
@@ -5858,6 +5892,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 
 	status = rtl_get_events(tp);
 	if (status && status != 0xffff) {
+		rtl_trace_event(tp, status);
+
 		status &= RTL_EVENT_NAPI | tp->event_slow;
 		if (status) {
 			handled = 1;
@@ -6776,6 +6812,28 @@ static int rtl_debug_open_tx(struct inode *inode, struct file *file)
 
 DECLARE_RTL_DEBUG_FOPS(tx);
 
+static int rtl_debug_show_irq(struct seq_file *seq, void *v)
+{
+	struct rtl8169_private *tp = seq->private;
+	struct rtl_event_ring *evr = &tp->event_ring;
+	int i;
+
+	for (i = 0; i < min(evr->used, RTL_DEBUG_EVENT_RING_SIZE); i++) {
+		struct rtl_event *e = evr->evts + i;
+
+		seq_printf(seq, "[%05d] %04x\n", e->ts << 1, e->evt);
+	}
+
+	return 0;
+}
+
+static int rtl_debug_open_irq(struct inode *inode, struct file *file)
+{
+	return single_open(file, rtl_debug_show_irq, inode->i_private);
+}
+
+DECLARE_RTL_DEBUG_FOPS(irq);
+
 static void rtl_debug_create_tree(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
@@ -6786,6 +6844,7 @@ static void rtl_debug_create_tree(struct net_device *dev)
 	} ent[] = {
 		{ &rtl_debug_fops_rx,	"rx" },
 		{ &rtl_debug_fops_tx,	"tx" },
+		{ &rtl_debug_fops_irq,	"irq" },
 		{ NULL, }
 	};
 	int i;
-- 
1.7.10.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-05 23:15       ` Francois Romieu
@ 2012-06-06  1:46         ` Dave Jones
  2012-06-06  5:42           ` Francois Romieu
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Jones @ 2012-06-06  1:46 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Marc Dionne, Thomas Gleixner, Linux Kernel

On Wed, Jun 06, 2012 at 01:15:50AM +0200, Francois Romieu wrote:
 > Marc Dionne <marc.c.dionne@gmail.com> :
 > > If it helps, I've been seeing the same thing here on a system with an
 > > r8169.  The message appears consistently enough on network startup to
 > > be able to track it down to this specific commit:
 > >     98ddf986fca17840e46e070354b7e2cd2169da15: r8169: bh locking redux
 > > and task scheduling.
 > > 
 > > The softirq pending messages always come right after "link down"
 > > messages for the interface.

for me, they're during boot when the nic comes up, and then very occasionally afterwards.

 > Let's see the traces.
 > 
 > I have added some code to trace the timing and values of the irq in
 > rtl8169_interrupt(). Patches against current git are attached.
 
with these patches applied, I get a lot of spew in dmesg like ..

[   98.659314] r8169 0000:01:00.0: eth0: event ring full.
[   98.763513] r8169 0000:01:00.0: eth0: event ring full.
[   99.182103] r8169 0000:01:00.0: eth0: event ring full.
[   99.294426] r8169 0000:01:00.0: eth0: event ring full.
[   99.420255] r8169 0000:01:00.0: eth0: event ring full.
[   99.455811] r8169 0000:01:00.0: eth0: event ring full.
[   99.520541] r8169 0000:01:00.0: eth0: event ring full.
[   99.730789] r8169 0000:01:00.0: eth0: event ring full.
[   99.887643] r8169 0000:01:00.0: eth0: event ring full.
[  103.798861] net_ratelimit: 43 callbacks suppressed

over and over..

 > 'ip link set dev 8168b-lom up' + 'cat /sys/kernel/debug/tracing/trace':

What tracer do I need to configure to get make this work ? irqsoff ?

finally, /sys/kernel/debug/r8169/ is empty.

	Dave


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-06  1:46         ` Dave Jones
@ 2012-06-06  5:42           ` Francois Romieu
  2012-06-08  2:34             ` Dave Jones
  0 siblings, 1 reply; 46+ messages in thread
From: Francois Romieu @ 2012-06-06  5:42 UTC (permalink / raw)
  To: Dave Jones; +Cc: Marc Dionne, Thomas Gleixner, Linux Kernel

Dave Jones <davej@redhat.com> :
[...]
> with these patches applied, I get a lot of spew in dmesg like ..

Yes, I only set up a single array for the irq instead of a real ring
and pipe in debugfs. You can change RTL_DEBUG_EVENT_RING_SIZE (4 bytes
per timestamp + event record). seq_file will grow its buffer when it
overflows.

[...]
> [   99.730789] r8169 0000:01:00.0: eth0: event ring full.
> [   99.887643] r8169 0000:01:00.0: eth0: event ring full.
> [  103.798861] net_ratelimit: 43 callbacks suppressed
> 
> over and over..

You can remove the message in 'rtl_trace_event'

[...]
>  > 'ip link set dev 8168b-lom up' + 'cat /sys/kernel/debug/tracing/trace':
> 
> What tracer do I need to configure to get make this work ? irqsoff ?

'function'

My sequence was:

echo 0 > tracing_on
echo nop > current_tracer
echo 10000 > buffer_size_kb
echo function > current_tracer
echo rtl* > set_ftrace_filter
echo r8169* >> set_ftrace_filter
echo tick_nohz_stop_sched_tick >> set_ftrace_filter

I only enabled tracing_on right before 'ip link ... up'.
tick_nohz_stop_sched_tick fills the log rather fast.

[...]
> finally, /sys/kernel/debug/r8169/ is empty.

I temporary saw it once too. Don't ask why.

Do things look like:

$ objdump --syms drivers/net/ethernet/realtek/r8169.ko | grep -E 'rtl_debug_(show|fops)_([rt]x|irq)' 
000000000000346d l     F .text	0000000000000087 rtl_debug_show_irq
0000000000009bdc l     F .text	00000000000000a0 rtl_debug_show_tx
000000000000355a l     F .text	0000000000000063 rtl_debug_show_rx
00000000000019e0 l     O .rodata	00000000000000d0 rtl_debug_fops_rx
0000000000001ab0 l     O .rodata	00000000000000d0 rtl_debug_fops_tx
0000000000001b80 l     O .rodata	00000000000000d0 rtl_debug_fops_irq

-- 
Ueimor

(off to work)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-06  5:42           ` Francois Romieu
@ 2012-06-08  2:34             ` Dave Jones
  2012-06-08  7:57               ` Thomas Gleixner
  2012-06-08  8:32               ` Johannes Stezenbach
  0 siblings, 2 replies; 46+ messages in thread
From: Dave Jones @ 2012-06-08  2:34 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Marc Dionne, Thomas Gleixner, Linux Kernel

On Wed, Jun 06, 2012 at 07:42:51AM +0200, Francois Romieu wrote:

 > My sequence was:
 > 
 > echo 0 > tracing_on
 > echo nop > current_tracer
 > echo 10000 > buffer_size_kb
 > echo function > current_tracer
 > echo rtl* > set_ftrace_filter
 > echo r8169* >> set_ftrace_filter
 > echo tick_nohz_stop_sched_tick >> set_ftrace_filter
 > 
 > I only enabled tracing_on right before 'ip link ... up'.

Ok. This was a bit of a pain to trace, as I can't trigger the NOHZ message
when I bring up/down the interface by hand. It only seems to happen
during boot now for some reason. So I modified the networking ifup scripts
to start/stop the logging. It's still a pain, because there are two NICs
in the affected machine, so everything happens twice.
It also seems to have started too early & finished too late.. oh well.
hopefully you can get something interesting out of it.

 > tick_nohz_stop_sched_tick fills the log rather fast.

I don't have that symbol for some reason.

The log is still huge though. (68M uncompressed) at http://www.codemonkey.org.uk/junk/r8169-trace.txt.xz

Here's the order of events on that boot...

[   53.086040] r8169 0000:01:00.0: eth0: link down
[   53.086441] r8169 0000:01:00.0: eth0: link down
[   53.086545] NOHZ: local_softirq_pending 08
[   53.113025] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   54.773637] r8169 0000:01:00.0: eth0: link up
[   54.783210] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   64.941664] r8169 0000:05:00.0: eth1: link down
[   64.941748] r8169 0000:05:00.0: eth1: link down
[   64.941792] NOHZ: local_softirq_pending 08
[   64.968326] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[   67.948677] r8169 0000:05:00.0: eth1: link up
[   67.961872] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


	Dave

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08  2:34             ` Dave Jones
@ 2012-06-08  7:57               ` Thomas Gleixner
  2012-06-08 14:23                 ` Dave Jones
  2012-06-08  8:32               ` Johannes Stezenbach
  1 sibling, 1 reply; 46+ messages in thread
From: Thomas Gleixner @ 2012-06-08  7:57 UTC (permalink / raw)
  To: Dave Jones; +Cc: Francois Romieu, Marc Dionne, Linux Kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 259 bytes --]

On Thu, 7 Jun 2012, Dave Jones wrote:
> 
> The log is still huge though. (68M uncompressed) at
> http://www.codemonkey.org.uk/junk/r8169-trace.txt.xz

Following that URL gives me a nice picture.

This is somewhat embarrassing, isn’t it? :)

Thanks,

	tglx
	  

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08  2:34             ` Dave Jones
  2012-06-08  7:57               ` Thomas Gleixner
@ 2012-06-08  8:32               ` Johannes Stezenbach
  1 sibling, 0 replies; 46+ messages in thread
From: Johannes Stezenbach @ 2012-06-08  8:32 UTC (permalink / raw)
  To: Dave Jones; +Cc: Francois Romieu, Marc Dionne, Thomas Gleixner, Linux Kernel

On Thu, Jun 07, 2012 at 10:34:20PM -0400, Dave Jones wrote:
> Ok. This was a bit of a pain to trace, as I can't trigger the NOHZ message
> when I bring up/down the interface by hand. It only seems to happen
> during boot now for some reason.

The message is only printed 10 times on each boot due to
the "ratelimit" in tick_nohz_stop_sched_tick.  I once debugged
an issue with rt2x00 (20ed3166c) and found that fairly confusing at first.

HTH
Johannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08  7:57               ` Thomas Gleixner
@ 2012-06-08 14:23                 ` Dave Jones
  2012-06-08 14:51                   ` Thomas Gleixner
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Jones @ 2012-06-08 14:23 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Francois Romieu, Marc Dionne, Linux Kernel

On Fri, Jun 08, 2012 at 09:57:03AM +0200, Thomas Gleixner wrote:
 > On Thu, 7 Jun 2012, Dave Jones wrote:
 > > 
 > > The log is still huge though. (68M uncompressed) at
 > > http://www.codemonkey.org.uk/junk/r8169-trace.txt.xz
 > 
 > Following that URL gives me a nice picture.
 > 
 > This is somewhat embarrassing, isn’t it? :)

indeed ;-) Should be fixed.

	Dave


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08 14:23                 ` Dave Jones
@ 2012-06-08 14:51                   ` Thomas Gleixner
  2012-06-08 20:27                     ` Francois Romieu
  0 siblings, 1 reply; 46+ messages in thread
From: Thomas Gleixner @ 2012-06-08 14:51 UTC (permalink / raw)
  To: Dave Jones; +Cc: Francois Romieu, Marc Dionne, Linux Kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1059 bytes --]

On Fri, 8 Jun 2012, Dave Jones wrote:

> On Fri, Jun 08, 2012 at 09:57:03AM +0200, Thomas Gleixner wrote:
>  > On Thu, 7 Jun 2012, Dave Jones wrote:
>  > > 
>  > > The log is still huge though. (68M uncompressed) at
>  > > http://www.codemonkey.org.uk/junk/r8169-trace.txt.xz
>  > 
>  > Following that URL gives me a nice picture.
>  > 
>  > This is somewhat embarrassing, isn’t it? :)
> 
> indeed ;-) Should be fixed.

I can't find the point where the warning is issued, but I think I
found the cause of the problem.

static void rtl_slow_event_work(struct rtl8169_private *tp)
{
	.....
	napi_schedule(&tp->napi);
	  --> __napi_schedule();
	      -->	list_add_tail(&napi->poll_list, &sd->poll_list);
	              __raise_softirq_irqoff(NET_RX_SOFTIRQ);

		      This merily sets the softirq bit.


So this code is really wrong. It's called from full preemptible
context of the workqueue. And if the next thing is a context switch to
idle then the pending softirq check will trigger.

I let the network folks sort out the proper solution.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-05-31 14:04     ` Dave Jones
@ 2012-06-08 16:46       ` Nikos Chantziaras
  0 siblings, 0 replies; 46+ messages in thread
From: Nikos Chantziaras @ 2012-06-08 16:46 UTC (permalink / raw)
  To: Dave Jones; +Cc: Eric Dumazet, Thomas Gleixner, Linux Kernel

On 31/05/12 17:04, Dave Jones wrote:
> On Thu, May 31, 2012 at 09:56:25AM +0200, Eric Dumazet wrote:
>   > On Thu, 2012-05-31 at 09:52 +0200, Thomas Gleixner wrote:
>   > > Dave,
>   > >
>   > > On Wed, 30 May 2012, Dave Jones wrote:
>   > >
>   > > > May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08
>   > > >
>   > > > According to git, you ratelimited this back in 2007 with the message
>   > > > "until we found the root cause of that problem."
>   > > >
>   > > > Is this an old problem that has returned ?
>   > >
>   > > Looks like.
>   > >
>   > > > Maybe interesting note: Both machines I see this on regularly are Atom cpus,
>   > > > and it's always '08'.
>   > >
>   > > So it goes idle with NET_RX softirq pending. That really should not
>   > > happen. I fear you need to fire up the tracer :)
>   >
>   > It might be a bug in the network driver, a race with NAPI completion.
>   >
>   > Dave, which net drivers run on these machines ?
>
> r8169.
>
> I got an email from someone else asking if I used Realtek drivers, which I thought
> was an uncanny guess.

Same thing happens here (kernel 3.4.1, also happened with 3.4.0):

r8169 0000:06:00.0: eth0: link down
r8169 0000:06:00.0: eth0: link down
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
r8169 0000:06:00.0: eth0: link up

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08 14:51                   ` Thomas Gleixner
@ 2012-06-08 20:27                     ` Francois Romieu
  2012-06-08 20:41                       ` Thomas Gleixner
  0 siblings, 1 reply; 46+ messages in thread
From: Francois Romieu @ 2012-06-08 20:27 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Dave Jones, Marc Dionne, Linux Kernel

Thomas Gleixner <tglx@linutronix.de> :
[...]
> I can't find the point where the warning is issued, but I think I
> found the cause of the problem.
> 
> static void rtl_slow_event_work(struct rtl8169_private *tp)
> {
> 	.....
> 	napi_schedule(&tp->napi);
> 	  --> __napi_schedule();
> 	      -->	list_add_tail(&napi->poll_list, &sd->poll_list);
> 	              __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> 
> 		      This merily sets the softirq bit.
> 
> 
> So this code is really wrong. It's called from full preemptible
> context of the workqueue. And if the next thing is a context switch to
> idle then the pending softirq check will trigger.

void __napi_schedule(struct napi_struct *n)
{
        unsigned long flags;

        local_irq_save(flags);
        ____napi_schedule(&__get_cpu_var(softnet_data), n);
        local_irq_restore(flags);
}

Are you saying that this stuff should be considered "preemptible" ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08 20:27                     ` Francois Romieu
@ 2012-06-08 20:41                       ` Thomas Gleixner
  2012-06-08 21:53                         ` Francois Romieu
  0 siblings, 1 reply; 46+ messages in thread
From: Thomas Gleixner @ 2012-06-08 20:41 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Dave Jones, Marc Dionne, Linux Kernel

On Fri, 8 Jun 2012, Francois Romieu wrote:

> Thomas Gleixner <tglx@linutronix.de> :
> [...]
> > I can't find the point where the warning is issued, but I think I
> > found the cause of the problem.
> > 
> > static void rtl_slow_event_work(struct rtl8169_private *tp)
> > {
> > 	.....
> > 	napi_schedule(&tp->napi);
> > 	  --> __napi_schedule();
> > 	      -->	list_add_tail(&napi->poll_list, &sd->poll_list);
> > 	              __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> > 
> > 		      This merily sets the softirq bit.
> > 
> > 
> > So this code is really wrong. It's called from full preemptible
> > context of the workqueue. And if the next thing is a context switch to
> > idle then the pending softirq check will trigger.
> 
> void __napi_schedule(struct napi_struct *n)
> {
>         unsigned long flags;
> 
>         local_irq_save(flags);
>         ____napi_schedule(&__get_cpu_var(softnet_data), n);
>         local_irq_restore(flags);
> }
> 
> Are you saying that this stuff should be considered "preemptible" ?

Gah, crap. Looked at the wrong ___________underscore level. I _____so
__love _that.

Though the problem is, that it is neither called in interrupt context
nor with bh disabled, so nothing invokes the softirq before it reaches
idle.

In hard interrupt context the pending flag is evaluated in irq_exit()
and the softirqs are invoked from there. If you call that from thread
context, then a bh_disable/enable pair will make sure that the pending
softirq is invoked. Did I miss some more ___underscore magic which
does that ?

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08 20:41                       ` Thomas Gleixner
@ 2012-06-08 21:53                         ` Francois Romieu
  2012-06-08 22:33                           ` Marc Dionne
  2012-06-25 12:38                           ` Johannes Stezenbach
  0 siblings, 2 replies; 46+ messages in thread
From: Francois Romieu @ 2012-06-08 21:53 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Dave Jones, Marc Dionne, Linux Kernel

Thomas Gleixner <tglx@linutronix.de> :
[...]
> Though the problem is, that it is neither called in interrupt context
> nor with bh disabled, so nothing invokes the softirq before it reaches
> idle.
> 
> In hard interrupt context the pending flag is evaluated in irq_exit()
> and the softirqs are invoked from there. If you call that from thread
> context, then a bh_disable/enable pair will make sure that the pending
> softirq is invoked. Did I miss some more ___underscore magic which
> does that ?

No. You _are_ right.

That's why 98ddf986fca17840e46e070354b7e2cd2169da15 triggered the
message: it removed the last bh lock in the r8169 slow work context.

/me slaps head...

The whole thing is probably moot. Dave, can you apply the patch
below on top of a fresh kernel tree ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 4a05b68..d452441 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5934,11 +5934,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
 	if (status & LinkChg)
 		__rtl8169_check_link_status(dev, tp, tp->mmio_addr, true);
 
-	napi_disable(&tp->napi);
-	rtl_irq_disable(tp);
-
-	napi_enable(&tp->napi);
-	napi_schedule(&tp->napi);
+	rtl_irq_enable_all(tp);
 }
 
 static void rtl_task(struct work_struct *work)

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08 21:53                         ` Francois Romieu
@ 2012-06-08 22:33                           ` Marc Dionne
  2012-06-10 20:40                             ` Dave Jones
  2012-06-25 12:38                           ` Johannes Stezenbach
  1 sibling, 1 reply; 46+ messages in thread
From: Marc Dionne @ 2012-06-08 22:33 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Thomas Gleixner, Dave Jones, Linux Kernel

On Fri, Jun 8, 2012 at 5:53 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> Thomas Gleixner <tglx@linutronix.de> :
> [...]
>> Though the problem is, that it is neither called in interrupt context
>> nor with bh disabled, so nothing invokes the softirq before it reaches
>> idle.
>>
>> In hard interrupt context the pending flag is evaluated in irq_exit()
>> and the softirqs are invoked from there. If you call that from thread
>> context, then a bh_disable/enable pair will make sure that the pending
>> softirq is invoked. Did I miss some more ___underscore magic which
>> does that ?
>
> No. You _are_ right.
>
> That's why 98ddf986fca17840e46e070354b7e2cd2169da15 triggered the
> message: it removed the last bh lock in the r8169 slow work context.
>
> /me slaps head...
>
> The whole thing is probably moot. Dave, can you apply the patch
> below on top of a fresh kernel tree ?
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 4a05b68..d452441 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -5934,11 +5934,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
>        if (status & LinkChg)
>                __rtl8169_check_link_status(dev, tp, tp->mmio_addr, true);
>
> -       napi_disable(&tp->napi);
> -       rtl_irq_disable(tp);
> -
> -       napi_enable(&tp->napi);
> -       napi_schedule(&tp->napi);
> +       rtl_irq_enable_all(tp);
>  }
>
>  static void rtl_task(struct work_struct *work)

That works for me - no warnings after several reboots and bringing the
interface up/down many times.

Marc

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08 22:33                           ` Marc Dionne
@ 2012-06-10 20:40                             ` Dave Jones
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Jones @ 2012-06-10 20:40 UTC (permalink / raw)
  To: Marc Dionne; +Cc: Francois Romieu, Thomas Gleixner, Linux Kernel

On Fri, Jun 08, 2012 at 06:33:09PM -0400, Marc Dionne wrote:
 > On Fri, Jun 8, 2012 at 5:53 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
 > > index 4a05b68..d452441 100644
 > > --- a/drivers/net/ethernet/realtek/r8169.c
 > > +++ b/drivers/net/ethernet/realtek/r8169.c
 > > @@ -5934,11 +5934,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
 > >        if (status & LinkChg)
 > >                __rtl8169_check_link_status(dev, tp, tp->mmio_addr, true);
 > >
 > > -       napi_disable(&tp->napi);
 > > -       rtl_irq_disable(tp);
 > > -
 > > -       napi_enable(&tp->napi);
 > > -       napi_schedule(&tp->napi);
 > > +       rtl_irq_enable_all(tp);
 > >  }
 > >
 > >  static void rtl_task(struct work_struct *work)
 > 
 > That works for me - no warnings after several reboots and bringing the
 > interface up/down many times.

Yep, looks good to me too. Thanks Francois

	Dave


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-08 21:53                         ` Francois Romieu
  2012-06-08 22:33                           ` Marc Dionne
@ 2012-06-25 12:38                           ` Johannes Stezenbach
  2012-06-25 19:28                             ` Francois Romieu
  1 sibling, 1 reply; 46+ messages in thread
From: Johannes Stezenbach @ 2012-06-25 12:38 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Thomas Gleixner, Dave Jones, Marc Dionne, Linux Kernel

Hi,

On Fri, Jun 08, 2012 at 11:53:28PM +0200, Francois Romieu wrote:
> Thomas Gleixner <tglx@linutronix.de> :
> [...]
> > Though the problem is, that it is neither called in interrupt context
> > nor with bh disabled, so nothing invokes the softirq before it reaches
> > idle.
> > 
> > In hard interrupt context the pending flag is evaluated in irq_exit()
> > and the softirqs are invoked from there. If you call that from thread
> > context, then a bh_disable/enable pair will make sure that the pending
> > softirq is invoked. Did I miss some more ___underscore magic which
> > does that ?
> 
> No. You _are_ right.
> 
> That's why 98ddf986fca17840e46e070354b7e2cd2169da15 triggered the
> message: it removed the last bh lock in the r8169 slow work context.
> 
> /me slaps head...
> 
> The whole thing is probably moot. Dave, can you apply the patch
> below on top of a fresh kernel tree ?
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 4a05b68..d452441 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -5934,11 +5934,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
>  	if (status & LinkChg)
>  		__rtl8169_check_link_status(dev, tp, tp->mmio_addr, true);
>  
> -	napi_disable(&tp->napi);
> -	rtl_irq_disable(tp);
> -
> -	napi_enable(&tp->napi);
> -	napi_schedule(&tp->napi);
> +	rtl_irq_enable_all(tp);
>  }
>  
>  static void rtl_task(struct work_struct *work)

I just upgraded my rarely-used Atom based netbook from 3.3.5 to 3.4.4
and got the "NOHZ: local_softirq_pending 08" message.
Should this patch go into stable?

Johannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-25 12:38                           ` Johannes Stezenbach
@ 2012-06-25 19:28                             ` Francois Romieu
  2012-06-25 19:52                               ` Dave Jones
  0 siblings, 1 reply; 46+ messages in thread
From: Francois Romieu @ 2012-06-25 19:28 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: Thomas Gleixner, Dave Jones, Marc Dionne, Linux Kernel, David Miller

Johannes Stezenbach <js@sig21.net> :
[7dbb491878a2c51d372a8890fa45a8ff80358af1]
> I just upgraded my rarely-used Atom based netbook from 3.3.5 to 3.4.4
> and got the "NOHZ: local_softirq_pending 08" message.
> Should this patch go into stable?

It should not change your life. I'll be fine with whatever policy davej,
tglx or davem see fit (assuming they share one :o) ).

-- 
Ueimor

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2012-06-25 19:28                             ` Francois Romieu
@ 2012-06-25 19:52                               ` Dave Jones
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Jones @ 2012-06-25 19:52 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Johannes Stezenbach, Thomas Gleixner, Marc Dionne, Linux Kernel,
	David Miller

On Mon, Jun 25, 2012 at 09:28:39PM +0200, Francois Romieu wrote:
 > Johannes Stezenbach <js@sig21.net> :
 > [7dbb491878a2c51d372a8890fa45a8ff80358af1]
 > > I just upgraded my rarely-used Atom based netbook from 3.3.5 to 3.4.4
 > > and got the "NOHZ: local_softirq_pending 08" message.
 > > Should this patch go into stable?
 > 
 > It should not change your life. I'll be fine with whatever policy davej,
 > tglx or davem see fit (assuming they share one :o) ).

it never seemed to cause any noticable problem that I observed (just the dmesg spew),
so I'm ambivalent about the whole thing now that it's fixed in 3.5rc :-)

	Dave 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-23 14:31                         ` Johannes Berg
@ 2009-10-23 16:33                           ` Tilman Schmidt
  0 siblings, 0 replies; 46+ messages in thread
From: Tilman Schmidt @ 2009-10-23 16:33 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johannes Berg schrieb:
> On Fri, 2009-10-23 at 16:27 +0200, Tilman Schmidt wrote:
>> Johannes Berg schrieb:
>>> So you've verified that the entire i4l stack can cope with being called
>>> twice on the same CPU, from different contexts?
>> What makes you think so?
> 
> I thought I'd explained this in my other email. *sigh*
[snip]

Ah, I see. You misunderstood my posting. I did not propose that
patch as a definitive and verified solution, but rather as a
request for comments from the people who know and maintain the
code in question. I thought that was clear from the facts that
- - I didn't include "[PATCH]" in the subject line
- - I didn't add a "Signed-off-by" line
- - I wrote "fixed the messages", not "solved the problem"
- - I explicitly wrote "Comments?" and "Adding i4l people to CC"

Apparently all that was still not clear enough. Sorry about that.
So let me try to make my concern as explicit as possible:

- - The patch I posted had the effect that the test which reliably
  triggered the local_softirq_pending message before did not do
  so anymore.

- - To me, this seems to indicate that the netif_rx(skb) call in
  line 1177 of source file drivers/isdn/i4l/isdn_ppp.c is indeed
  involved in the problem.

- - Now I'm asking people who know more than myself about the
  ramifications of that message (ie., you) and/or the code I
  narrowed it down to (ie., the ISDN4Linux maintainers - which
  is why I added them to the CC list) to have a look and determine
  how to fix the problem properly.

- - This would of course include, in finis, the verification you
  mistakenly assumed I might have done already.

I hope that's clear enough. If you have any questions, feel free
to ask.

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK4drJQ3+did9BuFsRAmstAJ94UF/LupINlYpjbxzz9xoiN5w34wCfflRz
YfR/fXt3HasrxUSP29REOnE=
=VQ/C
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-23 14:27                       ` Tilman Schmidt
@ 2009-10-23 14:31                         ` Johannes Berg
  2009-10-23 16:33                           ` Tilman Schmidt
  0 siblings, 1 reply; 46+ messages in thread
From: Johannes Berg @ 2009-10-23 14:31 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil

[-- Attachment #1: Type: text/plain, Size: 1369 bytes --]

On Fri, 2009-10-23 at 16:27 +0200, Tilman Schmidt wrote:

> >> --- a/drivers/isdn/i4l/isdn_ppp.c
> >> +++ b/drivers/isdn/i4l/isdn_ppp.c
> >> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
> >>  #endif /* CONFIG_IPPP_FILTER */
> >>  	skb->dev = dev;
> >>  	skb_reset_mac_header(skb);
> >> -	netif_rx(skb);
> >> +	if (in_interrupt())
> >> +		netif_rx(skb);
> >> +	else
> >> +		netif_rx_ni(skb);
> > 
> > So you've verified that the entire i4l stack can cope with being called
> > twice on the same CPU, from different contexts?
> 
> What makes you think so?

I thought I'd explained this in my other email. *sigh*

You're squelching a warning, which comes from the fact that you're
calling something that calls into netif_rx() with softirqs enabled. That
would seem to imply that potentially a softirq could at the same time
call into that code too.

Basically, what happens now is this:

disable softirqs
call into i4l/ppp
i4l/ppp code
call netif_rx()
more code
enable softirqs


You're changing it to

call into i4l/ppp
i4l/ppp code
call netif_rx_ni()
more code

so the entire chunks "i4l/ppp code" and "more code" are now no longer
protected against being interrupted by a softirq that runs the same
code, maybe for a different device, on the same CPU.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-23 13:34                     ` Johannes Berg
@ 2009-10-23 14:27                       ` Tilman Schmidt
  2009-10-23 14:31                         ` Johannes Berg
  0 siblings, 1 reply; 46+ messages in thread
From: Tilman Schmidt @ 2009-10-23 14:27 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johannes Berg schrieb:
> On Fri, 2009-10-23 at 01:37 +0200, Tilman Schmidt wrote:
> 
>> --- a/drivers/isdn/i4l/isdn_ppp.c
>> +++ b/drivers/isdn/i4l/isdn_ppp.c
>> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
>>  #endif /* CONFIG_IPPP_FILTER */
>>  	skb->dev = dev;
>>  	skb_reset_mac_header(skb);
>> -	netif_rx(skb);
>> +	if (in_interrupt())
>> +		netif_rx(skb);
>> +	else
>> +		netif_rx_ni(skb);
> 
> So you've verified that the entire i4l stack can cope with being called
> twice on the same CPU, from different contexts?

What makes you think so?
Better yet, what do you propose?

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK4b09Q3+did9BuFsRAqBvAKCbRI0iXQEyK3ztxkGHcqpbcceqbACgkagX
JF7nYd152ihp2uemIs/cB54=
=YOin
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-22 23:37                   ` Tilman Schmidt
@ 2009-10-23 13:34                     ` Johannes Berg
  2009-10-23 14:27                       ` Tilman Schmidt
  0 siblings, 1 reply; 46+ messages in thread
From: Johannes Berg @ 2009-10-23 13:34 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

On Fri, 2009-10-23 at 01:37 +0200, Tilman Schmidt wrote:

> --- a/drivers/isdn/i4l/isdn_ppp.c
> +++ b/drivers/isdn/i4l/isdn_ppp.c
> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
>  #endif /* CONFIG_IPPP_FILTER */
>  	skb->dev = dev;
>  	skb_reset_mac_header(skb);
> -	netif_rx(skb);
> +	if (in_interrupt())
> +		netif_rx(skb);
> +	else
> +		netif_rx_ni(skb);

So you've verified that the entire i4l stack can cope with being called
twice on the same CPU, from different contexts?

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-21 18:46                 ` Tilman Schmidt
@ 2009-10-22 23:37                   ` Tilman Schmidt
  2009-10-23 13:34                     ` Johannes Berg
  0 siblings, 1 reply; 46+ messages in thread
From: Tilman Schmidt @ 2009-10-22 23:37 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil

[-- Attachment #1: Type: text/plain, Size: 1719 bytes --]

On 21.10.2009 20:46, /me wrote:
>>>> I have encountered the message in the subject during a test of
>>>> the Gigaset CAPI driver, and would like to determine whether
>>>> it's a bug in the driver, a bug somewhere else, or no bug at
>>>> all. The test scenario was PPP over ISDN with pppd+capiplugin.
>>>> In an alternative scenario, also PPP over ISDN but with
>>>> smpppd+capidrv, the message did not occur.
> 
> I'm sorry, I had confused the two cases. The message occurs in
> the smpppd+capidrv scenario, not with pppd+capiplugin.
> 
>>>> Johannes' answer pointed me to the netif_rx() function.
>>>> The Gigaset driver itself doesn't call that function at all.
>>>> In the scenario where I saw the message, it was the SYNC_PPP
>>>> line discipline that did.
> 
> This analysis was therefore wrong. It would be the netif_rx()
> call towards the end of isdn_ppp_push_higher() in
> drivers/isdn/i4l/isdn_ppp.c L1177.

Having noticed that, I cooked up the following patch which fixed
the messages for me. Comments? (Adding i4l people to the already
impressive CC list.)

--- a/drivers/isdn/i4l/isdn_ppp.c
+++ b/drivers/isdn/i4l/isdn_ppp.c
@@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
 #endif /* CONFIG_IPPP_FILTER */
 	skb->dev = dev;
 	skb_reset_mac_header(skb);
-	netif_rx(skb);
+	if (in_interrupt())
+		netif_rx(skb);
+	else
+		netif_rx_ni(skb);
 	/* net_dev->local->stats.rx_packets++; done in isdn_net.c */
 	return;
 


-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-15 17:53               ` Jarek Poplawski
@ 2009-10-21 18:46                 ` Tilman Schmidt
  2009-10-22 23:37                   ` Tilman Schmidt
  0 siblings, 1 reply; 46+ messages in thread
From: Tilman Schmidt @ 2009-10-21 18:46 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus

[-- Attachment #1: Type: text/plain, Size: 2202 bytes --]

On 15.10.2009 19:53 Jarek Poplawski wrote:
> Jarek Poplawski wrote, On 10/15/2009 01:40 PM:
> 
>> On 12-10-2009 13:25, Tilman Schmidt wrote:

>>> I have encountered the message in the subject during a test of
>>> the Gigaset CAPI driver, and would like to determine whether
>>> it's a bug in the driver, a bug somewhere else, or no bug at
>>> all. The test scenario was PPP over ISDN with pppd+capiplugin.
>>> In an alternative scenario, also PPP over ISDN but with
>>> smpppd+capidrv, the message did not occur.

I'm sorry, I had confused the two cases. The message occurs in
the smpppd+capidrv scenario, not with pppd+capiplugin.

>>> Johannes' answer pointed me to the netif_rx() function.
>>> The Gigaset driver itself doesn't call that function at all.
>>> In the scenario where I saw the message, it was the SYNC_PPP
>>> line discipline that did.

This analysis was therefore wrong. It would be the netif_rx()
call towards the end of isdn_ppp_push_higher() in
drivers/isdn/i4l/isdn_ppp.c L1177.

>> Anyway, I agree with Michael Buesch there is no reason to waste time
>> for tracking all netif_rx vs netif_rx_ni uses, and it seems we could
>> avoid it by using the "proper" version of raise_softirq_irqoff() in
>> __napi_schedule(). Could anybody try if I'm not wrong?
>>
>>  net/core/dev.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 28b0b9e..7fc4009 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n)
>>  
>>  	local_irq_save(flags);
>>  	list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list);
>> -	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
>> +	raise_softirq_irqoff(NET_RX_SOFTIRQ);
>>  	local_irq_restore(flags);
>>  }
>>  EXPORT_SYMBOL(__napi_schedule);

I have tested your patch and I can confirm that it fixes the messages.
I have not noticed any ill effects.

Thanks,
Tilman

-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-15 11:40             ` Jarek Poplawski
@ 2009-10-15 17:53               ` Jarek Poplawski
  2009-10-21 18:46                 ` Tilman Schmidt
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-10-15 17:53 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus

Jarek Poplawski wrote, On 10/15/2009 01:40 PM:

> On 12-10-2009 13:25, Tilman Schmidt wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On Mon, 12 Oct 2009 03:32:46 -0700 (PDT), David Miller wrote:
>>> The PPP receive paths in ppp_generic.c do a local_bh_disable()/
>>> local_bh_enable() around packet receiving (via ppp_recv_lock()/
>>> ppp_recv_unlock() in ppp_do_recv).
>>>
>>> So at least that part is perfectly fine.
>>>
>>> ppp_input(), as called from ppp_sync_process(), also disables BH's
>>> around ppp_do_recv() calls (via read_lock_bh()/read_unlock_bh()).
>>>
>>> So that's fine too.
>>>
>>> Do you have a bug report or are you just scanning around looking
>>> for trouble? :-)
>> I have encountered the message in the subject during a test of
>> the Gigaset CAPI driver, and would like to determine whether
>> it's a bug in the driver, a bug somewhere else, or no bug at
>> all. The test scenario was PPP over ISDN with pppd+capiplugin.
>> In an alternative scenario, also PPP over ISDN but with
>> smpppd+capidrv, the message did not occur.
>>
>> Johannes' answer pointed me to the netif_rx() function.
>> The Gigaset driver itself doesn't call that function at all.
>> In the scenario where I saw the message, it was the SYNC_PPP
>> line discipline that did. But from your explanation I gather
>> that the cause cannot lie there.
>>
>> So now I'm looking for other possible causes of that message.


BTW, it seems calling napi_schedule() from process context should
trigger such a warning too.

Jarek P.

> 
> Anyway, I agree with Michael Buesch there is no reason to waste time
> for tracking all netif_rx vs netif_rx_ni uses, and it seems we could
> avoid it by using the "proper" version of raise_softirq_irqoff() in
> __napi_schedule(). Could anybody try if I'm not wrong?
> 
> Thanks,
> Jarek P.
> ---
> 
>  net/core/dev.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 28b0b9e..7fc4009 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n)
>  
>  	local_irq_save(flags);
>  	list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list);
> -	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
> +	raise_softirq_irqoff(NET_RX_SOFTIRQ);
>  	local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL(__napi_schedule);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ppp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-12 11:25           ` Tilman Schmidt
@ 2009-10-15 11:40             ` Jarek Poplawski
  2009-10-15 17:53               ` Jarek Poplawski
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-10-15 11:40 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus

On 12-10-2009 13:25, Tilman Schmidt wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Mon, 12 Oct 2009 03:32:46 -0700 (PDT), David Miller wrote:
>> The PPP receive paths in ppp_generic.c do a local_bh_disable()/
>> local_bh_enable() around packet receiving (via ppp_recv_lock()/
>> ppp_recv_unlock() in ppp_do_recv).
>>
>> So at least that part is perfectly fine.
>>
>> ppp_input(), as called from ppp_sync_process(), also disables BH's
>> around ppp_do_recv() calls (via read_lock_bh()/read_unlock_bh()).
>>
>> So that's fine too.
>>
>> Do you have a bug report or are you just scanning around looking
>> for trouble? :-)
> 
> I have encountered the message in the subject during a test of
> the Gigaset CAPI driver, and would like to determine whether
> it's a bug in the driver, a bug somewhere else, or no bug at
> all. The test scenario was PPP over ISDN with pppd+capiplugin.
> In an alternative scenario, also PPP over ISDN but with
> smpppd+capidrv, the message did not occur.
> 
> Johannes' answer pointed me to the netif_rx() function.
> The Gigaset driver itself doesn't call that function at all.
> In the scenario where I saw the message, it was the SYNC_PPP
> line discipline that did. But from your explanation I gather
> that the cause cannot lie there.
> 
> So now I'm looking for other possible causes of that message.

Anyway, I agree with Michael Buesch there is no reason to waste time
for tracking all netif_rx vs netif_rx_ni uses, and it seems we could
avoid it by using the "proper" version of raise_softirq_irqoff() in
__napi_schedule(). Could anybody try if I'm not wrong?

Thanks,
Jarek P.
---

 net/core/dev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 28b0b9e..7fc4009 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n)
 
 	local_irq_save(flags);
 	list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list);
-	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+	raise_softirq_irqoff(NET_RX_SOFTIRQ);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(__napi_schedule);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-12 10:32         ` David Miller
@ 2009-10-12 11:25           ` Tilman Schmidt
  2009-10-15 11:40             ` Jarek Poplawski
  0 siblings, 1 reply; 46+ messages in thread
From: Tilman Schmidt @ 2009-10-12 11:25 UTC (permalink / raw)
  To: David Miller
  Cc: tilman, johannes, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 12 Oct 2009 03:32:46 -0700 (PDT), David Miller wrote:
> The PPP receive paths in ppp_generic.c do a local_bh_disable()/
> local_bh_enable() around packet receiving (via ppp_recv_lock()/
> ppp_recv_unlock() in ppp_do_recv).
> 
> So at least that part is perfectly fine.
> 
> ppp_input(), as called from ppp_sync_process(), also disables BH's
> around ppp_do_recv() calls (via read_lock_bh()/read_unlock_bh()).
> 
> So that's fine too.
> 
> Do you have a bug report or are you just scanning around looking
> for trouble? :-)

I have encountered the message in the subject during a test of
the Gigaset CAPI driver, and would like to determine whether
it's a bug in the driver, a bug somewhere else, or no bug at
all. The test scenario was PPP over ISDN with pppd+capiplugin.
In an alternative scenario, also PPP over ISDN but with
smpppd+capidrv, the message did not occur.

Johannes' answer pointed me to the netif_rx() function.
The Gigaset driver itself doesn't call that function at all.
In the scenario where I saw the message, it was the SYNC_PPP
line discipline that did. But from your explanation I gather
that the cause cannot lie there.

So now I'm looking for other possible causes of that message.

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK0xITQ3+did9BuFsRAmXGAKCIiqJffUnuKw9rPjxHwbj9AnXOigCdGgS9
MpxRNGs0A4GTydYJaylbjyo=
=LFxi
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-12  8:28       ` Tilman Schmidt
@ 2009-10-12 10:32         ` David Miller
  2009-10-12 11:25           ` Tilman Schmidt
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2009-10-12 10:32 UTC (permalink / raw)
  To: tilman
  Cc: johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless,
	linux-ppp, netdev, paulus

From: Tilman Schmidt <tilman@imap.cc>
Date: Mon, 12 Oct 2009 10:28:56 +0200

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> [CCing PPP people]
> 
> Am 11.10.2009 13:40 schrieb Johannes Berg:
>> On Sun, 2009-10-11 at 13:18 +0200, Tilman Schmidt wrote:
>> 
>>> Can you explain a bit more what that message is about?
>>> I am encountering it in a completely different context
>>> (PPP over ISDN) [...]
>> 
>> Basically, calling netif_rx() with softirqs enabled.
> 
> AFAICS that would have to be the netif_rx() call in
> ppp_receive_nonmp_frame() [drivers/net/ppp_generic.c#L1791],
> called (via others) from the tasklet work function
> ppp_sync_process() [drivers/net/ppp_synctty.c#L545].
> Should that be changed to the
> "if (in_interrupt()) netif_rx(skb) else netif_rx_ni(skb)"
> stanza from the linux.kernel.wireless.general thread then?

The PPP receive paths in ppp_generic.c do a local_bh_disable()/
local_bh_enable() around packet receiving (via ppp_recv_lock()/
ppp_recv_unlock() in ppp_do_recv).

So at least that part is perfectly fine.

ppp_input(), as called from ppp_sync_process(), also disables BH's
around ppp_do_recv() calls (via read_lock_bh()/read_unlock_bh()).

So that's fine too.

Do you have a bug report or are you just scanning around looking
for trouble? :-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11 11:40     ` Johannes Berg
@ 2009-10-12  8:28       ` Tilman Schmidt
  2009-10-12 10:32         ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Tilman Schmidt @ 2009-10-12  8:28 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller,
	linux-ppp, netdev, Paul Mackerras

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[CCing PPP people]

Am 11.10.2009 13:40 schrieb Johannes Berg:
> On Sun, 2009-10-11 at 13:18 +0200, Tilman Schmidt wrote:
> 
>> Can you explain a bit more what that message is about?
>> I am encountering it in a completely different context
>> (PPP over ISDN) [...]
> 
> Basically, calling netif_rx() with softirqs enabled.

AFAICS that would have to be the netif_rx() call in
ppp_receive_nonmp_frame() [drivers/net/ppp_generic.c#L1791],
called (via others) from the tasklet work function
ppp_sync_process() [drivers/net/ppp_synctty.c#L545].
Should that be changed to the
"if (in_interrupt()) netif_rx(skb) else netif_rx_ni(skb)"
stanza from the linux.kernel.wireless.general thread then?

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK0ujIQ3+did9BuFsRAtGBAJ9rj2pyQZ75ZKTipLhpICqA3ZvTdQCdHQs/
RmdeOT3TuPZykXJxcHLJCzU=
=87DI
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11  9:52 Dave Young
  2009-10-11 10:08 ` Johannes Berg
@ 2009-10-11 16:39 ` Joe Korty
  1 sibling, 0 replies; 46+ messages in thread
From: Joe Korty @ 2009-10-11 16:39 UTC (permalink / raw)
  To: Dave Young; +Cc: linux-kernel, tglx

On Sun, Oct 11, 2009 at 05:52:18PM +0800, Dave Young wrote:
> With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR
> messages just while using firefox:
> 
> [  130.527399] NOHZ: local_softirq_pending 08
> [  130.559807] NOHZ: local_softirq_pending 08
> ...
> 
> Any idea? or known issue?

We have a single customer which also has this issue.
We've never been able to replicate it ourselves, but
for that one customer, changing the two places in the
networking code that use:

	__raise_softirq_irqoff
to
	raise_softirq_irqoff

seems (so far) to be working.

Regards,
Joe

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11 11:18   ` Tilman Schmidt
@ 2009-10-11 11:40     ` Johannes Berg
  2009-10-12  8:28       ` Tilman Schmidt
  0 siblings, 1 reply; 46+ messages in thread
From: Johannes Berg @ 2009-10-11 11:40 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 468 bytes --]

On Sun, 2009-10-11 at 13:18 +0200, Tilman Schmidt wrote:

> Can you explain a bit more what that message is about?
> I am encountering it in a completely different context
> (PPP over ISDN) and I would like to know where to start
> looking for the cause and developing a fix. The thread
> on linux.kernel.wireless.general only seems to address
> the specific situation in the wireless stack.

Basically, calling netif_rx() with softirqs enabled.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11 10:08 ` Johannes Berg
  2009-10-11 10:17   ` Michael Buesch
  2009-10-11 10:55   ` Dave Young
@ 2009-10-11 11:18   ` Tilman Schmidt
  2009-10-11 11:40     ` Johannes Berg
  2 siblings, 1 reply; 46+ messages in thread
From: Tilman Schmidt @ 2009-10-11 11:18 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller

On Sun, 11 Oct 2009 12:08:55 +0200, Johannes Berg wrote:
> On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote:
> 
>> With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR
>> messages just while using firefox:
>>
>> [  130.527399] NOHZ: local_softirq_pending 08
> 
>> Any idea? or known issue?
> 
> Are you using b43 (or wl12x1)? If so, it's a known issue, but the driver
> was recently left without an active maintainer in a brouhaha about a bug
> fix.
> 
> Cf. http://thread.gmane.org/gmane.linux.kernel.wireless.general/39440

Can you explain a bit more what that message is about?
I am encountering it in a completely different context
(PPP over ISDN) and I would like to know where to start
looking for the cause and developing a fix. The thread
on linux.kernel.wireless.general only seems to address
the specific situation in the wireless stack.

Thanks,
Tilman


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11 10:08 ` Johannes Berg
  2009-10-11 10:17   ` Michael Buesch
@ 2009-10-11 10:55   ` Dave Young
  2009-10-11 11:18   ` Tilman Schmidt
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Young @ 2009-10-11 10:55 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-kernel, tglx, linux-wireless, David S. Miller

On Sun, Oct 11, 2009 at 6:08 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote:
>
>> With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR
>> messages just while using firefox:
>>
>> [  130.527399] NOHZ: local_softirq_pending 08
>
>> Any idea? or known issue?
>
> Are you using b43 (or wl12x1)? If so, it's a known issue, but the driver
> was recently left without an active maintainer in a brouhaha about a bug
> fix.

Yes, I'm using b43. I will test the patch you posted in another thread.

>
> Cf. http://thread.gmane.org/gmane.linux.kernel.wireless.general/39440
>
> Absent proof that mac80211 is safe to run with BHs enabled, the correct
> solution is disabling tasklets around the RX function, unlike all the
> proposed patches. However, Michael thinks it's such a bad solution that
> he has refused to implement it. So far, nobody has bothered to fix the
> drivers.
>
> FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael
> thinks in the stack.
>
> johannes
>
>



-- 
Regards
dave

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11 10:17   ` Michael Buesch
  2009-10-11 10:37     ` Johannes Berg
@ 2009-10-11 10:38     ` David Miller
  1 sibling, 0 replies; 46+ messages in thread
From: David Miller @ 2009-10-11 10:38 UTC (permalink / raw)
  To: mb; +Cc: johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless

From: Michael Buesch <mb@bu3sch.de>
Date: Sun, 11 Oct 2009 12:17:30 +0200

> On Sunday 11 October 2009 12:08:55 Johannes Berg wrote:
>> On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote:
>> 
>> FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael
>> thinks in the stack.
> 
> If mac80211 requires BHs disabled, it should do this.

That's overhead, and %99 of drivers do not require it, and therefore
for %99 of drivers it's unnecessary overhead.

In general we avoid doing things like that.  Instead, we put the
cost only where it's actually needed.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11 10:17   ` Michael Buesch
@ 2009-10-11 10:37     ` Johannes Berg
  2009-10-11 10:38     ` David Miller
  1 sibling, 0 replies; 46+ messages in thread
From: Johannes Berg @ 2009-10-11 10:37 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

On Sun, 2009-10-11 at 12:17 +0200, Michael Buesch wrote:

> Ehm, no. That's not exactly true.
> We call the non-_irqsafe functions, which by definition are designed to
> run in non-irq (soft or hard) context. At least that's how I understand the
> documentation, last time I read it.

So maybe the documentation is not entirely accurate. Such happens. From
this and previous threads tt's pretty obvious that these functions
cannot be called with softirqs enabled. And I've also stated before that
I do not believe that we should call them with softirqs enabled without
auditing the code for locking, which historically has been a weak point
of mac80211.

> Why don't you simply do local_bh_disable() in those functions, if they
> require bh disabled, instead of depending on the driver doing it?
> 
> > FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael
> > thinks in the stack.
> 
> If mac80211 requires BHs disabled, it should do this.

I don't believe adding that into mac80211, even though it nests, is a
good idea for the case of many drivers where mac80211 and/or the driver
knows. It's pretty damn trivial to add two lines of code to the driver,
instead of penalising every other driver. The typical kernel style is
making things provide the required context, not a function take any
possible context.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11 10:08 ` Johannes Berg
@ 2009-10-11 10:17   ` Michael Buesch
  2009-10-11 10:37     ` Johannes Berg
  2009-10-11 10:38     ` David Miller
  2009-10-11 10:55   ` Dave Young
  2009-10-11 11:18   ` Tilman Schmidt
  2 siblings, 2 replies; 46+ messages in thread
From: Michael Buesch @ 2009-10-11 10:17 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller

On Sunday 11 October 2009 12:08:55 Johannes Berg wrote:
> On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote:
> 
> > With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR
> > messages just while using firefox:
> > 
> > [  130.527399] NOHZ: local_softirq_pending 08
> 
> > Any idea? or known issue?
> 
> Are you using b43 (or wl12x1)? If so, it's a known issue, but the driver
> was recently left without an active maintainer in a brouhaha about a bug
> fix.
> 
> Cf. http://thread.gmane.org/gmane.linux.kernel.wireless.general/39440
> 
> Absent proof that mac80211 is safe to run with BHs enabled, the correct
> solution is disabling tasklets around the RX function, unlike all the
> proposed patches. However, Michael thinks it's such a bad solution that
> he has refused to implement it.

Ehm, no. That's not exactly true.
We call the non-_irqsafe functions, which by definition are designed to
run in non-irq (soft or hard) context. At least that's how I understand the
documentation, last time I read it.
Why don't you simply do local_bh_disable() in those functions, if they
require bh disabled, instead of depending on the driver doing it?

> FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael
> thinks in the stack.

If mac80211 requires BHs disabled, it should do this.

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2009-10-11  9:52 Dave Young
@ 2009-10-11 10:08 ` Johannes Berg
  2009-10-11 10:17   ` Michael Buesch
                     ` (2 more replies)
  2009-10-11 16:39 ` Joe Korty
  1 sibling, 3 replies; 46+ messages in thread
From: Johannes Berg @ 2009-10-11 10:08 UTC (permalink / raw)
  To: Dave Young; +Cc: linux-kernel, tglx, linux-wireless, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 881 bytes --]

On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote:

> With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR
> messages just while using firefox:
> 
> [  130.527399] NOHZ: local_softirq_pending 08

> Any idea? or known issue?

Are you using b43 (or wl12x1)? If so, it's a known issue, but the driver
was recently left without an active maintainer in a brouhaha about a bug
fix.

Cf. http://thread.gmane.org/gmane.linux.kernel.wireless.general/39440

Absent proof that mac80211 is safe to run with BHs enabled, the correct
solution is disabling tasklets around the RX function, unlike all the
proposed patches. However, Michael thinks it's such a bad solution that
he has refused to implement it. So far, nobody has bothered to fix the
drivers.

FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael
thinks in the stack.

johannes


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* NOHZ: local_softirq_pending 08
@ 2009-10-11  9:52 Dave Young
  2009-10-11 10:08 ` Johannes Berg
  2009-10-11 16:39 ` Joe Korty
  0 siblings, 2 replies; 46+ messages in thread
From: Dave Young @ 2009-10-11  9:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx

Hi

With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR
messages just while using firefox:

[  130.527399] NOHZ: local_softirq_pending 08
[  130.559807] NOHZ: local_softirq_pending 08
[  130.835292] NOHZ: local_softirq_pending 08
[  130.854630] NOHZ: local_softirq_pending 08
[  130.892045] NOHZ: local_softirq_pending 08
[  130.893276] NOHZ: local_softirq_pending 08
[  130.912109] NOHZ: local_softirq_pending 08
[  130.928032] NOHZ: local_softirq_pending 08
[  130.938503] NOHZ: local_softirq_pending 08


Any idea? or known issue?

--
Regards
dave

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
@ 2007-05-29  6:01 Rafał Bilski
  0 siblings, 0 replies; 46+ messages in thread
From: Rafał Bilski @ 2007-05-29  6:01 UTC (permalink / raw)
  To: Linux Kernel Mailing List

>>  Hi!
>>
>> A lot of "NOHZ: local_softirq_pending 08" messages (about 100) and
>> then suddenly stoped to show. I have 2.6.21.1. I checked .2 and .3
>> changelogs but I don't see anything about this message.
>> What does it mean?
>>
>> Is "08" IRQ number?
>>   8:          2    XT-PIC-XT        rtc
>>
>> Please CC me.
> Does this patch [ http://lkml.org/lkml/2007/5/22/35 ] helps you fix it ?
> 
> Regards 
> Ananitya

Thanks. Looks like it will.

Regards
Rafał



----------------------------------------------------------------------
Ile masz w domu niepotrzebnych rzeczy?
Wymien sie z sasiadami >> http://link.interia.pl/f1a93



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: NOHZ: local_softirq_pending 08
  2007-05-28 21:04 Rafał Bilski
@ 2007-05-28 22:12 ` Anant Nitya
  0 siblings, 0 replies; 46+ messages in thread
From: Anant Nitya @ 2007-05-28 22:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Rafał Bilski

On Tuesday 29 May 2007 02:34:22 Rafał Bilski wrote:
> Hi!
>
> A lot of "NOHZ: local_softirq_pending 08" messages (about 100) and
> then suddenly stoped to show. I have 2.6.21.1. I checked .2 and .3
> changelogs but I don't see anything about this message.
> What does it mean?
>
> Is "08" IRQ number?
>   8:          2    XT-PIC-XT        rtc
>
> Please CC me.
Does this patch [ http://lkml.org/lkml/2007/5/22/35 ] helps you fix it ?

Regards 
Ananitya


-- 
Out of many thousands, one may endeavor for perfection, and of
those who have achieved perfection, hardly one knows Me in truth.
				-- Gita Sutra Of Mysticism

^ permalink raw reply	[flat|nested] 46+ messages in thread

* NOHZ: local_softirq_pending 08
@ 2007-05-28 21:04 Rafał Bilski
  2007-05-28 22:12 ` Anant Nitya
  0 siblings, 1 reply; 46+ messages in thread
From: Rafał Bilski @ 2007-05-28 21:04 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hi!

A lot of "NOHZ: local_softirq_pending 08" messages (about 100) and 
then suddenly stoped to show. I have 2.6.21.1. I checked .2 and .3 
changelogs but I don't see anything about this message.
What does it mean?

Is "08" IRQ number?
  8:          2    XT-PIC-XT        rtc

Please CC me.

Thank You
Rafał



----------------------------------------------------------------------
Ile masz w domu niepotrzebnych rzeczy?
Wymien sie z sasiadami >> http://link.interia.pl/f1a93



^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2012-06-25 19:53 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-30 21:48 NOHZ: local_softirq_pending 08 Dave Jones
2012-05-31  7:52 ` Thomas Gleixner
2012-05-31  7:56   ` Eric Dumazet
2012-05-31 14:04     ` Dave Jones
2012-06-08 16:46       ` Nikos Chantziaras
2012-06-01 22:57   ` Francois Romieu
2012-06-02 20:58     ` Marc Dionne
2012-06-05 23:15       ` Francois Romieu
2012-06-06  1:46         ` Dave Jones
2012-06-06  5:42           ` Francois Romieu
2012-06-08  2:34             ` Dave Jones
2012-06-08  7:57               ` Thomas Gleixner
2012-06-08 14:23                 ` Dave Jones
2012-06-08 14:51                   ` Thomas Gleixner
2012-06-08 20:27                     ` Francois Romieu
2012-06-08 20:41                       ` Thomas Gleixner
2012-06-08 21:53                         ` Francois Romieu
2012-06-08 22:33                           ` Marc Dionne
2012-06-10 20:40                             ` Dave Jones
2012-06-25 12:38                           ` Johannes Stezenbach
2012-06-25 19:28                             ` Francois Romieu
2012-06-25 19:52                               ` Dave Jones
2012-06-08  8:32               ` Johannes Stezenbach
  -- strict thread matches above, loose matches on Subject: below --
2009-10-11  9:52 Dave Young
2009-10-11 10:08 ` Johannes Berg
2009-10-11 10:17   ` Michael Buesch
2009-10-11 10:37     ` Johannes Berg
2009-10-11 10:38     ` David Miller
2009-10-11 10:55   ` Dave Young
2009-10-11 11:18   ` Tilman Schmidt
2009-10-11 11:40     ` Johannes Berg
2009-10-12  8:28       ` Tilman Schmidt
2009-10-12 10:32         ` David Miller
2009-10-12 11:25           ` Tilman Schmidt
2009-10-15 11:40             ` Jarek Poplawski
2009-10-15 17:53               ` Jarek Poplawski
2009-10-21 18:46                 ` Tilman Schmidt
2009-10-22 23:37                   ` Tilman Schmidt
2009-10-23 13:34                     ` Johannes Berg
2009-10-23 14:27                       ` Tilman Schmidt
2009-10-23 14:31                         ` Johannes Berg
2009-10-23 16:33                           ` Tilman Schmidt
2009-10-11 16:39 ` Joe Korty
2007-05-29  6:01 Rafał Bilski
2007-05-28 21:04 Rafał Bilski
2007-05-28 22:12 ` Anant Nitya

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).