* Re: Need your help on rcu_bh_state stall on CPU 0 (t=0 jiffies)
From: Paul E. McKenney @ 2017-02-20 20:58 UTC
  To: Sumit Gemini; +Cc: ralf.hildebrandt, linux-kernel

On Mon, Feb 20, 2017 at 03:26:32PM +0530, Sumit Gemini wrote:
> Hi Paul,
> 
> 
> I found you while reading "http://www.serverphorums.com/read.php?12,468996". I'm
> facing the same problem but am not able to determine exactly what is
> causing it.
> 
> 
> I observed a weird problem: I got an rcu_bh_state stall on CPU 0, but my
> system was idle when the rcu_bh_state (CPU stall) problem occurred. After
> 7 seconds my system recovered from the stall by itself. But I did
> not understand what happened during that time.

One thing that you might look into below...

> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.520060] INFO: rcu_bh_state detected stall on CPU 0 (t=0 jiffies)
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] sending NMI to all CPUs:
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] NMI backtrace for cpu 0
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] CPU 0
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Modules linked in: nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit ip_set_hash_ipport ip_set_hash_ipportip xt_NOTRACK ip_set_bitmap_port xt_sctp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CT arpt_mangle ip_set_hash_ipnet xt_NFLOG nfnetlink_log ipt_ULOG xt_limit xt_hashcounter ip_set_hash_ipip xt_set ip_set_hash_ip deflate zlib_deflate ctr twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5 crypto_null af_key iptable_mangle ip_set nfnetlink arptable_filter arp_tables iptable_raw iptable_nat tipc xt_tcpudp xt_state xt_pkttype bonding binfmt_misc iptable_filter ip6table_filter ip6_tables nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables x_tables mperf edd ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_proto_sctp nf_conntrack sctp 8021q garp stp llc gb_sys usb_storage ioatdma ixgbe uas sg igb iTCO_wdt wmi i2c_i801 pcspkr mdio iTCO_vendor_support button container dca ipv6 autofs4 usbhid megasr(P) ehci_hcd usbcore processor thermal_sys [last unloaded: ipt_PORTMAP]
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042]
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Pid: 0, comm: swapper Tainted: P 3.1.10-gb17-default #1 Intel Corporation S2600CO/S2600CO
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RIP: 0010:[<ffffffff81007f51>] [<ffffffff81007f51>] native_read_tsc+0x2/0xf
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RSP: 0018:ffff88043ee03db0 EFLAGS: 00000803
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RAX: 0000000037185395 RBX: 00000000000003e9 RCX: 0000000000000001
> Jan 6 16:45:07 ffm-sbc-2a osafimmd[5035]: WA IMMND DOWN on active controller f2 detected at standby immd!! f1. Possible failover
> Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: NO Done Locking applications node id:2020f ret val:0
> Jan 6 16:45:07 ffm-sbc-2a opensaf_recovery: Control interface status:0 Role:STANDBY
> Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: NO Current role: STANDBY
> Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60

Is this "WA IMMND DOWN" a normal occurrence?  Perhaps it is pointing
at a root cause.

> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] RDX: 0000000000bf0977 RSI: 0000000000000002 RDI: 0000000000032bd4
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] RBP: 0000000000032bd4 R08: 0000000000000000 R09: ffffffff819232b0
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] R10: 7fffffffffffffff R11: 7fffffffffffffff R12: 0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] R13: ffffffff819232b0 R14: 0000000000000001 R15: ffffffff81065c28
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] FS: 0000000000000000(0000) GS:ffff88043ee00000(0000) knlGS:0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] CR2: 000000000069e034 CR3: 0000000001805000 CR4: 00000000000406f0
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8180d020)
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Stack:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] ffffffff81200eb5 ffffffff81200f44 00000000000003e9 0000000000001000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] 0000000000000002 ffffffff819232b0 ffffffff81017698 7fffffffffffffff
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] 0000000000000002 0000000000000002 ffffffff81017fdf 0000000000000001
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Call Trace:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81200eb5>] paravirt_read_tsc+0x5/0x8
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81200f44>] delay_tsc+0x1d/0x68
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017698>] native_safe_apic_wait_icr_idle+0x27/0x32
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017fdf>] default_send_IPI_dest_field.constprop.0+0x19/0x4d
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8101804b>] default_send_IPI_mask_sequence_phys+0x38/0x6a
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8101815e>] arch_trigger_all_cpu_backtrace+0x4d/0x7b
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8109801b>] check_cpu_stall+0x66/0xdb
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff810980aa>] rcu_pending+0x1a/0x10a
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8109852c>] rcu_check_callbacks+0x9d/0xae
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8104c56c>] update_process_times+0x31/0x63
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81065c92>] tick_sched_timer+0x6a/0x90
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8105b872>] __run_hrtimer+0xa4/0x148
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8105c08e>] hrtimer_interrupt+0xdb/0x19a
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017768>] smp_apic_timer_interrupt+0x6e/0x80
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff813efcde>] apic_timer_interrupt+0x6e/0x80
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81239655>] intel_idle+0xdd/0x104
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81001200>] cpu_idle+0x51/0x95
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8193db0f>] start_kernel+0x388/0x393
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8193d3af>] x86_64_start_kernel+0xcf/0xdc
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Code: 74 03 e6 80 c3 e6 ed c3 bf 8e 21 00 00 e9 ba 8f 1f 00 c3 90 90 90 40 88 f8 e6 70 e4 71 c3 40 88 f0 e6 70 40 88 f8 e6 71 c3 0f 31
> Jan 6 16:45:07 ffm-sbc-2a kernel[3167216.524042]: c1 48 89 d0 48 c1 e0 20 48 09 c8 c3 41 56 41 55 41 54 55 53
> 
> .
> .
> .
> .
> .
> .
> 
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] NMI backtrace for cpu 31
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CPU 31
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Modules linked in: nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit ip_set_hash_ipport ip_set_hash_ipportip xt_NOTRACK ip_set_bitmap_port xt_sctp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CT arpt_mangle ip_set_hash_ipnet xt_NFLOG nfnetlink_log ipt_ULOG xt_limit xt_hashcounter ip_set_hash_ipip xt_set ip_set_hash_ip deflate ctr twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5 crypto_null af_key iptable_mangle ip_set nfnetlink arptable_filter arp_tables iptable_raw iptable_nat tipc xt_tcpudp xt_state xt_pkttype bonding binfmt_misc iptable_filter ip6table_filter ip6_tables nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables x_tables mperf edd ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_proto_sctp nf_conntrack sctp 8021q garp stp llc gb_sys usb_storage ioatdma ixgbe uas sg igb iTCO_wdt wmi i2c_i801 pcspkr mdio iTCO_vendor_support button container dca ipv6 autofs4 usbhid megasr(P) ehci_hcd usbcore processor thermal_sys [last unloaded: ipt_PORTMAP]
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734]
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Pid: 0, comm: kworker/0:1 Tainted: P 3.1.10-gb17-default #1 Intel Corporation S2600CO/S2600CO
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RIP: 0010:[<ffffffff81239624>] [<ffffffff81239624>] intel_idle+0xac/0x104
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RSP: 0018:ffff880425e33eb8 EFLAGS: 00000046
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RAX: 0000000000000030 RBX: 0000000000000010 RCX: 0000000000000001
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RDX: 0000000000000000 RSI: ffff880425e33fd8 RDI: ffffffff81810500
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RBP: 0000000000000030 R08: 000000000000006d R09: 0000000000034b10
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] R10: ffff88083eded830 R11: ffff88083eded830 R12: 149739342cb2ca49
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] R13: 0000000000000004 R14: 000000000000001f R15: 0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] FS: 0000000000000000(0000) GS:ffff88083ede0000(0000) knlGS:0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CR2: 00007fe89df67120 CR3: 0000000001805000 CR4: 00000000000406e0
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Process kworker/0:1 (pid: 0, threadinfo ffff880425e32000, task ffff880425e30580)
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Stack:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685484] TIPC: Resetting link <1.1.1:bond0-1.1.2:bond0>, requested by peer
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685487] TIPC: Lost link <1.1.1:bond0-1.1.2:bond0> on network plane A
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685491] TIPC: Lost contact with <1.1.2>
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] 0000000000000000 000000000cdd3a47 0000000000000000 000000000cdd3a47
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] ffff880425e33fd8 0000001f3edf8970 ffff88083edf8970 ffff88083edf8b00
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] 0000000000000000 ffffffff81304773 ffffffff819232b0 ffff880425e33fd8
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Call Trace:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81001200>] cpu_idle+0x51/0x95
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Code: 28 e0 ff ff 80 e2 08 75 22 31 d2 48 83 c0 10 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 48 89 e8 0f 01 c9 <e8> 3f 6e e2 ff 4c 29 e0 48 89 c7 e8 10 ae e0 ff 48 69 e8 40 42
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Call Trace:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81001200>] cpu_idle+0x51/0x95

Since NMIs were sent to all CPUs, it is not surprising that one was found
to be idle.  Or is there something that I am missing in CPU 31's
backtrace?

> I'm using suse kernel 3.1.10

That is going back a ways!

There are quite a few patches to RCU CPU stall warnings that could
be applied.  It might be easier to move to a newer kernel.

Alternatively, if you are convinced that these stall warnings are
false positives, you could boot with the rcu_cpu_stall_suppress
kernel boot parameter set, or rebuild the kernel with a much larger
value of the RCU_CPU_STALL_TIMEOUT Kconfig option.
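
To check whether suppression is currently in effect on a running
system, the parameter can usually be read back from sysfs.  The sketch
below is only an illustration under assumptions: the two paths are the
locations I would expect on older (rcutree) and newer (rcupdate)
kernels, and the helper names read_int_param() and
read_stall_suppress() are made up for this example.

```c
#include <stdio.h>

/* Assumed sysfs locations of the parameter: older kernels expose it
 * under the rcutree module, newer ones under rcupdate. */
static const char *const stall_suppress_paths[] = {
	"/sys/module/rcutree/parameters/rcu_cpu_stall_suppress",
	"/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress",
};

/* Read one integer from a sysfs-style file.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int read_int_param(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int value;
	int parsed;

	if (!f)
		return -1;
	parsed = fscanf(f, "%d", &value);
	fclose(f);
	if (parsed != 1)
		return -1;
	*out = value;
	return 0;
}

/* Try each candidate path in turn (hypothetical helper).
 * Returns 0 on the first readable one, -1 if none could be read. */
int read_stall_suppress(int *out)
{
	size_t n = sizeof(stall_suppress_paths) / sizeof(stall_suppress_paths[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (read_int_param(stall_suppress_paths[i], out) == 0)
			return 0;
	return -1;
}
```

A nonzero value read this way would indicate that stall warnings are
being suppressed; a return of -1 only means neither assumed path was
readable on that kernel.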

							Thanx, Paul


* Re: Need your help on rcu_bh_state stall on CPU 0 (t=0 jiffies)
From: Paul E. McKenney @ 2017-02-21  3:01 UTC
  To: Sumit Gemini; +Cc: ralf.hildebrandt, linux-kernel

On Tue, Feb 21, 2017 at 06:36:34AM +0530, Sumit Gemini wrote:
> Hi Paul,
> 
> Thanks for your time.
> 
> We are using an HA pair with TIPC and OpenSAF functionality. The problem is
> that on the active machine the TIPC link-lost message was observed at
> 06:45:00, but the standby machine detected this link loss at 06:45:06,
> because at 06:45:00 the standby machine was in the rcu_bh_state CPU stall.
> Once the standby machine detected the link loss, it issued a reboot command
> for the active machine via OpenSAF at 06:45:07, to make itself active and
> the previously active machine standby.
> 
> I raised this matter with the TIPC open source community, and they said the
> rcu_bh_state stall was the reason for the delay in getting the link-down
> event at the standby end.
> 
> Normally a broken TIPC link is detected in < 2 seconds. But in our case it
> took 7 seconds because of that stall. This is the first time we have
> observed such a situation.
> 
> Now if we suppress this warning, will we observe this delay again?

That depends.  If the delay was due to the NMIs and stack-trace printing,
suppressing the warning will make the delay go away.  On the other hand,
if there really is a stall independent of the warning, then the delay
will still be there.  Why not try it and find out?
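
One data point that can be read from the log quoted earlier: the
printk timestamp jumps from [3167216.830734] (the "Stack:" line of
CPU 31's backtrace) to [3167225.685484] (the first TIPC message),
about 8.85 seconds, although syslog stamps both lines 16:45:07.  That
gap alone does not say which of the two cases applies.  A minimal
sketch for pulling such gaps out of a log follows; the function name
printk_timestamp() is hypothetical, and it assumes syslog lines of the
form "... kernel: [seconds.micros] ...".

```c
#include <stdio.h>
#include <string.h>

/* Extract the printk timestamp (seconds since boot) from a syslog
 * line such as "... kernel: [3167216.830734] Stack:".
 * Returns 0 on success, -1 if the line carries no such timestamp. */
int printk_timestamp(const char *line, double *seconds)
{
	const char *p = strstr(line, "kernel: [");

	if (!p)
		return -1;
	/* '[' is a literal here; it is only special after '%'. */
	return sscanf(p, "kernel: [%lf", seconds) == 1 ? 0 : -1;
}

/* Gap in seconds between two log lines, or a negative value if
 * either line could not be parsed. */
double printk_gap(const char *earlier, const char *later)
{
	double t0, t1;

	if (printk_timestamp(earlier, &t0) || printk_timestamp(later, &t1))
		return -1.0;
	return t1 - t0;
}
```

Applied to the two lines mentioned above, printk_gap() gives
3167225.685484 - 3167216.830734 = 8.854750 seconds.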

> Can you shed some light on this stall? Why does this stall happen on an
> idle machine? And if it is possible to debug it, what do we need to do?
> 
> Any pointers will be highly appreciated.

I suggest reading Documentation/RCU/stallwarn.txt, especially the list
of possible causes near the end:

	https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt

If the delay persists when you suppress the warnings, then I suggest that
you use something like ftrace to gather information on the delay.

							Thanx, Paul

> Thanks & Regards
> ~Sumit Gemini
> 
> 


* Re: Need your help on rcu_bh_state stall on CPU 0 (t=0 jiffies)
From: Paul E. McKenney @ 2017-02-21  8:05 UTC
  To: Sumit Gemini; +Cc: Ralf Hildebrandt, linux-kernel

On Tue, Feb 21, 2017 at 11:36:42AM +0530, Sumit Gemini wrote:
> Thanks, Paul. You really saved the day with your valuable input, but I am
> still confused about two things:
> 
> 
>    1. If I suppress the rcu_bh_state warning, will the NMIs not be triggered?
>    The problem is that I don't know how to reproduce this issue. This is the
>    first time in my career that I have observed it, and it occurred at a
>    customer site, so I have to give them a workaround.

Yes, suppressing the RCU CPU stall warning will also suppress the NMIs.

>    2. One more important thing: I have the kernel stack traces that were
>    captured when the stall was observed. Many developers have seen almost the
>    same stack trace, for example in this link,
>    "http://www.serverphorums.com/read.php?12,468996", in which Ralf and you
>    were discussing it. Can we not identify any suspicious function or
>    statement from this stack trace? Or do we need ftrace to debug this?

As described in stallwarn.txt (please do read this if you have not already
done so, and from your questions it does -not- seem like you have read
it), one cause of RCU CPU stall warnings is long loops in the kernel.
Adding cond_resched() in older kernels or (better) cond_resched_rcu_qs()
in newer kernels can prevent the stall warnings and other problems.
But calls to these functions must be carefully placed, because otherwise
you will get error messages such as "scheduling while atomic".
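
The general shape of such a fix might look like the sketch below.  It
is a userspace stand-in so that it compiles outside the kernel:
cond_resched_standin(), RESCHED_INTERVAL, and sum_table() are
hypothetical names for this example, and real kernel code would call
cond_resched() itself.  The point it illustrates is only the
placement: the call sits in the loop body at a spot where no spinlock
is held and preemption is not disabled.

```c
#include <sched.h>
#include <stddef.h>

/* Counts how often the loop offered to reschedule (for illustration). */
int resched_points;

/* Userspace stand-in for the kernel's cond_resched(): give other
 * runnable tasks a chance to run. */
static void cond_resched_standin(void)
{
	resched_points++;
	sched_yield();
}

/* Hypothetical batch size: offer to reschedule once per this many
 * iterations rather than on every pass. */
#define RESCHED_INTERVAL 1024

/* A long-running loop over a large table. */
unsigned long sum_table(const unsigned int *table, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += table[i];
		/* Safe point: no locks held, not in atomic context. */
		if ((i + 1) % RESCHED_INTERVAL == 0)
			cond_resched_standin();
	}
	return sum;
}
```

In the kernel the same call placed inside a spin_lock()/spin_unlock()
pair or an interrupt handler would be the "carefully placed" mistake
described above.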

							Thanx, Paul

> Thanks
> ~Sumit Gemini
> 
> On Tue, Feb 21, 2017 at 8:31 AM, Paul E. McKenney <
> paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, Feb 21, 2017 at 06:36:34AM +0530, Sumit Gemini wrote:
> > > Hi Pual,
> > >
> > > Thanks for your time.
> > >
> > > we are using HA pair with TIPC and opensf functionality. Now problem is
> > on
> > > active machine TIPC link lost message was observed on 06:45:00 but
> > standby
> > > machine detected this link lost at 06:45:06 because  at 06:45:00 standby
> > > machine having rcu_bh_state (CPU stall). Once standby machine detected
> > link
> > > lost, it fired reboot command for active machine with the help of opensf
> > at
> > > 06:45:07 to make itself active and standby to previously active machine.
> > >
> > > I raised this matter before TIPC open source community, and they told
> > > rcu_bh_state was the reason for delaying to get link down event at
> > standby
> > > end.
> > >
> > > In normal case TIPC link broken detects in < 2 second. But in our case
> > took
> > > 7 second because of that stall. It is first time, we observed such
> > > situation.
> > >
> > > Now if we suppres this warning, will we observe this delay again?
> >
> > That depends.  If the delay was due to the NMIs and stack-trace printing,
> > suppressing the warning will make the delay go away.  On the other hand,
> > if there really is a stall independent of the warning, then the delay
> > will still be there.  Why not try it and find out?
> >
> > > > Can you enlighten me on this stall? Why does this stall happen on an
> > > > idle machine? And if it is possible to debug it, what do we need to do?
> > >
> > > Any pointers will be highly appreciated.
> >
> > I suggest reading Documentation/stallwarn.txt, especially the list
> > of possible causes near the end:
> >
> >         https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
> >
> > > If the delay persists when you suppress the warnings, then I suggest that
> > > you use something like ftrace to gather information on the delay.
> >
> >                                                         Thanx, Paul
> >
> > > Thanks & Regards
> > > ~Sumit Gemini
> > >
> > >
> > > On Feb 21, 2017 2:28 AM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > wrote:
> > >
> > > > On Mon, Feb 20, 2017 at 03:26:32PM +0530, Sumit Gemini wrote:
> > > > > Hi Paul,
> > > > >
> > > > >
> > > > > > I found you while reading "http://www.serverphorums.com/read.php?12,468996".
> > > > > > I'm facing the same problem but am not able to detect what exactly
> > > > > > the cause of it is.
> > > > > >
> > > > > >
> > > > > > I observed a weird problem: I got rcu_bh_state on cpu0, but my system
> > > > > > was in an idle state when I got the rcu_bh_state (CPU stall) problem.
> > > > > > After 7 seconds my system recovered by itself from the stall. But I
> > > > > > did not understand what happened at that time.
> > > >
> > > > One thing that you might look into below...
> > > >
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.520060] INFO: rcu_bh_state
> > > > detected stall on CPU 0 (t=0 jiffies)
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] sending NMI to all
> > > > CPUs:
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] NMI backtrace for
> > cpu
> > > > 0
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] CPU 0
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Modules linked in:
> > > > nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit
> > > > ip_set_hash_ipport ip_set_hash_ipportip xt_NOTRACK ip_set_bitmap_port
> > > > xt_sctp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CT arpt_mangle
> > > > ip_set_hash_ipnet xt_NFLOG nfnetlink_log ipt_ULOG xt_limit
> > xt_hashcounter
> > > > ip_set_hash_ipip xt_set ip_set_hash_ip deflate zlib_deflate ctr
> > > > twofish_x86_64 twofish_common camellia serpent blowfish cast5
> > des_generic
> > > > cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5
> > crypto_null
> > > > af_key iptable_mangle ip_set nfnetlink arptable_filter arp_tables
> > > > iptable_raw iptable_nat tipc xt_tcpudp xt_state xt_pkttype bonding
> > > > binfmt_misc iptable_filter ip6table_filter ip6_tables nf_nat_ftp nf_nat
> > > > nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables x_tables
> > mperf
> > > > edd ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_proto_sctp
> > > > nf_conntrack sctp 8021q garp stp llc gb_sys usb_storage ioatdma ixgbe
> > uas
> > > > sg igb iTCO_wdt wmi i2c_i801 pcspkr mdio iTCO_vendor_support button
> > > > container dca ipv6 autofs4 usbhid megasr(P) ehci_hcd usbcore processor
> > > > thermal_sys [last unloaded: ipt_PORTMAP]
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042]
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Pid: 0, comm:
> > swapper
> > > > Tainted: P 3.1.10-gb17-default #1 Intel Corporation S2600CO/S2600CO
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RIP:
> > > > 0010:[<ffffffff81007f51>] [<ffffffff81007f51>] native_read_tsc+0x2/0xf
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RSP:
> > > > 0018:ffff88043ee03db0 EFLAGS: 00000803
> > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RAX:
> > 0000000037185395
> > > > RBX: 00000000000003e9 RCX: 0000000000000001
> > > > > Jan 6 16:45:07 ffm-sbc-2a osafimmd[5035]: WA IMMND DOWN on active
> > > > controller f2 detected at standby immd!! f1. Possible failover
> > > > > Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: NO Done Locking applications
> > > > node id:2020f ret val:0
> > > > > Jan 6 16:45:07 ffm-sbc-2a opensaf_recovery: Control interface
> > status:0
> > > > Role:STANDBY
> > > > > Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: NO Current role: STANDBY
> > > > > Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: Rebooting OpenSAF NodeId =
> > > > 131599 EE Name = , Reason: Received Node Down for peer controller, >
> > > > OwnNodeId = 131343, SupervisionTime = 60
> > > >
> > > > Is this "WA IMMND DOWN" a normal occurrence?  Perhaps it is pointing
> > > > at a root cause.
> > > >
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] RDX:
> > 0000000000bf0977
> > > > RSI: 0000000000000002 RDI: 0000000000032bd4
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] RBP:
> > 0000000000032bd4
> > > > R08: 0000000000000000 R09: ffffffff819232b0
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] R10:
> > 7fffffffffffffff
> > > > R11: 7fffffffffffffff R12: 0000000000000000
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] R13:
> > ffffffff819232b0
> > > > R14: 0000000000000001 R15: ffffffff81065c28
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] FS:
> > > > 0000000000000000(0000) GS:ffff88043ee00000(0000) knlGS:0000000000000000
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] CS: 0010 DS: 0000
> > ES:
> > > > 0000 CR0: 000000008005003b
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] CR2:
> > 000000000069e034
> > > > CR3: 0000000001805000 CR4: 00000000000406f0
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] DR0:
> > 0000000000000000
> > > > DR1: 0000000000000000 DR2: 0000000000000000
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] DR3:
> > 0000000000000000
> > > > DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Process swapper
> > (pid:
> > > > 0, threadinfo ffffffff81800000, task ffffffff8180d020)
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Stack:
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] ffffffff81200eb5
> > > > ffffffff81200f44 00000000000003e9 0000000000001000
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] 0000000000000002
> > > > ffffffff819232b0 ffffffff81017698 7fffffffffffffff
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] 0000000000000002
> > > > 0000000000000002 ffffffff81017fdf 0000000000000001
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Call Trace:
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81200eb5>]
> > > > paravirt_read_tsc+0x5/0x8
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81200f44>]
> > > > delay_tsc+0x1d/0x68
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81017698>]
> > > > native_safe_apic_wait_icr_idle+0x27/0x32
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81017fdf>]
> > > > default_send_IPI_dest_field.constprop.0+0x19/0x4d
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8101804b>]
> > > > default_send_IPI_mask_sequence_phys+0x38/0x6a
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8101815e>]
> > > > arch_trigger_all_cpu_backtrace+0x4d/0x7b
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8109801b>]
> > > > check_cpu_stall+0x66/0xdb
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff810980aa>]
> > > > rcu_pending+0x1a/0x10a
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8109852c>]
> > > > rcu_check_callbacks+0x9d/0xae
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8104c56c>]
> > > > update_process_times+0x31/0x63
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81065c92>]
> > > > tick_sched_timer+0x6a/0x90
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8105b872>]
> > > > __run_hrtimer+0xa4/0x148
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8105c08e>]
> > > > hrtimer_interrupt+0xdb/0x19a
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81017768>]
> > > > smp_apic_timer_interrupt+0x6e/0x80
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff813efcde>]
> > > > apic_timer_interrupt+0x6e/0x80
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81239655>]
> > > > intel_idle+0xdd/0x104
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81304773>]
> > > > cpuidle_idle_call+0xda/0x169
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff81001200>]
> > > > cpu_idle+0x51/0x95
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8193db0f>]
> > > > start_kernel+0x388/0x393
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042]
> > [<ffffffff8193d3af>]
> > > > x86_64_start_kernel+0xcf/0xdc
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Code: 74 03 e6 80
> > c3
> > > > e6 ed c3 bf 8e 21 00 00 e9 ba 8f 1f 00 c3 90 90 90 40 88 f8 e6 70 e4
> > 71 c3
> > > > 40 88 f0 e6 70 40 88 f8 e6 71 c3 0f 31
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel[3167216.524042]: c1 48 89 d0 48 c1
> > e0
> > > > 20 48 09 c8 c3 41 56 41 55 41 54 55 53
> > > > >
> > > > > .
> > > > > .
> > > > > .
> > > > > .
> > > > > .
> > > > > .
> > > > >
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] NMI backtrace for
> > cpu
> > > > 31
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CPU 31
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Modules linked in:
> > > > nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit
> > > > ip_set_hash_ipport ip_set_hash_ipportip xt_NOTRACK ip_set_bitmap_port
> > > > xt_sctp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CT arpt_mangle
> > > > ip_set_hash_ipnet xt_NFLOG nfnetlink_log ipt_ULOG xt_limit
> > xt_hashcounter
> > > > ip_set_hash_ipip xt_set ip_set_hash_ip deflate ctr twofish_x86_64
> > > > twofish_common camellia serpent > blowfish cast5 des_generic cbc xcbc
> > > > rmd160 sha512_generic sha256_generic sha1_generic md5 crypto_null
> > af_key
> > > > iptable_mangle ip_set nfnetlink arptable_filter arp_tables iptable_raw
> > > > iptable_nat tipc xt_tcpudp xt_state xt_pkttype bonding binfmt_misc
> > > > iptable_filter ip6table_filter ip6_tables nf_nat_ftp nf_nat
> > > > nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables x_tables
> > mperf
> > > > edd ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_proto_sctp
> > > > nf_conntrack sctp 8021q garp stp llc gb_sys usb_storage ioatdma ixgbe
> > uas
> > > > sg igb iTCO_wdt wmi i2c_i801 pcspkr mdio iTCO_vendor_support button
> > > > container dca ipv6 autofs4 usbhid megasr(P) ehci_hcd usbcore processor
> > > > thermal_sys [last unloaded: ipt_PORTMAP]
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734]
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Pid: 0, comm:
> > > > kworker/0:1 Tainted: P 3.1.10-gb17-default #1 Intel Corporation
> > > > S2600CO/S2600CO
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RIP:
> > > > 0010:[<ffffffff81239624>] [<ffffffff81239624>] intel_idle+0xac/0x104
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RSP:
> > > > 0018:ffff880425e33eb8 EFLAGS: 00000046
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RAX:
> > 0000000000000030
> > > > RBX: 0000000000000010 RCX: 0000000000000001
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RDX:
> > 0000000000000000
> > > > RSI: ffff880425e33fd8 RDI: ffffffff81810500
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RBP:
> > 0000000000000030
> > > > R08: 000000000000006d R09: 0000000000034b10
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] R10:
> > ffff88083eded830
> > > > R11: ffff88083eded830 R12: 149739342cb2ca49
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] R13:
> > 0000000000000004
> > > > R14: 000000000000001f R15: 0000000000000000
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] FS:
> > > > 0000000000000000(0000) GS:ffff88083ede0000(0000) knlGS:0000000000000000
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CS: 0010 DS: 0000
> > ES:
> > > > 0000 CR0: 000000008005003b
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CR2:
> > 00007fe89df67120
> > > > CR3: 0000000001805000 CR4: 00000000000406e0
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] DR0:
> > 0000000000000000
> > > > DR1: 0000000000000000 DR2: 0000000000000000
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] DR3:
> > 0000000000000000
> > > > DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Process
> > kworker/0:1
> > > > (pid: 0, threadinfo ffff880425e32000, task ffff880425e30580)
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Stack:
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685484] TIPC: Resetting
> > link
> > > > <1.1.1:bond0-1.1.2:bond0>, requested by peer
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685487] TIPC: Lost link
> > > > <1.1.1:bond0-1.1.2:bond0> on network plane A
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685491] TIPC: Lost contact
> > > > with <1.1.2>
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] 0000000000000000
> > > > 000000000cdd3a47 0000000000000000 000000000cdd3a47
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] ffff880425e33fd8
> > > > 0000001f3edf8970 ffff88083edf8970 ffff88083edf8b00
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] 0000000000000000
> > > > ffffffff81304773 ffffffff819232b0 ffff880425e33fd8
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Call Trace:
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214]
> > [<ffffffff81304773>]
> > > > cpuidle_idle_call+0xda/0x169
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214]
> > [<ffffffff81001200>]
> > > > cpu_idle+0x51/0x95
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Code: 28 e0 ff ff
> > 80
> > > > e2 08 75 22 31 d2 48 83 c0 10 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38
> > e0 ff
> > > > ff a8 08 75 08 b1 01 48 89 e8 0f 01 c9 <e8> 3f 6e e2 ff 4c 29 e0 48 89
> > c7
> > > > e8 10 ae e0 ff 48 69 e8 40 42
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Call Trace:
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214]
> > [<ffffffff81304773>]
> > > > cpuidle_idle_call+0xda/0x169
> > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214]
> > [<ffffffff81001200>]
> > > > cpu_idle+0x51/0x95
> > > >
> > > > > Since NMIs were sent to all CPUs, it is not surprising that one was
> > > > > found to be idle.  Or is there something that I am missing in CPU 31's
> > > > > backtrace?
> > > >
> > > > > I'm using suse kernel 3.1.10
> > > >
> > > > That is going back a ways!
> > > >
> > > > There are quite a few patches to RCU CPU stall warnings that could
> > > > be applied.  It might be easier to move to a newer kernel.
> > > >
> > > > Alternatively, if you are convinced that these stall warnings are
> > > > false positives, you could boot with the rcu_cpu_stall_suppress
> > > > kernel boot parameter set, or rebuild the kernel with a much larger
> > > > value of the RCU_CPU_STALL_TIMEOUT Kconfig option.
> > > >
> > > >                                                         Thanx, Paul
> > > >
> > > >
> >
> >

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Need your help on rcu_bh_state stall on CPU 0 (t=0 jiffies)
       [not found]               ` <CAD6qDH5OrQdKJNKb1EENvWMyZOZK7kQ-pHJQL5q9DaKKJbrOpw@mail.gmail.com>
@ 2017-02-22  0:39                 ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2017-02-22  0:39 UTC (permalink / raw)
  To: Sumit Gemini; +Cc: linux-kernel

On Wed, Feb 22, 2017 at 05:38:00AM +0530, Sumit Gemini wrote:
> Okay Paul, i will try this. but one more thing. According to
> Documentation/stallwarn.txt, occurrence of this issue is rare and customer
> too not observing the same issue and yes this issue come at random.

You lost me on this one.  I do understand that chasing down bugs can
be difficult and frustrating to all concerned, if that is what you
are getting at.

> So how do I track all functions just before the stall? If the issue does not
> occur for a month while ftrace runs continuously, it will create a huge
> log file and consume RAM.

Limit the size of the ftrace buffers.  This will get you the last part
of the trace data, which is what you want in this case.  Are you able
to modify the kernel?  If so, you can use ftrace_dump() to cause the
trace log to be dumped at the very start of the stall-warning process.
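
A minimal sketch of limiting the ftrace buffers from the shell, assuming
debugfs is mounted at /sys/kernel/debug and that the kernel provides the
function_graph tracer.  The helper names and the TRACEFS variable are
hypothetical, and max_graph_depth (mentioned in the quoted message below)
is absent on older kernels, so it is set only if the file exists:

```shell
# Hypothetical helpers; TRACEFS defaults to the usual debugfs mount point.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Cap the per-CPU ring buffer (in KB) so that a long-running trace keeps
# only the most recent data instead of growing without bound.
limit_trace_buffer() {
    echo "$1" > "$TRACEFS/buffer_size_kb"
}

# Trace only the top levels of the call graph, then turn tracing on.
start_shallow_graph_trace() {
    echo function_graph > "$TRACEFS/current_tracer"
    # max_graph_depth is missing on older kernels; skip it in that case.
    if [ -e "$TRACEFS/max_graph_depth" ]; then
        echo "$1" > "$TRACEFS/max_graph_depth"
    fi
    echo 1 > "$TRACEFS/tracing_on"
}
```

For example, running "limit_trace_buffer 1024" followed by
"start_shallow_graph_trace 2" as root would keep roughly the last 1MB of
trace data per CPU.  The sizes are illustrative only, not recommendations.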

> I discussed ftrace here in my office, and they rejected this
> option. According to them it is feasible locally for debugging
> (reproducing the issue) but not at the customer's site.

Perhaps your kernel is too old.  Or maybe there is some confusion
about the exact option.  Why not just try it?

> Could you please suggest what I should do? I have no idea how to proceed
> further.

I hope that you have tried reproducing this in your lab.

Another approach is to measure the actual overhead in the lab.  Compute
the actual performance degradation, having asked beforehand what
degradation would be acceptable.
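
As a rough sketch of that computation, assuming you have timed the same
workload once without tracing and once with it (the helper name and its
arguments are hypothetical, and both times must be in the same unit):

```shell
# Hypothetical helper: percentage slowdown of a traced run vs. a baseline.
# $1 is the baseline elapsed time, $2 is the elapsed time with tracing on.
degradation_pct() {
    awk -v base="$1" -v traced="$2" \
        'BEGIN { printf "%.2f\n", (traced - base) * 100 / base }'
}
```

With a baseline of 100 seconds and a traced run of 105 seconds,
"degradation_pct 100 105" prints 5.00, which can then be compared against
the degradation that was agreed to beforehand.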

One last thing:  The machine knows what is wrong.  Your challenge is to
make it tell you what the problem is, using whatever tools are available.
My ability to help you is sharply limited by the fact that you are
running a very old kernel.  I suggest reaching out to people who work
regularly with this kernel version for additional help with debugging.

							Thanx, Paul

> Thanks & Regards
> ~Sumit Gemini
> 
> On Feb 22, 2017 1:27 AM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> wrote:
> 
> > Hello, Sumit,
> >
> > Instead of tracing RCU-bh, I suggest tracing upper-level function calls
> > in the kernel.  This will allow you to see what the kernel was doing
> > before the stall warning.  See the max_graph_depth parameter in
> > https://www.kernel.org/doc/Documentation/trace/ftrace.txt for how
> > to do this.
> >
> >                                                         Thanx, Paul
> >
> > On Tue, Feb 21, 2017 at 03:48:48PM +0530, Sumit Gemini wrote:
> > > Hi Paul,
> > >
> > > > I need your help again. I'm using this link "
> > > > http://opensourceforu.com/2010/11/kernel-tracing-with-ftrace-part-1/" to
> > > > understand how ftrace works,
> > > > but the problem is that the example there shows traces for all events
> > > > when tracing is turned on.
> > > >
> > > > Searching on Google, I found this video, in which you talk about
> > > > tracing the rcu-bh event.
> > >
> > > https://www.youtube.com/watch?v=WWEz-uoyoPE
> > >
> > >
> > > > Could you please describe the steps to collect logs only when I get an
> > > > rcu_bh stall event (CPU stall problem)?
> > >
> > >
> > > Thanks
> > > ~Sumit Gemini
> > >
> > >
> > >
> > > On Tue, Feb 21, 2017 at 1:35 PM, Paul E. McKenney <
> > > paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > > On Tue, Feb 21, 2017 at 11:36:42AM +0530, Sumit Gemini wrote:
> > > > > Thank Paul, you really save the day by giving your valuable input
> > but I
> > > > > still confused in 2 things :-
> > > > >
> > > > >
> > > > >    1. If i suppress the rcu_bh_state() warning, will NMI not trigger?
> > > > >    Because the problem is i don't know how to reproduce this issue. I
> > > > observed
> > > > >    this issue first time ever in my career and this issue occured at
> > > > customer
> > > > >    end so i have to give him a workaround.
> > > >
> > > > Yes, suppressing the RCU CPU stall warning will also suppress the NMIs.
> > > >
> > > > >    2. and one important thing, i have kernel stack traces which was
> > > > >    occurred when stall issue observed. Many developer got the almost
> > same
> > > > >    stack-starce. similar this link "
> > > > >    http://www.serverphorums.com/read.php?12,468996" in which Ralf
> > and
> > > > you
> > > > >    were discussing. So can we not get any suspicious function or
> > > > statement
> > > > >    from this stack trace? or we need ftrace to debug this?
> > > >
> > > > > As described in stallwarn.txt (please do read this if you have not already
> > > > > done so, and from your questions it does -not- seem like you have read
> > > > > it), one cause of RCU CPU stall warnings is long loops in the kernel.
> > > > > Adding cond_resched() in older kernels or (better) cond_resched_rcu_qs()
> > > > > in newer kernels can prevent the stall warnings and other problems.
> > > > > But calls to these functions must be carefully placed, because otherwise
> > > > > you will get error messages such as "scheduling while atomic".
> > > >
> > > >                                                         Thanx, Paul
> > > >
> > > > > Thanks
> > > > > ~Sumit Gemini
> > > > >
> > > > > On Tue, Feb 21, 2017 at 8:31 AM, Paul E. McKenney <
> > > > > paulmck@linux.vnet.ibm.com> wrote:
> > > > >
> > > > > > On Tue, Feb 21, 2017 at 06:36:34AM +0530, Sumit Gemini wrote:
> > > > > > > > Hi Paul,
> > > > > > >
> > > > > > > Thanks for your time.
> > > > > > >
> > > > > > > we are using HA pair with TIPC and opensf functionality. Now
> > problem
> > > > is
> > > > > > on
> > > > > > > active machine TIPC link lost message was observed on 06:45:00
> > but
> > > > > > standby
> > > > > > > machine detected this link lost at 06:45:06 because  at 06:45:00
> > > > standby
> > > > > > > machine having rcu_bh_state (CPU stall). Once standby machine
> > > > detected
> > > > > > link
> > > > > > > lost, it fired reboot command for active machine with the help of
> > > > opensf
> > > > > > at
> > > > > > > 06:45:07 to make itself active and standby to previously active
> > > > machine.
> > > > > > >
> > > > > > > I raised this matter before TIPC open source community, and they
> > told
> > > > > > > rcu_bh_state was the reason for delaying to get link down event
> > at
> > > > > > standby
> > > > > > > end.
> > > > > > >
> > > > > > > In normal case TIPC link broken detects in < 2 second. But in our
> > > > case
> > > > > > took
> > > > > > > 7 second because of that stall. It is first time, we observed
> > such
> > > > > > > situation.
> > > > > > >
> > > > > > > Now if we suppres this warning, will we observe this delay again?
> > > > > >
> > > > > > That depends.  If the delay was due to the NMIs and stack-trace
> > > > printing,
> > > > > > suppressing the warning will make the delay go away.  On the other
> > > > hand,
> > > > > > if there really is a stall independent of the warning, then the
> > delay
> > > > > > will still be there.  Why not try it and find out?
> > > > > >
> > > > > > > Can you enlighten on this stall. Why does this stall happen at
> > ideal
> > > > > > > machine? And if possible to debug it, what we need to do?
> > > > > > >
> > > > > > > Any pointers will be highly appreciated.
> > > > > >
> > > > > > I suggest reading Documentation/stallwarn.txt, especially the list
> > > > > > of possible causes near the end:
> > > > > >
> > > > > >         https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
> > > > > >
> > > > > > If the delay persist when you suppress the warnings, then I suggest
> > > > that
> > > > > > you use something like ftrace to gather information on the delay.
> > > > > >
> > > > > >                                                         Thanx, Paul
> > > > > >
> > > > > > > Thanks & Regards
> > > > > > > ~Sumit Gemini
> > > > > > >
> > > > > > >
> > > > > > > On Feb 21, 2017 2:28 AM, "Paul E. McKenney" <
> > > > paulmck@linux.vnet.ibm.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > On Mon, Feb 20, 2017 at 03:26:32PM +0530, Sumit Gemini wrote:
> > > > > > > > > Hi Paul,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I got you while reading 'http://www.serverphorums.com/
> > > > > > read.php?12,468996".
> > > > > > > > I'm
> > > > > > > > > facing same problem but not able to detect what exectly the
> > > > cause of
> > > > > > > > > it.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm observed weird problem, I got rcu_bh_state on cpu0 but my
> > > > system
> > > > > > > > > was in ideal state when i got rcu_bh_state (Cpu stall)
> > problem.
> > > > After
> > > > > > > > > 7 second my system recovered itself from stall to active.
> > But i
> > > > did
> > > > > > > > > not get what happen at that time.
> > > > > > > >
> > > > > > > > One thing that you might look into below...
> > > > > > > >
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.520060] INFO:
> > > > rcu_bh_state
> > > > > > > > detected stall on CPU 0 (t=0 jiffies)
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] sending
> > NMI
> > > > to all
> > > > > > > > CPUs:
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] NMI
> > backtrace
> > > > for
> > > > > > cpu
> > > > > > > > 0
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] CPU 0
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Modules
> > > > linked in:
> > > > > > > > nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit
> > > > > > > > ip_set_hash_ipport ip_set_hash_ipportip xt_NOTRACK
> > > > ip_set_bitmap_port
> > > > > > > > xt_sctp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CT arpt_mangle
> > > > > > > > ip_set_hash_ipnet xt_NFLOG nfnetlink_log ipt_ULOG xt_limit
> > > > > > xt_hashcounter
> > > > > > > > ip_set_hash_ipip xt_set ip_set_hash_ip deflate zlib_deflate ctr
> > > > > > > > twofish_x86_64 twofish_common camellia serpent blowfish cast5
> > > > > > des_generic
> > > > > > > > cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5
> > > > > > crypto_null
> > > > > > > > af_key iptable_mangle ip_set nfnetlink arptable_filter
> > arp_tables
> > > > > > > > iptable_raw iptable_nat tipc xt_tcpudp xt_state xt_pkttype
> > bonding
> > > > > > > > binfmt_misc iptable_filter ip6table_filter ip6_tables
> > nf_nat_ftp
> > > > nf_nat
> > > > > > > > nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
> > > > x_tables
> > > > > > mperf
> > > > > > > > edd ipmi_devintf ipmi_si ipmi_msghandler
> > nf_conntrack_proto_sctp
> > > > > > > > nf_conntrack sctp 8021q garp stp llc gb_sys usb_storage ioatdma
> > > > ixgbe
> > > > > > uas
> > > > > > > > sg igb iTCO_wdt wmi i2c_i801 pcspkr mdio iTCO_vendor_support
> > button
> > > > > > > > container dca ipv6 autofs4 usbhid megasr(P) ehci_hcd usbcore
> > > > processor
> > > > > > > > thermal_sys [last unloaded: ipt_PORTMAP]
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042]
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Pid: 0,
> > comm:
> > > > > > swapper
> > > > > > > > Tainted: P 3.1.10-gb17-default #1 Intel Corporation
> > S2600CO/S2600CO
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RIP:
> > > > > > > > 0010:[<ffffffff81007f51>] [<ffffffff81007f51>]
> > > > native_read_tsc+0x2/0xf
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RSP:
> > > > > > > > 0018:ffff88043ee03db0 EFLAGS: 00000803
> > > > > > > > > Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RAX:
> > > > > > 0000000037185395
> > > > > > > > RBX: 00000000000003e9 RCX: 0000000000000001
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a osafimmd[5035]: WA IMMND DOWN on
> > active
> > > > > > > > controller f2 detected at standby immd!! f1. Possible failover
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: NO Done Locking
> > > > applications
> > > > > > > > node id:2020f ret val:0
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a opensaf_recovery: Control interface
> > > > > > status:0
> > > > > > > > Role:STANDBY
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: NO Current role:
> > STANDBY
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: Rebooting OpenSAF
> > > > NodeId =
> > > > > > > > 131599 EE Name = , Reason: Received Node Down for peer
> > controller,
> > > > >
> > > > > > > > OwnNodeId = 131343, SupervisionTime = 60
> > > > > > > >
> > > > > > > > Is this "WA IMMND DOWN" a normal occurrence?  Perhaps it is
> > > > pointing
> > > > > > > > at a root cause.
> > > > > > > >
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] RDX:
> > > > > > 0000000000bf0977
> > > > > > > > RSI: 0000000000000002 RDI: 0000000000032bd4
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] RBP:
> > > > > > 0000000000032bd4
> > > > > > > > R08: 0000000000000000 R09: ffffffff819232b0
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] R10:
> > > > > > 7fffffffffffffff
> > > > > > > > R11: 7fffffffffffffff R12: 0000000000000000
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] R13:
> > > > > > ffffffff819232b0
> > > > > > > > R14: 0000000000000001 R15: ffffffff81065c28
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] FS:
> > > > > > > > 0000000000000000(0000) GS:ffff88043ee00000(0000)
> > > > knlGS:0000000000000000
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] CS: 0010
> > DS:
> > > > 0000
> > > > > > ES:
> > > > > > > > 0000 CR0: 000000008005003b
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] CR2:
> > > > > > 000000000069e034
> > > > > > > > CR3: 0000000001805000 CR4: 00000000000406f0
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] DR0:
> > > > > > 0000000000000000
> > > > > > > > DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] DR3:
> > > > > > 0000000000000000
> > > > > > > > DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Process
> > > > swapper
> > > > > > (pid:
> > > > > > > > 0, threadinfo ffffffff81800000, task ffffffff8180d020)
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Stack:
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] ffffffff81200eb5 ffffffff81200f44 00000000000003e9 0000000000001000
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] 0000000000000002 ffffffff819232b0 ffffffff81017698 7fffffffffffffff
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] 0000000000000002 0000000000000002 ffffffff81017fdf 0000000000000001
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Call Trace:
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81200eb5>] paravirt_read_tsc+0x5/0x8
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81200f44>] delay_tsc+0x1d/0x68
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017698>] native_safe_apic_wait_icr_idle+0x27/0x32
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017fdf>] default_send_IPI_dest_field.constprop.0+0x19/0x4d
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8101804b>] default_send_IPI_mask_sequence_phys+0x38/0x6a
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8101815e>] arch_trigger_all_cpu_backtrace+0x4d/0x7b
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8109801b>] check_cpu_stall+0x66/0xdb
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff810980aa>] rcu_pending+0x1a/0x10a
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8109852c>] rcu_check_callbacks+0x9d/0xae
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8104c56c>] update_process_times+0x31/0x63
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81065c92>] tick_sched_timer+0x6a/0x90
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8105b872>] __run_hrtimer+0xa4/0x148
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8105c08e>] hrtimer_interrupt+0xdb/0x19a
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017768>] smp_apic_timer_interrupt+0x6e/0x80
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff813efcde>] apic_timer_interrupt+0x6e/0x80
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81239655>] intel_idle+0xdd/0x104
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81001200>] cpu_idle+0x51/0x95
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8193db0f>] start_kernel+0x388/0x393
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8193d3af>] x86_64_start_kernel+0xcf/0xdc
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Code: 74 03 e6 80 c3 e6 ed c3 bf 8e 21 00 00 e9 ba 8f 1f 00 c3 90 90 90 40 88 f8 e6 70 e4 71 c3 40 88 f0 e6 70 40 88 f8 e6 71 c3 0f 31
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel[3167216.524042]: c1 48 89 d0 48 c1 e0 20 48 09 c8 c3 41 56 41 55 41 54 55 53
> > > > > > > > >
> > > > > > > > > .
> > > > > > > > > .
> > > > > > > > > .
> > > > > > > > > .
> > > > > > > > > .
> > > > > > > > > .
> > > > > > > > >
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] NMI backtrace for cpu 31
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CPU 31
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Modules linked in: nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit ip_set_hash_ipport ip_set_hash_ipportip xt_NOTRACK ip_set_bitmap_port xt_sctp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CT arpt_mangle ip_set_hash_ipnet xt_NFLOG nfnetlink_log ipt_ULOG xt_limit xt_hashcounter ip_set_hash_ipip xt_set ip_set_hash_ip deflate ctr twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5 crypto_null af_key iptable_mangle ip_set nfnetlink arptable_filter arp_tables iptable_raw iptable_nat tipc xt_tcpudp xt_state xt_pkttype bonding binfmt_misc iptable_filter ip6table_filter ip6_tables nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables x_tables mperf edd ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_proto_sctp nf_conntrack sctp 8021q garp stp llc gb_sys usb_storage ioatdma ixgbe uas sg igb iTCO_wdt wmi i2c_i801 pcspkr mdio iTCO_vendor_support button container dca ipv6 autofs4 usbhid megasr(P) ehci_hcd usbcore processor thermal_sys [last unloaded: ipt_PORTMAP]
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734]
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Pid: 0, comm: kworker/0:1 Tainted: P 3.1.10-gb17-default #1 Intel Corporation S2600CO/S2600CO
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RIP: 0010:[<ffffffff81239624>] [<ffffffff81239624>] intel_idle+0xac/0x104
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RSP: 0018:ffff880425e33eb8 EFLAGS: 00000046
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RAX: 0000000000000030 RBX: 0000000000000010 RCX: 0000000000000001
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RDX: 0000000000000000 RSI: ffff880425e33fd8 RDI: ffffffff81810500
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RBP: 0000000000000030 R08: 000000000000006d R09: 0000000000034b10
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] R10: ffff88083eded830 R11: ffff88083eded830 R12: 149739342cb2ca49
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] R13: 0000000000000004 R14: 000000000000001f R15: 0000000000000000
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] FS: 0000000000000000(0000) GS:ffff88083ede0000(0000) knlGS:0000000000000000
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CR2: 00007fe89df67120 CR3: 0000000001805000 CR4: 00000000000406e0
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Process kworker/0:1 (pid: 0, threadinfo ffff880425e32000, task ffff880425e30580)
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Stack:
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685484] TIPC: Resetting link <1.1.1:bond0-1.1.2:bond0>, requested by peer
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685487] TIPC: Lost link <1.1.1:bond0-1.1.2:bond0> on network plane A
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685491] TIPC: Lost contact with <1.1.2>
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] 0000000000000000 000000000cdd3a47 0000000000000000 000000000cdd3a47
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] ffff880425e33fd8 0000001f3edf8970 ffff88083edf8970 ffff88083edf8b00
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] 0000000000000000 ffffffff81304773 ffffffff819232b0 ffff880425e33fd8
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Call Trace:
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81001200>] cpu_idle+0x51/0x95
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Code: 28 e0 ff ff 80 e2 08 75 22 31 d2 48 83 c0 10 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 48 89 e8 0f 01 c9 <e8> 3f 6e e2 ff 4c 29 e0 48 89 c7 e8 10 ae e0 ff 48 69 e8 40 42
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Call Trace:
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> > > > > > > > > Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81001200>] cpu_idle+0x51/0x95
> > > > > > > >
> > > > > > > > Since NMIs were sent to all CPUs, it is not surprising that one was found
> > > > > > > > to be idle.  Or is there something that I am missing in CPU 31's
> > > > > > > > backtrace?
> > > > > > > >
> > > > > > > > > I'm using suse kernel 3.1.10
> > > > > > > >
> > > > > > > > That is going back a ways!
> > > > > > > >
> > > > > > > > There are quite a few patches to RCU CPU stall warnings that could
> > > > > > > > be applied.  It might be easier to move to a newer kernel.
> > > > > > > >
> > > > > > > > Alternatively, if you are convinced that these stall warnings are
> > > > > > > > false positives, you could boot with the rcu_cpu_stall_suppress
> > > > > > > > kernel boot parameter set, or rebuild the kernel with a much larger
> > > > > > > > value of the RCU_CPU_STALL_TIMEOUT Kconfig option.
> > > > > > > >
> > > > > > > >                                                         Thanx, Paul
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> >
> >
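
As a rough illustration of the rcu_cpu_stall_suppress suggestion quoted above, the sketch below checks whether a kernel command line already sets that parameter. It is only a minimal sketch under stated assumptions: the function name is hypothetical, the module prefix in front of the parameter (for example "rcutree." or "rcupdate.") is an assumption that depends on the kernel version, and treating a bare parameter with no value as enabled is also an assumption. Check the kernel-parameters documentation for the kernel actually in use.

```python
# Parameter name as given in the quoted reply. Any "module." prefix in front
# of it (e.g. "rcutree." or "rcupdate.") is an assumption and varies by
# kernel version.
PARAM = "rcu_cpu_stall_suppress"


def stall_warnings_suppressed(cmdline: str) -> bool:
    """Return True if the last matching token on the command line enables PARAM."""
    suppressed = False
    for token in cmdline.split():
        name, _, value = token.partition("=")
        # Drop an optional "module." prefix before comparing names.
        if name.rsplit(".", 1)[-1] != PARAM:
            continue
        # Assumption: a bare parameter (no "=value") counts as enabled,
        # and only an explicit "0" counts as disabled.
        suppressed = value != "0"
    return suppressed


if __name__ == "__main__":
    # Hypothetical example string; on a running Linux system the real value
    # could be read from /proc/cmdline instead.
    example = "root=/dev/sda1 ro quiet rcutree.rcu_cpu_stall_suppress=1"
    print(stall_warnings_suppressed(example))
```

The sketch only inspects a string, so it can be tried on any machine. It says nothing about whether a given stall warning really is a false positive, which is the precondition stated in the reply.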

end of thread, other threads:[~2017-02-22  0:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <CAD6qDH7j1Fr6jFsuaeJm+xGX8Q41m7nDJb7C329jF65yrO5a7g@mail.gmail.com>
2017-02-20 20:58 ` Need your help on rcu_bh_state stall on CPU 0 (t=0 jiffies) Paul E. McKenney
     [not found]   ` <CAD6qDH7GY6xYDtZAF3LWFFYZZuusj7UuQg43WhEc44jXBfeNpA@mail.gmail.com>
2017-02-21  3:01     ` Paul E. McKenney
     [not found]       ` <CAD6qDH6iQzf29ZzgFHxUCwRtbdvRCW2C5a4nvuYzhhwHgnOnPQ@mail.gmail.com>
2017-02-21  8:05         ` Paul E. McKenney
     [not found]           ` <CAD6qDH6piF-tyTUjbGj679EBg_SneOUnC6jzxB24r_qy79dYvQ@mail.gmail.com>
     [not found]             ` <20170221195753.GF30506@linux.vnet.ibm.com>
     [not found]               ` <CAD6qDH5OrQdKJNKb1EENvWMyZOZK7kQ-pHJQL5q9DaKKJbrOpw@mail.gmail.com>
2017-02-22  0:39                 ` Paul E. McKenney