* [PATCH tip/core/rcu 0/5] Miscellaneous fixes for 4.9
@ 2016-08-22 15:29 Paul E. McKenney
  2016-08-22 15:30 ` [PATCH tip/core/rcu 1/5] rcu: Fix soft lockup for rcu_nocb_kthread Paul E. McKenney
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Paul E. McKenney @ 2016-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani

Hello!

This series contains miscellaneous fixes:

1.	Fix a soft lockup in RCU's callback-offload (rcuo) kthreads,
	courtesy of Ding Tianhong.

2.	Use rcu_gp_kthread_wake() to wake up grace period kthreads,
	avoiding spurious/redundant wakeups, courtesy of Jisheng Zhang.

3.	Make wake_up_nohz_cpu() handle CPUs going offline.

4.	Don't use modular infrastructure in non-modular code, courtesy
	of Paul Gortmaker.

5.	Avoid redundant quiescent-state chasing by more carefully
	initializing flags for newly onlined CPUs.

							Thanx, Paul

------------------------------------------------------------------------

 rcu/tree.c        |   12 ++++++------
 rcu/tree_plugin.h |    1 +
 rcu/update.c      |    3 +--
 sched/core.c      |    7 +++++++
 4 files changed, 15 insertions(+), 8 deletions(-)

* [PATCH tip/core/rcu 1/5] rcu: Fix soft lockup for rcu_nocb_kthread
  2016-08-22 15:29 [PATCH tip/core/rcu 0/5] Miscellaneous fixes for 4.9 Paul E. McKenney
@ 2016-08-22 15:30 ` Paul E. McKenney
  2016-08-22 16:19   ` Nikolay Borisov
  2016-08-22 15:30 ` [PATCH tip/core/rcu 2/5] rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads again Paul E. McKenney
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2016-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Ding Tianhong, Paul E. McKenney

From: Ding Tianhong <dingtianhong@huawei.com>

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
[  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
[  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
[  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
[  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
[  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
[  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
[  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
[  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
[  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
[  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not relinquishing the
CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
within the loop to allow other tasks to run.
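
To make the change easier to follow, here is a much-simplified sketch of
the callback-invocation loop (names abridged and error handling omitted;
the real code is rcu_nocb_kthread() in the diff below):

	while (list) {
		next = list->next;
		local_bh_disable();
		list->func(list);       /* Invoke one offloaded callback. */
		local_bh_enable();
		cond_resched_rcu_qs();  /* Added here: yield the CPU and note a quiescent state. */
		list = next;
	}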

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0082fce402a0..85c5a883c6e3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
 				cl++;
 			c++;
 			local_bh_enable();
+			cond_resched_rcu_qs();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
-- 
2.5.2

* [PATCH tip/core/rcu 2/5] rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads again
  2016-08-22 15:29 [PATCH tip/core/rcu 0/5] Miscellaneous fixes for 4.9 Paul E. McKenney
  2016-08-22 15:30 ` [PATCH tip/core/rcu 1/5] rcu: Fix soft lockup for rcu_nocb_kthread Paul E. McKenney
@ 2016-08-22 15:30 ` Paul E. McKenney
  2016-08-22 15:30 ` [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline Paul E. McKenney
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2016-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Jisheng Zhang, Paul E. McKenney

From: Jisheng Zhang <jszhang@marvell.com>

commit abedf8e2419f ("rcu: Use simple wait queues where possible in
rcutree") converts wait queues in rcutree to use simple wait queues,
but it incorrectly reverts commit 2aa792e6faf1 ("rcu: Use
rcu_gp_kthread_wake() to wake up grace period kthreads").

This patch fixes the issue by using rcu_gp_kthread_wake() to wake up
the grace-period kthreads again.
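
For reference, rcu_gp_kthread_wake() suppresses the wakeup when it would
be useless, roughly as follows (quoted from memory, so treat this as a
sketch rather than the exact source):

	static void rcu_gp_kthread_wake(struct rcu_state *rsp)
	{
		if (current == rsp->gp_kthread ||
		    !READ_ONCE(rsp->gp_flags) ||
		    !rsp->gp_kthread)
			return;		/* No kthread, nothing for it to do, or we are it. */
		swake_up(&rsp->gp_wq);
	}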

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5d80925e7fc8..cc1779a7ec5f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2344,7 +2344,7 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
 	WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
 	WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
 	raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(rsp), flags);
-	swake_up(&rsp->gp_wq);  /* Memory barrier implied by swake_up() path. */
+	rcu_gp_kthread_wake(rsp);
 }
 
 /*
@@ -2970,7 +2970,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 	}
 	WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
 	raw_spin_unlock_irqrestore_rcu_node(rnp_old, flags);
-	swake_up(&rsp->gp_wq); /* Memory barrier implied by swake_up() path. */
+	rcu_gp_kthread_wake(rsp);
 }
 
 /*
-- 
2.5.2

* [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline
  2016-08-22 15:29 [PATCH tip/core/rcu 0/5] Miscellaneous fixes for 4.9 Paul E. McKenney
  2016-08-22 15:30 ` [PATCH tip/core/rcu 1/5] rcu: Fix soft lockup for rcu_nocb_kthread Paul E. McKenney
  2016-08-22 15:30 ` [PATCH tip/core/rcu 2/5] rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads again Paul E. McKenney
@ 2016-08-22 15:30 ` Paul E. McKenney
  2016-08-22 22:57   ` Wanpeng Li
  2016-08-22 15:30 ` [PATCH tip/core/rcu 4/5] rcu: don't use modular infrastructure in non-modular code Paul E. McKenney
  2016-08-22 15:30 ` [PATCH tip/core/rcu 5/5] rcu: Avoid redundant quiescent-state chasing Paul E. McKenney
  4 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2016-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Both timers and hrtimers are maintained on the outgoing CPU until
CPU_DEAD time, at which point they are migrated to a surviving CPU.  If a
mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
will splat in native_smp_send_reschedule() when attempting to wake up
the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:

[ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
[ 7976.741595] Modules linked in:
[ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
[ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 7976.741595]  0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
[ 7976.741595]  0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
[ 7976.741595]  0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
[ 7976.741595] Call Trace:
[ 7976.741595]  [<ffffffff8138ab2e>] dump_stack+0x67/0x99
[ 7976.741595]  [<ffffffff8105cabc>] __warn+0xcc/0xf0
[ 7976.741595]  [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
[ 7976.741595]  [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
[ 7976.741595]  [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
[ 7976.741595]  [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
[ 7976.741595]  [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
[ 7976.741595]  [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
[ 7976.741595]  [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
[ 7976.741595]  [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
[ 7976.741595]  [<ffffffff8108068f>] kthread+0xdf/0x100
[ 7976.741595]  [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
[ 7976.741595]  [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200

However, in this case, the wakeup is redundant, because the timer
migration will reprogram timer hardware as needed.  Note that the fact
that preemption is disabled does not avoid the splat, as the offline
operation has already passed both the synchronize_sched() and the
stop_machine() that would be blocked by disabled preemption.

This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
to wake up offline CPUs.  It also adds a comment stating that the
caller must tolerate lost wakeups when the target CPU is going offline,
and suggesting the CPU_DEAD notifier as a recovery mechanism.
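
As a purely hypothetical illustration (not part of this patch), a
subsystem that cannot tolerate the lost wakeup could recover at CPU_DEAD
time with a notifier along these lines, where my_cpu_notify() and
my_migrate_pending_work() are made-up names standing in for the
subsystem's own code:

	static int my_cpu_notify(struct notifier_block *nb,
				 unsigned long action, void *hcpu)
	{
		int cpu = (long)hcpu;

		switch (action & ~CPU_TASKS_FROZEN) {
		case CPU_DEAD:
			/* The CPU is gone; move its pending work to a survivor. */
			my_migrate_pending_work(cpu);
			break;
		}
		return NOTIFY_OK;
	}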

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5c883fe8e440..2a18856f00ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -580,6 +580,8 @@ static bool wake_up_full_nohz_cpu(int cpu)
 	 * If needed we can still optimize that later with an
 	 * empty IRQ.
 	 */
+	if (cpu_is_offline(cpu))
+		return true;  /* Don't try to wake offline CPUs. */
 	if (tick_nohz_full_cpu(cpu)) {
 		if (cpu != smp_processor_id() ||
 		    tick_nohz_tick_stopped())
@@ -590,6 +592,11 @@ static bool wake_up_full_nohz_cpu(int cpu)
 	return false;
 }
 
+/*
+ * Wake up the specified CPU.  If the CPU is going offline, it is the
+ * caller's responsibility to deal with the lost wakeup, for example,
+ * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
+ */
 void wake_up_nohz_cpu(int cpu)
 {
 	if (!wake_up_full_nohz_cpu(cpu))
-- 
2.5.2

* [PATCH tip/core/rcu 4/5] rcu: don't use modular infrastructure in non-modular code
  2016-08-22 15:29 [PATCH tip/core/rcu 0/5] Miscellaneous fixes for 4.9 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2016-08-22 15:30 ` [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline Paul E. McKenney
@ 2016-08-22 15:30 ` Paul E. McKenney
  2016-08-22 15:30 ` [PATCH tip/core/rcu 5/5] rcu: Avoid redundant quiescent-state chasing Paul E. McKenney
  4 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2016-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul Gortmaker, Paul E. McKenney

From: Paul Gortmaker <paul.gortmaker@windriver.com>

The Kconfig currently controlling compilation of tree.c is:

init/Kconfig:config TREE_RCU
init/Kconfig:   bool

...and update.c and sync.c are "obj-y", meaning that none of these
files is ever built as a module.

Since MODULE_ALIAS() is a no-op for non-modular code, we can remove
those lines from these files.

We keep moduleparam.h, since these files still instantiate some
boot-time configuration parameters with module_param().
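
For example, tree.c redefines MODULE_PARAM_PREFIX and then declares
boot-time parameters with module_param(), roughly like this
(illustrative sketch; the exact parameters and values are in the file
itself):

	#ifdef MODULE_PARAM_PREFIX
	#undef MODULE_PARAM_PREFIX
	#endif
	#define MODULE_PARAM_PREFIX "rcutree."

	static long blimit = 10;	/* Settable via rcutree.blimit=, even when built in. */
	module_param(blimit, long, 0444);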

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c   | 2 --
 kernel/rcu/update.c | 3 +--
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cc1779a7ec5f..e83446062f65 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -41,7 +41,6 @@
 #include <linux/export.h>
 #include <linux/completion.h>
 #include <linux/moduleparam.h>
-#include <linux/module.h>
 #include <linux/percpu.h>
 #include <linux/notifier.h>
 #include <linux/cpu.h>
@@ -60,7 +59,6 @@
 #include "tree.h"
 #include "rcu.h"
 
-MODULE_ALIAS("rcutree");
 #ifdef MODULE_PARAM_PREFIX
 #undef MODULE_PARAM_PREFIX
 #endif
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index f0d8322bc3ec..f19271dce0a9 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -46,7 +46,7 @@
 #include <linux/export.h>
 #include <linux/hardirq.h>
 #include <linux/delay.h>
-#include <linux/module.h>
+#include <linux/moduleparam.h>
 #include <linux/kthread.h>
 #include <linux/tick.h>
 
@@ -54,7 +54,6 @@
 
 #include "rcu.h"
 
-MODULE_ALIAS("rcupdate");
 #ifdef MODULE_PARAM_PREFIX
 #undef MODULE_PARAM_PREFIX
 #endif
-- 
2.5.2

* [PATCH tip/core/rcu 5/5] rcu: Avoid redundant quiescent-state chasing
  2016-08-22 15:29 [PATCH tip/core/rcu 0/5] Miscellaneous fixes for 4.9 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2016-08-22 15:30 ` [PATCH tip/core/rcu 4/5] rcu: don't use modular infrastructure in non-modular code Paul E. McKenney
@ 2016-08-22 15:30 ` Paul E. McKenney
  4 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2016-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Currently, __note_gp_changes() checks to see if the CPU has slept through
multiple grace periods.  If it has, it resynchronizes that CPU's view
of the grace-period state, which includes whether or not the current
grace period needs a quiescent state from this CPU.  This need (or the
lack thereof) is recorded in two places, rdp->cpu_no_qs.b.norm and
rdp->core_needs_qs.  The former tells RCU's context-switch code to go
get a quiescent state, and the latter says that this quiescent state
needs to be reported.  The current code unconditionally sets the former
to true, but correctly sets the latter.

This does not result in failures, but it does unnecessarily increase
the amount of work done on average at context-switch time.  This commit
therefore correctly sets both fields.
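
To see why the spurious "true" costs work, here is a simplified sketch
(from memory) of the context-switch-time fast path that consumes this
flag; when the flag is already clear, the quiescent-state bookkeeping is
skipped entirely:

	void rcu_sched_qs(void)
	{
		if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.s))
			return;		/* No quiescent state needed: cheap exit. */
		/* ...otherwise record the quiescent state for this CPU... */
	}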

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e83446062f65..733902c33dd2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1846,6 +1846,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 			      struct rcu_data *rdp)
 {
 	bool ret;
+	bool need_gp;
 
 	/* Handle the ends of any preceding grace periods first. */
 	if (rdp->completed == rnp->completed &&
@@ -1872,9 +1873,10 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		 */
 		rdp->gpnum = rnp->gpnum;
 		trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
-		rdp->cpu_no_qs.b.norm = true;
+		need_gp = !!(rnp->qsmask & rdp->grpmask);
+		rdp->cpu_no_qs.b.norm = need_gp;
 		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
-		rdp->core_needs_qs = !!(rnp->qsmask & rdp->grpmask);
+		rdp->core_needs_qs = need_gp;
 		zero_cpu_stall_ticks(rdp);
 		WRITE_ONCE(rdp->gpwrap, false);
 	}
-- 
2.5.2

* Re: [PATCH tip/core/rcu 1/5] rcu: Fix soft lockup for rcu_nocb_kthread
  2016-08-22 15:30 ` [PATCH tip/core/rcu 1/5] rcu: Fix soft lockup for rcu_nocb_kthread Paul E. McKenney
@ 2016-08-22 16:19   ` Nikolay Borisov
  2016-08-22 16:44     ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2016-08-22 16:19 UTC (permalink / raw)
  To: Paul E. McKenney, linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Ding Tianhong


[SNIP]
> 
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]

This contradicts...

> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/tree_plugin.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 0082fce402a0..85c5a883c6e3 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
>  				cl++;
>  			c++;
>  			local_bh_enable();
> +			cond_resched_rcu_qs();

with what's here?
>  			list = next;
>  		}
>  		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> 

* Re: [PATCH tip/core/rcu 1/5] rcu: Fix soft lockup for rcu_nocb_kthread
  2016-08-22 16:19   ` Nikolay Borisov
@ 2016-08-22 16:44     ` Paul E. McKenney
  2016-08-22 16:47       ` Nikolay Borisov
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2016-08-22 16:44 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, fweisbec, oleg, bobby.prani, Ding Tianhong

On Mon, Aug 22, 2016 at 07:19:53PM +0300, Nikolay Borisov wrote:
> 
> [SNIP]
> > 
> > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> > [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
> 
> This contradicts...
> 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/tree_plugin.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 0082fce402a0..85c5a883c6e3 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
> >  				cl++;
> >  			c++;
> >  			local_bh_enable();
> > +			cond_resched_rcu_qs();
> 
> with what's here?

Ding Tianhong's original patch:

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1167918.html

had cond_resched() here, which works for his workload, but can result
in stall warnings in other cases.  I therefore changed his cond_resched()
to the cond_resched_rcu_qs() that you see above, and documented this
change in the "paulmck" note after Ding Tianhong's Signed-off-by.
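
For context, cond_resched_rcu_qs() was at that time roughly the
following (from memory; the exact definition lives in
include/linux/rcupdate.h):

	#define cond_resched_rcu_qs() \
	do { \
		if (!cond_resched()) \
			rcu_note_voluntary_context_switch(current); \
	} while (0)

In other words, it acts like cond_resched(), but also reports a
voluntary quiescent state to RCU even when there is nothing to
reschedule to, which is what avoids the stall warnings.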

So all is as it should be.

							Thanx, Paul

> >  			list = next;
> >  		}
> >  		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> > 
> 

* Re: [PATCH tip/core/rcu 1/5] rcu: Fix soft lockup for rcu_nocb_kthread
  2016-08-22 16:44     ` Paul E. McKenney
@ 2016-08-22 16:47       ` Nikolay Borisov
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2016-08-22 16:47 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, fweisbec, oleg, bobby.prani, Ding Tianhong



On 22.08.2016 19:44, Paul E. McKenney wrote:
> On Mon, Aug 22, 2016 at 07:19:53PM +0300, Nikolay Borisov wrote:
>>
>> [SNIP]
>>>
>>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>>> [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
>>
>> This contradicts...
>>
>>> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>> ---
>>>  kernel/rcu/tree_plugin.h | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>> index 0082fce402a0..85c5a883c6e3 100644
>>> --- a/kernel/rcu/tree_plugin.h
>>> +++ b/kernel/rcu/tree_plugin.h
>>> @@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
>>>  				cl++;
>>>  			c++;
>>>  			local_bh_enable();
>>> +			cond_resched_rcu_qs();
>>
>> with what's here?
> 
> Ding Tianhong's original patch:
> 
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1167918.html
> 
> had cond_resched() here, which works for his workload, but can result
> in stall warnings in other cases.  I therefore changed his cond_resched()
> to the cond_resched_rcu_qs() that you see above, and documented this
> change in the "paulmck" note after Ding Tianhong's Signed-off-by.

I think my English failed me, since I read your paulmck note as "we had
cond_resched_rcu_qs initially and I replaced it with cond_resched".  But
apparently it was the opposite.

Cheers,
Nik

> 
> So all is as it should be.
> 
> 							Thanx, Paul
> 
>>>  			list = next;
>>>  		}
>>>  		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
>>>
>>
> 

* Re: [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline
  2016-08-22 15:30 ` [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline Paul E. McKenney
@ 2016-08-22 22:57   ` Wanpeng Li
  2016-08-23  0:45     ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2016-08-22 22:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Ingo Molnar, Lai Jiangshan, dipankar,
	Andrew Morton, Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, dhowells, edumazet, dvhart,
	Frédéric Weisbecker, oleg, bobby.prani

2016-08-22 23:30 GMT+08:00 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> Both timers and hrtimers are maintained on the outgoing CPU until
> CPU_DEAD time, at which point they are migrated to a surviving CPU.  If a
> mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
> will splat in native_smp_send_reschedule() when attempting to wake up
> the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:
>
> [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
> [ 7976.741595] Modules linked in:
> [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
> [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [ 7976.741595]  0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
> [ 7976.741595]  0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
> [ 7976.741595]  0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
> [ 7976.741595] Call Trace:
> [ 7976.741595]  [<ffffffff8138ab2e>] dump_stack+0x67/0x99
> [ 7976.741595]  [<ffffffff8105cabc>] __warn+0xcc/0xf0
> [ 7976.741595]  [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
> [ 7976.741595]  [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
> [ 7976.741595]  [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
> [ 7976.741595]  [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
> [ 7976.741595]  [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
> [ 7976.741595]  [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
> [ 7976.741595]  [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
> [ 7976.741595]  [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
> [ 7976.741595]  [<ffffffff8108068f>] kthread+0xdf/0x100
> [ 7976.741595]  [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
> [ 7976.741595]  [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200
>
> However, in this case, the wakeup is redundant, because the timer
> migration will reprogram timer hardware as needed.  Note that the fact
> that preemption is disabled does not avoid the splat, as the offline
> operation has already passed both the synchronize_sched() and the
> stop_machine() that would be blocked by disabled preemption.
>
> This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
> to wake up offline CPUs.  It also adds a comment stating that the
> caller must tolerate lost wakeups when the target CPU is going offline,
> and suggesting the CPU_DEAD notifier as a recovery mechanism.

Interesting, I posted a patch several weeks ago that fixes another
similar issue: https://lkml.org/lkml/2016/8/4/143  Anyway, does my
patch also fix your bug?

Regards,
Wanpeng Li

* Re: [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline
  2016-08-22 22:57   ` Wanpeng Li
@ 2016-08-23  0:45     ` Paul E. McKenney
  2016-08-23  0:47       ` Wanpeng Li
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2016-08-23  0:45 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, Ingo Molnar, Lai Jiangshan, dipankar,
	Andrew Morton, Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, dhowells, edumazet, dvhart,
	Frédéric Weisbecker, oleg, bobby.prani

On Tue, Aug 23, 2016 at 06:57:20AM +0800, Wanpeng Li wrote:
> 2016-08-22 23:30 GMT+08:00 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > Both timers and hrtimers are maintained on the outgoing CPU until
> > CPU_DEAD time, at which point they are migrated to a surviving CPU.  If a
> > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
> > will splat in native_smp_send_reschedule() when attempting to wake up
> > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:
> >
> > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595] Modules linked in:
> > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
> > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 7976.741595]  0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
> > [ 7976.741595]  0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
> > [ 7976.741595]  0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
> > [ 7976.741595] Call Trace:
> > [ 7976.741595]  [<ffffffff8138ab2e>] dump_stack+0x67/0x99
> > [ 7976.741595]  [<ffffffff8105cabc>] __warn+0xcc/0xf0
> > [ 7976.741595]  [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
> > [ 7976.741595]  [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595]  [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
> > [ 7976.741595]  [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
> > [ 7976.741595]  [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
> > [ 7976.741595]  [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
> > [ 7976.741595]  [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
> > [ 7976.741595]  [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
> > [ 7976.741595]  [<ffffffff8108068f>] kthread+0xdf/0x100
> > [ 7976.741595]  [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
> > [ 7976.741595]  [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200
> >
> > However, in this case, the wakeup is redundant, because the timer
> > migration will reprogram timer hardware as needed.  Note that the fact
> > that preemption is disabled does not avoid the splat, as the offline
> > operation has already passed both the synchronize_sched() and the
> > stop_machine() that would be blocked by disabled preemption.
> >
> > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
> > to wake up offline CPUs.  It also adds a comment stating that the
> > caller must tolerate lost wakeups when the target CPU is going offline,
> > and suggesting the CPU_DEAD notifier as a recovery mechanism.
> 
> Interesting, I posted a patch several weeks ago that fixes another
> similar issue: https://lkml.org/lkml/2016/8/4/143  Anyway, does my
> patch also fix your bug?

I will see your several weeks and raise you more than a month:

http://lkml.kernel.org/g/20160630175845.GA10269@linux.vnet.ibm.com

So you try mine and then I will try yours.  ;-)

Especially given that I am not seeing how the code path in my trace
above reaches your change in sched_can_stop_tick()...

							Thanx, Paul

* Re: [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline
  2016-08-23  0:45     ` Paul E. McKenney
@ 2016-08-23  0:47       ` Wanpeng Li
  0 siblings, 0 replies; 12+ messages in thread
From: Wanpeng Li @ 2016-08-23  0:47 UTC (permalink / raw)
  To: Paul McKenney
  Cc: linux-kernel, Ingo Molnar, Lai Jiangshan, dipankar,
	Andrew Morton, Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, dhowells, Eric Dumazet, dvhart,
	Frédéric Weisbecker, oleg, pranith kumar

2016-08-23 8:45 GMT+08:00 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Tue, Aug 23, 2016 at 06:57:20AM +0800, Wanpeng Li wrote:
>> 2016-08-22 23:30 GMT+08:00 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
>> > Both timers and hrtimers are maintained on the outgoing CPU until
>> > CPU_DEAD time, at which point they are migrated to a surviving CPU.  If a
>> > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
>> > will splat in native_smp_send_reschedule() when attempting to wake up
>> > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:
>> >
>> > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
>> > [ 7976.741595] Modules linked in:
>> > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
>> > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> > [ 7976.741595]  0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
>> > [ 7976.741595]  0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
>> > [ 7976.741595]  0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
>> > [ 7976.741595] Call Trace:
>> > [ 7976.741595]  [<ffffffff8138ab2e>] dump_stack+0x67/0x99
>> > [ 7976.741595]  [<ffffffff8105cabc>] __warn+0xcc/0xf0
>> > [ 7976.741595]  [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
>> > [ 7976.741595]  [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
>> > [ 7976.741595]  [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
>> > [ 7976.741595]  [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
>> > [ 7976.741595]  [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
>> > [ 7976.741595]  [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
>> > [ 7976.741595]  [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
>> > [ 7976.741595]  [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
>> > [ 7976.741595]  [<ffffffff8108068f>] kthread+0xdf/0x100
>> > [ 7976.741595]  [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
>> > [ 7976.741595]  [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200
>> >
>> > However, in this case, the wakeup is redundant, because the timer
>> > migration will reprogram timer hardware as needed.  Note that the fact
>> > that preemption is disabled does not avoid the splat, as the offline
>> > operation has already passed both the synchronize_sched() and the
>> > stop_machine() that would be blocked by disabled preemption.
>> >
>> > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
>> > to wake up offline CPUs.  It also adds a comment stating that the
>> > caller must tolerate lost wakeups when the target CPU is going offline,
>> > and suggesting the CPU_DEAD notifier as a recovery mechanism.
>>
>> Interesting, I posted a patch several weeks ago that fixes another
>> similar issue: https://lkml.org/lkml/2016/8/4/143  Anyway, does my
>> patch also fix your bug?
>
> I will see your several weeks and raise you more than a month:
>
> http://lkml.kernel.org/g/20160630175845.GA10269@linux.vnet.ibm.com
>
> So you try mine and then I will try yours.  ;-)
>
> Especially given that I am not seeing how the code path in my trace
> above reaches your change in sched_can_stop_tick()...

Agreed, they are different bugs.

Regards,
Wanpeng Li
