linux-kernel.vger.kernel.org archive mirror
* [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
@ 2017-01-13  2:38 Paul E. McKenney
  2017-01-13 11:25 ` Borislav Petkov
  2017-01-13 17:02 ` Borislav Petkov
  0 siblings, 2 replies; 12+ messages in thread
From: Paul E. McKenney @ 2017-01-13  2:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: lv.zheng, bp, stan.kain, waffolz, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, mingo, torvalds, rafael

The current preemptible RCU implementation goes through three phases
during bootup.  In the first phase, there is only one CPU that is running
with preemption disabled, so that a synchronous grace period is a no-op.
In the second mid-boot phase, the scheduler is running, but RCU has
not yet gotten its kthreads spawned (and, for expedited grace periods,
workqueues are not yet running).  During this time, any attempt to do
a synchronous grace period will hang the system (or complain bitterly,
depending).  In the third and final phase, RCU is fully operational and
everything works normally.

This has been OK for some time, but synchronous grace periods have
recently been showing up during the second mid-boot phase.
This commit therefore reworks RCU to permit synchronous grace periods
to proceed during this mid-boot phase.

This commit accomplishes this by setting a flag from the existing
rcu_scheduler_starting() function which causes all synchronous grace
periods to take the expedited path.  The expedited path now checks this
flag, using the requesting task to drive the expedited grace period
forward during the mid-boot phase.  Finally, this flag is updated by a
core_initcall() function named rcu_exp_runtime_mode(), which causes the
runtime codepaths to be used.

Note that this arrangement assumes that tasks are not sent POSIX signals
(or anything similar) from the time that the first task is spawned
through core_initcall() time.

Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 321f9ed552a9..01f71e1d2e94 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -444,6 +444,10 @@ bool __rcu_is_watching(void);
 #error "Unknown RCU implementation specified to kernel configuration"
 #endif
 
+#define RCU_SCHEDULER_INACTIVE	0
+#define RCU_SCHEDULER_INIT	1
+#define RCU_SCHEDULER_RUNNING	2
+
 /*
  * init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic
  * initialization and destruction of rcu_head on the stack. rcu_head structures
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 80adef7d4c3d..0d6ff3e471be 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -136,6 +136,7 @@ int rcu_jiffies_till_stall_check(void);
 #define TPS(x)  tracepoint_string(x)
 
 void rcu_early_boot_tests(void);
+void rcu_test_sync_prims(void);
 
 /*
  * This function really isn't for public consumption, but RCU is special in
diff --git a/kernel/rcu/tiny_plugin.h b/kernel/rcu/tiny_plugin.h
index 196f0302e2f4..e3953bdee383 100644
--- a/kernel/rcu/tiny_plugin.h
+++ b/kernel/rcu/tiny_plugin.h
@@ -65,7 +65,7 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
 void __init rcu_scheduler_starting(void)
 {
 	WARN_ON(nr_context_switches() > 0);
-	rcu_scheduler_active = 1;
+	rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
 }
 
 #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 96c52e43f7ca..7bcce4607863 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -127,13 +127,16 @@ int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
 int sysctl_panic_on_rcu_stall __read_mostly;
 
 /*
- * The rcu_scheduler_active variable transitions from zero to one just
- * before the first task is spawned.  So when this variable is zero, RCU
+ * The rcu_scheduler_active variable transitions from
+ * RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT just before the first
+ * task is spawned.  So when this variable is RCU_SCHEDULER_INACTIVE, RCU
  * can assume that there is but one task, allowing RCU to (for example)
  * optimize synchronize_rcu() to a simple barrier().  When this variable
  * is one, RCU must actually do all the hard work required to detect real
  * grace periods.  This variable is also used to suppress boot-time false
- * positives from lockdep-RCU error checking.
+ * positives from lockdep-RCU error checking.  Finally, this variable
+ * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU
+ * is fully initialized, including all of its tasks having been spawned.
  */
 int rcu_scheduler_active __read_mostly;
 EXPORT_SYMBOL_GPL(rcu_scheduler_active);
@@ -3980,18 +3983,22 @@ static int __init rcu_spawn_gp_kthread(void)
 early_initcall(rcu_spawn_gp_kthread);
 
 /*
- * This function is invoked towards the end of the scheduler's initialization
- * process.  Before this is called, the idle task might contain
- * RCU read-side critical sections (during which time, this idle
- * task is booting the system).  After this function is called, the
- * idle tasks are prohibited from containing RCU read-side critical
- * sections.  This function also enables RCU lockdep checking.
+ * This function is invoked towards the end of the scheduler's
+ * initialization process.  Before this is called, the idle task might
+ * contain synchronous grace-period primitives (during which time, this idle
+ * task is booting the system, and such primitives are no-ops).  After this
+ * function is called, any synchronous grace-period primitives are run as
+ * expedited, with the requesting task driving the grace period forward.
+ * A later core_initcall() rcu_exp_runtime_mode() will switch to full
+ * runtime RCU functionality.
  */
 void rcu_scheduler_starting(void)
 {
 	WARN_ON(num_online_cpus() != 1);
 	WARN_ON(nr_context_switches() > 0);
-	rcu_scheduler_active = 1;
+	rcu_test_sync_prims();
+	rcu_scheduler_active = RCU_SCHEDULER_INIT;
+	rcu_test_sync_prims();
 }
 
 /*
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index d3053e99fdb6..e59e1849b89a 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -532,18 +532,28 @@ struct rcu_exp_work {
 };
 
 /*
+ * Common code to drive an expedited grace period forward, used by
+ * workqueues and mid-boot-time tasks.
+ */
+static void rcu_exp_sel_wait_wake(struct rcu_state *rsp,
+				  smp_call_func_t func, unsigned long s)
+{
+	/* Initialize the rcu_node tree in preparation for the wait. */
+	sync_rcu_exp_select_cpus(rsp, func);
+
+	/* Wait and clean up, including waking everyone. */
+	rcu_exp_wait_wake(rsp, s);
+}
+
+/*
  * Work-queue handler to drive an expedited grace period forward.
  */
 static void wait_rcu_exp_gp(struct work_struct *wp)
 {
 	struct rcu_exp_work *rewp;
 
-	/* Initialize the rcu_node tree in preparation for the wait. */
 	rewp = container_of(wp, struct rcu_exp_work, rew_work);
-	sync_rcu_exp_select_cpus(rewp->rew_rsp, rewp->rew_func);
-
-	/* Wait and clean up, including waking everyone. */
-	rcu_exp_wait_wake(rewp->rew_rsp, rewp->rew_s);
+	rcu_exp_sel_wait_wake(rewp->rew_rsp, rewp->rew_func, rewp->rew_s);
 }
 
 /*
@@ -569,12 +579,18 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
 	if (exp_funnel_lock(rsp, s))
 		return;  /* Someone else did our work for us. */
 
-	/* Marshall arguments and schedule the expedited grace period. */
-	rew.rew_func = func;
-	rew.rew_rsp = rsp;
-	rew.rew_s = s;
-	INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
-	schedule_work(&rew.rew_work);
+	/* Ensure that load happens before action based on it. */
+	if (unlikely(rcu_scheduler_active == RCU_SCHEDULER_INIT)) {
+		/* Direct call during scheduler init and early_initcalls(). */
+		rcu_exp_sel_wait_wake(rsp, func, s);
+	} else {
+		/* Marshall arguments & schedule the expedited grace period. */
+		rew.rew_func = func;
+		rew.rew_rsp = rsp;
+		rew.rew_s = s;
+		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
+		schedule_work(&rew.rew_work);
+	}
 
 	/* Wait for expedited grace period to complete. */
 	rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
@@ -676,6 +692,8 @@ void synchronize_rcu_expedited(void)
 {
 	struct rcu_state *rsp = rcu_state_p;
 
+	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
+		return;
 	_synchronize_rcu_expedited(rsp, sync_rcu_exp_handler);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
@@ -693,3 +711,15 @@ void synchronize_rcu_expedited(void)
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+
+/*
+ * Switch to run-time mode once Tree RCU has fully initialized.
+ */
+static int __init rcu_exp_runtime_mode(void)
+{
+	rcu_test_sync_prims();
+	rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
+	rcu_test_sync_prims();
+	return 0;
+}
+core_initcall(rcu_exp_runtime_mode);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 85c5a883c6e3..56583e764ebf 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -670,7 +670,7 @@ void synchronize_rcu(void)
 			 lock_is_held(&rcu_lock_map) ||
 			 lock_is_held(&rcu_sched_lock_map),
 			 "Illegal synchronize_rcu() in RCU read-side critical section");
-	if (!rcu_scheduler_active)
+	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
 		return;
 	if (rcu_gp_is_expedited())
 		synchronize_rcu_expedited();
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index f19271dce0a9..ac2fc9ef2867 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -121,11 +121,14 @@ EXPORT_SYMBOL(rcu_read_lock_sched_held);
  * Should expedited grace-period primitives always fall back to their
  * non-expedited counterparts?  Intended for use within RCU.  Note
  * that if the user specifies both rcu_expedited and rcu_normal, then
- * rcu_normal wins.
+ * rcu_normal wins.  (Except during the time period during boot from
+ * when the first task is spawned until the rcu_exp_runtime_mode()
+ * core_initcall() is invoked, at which point everything is expedited.)
  */
 bool rcu_gp_is_normal(void)
 {
-	return READ_ONCE(rcu_normal);
+	return READ_ONCE(rcu_normal) &&
+	       rcu_scheduler_active != RCU_SCHEDULER_INIT;
 }
 EXPORT_SYMBOL_GPL(rcu_gp_is_normal);
 
@@ -135,13 +138,14 @@ static atomic_t rcu_expedited_nesting =
 /*
  * Should normal grace-period primitives be expedited?  Intended for
  * use within RCU.  Note that this function takes the rcu_expedited
- * sysfs/boot variable into account as well as the rcu_expedite_gp()
- * nesting.  So looping on rcu_unexpedite_gp() until rcu_gp_is_expedited()
- * returns false is a -really- bad idea.
+ * sysfs/boot variable and rcu_scheduler_active into account as well
+ * as the rcu_expedite_gp() nesting.  So looping on rcu_unexpedite_gp()
+ * until rcu_gp_is_expedited() returns false is a -really- bad idea.
  */
 bool rcu_gp_is_expedited(void)
 {
-	return rcu_expedited || atomic_read(&rcu_expedited_nesting);
+	return rcu_expedited || atomic_read(&rcu_expedited_nesting) ||
+	       rcu_scheduler_active == RCU_SCHEDULER_INIT;
 }
 EXPORT_SYMBOL_GPL(rcu_gp_is_expedited);
 
@@ -257,7 +261,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map);
 
 int notrace debug_lockdep_rcu_enabled(void)
 {
-	return rcu_scheduler_active && debug_locks &&
+	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
 	       current->lockdep_recursion == 0;
 }
 EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
@@ -591,7 +595,7 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks);
 void synchronize_rcu_tasks(void)
 {
 	/* Complain if the scheduler has not started.  */
-	RCU_LOCKDEP_WARN(!rcu_scheduler_active,
+	RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
 			 "synchronize_rcu_tasks called too soon");
 
 	/* Wait for the grace period. */
@@ -813,6 +817,23 @@ static void rcu_spawn_tasks_kthread(void)
 
 #endif /* #ifdef CONFIG_TASKS_RCU */
 
+/*
+ * Test each non-SRCU synchronous grace-period wait API.  This is
+ * useful just after a change in mode for these primitives, and
+ * during early boot.
+ */
+void rcu_test_sync_prims(void)
+{
+	if (!IS_ENABLED(CONFIG_PROVE_RCU))
+		return;
+	synchronize_rcu();
+	synchronize_rcu_bh();
+	synchronize_sched();
+	synchronize_rcu_expedited();
+	synchronize_rcu_bh_expedited();
+	synchronize_sched_expedited();
+}
+
 #ifdef CONFIG_PROVE_RCU
 
 /*
@@ -865,6 +886,7 @@ void rcu_early_boot_tests(void)
 		early_boot_test_call_rcu_bh();
 	if (rcu_self_test_sched)
 		early_boot_test_call_rcu_sched();
+	rcu_test_sync_prims();
 }
 
 static int rcu_verify_early_boot_tests(void)


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-13  2:38 [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods Paul E. McKenney
@ 2017-01-13 11:25 ` Borislav Petkov
  2017-01-14  8:00   ` Paul E. McKenney
  2017-01-13 17:02 ` Borislav Petkov
  1 sibling, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2017-01-13 11:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, lv.zheng, stan.kain, waffolz, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, mingo, torvalds, rafael

On Thu, Jan 12, 2017 at 06:38:07PM -0800, Paul E. McKenney wrote:
> The current preemptible RCU implementation goes through three phases
> during bootup.  In the first phase, there is only one CPU that is running
> with preemption disabled, so that a synchronous grace period is a no-op.
> In the second mid-boot phase, the scheduler is running, but RCU has
> not yet gotten its kthreads spawned (and, for expedited grace periods,
> workqueues are not yet running).  During this time, any attempt to do
> a synchronous grace period will hang the system (or complain bitterly,
> depending).  In the third and final phase, RCU is fully operational and
> everything works normally.
> 
> This has been OK for some time, but synchronous grace periods have
> recently been showing up during the second mid-boot phase.

You probably should add the callchain from the thread as an example and
for future reference:

early_amd_iommu_init()
|-> acpi_put_table(ivrs_base);
|-> acpi_tb_put_table(table_desc);
|-> acpi_tb_invalidate_table(table_desc);
|-> acpi_tb_release_table(...)
|-> acpi_os_unmap_memory
|-> acpi_os_unmap_iomem
|-> acpi_os_map_cleanup
|-> synchronize_rcu_expedited   <-- the kernel/rcu/tree_exp.h version with CONFIG_PREEMPT_RCU=y

> This commit therefore reworks RCU to permit synchronous grace periods

Just say "Rework RCU to ..."

"This commit" and "This patch" and the such are not needed in commit
messages.

> to proceed during this mid-boot phase.
> 
> This commit accomplishes this by setting a flag from the existing
> rcu_scheduler_starting() function which causes all synchronous grace
> periods to take the expedited path.  The expedited path now checks this
> flag, using the requesting task to drive the expedited grace period
> forward during the mid-boot phase.  Finally, this flag is updated by a
> core_initcall() function named rcu_exp_runtime_mode(), which causes the
> runtime codepaths to be used.
> 
> Note that this arrangement assumes that tasks are not sent POSIX signals
> (or anything similar) from the time that the first task is spawned
> through core_initcall() time.
> 
> Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
> Reported-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 321f9ed552a9..01f71e1d2e94 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -444,6 +444,10 @@ bool __rcu_is_watching(void);
>  #error "Unknown RCU implementation specified to kernel configuration"
>  #endif
>  
> +#define RCU_SCHEDULER_INACTIVE	0
> +#define RCU_SCHEDULER_INIT	1
> +#define RCU_SCHEDULER_RUNNING	2
> +
>  /*
>   * init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic
>   * initialization and destruction of rcu_head on the stack. rcu_head structures
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 80adef7d4c3d..0d6ff3e471be 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -136,6 +136,7 @@ int rcu_jiffies_till_stall_check(void);
>  #define TPS(x)  tracepoint_string(x)
>  
>  void rcu_early_boot_tests(void);
> +void rcu_test_sync_prims(void);
>  
>  /*
>   * This function really isn't for public consumption, but RCU is special in
> diff --git a/kernel/rcu/tiny_plugin.h b/kernel/rcu/tiny_plugin.h
> index 196f0302e2f4..e3953bdee383 100644
> --- a/kernel/rcu/tiny_plugin.h
> +++ b/kernel/rcu/tiny_plugin.h
> @@ -65,7 +65,7 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
>  void __init rcu_scheduler_starting(void)
>  {
>  	WARN_ON(nr_context_switches() > 0);
> -	rcu_scheduler_active = 1;
> +	rcu_scheduler_active = RCU_SCHEDULER_RUNNING;

This tiny RCU version is setting _RUNNING while the
kernel/rcu/tree.c-one is setting it to _INIT. The tiny bypasses the
_INIT step now?

I'm guessing because you've added a third state - the _RUNNING and
tiny doesn't need the intermediary _INIT, it is being set straight to
_RUNNING...

>  #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 96c52e43f7ca..7bcce4607863 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -127,13 +127,16 @@ int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
>  int sysctl_panic_on_rcu_stall __read_mostly;
>  
>  /*
> - * The rcu_scheduler_active variable transitions from zero to one just
> - * before the first task is spawned.  So when this variable is zero, RCU
> + * The rcu_scheduler_active variable transitions from
> + * RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT just before the first
> + * task is spawned.  So when this variable is RCU_SCHEDULER_INACTIVE, RCU
>   * can assume that there is but one task, allowing RCU to (for example)
>   * optimize synchronize_rcu() to a simple barrier().  When this variable
>   * is one, RCU must actually do all the hard work required to detect real

... is RCU_SCHEDULER_INIT, RCU must ...

>   * grace periods.  This variable is also used to suppress boot-time false
> - * positives from lockdep-RCU error checking.
> + * positives from lockdep-RCU error checking.  Finally, this variable

					       .  Finally, it...

By now we know it is this variable :-)

> + * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU
> + * is fully initialized, including all of its tasks having been spawned.

s/tasks/kthreads/ ?

Should make it clearer...

...

> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index d3053e99fdb6..e59e1849b89a 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -532,18 +532,28 @@ struct rcu_exp_work {
>  };
>  
>  /*
> + * Common code to drive an expedited grace period forward, used by
> + * workqueues and mid-boot-time tasks.
> + */
> +static void rcu_exp_sel_wait_wake(struct rcu_state *rsp,
> +				  smp_call_func_t func, unsigned long s)
> +{
> +	/* Initialize the rcu_node tree in preparation for the wait. */
> +	sync_rcu_exp_select_cpus(rsp, func);
> +
> +	/* Wait and clean up, including waking everyone. */
> +	rcu_exp_wait_wake(rsp, s);
> +}
> +
> +/*
>   * Work-queue handler to drive an expedited grace period forward.
>   */
>  static void wait_rcu_exp_gp(struct work_struct *wp)
>  {
>  	struct rcu_exp_work *rewp;
>  
> -	/* Initialize the rcu_node tree in preparation for the wait. */
>  	rewp = container_of(wp, struct rcu_exp_work, rew_work);
> -	sync_rcu_exp_select_cpus(rewp->rew_rsp, rewp->rew_func);
> -
> -	/* Wait and clean up, including waking everyone. */
> -	rcu_exp_wait_wake(rewp->rew_rsp, rewp->rew_s);
> +	rcu_exp_sel_wait_wake(rewp->rew_rsp, rewp->rew_func, rewp->rew_s);
>  }
>  
>  /*
> @@ -569,12 +579,18 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
>  	if (exp_funnel_lock(rsp, s))
>  		return;  /* Someone else did our work for us. */
>  
> -	/* Marshall arguments and schedule the expedited grace period. */
> -	rew.rew_func = func;
> -	rew.rew_rsp = rsp;
> -	rew.rew_s = s;
> -	INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> -	schedule_work(&rew.rew_work);
> +	/* Ensure that load happens before action based on it. */
> +	if (unlikely(rcu_scheduler_active == RCU_SCHEDULER_INIT)) {

Ok, so this variable is, AFAICT, on some hot paths. And we will query
it each time we synchronize_sched() when we decide to do the expedited
grace periods. There's that rcu_gp_is_expedited() which decides but I
don't have an idea on which paths that happens...

In any case, should we make this var a jump label or so which gets
patched properly or are the expedited paths comparatively seldom?

> +		/* Direct call during scheduler init and early_initcalls(). */
> +		rcu_exp_sel_wait_wake(rsp, func, s);
> +	} else {
> +		/* Marshall arguments & schedule the expedited grace period. */
> +		rew.rew_func = func;
> +		rew.rew_rsp = rsp;
> +		rew.rew_s = s;
> +		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> +		schedule_work(&rew.rew_work);
> +	}
>  
>  	/* Wait for expedited grace period to complete. */
>  	rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());

Rest looks ok to me but WTH do I know about RCU internals...

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-13  2:38 [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods Paul E. McKenney
  2017-01-13 11:25 ` Borislav Petkov
@ 2017-01-13 17:02 ` Borislav Petkov
  1 sibling, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2017-01-13 17:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, lv.zheng, stan.kain, waffolz, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, mingo, torvalds, rafael

On Thu, Jan 12, 2017 at 06:38:07PM -0800, Paul E. McKenney wrote:
> The current preemptible RCU implementation goes through three phases
> during bootup.  In the first phase, there is only one CPU that is running
> with preemption disabled, so that a synchronous grace period is a no-op.
> In the second mid-boot phase, the scheduler is running, but RCU has
> not yet gotten its kthreads spawned (and, for expedited grace periods,
> workqueues are not yet running).  During this time, any attempt to do
> a synchronous grace period will hang the system (or complain bitterly,
> depending).  In the third and final phase, RCU is fully operational and
> everything works normally.
> 
> This has been OK for some time, but synchronous grace periods have
> recently been showing up during the second mid-boot phase.
> This commit therefore reworks RCU to permit synchronous grace periods
> to proceed during this mid-boot phase.
> 
> This commit accomplishes this by setting a flag from the existing
> rcu_scheduler_starting() function which causes all synchronous grace
> periods to take the expedited path.  The expedited path now checks this
> flag, using the requesting task to drive the expedited grace period
> forward during the mid-boot phase.  Finally, this flag is updated by a
> core_initcall() function named rcu_exp_runtime_mode(), which causes the
> runtime codepaths to be used.
> 
> Note that this arrangement assumes that tasks are not sent POSIX signals
> (or anything similar) from the time that the first task is spawned
> through core_initcall() time.
> 
> Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
> Reported-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Tested on a bunch of boxes I have access to, looks good.

Tested-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-13 11:25 ` Borislav Petkov
@ 2017-01-14  8:00   ` Paul E. McKenney
  2017-01-14 10:35     ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2017-01-14  8:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, lv.zheng, stan.kain, waffolz, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, mingo, torvalds, rafael

On Fri, Jan 13, 2017 at 12:25:19PM +0100, Borislav Petkov wrote:
> On Thu, Jan 12, 2017 at 06:38:07PM -0800, Paul E. McKenney wrote:
> > The current preemptible RCU implementation goes through three phases
> > during bootup.  In the first phase, there is only one CPU that is running
> > with preemption disabled, so that a synchronous grace period is a no-op.
> > In the second mid-boot phase, the scheduler is running, but RCU has
> > not yet gotten its kthreads spawned (and, for expedited grace periods,
> > workqueues are not yet running).  During this time, any attempt to do
> > a synchronous grace period will hang the system (or complain bitterly,
> > depending).  In the third and final phase, RCU is fully operational and
> > everything works normally.
> > 
> > This has been OK for some time, but synchronous grace periods have
> > recently been showing up during the second mid-boot phase.
> 
> You probably should add the callchain from the thread as an example and
> for future reference:
> 
> early_amd_iommu_init()
> |-> acpi_put_table(ivrs_base);
> |-> acpi_tb_put_table(table_desc);
> |-> acpi_tb_invalidate_table(table_desc);
> |-> acpi_tb_release_table(...)
> |-> acpi_os_unmap_memory
> |-> acpi_os_unmap_iomem
> |-> acpi_os_map_cleanup
> |-> synchronize_rcu_expedited   <-- the kernel/rcu/tree_exp.h version with CONFIG_PREEMPT_RCU=y
> 
> > This commit therefore reworks RCU to permit synchronous grace periods

It now looks like this:

------------------------------------------------------------------------

Note that the code was buggy even before this commit, as it was subject
to failure on real-time systems that forced all expedited grace periods
to run as normal grace periods (for example, using the rcu_normal ksysfs
parameter).  The callchain from the failure case is as follows:

early_amd_iommu_init()
|-> acpi_put_table(ivrs_base);
|-> acpi_tb_put_table(table_desc);
|-> acpi_tb_invalidate_table(table_desc);
|-> acpi_tb_release_table(...)
|-> acpi_os_unmap_memory
|-> acpi_os_unmap_iomem
|-> acpi_os_map_cleanup
|-> synchronize_rcu_expedited

The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
which caused the code to try using workqueues before they were
initialized, which did not go well.

------------------------------------------------------------------------

Does that work?

> Just say "Rework RCU to ..."
> 
> "This commit" and "This patch" and the such are not needed in commit
> messages.

Fair point, but this wording appears in almost all of my patches.  ;-)

My rationale is that it provides a clear transition from describing the
problem to introducing the solution.

> > to proceed during this mid-boot phase.
> > 
> > This commit accomplishes this by setting a flag from the existing
> > rcu_scheduler_starting() function which causes all synchronous grace
> > periods to take the expedited path.  The expedited path now checks this
> > flag, using the requesting task to drive the expedited grace period
> > forward during the mid-boot phase.  Finally, this flag is updated by a
> > core_initcall() function named rcu_exp_runtime_mode(), which causes the
> > runtime codepaths to be used.
> > 
> > Note that this arrangement assumes that tasks are not sent POSIX signals
> > (or anything similar) from the time that the first task is spawned
> > through core_initcall() time.
> > 
> > Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
> > Reported-by: Borislav Petkov <bp@alien8.de>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 321f9ed552a9..01f71e1d2e94 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -444,6 +444,10 @@ bool __rcu_is_watching(void);
> >  #error "Unknown RCU implementation specified to kernel configuration"
> >  #endif
> >  
> > +#define RCU_SCHEDULER_INACTIVE	0
> > +#define RCU_SCHEDULER_INIT	1
> > +#define RCU_SCHEDULER_RUNNING	2
> > +
> >  /*
> >   * init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic
> >   * initialization and destruction of rcu_head on the stack. rcu_head structures
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 80adef7d4c3d..0d6ff3e471be 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -136,6 +136,7 @@ int rcu_jiffies_till_stall_check(void);
> >  #define TPS(x)  tracepoint_string(x)
> >  
> >  void rcu_early_boot_tests(void);
> > +void rcu_test_sync_prims(void);
> >  
> >  /*
> >   * This function really isn't for public consumption, but RCU is special in
> > diff --git a/kernel/rcu/tiny_plugin.h b/kernel/rcu/tiny_plugin.h
> > index 196f0302e2f4..e3953bdee383 100644
> > --- a/kernel/rcu/tiny_plugin.h
> > +++ b/kernel/rcu/tiny_plugin.h
> > @@ -65,7 +65,7 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
> >  void __init rcu_scheduler_starting(void)
> >  {
> >  	WARN_ON(nr_context_switches() > 0);
> > -	rcu_scheduler_active = 1;
> > +	rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
> 
> This tiny RCU version is setting _RUNNING while the
> kernel/rcu/tree.c-one is setting it to _INIT. The tiny bypasses the
> _INIT step now?
> 
> I'm guessing because you've added a third state - the _RUNNING and
> tiny doesn't need the intermediary _INIT, it is being set straight to
> _RUNNING...

Exactly, but yes, worth a comment.

The header comment for rcu_scheduler_starting() is now as follows:

/*
 * During boot, we forgive RCU lockdep issues.  After this function is
 * invoked, we start taking RCU lockdep issues seriously.  Note that unlike
 * Tree RCU, Tiny RCU transitions directly from RCU_SCHEDULER_INACTIVE
 * to RCU_SCHEDULER_RUNNING, skipping the RCU_SCHEDULER_INIT stage.
 * The reason for this is that Tiny RCU does not need kthreads, so does
 * not have to care about the fact that the scheduler is half-initialized
 * at a certain phase of the boot process.
 */

> >  #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 96c52e43f7ca..7bcce4607863 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -127,13 +127,16 @@ int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> >  int sysctl_panic_on_rcu_stall __read_mostly;
> >  
> >  /*
> > - * The rcu_scheduler_active variable transitions from zero to one just
> > - * before the first task is spawned.  So when this variable is zero, RCU
> > + * The rcu_scheduler_active variable transitions from
> > + * RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT just before the first
> > + * task is spawned.  So when this variable is RCU_SCHEDULER_INACTIVE, RCU
> >   * can assume that there is but one task, allowing RCU to (for example)
> >   * optimize synchronize_rcu() to a simple barrier().  When this variable
> >   * is one, RCU must actually do all the hard work required to detect real
> 
> ... is RCU_SCHEDULER_INIT, RCU must ...
> 
> >   * grace periods.  This variable is also used to suppress boot-time false
> > - * positives from lockdep-RCU error checking.
> > + * positives from lockdep-RCU error checking.  Finally, this variable
> 
> 					       .  Finally, it...
> 
> By now we know it is this variable :-)
> 
> > + * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU
> > + * is fully initialized, including all of its tasks having been spawned.
> 
> s/tasks/kthreads/ ?
> 
> Should make it clearer...

Good catches, fixed!

> ...
> 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index d3053e99fdb6..e59e1849b89a 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -532,18 +532,28 @@ struct rcu_exp_work {
> >  };
> >  
> >  /*
> > + * Common code to drive an expedited grace period forward, used by
> > + * workqueues and mid-boot-time tasks.
> > + */
> > +static void rcu_exp_sel_wait_wake(struct rcu_state *rsp,
> > +				  smp_call_func_t func, unsigned long s)
> > +{
> > +	/* Initialize the rcu_node tree in preparation for the wait. */
> > +	sync_rcu_exp_select_cpus(rsp, func);
> > +
> > +	/* Wait and clean up, including waking everyone. */
> > +	rcu_exp_wait_wake(rsp, s);
> > +}
> > +
> > +/*
> >   * Work-queue handler to drive an expedited grace period forward.
> >   */
> >  static void wait_rcu_exp_gp(struct work_struct *wp)
> >  {
> >  	struct rcu_exp_work *rewp;
> >  
> > -	/* Initialize the rcu_node tree in preparation for the wait. */
> >  	rewp = container_of(wp, struct rcu_exp_work, rew_work);
> > -	sync_rcu_exp_select_cpus(rewp->rew_rsp, rewp->rew_func);
> > -
> > -	/* Wait and clean up, including waking everyone. */
> > -	rcu_exp_wait_wake(rewp->rew_rsp, rewp->rew_s);
> > +	rcu_exp_sel_wait_wake(rewp->rew_rsp, rewp->rew_func, rewp->rew_s);
> >  }
> >  
> >  /*
> > @@ -569,12 +579,18 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
> >  	if (exp_funnel_lock(rsp, s))
> >  		return;  /* Someone else did our work for us. */
> >  
> > -	/* Marshall arguments and schedule the expedited grace period. */
> > -	rew.rew_func = func;
> > -	rew.rew_rsp = rsp;
> > -	rew.rew_s = s;
> > -	INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> > -	schedule_work(&rew.rew_work);
> > +	/* Ensure that load happens before action based on it. */
> > +	if (unlikely(rcu_scheduler_active == RCU_SCHEDULER_INIT)) {
> 
> Ok, so this variable is, AFAICT, on some hot paths. And we will query
> it each time we synchronize_sched() when we decide to do the expedited
> grace periods. There's that rcu_gp_is_expedited() which decides but I
> don't have an idea on which paths that happens...

It is marked __read_mostly for this reason, but I have always assumed that
the synchronous grace-period primitives were off the hotpath.

> In any case, should we make this var a jump label or so which gets
> patched properly or are the expedited paths comparatively seldom?

I believe that this would not buy very much, but if this variable starts
showing up on profiles, then perhaps a jump label would be appropriate.
As a separate patch, though!

> > +		/* Direct call during scheduler init and early_initcalls(). */
> > +		rcu_exp_sel_wait_wake(rsp, func, s);
> > +	} else {
> > +		/* Marshall arguments & schedule the expedited grace period. */
> > +		rew.rew_func = func;
> > +		rew.rew_rsp = rsp;
> > +		rew.rew_s = s;
> > +		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> > +		schedule_work(&rew.rew_work);
> > +	}
> >  
> >  	/* Wait for expedited grace period to complete. */
> >  	rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
> 
> Rest looks ok to me but WTH do I know about RCU internals...

Thank you for your review and comments!

							Thanx, Paul


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-14  8:00   ` Paul E. McKenney
@ 2017-01-14 10:35     ` Borislav Petkov
  2017-01-14 12:27       ` Rafael J. Wysocki
  2017-01-15  5:24       ` Paul E. McKenney
  0 siblings, 2 replies; 12+ messages in thread
From: Borislav Petkov @ 2017-01-14 10:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, lv.zheng, stan.kain, waffolz, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, mingo, torvalds, rafael

On Sat, Jan 14, 2017 at 12:00:22AM -0800, Paul E. McKenney wrote:
> It now looks like this:
> 
> ------------------------------------------------------------------------
> 
> Note that the code was buggy even before this commit, as it was subject
> to failure on real-time systems that forced all expedited grace periods
> to run as normal grace periods (for example, using the rcu_normal ksysfs
> parameter).  The callchain from the failure case is as follows:
> 
> early_amd_iommu_init()
> |-> acpi_put_table(ivrs_base);
> |-> acpi_tb_put_table(table_desc);
> |-> acpi_tb_invalidate_table(table_desc);
> |-> acpi_tb_release_table(...)
> |-> acpi_os_unmap_memory
> |-> acpi_os_unmap_iomem
> |-> acpi_os_map_cleanup
> |-> synchronize_rcu_expedited
> 
> The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
> which caused the code to try using workqueues before they were
> initialized, which did not go well.
> 
> ------------------------------------------------------------------------
> 
> Does that work?

Yap, thanks.

> Fair point, but this wording appears in almost all of my patches.  ;-)

:-)

> My rationale is that it provides a clear transition from describing the
> problem to introducing the solution.

Fair enough.

> Exactly, but yes, worth a comment.
> 
> The header comment for rcu_scheduler_starting() is now as follows:
> 
> /*
>  * During boot, we forgive RCU lockdep issues.  After this function is
>  * invoked, we start taking RCU lockdep issues seriously.  Note that unlike
>  * Tree RCU, Tiny RCU transitions directly from RCU_SCHEDULER_INACTIVE
>  * to RCU_SCHEDULER_RUNNING, skipping the RCU_SCHEDULER_INIT stage.
>  * The reason for this is that Tiny RCU does not need kthreads, so does
>  * not have to care about the fact that the scheduler is half-initialized
>  * at a certain phase of the boot process.
>  */

Good.

> I believe that this would not buy very much, but if this variable starts
> showing up on profiles, then perhaps a jump label would be appropriate.
> As a separate patch, though!

Yeah, let's keep that opportunity in the bag, just in case.

> Thank you for your review and comments!

Thanks for the fix.

Btw, I'll build one more test kernel for people with your final version here:

https://lkml.kernel.org/r/1484383554-18095-2-git-send-email-paulmck@linux.vnet.ibm.com

backported to 4.9.

I say 4.9 because the reports started then, probably because of

  8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")

Which means, you probably should tag your fix CC:stable and add

Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")

to it too.

Hmmm.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-14 10:35     ` Borislav Petkov
@ 2017-01-14 12:27       ` Rafael J. Wysocki
  2017-01-14 12:50         ` Borislav Petkov
  2017-01-15  5:24       ` Paul E. McKenney
  1 sibling, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2017-01-14 12:27 UTC (permalink / raw)
  To: Borislav Petkov, Paul E. McKenney
  Cc: Linux Kernel Mailing List, Lv, stan.kain, waffolz, Josh Triplett,
	Steven Rostedt, mathieu.desnoyers, jiangshanlai, Ingo Molnar,
	Linus Torvalds, Rafael J. Wysocki

On Sat, Jan 14, 2017 at 11:35 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sat, Jan 14, 2017 at 12:00:22AM -0800, Paul E. McKenney wrote:
>> It now looks like this:
>>
>> ------------------------------------------------------------------------
>>
>> Note that the code was buggy even before this commit, as it was subject
>> to failure on real-time systems that forced all expedited grace periods
>> to run as normal grace periods (for example, using the rcu_normal ksysfs
>> parameter).  The callchain from the failure case is as follows:
>>
>> early_amd_iommu_init()
>> |-> acpi_put_table(ivrs_base);
>> |-> acpi_tb_put_table(table_desc);
>> |-> acpi_tb_invalidate_table(table_desc);
>> |-> acpi_tb_release_table(...)
>> |-> acpi_os_unmap_memory
>> |-> acpi_os_unmap_iomem
>> |-> acpi_os_map_cleanup
>> |-> synchronize_rcu_expedited
>>
>> The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
>> which caused the code to try using workqueues before they were
>> initialized, which did not go well.
>>
>> ------------------------------------------------------------------------
>>
>> Does that work?
>
> Yap, thanks.
>
>> Fair point, but this wording appears in almost all of my patches.  ;-)
>
> :-)
>
>> My rationale is that it provides a clear transition from describing the
>> problem to introducing the solution.
>
> Fair enough.
>
>> Exactly, but yes, worth a comment.
>>
>> The header comment for rcu_scheduler_starting() is now as follows:
>>
>> /*
>>  * During boot, we forgive RCU lockdep issues.  After this function is
>>  * invoked, we start taking RCU lockdep issues seriously.  Note that unlike
>>  * Tree RCU, Tiny RCU transitions directly from RCU_SCHEDULER_INACTIVE
>>  * to RCU_SCHEDULER_RUNNING, skipping the RCU_SCHEDULER_INIT stage.
>>  * The reason for this is that Tiny RCU does not need kthreads, so does
>>  * not have to care about the fact that the scheduler is half-initialized
>>  * at a certain phase of the boot process.
>>  */
>
> Good.
>
>> I believe that this would not buy very much, but if this variable starts
>> showing up on profiles, then perhaps a jump label would be appropriate.
>> As a separate patch, though!
>
> Yeah, let's keep that opportunity in the bag, just in case.
>
>> Thank you for your review and comments!
>
> Thanks for the fix.
>
> Btw, I'll build one more test kernel for people with your final version here:
>
> https://lkml.kernel.org/r/1484383554-18095-2-git-send-email-paulmck@linux.vnet.ibm.com
>
> backported to 4.9.
>
> I say 4.9 because the reports started then, probably because of
>
>   8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
>
> Which means, you probably should tag your fix CC:stable and add
>
> Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
>
> to it too.

OK, so this fixes the problem with synchronize_rcu_expedited() in
acpi_os_map_cleanup(), right?

I wonder if the ACPI-specific fix is still needed, then?

Thanks,
Rafael


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-14 12:27       ` Rafael J. Wysocki
@ 2017-01-14 12:50         ` Borislav Petkov
  2017-01-16  1:57           ` Zheng, Lv
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2017-01-14 12:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Paul E. McKenney, Linux Kernel Mailing List, Lv, stan.kain,
	waffolz, Josh Triplett, Steven Rostedt, mathieu.desnoyers,
	jiangshanlai, Ingo Molnar, Linus Torvalds

On Sat, Jan 14, 2017 at 01:27:40PM +0100, Rafael J. Wysocki wrote:
> OK, so this fixes the problem with synchronize_rcu_expedited() in
> acpi_os_map_cleanup(), right?

Yeah.

> I wonder if the ACPI-specific fix is still needed, then?

It is not strictly necessary. If you still think it would be better to
have it regardless, you could pick it up. I.e., making ACPI more robust,
yadda yadda.

I dunno, though, perhaps it is only complicating the code unnecessarily
and then can be safely ignored with a mental note for future freezes.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-14 10:35     ` Borislav Petkov
  2017-01-14 12:27       ` Rafael J. Wysocki
@ 2017-01-15  5:24       ` Paul E. McKenney
  2017-01-15 10:09         ` Borislav Petkov
  1 sibling, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2017-01-15  5:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, lv.zheng, stan.kain, waffolz, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, mingo, torvalds, rafael

On Sat, Jan 14, 2017 at 11:35:25AM +0100, Borislav Petkov wrote:
> On Sat, Jan 14, 2017 at 12:00:22AM -0800, Paul E. McKenney wrote:
> > It now looks like this:
> > 
> > ------------------------------------------------------------------------
> > 
> > Note that the code was buggy even before this commit, as it was subject
> > to failure on real-time systems that forced all expedited grace periods
> > to run as normal grace periods (for example, using the rcu_normal ksysfs
> > parameter).  The callchain from the failure case is as follows:
> > 
> > early_amd_iommu_init()
> > |-> acpi_put_table(ivrs_base);
> > |-> acpi_tb_put_table(table_desc);
> > |-> acpi_tb_invalidate_table(table_desc);
> > |-> acpi_tb_release_table(...)
> > |-> acpi_os_unmap_memory
> > |-> acpi_os_unmap_iomem
> > |-> acpi_os_map_cleanup
> > |-> synchronize_rcu_expedited
> > 
> > The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
> > which caused the code to try using workqueues before they were
> > initialized, which did not go well.
> > 
> > ------------------------------------------------------------------------
> > 
> > Does that work?
> 
> Yap, thanks.
> 
> > Fair point, but this wording appears in almost all of my patches.  ;-)
> 
> :-)
> 
> > My rationale is that it provides a clear transition from describing the
> > problem to introducing the solution.
> 
> Fair enough.
> 
> > Exactly, but yes, worth a comment.
> > 
> > The header comment for rcu_scheduler_starting() is now as follows:
> > 
> > /*
> >  * During boot, we forgive RCU lockdep issues.  After this function is
> >  * invoked, we start taking RCU lockdep issues seriously.  Note that unlike
> >  * Tree RCU, Tiny RCU transitions directly from RCU_SCHEDULER_INACTIVE
> >  * to RCU_SCHEDULER_RUNNING, skipping the RCU_SCHEDULER_INIT stage.
> >  * The reason for this is that Tiny RCU does not need kthreads, so does
> >  * not have to care about the fact that the scheduler is half-initialized
> >  * at a certain phase of the boot process.
> >  */
> 
> Good.
> 
> > I believe that this would not buy very much, but if this variable starts
> > showing up on profiles, then perhaps a jump label would be appropriate.
> > As a separate patch, though!
> 
> Yeah, let's keep that opportunity in the bag, just in case.
> 
> > Thank you for your review and comments!
> 
> Thanks for the fix.
> 
> Btw, I'll build one more test kernel for people with your final version here:
> 
> https://lkml.kernel.org/r/1484383554-18095-2-git-send-email-paulmck@linux.vnet.ibm.com
> 
> backported to 4.9.
> 
> I say 4.9 because the reports started then, probably because of
> 
>   8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
> 
> Which means, you probably should tag your fix CC:stable and add
> 
> Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
> 
> to it too.

Like this?

							Thanx, Paul

------------------------------------------------------------------------

commit 52d7e48b86fc108e45a656d8e53e4237993c481d
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Jan 10 02:28:26 2017 -0800

    rcu: Narrow early boot window of illegal synchronous grace periods
    
    The current preemptible RCU implementation goes through three phases
    during bootup.  In the first phase, there is only one CPU that is running
    with preemption disabled, so that a no-op is a synchronous grace period.
    In the second mid-boot phase, the scheduler is running, but RCU has
    not yet gotten its kthreads spawned (and, for expedited grace periods,
    workqueues are not yet running).  During this time, any attempt to do
    a synchronous grace period will hang the system (or complain bitterly,
    depending).  In the third and final phase, RCU is fully operational and
    everything works normally.
    
    This has been OK for some time, but there have recently been some
    synchronous grace periods showing up during the second mid-boot phase.
    This code worked "by accident" for a while, but started failing as soon
    as expedited RCU grace periods switched over to workqueues in commit
    8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue").
    Note that the code was buggy even before this commit, as it was subject
    to failure on real-time systems that forced all expedited grace periods
    to run as normal grace periods (for example, using the rcu_normal ksysfs
    parameter).  The callchain from the failure case is as follows:
    
    early_amd_iommu_init()
    |-> acpi_put_table(ivrs_base);
    |-> acpi_tb_put_table(table_desc);
    |-> acpi_tb_invalidate_table(table_desc);
    |-> acpi_tb_release_table(...)
    |-> acpi_os_unmap_memory
    |-> acpi_os_unmap_iomem
    |-> acpi_os_map_cleanup
    |-> synchronize_rcu_expedited
    
    The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
    which caused the code to try using workqueues before they were
    initialized, which did not go well.
    
    This commit therefore reworks RCU to permit synchronous grace periods
    to proceed during this mid-boot phase.  This commit is therefore a
    fix to a regression introduced in v4.9, and is therefore being put
    forward post-merge-window in v4.10.
    
    This commit sets a flag from the existing rcu_scheduler_starting()
    function which causes all synchronous grace periods to take the expedited
    path.  The expedited path now checks this flag, using the requesting task
    to drive the expedited grace period forward during the mid-boot phase.
    Finally, this flag is updated by a core_initcall() function named
    rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.
    
    Note that this arrangement assumes that tasks are not sent POSIX signals
    (or anything similar) from the time that the first task is spawned
    through core_initcall() time.
    
    Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
    Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
    Reported-by: Borislav Petkov <bp@alien8.de>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Tested-by: Stan Kain <stan.kain@gmail.com>
    Tested-by: Ivan <waffolz@hotmail.com>
    Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
    Tested-by: Bruno Pesavento <bpesavento@infinito.it>
    Tested-by: Borislav Petkov <bp@suse.de>
    Tested-by: Frederic Bezies <fredbezies@gmail.com>
    Cc: <stable@vger.kernel.org> # 4.9.0-

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 321f9ed552a9..01f71e1d2e94 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -444,6 +444,10 @@ bool __rcu_is_watching(void);
 #error "Unknown RCU implementation specified to kernel configuration"
 #endif
 
+#define RCU_SCHEDULER_INACTIVE	0
+#define RCU_SCHEDULER_INIT	1
+#define RCU_SCHEDULER_RUNNING	2
+
 /*
  * init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic
  * initialization and destruction of rcu_head on the stack. rcu_head structures
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 80adef7d4c3d..0d6ff3e471be 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -136,6 +136,7 @@ int rcu_jiffies_till_stall_check(void);
 #define TPS(x)  tracepoint_string(x)
 
 void rcu_early_boot_tests(void);
+void rcu_test_sync_prims(void);
 
 /*
  * This function really isn't for public consumption, but RCU is special in
diff --git a/kernel/rcu/tiny_plugin.h b/kernel/rcu/tiny_plugin.h
index 196f0302e2f4..c64b827ecbca 100644
--- a/kernel/rcu/tiny_plugin.h
+++ b/kernel/rcu/tiny_plugin.h
@@ -60,12 +60,17 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
 
 /*
  * During boot, we forgive RCU lockdep issues.  After this function is
- * invoked, we start taking RCU lockdep issues seriously.
+ * invoked, we start taking RCU lockdep issues seriously.  Note that unlike
+ * Tree RCU, Tiny RCU transitions directly from RCU_SCHEDULER_INACTIVE
+ * to RCU_SCHEDULER_RUNNING, skipping the RCU_SCHEDULER_INIT stage.
+ * The reason for this is that Tiny RCU does not need kthreads, so does
+ * not have to care about the fact that the scheduler is half-initialized
+ * at a certain phase of the boot process.
  */
 void __init rcu_scheduler_starting(void)
 {
 	WARN_ON(nr_context_switches() > 0);
-	rcu_scheduler_active = 1;
+	rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
 }
 
 #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 96c52e43f7ca..cb4e2056ccf3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -127,13 +127,16 @@ int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
 int sysctl_panic_on_rcu_stall __read_mostly;
 
 /*
- * The rcu_scheduler_active variable transitions from zero to one just
- * before the first task is spawned.  So when this variable is zero, RCU
- * can assume that there is but one task, allowing RCU to (for example)
+ * The rcu_scheduler_active variable is initialized to the value
+ * RCU_SCHEDULER_INACTIVE and transitions to RCU_SCHEDULER_INIT just before the
+ * first task is spawned.  So when this variable is RCU_SCHEDULER_INACTIVE,
+ * RCU can assume that there is but one task, allowing RCU to (for example)
  * optimize synchronize_rcu() to a simple barrier().  When this variable
- * is one, RCU must actually do all the hard work required to detect real
- * grace periods.  This variable is also used to suppress boot-time false
- * positives from lockdep-RCU error checking.
+ * is RCU_SCHEDULER_INIT, RCU must actually do all the hard work required
+ * to detect real grace periods.  This variable is also used to suppress
+ * boot-time false positives from lockdep-RCU error checking.  Finally, it
+ * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU
+ * is fully initialized, including all of its kthreads having been spawned.
  */
 int rcu_scheduler_active __read_mostly;
 EXPORT_SYMBOL_GPL(rcu_scheduler_active);
@@ -3980,18 +3983,22 @@ static int __init rcu_spawn_gp_kthread(void)
 early_initcall(rcu_spawn_gp_kthread);
 
 /*
- * This function is invoked towards the end of the scheduler's initialization
- * process.  Before this is called, the idle task might contain
- * RCU read-side critical sections (during which time, this idle
- * task is booting the system).  After this function is called, the
- * idle tasks are prohibited from containing RCU read-side critical
- * sections.  This function also enables RCU lockdep checking.
+ * This function is invoked towards the end of the scheduler's
+ * initialization process.  Before this is called, the idle task might
+ * contain synchronous grace-period primitives (during which time, this idle
+ * task is booting the system, and such primitives are no-ops).  After this
+ * function is called, any synchronous grace-period primitives are run as
+ * expedited, with the requesting task driving the grace period forward.
+ * A later core_initcall() rcu_exp_runtime_mode() will switch to full
+ * runtime RCU functionality.
  */
 void rcu_scheduler_starting(void)
 {
 	WARN_ON(num_online_cpus() != 1);
 	WARN_ON(nr_context_switches() > 0);
-	rcu_scheduler_active = 1;
+	rcu_test_sync_prims();
+	rcu_scheduler_active = RCU_SCHEDULER_INIT;
+	rcu_test_sync_prims();
 }
 
 /*
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index d3053e99fdb6..e59e1849b89a 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -532,18 +532,28 @@ struct rcu_exp_work {
 };
 
 /*
+ * Common code to drive an expedited grace period forward, used by
+ * workqueues and mid-boot-time tasks.
+ */
+static void rcu_exp_sel_wait_wake(struct rcu_state *rsp,
+				  smp_call_func_t func, unsigned long s)
+{
+	/* Initialize the rcu_node tree in preparation for the wait. */
+	sync_rcu_exp_select_cpus(rsp, func);
+
+	/* Wait and clean up, including waking everyone. */
+	rcu_exp_wait_wake(rsp, s);
+}
+
+/*
  * Work-queue handler to drive an expedited grace period forward.
  */
 static void wait_rcu_exp_gp(struct work_struct *wp)
 {
 	struct rcu_exp_work *rewp;
 
-	/* Initialize the rcu_node tree in preparation for the wait. */
 	rewp = container_of(wp, struct rcu_exp_work, rew_work);
-	sync_rcu_exp_select_cpus(rewp->rew_rsp, rewp->rew_func);
-
-	/* Wait and clean up, including waking everyone. */
-	rcu_exp_wait_wake(rewp->rew_rsp, rewp->rew_s);
+	rcu_exp_sel_wait_wake(rewp->rew_rsp, rewp->rew_func, rewp->rew_s);
 }
 
 /*
@@ -569,12 +579,18 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
 	if (exp_funnel_lock(rsp, s))
 		return;  /* Someone else did our work for us. */
 
-	/* Marshall arguments and schedule the expedited grace period. */
-	rew.rew_func = func;
-	rew.rew_rsp = rsp;
-	rew.rew_s = s;
-	INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
-	schedule_work(&rew.rew_work);
+	/* Ensure that load happens before action based on it. */
+	if (unlikely(rcu_scheduler_active == RCU_SCHEDULER_INIT)) {
+		/* Direct call during scheduler init and early_initcalls(). */
+		rcu_exp_sel_wait_wake(rsp, func, s);
+	} else {
+		/* Marshall arguments & schedule the expedited grace period. */
+		rew.rew_func = func;
+		rew.rew_rsp = rsp;
+		rew.rew_s = s;
+		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
+		schedule_work(&rew.rew_work);
+	}
 
 	/* Wait for expedited grace period to complete. */
 	rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
@@ -676,6 +692,8 @@ void synchronize_rcu_expedited(void)
 {
 	struct rcu_state *rsp = rcu_state_p;
 
+	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
+		return;
 	_synchronize_rcu_expedited(rsp, sync_rcu_exp_handler);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
@@ -693,3 +711,15 @@ void synchronize_rcu_expedited(void)
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+
+/*
+ * Switch to run-time mode once Tree RCU has fully initialized.
+ */
+static int __init rcu_exp_runtime_mode(void)
+{
+	rcu_test_sync_prims();
+	rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
+	rcu_test_sync_prims();
+	return 0;
+}
+core_initcall(rcu_exp_runtime_mode);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 85c5a883c6e3..56583e764ebf 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -670,7 +670,7 @@ void synchronize_rcu(void)
 			 lock_is_held(&rcu_lock_map) ||
 			 lock_is_held(&rcu_sched_lock_map),
 			 "Illegal synchronize_rcu() in RCU read-side critical section");
-	if (!rcu_scheduler_active)
+	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
 		return;
 	if (rcu_gp_is_expedited())
 		synchronize_rcu_expedited();
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index f19271dce0a9..4f6db7e6a117 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -121,11 +121,14 @@ EXPORT_SYMBOL(rcu_read_lock_sched_held);
  * Should expedited grace-period primitives always fall back to their
  * non-expedited counterparts?  Intended for use within RCU.  Note
  * that if the user specifies both rcu_expedited and rcu_normal, then
- * rcu_normal wins.
+ * rcu_normal wins.  (Except during the time period during boot from
+ * when the first task is spawned until the rcu_exp_runtime_mode()
+ * core_initcall() is invoked, at which point everything is expedited.)
  */
 bool rcu_gp_is_normal(void)
 {
-	return READ_ONCE(rcu_normal);
+	return READ_ONCE(rcu_normal) &&
+	       rcu_scheduler_active != RCU_SCHEDULER_INIT;
 }
 EXPORT_SYMBOL_GPL(rcu_gp_is_normal);
 
@@ -135,13 +138,14 @@ static atomic_t rcu_expedited_nesting =
 /*
  * Should normal grace-period primitives be expedited?  Intended for
  * use within RCU.  Note that this function takes the rcu_expedited
- * sysfs/boot variable into account as well as the rcu_expedite_gp()
- * nesting.  So looping on rcu_unexpedite_gp() until rcu_gp_is_expedited()
- * returns false is a -really- bad idea.
+ * sysfs/boot variable and rcu_scheduler_active into account as well
+ * as the rcu_expedite_gp() nesting.  So looping on rcu_unexpedite_gp()
+ * until rcu_gp_is_expedited() returns false is a -really- bad idea.
  */
 bool rcu_gp_is_expedited(void)
 {
-	return rcu_expedited || atomic_read(&rcu_expedited_nesting);
+	return rcu_expedited || atomic_read(&rcu_expedited_nesting) ||
+	       rcu_scheduler_active == RCU_SCHEDULER_INIT;
 }
 EXPORT_SYMBOL_GPL(rcu_gp_is_expedited);
 
@@ -257,7 +261,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map);
 
 int notrace debug_lockdep_rcu_enabled(void)
 {
-	return rcu_scheduler_active && debug_locks &&
+	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
 	       current->lockdep_recursion == 0;
 }
 EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
@@ -591,7 +595,7 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks);
 void synchronize_rcu_tasks(void)
 {
 	/* Complain if the scheduler has not started.  */
-	RCU_LOCKDEP_WARN(!rcu_scheduler_active,
+	RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
 			 "synchronize_rcu_tasks called too soon");
 
 	/* Wait for the grace period. */
@@ -813,6 +817,23 @@ static void rcu_spawn_tasks_kthread(void)
 
 #endif /* #ifdef CONFIG_TASKS_RCU */
 
+/*
+ * Test each non-SRCU synchronous grace-period wait API.  This is
+ * useful just after a change in mode for these primitives, and
+ * during early boot.
+ */
+void rcu_test_sync_prims(void)
+{
+	if (!IS_ENABLED(CONFIG_PROVE_RCU))
+		return;
+	synchronize_rcu();
+	synchronize_rcu_bh();
+	synchronize_sched();
+	synchronize_rcu_expedited();
+	synchronize_rcu_bh_expedited();
+	synchronize_sched_expedited();
+}
+
 #ifdef CONFIG_PROVE_RCU
 
 /*
@@ -865,6 +886,7 @@ void rcu_early_boot_tests(void)
 		early_boot_test_call_rcu_bh();
 	if (rcu_self_test_sched)
 		early_boot_test_call_rcu_sched();
+	rcu_test_sync_prims();
 }
 
 static int rcu_verify_early_boot_tests(void)


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-15  5:24       ` Paul E. McKenney
@ 2017-01-15 10:09         ` Borislav Petkov
  2017-01-15 20:33           ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2017-01-15 10:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, lv.zheng, stan.kain, waffolz, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, mingo, torvalds, rafael

On Sat, Jan 14, 2017 at 09:24:43PM -0800, Paul E. McKenney wrote:
> > Which means, you probably should tag your fix CC:stable and add
> > 
> > Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
> > 
> > to it too.
> 
> Like this?

Very nice, ship it! :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-15 10:09         ` Borislav Petkov
@ 2017-01-15 20:33           ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2017-01-15 20:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, lv.zheng, stan.kain, waffolz, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, mingo, torvalds, rafael

On Sun, Jan 15, 2017 at 11:09:31AM +0100, Borislav Petkov wrote:
> On Sat, Jan 14, 2017 at 09:24:43PM -0800, Paul E. McKenney wrote:
> > > Which means, you probably should tag your fix CC:stable and add
> > > 
> > > Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
> > > 
> > > to it too.
> > 
> > Like this?
> 
> Very nice, ship it! :-)

Seeing no objections...

Done, CCing you.  ;-)

							Thanx, Paul


* RE: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-14 12:50         ` Borislav Petkov
@ 2017-01-16  1:57           ` Zheng, Lv
  2017-01-16  4:56             ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Zheng, Lv @ 2017-01-16  1:57 UTC (permalink / raw)
  To: Borislav Petkov, Rafael J. Wysocki
  Cc: Paul E. McKenney, Linux Kernel Mailing List, stan.kain, waffolz,
	Josh Triplett, Steven Rostedt, mathieu.desnoyers, jiangshanlai,
	Ingo Molnar, Linus Torvalds

Hi,

> From: Borislav Petkov [mailto:bp@alien8.de]
> Subject: Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
> 
> On Sat, Jan 14, 2017 at 01:27:40PM +0100, Rafael J. Wysocki wrote:
> > OK, so this fixes the problem with synchronize_rcu_expedited() in
> > acpi_os_map_cleanup(), right?
> 
> Yeah.
> 
> > I wonder if the ACPI-specific fix is still needed, then?
> 
> It is not strictly necessary. If you still think it would be better to
> have it regardless, you could pick it up. I.e., making ACPI more robust,
> yadda yadda.
> 
> I dunno, though, perhaps it is only complicating the code unnecessarily
> and then can be safely ignored with a mental note for future freezes.

Glad to see it fixed inside the API provider.

IMO, the ACPI fix is unnecessary, as ACPI is just a user of the RCU APIs,
and it's pointless to add special checks on the caller side just to use one of them.
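
To make that point concrete: the patch keeps the phase decision inside the API provider, so callers such as acpi_os_map_cleanup() need no special-casing. A rough user-space sketch of the three boot phases from the changelog follows; the flag and function names are simplified assumptions for exposition, not the actual kernel implementation:

```c
/*
 * Illustrative sketch of the three boot phases described in the
 * patch changelog.  Names and structure are simplified assumptions,
 * not the real kernel code.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool rcu_scheduler_active;  /* phase 2 begins: scheduler is up */
static bool rcu_exp_in_runtime;    /* phase 3 begins: workqueues are up */

/* Flag set from the existing rcu_scheduler_starting() hook. */
static void rcu_scheduler_starting(void)
{
	rcu_scheduler_active = true;
}

/* Flag set from a core_initcall(), once workqueues can run. */
static void rcu_exp_runtime_mode(void)
{
	rcu_exp_in_runtime = true;
}

static void synchronize_rcu_sketch(void)
{
	if (!rcu_scheduler_active) {
		/* Phase 1: one CPU, preemption disabled, so a no-op
		 * already is a synchronous grace period. */
		return;
	}
	if (!rcu_exp_in_runtime) {
		/* Phase 2 (mid-boot): take the expedited path, with
		 * the requesting task driving the grace period. */
		puts("mid-boot: expedited, task-driven");
		return;
	}
	/* Phase 3: fully operational runtime code paths. */
	puts("runtime: normal grace period");
}
```

Because the dispatch lives in synchronize_rcu_sketch() itself, a caller invokes the same function in every phase and never needs to know which phase the system is in.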

Thanks and best regards
Lv

> 
> --
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
  2017-01-16  1:57           ` Zheng, Lv
@ 2017-01-16  4:56             ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2017-01-16  4:56 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Borislav Petkov, Rafael J. Wysocki, Linux Kernel Mailing List,
	stan.kain, waffolz, Josh Triplett, Steven Rostedt,
	mathieu.desnoyers, jiangshanlai, Ingo Molnar, Linus Torvalds

On Mon, Jan 16, 2017 at 01:57:25AM +0000, Zheng, Lv wrote:
> Hi,
> 
> > From: Borislav Petkov [mailto:bp@alien8.de]
> > Subject: Re: [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods
> > 
> > On Sat, Jan 14, 2017 at 01:27:40PM +0100, Rafael J. Wysocki wrote:
> > > OK, so this fixes the problem with synchronize_rcu_expedited() in
> > > acpi_os_map_cleanup(), right?
> > 
> > Yeah.
> > 
> > > I wonder if the ACPI-specific fix is still needed, then?
> > 
> > It is not strictly necessary. If you still think it would be better to
> > have it regardless, you could pick it up. I.e., making ACPI more robust,
> > yadda yadda.
> > 
> > I dunno, though, perhaps it is only complicating the code unnecessarily
> > and then can be safely ignored with a mental note for future freezes.
> 
> Glad to see it fixed inside the API provider.
> 
> IMO, the ACPI fix is unnecessary, as ACPI is just a user of the RCU APIs,
> and it's pointless to add special checks on the caller side just to use one of them.

With some luck, the RCU patch will go in sooner rather than later.

Should it be delayed for whatever reason, the ACPI patch might well
be needed in the meantime.

							Thanx, Paul


end of thread, other threads:[~2017-01-16  4:57 UTC | newest]

Thread overview: 12+ messages
2017-01-13  2:38 [PATCH] rcu: Narrow early boot window of illegal synchronous grace periods Paul E. McKenney
2017-01-13 11:25 ` Borislav Petkov
2017-01-14  8:00   ` Paul E. McKenney
2017-01-14 10:35     ` Borislav Petkov
2017-01-14 12:27       ` Rafael J. Wysocki
2017-01-14 12:50         ` Borislav Petkov
2017-01-16  1:57           ` Zheng, Lv
2017-01-16  4:56             ` Paul E. McKenney
2017-01-15  5:24       ` Paul E. McKenney
2017-01-15 10:09         ` Borislav Petkov
2017-01-15 20:33           ` Paul E. McKenney
2017-01-13 17:02 ` Borislav Petkov
