linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] srcu: Fix boot stall
@ 2021-04-01 23:47 Frederic Weisbecker
  2021-04-01 23:47 ` [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2021-04-01 23:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Uladzislau Rezki, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Joel Fernandes

This is a fix for a boot stall that can be reproduced with
"rcupdate.rcu_self_test=1" and CONFIG_PROVE_RCU=y.

It should be easy to trigger if you set CONFIG_MAXSMP=y and you don't
actually have thousands of CPUs.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	rcu/dev

HEAD: 6390b2aa1295f79150fac8f5ff60177eea5d8f8b

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      srcu: Remove superfluous ssp initialization on deferred work queue
      srcu: Remove superfluous sdp->srcu_lock_count zero filling
      srcu: Fix broken node geometry after early ssp init


 include/linux/srcutree.h |  1 +
 kernel/rcu/srcutree.c    | 81 +++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 64 insertions(+), 18 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue
  2021-04-01 23:47 [PATCH 0/3] srcu: Fix boot stall Frederic Weisbecker
@ 2021-04-01 23:47 ` Frederic Weisbecker
  2021-04-02  0:48   ` Paul E. McKenney
  2021-04-01 23:47 ` [PATCH 2/3] srcu: Remove superfluous sdp->srcu_lock_count zero filling Frederic Weisbecker
  2021-04-01 23:47 ` [PATCH 3/3] srcu: Fix broken node geometry after early ssp init Frederic Weisbecker
  2 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2021-04-01 23:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Uladzislau Rezki, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Joel Fernandes

When an ssp has already started a grace period and queued an early work
to flush after SRCU workqueues are created, we expect the ssp to be
properly initialized already. So we can skip this step at this stage.
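
As an illustration of the invariant this patch relies on, here is a minimal
user-space sketch (the mock_* names are invented for illustration; this is
not the kernel code): every path that defers an ssp's work to the boot
list, e.g. call_srcu() -> srcu_gp_start_if_needed(), has already run
check_init_srcu_struct(), so the drain loop in srcu_init() may skip it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct srcu_struct: just the two facts we care about. */
struct mock_ssp {
	bool initialized;
	bool queued;
};

/* Stand-in for check_init_srcu_struct(): one-time lazy initialization. */
static void mock_check_init(struct mock_ssp *ssp)
{
	if (!ssp->initialized)
		ssp->initialized = true;
}

/*
 * Stand-in for call_srcu() -> srcu_gp_start_if_needed(): the ssp is
 * initialized *before* it can ever be queued for deferred work.
 */
static void mock_call_srcu(struct mock_ssp *ssp)
{
	mock_check_init(ssp);
	ssp->queued = true;	/* deferred to the boot list */
}

/* Stand-in for the srcu_init() drain loop: a queued ssp is always init'ed. */
static bool mock_queued_implies_initialized(const struct mock_ssp *ssp)
{
	return !ssp->queued || ssp->initialized;
}
```

Under this model, re-running the init check while draining the boot list
can never do anything, which is why the call can be dropped.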

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
---
 kernel/rcu/srcutree.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 036ff5499ad5..7197156418e4 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1396,7 +1396,6 @@ void __init srcu_init(void)
 	while (!list_empty(&srcu_boot_list)) {
 		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
 				      work.work.entry);
-		check_init_srcu_struct(ssp);
 		list_del_init(&ssp->work.work.entry);
 		queue_work(rcu_gp_wq, &ssp->work.work);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] srcu: Remove superfluous sdp->srcu_lock_count zero filling
  2021-04-01 23:47 [PATCH 0/3] srcu: Fix boot stall Frederic Weisbecker
  2021-04-01 23:47 ` [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue Frederic Weisbecker
@ 2021-04-01 23:47 ` Frederic Weisbecker
  2021-04-02  0:58   ` Paul E. McKenney
  2021-04-01 23:47 ` [PATCH 3/3] srcu: Fix broken node geometry after early ssp init Frederic Weisbecker
  2 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2021-04-01 23:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Uladzislau Rezki, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Joel Fernandes

alloc_percpu() zeroes out the allocated memory. Therefore we can assume
the whole struct srcu_data to be zeroed after calling it, just as after
a static initialization. No need for special treatment in the dynamic
allocation case.
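
The reasoning can be sketched in user space (calloc() standing in for
alloc_percpu(), and struct fake_sdp being an invented stand-in for
struct srcu_data; this is not the kernel code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Invented stand-in for the counter arrays in struct srcu_data. */
struct fake_sdp {
	unsigned long srcu_lock_count[2];
	unsigned long srcu_unlock_count[2];
};

/* alloc_percpu() returns zeroed memory; calloc() models that here. */
static struct fake_sdp *fake_alloc_percpu(void)
{
	return calloc(1, sizeof(struct fake_sdp));
}

/* Check that every byte of the counters is already zero. */
static int counters_are_zero(const struct fake_sdp *sdp)
{
	static const struct fake_sdp zero;	/* all-zero reference */

	return memcmp(sdp, &zero, sizeof(zero)) == 0;
}
```

Since the allocation itself guarantees zeroed counters, the explicit
zero-filling loop (and the is_static special case guarding it) is dead
weight.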

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
---
 kernel/rcu/srcutree.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 7197156418e4..10e681ea7051 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -80,7 +80,7 @@ do {									\
  * srcu_read_unlock() running against them.  So if the is_static parameter
  * is set, don't initialize ->srcu_lock_count[] and ->srcu_unlock_count[].
  */
-static void init_srcu_struct_nodes(struct srcu_struct *ssp, bool is_static)
+static void init_srcu_struct_nodes(struct srcu_struct *ssp)
 {
 	int cpu;
 	int i;
@@ -148,14 +148,6 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp, bool is_static)
 		timer_setup(&sdp->delay_work, srcu_delay_timer, 0);
 		sdp->ssp = ssp;
 		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
-		if (is_static)
-			continue;
-
-		/* Dynamically allocated, better be no srcu_read_locks()! */
-		for (i = 0; i < ARRAY_SIZE(sdp->srcu_lock_count); i++) {
-			sdp->srcu_lock_count[i] = 0;
-			sdp->srcu_unlock_count[i] = 0;
-		}
 	}
 }
 
@@ -179,7 +171,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
 		ssp->sda = alloc_percpu(struct srcu_data);
 	if (!ssp->sda)
 		return -ENOMEM;
-	init_srcu_struct_nodes(ssp, is_static);
+	init_srcu_struct_nodes(ssp);
 	ssp->srcu_gp_seq_needed_exp = 0;
 	ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
 	smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */
-- 
2.25.1



* [PATCH 3/3] srcu: Fix broken node geometry after early ssp init
  2021-04-01 23:47 [PATCH 0/3] srcu: Fix boot stall Frederic Weisbecker
  2021-04-01 23:47 ` [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue Frederic Weisbecker
  2021-04-01 23:47 ` [PATCH 2/3] srcu: Remove superfluous sdp->srcu_lock_count zero filling Frederic Weisbecker
@ 2021-04-01 23:47 ` Frederic Weisbecker
  2021-04-02  1:12   ` Paul E. McKenney
  2 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2021-04-01 23:47 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Uladzislau Rezki, Boqun Feng,
	Lai Jiangshan, Neeraj Upadhyay, Josh Triplett, Joel Fernandes

An ssp initialized before rcu_init_geometry() will have its snp hierarchy
based on CONFIG_NR_CPUS.

Once rcu_init_geometry() is called, the node distribution is shrunk
and optimized to match the actual number of possible CPUs detected
on boot.

Later on, the ssp that was initialized before rcu_init_geometry() is
confused and sometimes refers to its initial CONFIG_NR_CPUS-based node
hierarchy, sometimes to the new num_possible_cpus()-based one instead.
For example, each of its sdp->mynode pointers remains stale and refers
to early node leaves that may not exist anymore. On the other hand,
srcu_for_each_node_breadth_first() walks the new node hierarchy.

There are at least two possible bad outcomes:

1) a) A callback enqueued early on an sdp is recorded as pending on
      sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start(), with
      sdp->mynode pointing to a deep leaf (say 3 levels down).

   b) The grace period ends after rcu_init_geometry(), which shrinks the
      node hierarchy to a single level. srcu_gp_end() walks through the
      new snp hierarchy without ever reaching the old leaves, so the
      callback is never executed.

   This is easily reproduced on an 8-CPU machine with
   CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The
   srcu_barrier() after the early-test verification never completes
   and the boot hangs:

	[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
	[ 5413.147564]       Not tainted 5.12.0-rc4+ #28
	[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	[ 5413.159753] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
	[ 5413.168099] Call Trace:
	[ 5413.170555]  __schedule+0x36c/0x930
	[ 5413.174057]  ? wait_for_completion+0x88/0x110
	[ 5413.178423]  schedule+0x46/0xf0
	[ 5413.181575]  schedule_timeout+0x284/0x380
	[ 5413.185591]  ? wait_for_completion+0x88/0x110
	[ 5413.189957]  ? mark_held_locks+0x61/0x80
	[ 5413.193882]  ? mark_held_locks+0x61/0x80
	[ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
	[ 5413.202173]  ? wait_for_completion+0x88/0x110
	[ 5413.206535]  wait_for_completion+0xb4/0x110
	[ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
	[ 5413.215610]  srcu_barrier+0x187/0x200
	[ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
	[ 5413.224244]  ? rdinit_setup+0x2b/0x2b
	[ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
	[ 5413.232700]  do_one_initcall+0x63/0x310
	[ 5413.236541]  ? rdinit_setup+0x2b/0x2b
	[ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
	[ 5413.244912]  kernel_init_freeable+0x253/0x28f
	[ 5413.249273]  ? rest_init+0x250/0x250
	[ 5413.252846]  kernel_init+0xa/0x110
	[ 5413.256257]  ret_from_fork+0x22/0x30

2) An ssp that gets initialized before rcu_init_geometry() and used
   afterward will always have stale sdp->mynode references, resulting
   in callbacks being missed in srcu_gp_end(), just as in the previous
   scenario.

Solve this by fixing the node tree layout of early-initialized ssps
once rcu_init_geometry() is done. Unfortunately, this involves adding
a new field to struct srcu_struct so that the ssp hierarchy rebuild
can be postponed.
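
The bug class can be modeled with a toy sketch (invented toy_* names; not
the kernel code): a per-CPU structure caches a pointer into the old,
CONFIG_NR_CPUS-sized tree, the tree is then rebuilt for the real CPU
count, and the cached pointer no longer lands inside the new hierarchy
until something analogous to restore_srcu_sdp() fixes it up and
propagates the recorded state.

```c
#include <assert.h>
#include <stddef.h>

#define OLD_LEAVES 8	/* CONFIG_NR_CPUS-sized hierarchy */
#define NEW_LEAVES 1	/* shrunk, num_possible_cpus()-sized hierarchy */

struct toy_node { int have_cbs; };
struct toy_sdp  { struct toy_node *mynode; };

static struct toy_node old_tree[OLD_LEAVES];
static struct toy_node new_tree[NEW_LEAVES];

/* Does p refer to a leaf of the given tree? (equality only, no UB) */
static int points_into(const struct toy_node *p,
		       const struct toy_node *tree, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (p == &tree[i])
			return 1;
	return 0;
}

/*
 * Analogue of the rebuild + restore_srcu_sdp() step: carry the
 * pending-callback state over to the new leaf, then repoint mynode.
 */
static void toy_rebuild(struct toy_sdp *sdp)
{
	new_tree[0].have_cbs = sdp->mynode->have_cbs;
	sdp->mynode = &new_tree[0];
}
```

Before the fix-up, walks over new_tree never see the state recorded via
the stale pointer, which is exactly how the callback in scenario 1 gets
lost.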

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
---
 include/linux/srcutree.h |  1 +
 kernel/rcu/srcutree.c    | 68 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 9cfcc8a756ae..4339e5794a72 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -85,6 +85,7 @@ struct srcu_struct {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map dep_map;
 #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+	struct list_head early_init;
 };
 
 /* Values for state variable (bottom bits of ->srcu_gp_seq). */
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 10e681ea7051..285f0c053754 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -39,7 +39,7 @@ static ulong counter_wrap_check = (ULONG_MAX >> 2);
 module_param(counter_wrap_check, ulong, 0444);
 
 /* Early-boot callback-management, so early that no lock is required! */
-static LIST_HEAD(srcu_boot_list);
+static LIST_HEAD(srcu_boot_queue_list);
 static bool __read_mostly srcu_init_done;
 
 static void srcu_invoke_callbacks(struct work_struct *work);
@@ -133,7 +133,9 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp)
 	for_each_possible_cpu(cpu) {
 		sdp = per_cpu_ptr(ssp->sda, cpu);
 		spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
-		rcu_segcblist_init(&sdp->srcu_cblist);
+		/* Preserve the queue in case of hierarchy rebuild */
+		if (!rcu_segcblist_is_enabled(&sdp->srcu_cblist))
+			rcu_segcblist_init(&sdp->srcu_cblist);
 		sdp->srcu_cblist_invoking = false;
 		sdp->srcu_gp_seq_needed = ssp->srcu_gp_seq;
 		sdp->srcu_gp_seq_needed_exp = ssp->srcu_gp_seq;
@@ -151,6 +153,8 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp)
 	}
 }
 
+static LIST_HEAD(srcu_early_init_list);
+
 /*
  * Initialize non-compile-time initialized fields, including the
  * associated srcu_node and srcu_data structures.  The is_static
@@ -174,6 +178,8 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
 	init_srcu_struct_nodes(ssp);
 	ssp->srcu_gp_seq_needed_exp = 0;
 	ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
+	if (!srcu_init_done)
+		list_add_tail(&ssp->early_init, &srcu_early_init_list);
 	smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */
 	return 0;
 }
@@ -679,7 +685,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 			queue_delayed_work(rcu_gp_wq, &ssp->work,
 					   srcu_get_delay(ssp));
 		else if (list_empty(&ssp->work.work.entry))
-			list_add(&ssp->work.work.entry, &srcu_boot_list);
+			list_add(&ssp->work.work.entry, &srcu_boot_queue_list);
 	}
 	spin_unlock_irqrestore_rcu_node(ssp, flags);
 }
@@ -1380,14 +1386,62 @@ static int __init srcu_bootup_announce(void)
 }
 early_initcall(srcu_bootup_announce);
 
+static void __init restore_srcu_sdp(struct srcu_data *sdp,
+				    struct srcu_data *old_sdp,
+				    struct srcu_node *old_mynode)
+{
+	int i;
+	struct srcu_node *mynode = sdp->mynode;
+	struct srcu_node *snp = mynode;
+
+	/* No need to lock the nodes at this stage. We are on early boot */
+	for (snp = mynode; snp != NULL; snp = snp->srcu_parent) {
+		for (i = 0; i < ARRAY_SIZE(old_mynode->srcu_have_cbs); i++) {
+			snp->srcu_have_cbs[i] = old_mynode->srcu_have_cbs[i];
+
+			if (snp != mynode)
+				continue;
+
+			if (old_mynode->srcu_data_have_cbs[i] == old_sdp->grpmask) {
+				snp->srcu_data_have_cbs[i] = sdp->grpmask;
+			} else if (!old_mynode->srcu_data_have_cbs[i]) {
+				snp->srcu_data_have_cbs[i] = 0;
+			} else {
+				/* Don't expect other CPUs to have callbacks that early */
+				WARN_ON_ONCE(1);
+			}
+		}
+		snp->srcu_gp_seq_needed_exp = old_mynode->srcu_gp_seq_needed_exp;
+	}
+
+	sdp->srcu_gp_seq_needed = old_sdp->srcu_gp_seq_needed;
+	sdp->srcu_gp_seq_needed_exp = old_sdp->srcu_gp_seq_needed_exp;
+}
+
 void __init srcu_init(void)
 {
-	struct srcu_struct *ssp;
+	static struct srcu_data __initdata old_sdp;
+	static struct srcu_node __initdata old_mynode;
+	struct srcu_struct *ssp, *tmp;
 
 	srcu_init_done = true;
-	while (!list_empty(&srcu_boot_list)) {
-		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
-				      work.work.entry);
+	/*
+	 * ssp's that got initialized before rcu_init_geometry() have a stale
+	 * node hierarchy. Rebuild their node trees and propagate states from
+	 * pruned leaf nodes.
+	 */
+	list_for_each_entry_safe(ssp, tmp, &srcu_early_init_list, early_init) {
+		struct srcu_data *sdp = this_cpu_ptr(ssp->sda);
+
+		list_del(&ssp->early_init);
+		old_sdp = *sdp;
+		old_mynode = *sdp->mynode;
+		init_srcu_struct_nodes(ssp);
+		restore_srcu_sdp(sdp, &old_sdp, &old_mynode);
+	}
+
+	/* Handle works that had to wait before workqueues get created */
+	list_for_each_entry_safe(ssp, tmp, &srcu_boot_queue_list, work.work.entry) {
 		list_del_init(&ssp->work.work.entry);
 		queue_work(rcu_gp_wq, &ssp->work.work);
 	}
-- 
2.25.1



* Re: [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue
  2021-04-01 23:47 ` [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue Frederic Weisbecker
@ 2021-04-02  0:48   ` Paul E. McKenney
  2021-04-02  0:58     ` Frederic Weisbecker
  2021-04-02  1:01     ` Paul E. McKenney
  0 siblings, 2 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-04-02  0:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Fri, Apr 02, 2021 at 01:47:02AM +0200, Frederic Weisbecker wrote:
> When an ssp has already started a grace period and queued an early work
> to flush after SRCU workqueues are created, we expect the ssp to be
> properly initialized already. So we can skip this step at this stage.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> ---
>  kernel/rcu/srcutree.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 036ff5499ad5..7197156418e4 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1396,7 +1396,6 @@ void __init srcu_init(void)
>  	while (!list_empty(&srcu_boot_list)) {
>  		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
>  				      work.work.entry);
> -		check_init_srcu_struct(ssp);

You lost me on this one.  What happens if the only pre-initialization
invocation on the statically allocated srcu_struct pointed to by ssp
was call_srcu()?  I am not seeing how the initialization has already
happened in that case.

What am I missing here?

							Thanx, Paul

>  		list_del_init(&ssp->work.work.entry);
>  		queue_work(rcu_gp_wq, &ssp->work.work);
>  	}
> -- 
> 2.25.1
> 


* Re: [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue
  2021-04-02  0:48   ` Paul E. McKenney
@ 2021-04-02  0:58     ` Frederic Weisbecker
  2021-04-02  1:19       ` Paul E. McKenney
  2021-04-02  1:01     ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2021-04-02  0:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Thu, Apr 01, 2021 at 05:48:56PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 02, 2021 at 01:47:02AM +0200, Frederic Weisbecker wrote:
> > When an ssp has already started a grace period and queued an early work
> > to flush after SRCU workqueues are created, we expect the ssp to be
> > properly initialized already. So we can skip this step at this stage.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Uladzislau Rezki <urezki@gmail.com>
> > ---
> >  kernel/rcu/srcutree.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 036ff5499ad5..7197156418e4 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1396,7 +1396,6 @@ void __init srcu_init(void)
> >  	while (!list_empty(&srcu_boot_list)) {
> >  		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
> >  				      work.work.entry);
> > -		check_init_srcu_struct(ssp);
> 
> You lost me on this one.  What happens if the only pre-initialization
> invocation on the statically allocated srcu_struct pointed to by ssp
> was call_srcu()?  I am not seeing how the initialization has already
> happened in that case.
> 
> What am I missing here?

call_srcu() -> __call_srcu() -> srcu_gp_start_if_needed() ->
check_init_srcu_struct() ?

Or am I the one missing something?


* Re: [PATCH 2/3] srcu: Remove superfluous sdp->srcu_lock_count zero filling
  2021-04-01 23:47 ` [PATCH 2/3] srcu: Remove superfluous sdp->srcu_lock_count zero filling Frederic Weisbecker
@ 2021-04-02  0:58   ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-04-02  0:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Fri, Apr 02, 2021 at 01:47:03AM +0200, Frederic Weisbecker wrote:
> alloc_percpu() zeroes out the allocated memory. Therefore we can assume
> the whole struct srcu_data to be zeroed after calling it, just as after
> a static initialization. No need for special treatment in the dynamic
> allocation case.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>

Good catch, thank you!!!  I queued the following with the usual
wordsmithing, so as usual please let me know if I messed anything up.

							Thanx, Paul

------------------------------------------------------------------------

commit 2cfdfbfc41bcd96e7961a619e4a7f235b274f78f
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Fri Apr 2 01:47:03 2021 +0200

    srcu: Remove superfluous sdp->srcu_lock_count zero filling
    
    Because alloc_percpu() zeroes out the allocated memory, there is no need
    to zero-fill newly allocated per-CPU memory.  This commit therefore removes
    the loop zeroing the ->srcu_lock_count and ->srcu_unlock_count arrays
    from init_srcu_struct_nodes().  This is the only use of that function's
    is_static parameter, which this commit also removes.
    
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 036ff54..7389e46 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -80,7 +80,7 @@ do {									\
  * srcu_read_unlock() running against them.  So if the is_static parameter
  * is set, don't initialize ->srcu_lock_count[] and ->srcu_unlock_count[].
  */
-static void init_srcu_struct_nodes(struct srcu_struct *ssp, bool is_static)
+static void init_srcu_struct_nodes(struct srcu_struct *ssp)
 {
 	int cpu;
 	int i;
@@ -148,14 +148,6 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp, bool is_static)
 		timer_setup(&sdp->delay_work, srcu_delay_timer, 0);
 		sdp->ssp = ssp;
 		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
-		if (is_static)
-			continue;
-
-		/* Dynamically allocated, better be no srcu_read_locks()! */
-		for (i = 0; i < ARRAY_SIZE(sdp->srcu_lock_count); i++) {
-			sdp->srcu_lock_count[i] = 0;
-			sdp->srcu_unlock_count[i] = 0;
-		}
 	}
 }
 
@@ -179,7 +171,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
 		ssp->sda = alloc_percpu(struct srcu_data);
 	if (!ssp->sda)
 		return -ENOMEM;
-	init_srcu_struct_nodes(ssp, is_static);
+	init_srcu_struct_nodes(ssp);
 	ssp->srcu_gp_seq_needed_exp = 0;
 	ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
 	smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */


* Re: [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue
  2021-04-02  0:48   ` Paul E. McKenney
  2021-04-02  0:58     ` Frederic Weisbecker
@ 2021-04-02  1:01     ` Paul E. McKenney
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-04-02  1:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Thu, Apr 01, 2021 at 05:48:56PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 02, 2021 at 01:47:02AM +0200, Frederic Weisbecker wrote:
> > When an ssp has already started a grace period and queued an early work
> > to flush after SRCU workqueues are created, we expect the ssp to be
> > properly initialized already. So we can skip this step at this stage.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Uladzislau Rezki <urezki@gmail.com>
> > ---
> >  kernel/rcu/srcutree.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 036ff5499ad5..7197156418e4 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1396,7 +1396,6 @@ void __init srcu_init(void)
> >  	while (!list_empty(&srcu_boot_list)) {
> >  		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
> >  				      work.work.entry);
> > -		check_init_srcu_struct(ssp);
> 
> You lost me on this one.  What happens if the only pre-initialization
> invocation on the statically allocated srcu_struct pointed to by ssp
> was call_srcu()?  I am not seeing how the initialization has already
> happened in that case.
> 
> What am I missing here?

Idiot here was looking at Tiny SRCU's call_srcu(), not that of Tree SRCU.
Never mind, I will queue this one as well.

							Thanx, Paul

> >  		list_del_init(&ssp->work.work.entry);
> >  		queue_work(rcu_gp_wq, &ssp->work.work);
> >  	}
> > -- 
> > 2.25.1
> > 


* Re: [PATCH 3/3] srcu: Fix broken node geometry after early ssp init
  2021-04-01 23:47 ` [PATCH 3/3] srcu: Fix broken node geometry after early ssp init Frederic Weisbecker
@ 2021-04-02  1:12   ` Paul E. McKenney
  2021-04-02 10:02     ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2021-04-02  1:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Fri, Apr 02, 2021 at 01:47:04AM +0200, Frederic Weisbecker wrote:
> An ssp initialized before rcu_init_geometry() will have its snp hierarchy
> based on CONFIG_NR_CPUS.
> 
> Once rcu_init_geometry() is called, the node distribution is shrunk
> and optimized to match the actual number of possible CPUs detected
> on boot.
> 
> Later on, the ssp that was initialized before rcu_init_geometry() is
> confused and sometimes refers to its initial CONFIG_NR_CPUS-based node
> hierarchy, sometimes to the new num_possible_cpus()-based one instead.
> For example, each of its sdp->mynode pointers remains stale and refers
> to early node leaves that may not exist anymore. On the other hand,
> srcu_for_each_node_breadth_first() walks the new node hierarchy.
> 
> There are at least two possible bad outcomes:
> 
> 1) a) A callback enqueued early on an sdp is recorded as pending on
>       sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start(), with
>       sdp->mynode pointing to a deep leaf (say 3 levels down).
> 
>    b) The grace period ends after rcu_init_geometry(), which shrinks the
>       node hierarchy to a single level. srcu_gp_end() walks through the
>       new snp hierarchy without ever reaching the old leaves, so the
>       callback is never executed.
> 
>    This is easily reproduced on an 8-CPU machine with
>    CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The
>    srcu_barrier() after the early-test verification never completes
>    and the boot hangs:
> 
> 	[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
> 	[ 5413.147564]       Not tainted 5.12.0-rc4+ #28
> 	[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 	[ 5413.159753] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
> 	[ 5413.168099] Call Trace:
> 	[ 5413.170555]  __schedule+0x36c/0x930
> 	[ 5413.174057]  ? wait_for_completion+0x88/0x110
> 	[ 5413.178423]  schedule+0x46/0xf0
> 	[ 5413.181575]  schedule_timeout+0x284/0x380
> 	[ 5413.185591]  ? wait_for_completion+0x88/0x110
> 	[ 5413.189957]  ? mark_held_locks+0x61/0x80
> 	[ 5413.193882]  ? mark_held_locks+0x61/0x80
> 	[ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
> 	[ 5413.202173]  ? wait_for_completion+0x88/0x110
> 	[ 5413.206535]  wait_for_completion+0xb4/0x110
> 	[ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
> 	[ 5413.215610]  srcu_barrier+0x187/0x200
> 	[ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
> 	[ 5413.224244]  ? rdinit_setup+0x2b/0x2b
> 	[ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
> 	[ 5413.232700]  do_one_initcall+0x63/0x310
> 	[ 5413.236541]  ? rdinit_setup+0x2b/0x2b
> 	[ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
> 	[ 5413.244912]  kernel_init_freeable+0x253/0x28f
> 	[ 5413.249273]  ? rest_init+0x250/0x250
> 	[ 5413.252846]  kernel_init+0xa/0x110
> 	[ 5413.256257]  ret_from_fork+0x22/0x30
> 
> 2) An ssp that gets initialized before rcu_init_geometry() and used
>    afterward will always have stale sdp->mynode references, resulting
>    in callbacks being missed in srcu_gp_end(), just as in the previous
>    scenario.
> 
> Solve this by fixing the node tree layout of early-initialized ssps
> once rcu_init_geometry() is done. Unfortunately, this involves adding
> a new field to struct srcu_struct so that the ssp hierarchy rebuild
> can be postponed.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>

Again, good catch and thank you!

One question below.

							Thanx, Paul

> ---
>  include/linux/srcutree.h |  1 +
>  kernel/rcu/srcutree.c    | 68 +++++++++++++++++++++++++++++++++++-----
>  2 files changed, 62 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 9cfcc8a756ae..4339e5794a72 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -85,6 +85,7 @@ struct srcu_struct {
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  	struct lockdep_map dep_map;
>  #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> +	struct list_head early_init;
>  };
>  
>  /* Values for state variable (bottom bits of ->srcu_gp_seq). */
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 10e681ea7051..285f0c053754 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -39,7 +39,7 @@ static ulong counter_wrap_check = (ULONG_MAX >> 2);
>  module_param(counter_wrap_check, ulong, 0444);
>  
>  /* Early-boot callback-management, so early that no lock is required! */
> -static LIST_HEAD(srcu_boot_list);
> +static LIST_HEAD(srcu_boot_queue_list);
>  static bool __read_mostly srcu_init_done;
>  
>  static void srcu_invoke_callbacks(struct work_struct *work);
> @@ -133,7 +133,9 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp)
>  	for_each_possible_cpu(cpu) {
>  		sdp = per_cpu_ptr(ssp->sda, cpu);
>  		spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
> -		rcu_segcblist_init(&sdp->srcu_cblist);
> +		/* Preserve the queue in case of hierarchy rebuild */
> +		if (!rcu_segcblist_is_enabled(&sdp->srcu_cblist))
> +			rcu_segcblist_init(&sdp->srcu_cblist);
>  		sdp->srcu_cblist_invoking = false;
>  		sdp->srcu_gp_seq_needed = ssp->srcu_gp_seq;
>  		sdp->srcu_gp_seq_needed_exp = ssp->srcu_gp_seq;
> @@ -151,6 +153,8 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp)
>  	}
>  }
>  
> +static LIST_HEAD(srcu_early_init_list);
> +
>  /*
>   * Initialize non-compile-time initialized fields, including the
>   * associated srcu_node and srcu_data structures.  The is_static
> @@ -174,6 +178,8 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
>  	init_srcu_struct_nodes(ssp);
>  	ssp->srcu_gp_seq_needed_exp = 0;
>  	ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
> +	if (!srcu_init_done)
> +		list_add_tail(&ssp->early_init, &srcu_early_init_list);
>  	smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */
>  	return 0;
>  }
> @@ -679,7 +685,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
>  			queue_delayed_work(rcu_gp_wq, &ssp->work,
>  					   srcu_get_delay(ssp));
>  		else if (list_empty(&ssp->work.work.entry))
> -			list_add(&ssp->work.work.entry, &srcu_boot_list);
> +			list_add(&ssp->work.work.entry, &srcu_boot_queue_list);
>  	}
>  	spin_unlock_irqrestore_rcu_node(ssp, flags);
>  }
> @@ -1380,14 +1386,62 @@ static int __init srcu_bootup_announce(void)
>  }
>  early_initcall(srcu_bootup_announce);
>  
> +static void __init restore_srcu_sdp(struct srcu_data *sdp,
> +				    struct srcu_data *old_sdp,
> +				    struct srcu_node *old_mynode)
> +{
> +	int i;
> +	struct srcu_node *mynode = sdp->mynode;
> +	struct srcu_node *snp = mynode;
> +
> +	/* No need to lock the nodes at this stage; this is early boot */
> +	for (snp = mynode; snp != NULL; snp = snp->srcu_parent) {
> +		for (i = 0; i < ARRAY_SIZE(old_mynode->srcu_have_cbs); i++) {
> +			snp->srcu_have_cbs[i] = old_mynode->srcu_have_cbs[i];
> +
> +			if (snp != mynode)
> +				continue;
> +
> +			if (old_mynode->srcu_data_have_cbs[i] == old_sdp->grpmask) {
> +				snp->srcu_data_have_cbs[i] = sdp->grpmask;
> +			} else if (!old_mynode->srcu_data_have_cbs[i]) {
> +				snp->srcu_data_have_cbs[i] = 0;
> +			} else {
> +				/* Don't expect other CPUs to have callbacks that early */
> +				WARN_ON_ONCE(1);
> +			}
> +		}
> +		snp->srcu_gp_seq_needed_exp = old_mynode->srcu_gp_seq_needed_exp;
> +	}
> +
> +	sdp->srcu_gp_seq_needed = old_sdp->srcu_gp_seq_needed;
> +	sdp->srcu_gp_seq_needed_exp = old_sdp->srcu_gp_seq_needed_exp;
> +}
> +
>  void __init srcu_init(void)
>  {
> -	struct srcu_struct *ssp;
> +	static struct srcu_data __initdata old_sdp;
> +	static struct srcu_node __initdata old_mynode;
> +	struct srcu_struct *ssp, *tmp;
>  
>  	srcu_init_done = true;
> -	while (!list_empty(&srcu_boot_list)) {
> -		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
> -				      work.work.entry);
> +	/*
> +	 * ssp's that got initialized before rcu_init_geometry() have a stale
> +	 * node hierarchy. Rebuild their node trees and propagate states from
> +	 * pruned leaf nodes.
> +	 */
> +	list_for_each_entry_safe(ssp, tmp, &srcu_early_init_list, early_init) {
> +		struct srcu_data *sdp = this_cpu_ptr(ssp->sda);
> +
> +		list_del(&ssp->early_init);
> +		old_sdp = *sdp;
> +		old_mynode = *sdp->mynode;
> +		init_srcu_struct_nodes(ssp);
> +		restore_srcu_sdp(sdp, &old_sdp, &old_mynode);

Why not just pull all of the pre-initialization callbacks onto a local
list (re-initializing the rcu_segcblist structures if necessary),
then iterate through that list re-invoking call_srcu() on each?
There shouldn't be that many pre-initialization call_srcu() invocations,
and if someday there are, it would not be hard to make __call_srcu()
take lists of callbacks and a count instead of a single callback, correct?

Or am I once again missing something basic?
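
Very roughly, I mean something like the following userspace sketch (all
names are mine, just modeling the park-then-requeue loop, not actual
Tree SRCU code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal model of the proposed requeue scheme. */
struct cb {
	struct cb *next;
	int id;
};

static struct cb *early_head;	/* callbacks queued before srcu_init() */
static struct cb *real_head;	/* callbacks requeued through call_srcu() */
static int init_done;

/* Stand-in for call_srcu(): before init, park on the early list. */
static void model_call_srcu(struct cb *c)
{
	struct cb **headp = init_done ? &real_head : &early_head;

	c->next = *headp;
	*headp = c;
}

/* Stand-in for srcu_init(): pull the early callbacks onto a local
 * list, then re-invoke call_srcu() on each in turn. */
static void model_srcu_init(void)
{
	struct cb *list = early_head;

	early_head = NULL;
	init_done = 1;
	while (list) {
		struct cb *c = list;

		list = list->next;
		model_call_srcu(c);
	}
}
```

The real thing would of course go through the rcu_segcblist machinery
rather than a bare list, but the shape of the loop is the point.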

							Thanx, Paul

> +	}
> +
> +	/* Handle works that had to wait until workqueues were created */
> +	list_for_each_entry_safe(ssp, tmp, &srcu_boot_queue_list, work.work.entry) {
>  		list_del_init(&ssp->work.work.entry);
>  		queue_work(rcu_gp_wq, &ssp->work.work);
>  	}
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue
  2021-04-02  0:58     ` Frederic Weisbecker
@ 2021-04-02  1:19       ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-04-02  1:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Fri, Apr 02, 2021 at 02:58:13AM +0200, Frederic Weisbecker wrote:
> On Thu, Apr 01, 2021 at 05:48:56PM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 02, 2021 at 01:47:02AM +0200, Frederic Weisbecker wrote:
> > > When an ssp has already started a grace period and queued an early work
> > > to flush after SRCU workqueues are created, we expect the ssp to be
> > > properly initialized already. So we can skip this step at this stage.
> > > 
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: Uladzislau Rezki <urezki@gmail.com>
> > > ---
> > >  kernel/rcu/srcutree.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > index 036ff5499ad5..7197156418e4 100644
> > > --- a/kernel/rcu/srcutree.c
> > > +++ b/kernel/rcu/srcutree.c
> > > @@ -1396,7 +1396,6 @@ void __init srcu_init(void)
> > >  	while (!list_empty(&srcu_boot_list)) {
> > >  		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
> > >  				      work.work.entry);
> > > -		check_init_srcu_struct(ssp);
> > 
> > You lost me on this one.  What happens if the only pre-initialization
> > invocation on the statically allocated srcu_struct pointed to by ssp
> > was call_srcu()?  I am not seeing how the initialization has already
> > happened in that case.
> > 
> > What am I missing here?
> 
> call_srcu() -> __call_srcu() -> srcu_gp_start_if_needed() ->
> check_init_srcu_struct() ?
> 
> Or is it me missing something?

Nope, me getting confused between Tree SRCU's and Tiny SRCU's
call_srcu() implementation.  :-/

I have queued this patch and started testing.

							Thanx, Paul


* Re: [PATCH 3/3] srcu: Fix broken node geometry after early ssp init
  2021-04-02  1:12   ` Paul E. McKenney
@ 2021-04-02 10:02     ` Frederic Weisbecker
  2021-04-02 15:03       ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2021-04-02 10:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Thu, Apr 01, 2021 at 06:12:41PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 02, 2021 at 01:47:04AM +0200, Frederic Weisbecker wrote:
> >  void __init srcu_init(void)
> >  {
> > -	struct srcu_struct *ssp;
> > +	static struct srcu_data __initdata old_sdp;
> > +	static struct srcu_node __initdata old_mynode;
> > +	struct srcu_struct *ssp, *tmp;
> >  
> >  	srcu_init_done = true;
> > -	while (!list_empty(&srcu_boot_list)) {
> > -		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
> > -				      work.work.entry);
> > +	/*
> > +	 * ssp's that got initialized before rcu_init_geometry() have a stale
> > +	 * node hierarchy. Rebuild their node trees and propagate states from
> > +	 * pruned leaf nodes.
> > +	 */
> > +	list_for_each_entry_safe(ssp, tmp, &srcu_early_init_list, early_init) {
> > +		struct srcu_data *sdp = this_cpu_ptr(ssp->sda);
> > +
> > +		list_del(&ssp->early_init);
> > +		old_sdp = *sdp;
> > +		old_mynode = *sdp->mynode;
> > +		init_srcu_struct_nodes(ssp);
> > +		restore_srcu_sdp(sdp, &old_sdp, &old_mynode);
> 
> Why not just pull all of the pre-initialization callbacks onto a local
> > list (re-initializing the rcu_segcblist structures if necessary),
> then iterate through that list re-invoking call_srcu() on each?
> There shouldn't be that many pre-initialization call_srcu() invocations,
> and if someday there are, it would not be hard to make __call_srcu()
> take lists of callbacks and a count instead of a single callback, correct?
> 
> Or am I once again missing something basic?

So that would work for grace periods started via callbacks, but not for the
others. If anybody calls start_poll_synchronize_rcu() early, simply requeuing
the callbacks won't help. Even worse, a blind reinitialization of the ssp would
lose the started grace period and a future poll_state_synchronize_rcu() might
never succeed.

Arguably that's quite a corner case and I don't expect anyone to call
start_poll_synchronize_srcu() so early but who knows. An alternative is to
forbid it and warn if used before srcu is initialized.
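
The forbid-and-warn variant would amount to a guard at the top of the
polling entry point. Modeled in userspace (names and the cookie shape
are mine, a WARN_ON_ONCE() stands behind the fprintf()):

```c
#include <assert.h>
#include <stdio.h>

static int srcu_init_done_model;
static unsigned long gp_seq_m;

/* Hypothetical guard: refuse (and warn) if called before srcu_init(). */
static unsigned long model_start_poll(void)
{
	if (!srcu_init_done_model) {
		/* In the kernel this would be WARN_ON_ONCE(1). */
		fprintf(stderr, "WARN: start_poll_synchronize_srcu() before srcu_init()\n");
		return 0;	/* invalid cookie */
	}
	return gp_seq_m + 1;	/* grossly simplified cookie */
}
```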

Thanks.


* Re: [PATCH 3/3] srcu: Fix broken node geometry after early ssp init
  2021-04-02 10:02     ` Frederic Weisbecker
@ 2021-04-02 15:03       ` Paul E. McKenney
  2021-04-02 20:50         ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2021-04-02 15:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Fri, Apr 02, 2021 at 12:02:21PM +0200, Frederic Weisbecker wrote:
> On Thu, Apr 01, 2021 at 06:12:41PM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 02, 2021 at 01:47:04AM +0200, Frederic Weisbecker wrote:
> gg> >  void __init srcu_init(void)
> > >  {
> > > -	struct srcu_struct *ssp;
> > > +	static struct srcu_data __initdata old_sdp;
> > > +	static struct srcu_node __initdata old_mynode;
> > > +	struct srcu_struct *ssp, *tmp;
> > >  
> > >  	srcu_init_done = true;
> > > -	while (!list_empty(&srcu_boot_list)) {
> > > -		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
> > > -				      work.work.entry);
> > > +	/*
> > > +	 * ssp's that got initialized before rcu_init_geometry() have a stale
> > > +	 * node hierarchy. Rebuild their node trees and propagate states from
> > > +	 * pruned leaf nodes.
> > > +	 */
> > > +	list_for_each_entry_safe(ssp, tmp, &srcu_early_init_list, early_init) {
> > > +		struct srcu_data *sdp = this_cpu_ptr(ssp->sda);
> > > +
> > > +		list_del(&ssp->early_init);
> > > +		old_sdp = *sdp;
> > > +		old_mynode = *sdp->mynode;
> > > +		init_srcu_struct_nodes(ssp);
> > > +		restore_srcu_sdp(sdp, &old_sdp, &old_mynode);
> > 
> > Why not just pull all of the pre-initialization callbacks onto a local
> > > list (re-initializing the rcu_segcblist structures if necessary),
> > then iterate through that list re-invoking call_srcu() on each?
> > There shouldn't be that many pre-initialization call_srcu() invocations,
> > and if someday there are, it would not be hard to make __call_srcu()
> > take lists of callbacks and a count instead of a single callback, correct?
> > 
> > Or am I once again missing something basic?
> 
> > So that would work for grace periods started via callbacks, but not for the
> > others. If anybody calls start_poll_synchronize_rcu() early, simply requeuing
> the callbacks won't help. Even worse, a blind reinitialization of the ssp would
> lose the started grace period and a future poll_state_synchronize_rcu() might
> never succeed.

Good point...

> > Arguably that's quite a corner case and I don't expect anyone to call
> start_poll_synchronize_srcu() so early but who knows. An alternative is to
> forbid it and warn if used before srcu is initialized.

Another approach would be to have start_poll_synchronize_rcu() check to
see if initialization has happened, and if not, simply queue a callback.

Any other ways to make this work?

							Thanx, Paul


* Re: [PATCH 3/3] srcu: Fix broken node geometry after early ssp init
  2021-04-02 15:03       ` Paul E. McKenney
@ 2021-04-02 20:50         ` Frederic Weisbecker
  2021-04-02 22:12           ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2021-04-02 20:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Fri, Apr 02, 2021 at 08:03:57AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 02, 2021 at 12:02:21PM +0200, Frederic Weisbecker wrote:
> > Arguably that's quite a corner case and I don't expect anyone to call
> > start_poll_synchronize_srcu() so early but who knows. An alternative is to
> > forbid it and warn if used before srcu is initialized.
> 
> Another approach would be to have start_poll_synchronize_rcu() check to
> see if initialization has happened, and if not, simply queue a callback.
> 
> Any other ways to make this work?

Ok I think that should work. We want to make sure that the cookies returned
by each call to start_poll_synchronize_rcu() before rcu_init_geometry() will
match the gpnums targeted by the corresponding callbacks we requeue.

Since we are very early and the workqueues can't schedule, the grace periods
shouldn't be able to complete. Assuming ssp->srcu_gp_seq is initialized as N.
The first call to call_srcu/start_poll_synchronize_rcu should target gpnum N +
1. Then all those that follow should target gpnum N + 2 and not further.

When srcu_init() requeues the callbacks in order after resetting the
gpnum to N, this should just behave the same and attribute the right gpnum
to each callback.

It would have been a problem if the workqueues could schedule and complete
grace periods concurrently because we might target gpnum N + 3, N + 4, ...
as we requeue the callbacks. But it's not the case so we should be fine as
long as callbacks are requeued in order.

Does that sound right to you as well? If so I can try it.
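
To double-check that arithmetic in isolation, here is a quick userspace
model of the sequence snapshot, using the same two bottom state bits as
the kernel's rcu_seq_snap() as far as I can tell (standalone, names are
mine):

```c
#include <assert.h>

#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Model of rcu_seq_snap(): the GP number a new request must wait for.
 * If no GP is in progress (state bits zero), that is N + 1; if one is
 * already in progress, it is N + 2, and never further. */
static unsigned long model_seq_snap(unsigned long seq)
{
	return (seq + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}
```

So with srcu_gp_seq at N and no workqueues able to complete anything,
the first request snaps to N + 1 and all later ones to N + 2, which is
exactly what requeuing in order should reproduce.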

Thanks.


* Re: [PATCH 3/3] srcu: Fix broken node geometry after early ssp init
  2021-04-02 20:50         ` Frederic Weisbecker
@ 2021-04-02 22:12           ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-04-02 22:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Uladzislau Rezki, Boqun Feng, Lai Jiangshan,
	Neeraj Upadhyay, Josh Triplett, Joel Fernandes

On Fri, Apr 02, 2021 at 10:50:38PM +0200, Frederic Weisbecker wrote:
> On Fri, Apr 02, 2021 at 08:03:57AM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 02, 2021 at 12:02:21PM +0200, Frederic Weisbecker wrote:
> > > Arguably that's quite a corner case and I don't expect anyone to call
> > > start_poll_synchronize_srcu() so early but who knows. An alternative is to
> > > forbid it and warn if used before srcu is initialized.
> > 
> > Another approach would be to have start_poll_synchronize_rcu() check to
> > see if initialization has happened, and if not, simply queue a callback.
> > 
> > Any other ways to make this work?
> 
> Ok I think that should work. We want to make sure that the cookies returned
> by each call to start_poll_synchronize_rcu() before rcu_init_geometry() will
> match the gpnums targeted by the corresponding callbacks we requeue.
> 
> Since we are very early and the workqueues can't schedule, the grace periods
> shouldn't be able to complete. Assuming ssp->srcu_gp_seq is initialized as N.
> The first call to call_srcu/start_poll_synchronize_rcu should target gpnum N +
> 1. Then all those that follow should target gpnum N + 2 and not further.
> 
> When srcu_init() requeues the callbacks in order after resetting the
> gpnum to N, this should just behave the same and attribute the right gpnum
> to each callback.
> 
> It would have been a problem if the workqueues could schedule and complete
> grace periods concurrently because we might target gpnum N + 3, N + 4, ...
> as we requeue the callbacks. But it's not the case so we should be fine as
> long as callbacks are requeued in order.
> 
> Does that sound right to you as well? If so I can try it.

Makes sense to me!

There also needs to be an additional start_poll_synchronize_rcu() check
to avoid double call_rcu() of a single rcu_head structure.  But everything
is single-threaded at that point, and this check is after the check
for already being initialized, so this should be no problem.
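
Something like the following userspace model of that double-queue
avoidance (entirely hypothetical naming, and relying on early boot being
single-threaded so no locking is needed around the flag):

```c
#include <assert.h>

static int init_done_m;		/* stand-in for srcu_init_done */
static int early_cb_queued;	/* the single early rcu_head already queued? */
static int queued_count;	/* how many times we "called" call_srcu() */

/* Hypothetical early-boot path of start_poll_synchronize_srcu():
 * queue the shared early callback at most once. */
static unsigned long model_start_poll_once(void)
{
	if (!init_done_m) {
		if (!early_cb_queued) {
			early_cb_queued = 1;
			queued_count++;	/* stand-in for call_srcu(&early_rhp) */
		}
		return 1;	/* cookie for the early GP */
	}
	return 2;	/* normal path, grossly simplified */
}
```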

And yes, srcu_init() happens well before context switch is possible,
let alone workqueues scheduling.  Famous last words...

							Thanx, Paul


Thread overview: 14+ messages
2021-04-01 23:47 [PATCH 0/3] srcu: Fix boot stall Frederic Weisbecker
2021-04-01 23:47 ` [PATCH 1/3] srcu: Remove superfluous ssp initialization on deferred work queue Frederic Weisbecker
2021-04-02  0:48   ` Paul E. McKenney
2021-04-02  0:58     ` Frederic Weisbecker
2021-04-02  1:19       ` Paul E. McKenney
2021-04-02  1:01     ` Paul E. McKenney
2021-04-01 23:47 ` [PATCH 2/3] srcu: Remove superfluous sdp->srcu_lock_count zero filling Frederic Weisbecker
2021-04-02  0:58   ` Paul E. McKenney
2021-04-01 23:47 ` [PATCH 3/3] srcu: Fix broken node geometry after early ssp init Frederic Weisbecker
2021-04-02  1:12   ` Paul E. McKenney
2021-04-02 10:02     ` Frederic Weisbecker
2021-04-02 15:03       ` Paul E. McKenney
2021-04-02 20:50         ` Frederic Weisbecker
2021-04-02 22:12           ` Paul E. McKenney