RCU Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] Remove GP_REPLAY state from rcu_sync
@ 2019-10-04 14:57 Joel Fernandes (Google)
  2019-10-04 14:59 ` Joel Fernandes
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Joel Fernandes (Google) @ 2019-10-04 14:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, bristot, peterz, oleg, paulmck, rcu,
	Josh Triplett, Lai Jiangshan, Mathieu Desnoyers,
	Paul E. McKenney, Steven Rostedt

From: Joel Fernandes <joel@joelfernandes.org>

Please consider this is an RFC for discussion only. Just want to discuss
why the GP_REPLAY state is needed at all.

Here's the intention AFAICS:
When rcu_sync_exit() has happened, the gp_state changes to GP_EXIT while
we wait for a grace period before transitioning to GP_IDLE. In the
meanwhile, if we receive another rcu_sync_exit(), then we want to wait
for another GP to account for that.

Drawing some timing diagrams, it looks like this:

Legend:
rse = rcu_sync_enter
rsx = rcu_sync_exit
i = GP_IDLE
x = GP_EXIT
r = GP_REPLAY
e = GP_ENTER
p = GP_PASSED
rx = GP_REPLAY changes to GP_EXIT

GP num = The GP we are one.

note: A GP passes between the states:
  e and p
  x and i
  x and rx
  rx and i

In a simple case, the timing and states look like:
time
---------------------->
GP num         1111111    2222222
GP state  i    e     p    x     i
CPU0 :         rse	  rsx

However we can enter the replay state like this:
time
---------------------->
GP num         1111111    2222222222222222222223333333
GP state  i    e     p    x              r     rx    i
CPU0 :         rse	  rsx
CPU1 :                         rse     rsx

Due to the second rse + rsx, we had to wait for another GP.

I believe the rationale is, if another rsx happens, another GP has to
happen.

But this is not always true if you consider the following events:

time
---------------------->
GP num         111111     22222222222222222222222222222222233333333
GP state  i    e     p    x                 r              rx     i
CPU0 :         rse	  rsx
CPU1 :                         rse     rsx
CPU2 :                                         rse     rsx

Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
and 2 for the rcu_sync_exit(s).

However, we had 3 rcu_sync_exit()s, not 2. In other words, the
rcu_sync_exit() got batched.

So my point here is, rcu_sync_exit() does not really always cause a new
GP to happen and we can end up having the rcu_sync_exit()s as batched
and sharing the same grace period.

Then what is the point of the GP_REPLAY state at all if it does not
always wait for a new GP?  Taking a step back, why did we intend to have
to wait for a new GP if another rcu_sync_exit() comes while one is still
in progress?

Cc: bristot@redhat.com
Cc: peterz@infradead.org
Cc: oleg@redhat.com
Cc: paulmck@kernel.org
Cc: rcu@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/sync.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index d4558ab7a07d..4f3aad67992c 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -10,7 +10,7 @@
 #include <linux/rcu_sync.h>
 #include <linux/sched.h>
 
-enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
+enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };
 
 #define	rss_lock	gp_wait.lock
 
@@ -85,13 +85,6 @@ static void rcu_sync_func(struct rcu_head *rhp)
 		 */
 		WRITE_ONCE(rsp->gp_state, GP_PASSED);
 		wake_up_locked(&rsp->gp_wait);
-	} else if (rsp->gp_state == GP_REPLAY) {
-		/*
-		 * A new rcu_sync_exit() has happened; requeue the callback to
-		 * catch a later GP.
-		 */
-		WRITE_ONCE(rsp->gp_state, GP_EXIT);
-		rcu_sync_call(rsp);
 	} else {
 		/*
 		 * We're at least a GP after the last rcu_sync_exit(); eveybody
@@ -167,16 +160,13 @@ void rcu_sync_enter(struct rcu_sync *rsp)
  */
 void rcu_sync_exit(struct rcu_sync *rsp)
 {
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
+	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);
 
 	spin_lock_irq(&rsp->rss_lock);
 	if (!--rsp->gp_count) {
 		if (rsp->gp_state == GP_PASSED) {
 			WRITE_ONCE(rsp->gp_state, GP_EXIT);
 			rcu_sync_call(rsp);
-		} else if (rsp->gp_state == GP_EXIT) {
-			WRITE_ONCE(rsp->gp_state, GP_REPLAY);
 		}
 	}
 	spin_unlock_irq(&rsp->rss_lock);
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove GP_REPLAY state from rcu_sync
  2019-10-04 14:57 [PATCH] Remove GP_REPLAY state from rcu_sync Joel Fernandes (Google)
@ 2019-10-04 14:59 ` Joel Fernandes
  2019-10-04 15:41 ` Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Joel Fernandes @ 2019-10-04 14:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, peterz, oleg, paulmck, rcu, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers, Paul E. McKenney,
	Steven Rostedt

On Fri, Oct 04, 2019 at 10:57:41AM -0400, Joel Fernandes (Google) wrote:
> From: Joel Fernandes <joel@joelfernandes.org>
> 
> Please consider this is an RFC for discussion only. Just want to discuss
> why the GP_REPLAY state is needed at all.

And I messed up the subject prefix, but this is *really* RFC and for
discussion purposes :)

thanks,

 - Joel


> Here's the intention AFAICS:
> When rcu_sync_exit() has happened, the gp_state changes to GP_EXIT while
> we wait for a grace period before transitioning to GP_IDLE. In the
> meanwhile, if we receive another rcu_sync_exit(), then we want to wait
> for another GP to account for that.
> 
> Drawing some timing diagrams, it looks like this:
> 
> Legend:
> rse = rcu_sync_enter
> rsx = rcu_sync_exit
> i = GP_IDLE
> x = GP_EXIT
> r = GP_REPLAY
> e = GP_ENTER
> p = GP_PASSED
> rx = GP_REPLAY changes to GP_EXIT
> 
> GP num = The GP we are one.
> 
> note: A GP passes between the states:
>   e and p
>   x and i
>   x and rx
>   rx and i
> 
> In a simple case, the timing and states look like:
> time
> ---------------------->
> GP num         1111111    2222222
> GP state  i    e     p    x     i
> CPU0 :         rse	  rsx
> 
> However we can enter the replay state like this:
> time
> ---------------------->
> GP num         1111111    2222222222222222222223333333
> GP state  i    e     p    x              r     rx    i
> CPU0 :         rse	  rsx
> CPU1 :                         rse     rsx
> 
> Due to the second rse + rsx, we had to wait for another GP.
> 
> I believe the rationale is, if another rsx happens, another GP has to
> happen.
> 
> But this is not always true if you consider the following events:
> 
> time
> ---------------------->
> GP num         111111     22222222222222222222222222222222233333333
> GP state  i    e     p    x                 r              rx     i
> CPU0 :         rse	  rsx
> CPU1 :                         rse     rsx
> CPU2 :                                         rse     rsx
> 
> Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
> and 2 for the rcu_sync_exit(s).
> 
> However, we had 3 rcu_sync_exit()s, not 2. In other words, the
> rcu_sync_exit() got batched.
> 
> So my point here is, rcu_sync_exit() does not really always cause a new
> GP to happen and we can end up having the rcu_sync_exit()s as batched
> and sharing the same grace period.
> 
> Then what is the point of the GP_REPLAY state at all if it does not
> always wait for a new GP?  Taking a step back, why did we intend to have
> to wait for a new GP if another rcu_sync_exit() comes while one is still
> in progress?
> 
> Cc: bristot@redhat.com
> Cc: peterz@infradead.org
> Cc: oleg@redhat.com
> Cc: paulmck@kernel.org
> Cc: rcu@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/sync.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index d4558ab7a07d..4f3aad67992c 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -10,7 +10,7 @@
>  #include <linux/rcu_sync.h>
>  #include <linux/sched.h>
>  
> -enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
> +enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };
>  
>  #define	rss_lock	gp_wait.lock
>  
> @@ -85,13 +85,6 @@ static void rcu_sync_func(struct rcu_head *rhp)
>  		 */
>  		WRITE_ONCE(rsp->gp_state, GP_PASSED);
>  		wake_up_locked(&rsp->gp_wait);
> -	} else if (rsp->gp_state == GP_REPLAY) {
> -		/*
> -		 * A new rcu_sync_exit() has happened; requeue the callback to
> -		 * catch a later GP.
> -		 */
> -		WRITE_ONCE(rsp->gp_state, GP_EXIT);
> -		rcu_sync_call(rsp);
>  	} else {
>  		/*
>  		 * We're at least a GP after the last rcu_sync_exit(); eveybody
> @@ -167,16 +160,13 @@ void rcu_sync_enter(struct rcu_sync *rsp)
>   */
>  void rcu_sync_exit(struct rcu_sync *rsp)
>  {
> -	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
> -	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
> +	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);
>  
>  	spin_lock_irq(&rsp->rss_lock);
>  	if (!--rsp->gp_count) {
>  		if (rsp->gp_state == GP_PASSED) {
>  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
>  			rcu_sync_call(rsp);
> -		} else if (rsp->gp_state == GP_EXIT) {
> -			WRITE_ONCE(rsp->gp_state, GP_REPLAY);
>  		}
>  	}
>  	spin_unlock_irq(&rsp->rss_lock);
> -- 
> 2.23.0.581.g78d2f28ef7-goog
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove GP_REPLAY state from rcu_sync
  2019-10-04 14:57 [PATCH] Remove GP_REPLAY state from rcu_sync Joel Fernandes (Google)
  2019-10-04 14:59 ` Joel Fernandes
@ 2019-10-04 15:41 ` Oleg Nesterov
  2019-10-04 16:37   ` Joel Fernandes
  2019-10-04 19:25 ` kbuild test robot
  2019-10-04 22:03 ` kbuild test robot
  3 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2019-10-04 15:41 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, bristot, peterz, paulmck, rcu, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers, Paul E. McKenney,
	Steven Rostedt

On 10/04, Joel Fernandes (Google) wrote:
>
> But this is not always true if you consider the following events:

I'm afraid I missed your point, but...

> ---------------------->
> GP num         111111     22222222222222222222222222222222233333333
> GP state  i    e     p    x                 r              rx     i
> CPU0 :         rse	  rsx
> CPU1 :                         rse     rsx
> CPU2 :                                         rse     rsx
>
> Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
> and 2 for the rcu_sync_exit(s).

But this is fine?

We only need to ensure that we have a full GP pass between the "last"
rcu_sync_exit() and GP_XXX -> GP_IDLE transition.

> However, we had 3 rcu_sync_exit()s, not 2. In other words, the
> rcu_sync_exit() got batched.
>
> So my point here is, rcu_sync_exit() does not really always cause a new
> GP to happen

See above, it should not.

> Then what is the point of the GP_REPLAY state at all if it does not
> always wait for a new GP?

Again, I don't understand... GP_REPLAY ensures that we will have a full GP
before rcu_sync_func() sets GP_IDLE, note that it does another "recursive"
call_rcu() if it sees GP_REPLAY.

> Taking a step back, why did we intend to have
> to wait for a new GP if another rcu_sync_exit() comes while one is still
> in progress?

To ensure that if another CPU sees rcu_sync_is_idle() (GP_IDLE) after you
do rcu_sync_exit(), then it must also see all memory changes you did before
rcu_sync_exit().

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove GP_REPLAY state from rcu_sync
  2019-10-04 15:41 ` Oleg Nesterov
@ 2019-10-04 16:37   ` Joel Fernandes
  2019-10-07 14:09     ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Joel Fernandes @ 2019-10-04 16:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, bristot, peterz, paulmck, rcu, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers, Paul E. McKenney,
	Steven Rostedt

Hi Oleg,

On Fri, Oct 04, 2019 at 05:41:03PM +0200, Oleg Nesterov wrote:
> On 10/04, Joel Fernandes (Google) wrote:
> >
> > But this is not always true if you consider the following events:
> 
> I'm afraid I missed your point, but...
> 
> > ---------------------->
> > GP num         111111     22222222222222222222222222222222233333333
> > GP state  i    e     p    x                 r              rx     i
> > CPU0 :         rse	  rsx
> > CPU1 :                         rse     rsx
> > CPU2 :                                         rse     rsx
> >
> > Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
> > and 2 for the rcu_sync_exit(s).
> 
> But this is fine?
> 
> We only need to ensure that we have a full GP pass between the "last"
> rcu_sync_exit() and GP_XXX -> GP_IDLE transition.
> 
> > However, we had 3 rcu_sync_exit()s, not 2. In other words, the
> > rcu_sync_exit() got batched.
> >
> > So my point here is, rcu_sync_exit() does not really always cause a new
> > GP to happen
> 
> See above, it should not.

Ok, I understand now. The point is to wait for a full GP, not necessarily
start a new one on each exit.

> > Then what is the point of the GP_REPLAY state at all if it does not
> > always wait for a new GP?
> 
> Again, I don't understand... GP_REPLAY ensures that we will have a full GP
> before rcu_sync_func() sets GP_IDLE, note that it does another "recursive"
> call_rcu() if it sees GP_REPLAY.

Ok, got it.

> > Taking a step back, why did we intend to have
> > to wait for a new GP if another rcu_sync_exit() comes while one is still
> > in progress?
> 
> To ensure that if another CPU sees rcu_sync_is_idle() (GP_IDLE) after you
> do rcu_sync_exit(), then it must also see all memory changes you did before
> rcu_sync_exit().

Would this not be better implemented using memory barriers, than starting new
grace periods just for memory ordering? A memory barrier is lighter than
having to go through a grace period. So something like: if the state is
already GP_EXIT, then rcu_sync_exit() issues a memory barrier instead of
replaying. But if state is GP_PASSED, then wait for a grace period. Or, do
you see a situation where this will not work?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove GP_REPLAY state from rcu_sync
  2019-10-04 14:57 [PATCH] Remove GP_REPLAY state from rcu_sync Joel Fernandes (Google)
  2019-10-04 14:59 ` Joel Fernandes
  2019-10-04 15:41 ` Oleg Nesterov
@ 2019-10-04 19:25 ` kbuild test robot
  2019-10-04 22:03 ` kbuild test robot
  3 siblings, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2019-10-04 19:25 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: kbuild-all, linux-kernel, Joel Fernandes, bristot, peterz, oleg,
	paulmck, rcu, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers,
	Paul E. McKenney, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 2642 bytes --]

Hi "Joel,

I love your patch! Yet something to improve:

[auto build test ERROR on rcu/dev]
[cannot apply to v5.4-rc1 next-20191004]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Joel-Fernandes-Google/Remove-GP_REPLAY-state-from-rcu_sync/20191005-024257
base:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/rcu/sync.c: In function 'rcu_sync_dtor':
>> kernel/rcu/sync.c:187:23: error: 'GP_REPLAY' undeclared (first use in this function)
     if (rsp->gp_state == GP_REPLAY)
                          ^~~~~~~~~
   kernel/rcu/sync.c:187:23: note: each undeclared identifier is reported only once for each function it appears in

vim +/GP_REPLAY +187 kernel/rcu/sync.c

07899a6e5f5613 Oleg Nesterov 2015-08-21  174  
07899a6e5f5613 Oleg Nesterov 2015-08-21  175  /**
07899a6e5f5613 Oleg Nesterov 2015-08-21  176   * rcu_sync_dtor() - Clean up an rcu_sync structure
07899a6e5f5613 Oleg Nesterov 2015-08-21  177   * @rsp: Pointer to rcu_sync structure to be cleaned up
07899a6e5f5613 Oleg Nesterov 2015-08-21  178   */
07899a6e5f5613 Oleg Nesterov 2015-08-21  179  void rcu_sync_dtor(struct rcu_sync *rsp)
07899a6e5f5613 Oleg Nesterov 2015-08-21  180  {
89da3b94bb9741 Oleg Nesterov 2019-04-25  181  	int gp_state;
07899a6e5f5613 Oleg Nesterov 2015-08-21  182  
89da3b94bb9741 Oleg Nesterov 2019-04-25  183  	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
89da3b94bb9741 Oleg Nesterov 2019-04-25  184  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
07899a6e5f5613 Oleg Nesterov 2015-08-21  185  
07899a6e5f5613 Oleg Nesterov 2015-08-21  186  	spin_lock_irq(&rsp->rss_lock);
89da3b94bb9741 Oleg Nesterov 2019-04-25 @187  	if (rsp->gp_state == GP_REPLAY)

:::::: The code at line 187 was first introduced by commit
:::::: 89da3b94bb97417ca2c5b0ce3a28643819030247 rcu/sync: Simplify the state machine

:::::: TO: Oleg Nesterov <oleg@redhat.com>
:::::: CC: Paul E. McKenney <paulmck@linux.ibm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7205 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove GP_REPLAY state from rcu_sync
  2019-10-04 14:57 [PATCH] Remove GP_REPLAY state from rcu_sync Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2019-10-04 19:25 ` kbuild test robot
@ 2019-10-04 22:03 ` kbuild test robot
  3 siblings, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2019-10-04 22:03 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: kbuild-all, linux-kernel, Joel Fernandes, bristot, peterz, oleg,
	paulmck, rcu, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers,
	Paul E. McKenney, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 17125 bytes --]

Hi "Joel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on rcu/dev]
[cannot apply to v5.4-rc1 next-20191004]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Joel-Fernandes-Google/Remove-GP_REPLAY-state-from-rcu_sync/20191005-024257
base:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
config: x86_64-randconfig-b002-201939 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/export.h:44:0,
                    from include/linux/linkage.h:7,
                    from include/linux/kernel.h:8,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/rcu_sync.h:13,
                    from kernel//rcu/sync.c:10:
   kernel//rcu/sync.c: In function 'rcu_sync_dtor':
   kernel//rcu/sync.c:187:23: error: 'GP_REPLAY' undeclared (first use in this function)
     if (rsp->gp_state == GP_REPLAY)
                          ^
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
    #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                       ^~~~
>> kernel//rcu/sync.c:187:2: note: in expansion of macro 'if'
     if (rsp->gp_state == GP_REPLAY)
     ^~
   kernel//rcu/sync.c:187:23: note: each undeclared identifier is reported only once for each function it appears in
     if (rsp->gp_state == GP_REPLAY)
                          ^
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
    #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                       ^~~~
>> kernel//rcu/sync.c:187:2: note: in expansion of macro 'if'
     if (rsp->gp_state == GP_REPLAY)
     ^~

vim +/if +187 kernel//rcu/sync.c

cc44ca848f5e51 Oleg Nesterov    2015-08-21  @10  #include <linux/rcu_sync.h>
cc44ca848f5e51 Oleg Nesterov    2015-08-21   11  #include <linux/sched.h>
cc44ca848f5e51 Oleg Nesterov    2015-08-21   12  
6d1a4c2dfe7bb0 Joel Fernandes   2019-10-04   13  enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };
cc44ca848f5e51 Oleg Nesterov    2015-08-21   14  
cc44ca848f5e51 Oleg Nesterov    2015-08-21   15  #define	rss_lock	gp_wait.lock
cc44ca848f5e51 Oleg Nesterov    2015-08-21   16  
cc44ca848f5e51 Oleg Nesterov    2015-08-21   17  /**
cc44ca848f5e51 Oleg Nesterov    2015-08-21   18   * rcu_sync_init() - Initialize an rcu_sync structure
cc44ca848f5e51 Oleg Nesterov    2015-08-21   19   * @rsp: Pointer to rcu_sync structure to be initialized
cc44ca848f5e51 Oleg Nesterov    2015-08-21   20   */
95bf33b55ff446 Oleg Nesterov    2019-04-23   21  void rcu_sync_init(struct rcu_sync *rsp)
cc44ca848f5e51 Oleg Nesterov    2015-08-21   22  {
cc44ca848f5e51 Oleg Nesterov    2015-08-21   23  	memset(rsp, 0, sizeof(*rsp));
cc44ca848f5e51 Oleg Nesterov    2015-08-21   24  	init_waitqueue_head(&rsp->gp_wait);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   25  }
cc44ca848f5e51 Oleg Nesterov    2015-08-21   26  
3942a9bd7b5842 Peter Zijlstra   2016-08-11   27  /**
27fdb35fe99011 Paul E. McKenney 2017-10-19   28   * rcu_sync_enter_start - Force readers onto slow path for multiple updates
27fdb35fe99011 Paul E. McKenney 2017-10-19   29   * @rsp: Pointer to rcu_sync structure to use for synchronization
27fdb35fe99011 Paul E. McKenney 2017-10-19   30   *
3942a9bd7b5842 Peter Zijlstra   2016-08-11   31   * Must be called after rcu_sync_init() and before first use.
3942a9bd7b5842 Peter Zijlstra   2016-08-11   32   *
3942a9bd7b5842 Peter Zijlstra   2016-08-11   33   * Ensures rcu_sync_is_idle() returns false and rcu_sync_{enter,exit}()
3942a9bd7b5842 Peter Zijlstra   2016-08-11   34   * pairs turn into NO-OPs.
3942a9bd7b5842 Peter Zijlstra   2016-08-11   35   */
3942a9bd7b5842 Peter Zijlstra   2016-08-11   36  void rcu_sync_enter_start(struct rcu_sync *rsp)
3942a9bd7b5842 Peter Zijlstra   2016-08-11   37  {
3942a9bd7b5842 Peter Zijlstra   2016-08-11   38  	rsp->gp_count++;
3942a9bd7b5842 Peter Zijlstra   2016-08-11   39  	rsp->gp_state = GP_PASSED;
3942a9bd7b5842 Peter Zijlstra   2016-08-11   40  }
3942a9bd7b5842 Peter Zijlstra   2016-08-11   41  
cc44ca848f5e51 Oleg Nesterov    2015-08-21   42  
89da3b94bb9741 Oleg Nesterov    2019-04-25   43  static void rcu_sync_func(struct rcu_head *rhp);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   44  
89da3b94bb9741 Oleg Nesterov    2019-04-25   45  static void rcu_sync_call(struct rcu_sync *rsp)
89da3b94bb9741 Oleg Nesterov    2019-04-25   46  {
89da3b94bb9741 Oleg Nesterov    2019-04-25   47  	call_rcu(&rsp->cb_head, rcu_sync_func);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   48  }
cc44ca848f5e51 Oleg Nesterov    2015-08-21   49  
cc44ca848f5e51 Oleg Nesterov    2015-08-21   50  /**
cc44ca848f5e51 Oleg Nesterov    2015-08-21   51   * rcu_sync_func() - Callback function managing reader access to fastpath
27fdb35fe99011 Paul E. McKenney 2017-10-19   52   * @rhp: Pointer to rcu_head in rcu_sync structure to use for synchronization
cc44ca848f5e51 Oleg Nesterov    2015-08-21   53   *
89da3b94bb9741 Oleg Nesterov    2019-04-25   54   * This function is passed to call_rcu() function by rcu_sync_enter() and
cc44ca848f5e51 Oleg Nesterov    2015-08-21   55   * rcu_sync_exit(), so that it is invoked after a grace period following the
89da3b94bb9741 Oleg Nesterov    2019-04-25   56   * that invocation of enter/exit.
89da3b94bb9741 Oleg Nesterov    2019-04-25   57   *
89da3b94bb9741 Oleg Nesterov    2019-04-25   58   * If it is called by rcu_sync_enter() it signals that all the readers were
89da3b94bb9741 Oleg Nesterov    2019-04-25   59   * switched onto slow path.
89da3b94bb9741 Oleg Nesterov    2019-04-25   60   *
89da3b94bb9741 Oleg Nesterov    2019-04-25   61   * If it is called by rcu_sync_exit() it takes action based on events that
cc44ca848f5e51 Oleg Nesterov    2015-08-21   62   * have taken place in the meantime, so that closely spaced rcu_sync_enter()
cc44ca848f5e51 Oleg Nesterov    2015-08-21   63   * and rcu_sync_exit() pairs need not wait for a grace period.
cc44ca848f5e51 Oleg Nesterov    2015-08-21   64   *
cc44ca848f5e51 Oleg Nesterov    2015-08-21   65   * If another rcu_sync_enter() is invoked before the grace period
cc44ca848f5e51 Oleg Nesterov    2015-08-21   66   * ended, reset state to allow the next rcu_sync_exit() to let the
cc44ca848f5e51 Oleg Nesterov    2015-08-21   67   * readers back onto their fastpaths (after a grace period).  If both
cc44ca848f5e51 Oleg Nesterov    2015-08-21   68   * another rcu_sync_enter() and its matching rcu_sync_exit() are invoked
cc44ca848f5e51 Oleg Nesterov    2015-08-21   69   * before the grace period ended, re-invoke call_rcu() on behalf of that
cc44ca848f5e51 Oleg Nesterov    2015-08-21   70   * rcu_sync_exit().  Otherwise, set all state back to idle so that readers
cc44ca848f5e51 Oleg Nesterov    2015-08-21   71   * can again use their fastpaths.
cc44ca848f5e51 Oleg Nesterov    2015-08-21   72   */
27fdb35fe99011 Paul E. McKenney 2017-10-19   73  static void rcu_sync_func(struct rcu_head *rhp)
cc44ca848f5e51 Oleg Nesterov    2015-08-21   74  {
27fdb35fe99011 Paul E. McKenney 2017-10-19   75  	struct rcu_sync *rsp = container_of(rhp, struct rcu_sync, cb_head);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   76  	unsigned long flags;
cc44ca848f5e51 Oleg Nesterov    2015-08-21   77  
89da3b94bb9741 Oleg Nesterov    2019-04-25   78  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
89da3b94bb9741 Oleg Nesterov    2019-04-25   79  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   80  
cc44ca848f5e51 Oleg Nesterov    2015-08-21   81  	spin_lock_irqsave(&rsp->rss_lock, flags);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   82  	if (rsp->gp_count) {
cc44ca848f5e51 Oleg Nesterov    2015-08-21   83  		/*
89da3b94bb9741 Oleg Nesterov    2019-04-25   84  		 * We're at least a GP after the GP_IDLE->GP_ENTER transition.
cc44ca848f5e51 Oleg Nesterov    2015-08-21   85  		 */
89da3b94bb9741 Oleg Nesterov    2019-04-25   86  		WRITE_ONCE(rsp->gp_state, GP_PASSED);
89da3b94bb9741 Oleg Nesterov    2019-04-25   87  		wake_up_locked(&rsp->gp_wait);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   88  	} else {
cc44ca848f5e51 Oleg Nesterov    2015-08-21   89  		/*
89da3b94bb9741 Oleg Nesterov    2019-04-25   90  		 * We're at least a GP after the last rcu_sync_exit(); eveybody
89da3b94bb9741 Oleg Nesterov    2019-04-25   91  		 * will now have observed the write side critical section.
89da3b94bb9741 Oleg Nesterov    2019-04-25   92  		 * Let 'em rip!.
cc44ca848f5e51 Oleg Nesterov    2015-08-21   93  		 */
89da3b94bb9741 Oleg Nesterov    2019-04-25   94  		WRITE_ONCE(rsp->gp_state, GP_IDLE);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   95  	}
cc44ca848f5e51 Oleg Nesterov    2015-08-21   96  	spin_unlock_irqrestore(&rsp->rss_lock, flags);
cc44ca848f5e51 Oleg Nesterov    2015-08-21   97  }
cc44ca848f5e51 Oleg Nesterov    2015-08-21   98  
cc44ca848f5e51 Oleg Nesterov    2015-08-21   99  /**
89da3b94bb9741 Oleg Nesterov    2019-04-25  100   * rcu_sync_enter() - Force readers onto slowpath
89da3b94bb9741 Oleg Nesterov    2019-04-25  101   * @rsp: Pointer to rcu_sync structure to use for synchronization
89da3b94bb9741 Oleg Nesterov    2019-04-25  102   *
89da3b94bb9741 Oleg Nesterov    2019-04-25  103   * This function is used by updaters who need readers to make use of
89da3b94bb9741 Oleg Nesterov    2019-04-25  104   * a slowpath during the update.  After this function returns, all
89da3b94bb9741 Oleg Nesterov    2019-04-25  105   * subsequent calls to rcu_sync_is_idle() will return false, which
89da3b94bb9741 Oleg Nesterov    2019-04-25  106   * tells readers to stay off their fastpaths.  A later call to
89da3b94bb9741 Oleg Nesterov    2019-04-25  107   * rcu_sync_exit() re-enables reader slowpaths.
89da3b94bb9741 Oleg Nesterov    2019-04-25  108   *
89da3b94bb9741 Oleg Nesterov    2019-04-25  109   * When called in isolation, rcu_sync_enter() must wait for a grace
89da3b94bb9741 Oleg Nesterov    2019-04-25  110   * period, however, closely spaced calls to rcu_sync_enter() can
89da3b94bb9741 Oleg Nesterov    2019-04-25  111   * optimize away the grace-period wait via a state machine implemented
89da3b94bb9741 Oleg Nesterov    2019-04-25  112   * by rcu_sync_enter(), rcu_sync_exit(), and rcu_sync_func().
89da3b94bb9741 Oleg Nesterov    2019-04-25  113   */
89da3b94bb9741 Oleg Nesterov    2019-04-25  114  void rcu_sync_enter(struct rcu_sync *rsp)
89da3b94bb9741 Oleg Nesterov    2019-04-25  115  {
89da3b94bb9741 Oleg Nesterov    2019-04-25  116  	int gp_state;
89da3b94bb9741 Oleg Nesterov    2019-04-25  117  
89da3b94bb9741 Oleg Nesterov    2019-04-25  118  	spin_lock_irq(&rsp->rss_lock);
89da3b94bb9741 Oleg Nesterov    2019-04-25  119  	gp_state = rsp->gp_state;
89da3b94bb9741 Oleg Nesterov    2019-04-25  120  	if (gp_state == GP_IDLE) {
89da3b94bb9741 Oleg Nesterov    2019-04-25  121  		WRITE_ONCE(rsp->gp_state, GP_ENTER);
89da3b94bb9741 Oleg Nesterov    2019-04-25  122  		WARN_ON_ONCE(rsp->gp_count);
89da3b94bb9741 Oleg Nesterov    2019-04-25  123  		/*
89da3b94bb9741 Oleg Nesterov    2019-04-25  124  		 * Note that we could simply do rcu_sync_call(rsp) here and
89da3b94bb9741 Oleg Nesterov    2019-04-25  125  		 * avoid the "if (gp_state == GP_IDLE)" block below.
89da3b94bb9741 Oleg Nesterov    2019-04-25  126  		 *
89da3b94bb9741 Oleg Nesterov    2019-04-25  127  		 * However, synchronize_rcu() can be faster if rcu_expedited
89da3b94bb9741 Oleg Nesterov    2019-04-25  128  		 * or rcu_blocking_is_gp() is true.
89da3b94bb9741 Oleg Nesterov    2019-04-25  129  		 *
89da3b94bb9741 Oleg Nesterov    2019-04-25  130  		 * Another reason is that we can't wait for rcu callback if
89da3b94bb9741 Oleg Nesterov    2019-04-25  131  		 * we are called at early boot time but this shouldn't happen.
89da3b94bb9741 Oleg Nesterov    2019-04-25  132  		 */
89da3b94bb9741 Oleg Nesterov    2019-04-25  133  	}
89da3b94bb9741 Oleg Nesterov    2019-04-25  134  	rsp->gp_count++;
89da3b94bb9741 Oleg Nesterov    2019-04-25  135  	spin_unlock_irq(&rsp->rss_lock);
89da3b94bb9741 Oleg Nesterov    2019-04-25  136  
89da3b94bb9741 Oleg Nesterov    2019-04-25  137  	if (gp_state == GP_IDLE) {
89da3b94bb9741 Oleg Nesterov    2019-04-25  138  		/*
89da3b94bb9741 Oleg Nesterov    2019-04-25  139  		 * See the comment above, this simply does the "synchronous"
89da3b94bb9741 Oleg Nesterov    2019-04-25  140  		 * call_rcu(rcu_sync_func) which does GP_ENTER -> GP_PASSED.
89da3b94bb9741 Oleg Nesterov    2019-04-25  141  		 */
89da3b94bb9741 Oleg Nesterov    2019-04-25  142  		synchronize_rcu();
89da3b94bb9741 Oleg Nesterov    2019-04-25  143  		rcu_sync_func(&rsp->cb_head);
89da3b94bb9741 Oleg Nesterov    2019-04-25  144  		/* Not really needed, wait_event() would see GP_PASSED. */
89da3b94bb9741 Oleg Nesterov    2019-04-25  145  		return;
89da3b94bb9741 Oleg Nesterov    2019-04-25  146  	}
89da3b94bb9741 Oleg Nesterov    2019-04-25  147  
89da3b94bb9741 Oleg Nesterov    2019-04-25  148  	wait_event(rsp->gp_wait, READ_ONCE(rsp->gp_state) >= GP_PASSED);
89da3b94bb9741 Oleg Nesterov    2019-04-25  149  }
89da3b94bb9741 Oleg Nesterov    2019-04-25  150  
89da3b94bb9741 Oleg Nesterov    2019-04-25  151  /**
89da3b94bb9741 Oleg Nesterov    2019-04-25  152   * rcu_sync_exit() - Allow readers back onto fast path after grace period
cc44ca848f5e51 Oleg Nesterov    2015-08-21  153   * @rsp: Pointer to rcu_sync structure to use for synchronization
cc44ca848f5e51 Oleg Nesterov    2015-08-21  154   *
cc44ca848f5e51 Oleg Nesterov    2015-08-21  155   * This function is used by updaters who have completed, and can therefore
cc44ca848f5e51 Oleg Nesterov    2015-08-21  156   * now allow readers to make use of their fastpaths after a grace period
cc44ca848f5e51 Oleg Nesterov    2015-08-21  157   * has elapsed.  After this grace period has completed, all subsequent
cc44ca848f5e51 Oleg Nesterov    2015-08-21  158   * calls to rcu_sync_is_idle() will return true, which tells readers that
cc44ca848f5e51 Oleg Nesterov    2015-08-21  159   * they can once again use their fastpaths.
cc44ca848f5e51 Oleg Nesterov    2015-08-21  160   */
cc44ca848f5e51 Oleg Nesterov    2015-08-21  161  void rcu_sync_exit(struct rcu_sync *rsp)
cc44ca848f5e51 Oleg Nesterov    2015-08-21  162  {
6d1a4c2dfe7bb0 Joel Fernandes   2019-10-04  163  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);
89da3b94bb9741 Oleg Nesterov    2019-04-25  164  
cc44ca848f5e51 Oleg Nesterov    2015-08-21  165  	spin_lock_irq(&rsp->rss_lock);
cc44ca848f5e51 Oleg Nesterov    2015-08-21  166  	if (!--rsp->gp_count) {
89da3b94bb9741 Oleg Nesterov    2019-04-25  167  		if (rsp->gp_state == GP_PASSED) {
89da3b94bb9741 Oleg Nesterov    2019-04-25  168  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
89da3b94bb9741 Oleg Nesterov    2019-04-25  169  			rcu_sync_call(rsp);
cc44ca848f5e51 Oleg Nesterov    2015-08-21  170  		}
cc44ca848f5e51 Oleg Nesterov    2015-08-21  171  	}
cc44ca848f5e51 Oleg Nesterov    2015-08-21  172  	spin_unlock_irq(&rsp->rss_lock);
cc44ca848f5e51 Oleg Nesterov    2015-08-21  173  }
07899a6e5f5613 Oleg Nesterov    2015-08-21  174  
07899a6e5f5613 Oleg Nesterov    2015-08-21  175  /**
07899a6e5f5613 Oleg Nesterov    2015-08-21  176   * rcu_sync_dtor() - Clean up an rcu_sync structure
07899a6e5f5613 Oleg Nesterov    2015-08-21  177   * @rsp: Pointer to rcu_sync structure to be cleaned up
07899a6e5f5613 Oleg Nesterov    2015-08-21  178   */
07899a6e5f5613 Oleg Nesterov    2015-08-21  179  void rcu_sync_dtor(struct rcu_sync *rsp)
07899a6e5f5613 Oleg Nesterov    2015-08-21  180  {
89da3b94bb9741 Oleg Nesterov    2019-04-25  181  	int gp_state;
07899a6e5f5613 Oleg Nesterov    2015-08-21  182  
89da3b94bb9741 Oleg Nesterov    2019-04-25  183  	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
89da3b94bb9741 Oleg Nesterov    2019-04-25  184  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
07899a6e5f5613 Oleg Nesterov    2015-08-21  185  
07899a6e5f5613 Oleg Nesterov    2015-08-21  186  	spin_lock_irq(&rsp->rss_lock);
89da3b94bb9741 Oleg Nesterov    2019-04-25 @187  	if (rsp->gp_state == GP_REPLAY)

:::::: The code at line 187 was first introduced by commit
:::::: 89da3b94bb97417ca2c5b0ce3a28643819030247 rcu/sync: Simplify the state machine

:::::: TO: Oleg Nesterov <oleg@redhat.com>
:::::: CC: Paul E. McKenney <paulmck@linux.ibm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30353 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove GP_REPLAY state from rcu_sync
  2019-10-04 16:37   ` Joel Fernandes
@ 2019-10-07 14:09     ` Oleg Nesterov
  0 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2019-10-07 14:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, bristot, peterz, paulmck, rcu, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers, Paul E. McKenney,
	Steven Rostedt

On 10/04, Joel Fernandes wrote:
>
> On Fri, Oct 04, 2019 at 05:41:03PM +0200, Oleg Nesterov wrote:
> > On 10/04, Joel Fernandes (Google) wrote:
> > >
> > > Taking a step back, why did we intend to have
> > > to wait for a new GP if another rcu_sync_exit() comes while one is still
> > > in progress?
> >
> > To ensure that if another CPU sees rcu_sync_is_idle() (GP_IDLE) after you
> > do rcu_sync_exit(), then it must also see all memory changes you did before
> > rcu_sync_exit().
>
> Would this not be better implemented using memory barriers, than starting new
> grace periods just for memory ordering? A memory barrier is lighter than
> having to go through a grace period. So something like: if the state is
> already GP_EXIT, then rcu_sync_exit() issues a memory barrier instead of
> replaying. But if state is GP_PASSED, then wait for a grace period.

But these 2 cases do not differ. If we can use mb() if GP_EXIT, then we can
do the same if state == GP_PASSED and just move the state to GP_IDLE, and
remove both GP_PASSED/GP_REPLAY states.

However, in this case the readers will need the barrier too, and rcu_sync_enter()
will _always_ need to block (wait for GP).

rcu_sync.c is "equivalent" to the following implementation:


            struct rcu_sync_struct {
                    atomic_t writers;
            };

            bool rcu_sync_is_idle(rss)
            {
                    return atomic_read(rss->writers) == 0;
            }

            void rcu_sync_enter(rss)
            {
                    atomic_inc(rss->writers);
                    synchronize_rcu();
            }

            void rcu_sync_exit(rss)
            {
                    synchronize_rcu();
                    atomic_dec(rss->writers);
            }

except

	- rcu_sync_exit() never blocks

	- synchronize_rcu/call_rci is called only if it is really needed.
	  In particular, if 2 writers come in a row the 2nd one will not
	  block in _enter().

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-04 14:57 [PATCH] Remove GP_REPLAY state from rcu_sync Joel Fernandes (Google)
2019-10-04 14:59 ` Joel Fernandes
2019-10-04 15:41 ` Oleg Nesterov
2019-10-04 16:37   ` Joel Fernandes
2019-10-07 14:09     ` Oleg Nesterov
2019-10-04 19:25 ` kbuild test robot
2019-10-04 22:03 ` kbuild test robot

RCU Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/rcu/0 rcu/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 rcu rcu/ https://lore.kernel.org/rcu \
		rcu@vger.kernel.org rcu@archiver.kernel.org
	public-inbox-index rcu

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.rcu


AGPL code for this site: git clone https://public-inbox.org/ public-inbox