* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found] <CAAKbDrfmcqEE7_6p_J0DPVO85yDGJ0GCbi5k9Lm19g4VaLgOWw@mail.gmail.com>
@ 2016-04-28  2:08 ` Paul E. McKenney
       [not found] ` <20160428020803.GN4967@linux.vnet.ibm.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2016-04-28  2:08 UTC (permalink / raw)
  To: Yuxin Ren; +Cc: lttng-dev

On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> Hi,
> 
> I am learning the URCU code.
> 
> Why do we need rcu_gp_lock in synchronize_rcu?
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> 
> In the comment, it says this lock ensures mutual exclusion between
> threads calling synchronize_rcu().
> But only the first thread added to the waiter queue can proceed to detect
> the grace period.
> How can multiple threads concurrently perform the grace period?

They don't concurrently perform grace periods, and it would be wasteful
for them to do so.  Instead, the first one performs the grace period,
and all that were waiting at the time it started get the benefit of that
same grace period.

Any that arrived after the first thread started the first grace
period are served by whichever of them performs the second grace
period.
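
The batching can be sketched in a few lines of self-contained C
(illustrative only: the real urcu implementation uses a lock-free wait
queue with futex wakeups, and run_grace_period() below is merely a
stand-in for the real wait_for_readers() loop):

#include <pthread.h>

static pthread_mutex_t gp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gp_cond = PTHREAD_COND_INITIALIZER;
static unsigned long gp_started, gp_completed;

extern void run_grace_period(void);	/* stand-in for wait_for_readers() */

void synchronize_rcu_sketch(void)
{
	unsigned long target;

	pthread_mutex_lock(&gp_lock);
	/* Only a grace period that starts after this point is
	 * guaranteed to cover us, so we need one more completion
	 * than the number of grace periods started so far. */
	target = gp_started + 1;
	while (gp_completed < target) {
		if (gp_started == gp_completed) {
			/* No grace period in flight: run the next one
			 * ourselves.  Everyone already waiting for it
			 * gets the benefit of this same grace period. */
			gp_started++;
			pthread_mutex_unlock(&gp_lock);
			run_grace_period();
			pthread_mutex_lock(&gp_lock);
			gp_completed++;
			pthread_cond_broadcast(&gp_cond);
		} else {
			/* A grace period is in flight; wait until
			 * enough grace periods have completed. */
			pthread_cond_wait(&gp_cond, &gp_lock);
		}
	}
	pthread_mutex_unlock(&gp_lock);
}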

							Thanx, Paul


* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found] ` <20160428020803.GN4967@linux.vnet.ibm.com>
@ 2016-04-28  2:18   ` Yuxin Ren
       [not found]   ` <CAAKbDrfRMKzu6brHG=pfm0mu_Oy1afVZ94zDCiN7Y0w5o+SMZQ@mail.gmail.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Yuxin Ren @ 2016-04-28  2:18 UTC (permalink / raw)
  To: Paul McKenney; +Cc: lttng-dev

As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?

Thank you.
Yuxin

On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
>> Hi,
>>
>> I am learning the URCU code.
>>
>> Why do we need rcu_gp_lock in synchronize_rcu?
>> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
>>
>> In the comment, it says this lock ensures mutual exclusion between
>> threads calling synchronize_rcu().
>> But only the first thread added to the waiter queue can proceed to detect
>> the grace period.
>> How can multiple threads concurrently perform the grace period?
>
> They don't concurrently perform grace periods, and it would be wasteful
> for them to do so.  Instead, the first one performs the grace period,
> and all that were waiting at the time it started get the benefit of that
> same grace period.
>
> Any that arrived after the first thread started the first grace
> period are served by whichever of them performs the second grace
> period.
>
>                                                         Thanx, Paul
>

* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found]   ` <CAAKbDrfRMKzu6brHG=pfm0mu_Oy1afVZ94zDCiN7Y0w5o+SMZQ@mail.gmail.com>
@ 2016-04-28  4:23     ` Paul E. McKenney
       [not found]     ` <20160428042327.GP4967@linux.vnet.ibm.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2016-04-28  4:23 UTC (permalink / raw)
  To: Yuxin Ren; +Cc: lttng-dev

Try building without it and see what happens when you run the tests.

Might well be that it is unnecessary, but I will defer to Mathieu
on that point.

							Thanx, Paul

On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
> 
> Thank you.
> Yuxin
> 
> On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> >> Hi,
> >>
> >> I am learning the URCU code.
> >>
> >> Why do we need rcu_gp_lock in synchronize_rcu?
> >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> >>
> >> In the comment, it says this lock ensures mutual exclusion between
> >> threads calling synchronize_rcu().
> >> But only the first thread added to the waiter queue can proceed to detect
> >> the grace period.
> >> How can multiple threads concurrently perform the grace period?
> >
> > They don't concurrently perform grace periods, and it would be wasteful
> > for them to do so.  Instead, the first one performs the grace period,
> > and all that were waiting at the time it started get the benefit of that
> > same grace period.
> >
> > Any that arrived after the first thread started the first grace
> > period are served by whichever of them performs the second grace
> > period.
> >
> >                                                         Thanx, Paul
> >
> 


* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found]     ` <20160428042327.GP4967@linux.vnet.ibm.com>
@ 2016-04-28 12:44       ` Boqun Feng
       [not found]       ` <20160428124401.GB25392@insomnia>
  1 sibling, 0 replies; 11+ messages in thread
From: Boqun Feng @ 2016-04-28 12:44 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: lttng-dev


Hi Paul and Yuxin,

On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> Try building without it and see what happens when you run the tests.
> 

I've run a 'regtest' with the following modification out of curiosity:

diff --git a/urcu.c b/urcu.c
index a5568bdbd075..9dc3c9feae56 100644
--- a/urcu.c
+++ b/urcu.c
@@ -398,8 +398,6 @@ void synchronize_rcu(void)
 	/* We won't need to wake ourself up */
 	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
 
-	mutex_lock(&rcu_gp_lock);
-
 	/*
 	 * Move all waiters into our local queue.
 	 */
@@ -480,7 +478,6 @@ void synchronize_rcu(void)
 	smp_mb_master();
 out:
 	mutex_unlock(&rcu_registry_lock);
-	mutex_unlock(&rcu_gp_lock);
 
 	/*
 	 * Wakeup waiters only after we have completed the grace period


And guess what, the result of the test was:

Test Summary Report
-------------------
./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
  Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
                  150, 165, 180, 195, 210, 225, 240, 253
                  255
Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 597.64 csys = 9579.25 CPU)
Result: FAIL

And test case 30 for example is something like:

tests/benchmark/test_urcu_mb <nreaders> <nwriters> 1 -d 0 -b 32768

and it failed because:

lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' failed.
zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d 0 -b 32768

So I think what was going on here was:

CPU 0 (reader)			CPU 1 (writer)					CPU 2 (writer)
===================		====================				======================
rcu_read_lock();								new = malloc(sizeof(int));
local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old		*new = 8;
										old = rcu_xchg_pointer(&test_rcu_pointer, new);
				synchronize_rcu():
				  urcu_wait_add(); // return 0
				  urcu_move_waiters(); // @gp_waiters is empty,
				  		       // the next urcu_wait_add() will return 0

				  						synchronize_rcu():
									  	  urcu_wait_add(); // return 0

				  mutex_lock(&rcu_registry_lock);
				  wait_for_readers(); // remove registered reader from @registry,
				  		      // release rcu_registry_lock and wait via poll()

										  mutex_lock(&rcu_registry_lock);
										  wait_for_readers(); // @registry is empty! we are so lucky
										  return;

										if (old)
											free(old)
rcu_read_unlock();
assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed.


So the point is that there could be two writers calling synchronize_rcu() but
not returning early (both of them enter the slow path to perform a grace
period), so the rcu_gp_lock is necessary in this case.
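
For reference, the reader/writer pattern behind that failing assertion
looks roughly like this (a hedged reconstruction, not the actual
test_urcu.c, which additionally handles thread registration, looping,
and command-line options):

#include <urcu.h>		/* rcu_read_lock(), synchronize_rcu(), ... */
#include <stdlib.h>
#include <assert.h>

static int *test_rcu_pointer;

void reader(void)	/* thread must have called rcu_register_thread() */
{
	int *local_ptr;

	rcu_read_lock();
	local_ptr = rcu_dereference(test_rcu_pointer);
	if (local_ptr)
		assert(*local_ptr == 8);  /* fires if old was freed too early */
	rcu_read_unlock();
}

void writer(void)
{
	int *new = malloc(sizeof(*new));
	int *old;

	*new = 8;
	old = rcu_xchg_pointer(&test_rcu_pointer, new);
	synchronize_rcu();	/* must wait for ALL pre-existing readers */
	free(old);		/* safe only if the grace period really waited */
}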

(Cc  Mathieu)

But this is only my understanding and I'm learning the URCU code too ;-)

Regards,
Boqun


> Might well be that it is unnecessary, but I will defer to Mathieu
> on that point.
> 
> 							Thanx, Paul
> 
> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> > As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
> > 
> > Thank you.
> > Yuxin
> > 
> > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> > >> Hi,
> > >>
> > >> I am learning the URCU code.
> > >>
> > >> Why do we need rcu_gp_lock in synchronize_rcu?
> > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> > >>
> > >> In the comment, it says this lock ensures mutual exclusion between
> > >> threads calling synchronize_rcu().
> > >> But only the first thread added to the waiter queue can proceed to detect
> > >> the grace period.
> > >> How can multiple threads concurrently perform the grace period?
> > >
> > > They don't concurrently perform grace periods, and it would be wasteful
> > > for them to do so.  Instead, the first one performs the grace period,
> > > and all that were waiting at the time it started get the benefit of that
> > > same grace period.
> > >
> > > Any that arrived after the first thread started the first grace
> > > period are served by whichever of them performs the second grace
> > > period.
> > >
> > >                                                         Thanx, Paul
> > >
> > 
> 


* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found]       ` <20160428124401.GB25392@insomnia>
@ 2016-04-28 13:47         ` Yuxin Ren
  2016-04-28 13:53         ` Paul E. McKenney
       [not found]         ` <CAAKbDrf5mta5bzDeJiwBh7_V=k6abx1DpbPx9sYCOhsgyrSHjA@mail.gmail.com>
  2 siblings, 0 replies; 11+ messages in thread
From: Yuxin Ren @ 2016-04-28 13:47 UTC (permalink / raw)
  To: Boqun Feng; +Cc: lttng-dev, Paul E. McKenney

Hi Boqun and Paul,

Thank you so much for your help.

I found one reason to use that lock.
In the slow path, a thread will move all waiters to a local queue.
https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
After this, a following thread can also perform a grace period, as the
global waiter queue is empty.
Thus the rcu_gp_lock is used to ensure mutual exclusion.

However, from a real-time perspective, such blocking will cause priority
inversion: a higher-priority writer can be blocked by a lower-priority
writer.
Is there a way to modify the code to allow multiple threads to perform
grace periods concurrently?

Thanks again!!
Yuxin

On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng <boqun.feng@gmail.com> wrote:
> Hi Paul and Yuxin,
>
> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>> Try building without it and see what happens when you run the tests.
>>
>
> I've run a 'regtest' with the following modification out of curiosity:
>
> diff --git a/urcu.c b/urcu.c
> index a5568bdbd075..9dc3c9feae56 100644
> --- a/urcu.c
> +++ b/urcu.c
> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>         /* We won't need to wake ourself up */
>         urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>
> -       mutex_lock(&rcu_gp_lock);
> -
>         /*
>          * Move all waiters into our local queue.
>          */
> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>         smp_mb_master();
>  out:
>         mutex_unlock(&rcu_registry_lock);
> -       mutex_unlock(&rcu_gp_lock);
>
>         /*
>          * Wakeup waiters only after we have completed the grace period
>
>
> And guess what, the result of the test was:
>
> Test Summary Report
> -------------------
> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>                   150, 165, 180, 195, 210, 225, 240, 253
>                   255
> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 597.64 csys = 9579.25 CPU)
> Result: FAIL
>
> And test case 30 for example is something like:
>
> tests/benchmark/test_urcu_mb <nreaders> <nwriters> 1 -d 0 -b 32768
>
> and it failed because:
>
> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' failed.
> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d 0 -b 32768
>
> So I think what was going on here was:
>
> CPU 0 (reader)                  CPU 1 (writer)                                  CPU 2 (writer)
> ===================             ====================                            ======================
> rcu_read_lock();                                                                new = malloc(sizeof(int));
> local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old              *new = 8;
>                                                                                 old = rcu_xchg_pointer(&test_rcu_pointer, new);
>                                 synchronize_rcu():
>                                   urcu_wait_add(); // return 0
>                                   urcu_move_waiters(); // @gp_waiters is empty,
>                                                        // the next urcu_wait_add() will return 0
>
>                                                                                 synchronize_rcu():
>                                                                                   urcu_wait_add(); // return 0
>
>                                   mutex_lock(&rcu_registry_lock);
>                                   wait_for_readers(); // remove registered reader from @registry,
>                                                       // release rcu_registry_lock and wait via poll()
>
>                                                                                   mutex_lock(&rcu_registry_lock);
>                                                                                   wait_for_readers(); // @registry is empty! we are so lucky
>                                                                                   return;
>
>                                                                                 if (old)
>                                                                                         free(old)
> rcu_read_unlock();
> assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed.
>
>
> So the point is that there could be two writers calling synchronize_rcu() but
> not returning early (both of them enter the slow path to perform a grace
> period), so the rcu_gp_lock is necessary in this case.
>
> (Cc  Mathieu)
>
> But this is only my understanding and I'm learning the URCU code too ;-)
>
> Regards,
> Boqun
>
>
>> Might well be that it is unnecessary, but I will defer to Mathieu
>> on that point.
>>
>>                                                       Thanx, Paul
>>
>> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
>> > As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
>> >
>> > Thank you.
>> > Yuxin
>> >
>> > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
>> > <paulmck@linux.vnet.ibm.com> wrote:
>> > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
>> > >> Hi,
>> > >>
>> > >> I am learning the URCU code.
>> > >>
>> > >> Why do we need rcu_gp_lock in synchronize_rcu?
>> > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
>> > >>
>> > >> In the comment, it says this lock ensures mutual exclusion between
>> > >> threads calling synchronize_rcu().
>> > >> But only the first thread added to the waiter queue can proceed to detect
>> > >> the grace period.
>> > >> How can multiple threads concurrently perform the grace period?
>> > >
>> > > They don't concurrently perform grace periods, and it would be wasteful
>> > > for them to do so.  Instead, the first one performs the grace period,
>> > > and all that were waiting at the time it started get the benefit of that
>> > > same grace period.
>> > >
>> > > Any that arrived after the first thread started the first grace
>> > > period are served by whichever of them performs the second grace
>> > > period.
>> > >
>> > >                                                         Thanx, Paul
>> > >
>> >
>>

* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found]       ` <20160428124401.GB25392@insomnia>
  2016-04-28 13:47         ` Yuxin Ren
@ 2016-04-28 13:53         ` Paul E. McKenney
       [not found]         ` <CAAKbDrf5mta5bzDeJiwBh7_V=k6abx1DpbPx9sYCOhsgyrSHjA@mail.gmail.com>
  2 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2016-04-28 13:53 UTC (permalink / raw)
  To: Boqun Feng; +Cc: lttng-dev

On Thu, Apr 28, 2016 at 08:44:01PM +0800, Boqun Feng wrote:
> Hi Paul and Yuxin,
> 
> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> > Try building without it and see what happens when you run the tests.
> > 
> 
> I've run a 'regtest' with the following modification out of curiosity:
> 
> diff --git a/urcu.c b/urcu.c
> index a5568bdbd075..9dc3c9feae56 100644
> --- a/urcu.c
> +++ b/urcu.c
> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>  	/* We won't need to wake ourself up */
>  	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>  
> -	mutex_lock(&rcu_gp_lock);
> -
>  	/*
>  	 * Move all waiters into our local queue.
>  	 */
> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>  	smp_mb_master();
>  out:
>  	mutex_unlock(&rcu_registry_lock);
> -	mutex_unlock(&rcu_gp_lock);
>  
>  	/*
>  	 * Wakeup waiters only after we have completed the grace period
> 
> 
> And guess what, the result of the test was:
> 
> Test Summary Report
> -------------------
> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>                   150, 165, 180, 195, 210, 225, 240, 253
>                   255
> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 597.64 csys = 9579.25 CPU)
> Result: FAIL
> 
> And test case 30 for example is something like:
> 
> tests/benchmark/test_urcu_mb <nreaders> <nwriters> 1 -d 0 -b 32768
> 
> and it failed because:
> 
> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' failed.
> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d 0 -b 32768
> 
> So I think what was going on here was:
> 
> CPU 0 (reader)			CPU 1 (writer)					CPU 2 (writer)
> ===================		====================				======================
> rcu_read_lock();								new = malloc(sizeof(int));
> local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old		*new = 8;
> 										old = rcu_xchg_pointer(&test_rcu_pointer, new);
> 				synchronize_rcu():
> 				  urcu_wait_add(); // return 0
> 				  urcu_move_waiters(); // @gp_waiters is empty,
> 				  		       // the next urcu_wait_add() will return 0
> 
> 				  						synchronize_rcu():
> 									  	  urcu_wait_add(); // return 0
> 
> 				  mutex_lock(&rcu_registry_lock);
> 				  wait_for_readers(); // remove registered reader from @registry,
> 				  		      // release rcu_registry_lock and wait via poll()
> 
> 										  mutex_lock(&rcu_registry_lock);
> 										  wait_for_readers(); // @registry is empty! we are so lucky
> 										  return;
> 
> 										if (old)
> 											free(old)
> rcu_read_unlock();
> assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed.
> 
> 
> So the point is that there could be two writers calling synchronize_rcu() but
> not returning early (both of them enter the slow path to perform a grace
> period), so the rcu_gp_lock is necessary in this case.
> 
> (Cc  Mathieu)
> 
> But this is only my understanding and I'm learning the URCU code too ;-)

Nothing quite like actually trying it and seeing what happens!  One of
the best learning methods that I know of.

Assuming the act of actually trying it is non-fatal, of course.  ;-)

							Thanx, Paul

> Regards,
> Boqun
> 
> 
> > Might well be that it is unnecessary, but I will defer to Mathieu
> > on that point.
> > 
> > 							Thanx, Paul
> > 
> > On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> > > As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
> > > 
> > > Thank you.
> > > Yuxin
> > > 
> > > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> > > >> Hi,
> > > >>
> > > >> I am learning the URCU code.
> > > >>
> > > >> Why do we need rcu_gp_lock in synchronize_rcu?
> > > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> > > >>
> > > >> In the comment, it says this lock ensures mutual exclusion between
> > > >> threads calling synchronize_rcu().
> > > >> But only the first thread added to the waiter queue can proceed to detect
> > > >> the grace period.
> > > >> How can multiple threads concurrently perform the grace period?
> > > >
> > > > They don't concurrently perform grace periods, and it would be wasteful
> > > > for them to do so.  Instead, the first one performs the grace period,
> > > > and all that were waiting at the time it started get the benefit of that
> > > > same grace period.
> > > >
> > > > Any that arrived after the first thread started the first grace
> > > > period are served by whichever of them performs the second grace
> > > > period.
> > > >
> > > >                                                         Thanx, Paul
> > > >
> > > 
> > 

* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found]         ` <CAAKbDrf5mta5bzDeJiwBh7_V=k6abx1DpbPx9sYCOhsgyrSHjA@mail.gmail.com>
@ 2016-04-28 13:55           ` Paul E. McKenney
  2016-04-28 14:38           ` Mathieu Desnoyers
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2016-04-28 13:55 UTC (permalink / raw)
  To: Yuxin Ren; +Cc: lttng-dev

On Thu, Apr 28, 2016 at 09:47:23AM -0400, Yuxin Ren wrote:
> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, a following thread can also perform a grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from a real-time perspective, such blocking will cause priority
> inversion: a higher-priority writer can be blocked by a lower-priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace periods concurrently?

If a thread has real-time requirements, you shouldn't have it do
synchronous grace periods, just as you shouldn't have it do (say)
sleep(10).

You should instead either (1) have some other non-realtime thread do
the cleanup activities involving synchronize_rcu() or (2) have the
real-time thread use the asynchronous call_rcu().
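
A sketch of option (2) using liburcu's call_rcu() API (the struct and
helper names here are illustrative, not from any existing test code):

#include <urcu.h>
#include <urcu/compiler.h>	/* caa_container_of() */
#include <stdlib.h>

struct item {
	int value;
	struct rcu_head rcu;	/* storage for the deferred callback */
};

static void free_item(struct rcu_head *head)
{
	free(caa_container_of(head, struct item, rcu));
}

void update(struct item **slot, struct item *newp)
{
	struct item *old = rcu_xchg_pointer(slot, newp);

	if (old)
		call_rcu(&old->rcu, free_item);	/* returns immediately; the
						   callback runs only after a
						   grace period has elapsed */
}

The updater never blocks: the call_rcu worker thread absorbs the
grace-period latency instead.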

							Thanx, Paul

> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng <boqun.feng@gmail.com> wrote:
> > Hi Paul and Yuxin,
> >
> > On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> >> Try building without it and see what happens when you run the tests.
> >>
> >
> > I've run a 'regtest' with the following modification out of curiosity:
> >
> > diff --git a/urcu.c b/urcu.c
> > index a5568bdbd075..9dc3c9feae56 100644
> > --- a/urcu.c
> > +++ b/urcu.c
> > @@ -398,8 +398,6 @@ void synchronize_rcu(void)
> >         /* We won't need to wake ourself up */
> >         urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
> >
> > -       mutex_lock(&rcu_gp_lock);
> > -
> >         /*
> >          * Move all waiters into our local queue.
> >          */
> > @@ -480,7 +478,6 @@ void synchronize_rcu(void)
> >         smp_mb_master();
> >  out:
> >         mutex_unlock(&rcu_registry_lock);
> > -       mutex_unlock(&rcu_gp_lock);
> >
> >         /*
> >          * Wakeup waiters only after we have completed the grace period
> >
> >
> > And guess what, the result of the test was:
> >
> > Test Summary Report
> > -------------------
> > ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
> >   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
> >                   150, 165, 180, 195, 210, 225, 240, 253
> >                   255
> > Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 597.64 csys = 9579.25 CPU)
> > Result: FAIL
> >
> > And test case 30 for example is something like:
> >
> > tests/benchmark/test_urcu_mb <nreaders> <nwriters> 1 -d 0 -b 32768
> >
> > and it failed because:
> >
> > lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' failed.
> > zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d 0 -b 32768
> >
> > So I think what was going on here was:
> >
> > CPU 0 (reader)                  CPU 1 (writer)                                  CPU 2 (writer)
> > ===================             ====================                            ======================
> > rcu_read_lock();                                                                new = malloc(sizeof(int));
> > local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old              *new = 8;
> >                                                                                 old = rcu_xchg_pointer(&test_rcu_pointer, new);
> >                                 synchronize_rcu():
> >                                   urcu_wait_add(); // return 0
> >                                   urcu_move_waiters(); // @gp_waiters is empty,
> >                                                        // the next urcu_wait_add() will return 0
> >
> >                                                                                 synchronize_rcu():
> >                                                                                   urcu_wait_add(); // return 0
> >
> >                                   mutex_lock(&rcu_registry_lock);
> >                                   wait_for_readers(); // remove registered reader from @registry,
> >                                                       // release rcu_registry_lock and wait via poll()
> >
> >                                                                                   mutex_lock(&rcu_registry_lock);
> >                                                                                   wait_for_readers(); // @registry is empty! we are so lucky
> >                                                                                   return;
> >
> >                                                                                 if (old)
> >                                                                                         free(old)
> > rcu_read_unlock();
> > assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed.
> >
> >
> > So the point is that there could be two writers calling synchronize_rcu() but
> > not returning early (both of them enter the slow path to perform a grace
> > period), so the rcu_gp_lock is necessary in this case.
> >
> > (Cc  Mathieu)
> >
> > But this is only my understanding and I'm learning the URCU code too ;-)
> >
> > Regards,
> > Boqun
> >
> >
> >> Might well be that it is unnecessary, but I will defer to Mathieu
> >> on that point.
> >>
> >>                                                       Thanx, Paul
> >>
> >> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> >> > As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
> >> >
> >> > Thank you.
> >> > Yuxin
> >> >
> >> > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> >> > <paulmck@linux.vnet.ibm.com> wrote:
> >> > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> >> > >> Hi,
> >> > >>
> >> > >> I am learning the URCU code.
> >> > >>
> >> > >> Why do we need rcu_gp_lock in synchronize_rcu?
> >> > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> >> > >>
> >> > >> In the comment, it says this lock ensures mutual exclusion between
> >> > >> threads calling synchronize_rcu().
> >> > >> But only the first thread added to the waiter queue can proceed to detect
> >> > >> the grace period.
> >> > >> How can multiple threads concurrently perform the grace period?
> >> > >
> >> > > They don't concurrently perform grace periods, and it would be wasteful
> >> > > for them to do so.  Instead, the first one performs the grace period,
> >> > > and all that were waiting at the time it started get the benefit of that
> >> > > same grace period.
> >> > >
> >> > > Any that arrived after the first thread started the first grace
> >> > > period are served by whichever of them performs the second grace
> >> > > period.
> >> > >
> >> > >                                                         Thanx, Paul
> >> > >
> >> >
> >>

* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found]         ` <CAAKbDrf5mta5bzDeJiwBh7_V=k6abx1DpbPx9sYCOhsgyrSHjA@mail.gmail.com>
  2016-04-28 13:55           ` Paul E. McKenney
@ 2016-04-28 14:38           ` Mathieu Desnoyers
       [not found]           ` <981217620.72238.1461854320490.JavaMail.zimbra@efficios.com>
  2016-04-28 16:08           ` Mathieu Desnoyers
  3 siblings, 0 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2016-04-28 14:38 UTC (permalink / raw)
  To: Yuxin Ren; +Cc: Paul E. McKenney, lttng-dev

----- On Apr 28, 2016, at 9:47 AM, Yuxin Ren ryx@gwmail.gwu.edu wrote:

> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, a following thread can also perform a grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from a real-time perspective, such blocking will cause priority
> inversion: a higher-priority writer can be blocked by a lower-priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace periods concurrently?

Before we redesign urcu for RT, would it be possible to simply
use pi-mutexes (priority inheritance) instead to protect grace periods
from each other with the current urcu scheme?
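
For example, the grace-period mutex could be created with the standard
POSIX priority-inheritance protocol (a sketch only; whether urcu's
internal mutex wrappers can adopt such a mutex unchanged is exactly the
open question):

#include <pthread.h>

static pthread_mutex_t rcu_gp_lock;	/* replaces the static initializer */

static int init_pi_gp_lock(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	/* A low-priority holder gets boosted to the priority of the
	 * highest-priority thread blocked on the lock. */
	ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	if (!ret)
		ret = pthread_mutex_init(&rcu_gp_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}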

Thanks,

Mathieu


> 
> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng <boqun.feng@gmail.com> wrote:
>> Hi Paul and Yuxin,
>>
>> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>>> Try building without it and see what happens when you run the tests.
>>>
>>
>> I've run a 'regtest' with the following modification out of curiosity:
>>
>> diff --git a/urcu.c b/urcu.c
>> index a5568bdbd075..9dc3c9feae56 100644
>> --- a/urcu.c
>> +++ b/urcu.c
>> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>>         /* We won't need to wake ourself up */
>>         urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>>
>> -       mutex_lock(&rcu_gp_lock);
>> -
>>         /*
>>          * Move all waiters into our local queue.
>>          */
>> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>>         smp_mb_master();
>>  out:
>>         mutex_unlock(&rcu_registry_lock);
>> -       mutex_unlock(&rcu_gp_lock);
>>
>>         /*
>>          * Wakeup waiters only after we have completed the grace period
>>
>>
>> And guess what, the result of the test was:
>>
>> Test Summary Report
>> -------------------
>> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>>                   150, 165, 180, 195, 210, 225, 240, 253
>>                   255
>> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
>> 597.64 csys = 9579.25 CPU)
>> Result: FAIL
>>
>> And test case 30 for example is something like:
>>
>> tests/benchmark/test_urcu_mb <nreaders> <nwriters> 1 -d 0 -b 32768
>>
>> and it failed because:
>>
>> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
>> failed.
>> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d
>> 0 -b 32768
>>
>> So I think what was going on here was:
>>
>> CPU 0 (reader)                  CPU 1 (writer)                                  CPU 2 (writer)
>> ===================             ====================                            ======================
>> rcu_read_lock();                                                                new = malloc(sizeof(int));
>> local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old              *new = 8;
>>                                                                                 old = rcu_xchg_pointer(&test_rcu_pointer, new);
>>                                 synchronize_rcu():
>>                                   urcu_wait_add(); // return 0
>>                                   urcu_move_waiters(); // @gp_waiters is empty,
>>                                                        // the next urcu_wait_add() will return 0
>>
>>                                                                                 synchronize_rcu():
>>                                                                                   urcu_wait_add(); // return 0
>>
>>                                   mutex_lock(&rcu_registry_lock);
>>                                   wait_for_readers(); // remove registered reader from @registry,
>>                                                       // release rcu_registry_lock and wait via poll()
>>
>>                                                                                   mutex_lock(&rcu_registry_lock);
>>                                                                                   wait_for_readers(); // @registry is empty! we are so lucky
>>                                                                                   return;
>>
>>                                                                                 if (old)
>>                                                                                         free(old)
>> rcu_read_unlock();
>> assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed.
>>
>>
>> So the point is that there could be two writers calling synchronize_rcu() but
>> not returning early (both of them enter the slow path to perform a grace
>> period), so the rcu_gp_lock is necessary in this case.
>>
>> (Cc  Mathieu)
>>
>> But this is only my understanding and I'm learning the URCU code too ;-)
>>
>> Regards,
>> Boqun
>>
>>
>>> Might well be that it is unnecessary, but I will defer to Mathieu
>>> on that point.
>>>
>>>                                                       Thanx, Paul
>>>
>>> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
>>> > As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
>>> >
>>> > Thank you.
>>> > Yuxin
>>> >
>>> > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
>>> > <paulmck@linux.vnet.ibm.com> wrote:
>>> > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
>>> > >> Hi,
>>> > >>
>>> > >> I am learning the URCU code.
>>> > >>
>>> > >> Why do we need rcu_gp_lock in synchronize_rcu?
>>> > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
>>> > >>
>>> > >> In the comment, it says this lock ensures mutual exclusion between
>>> > >> threads calling synchronize_rcu().
>>> > >> But only the first thread added to the waiter queue can proceed to detect
>>> > >> the grace period.
>>> > >> How can multiple threads concurrently perform the grace period?
>>> > >
>>> > > They don't concurrently perform grace periods, and it would be wasteful
>>> > > for them to do so.  Instead, the first one performs the grace period,
>>> > > and all that were waiting at the time it started get the benefit of that
>>> > > same grace period.
>>> > >
>>> > > Any that arrived after the first thread started the first grace
>>> > > period are served by whichever of them performs the second grace
>>> > > period.
>>> > >
>>> > >                                                         Thanx, Paul
>>> > >
>>> >
>>>

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found]           ` <981217620.72238.1461854320490.JavaMail.zimbra@efficios.com>
@ 2016-04-28 14:57             ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2016-04-28 14:57 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev

On Thu, Apr 28, 2016 at 02:38:40PM +0000, Mathieu Desnoyers wrote:
> ----- On Apr 28, 2016, at 9:47 AM, Yuxin Ren ryx@gwmail.gwu.edu wrote:
> 
> > Hi Boqun and Paul,
> > 
> > Thank you so much for your help.
> > 
> > I found one reason to use that lock.
> > In the slow path, a thread will move all waiters to a local queue.
> > https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> > After this, a following thread can also perform a grace period, as the
> > global waiter queue is empty.
> > Thus the rcu_gp_lock is used to ensure mutual exclusion.
> > 
> > However, from a real-time perspective, such blocking will cause priority
> > inversion: a higher-priority writer can be blocked by a lower-priority
> > writer.
> > Is there a way to modify the code to allow multiple threads to perform
> > grace periods concurrently?
> 
> Before we redesign urcu for RT, would it be possible to simply
> use pi-mutexes (priority inheritance) instead to protect grace periods
> from each other with the current urcu scheme?

Given that priority inversion can happen with low-priority readers
blocking a grace period that a high-priority updater is waiting on,
I stand by my earlier advice:  Don't let high-priority updaters block
waiting for grace periods.  ;-)

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Thanks again!!
> > Yuxin
> > 
> > On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng <boqun.feng@gmail.com> wrote:
> >> Hi Paul and Yuxin,
> >>
> >> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> >>> Try building without it and see what happens when you run the tests.
> >>>
> >>
> >> I've run a 'regtest' with the following modification out of curiosity:
> >>
> >> diff --git a/urcu.c b/urcu.c
> >> index a5568bdbd075..9dc3c9feae56 100644
> >> --- a/urcu.c
> >> +++ b/urcu.c
> >> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
> >>         /* We won't need to wake ourself up */
> >>         urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
> >>
> >> -       mutex_lock(&rcu_gp_lock);
> >> -
> >>         /*
> >>          * Move all waiters into our local queue.
> >>          */
> >> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
> >>         smp_mb_master();
> >>  out:
> >>         mutex_unlock(&rcu_registry_lock);
> >> -       mutex_unlock(&rcu_gp_lock);
> >>
> >>         /*
> >>          * Wakeup waiters only after we have completed the grace period
> >>
> >>
> >> And guess what, the result of the test was:
> >>
> >> Test Summary Report
> >> -------------------
> >> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
> >>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
> >>                   150, 165, 180, 195, 210, 225, 240, 253
> >>                   255
> >> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
> >> 597.64 csys = 9579.25 CPU)
> >> Result: FAIL
> >>
> >> And test case 30 for example is something like:
> >>
> >> tests/benchmark/test_urcu_mb <nreaders> <nwriters> 1 -d 0 -b 32768
> >>
> >> and it failed because:
> >>
> >> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
> >> failed.
> >> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d
> >> 0 -b 32768
> >>
> >> So I think what was going on here was:
> >>
> >> CPU 0 (reader)                  CPU 1 (writer)                                  CPU 2 (writer)
> >> ===================             ====================                            ======================
> >> rcu_read_lock();                                                                new = malloc(sizeof(int));
> >> local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old              *new = 8;
> >>                                                                                 old = rcu_xchg_pointer(&test_rcu_pointer, new);
> >>                                 synchronize_rcu():
> >>                                   urcu_wait_add(); // return 0
> >>                                   urcu_move_waiters(); // @gp_waiters is empty,
> >>                                                        // the next urcu_wait_add() will return 0
> >>
> >>                                                                                 synchronize_rcu():
> >>                                                                                   urcu_wait_add(); // return 0
> >>
> >>                                   mutex_lock(&rcu_registry_lock);
> >>                                   wait_for_readers(); // remove registered reader from @registry,
> >>                                                       // release rcu_registry_lock and wait via poll()
> >>
> >>                                                                                   mutex_lock(&rcu_registry_lock);
> >>                                                                                   wait_for_readers(); // @registry is empty! we are so lucky
> >>                                                                                   return;
> >>
> >>                                                                                 if (old)
> >>                                                                                         free(old)
> >> rcu_read_unlock();
> >> assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed.
> >>
> >>
> >> So the point is that there could be two writers calling synchronize_rcu() but
> >> not returning early (both of them enter the slow path to perform a grace
> >> period), so the rcu_gp_lock is necessary in this case.
> >>
> >> (Cc  Mathieu)
> >>
> >> But this is only my understanding and I'm learning the URCU code too ;-)
> >>
> >> Regards,
> >> Boqun
> >>
> >>
> >>> Might well be that it is unnecessary, but I will defer to Mathieu
> >>> on that point.
> >>>
> >>>                                                       Thanx, Paul
> >>>
> >>> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> >>> > As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
> >>> >
> >>> > Thank you.
> >>> > Yuxin
> >>> >
> >>> > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> >>> > <paulmck@linux.vnet.ibm.com> wrote:
> >>> > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> >>> > >> Hi,
> >>> > >>
> >>> > >> I am learning the URCU code.
> >>> > >>
> >>> > >> Why do we need rcu_gp_lock in synchronize_rcu?
> >>> > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> >>> > >>
> >>> > >> In the comment, it says this lock ensures mutual exclusion between
> >>> > >> threads calling synchronize_rcu().
> >>> > >> But only the first thread added to the waiter queue can proceed to detect
> >>> > >> the grace period.
> >>> > >> How can multiple threads concurrently perform the grace period?
> >>> > >
> >>> > > They don't concurrently perform grace periods, and it would be wasteful
> >>> > > for them to do so.  Instead, the first one performs the grace period,
> >>> > > and all that were waiting at the time it started get the benefit of that
> >>> > > same grace period.
> >>> > >
> >>> > > Any that arrived after the first thread started the first grace
> >>> > > period are served by whichever of them performs the second grace
> >>> > > period.
> >>> > >
> >>> > >                                                         Thanx, Paul
> >>> > >
> >>> >
> >>>
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> 


* Re: Question about lock in synchronize_rcu implementation of URCU
       [not found]         ` <CAAKbDrf5mta5bzDeJiwBh7_V=k6abx1DpbPx9sYCOhsgyrSHjA@mail.gmail.com>
                             ` (2 preceding siblings ...)
       [not found]           ` <981217620.72238.1461854320490.JavaMail.zimbra@efficios.com>
@ 2016-04-28 16:08           ` Mathieu Desnoyers
  3 siblings, 0 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2016-04-28 16:08 UTC (permalink / raw)
  To: Yuxin Ren; +Cc: Paul E. McKenney, lttng-dev

----- On Apr 28, 2016, at 9:47 AM, Yuxin Ren ryx@gwmail.gwu.edu wrote:

> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, a following thread can also perform a grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from a real-time perspective, such blocking will cause priority
> inversion: a higher-priority writer can be blocked by a lower-priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace periods concurrently?

By the way, there are other reasons for this lock in the urcu implementation:
it protects the rcu_registry and the rcu_gp.ctr count.

Thanks,

Mathieu

> 
> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng <boqun.feng@gmail.com> wrote:
>> Hi Paul and Yuxin,
>>
>> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>>> Try building without it and see what happens when you run the tests.
>>>
>>
>> I've run a 'regtest' with the following modification out of curiosity:
>>
>> diff --git a/urcu.c b/urcu.c
>> index a5568bdbd075..9dc3c9feae56 100644
>> --- a/urcu.c
>> +++ b/urcu.c
>> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>>         /* We won't need to wake ourself up */
>>         urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>>
>> -       mutex_lock(&rcu_gp_lock);
>> -
>>         /*
>>          * Move all waiters into our local queue.
>>          */
>> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>>         smp_mb_master();
>>  out:
>>         mutex_unlock(&rcu_registry_lock);
>> -       mutex_unlock(&rcu_gp_lock);
>>
>>         /*
>>          * Wakeup waiters only after we have completed the grace period
>>
>>
>> And guess what, the result of the test was:
>>
>> Test Summary Report
>> -------------------
>> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>>                   150, 165, 180, 195, 210, 225, 240, 253
>>                   255
>> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
>> 597.64 csys = 9579.25 CPU)
>> Result: FAIL
>>
>> And test case 30 for example is something like:
>>
>> tests/benchmark/test_urcu_mb <nreaders> <nwriters> 1 -d 0 -b 32768
>>
>> and it failed because:
>>
>> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
>> failed.
>> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d
>> 0 -b 32768
>>
>> So I think what was going on here was:
>>
>> CPU 0 (reader)                  CPU 1 (writer)                                  CPU 2 (writer)
>> ===================             ====================                            ======================
>> rcu_read_lock();                                                                new = malloc(sizeof(int));
>> local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old              *new = 8;
>>                                                                                 old = rcu_xchg_pointer(&test_rcu_pointer, new);
>>                                 synchronize_rcu():
>>                                   urcu_wait_add(); // return 0
>>                                   urcu_move_waiters(); // @gp_waiters is empty,
>>                                                        // the next urcu_wait_add() will return 0
>>
>>                                                                                 synchronize_rcu():
>>                                                                                   urcu_wait_add(); // return 0
>>
>>                                   mutex_lock(&rcu_registry_lock);
>>                                   wait_for_readers(); // remove registered reader from @registry,
>>                                                       // release rcu_registry_lock and wait via poll()
>>
>>                                                                                   mutex_lock(&rcu_registry_lock);
>>                                                                                   wait_for_readers(); // @registry is empty! we are so lucky
>>                                                                                   return;
>>
>>                                                                                 if (old)
>>                                                                                         free(old)
>> rcu_read_unlock();
>> assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed.
>>
>>
>> So the point is that there could be two writers calling synchronize_rcu() but
>> not returning early (both of them enter the slow path to perform a grace
>> period), so the rcu_gp_lock is necessary in this case.
>>
>> (Cc  Mathieu)
>>
>> But this is only my understanding and I'm learning the URCU code too ;-)
>>
>> Regards,
>> Boqun
>>
>>
>>> Might well be that it is unnecessary, but I will defer to Mathieu
>>> on that point.
>>>
>>>                                                       Thanx, Paul
>>>
>>> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
>>> > As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
>>> >
>>> > Thank you.
>>> > Yuxin
>>> >
>>> > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
>>> > <paulmck@linux.vnet.ibm.com> wrote:
>>> > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
>>> > >> Hi,
>>> > >>
>>> > >> I am learning the URCU code.
>>> > >>
>>> > >> Why do we need rcu_gp_lock in synchronize_rcu?
>>> > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
>>> > >>
>>> > >> In the comment, it says this lock ensures mutual exclusion between
>>> > >> threads calling synchronize_rcu().
>>> > >> But only the first thread added to the waiter queue can proceed to detect
>>> > >> the grace period.
>>> > >> How can multiple threads concurrently perform the grace period?
>>> > >
>>> > > They don't concurrently perform grace periods, and it would be wasteful
>>> > > for them to do so.  Instead, the first one performs the grace period,
>>> > > and all that were waiting at the time it started get the benefit of that
>>> > > same grace period.
>>> > >
>>> > > Any that arrived after the first thread started the first grace
>>> > > period are served by whichever of them performs the second grace
>>> > > period.
>>> > >
>>> > >                                                         Thanx, Paul
>>> > >
>>> >
>>>

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Question about lock in synchronize_rcu implementation of URCU
@ 2016-04-28  1:34 Yuxin Ren
  0 siblings, 0 replies; 11+ messages in thread
From: Yuxin Ren @ 2016-04-28  1:34 UTC (permalink / raw)
  To: lttng-dev, Paul McKenney

Hi,

I am learning the URCU code.

Why do we need rcu_gp_lock in synchronize_rcu?
https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401

In the comment, it says this lock ensures mutual exclusion between
threads calling synchronize_rcu().
But only the first thread added to the waiter queue can proceed to detect
the grace period.
How can multiple threads concurrently perform the grace period?

Thanks a lot!
Yuxin

end of thread, other threads:[~2016-04-28 16:08 UTC | newest]

Thread overview: 11+ messages
     [not found] <CAAKbDrfmcqEE7_6p_J0DPVO85yDGJ0GCbi5k9Lm19g4VaLgOWw@mail.gmail.com>
2016-04-28  2:08 ` Question about lock in synchronize_rcu implementation of URCU Paul E. McKenney
     [not found] ` <20160428020803.GN4967@linux.vnet.ibm.com>
2016-04-28  2:18   ` Yuxin Ren
     [not found]   ` <CAAKbDrfRMKzu6brHG=pfm0mu_Oy1afVZ94zDCiN7Y0w5o+SMZQ@mail.gmail.com>
2016-04-28  4:23     ` Paul E. McKenney
     [not found]     ` <20160428042327.GP4967@linux.vnet.ibm.com>
2016-04-28 12:44       ` Boqun Feng
     [not found]       ` <20160428124401.GB25392@insomnia>
2016-04-28 13:47         ` Yuxin Ren
2016-04-28 13:53         ` Paul E. McKenney
     [not found]         ` <CAAKbDrf5mta5bzDeJiwBh7_V=k6abx1DpbPx9sYCOhsgyrSHjA@mail.gmail.com>
2016-04-28 13:55           ` Paul E. McKenney
2016-04-28 14:38           ` Mathieu Desnoyers
     [not found]           ` <981217620.72238.1461854320490.JavaMail.zimbra@efficios.com>
2016-04-28 14:57             ` Paul E. McKenney
2016-04-28 16:08           ` Mathieu Desnoyers
2016-04-28  1:34 Yuxin Ren
