All of lore.kernel.org
* [PATCH v3] CodeSamples/count: add necessary partial memory barriers
@ 2023-03-28 14:58 Alan Huang
  2023-03-28 23:43 ` Akira Yokosawa
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Huang @ 2023-03-28 14:58 UTC (permalink / raw)
  To: paulmck, akiyks; +Cc: perfbook, Alan Huang

This patch adds several necessary partial memory barriers. The first change
converts a READ_ONCE() to smp_load_acquire() to make sure that the read from
theftp[t] happens before the read from counterp[t]. The litmus test below
represents the original code pattern, and the result is "Sometimes":

C counter_sig

{}

P0(int *theft, int *counter)
{
    int r0;
    int r1;

    r0 = READ_ONCE(*theft);
    r1 = READ_ONCE(*counter);
}

P1(int *theft, int *counter)
{
    WRITE_ONCE(*counter, 1);
    smp_mb();
    WRITE_ONCE(*theft, 1);
}

exists (0:r0=1 /\ 0:r1=0)

The second change converts a WRITE_ONCE() to smp_store_release() to make sure
that setting counterp[t] happens before setting theftp[t] to THEFT_IDLE. Here
is the litmus test; the result is "Sometimes":

C counter_sig_2

{
    int theft = 1;
    int counter = 1;
}

P0(int *theft, int *counter)
{
    WRITE_ONCE(*counter, 0);
    WRITE_ONCE(*theft, 0);
}

P1(int *theft, int *counter)
{
    if (READ_ONCE(*theft) == 0)
    {
        WRITE_ONCE(*counter, READ_ONCE(*counter)+1);
    }
}

P2(int *counter)
{
    int r0;

    r0 = READ_ONCE(*counter);
}

exists (2:r0=2)

Note that I also changed the read of the theft variable
to smp_load_acquire() in the add_count()/sub_count()
fast paths to make sure that reading theft happens
before reading counter.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
---
 CodeSamples/count/count_lim_sig.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/CodeSamples/count/count_lim_sig.c b/CodeSamples/count/count_lim_sig.c
index 023d6215..59da8077 100644
--- a/CodeSamples/count/count_lim_sig.c
+++ b/CodeSamples/count/count_lim_sig.c
@@ -81,7 +81,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
 	for_each_tid(t, tid) {				//\lnlbl{flush:loop2:b}
 		if (theftp[t] == NULL)			//\lnlbl{flush:skip:nonexist}
 			continue;			//\lnlbl{flush:next2}
-		while (READ_ONCE(*theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
+		while (smp_load_acquire(theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
 			poll(NULL, 0, 1);		//\lnlbl{flush:block}
 			if (READ_ONCE(*theftp[t]) == THEFT_REQ)//\lnlbl{flush:check:REQ}
 				pthread_kill(tid, SIGUSR1);//\lnlbl{flush:signal2}
@@ -90,7 +90,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
 		*counterp[t] = 0;
 		globalreserve -= *countermaxp[t];
 		*countermaxp[t] = 0;			//\lnlbl{flush:thiev:e}
-		WRITE_ONCE(*theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
+		smp_store_release(theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
 	}						//\lnlbl{flush:loop2:e}
 }							//\lnlbl{flush:e}
 
@@ -116,7 +116,7 @@ int add_count(unsigned long delta)			//\lnlbl{b}
 
 	WRITE_ONCE(counting, 1);			//\lnlbl{fast:b}
 	barrier();					//\lnlbl{barrier:1}
-	if (READ_ONCE(theft) <= THEFT_REQ &&		//\lnlbl{check:b}
+	if (smp_load_acquire(&theft) <= THEFT_REQ &&		//\lnlbl{check:b}
 	    countermax - counter >= delta) {		//\lnlbl{check:e}
 		WRITE_ONCE(counter, counter + delta);	//\lnlbl{add:f}
 		fastpath = 1;				//\lnlbl{fasttaken}
@@ -155,7 +155,7 @@ int sub_count(unsigned long delta)
 
 	WRITE_ONCE(counting, 1);
 	barrier();
-	if (READ_ONCE(theft) <= THEFT_REQ &&
+	if (smp_load_acquire(&theft) <= THEFT_REQ &&
 	    counter >= delta) {
 		WRITE_ONCE(counter, counter - delta);
 		fastpath = 1;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] CodeSamples/count: add necessary partial memory barriers
  2023-03-28 14:58 [PATCH v3] CodeSamples/count: add necessary partial memory barriers Alan Huang
@ 2023-03-28 23:43 ` Akira Yokosawa
  2023-03-29 23:18   ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Akira Yokosawa @ 2023-03-28 23:43 UTC (permalink / raw)
  To: Alan Huang, paulmck; +Cc: perfbook, Akira Yokosawa

On Tue, 28 Mar 2023 10:58:24 -0400, Alan Huang wrote:
> This patch add several necessary partial memory barriers, the first one
> change READ_ONCE to smp_load_acquire to makes sure the reading from theftp[t]
> happens before the reading from counterp[t]. The litmus testing below
> represents the original code pattern, and the result is "Sometimes":
> 
> C counter_sig
> 
> {}
> 
> P0(int *theft, int *counter)
> {
>     int r0;
>     int r1;
> 
>     r0 = READ_ONCE(*theft);
>     r1 = READ_ONCE(*counter);
> }
> 
> P1(int *theft, int *counter)
> {
>     WRITE_ONCE(*counter, 1);
>     smp_mb();
>     WRITE_ONCE(*theft, 1);
> }
> 
> exists (0:r0=1 /\ 0:r1=0)
> 
> Second one change WRITE_ONCE to smp_store_release to make sure that setting
> counterp[t] happens before the setting theftp[p] to THEFT_IDLE. Here is the
> litmus testing, The result is "Sometimes":
> 
> C counter_sig_2
> 
> {
>     int theft = 1;
>     int counter = 1;
> }
> 
> P0(int *theft, int *counter)
> {
>     WRITE_ONCE(*counter, 0);
>     WRITE_ONCE(*theft, 0);
> }
> 
> P1(int *theft, int *counter)
> {
>     if (READ_ONCE(*theft) == 0)
>     {
>         WRITE_ONCE(*counter, READ_ONCE(*counter)+1);
>     }
> }
> 
> P2(int *counter)
> {
>     int r0;
> 
>     r0 = READ_ONCE(*counter);
> }
> 
> exists (2:r0=2)
> 
> Note that I also changed the reading of theft variable
> to smp_load_acquire in add_count/sub_count's fast path
> to make sure that reading theft happens before reading
> counter.
> 
> Signed-off-by: Alan Huang <mmpgouride@gmail.com>

Looks good to me.
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>

Paul, there is a question for you.
Please see below.

> ---
>  CodeSamples/count/count_lim_sig.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/CodeSamples/count/count_lim_sig.c b/CodeSamples/count/count_lim_sig.c
> index 023d6215..59da8077 100644
> --- a/CodeSamples/count/count_lim_sig.c
> +++ b/CodeSamples/count/count_lim_sig.c
> @@ -81,7 +81,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
>  	for_each_tid(t, tid) {				//\lnlbl{flush:loop2:b}
>  		if (theftp[t] == NULL)			//\lnlbl{flush:skip:nonexist}
>  			continue;			//\lnlbl{flush:next2}
> -		while (READ_ONCE(*theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
> +		while (smp_load_acquire(theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
>  			poll(NULL, 0, 1);		//\lnlbl{flush:block}
>  			if (READ_ONCE(*theftp[t]) == THEFT_REQ)//\lnlbl{flush:check:REQ}
>  				pthread_kill(tid, SIGUSR1);//\lnlbl{flush:signal2}
> @@ -90,7 +90,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
>  		*counterp[t] = 0;
>  		globalreserve -= *countermaxp[t];
>  		*countermaxp[t] = 0;			//\lnlbl{flush:thiev:e}
> -		WRITE_ONCE(*theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
> +		smp_store_release(theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
>  	}						//\lnlbl{flush:loop2:e}
>  }							//\lnlbl{flush:e}
>  
> @@ -116,7 +116,7 @@ int add_count(unsigned long delta)			//\lnlbl{b}
>  
>  	WRITE_ONCE(counting, 1);			//\lnlbl{fast:b}
>  	barrier();					//\lnlbl{barrier:1}
> -	if (READ_ONCE(theft) <= THEFT_REQ &&		//\lnlbl{check:b}
> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&		//\lnlbl{check:b}
>  	    countermax - counter >= delta) {		//\lnlbl{check:e}
>  		WRITE_ONCE(counter, counter + delta);	//\lnlbl{add:f}
>  		fastpath = 1;				//\lnlbl{fasttaken}
> @@ -155,7 +155,7 @@ int sub_count(unsigned long delta)
>  
>  	WRITE_ONCE(counting, 1);
>  	barrier();
> -	if (READ_ONCE(theft) <= THEFT_REQ &&
> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&
>  	    counter >= delta) {
>  		WRITE_ONCE(counter, counter - delta);
>  		fastpath = 1;

In add_count() and sub_count(), are the plain accesses to counter
guaranteed to see the value stored in flush_local_count()?

I'm asking because counter is declared as a thread-local variable:

In this code, as each of the plain loads of counter is the first
access to the variable in each function, there is not much room
for compilers to exploit the plainness, I suppose.

What do you think?

        Thanks, Akira



* Re: [PATCH v3] CodeSamples/count: add necessary partial memory barriers
  2023-03-28 23:43 ` Akira Yokosawa
@ 2023-03-29 23:18   ` Paul E. McKenney
  2023-03-30  9:59     ` Alan Huang
  2023-03-30 11:30     ` Akira Yokosawa
  0 siblings, 2 replies; 7+ messages in thread
From: Paul E. McKenney @ 2023-03-29 23:18 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Alan Huang, perfbook

On Wed, Mar 29, 2023 at 08:43:33AM +0900, Akira Yokosawa wrote:
> On Tue, 28 Mar 2023 10:58:24 -0400, Alan Huang wrote:
> > This patch add several necessary partial memory barriers, the first one
> > change READ_ONCE to smp_load_acquire to makes sure the reading from theftp[t]
> > happens before the reading from counterp[t]. The litmus testing below
> > represents the original code pattern, and the result is "Sometimes":
> > 
> > C counter_sig
> > 
> > {}
> > 
> > P0(int *theft, int *counter)
> > {
> >     int r0;
> >     int r1;
> > 
> >     r0 = READ_ONCE(*theft);
> >     r1 = READ_ONCE(*counter);
> > }
> > 
> > P1(int *theft, int *counter)
> > {
> >     WRITE_ONCE(*counter, 1);
> >     smp_mb();
> >     WRITE_ONCE(*theft, 1);
> > }
> > 
> > exists (0:r0=1 /\ 0:r1=0)
> > 
> > Second one change WRITE_ONCE to smp_store_release to make sure that setting
> > counterp[t] happens before the setting theftp[p] to THEFT_IDLE. Here is the
> > litmus testing, The result is "Sometimes":
> > 
> > C counter_sig_2
> > 
> > {
> >     int theft = 1;
> >     int counter = 1;
> > }
> > 
> > P0(int *theft, int *counter)
> > {
> >     WRITE_ONCE(*counter, 0);
> >     WRITE_ONCE(*theft, 0);
> > }
> > 
> > P1(int *theft, int *counter)
> > {
> >     if (READ_ONCE(*theft) == 0)
> >     {
> >         WRITE_ONCE(*counter, READ_ONCE(*counter)+1);
> >     }
> > }
> > 
> > P2(int *counter)
> > {
> >     int r0;
> > 
> >     r0 = READ_ONCE(*counter);
> > }
> > 
> > exists (2:r0=2)
> > 
> > Note that I also changed the reading of theft variable
> > to smp_load_acquire in add_count/sub_count's fast path
> > to make sure that reading theft happens before reading
> > counter.
> > 
> > Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> 
> Looks good to me.
> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>

Thank you both!

> Paul, there is a question for you.
> Please see below.
> 
> > ---
> >  CodeSamples/count/count_lim_sig.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/CodeSamples/count/count_lim_sig.c b/CodeSamples/count/count_lim_sig.c
> > index 023d6215..59da8077 100644
> > --- a/CodeSamples/count/count_lim_sig.c
> > +++ b/CodeSamples/count/count_lim_sig.c
> > @@ -81,7 +81,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
> >  	for_each_tid(t, tid) {				//\lnlbl{flush:loop2:b}
> >  		if (theftp[t] == NULL)			//\lnlbl{flush:skip:nonexist}
> >  			continue;			//\lnlbl{flush:next2}
> > -		while (READ_ONCE(*theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
> > +		while (smp_load_acquire(theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
> >  			poll(NULL, 0, 1);		//\lnlbl{flush:block}
> >  			if (READ_ONCE(*theftp[t]) == THEFT_REQ)//\lnlbl{flush:check:REQ}
> >  				pthread_kill(tid, SIGUSR1);//\lnlbl{flush:signal2}
> > @@ -90,7 +90,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
> >  		*counterp[t] = 0;
> >  		globalreserve -= *countermaxp[t];
> >  		*countermaxp[t] = 0;			//\lnlbl{flush:thiev:e}
> > -		WRITE_ONCE(*theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
> > +		smp_store_release(theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
> >  	}						//\lnlbl{flush:loop2:e}
> >  }							//\lnlbl{flush:e}
> >  
> > @@ -116,7 +116,7 @@ int add_count(unsigned long delta)			//\lnlbl{b}
> >  
> >  	WRITE_ONCE(counting, 1);			//\lnlbl{fast:b}
> >  	barrier();					//\lnlbl{barrier:1}
> > -	if (READ_ONCE(theft) <= THEFT_REQ &&		//\lnlbl{check:b}
> > +	if (smp_load_acquire(&theft) <= THEFT_REQ &&		//\lnlbl{check:b}
> >  	    countermax - counter >= delta) {		//\lnlbl{check:e}
> >  		WRITE_ONCE(counter, counter + delta);	//\lnlbl{add:f}
> >  		fastpath = 1;				//\lnlbl{fasttaken}
> > @@ -155,7 +155,7 @@ int sub_count(unsigned long delta)
> >  
> >  	WRITE_ONCE(counting, 1);
> >  	barrier();
> > -	if (READ_ONCE(theft) <= THEFT_REQ &&
> > +	if (smp_load_acquire(&theft) <= THEFT_REQ &&
> >  	    counter >= delta) {
> >  		WRITE_ONCE(counter, counter - delta);
> >  		fastpath = 1;
> 
> In add_count() and sub_count(), is the plain accesses to counter
> guaranteed to see the value stored in flush_local_count()?
> 
> I'm asking because counter is declared as a thread-local variable:
> 
> In this code, as each of the plain loads of counter is the first
> access to the variable in each function, there is not much room
> for compilers to exploit the plainness, I suppose.
> 
> What do you think?

Well, gitk thinks that the last deliberate change to this code was in
2010, before READ_ONCE() and WRITE_ONCE(), let alone LKMM.  So it needs
careful scrutiny, along with the other counting algorithms.

This commit looks like a step in the right direction, so I am applying
it, thank you both!

Alan, would you like to drag the counting code kicking and screaming
into the 2020s?  ;-)

							Thanx, Paul


* Re: [PATCH v3] CodeSamples/count: add necessary partial memory barriers
  2023-03-29 23:18   ` Paul E. McKenney
@ 2023-03-30  9:59     ` Alan Huang
  2023-03-30 15:04       ` Paul E. McKenney
  2023-03-30 11:30     ` Akira Yokosawa
  1 sibling, 1 reply; 7+ messages in thread
From: Alan Huang @ 2023-03-30  9:59 UTC (permalink / raw)
  To: paulmck; +Cc: Akira Yokosawa, perfbook


> Alan, would you like to drag the counting code kicking and screaming
> into the 2020s?  ;-)

I’d like to!

Thanks,
Alan

> 2023年3月30日 上午7:18,Paul E. McKenney <paulmck@kernel.org> 写道:
> 
> On Wed, Mar 29, 2023 at 08:43:33AM +0900, Akira Yokosawa wrote:
>> On Tue, 28 Mar 2023 10:58:24 -0400, Alan Huang wrote:
>>> This patch add several necessary partial memory barriers, the first one
>>> change READ_ONCE to smp_load_acquire to makes sure the reading from theftp[t]
>>> happens before the reading from counterp[t]. The litmus testing below
>>> represents the original code pattern, and the result is "Sometimes":
>>> 
>>> C counter_sig
>>> 
>>> {}
>>> 
>>> P0(int *theft, int *counter)
>>> {
>>>    int r0;
>>>    int r1;
>>> 
>>>    r0 = READ_ONCE(*theft);
>>>    r1 = READ_ONCE(*counter);
>>> }
>>> 
>>> P1(int *theft, int *counter)
>>> {
>>>    WRITE_ONCE(*counter, 1);
>>>    smp_mb();
>>>    WRITE_ONCE(*theft, 1);
>>> }
>>> 
>>> exists (0:r0=1 /\ 0:r1=0)
>>> 
>>> Second one change WRITE_ONCE to smp_store_release to make sure that setting
>>> counterp[t] happens before the setting theftp[p] to THEFT_IDLE. Here is the
>>> litmus testing, The result is "Sometimes":
>>> 
>>> C counter_sig_2
>>> 
>>> {
>>>    int theft = 1;
>>>    int counter = 1;
>>> }
>>> 
>>> P0(int *theft, int *counter)
>>> {
>>>    WRITE_ONCE(*counter, 0);
>>>    WRITE_ONCE(*theft, 0);
>>> }
>>> 
>>> P1(int *theft, int *counter)
>>> {
>>>    if (READ_ONCE(*theft) == 0)
>>>    {
>>>        WRITE_ONCE(*counter, READ_ONCE(*counter)+1);
>>>    }
>>> }
>>> 
>>> P2(int *counter)
>>> {
>>>    int r0;
>>> 
>>>    r0 = READ_ONCE(*counter);
>>> }
>>> 
>>> exists (2:r0=2)
>>> 
>>> Note that I also changed the reading of theft variable
>>> to smp_load_acquire in add_count/sub_count's fast path
>>> to make sure that reading theft happens before reading
>>> counter.
>>> 
>>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>> 
>> Looks good to me.
>> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
> 
> Thank you both!
> 
>> Paul, there is a question for you.
>> Please see below.
>> 
>>> ---
>>> CodeSamples/count/count_lim_sig.c | 8 ++++----
>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/CodeSamples/count/count_lim_sig.c b/CodeSamples/count/count_lim_sig.c
>>> index 023d6215..59da8077 100644
>>> --- a/CodeSamples/count/count_lim_sig.c
>>> +++ b/CodeSamples/count/count_lim_sig.c
>>> @@ -81,7 +81,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
>>> 	for_each_tid(t, tid) {				//\lnlbl{flush:loop2:b}
>>> 		if (theftp[t] == NULL)			//\lnlbl{flush:skip:nonexist}
>>> 			continue;			//\lnlbl{flush:next2}
>>> -		while (READ_ONCE(*theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
>>> +		while (smp_load_acquire(theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
>>> 			poll(NULL, 0, 1);		//\lnlbl{flush:block}
>>> 			if (READ_ONCE(*theftp[t]) == THEFT_REQ)//\lnlbl{flush:check:REQ}
>>> 				pthread_kill(tid, SIGUSR1);//\lnlbl{flush:signal2}
>>> @@ -90,7 +90,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
>>> 		*counterp[t] = 0;
>>> 		globalreserve -= *countermaxp[t];
>>> 		*countermaxp[t] = 0;			//\lnlbl{flush:thiev:e}
>>> -		WRITE_ONCE(*theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
>>> +		smp_store_release(theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
>>> 	}						//\lnlbl{flush:loop2:e}
>>> }							//\lnlbl{flush:e}
>>> 
>>> @@ -116,7 +116,7 @@ int add_count(unsigned long delta)			//\lnlbl{b}
>>> 
>>> 	WRITE_ONCE(counting, 1);			//\lnlbl{fast:b}
>>> 	barrier();					//\lnlbl{barrier:1}
>>> -	if (READ_ONCE(theft) <= THEFT_REQ &&		//\lnlbl{check:b}
>>> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&		//\lnlbl{check:b}
>>> 	    countermax - counter >= delta) {		//\lnlbl{check:e}
>>> 		WRITE_ONCE(counter, counter + delta);	//\lnlbl{add:f}
>>> 		fastpath = 1;				//\lnlbl{fasttaken}
>>> @@ -155,7 +155,7 @@ int sub_count(unsigned long delta)
>>> 
>>> 	WRITE_ONCE(counting, 1);
>>> 	barrier();
>>> -	if (READ_ONCE(theft) <= THEFT_REQ &&
>>> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&
>>> 	    counter >= delta) {
>>> 		WRITE_ONCE(counter, counter - delta);
>>> 		fastpath = 1;
>> 
>> In add_count() and sub_count(), is the plain accesses to counter
>> guaranteed to see the value stored in flush_local_count()?
>> 
>> I'm asking because counter is declared as a thread-local variable:
>> 
>> In this code, as each of the plain loads of counter is the first
>> access to the variable in each function, there is not much room
>> for compilers to exploit the plainness, I suppose.
>> 
>> What do you think?
> 
> Well, gitk thinks that the last deliberate change to this code was in
> 2010, before READ_ONCE() and WRITE_ONCE(), let alone LKMM.  So it needs
> careful scrutiny, along with the other counting algorithms.
> 
> This commit looks like a step in the right direction, so I am applying
> it, thank you both!
> 
> Alan, would you like to drag the counting code kicking and screaming
> into the 2020s?  ;-)
> 
> 							Thanx, Paul




* Re: [PATCH v3] CodeSamples/count: add necessary partial memory barriers
  2023-03-29 23:18   ` Paul E. McKenney
  2023-03-30  9:59     ` Alan Huang
@ 2023-03-30 11:30     ` Akira Yokosawa
  2023-03-30 15:05       ` Paul E. McKenney
  1 sibling, 1 reply; 7+ messages in thread
From: Akira Yokosawa @ 2023-03-30 11:30 UTC (permalink / raw)
  To: paulmck; +Cc: Alan Huang, perfbook, Akira Yokosawa

On Wed, 29 Mar 2023 16:18:43 -0700, Paul E. McKenney wrote:
> On Wed, Mar 29, 2023 at 08:43:33AM +0900, Akira Yokosawa wrote:
>> On Tue, 28 Mar 2023 10:58:24 -0400, Alan Huang wrote:
>>> This patch add several necessary partial memory barriers, the first one
>>> change READ_ONCE to smp_load_acquire to makes sure the reading from theftp[t]
>>> happens before the reading from counterp[t]. The litmus testing below
>>> represents the original code pattern, and the result is "Sometimes":
>>>
>>> C counter_sig
>>>
>>> {}
>>>
>>> P0(int *theft, int *counter)
>>> {
>>>     int r0;
>>>     int r1;
>>>
>>>     r0 = READ_ONCE(*theft);
>>>     r1 = READ_ONCE(*counter);
>>> }
>>>
>>> P1(int *theft, int *counter)
>>> {
>>>     WRITE_ONCE(*counter, 1);
>>>     smp_mb();
>>>     WRITE_ONCE(*theft, 1);
>>> }
>>>
>>> exists (0:r0=1 /\ 0:r1=0)
>>>
>>> Second one change WRITE_ONCE to smp_store_release to make sure that setting
>>> counterp[t] happens before the setting theftp[p] to THEFT_IDLE. Here is the
>>> litmus testing, The result is "Sometimes":
>>>
>>> C counter_sig_2
>>>
>>> {
>>>     int theft = 1;
>>>     int counter = 1;
>>> }
>>>
>>> P0(int *theft, int *counter)
>>> {
>>>     WRITE_ONCE(*counter, 0);
>>>     WRITE_ONCE(*theft, 0);
>>> }
>>>
>>> P1(int *theft, int *counter)
>>> {
>>>     if (READ_ONCE(*theft) == 0)
>>>     {
>>>         WRITE_ONCE(*counter, READ_ONCE(*counter)+1);
>>>     }
>>> }
>>>
>>> P2(int *counter)
>>> {
>>>     int r0;
>>>
>>>     r0 = READ_ONCE(*counter);
>>> }
>>>
>>> exists (2:r0=2)
>>>
>>> Note that I also changed the reading of theft variable
>>> to smp_load_acquire in add_count/sub_count's fast path
>>> to make sure that reading theft happens before reading
>>> counter.
>>>
>>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>>
>> Looks good to me.
>> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>

Paul, somehow commit c72371753921 ("CodeSamples/count: Add necessary
partial memory barriers") missed the "Reviewed-by:" tag.

Can you fill it in if it's not too late?

> 
> Thank you both!
> 
>> Paul, there is a question for you.
>> Please see below.
>>
>>> ---
>>>  CodeSamples/count/count_lim_sig.c | 8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/CodeSamples/count/count_lim_sig.c b/CodeSamples/count/count_lim_sig.c
>>> index 023d6215..59da8077 100644
>>> --- a/CodeSamples/count/count_lim_sig.c
>>> +++ b/CodeSamples/count/count_lim_sig.c
>>> @@ -81,7 +81,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
>>>  	for_each_tid(t, tid) {				//\lnlbl{flush:loop2:b}
>>>  		if (theftp[t] == NULL)			//\lnlbl{flush:skip:nonexist}
>>>  			continue;			//\lnlbl{flush:next2}
>>> -		while (READ_ONCE(*theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
>>> +		while (smp_load_acquire(theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
>>>  			poll(NULL, 0, 1);		//\lnlbl{flush:block}
>>>  			if (READ_ONCE(*theftp[t]) == THEFT_REQ)//\lnlbl{flush:check:REQ}
>>>  				pthread_kill(tid, SIGUSR1);//\lnlbl{flush:signal2}
>>> @@ -90,7 +90,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
>>>  		*counterp[t] = 0;
>>>  		globalreserve -= *countermaxp[t];
>>>  		*countermaxp[t] = 0;			//\lnlbl{flush:thiev:e}
>>> -		WRITE_ONCE(*theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
>>> +		smp_store_release(theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
>>>  	}						//\lnlbl{flush:loop2:e}
>>>  }							//\lnlbl{flush:e}
>>>  
>>> @@ -116,7 +116,7 @@ int add_count(unsigned long delta)			//\lnlbl{b}
>>>  
>>>  	WRITE_ONCE(counting, 1);			//\lnlbl{fast:b}
>>>  	barrier();					//\lnlbl{barrier:1}
>>> -	if (READ_ONCE(theft) <= THEFT_REQ &&		//\lnlbl{check:b}
>>> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&		//\lnlbl{check:b}
>>>  	    countermax - counter >= delta) {		//\lnlbl{check:e}
>>>  		WRITE_ONCE(counter, counter + delta);	//\lnlbl{add:f}
>>>  		fastpath = 1;				//\lnlbl{fasttaken}
>>> @@ -155,7 +155,7 @@ int sub_count(unsigned long delta)
>>>  
>>>  	WRITE_ONCE(counting, 1);
>>>  	barrier();
>>> -	if (READ_ONCE(theft) <= THEFT_REQ &&
>>> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&
>>>  	    counter >= delta) {
>>>  		WRITE_ONCE(counter, counter - delta);
>>>  		fastpath = 1;
>>
>> In add_count() and sub_count(), is the plain accesses to counter
>> guaranteed to see the value stored in flush_local_count()?
>>
>> I'm asking because counter is declared as a thread-local variable:
>>
>> In this code, as each of the plain loads of counter is the first
>> access to the variable in each function, there is not much room
>> for compilers to exploit the plainness, I suppose.
>>
>> What do you think?
> 
> Well, gitk thinks that the last deliberate change to this code was in
> 2010, before READ_ONCE() and WRITE_ONCE(), let alone LKMM.  So it needs
> careful scrutiny, along with the other counting algorithms.

I see.

FWIW, the documentation of GCC [1] states that thread-local variables
are allowed to be referenced via their addresses from any thread.
So compilers are _not_ allowed to exclude the possibility of updates
from other threads, it seems.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

        Thanks, Akira

> 
> This commit looks like a step in the right direction, so I am applying
> it, thank you both!
> 
> Alan, would you like to drag the counting code kicking and screaming
> into the 2020s?  ;-)
> 
> 							Thanx, Paul


* Re: [PATCH v3] CodeSamples/count: add necessary partial memory barriers
  2023-03-30  9:59     ` Alan Huang
@ 2023-03-30 15:04       ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2023-03-30 15:04 UTC (permalink / raw)
  To: Alan Huang; +Cc: Akira Yokosawa, perfbook

On Thu, Mar 30, 2023 at 05:59:37PM +0800, Alan Huang wrote:
> > Alan, would you like to drag the counting code kicking and screaming
> > into the 2020s?  ;-)
> 
> I’d like to!

Very good!  Again, this code was written before tools such as LKMM
existed, so you will likely find instances where I used heavier-weight
memory ordering than needed, and, as always, you also will likely find
out-and-out bugs.

Looking forward to seeing what you come up with!

							Thanx, Paul

> Thanks,
> Alan
> 
> > 2023年3月30日 上午7:18,Paul E. McKenney <paulmck@kernel.org> 写道:
> > 
> > On Wed, Mar 29, 2023 at 08:43:33AM +0900, Akira Yokosawa wrote:
> >> On Tue, 28 Mar 2023 10:58:24 -0400, Alan Huang wrote:
> >>> This patch add several necessary partial memory barriers, the first one
> >>> change READ_ONCE to smp_load_acquire to makes sure the reading from theftp[t]
> >>> happens before the reading from counterp[t]. The litmus testing below
> >>> represents the original code pattern, and the result is "Sometimes":
> >>> 
> >>> C counter_sig
> >>> 
> >>> {}
> >>> 
> >>> P0(int *theft, int *counter)
> >>> {
> >>>    int r0;
> >>>    int r1;
> >>> 
> >>>    r0 = READ_ONCE(*theft);
> >>>    r1 = READ_ONCE(*counter);
> >>> }
> >>> 
> >>> P1(int *theft, int *counter)
> >>> {
> >>>    WRITE_ONCE(*counter, 1);
> >>>    smp_mb();
> >>>    WRITE_ONCE(*theft, 1);
> >>> }
> >>> 
> >>> exists (0:r0=1 /\ 0:r1=0)
> >>> 
> >>> Second one change WRITE_ONCE to smp_store_release to make sure that setting
> >>> counterp[t] happens before the setting theftp[p] to THEFT_IDLE. Here is the
> >>> litmus testing, The result is "Sometimes":
> >>> 
> >>> C counter_sig_2
> >>> 
> >>> {
> >>>    int theft = 1;
> >>>    int counter = 1;
> >>> }
> >>> 
> >>> P0(int *theft, int *counter)
> >>> {
> >>>    WRITE_ONCE(*counter, 0);
> >>>    WRITE_ONCE(*theft, 0);
> >>> }
> >>> 
> >>> P1(int *theft, int *counter)
> >>> {
> >>>    if (READ_ONCE(*theft) == 0)
> >>>    {
> >>>        WRITE_ONCE(*counter, READ_ONCE(*counter)+1);
> >>>    }
> >>> }
> >>> 
> >>> P2(int *counter)
> >>> {
> >>>    int r0;
> >>> 
> >>>    r0 = READ_ONCE(*counter);
> >>> }
> >>> 
> >>> exists (2:r0=2)
> >>> 
> >>> Note that I also changed the reading of theft variable
> >>> to smp_load_acquire in add_count/sub_count's fast path
> >>> to make sure that reading theft happens before reading
> >>> counter.
> >>> 
> >>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> >> 
> >> Looks good to me.
> >> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
> > 
> > Thank you both!
> > 
> >> Paul, there is a question for you.
> >> Please see below.
> >> 
> >>> ---
> >>> CodeSamples/count/count_lim_sig.c | 8 ++++----
> >>> 1 file changed, 4 insertions(+), 4 deletions(-)
> >>> 
> >>> diff --git a/CodeSamples/count/count_lim_sig.c b/CodeSamples/count/count_lim_sig.c
> >>> index 023d6215..59da8077 100644
> >>> --- a/CodeSamples/count/count_lim_sig.c
> >>> +++ b/CodeSamples/count/count_lim_sig.c
> >>> @@ -81,7 +81,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
> >>> 	for_each_tid(t, tid) {				//\lnlbl{flush:loop2:b}
> >>> 		if (theftp[t] == NULL)			//\lnlbl{flush:skip:nonexist}
> >>> 			continue;			//\lnlbl{flush:next2}
> >>> -		while (READ_ONCE(*theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
> >>> +		while (smp_load_acquire(theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
> >>> 			poll(NULL, 0, 1);		//\lnlbl{flush:block}
> >>> 			if (READ_ONCE(*theftp[t]) == THEFT_REQ)//\lnlbl{flush:check:REQ}
> >>> 				pthread_kill(tid, SIGUSR1);//\lnlbl{flush:signal2}
> >>> @@ -90,7 +90,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
> >>> 		*counterp[t] = 0;
> >>> 		globalreserve -= *countermaxp[t];
> >>> 		*countermaxp[t] = 0;			//\lnlbl{flush:thiev:e}
> >>> -		WRITE_ONCE(*theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
> >>> +		smp_store_release(theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
> >>> 	}						//\lnlbl{flush:loop2:e}
> >>> }							//\lnlbl{flush:e}
> >>> 
> >>> @@ -116,7 +116,7 @@ int add_count(unsigned long delta)			//\lnlbl{b}
> >>> 
> >>> 	WRITE_ONCE(counting, 1);			//\lnlbl{fast:b}
> >>> 	barrier();					//\lnlbl{barrier:1}
> >>> -	if (READ_ONCE(theft) <= THEFT_REQ &&		//\lnlbl{check:b}
> >>> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&		//\lnlbl{check:b}
> >>> 	    countermax - counter >= delta) {		//\lnlbl{check:e}
> >>> 		WRITE_ONCE(counter, counter + delta);	//\lnlbl{add:f}
> >>> 		fastpath = 1;				//\lnlbl{fasttaken}
> >>> @@ -155,7 +155,7 @@ int sub_count(unsigned long delta)
> >>> 
> >>> 	WRITE_ONCE(counting, 1);
> >>> 	barrier();
> >>> -	if (READ_ONCE(theft) <= THEFT_REQ &&
> >>> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&
> >>> 	    counter >= delta) {
> >>> 		WRITE_ONCE(counter, counter - delta);
> >>> 		fastpath = 1;
> >> 
> >> In add_count() and sub_count(), is the plain accesses to counter
> >> guaranteed to see the value stored in flush_local_count()?
> >> 
> >> I'm asking because counter is declared as a thread-local variable:
> >> 
> >> In this code, as each of the plain loads of counter is the first
> >> access to the variable in each function, there is not much room
> >> for compilers to exploit the plainness, I suppose.
> >> 
> >> What do you think?
> > 
> > Well, gitk thinks that the last deliberate change to this code was in
> > 2010, before READ_ONCE() and WRITE_ONCE(), let alone LKMM.  So it needs
> > careful scrutiny, along with the other counting algorithms.
> > 
> > This commit looks like a step in the right direction, so I am applying
> > it, thank you both!
> > 
> > Alan, would you like to drag the counting code kicking and screaming
> > into the 2020s?  ;-)
> > 
> > 							Thanx, Paul
> 


* Re: [PATCH v3] CodeSamples/count: add necessary partial memory barriers
  2023-03-30 11:30     ` Akira Yokosawa
@ 2023-03-30 15:05       ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2023-03-30 15:05 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Alan Huang, perfbook

On Thu, Mar 30, 2023 at 08:30:11PM +0900, Akira Yokosawa wrote:
> On Wed, 29 Mar 2023 16:18:43 -0700, Paul E. McKenney wrote:
> > On Wed, Mar 29, 2023 at 08:43:33AM +0900, Akira Yokosawa wrote:
> >> On Tue, 28 Mar 2023 10:58:24 -0400, Alan Huang wrote:
> >>> This patch add several necessary partial memory barriers, the first one
> >>> change READ_ONCE to smp_load_acquire to makes sure the reading from theftp[t]
> >>> happens before the reading from counterp[t]. The litmus testing below
> >>> represents the original code pattern, and the result is "Sometimes":
> >>>
> >>> C counter_sig
> >>>
> >>> {}
> >>>
> >>> P0(int *theft, int *counter)
> >>> {
> >>>     int r0;
> >>>     int r1;
> >>>
> >>>     r0 = READ_ONCE(*theft);
> >>>     r1 = READ_ONCE(*counter);
> >>> }
> >>>
> >>> P1(int *theft, int *counter)
> >>> {
> >>>     WRITE_ONCE(*counter, 1);
> >>>     smp_mb();
> >>>     WRITE_ONCE(*theft, 1);
> >>> }
> >>>
> >>> exists (0:r0=1 /\ 0:r1=0)
> >>>
> >>> The second one changes WRITE_ONCE to smp_store_release to make sure that setting
> >>> counterp[t] happens before setting theftp[t] to THEFT_IDLE. Here is the
> >>> litmus test; the result is "Sometimes":
> >>>
> >>> C counter_sig_2
> >>>
> >>> {
> >>>     int theft = 1;
> >>>     int counter = 1;
> >>> }
> >>>
> >>> P0(int *theft, int *counter)
> >>> {
> >>>     WRITE_ONCE(*counter, 0);
> >>>     WRITE_ONCE(*theft, 0);
> >>> }
> >>>
> >>> P1(int *theft, int *counter)
> >>> {
> >>>     if (READ_ONCE(*theft) == 0)
> >>>     {
> >>>         WRITE_ONCE(*counter, READ_ONCE(*counter)+1);
> >>>     }
> >>> }
> >>>
> >>> P2(int *counter)
> >>> {
> >>>     int r0;
> >>>
> >>>     r0 = READ_ONCE(*counter);
> >>> }
> >>>
> >>> exists (2:r0=2)
> >>>
> >>> Note that I also changed the read of the theft variable
> >>> to smp_load_acquire in the add_count/sub_count fast paths
> >>> to make sure that reading theft happens before reading
> >>> counter.
> >>>
> >>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> >>
> >> Looks good to me.
> >> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
> 
> Paul, somehow commit c72371753921 ("CodeSamples/count: Add necessary
> partial memory barriers") missed the "Reviewed-by:" part.
> 
> Can you fill it in if it's not too late?

Gah!  I fixed it, and thank you for pointing it out.

I clearly was not doing well yesterday!

							Thanx, Paul

> > Thank you both!
> > 
> >> Paul, there is a question for you.
> >> Please see below.
> >>
> >>> ---
> >>>  CodeSamples/count/count_lim_sig.c | 8 ++++----
> >>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/CodeSamples/count/count_lim_sig.c b/CodeSamples/count/count_lim_sig.c
> >>> index 023d6215..59da8077 100644
> >>> --- a/CodeSamples/count/count_lim_sig.c
> >>> +++ b/CodeSamples/count/count_lim_sig.c
> >>> @@ -81,7 +81,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
> >>>  	for_each_tid(t, tid) {				//\lnlbl{flush:loop2:b}
> >>>  		if (theftp[t] == NULL)			//\lnlbl{flush:skip:nonexist}
> >>>  			continue;			//\lnlbl{flush:next2}
> >>> -		while (READ_ONCE(*theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
> >>> +		while (smp_load_acquire(theftp[t]) != THEFT_READY) {//\lnlbl{flush:loop3:b}
> >>>  			poll(NULL, 0, 1);		//\lnlbl{flush:block}
> >>>  			if (READ_ONCE(*theftp[t]) == THEFT_REQ)//\lnlbl{flush:check:REQ}
> >>>  				pthread_kill(tid, SIGUSR1);//\lnlbl{flush:signal2}
> >>> @@ -90,7 +90,7 @@ static void flush_local_count(void)			//\lnlbl{flush:b}
> >>>  		*counterp[t] = 0;
> >>>  		globalreserve -= *countermaxp[t];
> >>>  		*countermaxp[t] = 0;			//\lnlbl{flush:thiev:e}
> >>> -		WRITE_ONCE(*theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
> >>> +		smp_store_release(theftp[t], THEFT_IDLE);	//\lnlbl{flush:IDLE}
> >>>  	}						//\lnlbl{flush:loop2:e}
> >>>  }							//\lnlbl{flush:e}
> >>>  
> >>> @@ -116,7 +116,7 @@ int add_count(unsigned long delta)			//\lnlbl{b}
> >>>  
> >>>  	WRITE_ONCE(counting, 1);			//\lnlbl{fast:b}
> >>>  	barrier();					//\lnlbl{barrier:1}
> >>> -	if (READ_ONCE(theft) <= THEFT_REQ &&		//\lnlbl{check:b}
> >>> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&		//\lnlbl{check:b}
> >>>  	    countermax - counter >= delta) {		//\lnlbl{check:e}
> >>>  		WRITE_ONCE(counter, counter + delta);	//\lnlbl{add:f}
> >>>  		fastpath = 1;				//\lnlbl{fasttaken}
> >>> @@ -155,7 +155,7 @@ int sub_count(unsigned long delta)
> >>>  
> >>>  	WRITE_ONCE(counting, 1);
> >>>  	barrier();
> >>> -	if (READ_ONCE(theft) <= THEFT_REQ &&
> >>> +	if (smp_load_acquire(&theft) <= THEFT_REQ &&
> >>>  	    counter >= delta) {
> >>>  		WRITE_ONCE(counter, counter - delta);
> >>>  		fastpath = 1;
> >>
> >> In add_count() and sub_count(), is the plain accesses to counter
> >> guaranteed to see the value stored in flush_local_count()?
> >>
> >> I'm asking because counter is declared as a thread-local variable:
> >>
> >> In this code, as each of the plain loads of counter is the first
> >> access to the variable in each function, there is not much room
> >> for compilers to exploit the plainness, I suppose.
> >>
> >> What do you think?
> > 
> > Well, gitk thinks that the last deliberate change to this code was in
> > 2010, before READ_ONCE() and WRITE_ONCE(), let alone LKMM.  So it needs
> > careful scrutiny, along with the other counting algorithms.
> 
> I see.
> 
> FWIW, the documentation of GCC [1] states that thread-local variables
> are allowed to be referenced via their addresses from any thread.
> So compilers are _not_ allowed to exclude the possibility of updates
> from other threads, it seems.
> 
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html
> 
>         Thanks, Akira
> 
> > 
> > This commit looks like a step in the right direction, so I am applying
> > it, thank you both!
> > 
> > Alan, would you like to drag the counting code kicking and screaming
> > into the 2020s?  ;-)
> > 
> > 							Thanx, Paul


end of thread, other threads:[~2023-03-30 15:05 UTC | newest]

Thread overview: 7+ messages
2023-03-28 14:58 [PATCH v3] CodeSamples/count: add necessary partial memory barriers Alan Huang
2023-03-28 23:43 ` Akira Yokosawa
2023-03-29 23:18   ` Paul E. McKenney
2023-03-30  9:59     ` Alan Huang
2023-03-30 15:04       ` Paul E. McKenney
2023-03-30 11:30     ` Akira Yokosawa
2023-03-30 15:05       ` Paul E. McKenney
