All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] rcu: use try_cmpxchg in check_cpu_stall
@ 2023-02-28 15:51 Uros Bizjak
  2023-02-28 20:39 ` Joel Fernandes
  0 siblings, 1 reply; 25+ messages in thread
From: Uros Bizjak @ 2023-02-28 15:51 UTC (permalink / raw)
  To: rcu, linux-kernel
  Cc: Uros Bizjak, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes

Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
check_cpu_stall.  x86 CMPXCHG instruction returns success in ZF flag, so
this change saves a compare after cmpxchg (and related move instruction in
front of cmpxchg).

No functional change intended.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 kernel/rcu/tree_stall.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a..d81c88e66b42 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -760,7 +760,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	jn = jiffies + ULONG_MAX / 2;
 	if (rcu_gp_in_progress() &&
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
-	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
 
 		/*
 		 * If a virtual machine is stopped by the host it can look to
@@ -778,7 +778,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 
 	} else if (rcu_gp_in_progress() &&
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
-		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+		   try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
 
 		/*
 		 * If a virtual machine is stopped by the host it can look to
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 15:51 [PATCH] rcu: use try_cmpxchg in check_cpu_stall Uros Bizjak
@ 2023-02-28 20:39 ` Joel Fernandes
  2023-02-28 20:56   ` Joel Fernandes
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-02-28 20:39 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: rcu, linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
> Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> check_cpu_stall.  x86 CMPXCHG instruction returns success in ZF flag, so
> this change saves a compare after cmpxchg (and related move instruction in
> front of cmpxchg).

In my codegen, I am not seeing mov instruction before the cmp removed, how
can that be? The rax has to be populated with a mov before cmpxchg right?

So try_cmpxchg gives: mov, cmpxchg, cmp, jne
Where as cmpxchg gives: mov, cmpxchg, mov, jne

So yeah you got rid of compare, but I am not seeing reduction in moves.
Either way, I think it is an improvement due to dropping cmp so:

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


> 
> No functional change intended.
> 
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> ---
>  kernel/rcu/tree_stall.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b10b8349bb2a..d81c88e66b42 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -760,7 +760,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  	jn = jiffies + ULONG_MAX / 2;
>  	if (rcu_gp_in_progress() &&
>  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>  
>  		/*
>  		 * If a virtual machine is stopped by the host it can look to
> @@ -778,7 +778,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  
>  	} else if (rcu_gp_in_progress() &&
>  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> -		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +		   try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>  
>  		/*
>  		 * If a virtual machine is stopped by the host it can look to
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 20:39 ` Joel Fernandes
@ 2023-02-28 20:56   ` Joel Fernandes
  2023-02-28 21:03   ` Steven Rostedt
  2023-03-01  9:52   ` Uros Bizjak
  2 siblings, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-02-28 20:56 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: rcu, linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan



> On Feb 28, 2023, at 3:39 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
>> Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
>> check_cpu_stall.  x86 CMPXCHG instruction returns success in ZF flag, so
>> this change saves a compare after cmpxchg (and related move instruction in
>> front of cmpxchg).
> 
> In my codegen, I am not seeing mov instruction before the cmp removed, how
> can that be? The rax has to be populated with a mov before cmpxchg right?
> 
> So try_cmpxchg gives: mov, cmpxchg, cmp, jne
> Where as cmpxchg gives: mov, cmpxchg, mov, jne
> 
> So yeah you got rid of compare, but I am not seeing reduction in moves.
> Either way, I think it is an improvement due to dropping cmp so:
> 
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Ah there probably is a lesser mov to hold the value to compare against,
but maybe not always? 

Would be nice to clarify that in the changelog anyway.

Thanks,

 - Joel

> 
> thanks,
> 
> - Joel
> 
> 
>> 
>> No functional change intended.
>> 
>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
>> Cc: Frederic Weisbecker <frederic@kernel.org>
>> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
>> Cc: Josh Triplett <josh@joshtriplett.org>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
>> Cc: Joel Fernandes <joel@joelfernandes.org>
>> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
>> ---
>> kernel/rcu/tree_stall.h | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> index b10b8349bb2a..d81c88e66b42 100644
>> --- a/kernel/rcu/tree_stall.h
>> +++ b/kernel/rcu/tree_stall.h
>> @@ -760,7 +760,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>>    jn = jiffies + ULONG_MAX / 2;
>>    if (rcu_gp_in_progress() &&
>>        (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>> -        cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>> +        try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>> 
>>        /*
>>         * If a virtual machine is stopped by the host it can look to
>> @@ -778,7 +778,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>> 
>>    } else if (rcu_gp_in_progress() &&
>>           ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>> -           cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>> +           try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>> 
>>        /*
>>         * If a virtual machine is stopped by the host it can look to
>> -- 
>> 2.39.2
>> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 20:39 ` Joel Fernandes
  2023-02-28 20:56   ` Joel Fernandes
@ 2023-02-28 21:03   ` Steven Rostedt
  2023-02-28 21:29     ` Paul E. McKenney
  2023-03-01  9:52   ` Uros Bizjak
  2 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-02-28 21:03 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uros Bizjak, rcu, linux-kernel, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Tue, 28 Feb 2023 20:39:30 +0000
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
> > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > check_cpu_stall.  x86 CMPXCHG instruction returns success in ZF flag, so
> > this change saves a compare after cmpxchg (and related move instruction in
> > front of cmpxchg).  
> 
> In my codegen, I am not seeing mov instruction before the cmp removed, how
> can that be? The rax has to be populated with a mov before cmpxchg right?
> 
> So try_cmpxchg gives: mov, cmpxchg, cmp, jne
> Where as cmpxchg gives: mov, cmpxchg, mov, jne
> 
> So yeah you got rid of compare, but I am not seeing reduction in moves.
> Either way, I think it is an improvement due to dropping cmp so:

Did you get the above backwards?

Anyway, when looking at the conversion of cmpxchg() to try_cmpxchg() that
Uros sent to me for the ring buffer, the code went from:

0000000000000070 <ring_buffer_record_off>:
      70:       48 8d 4f 08             lea    0x8(%rdi),%rcx
      74:       8b 57 08                mov    0x8(%rdi),%edx
      77:       89 d6                   mov    %edx,%esi
      79:       89 d0                   mov    %edx,%eax
      7b:       81 ce 00 00 10 00       or     $0x100000,%esi
      81:       f0 0f b1 31             lock cmpxchg %esi,(%rcx)
      85:       39 d0                   cmp    %edx,%eax
      87:       75 eb                   jne    74 <ring_buffer_record_off+0x4>
      89:       e9 00 00 00 00          jmp    8e <ring_buffer_record_off+0x1e>
                        8a: R_X86_64_PLT32      __x86_return_thunk-0x4
      8e:       66 90                   xchg   %ax,%ax


  To

00000000000001a0 <ring_buffer_record_off>:
     1a0:       8b 47 08                mov    0x8(%rdi),%eax
     1a3:       48 8d 4f 08             lea    0x8(%rdi),%rcx
     1a7:       89 c2                   mov    %eax,%edx
     1a9:       81 ca 00 00 10 00       or     $0x100000,%edx
     1af:       f0 0f b1 57 08          lock cmpxchg %edx,0x8(%rdi)
     1b4:       75 05                   jne    1bb <ring_buffer_record_off+0x1b>
     1b6:       e9 00 00 00 00          jmp    1bb <ring_buffer_record_off+0x1b>
                        1b7: R_X86_64_PLT32     __x86_return_thunk-0x4
     1bb:       89 c2                   mov    %eax,%edx
     1bd:       81 ca 00 00 10 00       or     $0x100000,%edx
     1c3:       f0 0f b1 11             lock cmpxchg %edx,(%rcx)
     1c7:       75 f2                   jne    1bb <ring_buffer_record_off+0x1b>
     1c9:       e9 00 00 00 00          jmp    1ce <ring_buffer_record_off+0x2e>
                        1ca: R_X86_64_PLT32     __x86_return_thunk-0x4
     1ce:       66 90                   xchg   %ax,%ax


It does add a bit more code, but the fast path seems better (where the
cmpxchg succeeds). That would be:

00000000000001a0 <ring_buffer_record_off>:
     1a0:       8b 47 08                mov    0x8(%rdi),%eax
     1a3:       48 8d 4f 08             lea    0x8(%rdi),%rcx
     1a7:       89 c2                   mov    %eax,%edx
     1a9:       81 ca 00 00 10 00       or     $0x100000,%edx
     1af:       f0 0f b1 57 08          lock cmpxchg %edx,0x8(%rdi)
     1b4:       75 05                   jne    1bb <ring_buffer_record_off+0x1b>
     1b6:       e9 00 00 00 00          jmp    1bb <ring_buffer_record_off+0x1b>
                        1b7: R_X86_64_PLT32     __x86_return_thunk-0x4


Where there's only two moves and no cmp, where the former has 3 moves and a
cmp in the fast path.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 21:03   ` Steven Rostedt
@ 2023-02-28 21:29     ` Paul E. McKenney
  2023-02-28 21:41       ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2023-02-28 21:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, Uros Bizjak, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 28, 2023 at 04:03:24PM -0500, Steven Rostedt wrote:
> On Tue, 28 Feb 2023 20:39:30 +0000
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
> > > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > > check_cpu_stall.  x86 CMPXCHG instruction returns success in ZF flag, so
> > > this change saves a compare after cmpxchg (and related move instruction in
> > > front of cmpxchg).  
> > 
> > In my codegen, I am not seeing mov instruction before the cmp removed, how
> > can that be? The rax has to be populated with a mov before cmpxchg right?
> > 
> > So try_cmpxchg gives: mov, cmpxchg, cmp, jne
> > Where as cmpxchg gives: mov, cmpxchg, mov, jne
> > 
> > So yeah you got rid of compare, but I am not seeing reduction in moves.
> > Either way, I think it is an improvement due to dropping cmp so:
> 
> Did you get the above backwards?
> 
> Anyway, when looking at the conversion of cmpxchg() to try_cmpxchg() that
> Uros sent to me for the ring buffer, the code went from:
> 
> 0000000000000070 <ring_buffer_record_off>:
>       70:       48 8d 4f 08             lea    0x8(%rdi),%rcx
>       74:       8b 57 08                mov    0x8(%rdi),%edx
>       77:       89 d6                   mov    %edx,%esi
>       79:       89 d0                   mov    %edx,%eax
>       7b:       81 ce 00 00 10 00       or     $0x100000,%esi
>       81:       f0 0f b1 31             lock cmpxchg %esi,(%rcx)
>       85:       39 d0                   cmp    %edx,%eax
>       87:       75 eb                   jne    74 <ring_buffer_record_off+0x4>
>       89:       e9 00 00 00 00          jmp    8e <ring_buffer_record_off+0x1e>
>                         8a: R_X86_64_PLT32      __x86_return_thunk-0x4
>       8e:       66 90                   xchg   %ax,%ax
> 
> 
>   To
> 
> 00000000000001a0 <ring_buffer_record_off>:
>      1a0:       8b 47 08                mov    0x8(%rdi),%eax
>      1a3:       48 8d 4f 08             lea    0x8(%rdi),%rcx
>      1a7:       89 c2                   mov    %eax,%edx
>      1a9:       81 ca 00 00 10 00       or     $0x100000,%edx
>      1af:       f0 0f b1 57 08          lock cmpxchg %edx,0x8(%rdi)
>      1b4:       75 05                   jne    1bb <ring_buffer_record_off+0x1b>
>      1b6:       e9 00 00 00 00          jmp    1bb <ring_buffer_record_off+0x1b>
>                         1b7: R_X86_64_PLT32     __x86_return_thunk-0x4
>      1bb:       89 c2                   mov    %eax,%edx
>      1bd:       81 ca 00 00 10 00       or     $0x100000,%edx
>      1c3:       f0 0f b1 11             lock cmpxchg %edx,(%rcx)
>      1c7:       75 f2                   jne    1bb <ring_buffer_record_off+0x1b>
>      1c9:       e9 00 00 00 00          jmp    1ce <ring_buffer_record_off+0x2e>
>                         1ca: R_X86_64_PLT32     __x86_return_thunk-0x4
>      1ce:       66 90                   xchg   %ax,%ax
> 
> 
> It does add a bit more code, but the fast path seems better (where the
> cmpxchg succeeds). That would be:
> 
> 00000000000001a0 <ring_buffer_record_off>:
>      1a0:       8b 47 08                mov    0x8(%rdi),%eax
>      1a3:       48 8d 4f 08             lea    0x8(%rdi),%rcx
>      1a7:       89 c2                   mov    %eax,%edx
>      1a9:       81 ca 00 00 10 00       or     $0x100000,%edx
>      1af:       f0 0f b1 57 08          lock cmpxchg %edx,0x8(%rdi)
>      1b4:       75 05                   jne    1bb <ring_buffer_record_off+0x1b>
>      1b6:       e9 00 00 00 00          jmp    1bb <ring_buffer_record_off+0x1b>
>                         1b7: R_X86_64_PLT32     __x86_return_thunk-0x4
> 
> 
> Where there's only two moves and no cmp, where the former has 3 moves and a
> cmp in the fast path.

All well and good, but the stall-warning code is nowhere near a fastpath.

Is try_cmpxchg() considered more readable in this context?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 21:29     ` Paul E. McKenney
@ 2023-02-28 21:41       ` Steven Rostedt
  2023-02-28 21:56         ` Paul E. McKenney
  2023-02-28 23:30         ` Joel Fernandes
  0 siblings, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2023-02-28 21:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, Uros Bizjak, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Tue, 28 Feb 2023 13:29:11 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> All well and good, but the stall-warning code is nowhere near a fastpath.
> 
> Is try_cmpxchg() considered more readable in this context?


-	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {

It's basically the same :-/

But looking at this use case, I'd actually NAK it, as it is misleading.

As try_cmpxchg() is used to get rid of the updating of the old value. As in
the ring buffer code we had:

void ring_buffer_record_off(struct trace_buffer *buffer)
{
	unsigned int rd;
	unsigned int new_rd;

	do {
		rd = atomic_read(&buffer->record_disabled);
		new_rd = rd | RB_BUFFER_OFF;
	} while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);
}

and the try_cmpxchg() converted it to:

void ring_buffer_record_off(struct trace_buffer *buffer)
{
	unsigned int rd;
	unsigned int new_rd;

	rd = atomic_read(&buffer->record_disabled);
	do {
		new_rd = rd | RB_BUFFER_OFF;
	} while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd));
}

Which got rid of the need to constantly update the rd variable (cmpxchg
will load rax with the value read, so it removes the need for an extra
move).

But in your case, we don't need to update js, in which case the
try_cmpxchg() does.

The patch that Uros sent me for the ring buffer code also does some of
that, which I feel is wrong.

So with that, I would nack the patch.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 21:41       ` Steven Rostedt
@ 2023-02-28 21:56         ` Paul E. McKenney
  2023-02-28 23:30         ` Joel Fernandes
  1 sibling, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2023-02-28 21:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, Uros Bizjak, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 28, 2023 at 04:41:24PM -0500, Steven Rostedt wrote:
> On Tue, 28 Feb 2023 13:29:11 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > All well and good, but the stall-warning code is nowhere near a fastpath.
> > 
> > Is try_cmpxchg() considered more readable in this context?
> 
> 
> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
> 
> It's basically the same :-/

That was my assessment.  ;-)

> But looking at this use case, I'd actually NAK it, as it is misleading.
> 
> As try_cmpxchg() is used to get rid of the updating of the old value. As in
> the ring buffer code we had:
> 
> void ring_buffer_record_off(struct trace_buffer *buffer)
> {
> 	unsigned int rd;
> 	unsigned int new_rd;
> 
> 	do {
> 		rd = atomic_read(&buffer->record_disabled);
> 		new_rd = rd | RB_BUFFER_OFF;
> 	} while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);
> }
> 
> and the try_cmpxchg() converted it to:
> 
> void ring_buffer_record_off(struct trace_buffer *buffer)
> {
> 	unsigned int rd;
> 	unsigned int new_rd;
> 
> 	rd = atomic_read(&buffer->record_disabled);
> 	do {
> 		new_rd = rd | RB_BUFFER_OFF;
> 	} while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd));
> }
> 
> Which got rid of the need to constantly update the rd variable (cmpxchg
> will load rax with the value read, so it removes the need for an extra
> move).
> 
> But in your case, we don't need to update js, in which case the
> try_cmpxchg() does.
> 
> The patch that Uros sent me for the ring buffer code also does some of
> that, which I feel is wrong.
> 
> So with that, I would nack the patch.

OK, I will leave this one out.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 21:41       ` Steven Rostedt
  2023-02-28 21:56         ` Paul E. McKenney
@ 2023-02-28 23:30         ` Joel Fernandes
  2023-03-01  0:08           ` Steven Rostedt
  1 sibling, 1 reply; 25+ messages in thread
From: Joel Fernandes @ 2023-02-28 23:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Uros Bizjak, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

Hey Steve,

On Tue, Feb 28, 2023 at 4:41 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 28 Feb 2023 13:29:11 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > All well and good, but the stall-warning code is nowhere near a fastpath.
> >
> > Is try_cmpxchg() considered more readable in this context?
>
>
> -           cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +           try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>
> It's basically the same :-/
>
> But looking at this use case, I'd actually NAK it, as it is misleading.

I'm trying to parse this. You are saying it is misleading, because it
updates js when it doesn't need to?

> As try_cmpxchg() is used to get rid of the updating of the old value. As in
> the ring buffer code we had:
>
> void ring_buffer_record_off(struct trace_buffer *buffer)
> {
>         unsigned int rd;
>         unsigned int new_rd;
>
>         do {
>                 rd = atomic_read(&buffer->record_disabled);
>                 new_rd = rd | RB_BUFFER_OFF;
>         } while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);

Hear you actually meant "rd" as the second parameter without the & ?

> }
>
> and the try_cmpxchg() converted it to:
>
> void ring_buffer_record_off(struct trace_buffer *buffer)
> {
>         unsigned int rd;
>         unsigned int new_rd;
>
>         rd = atomic_read(&buffer->record_disabled);
>         do {
>                 new_rd = rd | RB_BUFFER_OFF;
>         } while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd));
> }
>
> Which got rid of the need to constantly update the rd variable (cmpxchg
> will load rax with the value read, so it removes the need for an extra
> move).

So that's a good thing?

>
> But in your case, we don't need to update js, in which case the
> try_cmpxchg() does.

Right, it has lesser value here but I'm curious why you feel it also
doesn't belong in that ring buffer loop you shared (or did you mean,
it does belong there but not in other ftrace code modified by Uros?).

> The patch that Uros sent me for the ring buffer code also does some of
> that, which I feel is wrong.

I am guessing that is code *other than* the ring buffer loop above.

> So with that, I would nack the patch.

Dropping it for RCU is fine with me as well.

(And yes for the other thread, I had it backwards, try_cmpxchg is what
drops the need for cmp instruction -- sorry.)

thanks,

 -Joel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 23:30         ` Joel Fernandes
@ 2023-03-01  0:08           ` Steven Rostedt
  2023-03-01  0:45             ` Joel Fernandes
  2023-03-01 10:28             ` Uros Bizjak
  0 siblings, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2023-03-01  0:08 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Uros Bizjak, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Tue, 28 Feb 2023 18:30:14 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > But looking at this use case, I'd actually NAK it, as it is misleading.  
> 
> I'm trying to parse this. You are saying it is misleading, because it
> updates js when it doesn't need to?

Correct.

> 
> > As try_cmpxchg() is used to get rid of the updating of the old value. As in
> > the ring buffer code we had:
> >
> > void ring_buffer_record_off(struct trace_buffer *buffer)
> > {
> >         unsigned int rd;
> >         unsigned int new_rd;
> >
> >         do {
> >                 rd = atomic_read(&buffer->record_disabled);
> >                 new_rd = rd | RB_BUFFER_OFF;
> >         } while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);  
> 
> Hear you actually meant "rd" as the second parameter without the & ?

Yes, I cut and pasted the updated code and incorrectly try to revert it in
this example :-p

> 
> > }
> >
> > and the try_cmpxchg() converted it to:
> >
> > void ring_buffer_record_off(struct trace_buffer *buffer)
> > {
> >         unsigned int rd;
> >         unsigned int new_rd;
> >
> >         rd = atomic_read(&buffer->record_disabled);
> >         do {
> >                 new_rd = rd | RB_BUFFER_OFF;
> >         } while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd));
> > }
> >
> > Which got rid of the need to constantly update the rd variable (cmpxchg
> > will load rax with the value read, so it removes the need for an extra
> > move).  
> 
> So that's a good thing?

Yes. For looping, try_cmpxchg() is the proper function to use. But in the
RCU case (and other cases in the ring-buffer patch) there is no loop, and
no need to modify the "old" variable.

> 
> >
> > But in your case, we don't need to update js, in which case the
> > try_cmpxchg() does.  
> 
> Right, it has lesser value here but I'm curious why you feel it also
> doesn't belong in that ring buffer loop you shared (or did you mean,
> it does belong there but not in other ftrace code modified by Uros?).

The ring buffer patch had more than one change, where half the updates were
fine, and half were not.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01  0:08           ` Steven Rostedt
@ 2023-03-01  0:45             ` Joel Fernandes
  2023-03-01 10:28             ` Uros Bizjak
  1 sibling, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-03-01  0:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Uros Bizjak, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

Hey Steve,

On Tue, Feb 28, 2023 at 7:08 PM Steven Rostedt <rostedt@goodmis.org> wrote:
[...]
> >
> > >
> > > But in your case, we don't need to update js, in which case the
> > > try_cmpxchg() does.
> >
> > Right, it has lesser value here but I'm curious why you feel it also
> > doesn't belong in that ring buffer loop you shared (or did you mean,
> > it does belong there but not in other ftrace code modified by Uros?).
>
> The ring buffer patch had more than one change, where half the updates were
> fine, and half were not.

Thanks for all the clarifications. Good to know such loop design
patterns can benefit from it!


 - Joel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-02-28 20:39 ` Joel Fernandes
  2023-02-28 20:56   ` Joel Fernandes
  2023-02-28 21:03   ` Steven Rostedt
@ 2023-03-01  9:52   ` Uros Bizjak
  2 siblings, 0 replies; 25+ messages in thread
From: Uros Bizjak @ 2023-03-01  9:52 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 28, 2023 at 9:39 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
> > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > check_cpu_stall.  x86 CMPXCHG instruction returns success in ZF flag, so
> > this change saves a compare after cmpxchg (and related move instruction in
> > front of cmpxchg).
>
> In my codegen, I am not seeing mov instruction before the cmp removed, how
> can that be? The rax has to be populated with a mov before cmpxchg right?
>
> So try_cmpxchg gives: mov, cmpxchg, cmp, jne
> Where as cmpxchg gives: mov, cmpxchg, mov, jne

(I assume the above is reversed)

You have quite a new compiler (e.g. gcc-12+) which is able to reorder
basic blocks, reschedule instructions and create fast paths through
the code based on likely/unlikely annotations in the definition of
try_cmpxchg. [On a related note, some code growth is allowed to the
compiler in the hot path since the kernel is compiled with -O2, not
-Os.] gcc-10.3.1 improves the code from tree.c from:

    a1c5:    0f 84 53 03 00 00        je     a51e <rcu_sched_clock_irq+0x70e>
    a1cb:    48 89 c8                 mov    %rcx,%rax
    a1ce:    f0 48 0f b1 35 00 00     lock cmpxchg %rsi,0x0(%rip)
  # a1d7 <rcu_sched_clock_irq+0x3c7>
    a1d5:    00 00
            a1d3: R_X86_64_PC32    .data+0xf9c
    a1d7:    48 39 c1                 cmp    %rax,%rcx
    a1da:    0f 85 3e 03 00 00        jne    a51e <rcu_sched_clock_irq+0x70e>

to:

    a1d0:    0f 84 49 03 00 00        je     a51f <rcu_sched_clock_irq+0x70f>
    a1d6:    f0 48 0f b1 35 00 00     lock cmpxchg %rsi,0x0(%rip)
  # a1df <rcu_sched_clock_irq+0x3cf>
    a1dd:    00 00
            a1db: R_X86_64_PC32    .data+0xf9c
    a1df:    0f 85 3a 03 00 00        jne    a51f <rcu_sched_clock_irq+0x70f>

The change brings the following code size improvement:

  text    data     bss     dec     hex filename
 57712    9945      86   67743   1089f tree-new.o
 57760    9945      86   67791   108cf tree-old.o

Small change, but the result of effectively an almost mechanical
one-line substitutions.

Uros.

> So yeah you got rid of compare, but I am not seeing reduction in moves.
> Either way, I think it is an improvement due to dropping cmp so:
>
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> thanks,
>
>  - Joel
>
>
> >
> > No functional change intended.
> >
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> > ---
> >  kernel/rcu/tree_stall.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index b10b8349bb2a..d81c88e66b42 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -760,7 +760,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
> >       jn = jiffies + ULONG_MAX / 2;
> >       if (rcu_gp_in_progress() &&
> >           (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> > -         cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> > +         try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
> >
> >               /*
> >                * If a virtual machine is stopped by the host it can look to
> > @@ -778,7 +778,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
> >
> >       } else if (rcu_gp_in_progress() &&
> >                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> > -                cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> > +                try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
> >
> >               /*
> >                * If a virtual machine is stopped by the host it can look to
> > --
> > 2.39.2
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01  0:08           ` Steven Rostedt
  2023-03-01  0:45             ` Joel Fernandes
@ 2023-03-01 10:28             ` Uros Bizjak
  2023-03-01 16:38               ` Steven Rostedt
  1 sibling, 1 reply; 25+ messages in thread
From: Uros Bizjak @ 2023-03-01 10:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, Paul E. McKenney, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, Mar 1, 2023 at 1:08 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 28 Feb 2023 18:30:14 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > But looking at this use case, I'd actually NAK it, as it is misleading.
> >
> > I'm trying to parse this. You are saying it is misleading, because it
> > updates js when it doesn't need to?
>
> Correct.

I'm a bit late to the discussion (well, I have to sleep from time to
time, too), but in the hope that everybody interested in this issue
will find the reply, I'll try to clarify the "updates" claim:

The try_cmpxchg is written in such a way that benefits loops as well
as linear code, in the latter case it depends on the compiler to
eliminate the dead assignment.

When changing linear code from cmpxchg to try_cmpxchg, one has to take
care that the variable, passed by reference, is unused after cmpxchg,
so it can be considered as a temporary variable (as said elsewhere,
the alternative is to copy the value to a local temporary variable and
pass the pointer to this variable to try_cmpxchg - the compiler will
eliminate the assignment if the original variable is unused).

Even in linear code, the conversion from cmpxchg to try_cmpxchg is
able to eliminate assignment and compare, as can be seen when the code
is compiled with gcc-10.3.1:

    a1c5:    0f 84 53 03 00 00        je     a51e <rcu_sched_clock_irq+0x70e>
    a1cb:    48 89 c8                 mov    %rcx,%rax
    a1ce:    f0 48 0f b1 35 00 00     lock cmpxchg %rsi,0x0(%rip)
  # a1d7 <rcu_sched_clock_irq+0x3c7>
    a1d5:    00 00
            a1d3: R_X86_64_PC32    .data+0xf9c
    a1d7:    48 39 c1                 cmp    %rax,%rcx
    a1da:    0f 85 3e 03 00 00        jne    a51e <rcu_sched_clock_irq+0x70e>

to:

    a1d0:    0f 84 49 03 00 00        je     a51f <rcu_sched_clock_irq+0x70f>
    a1d6:    f0 48 0f b1 35 00 00     lock cmpxchg %rsi,0x0(%rip)
  # a1df <rcu_sched_clock_irq+0x3cf>
    a1dd:    00 00
            a1db: R_X86_64_PC32    .data+0xf9c
    a1df:    0f 85 3a 03 00 00        jne    a51f <rcu_sched_clock_irq+0x70f>

Newer compilers (e.g. gcc-12+) are able to use likely/unlikely
annotations to reorder the code, so the change is less visible. But
due to reordering, even targets that don't define try_cmpxchg natively
benefit from the change, please see thread at [1].

These benefits are the reason the change to try_cmpxchg was accepted
also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
name a few commits, with a thumbs-up and a claim that the new code is
actually *clearer* at the merge commit [4].

I really think that the above demonstrates various improvements, and
would be unfortunate not to consider them.

[1] https://lore.kernel.org/lkml/871qwgmqws.fsf@mpe.ellerman.id.au/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e1da8fe031303599e78f88e0dad9f44272e4f99
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8baceabca656d5ef4494cdeb3b9b9fbb844ac613
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=91bc559d8d3aed488b4b50e9eba1d7ebb1da7bbf

Uros.

> >
> > > As try_cmpxchg() is used to get rid of the updating of the old value. As in
> > > the ring buffer code we had:
> > >
> > > void ring_buffer_record_off(struct trace_buffer *buffer)
> > > {
> > >         unsigned int rd;
> > >         unsigned int new_rd;
> > >
> > >         do {
> > >                 rd = atomic_read(&buffer->record_disabled);
> > >                 new_rd = rd | RB_BUFFER_OFF;
> > >         } while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);
> >
> > Hear you actually meant "rd" as the second parameter without the & ?
>
> Yes, I cut and pasted the updated code and incorrectly try to revert it in
> this example :-p
>
> >
> > > }
> > >
> > > and the try_cmpxchg() converted it to:
> > >
> > > void ring_buffer_record_off(struct trace_buffer *buffer)
> > > {
> > >         unsigned int rd;
> > >         unsigned int new_rd;
> > >
> > >         rd = atomic_read(&buffer->record_disabled);
> > >         do {
> > >                 new_rd = rd | RB_BUFFER_OFF;
> > >         } while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd));
> > > }
> > >
> > > Which got rid of the need to constantly update the rd variable (cmpxchg
> > > will load rax with the value read, so it removes the need for an extra
> > > move).
> >
> > So that's a good thing?
>
> Yes. For looping, try_cmpxchg() is the proper function to use. But in the
> RCU case (and other cases in the ring-buffer patch) there is no loop, and
> no need to modify the "old" variable.
>
> >
> > >
> > > But in your case, we don't need to update js, in which case the
> > > try_cmpxchg() does.
> >
> > Right, it has lesser value here but I'm curious why you feel it also
> > doesn't belong in that ring buffer loop you shared (or did you mean,
> > it does belong there but not in other ftrace code modified by Uros?).
>
> The ring buffer patch had more than one change, where half the updates were
> fine, and half were not.
>
> -- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 10:28             ` Uros Bizjak
@ 2023-03-01 16:38               ` Steven Rostedt
  2023-03-01 18:43                 ` Uros Bizjak
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-03-01 16:38 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Joel Fernandes, Paul E. McKenney, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, 1 Mar 2023 11:28:47 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> These benefits are the reason the change to try_cmpxchg was accepted
> also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
> name a few commits, with a thumbs-up and a claim that the new code is
> actually *clearer* at the merge commit [4].

I'll say it here too. I really like Joel's suggestion of having a
cmpxchg_success() that does not have the added side effect of modifying the
old variable.

I think that would allow for the arch optimizations that you are trying to
achieve, as well as remove the side effect that might cause issues down the
road.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 16:38               ` Steven Rostedt
@ 2023-03-01 18:43                 ` Uros Bizjak
  2023-03-01 18:52                   ` Steven Rostedt
  2023-03-01 20:08                   ` Paul E. McKenney
  0 siblings, 2 replies; 25+ messages in thread
From: Uros Bizjak @ 2023-03-01 18:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, Paul E. McKenney, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 827 bytes --]

On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 1 Mar 2023 11:28:47 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > These benefits are the reason the change to try_cmpxchg was accepted
> > also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
> > name a few commits, with a thumbs-up and a claim that the new code is
> > actually *clearer* at the merge commit [4].
>
> I'll say it here too. I really like Joel's suggestion of having a
> cmpxchg_success() that does not have the added side effect of modifying the
> old variable.
>
> I think that would allow for the arch optimizations that you are trying to
> achieve, as well as remove the side effect that might cause issues down the
> road.

Attached patch implements this suggestion.

Uros.

[-- Attachment #2: rcu-2.diff.txt --]
[-- Type: text/plain, Size: 1209 bytes --]

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a..229263ebba3b 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -709,6 +709,12 @@ static void print_cpu_stall(unsigned long gps)
 	set_preempt_need_resched();
 }
 
+#define cmpxchg_success(ptr, old, new)				\
+({								\
+	__typeof__(*(ptr)) __tmp = (old);			\
+	try_cmpxchg((ptr), &__tmp, (new));			\
+})
+
 static void check_cpu_stall(struct rcu_data *rdp)
 {
 	bool didstall = false;
@@ -760,7 +766,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	jn = jiffies + ULONG_MAX / 2;
 	if (rcu_gp_in_progress() &&
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
-	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
 
 		/*
 		 * If a virtual machine is stopped by the host it can look to
@@ -778,7 +784,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 
 	} else if (rcu_gp_in_progress() &&
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
-		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+		   cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
 
 		/*
 		 * If a virtual machine is stopped by the host it can look to

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 18:43                 ` Uros Bizjak
@ 2023-03-01 18:52                   ` Steven Rostedt
  2023-03-01 19:18                     ` Uros Bizjak
  2023-03-01 20:08                   ` Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-03-01 18:52 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Joel Fernandes, Paul E. McKenney, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, 1 Mar 2023 19:43:34 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 1 Mar 2023 11:28:47 +0100
> > Uros Bizjak <ubizjak@gmail.com> wrote:
> >  
> > > These benefits are the reason the change to try_cmpxchg was accepted
> > > also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
> > > name a few commits, with a thumbs-up and a claim that the new code is
> > > actually *clearer* at the merge commit [4].  
> >
> > I'll say it here too. I really like Joel's suggestion of having a
> > cmpxchg_success() that does not have the added side effect of modifying the
> > old variable.
> >
> > I think that would allow for the arch optimizations that you are trying to
> > achieve, as well as remove the side effect that might cause issues down the
> > road.  
> 
> Attached patch implements this suggestion.

I like it!

Anyway to make this more generic?

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 18:52                   ` Steven Rostedt
@ 2023-03-01 19:18                     ` Uros Bizjak
  2023-03-01 19:43                       ` Steven Rostedt
  2023-03-01 19:45                       ` Joel Fernandes
  0 siblings, 2 replies; 25+ messages in thread
From: Uros Bizjak @ 2023-03-01 19:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, Paul E. McKenney, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, Mar 1, 2023 at 7:52 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 1 Mar 2023 19:43:34 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Wed, 1 Mar 2023 11:28:47 +0100
> > > Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > These benefits are the reason the change to try_cmpxchg was accepted
> > > > also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
> > > > name a few commits, with a thumbs-up and a claim that the new code is
> > > > actually *clearer* at the merge commit [4].
> > >
> > > I'll say it here too. I really like Joel's suggestion of having a
> > > cmpxchg_success() that does not have the added side effect of modifying the
> > > old variable.
> > >
> > > I think that would allow for the arch optimizations that you are trying to
> > > achieve, as well as remove the side effect that might cause issues down the
> > > road.
> >
> > Attached patch implements this suggestion.
>
> I like it!

Thanks!

> Anyway to make this more generic?

If we want to put the definition in generic headers, then we also need
to define acquire/release/relaxed and 64bit variants. ATM, we have two
sites that require this definition and I think that for now we could
live with two instances of the same definition in two separate
subsystems. But this would definitely be a good future addition. There
is some code in the form of

if (cmpxchg (ptr, 0, 1) == 0)

that can not be converted to try_cmpxchg, but can use cmpxchg_success.

Uros.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 19:18                     ` Uros Bizjak
@ 2023-03-01 19:43                       ` Steven Rostedt
  2023-03-01 19:46                         ` Joel Fernandes
  2023-03-01 19:45                       ` Joel Fernandes
  1 sibling, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-03-01 19:43 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Joel Fernandes, Paul E. McKenney, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, 1 Mar 2023 20:18:11 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> If we want to put the definition in generic headers, then we also need
> to define acquire/release/relaxed and 64bit variants. ATM, we have two
> sites that require this definition and I think that for now we could
> live with two instances of the same definition in two separate
> subsystems. But this would definitely be a good future addition. There
> is some code in the form of
> 
> if (cmpxchg (ptr, 0, 1) == 0)
> 
> that can not be converted to try_cmpxchg, but can use cmpxchg_success.

And even modify code that uses temp variables. For example, where you
modified the ring buffer code to use try_cmpxchg(), I could convert your:

static int rb_head_page_replace(struct buffer_page *old,
				struct buffer_page *new)
{
	unsigned long *ptr = (unsigned long *)&old->list.prev->next;
	unsigned long val;

	val = *ptr & ~RB_FLAG_MASK;
	val |= RB_PAGE_HEAD;

	return try_cmpxchg(ptr, &val, (unsigned long)&new->list);
}

Into just:

static int rb_head_page_replace(struct buffer_page *old,
				struct buffer_page *new)
{
	unsigned long *ptr = (unsigned long *)&old->list.prev->next;
	unsigned long val;

	val = *ptr & ~RB_FLAG_MASK;

	return cmpxchg_success(ptr, val | RB_PAGE_HEAD, (unsigned long)&new->list);
}

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 19:18                     ` Uros Bizjak
  2023-03-01 19:43                       ` Steven Rostedt
@ 2023-03-01 19:45                       ` Joel Fernandes
  1 sibling, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-03-01 19:45 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Steven Rostedt, Paul E. McKenney, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan



> On Mar 1, 2023, at 2:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Wed, Mar 1, 2023 at 7:52 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>> 
>>> On Wed, 1 Mar 2023 19:43:34 +0100
>>> Uros Bizjak <ubizjak@gmail.com> wrote:
>>> 
>>> On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>> 
>>>> On Wed, 1 Mar 2023 11:28:47 +0100
>>>> Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> 
>>>>> These benefits are the reason the change to try_cmpxchg was accepted
>>>>> also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
>>>>> name a few commits, with a thumbs-up and a claim that the new code is
>>>>> actually *clearer* at the merge commit [4].
>>>> 
>>>> I'll say it here too. I really like Joel's suggestion of having a
>>>> cmpxchg_success() that does not have the added side effect of modifying the
>>>> old variable.
>>>> 
>>>> I think that would allow for the arch optimizations that you are trying to
>>>> achieve, as well as remove the side effect that might cause issues down the
>>>> road.
>>> 
>>> Attached patch implements this suggestion.
>> 
>> I like it!

Me too :)

> Thanks!
> 
>> Anyway to make this more generic?
> 
> If we want to put the definition in generic headers, then we also need
> to define acquire/release/relaxed and 64bit variants. ATM, we have two
> sites that require this definition and I think that for now we could
> live with two instances of the same definition in two separate
> subsystems. But this would definitely be a good future addition. There
> is some code in the form of
> 
> if (cmpxchg (ptr, 0, 1) == 0)
> 
> that can not be converted to try_cmpxchg, but can use cmpxchg_success.

I would prefer if we can put it in generic headers instead of duplicating
across ftrace and RCU.

thanks,

 - Joel

> 
> Uros.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 19:43                       ` Steven Rostedt
@ 2023-03-01 19:46                         ` Joel Fernandes
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-03-01 19:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Uros Bizjak, Paul E. McKenney, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan



> On Mar 1, 2023, at 2:43 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Wed, 1 Mar 2023 20:18:11 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
> 
>> If we want to put the definition in generic headers, then we also need
>> to define acquire/release/relaxed and 64bit variants. ATM, we have two
>> sites that require this definition and I think that for now we could
>> live with two instances of the same definition in two separate
>> subsystems. But this would definitely be a good future addition. There
>> is some code in the form of
>> 
>> if (cmpxchg (ptr, 0, 1) == 0)
>> 
>> that can not be converted to try_cmpxchg, but can use cmpxchg_success.
> 
> And even modify code that uses temp variables. For example, where you
> modified the ring buffer code to use try_cmpxchg(), I could convert your:
> 
> static int rb_head_page_replace(struct buffer_page *old,
>                struct buffer_page *new)
> {
>    unsigned long *ptr = (unsigned long *)&old->list.prev->next;
>    unsigned long val;
> 
>    val = *ptr & ~RB_FLAG_MASK;
>    val |= RB_PAGE_HEAD;
> 
>    return try_cmpxchg(ptr, &val, (unsigned long)&new->list);
> }
> 
> Into just:
> 
> static int rb_head_page_replace(struct buffer_page *old,
>                struct buffer_page *new)
> {
>    unsigned long *ptr = (unsigned long *)&old->list.prev->next;
>    unsigned long val;
> 
>    val = *ptr & ~RB_FLAG_MASK;
> 
>    return cmpxchg_success(ptr, val | RB_PAGE_HEAD, (unsigned long)&new->list);
> }

Nice! Looks much better.

Thanks,

 - Joel

> 
> -- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 18:43                 ` Uros Bizjak
  2023-03-01 18:52                   ` Steven Rostedt
@ 2023-03-01 20:08                   ` Paul E. McKenney
  2023-03-01 20:18                     ` Joel Fernandes
  2023-03-01 20:18                     ` Steven Rostedt
  1 sibling, 2 replies; 25+ messages in thread
From: Paul E. McKenney @ 2023-03-01 20:08 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Steven Rostedt, Joel Fernandes, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, Mar 01, 2023 at 07:43:34PM +0100, Uros Bizjak wrote:
> On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 1 Mar 2023 11:28:47 +0100
> > Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > > These benefits are the reason the change to try_cmpxchg was accepted
> > > also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
> > > name a few commits, with a thumbs-up and a claim that the new code is
> > > actually *clearer* at the merge commit [4].
> >
> > I'll say it here too. I really like Joel's suggestion of having a
> > cmpxchg_success() that does not have the added side effect of modifying the
> > old variable.
> >
> > I think that would allow for the arch optimizations that you are trying to
> > achieve, as well as remove the side effect that might cause issues down the
> > road.
> 
> Attached patch implements this suggestion.

Please help me out here.

Why on earth are we even discussing making this change to code that
normally never executes?  Performance is not a consideration here.

What am I missing here?  Is there some sort of forward-progress
issue that this change addresses?

							Thanx, Paul

> Uros.

> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b10b8349bb2a..229263ebba3b 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -709,6 +709,12 @@ static void print_cpu_stall(unsigned long gps)
>  	set_preempt_need_resched();
>  }
>  
> +#define cmpxchg_success(ptr, old, new)				\
> +({								\
> +	__typeof__(*(ptr)) __tmp = (old);			\
> +	try_cmpxchg((ptr), &__tmp, (new));			\
> +})
> +
>  static void check_cpu_stall(struct rcu_data *rdp)
>  {
>  	bool didstall = false;
> @@ -760,7 +766,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  	jn = jiffies + ULONG_MAX / 2;
>  	if (rcu_gp_in_progress() &&
>  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
>  
>  		/*
>  		 * If a virtual machine is stopped by the host it can look to
> @@ -778,7 +784,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  
>  	} else if (rcu_gp_in_progress() &&
>  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> -		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +		   cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
>  
>  		/*
>  		 * If a virtual machine is stopped by the host it can look to


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 20:08                   ` Paul E. McKenney
@ 2023-03-01 20:18                     ` Joel Fernandes
  2023-03-01 20:18                     ` Steven Rostedt
  1 sibling, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-03-01 20:18 UTC (permalink / raw)
  To: paulmck
  Cc: Uros Bizjak, Steven Rostedt, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan



> On Mar 1, 2023, at 3:08 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Wed, Mar 01, 2023 at 07:43:34PM +0100, Uros Bizjak wrote:
>>> On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>>> 
>>> On Wed, 1 Mar 2023 11:28:47 +0100
>>> Uros Bizjak <ubizjak@gmail.com> wrote:
>>> 
>>>> These benefits are the reason the change to try_cmpxchg was accepted
>>>> also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
>>>> name a few commits, with a thumbs-up and a claim that the new code is
>>>> actually *clearer* at the merge commit [4].
>>> 
>>> I'll say it here too. I really like Joel's suggestion of having a
>>> cmpxchg_success() that does not have the added side effect of modifying the
>>> old variable.
>>> 
>>> I think that would allow for the arch optimizations that you are trying to
>>> achieve, as well as remove the side effect that might cause issues down the
>>> road.
>> 
>> Attached patch implements this suggestion.
> 
> Please help me out here.
> 
> Why on earth are we even discussing making this change to code that
> normally never executes?  Performance is not a consideration here.
> 
> What am I missing here?  Is there some sort of forward-progress
> issue that this change addresses?

I do not think it is anything with performance. The suggestion just makes
the code easier to read. In the case of ftrace (not RCU), it results in further
deleted lines of code.

Maybe it got confusing because we are discussing the change as it
applies to both ftrace and RCU.

You could argue that it has to do with performance in the fast path, but it
is probably down in the noise.

 - Joel 


> 
>                            Thanx, Paul
> 
>> Uros.
> 
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> index b10b8349bb2a..229263ebba3b 100644
>> --- a/kernel/rcu/tree_stall.h
>> +++ b/kernel/rcu/tree_stall.h
>> @@ -709,6 +709,12 @@ static void print_cpu_stall(unsigned long gps)
>>    set_preempt_need_resched();
>> }
>> 
>> +#define cmpxchg_success(ptr, old, new)                \
>> +({                                \
>> +    __typeof__(*(ptr)) __tmp = (old);            \
>> +    try_cmpxchg((ptr), &__tmp, (new));            \
>> +})
>> +
>> static void check_cpu_stall(struct rcu_data *rdp)
>> {
>>    bool didstall = false;
>> @@ -760,7 +766,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>>    jn = jiffies + ULONG_MAX / 2;
>>    if (rcu_gp_in_progress() &&
>>        (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>> -        cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>> +        cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
>> 
>>        /*
>>         * If a virtual machine is stopped by the host it can look to
>> @@ -778,7 +784,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>> 
>>    } else if (rcu_gp_in_progress() &&
>>           ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>> -           cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>> +           cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
>> 
>>        /*
>>         * If a virtual machine is stopped by the host it can look to
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 20:08                   ` Paul E. McKenney
  2023-03-01 20:18                     ` Joel Fernandes
@ 2023-03-01 20:18                     ` Steven Rostedt
  2023-03-01 20:36                       ` Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-03-01 20:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uros Bizjak, Joel Fernandes, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, 1 Mar 2023 12:08:20 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > Attached patch implements this suggestion.  
> 
> Please help me out here.
> 
> Why on earth are we even discussing making this change to code that
> normally never executes?  Performance is not a consideration here.
> 
> What am I missing here?  Is there some sort of forward-progress
> issue that this change addresses?

Well, we sorta hijacked this thread. It turned into a more general
discussion, as there is code that this change will be useful for
(ring_buffer.c), but we just happen to be having the discussion here.

Where it will at most remove some text and give you back a few extra bytes
of memory ;-)

But if we do use cmpxchg_success() IMHO, it does improve readability.

> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 20:18                     ` Steven Rostedt
@ 2023-03-01 20:36                       ` Paul E. McKenney
  2023-03-01 20:46                         ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2023-03-01 20:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Uros Bizjak, Joel Fernandes, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, Mar 01, 2023 at 03:18:26PM -0500, Steven Rostedt wrote:
> On Wed, 1 Mar 2023 12:08:20 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > Attached patch implements this suggestion.  
> > 
> > Please help me out here.
> > 
> > Why on earth are we even discussing making this change to code that
> > normally never executes?  Performance is not a consideration here.
> > 
> > What am I missing here?  Is there some sort of forward-progress
> > issue that this change addresses?
> 
> Well, we sorta hijacked this thread. It turned into a more general
> discussion, as there is code that this change will be useful for
> (ring_buffer.c), but we just happen to be having the discussion here.
> 
> Where it will at most remove some text and give you back a few extra bytes
> of memory ;-)
> 
> But if we do use cmpxchg_success() IMHO, it does improve readability.
> 
> > -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> > +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {

Some years down the road, should cmpxchg_success() be on the tip of
the tongue of every kernel hacker, perhaps.  Or perhaps not.

In the meantime, we have yet another abysmally documented atomic
operation that is not well known throughout the community.  And then the
people coming across this curse everyone who had anything to do with it,
as they search the source code, dig through assembly output, and so on
trying to work out exactly what this thing does.

Sorry, but no way.

Again, unless there is some sort of forward-progress argument or
similar convincing argument.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 20:36                       ` Paul E. McKenney
@ 2023-03-01 20:46                         ` Steven Rostedt
  2023-03-01 21:29                           ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-03-01 20:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uros Bizjak, Joel Fernandes, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, 1 Mar 2023 12:36:45 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> Some years down the road, should cmpxchg_success() be on the tip of
> the tongue of every kernel hacker, perhaps.  Or perhaps not.

A bit of a catch-22 I would say. It will only become something everyone
knows if it exists.

> 
> In the meantime, we have yet another abysmally documented atomic

Is it?

> operation that is not well known throughout the community.  And then the
> people coming across this curse everyone who had anything to do with it,
> as they search the source code, dig through assembly output, and so on
> trying to work out exactly what this thing does.
> 
> Sorry, but no way.
> 
> Again, unless there is some sort of forward-progress argument or
> similar convincing argument.

Speaking of forward progress...

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/atomic_t.txt#n316

Anyway, I'm guessing this will not become part of rcu any time soon. But
for the ring buffer, I would happily take it.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rcu: use try_cmpxchg in check_cpu_stall
  2023-03-01 20:46                         ` Steven Rostedt
@ 2023-03-01 21:29                           ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2023-03-01 21:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Uros Bizjak, Joel Fernandes, rcu, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, Mar 01, 2023 at 03:46:41PM -0500, Steven Rostedt wrote:
> On Wed, 1 Mar 2023 12:36:45 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > Some years down the road, should cmpxchg_success() be on the tip of
> > the tongue of every kernel hacker, perhaps.  Or perhaps not.
> 
> A bit of a catch-22 I would say. It will only become something everyone
> knows if it exists.
> 
> > In the meantime, we have yet another abysmally documented atomic
> 
> Is it?

Is sure is.

> > operation that is not well known throughout the community.  And then the
> > people coming across this curse everyone who had anything to do with it,
> > as they search the source code, dig through assembly output, and so on
> > trying to work out exactly what this thing does.
> > 
> > Sorry, but no way.
> > 
> > Again, unless there is some sort of forward-progress argument or
> > similar convincing argument.
> 
> Speaking of forward progress...
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/atomic_t.txt#n316

Yes, that is probably the best starting point.

And if you have been around long enough, you know that this is in fact
the best starting point.  Plus if you can correctly interpret the words
in that document.  And if you are familiar with the entire document.
But otherwise, good luck learning the semantics for something like
atomic_fetch_add_release().

> Anyway, I'm guessing this will not become part of rcu any time soon. But
> for the ring buffer, I would happily take it.

Certainly not unless someone comes up with a good reason for it.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-03-01 21:29 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-28 15:51 [PATCH] rcu: use try_cmpxchg in check_cpu_stall Uros Bizjak
2023-02-28 20:39 ` Joel Fernandes
2023-02-28 20:56   ` Joel Fernandes
2023-02-28 21:03   ` Steven Rostedt
2023-02-28 21:29     ` Paul E. McKenney
2023-02-28 21:41       ` Steven Rostedt
2023-02-28 21:56         ` Paul E. McKenney
2023-02-28 23:30         ` Joel Fernandes
2023-03-01  0:08           ` Steven Rostedt
2023-03-01  0:45             ` Joel Fernandes
2023-03-01 10:28             ` Uros Bizjak
2023-03-01 16:38               ` Steven Rostedt
2023-03-01 18:43                 ` Uros Bizjak
2023-03-01 18:52                   ` Steven Rostedt
2023-03-01 19:18                     ` Uros Bizjak
2023-03-01 19:43                       ` Steven Rostedt
2023-03-01 19:46                         ` Joel Fernandes
2023-03-01 19:45                       ` Joel Fernandes
2023-03-01 20:08                   ` Paul E. McKenney
2023-03-01 20:18                     ` Joel Fernandes
2023-03-01 20:18                     ` Steven Rostedt
2023-03-01 20:36                       ` Paul E. McKenney
2023-03-01 20:46                         ` Steven Rostedt
2023-03-01 21:29                           ` Paul E. McKenney
2023-03-01  9:52   ` Uros Bizjak

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.