* [PATCH V2] locking/qspinlock: Optimize pending state waiting for unlock
@ 2023-01-05  2:19 guoren
  2023-01-05  3:34 ` Waiman Long
  2023-01-05 10:36 ` [tip: locking/core] locking/qspinlock: Micro-optimize " tip-bot2 for Guo Ren
  0 siblings, 2 replies; 3+ messages in thread
From: guoren @ 2023-01-05  2:19 UTC (permalink / raw)
  To: peterz, longman, mingo
  Cc: linux-kernel, guoren, Guo Ren, Boqun Feng, Will Deacon

From: Guo Ren <guoren@linux.alibaba.com>

When we're pending, we only care about the locked value; the
xchg_tail() done by other CPUs queueing behind us does not affect
the pending state. That means the hardware thread can stay in a
sleep state and leave the rest of the pipeline's execution-unit
resources to the other hardware threads. This is about SMT
scenarios within the same core, not about entering a low-power
state. Of course, the granularity between cores is a cacheline,
but the granularity between SMT hw-threads of the same core can be
a byte, which the LSU handles internally. For example, when a
hw-thread yields the core's resources to other hw-threads, this
patch helps it stay in the sleep state and prevents it from being
woken up by another hw-thread's xchg_tail().
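
A rough userspace sketch of the idea follows (demo_qspinlock and the
two wait helpers are made-up names for illustration only; the real
layout lives in include/asm-generic/qspinlock_types.h and depends on
NR_CPUS and endianness):

#include <stdatomic.h>
#include <stdint.h>

/* Little-endian model of the lock word: locked byte, pending byte, tail. */
struct demo_qspinlock {
        union {
                _Atomic uint32_t val;             /* whole 32-bit lock word */
                struct {
                        _Atomic uint8_t  locked;  /* bits  0-7           */
                        _Atomic uint8_t  pending; /* byte 1 (bit 8 used) */
                        _Atomic uint16_t tail;    /* bits 16-31          */
                };
        };
};

/* Before: re-read the whole word, so every tail update changes the value. */
static void wait_on_whole_word(struct demo_qspinlock *lock)
{
        while (atomic_load_explicit(&lock->val, memory_order_acquire) & 0xff)
                ;
}

/* After: watch only the locked byte; a tail swap never stores to it. */
static void wait_on_locked_byte(struct demo_qspinlock *lock)
{
        while (atomic_load_explicit(&lock->locked, memory_order_acquire))
                ;
}

Spinning on the locked byte means the value under test simply cannot
change when another hw-thread swaps the tail.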

Link: https://lore.kernel.org/lkml/20221224120545.262989-1-guoren@kernel.org/
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
---
Changes in v2:
 - Add acked tag
 - Optimize commit log
 - Add discussion Link tag
---
 kernel/locking/qspinlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2b23378775fe..ebe6b8ec7cb3 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -371,7 +371,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	/*
 	 * We're pending, wait for the owner to go away.
 	 *
-	 * 0,1,1 -> 0,1,0
+	 * 0,1,1 -> *,1,0
 	 *
 	 * this wait loop must be a load-acquire such that we match the
 	 * store-release that clears the locked bit and create lock
@@ -380,7 +380,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * barriers.
 	 */
 	if (val & _Q_LOCKED_MASK)
-		atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_MASK));
+		smp_cond_load_acquire(&lock->locked, !VAL);
 
 	/*
 	 * take ownership and clear the pending bit.
-- 
2.36.1
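
A side note on the primitive swap: atomic_cond_read_acquire() keeps
re-reading the whole 32-bit lock word, so every xchg_tail() from a CPU
queueing behind us changes the value under test, while
smp_cond_load_acquire(&lock->locked, !VAL) watches only the locked
byte. The generic fallback is roughly the following (a simplified
sketch of include/asm-generic/barrier.h, not the exact kernel text;
arm64, for instance, overrides it with an LDXR/WFE based variant):

#define smp_cond_load_relaxed(ptr, cond_expr) ({                \
        typeof(ptr) __PTR = (ptr);                              \
        typeof(*ptr) VAL;                                       \
        for (;;) {                                              \
                VAL = READ_ONCE(*__PTR); /* loads only *ptr */  \
                if (cond_expr)                                  \
                        break;                                  \
                cpu_relax();                                    \
        }                                                       \
        VAL;                                                    \
})

#define smp_cond_load_acquire(ptr, cond_expr) ({                \
        typeof(*ptr) _val;                                      \
        _val = smp_cond_load_relaxed(ptr, cond_expr);           \
        smp_acquire__after_ctrl_dep(); /* upgrade to ACQUIRE */ \
        _val;                                                   \
})

With &lock->locked as the pointer, READ_ONCE() loads a single byte, so
!VAL can only become true when the owner clears the locked byte, and
the acquire ordering still pairs with the store-release in the unlock
path. That is also why the state comment becomes 0,1,1 -> *,1,0: the
tail may already have been set by other CPUs by the time we proceed.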



* Re: [PATCH V2] locking/qspinlock: Optimize pending state waiting for unlock
  2023-01-05  2:19 [PATCH V2] locking/qspinlock: Optimize pending state waiting for unlock guoren
@ 2023-01-05  3:34 ` Waiman Long
  2023-01-05 10:36 ` [tip: locking/core] locking/qspinlock: Micro-optimize " tip-bot2 for Guo Ren
  1 sibling, 0 replies; 3+ messages in thread
From: Waiman Long @ 2023-01-05  3:34 UTC (permalink / raw)
  To: guoren, peterz, mingo; +Cc: linux-kernel, Guo Ren, Boqun Feng, Will Deacon

On 1/4/23 21:19, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> When we're pending, we only care about the locked value; the
> xchg_tail() done by other CPUs queueing behind us does not affect
> the pending state. That means the hardware thread can stay in a
> sleep state and leave the rest of the pipeline's execution-unit
> resources to the other hardware threads. This is about SMT
> scenarios within the same core, not about entering a low-power
> state. Of course, the granularity between cores is a cacheline,
> but the granularity between SMT hw-threads of the same core can be
> a byte, which the LSU handles internally. For example, when a
> hw-thread yields the core's resources to other hw-threads, this
> patch helps it stay in the sleep state and prevents it from being
> woken up by another hw-thread's xchg_tail().
>
> Link: https://lore.kernel.org/lkml/20221224120545.262989-1-guoren@kernel.org/
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Acked-by: Waiman Long <longman@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> ---
> Changes in v2:
>   - Add acked tag
>   - Optimize commit log
>   - Add discussion Link tag
> ---
>   kernel/locking/qspinlock.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index 2b23378775fe..ebe6b8ec7cb3 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -371,7 +371,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>   	/*
>   	 * We're pending, wait for the owner to go away.
>   	 *
> -	 * 0,1,1 -> 0,1,0
> +	 * 0,1,1 -> *,1,0
>   	 *
>   	 * this wait loop must be a load-acquire such that we match the
>   	 * store-release that clears the locked bit and create lock
> @@ -380,7 +380,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>   	 * barriers.
>   	 */
>   	if (val & _Q_LOCKED_MASK)
> -		atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_MASK));
> +		smp_cond_load_acquire(&lock->locked, !VAL);
>   
>   	/*
>   	 * take ownership and clear the pending bit.

Yes, the new patch description looks good to me. Thanks for sending the v2.

Cheers,
Longman



* [tip: locking/core] locking/qspinlock: Micro-optimize pending state waiting for unlock
  2023-01-05  2:19 [PATCH V2] locking/qspinlock: Optimize pending state waiting for unlock guoren
  2023-01-05  3:34 ` Waiman Long
@ 2023-01-05 10:36 ` tip-bot2 for Guo Ren
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Guo Ren @ 2023-01-05 10:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Guo Ren, Guo Ren, Ingo Molnar, Waiman Long, Peter Zijlstra, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     4282494a20cdcaf38d553f2c2ff6f252084f979c
Gitweb:        https://git.kernel.org/tip/4282494a20cdcaf38d553f2c2ff6f252084f979c
Author:        Guo Ren <guoren@linux.alibaba.com>
AuthorDate:    Wed, 04 Jan 2023 21:19:52 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 05 Jan 2023 11:01:50 +01:00

locking/qspinlock: Micro-optimize pending state waiting for unlock

When we're pending, we only care about the locked value; the
xchg_tail() done by other CPUs queueing behind us does not affect
the pending state. That means the hardware thread can stay in a
sleep state and leave the rest of the pipeline's execution-unit
resources to the other hardware threads. This is about SMT
scenarios within the same core, not about entering a low-power
state. Of course, the granularity between cores is a cacheline,
but the granularity between SMT hw-threads of the same core can be
a byte, which the LSU handles internally. For example, when a
hw-thread yields the core's resources to other hw-threads, this
patch helps it stay in the sleep state and prevents it from being
woken up by another hw-thread's xchg_tail().

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20230105021952.3090070-1-guoren@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/locking/qspinlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2b23378..ebe6b8e 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -371,7 +371,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	/*
 	 * We're pending, wait for the owner to go away.
 	 *
-	 * 0,1,1 -> 0,1,0
+	 * 0,1,1 -> *,1,0
 	 *
 	 * this wait loop must be a load-acquire such that we match the
 	 * store-release that clears the locked bit and create lock
@@ -380,7 +380,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * barriers.
 	 */
 	if (val & _Q_LOCKED_MASK)
-		atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_MASK));
+		smp_cond_load_acquire(&lock->locked, !VAL);
 
 	/*
 	 * take ownership and clear the pending bit.

