* [PATCH 1/6] powerpc/qspinlock: Fix stale propagated yield_cpu
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
@ 2023-10-16 12:43 ` Nicholas Piggin
2023-10-16 12:43 ` [PATCH 2/6] powerpc/qspinlock: stop queued waiters trying to set lock sleepy Nicholas Piggin
` (8 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2023-10-16 12:43 UTC (permalink / raw)
To: linuxppc-dev
Cc: Shrikanth Hegde, Laurent Dufour, Srikar Dronamraju,
Nicholas Piggin, Nysal Jan K . A
yield_cpu is a sample of a preempted lock holder that gets propagated
back through the queue. Queued waiters use this to yield to the
preempted lock holder without continually sampling the lock word (which
would defeat the purpose of MCS queueing by bouncing the cache line).
The problem is that yield_cpu can become stale. It can take some time to
be passed down the chain, and if any queued waiter gets preempted then
it will cease to propagate the yield_cpu to later waiters.
This can result in yielding to a CPU that no longer holds the lock,
which is bad enough. Worse, if that CPU is currently in H_CEDE (idle),
it appears to be preempted, and some hypervisors (PowerVM) can incur
very long H_CONFER latencies waiting for the H_CEDE wakeup. This
results in latency spikes and hard lockups on oversubscribed
partitions with lock contention.
This is a minimal fix. Before yielding to yield_cpu, sample the lock
word to confirm yield_cpu is still the owner, and bail out if it is not.
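As a rough illustration of the fix (the names and lock-word layout below are
simplified assumptions, not the actual powerpc qspinlock encoding), the waiter
re-samples the lock word and only yields if the propagated CPU still owns the
lock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified lock-word layout, for illustration only: low 16 bits hold
 * the owner CPU, one bit above marks the lock as held. */
#define Q_LOCKED_VAL	(1u << 16)
#define Q_OWNER_MASK	0xffffu

static int get_owner_cpu(uint32_t val)
{
	return (int)(val & Q_OWNER_MASK);
}

/* Re-sample the lock word before yielding: a stale yield_cpu (lock
 * released, or since taken by another CPU) must not be yielded to,
 * or we may H_CONFER to an idle vCPU sitting in H_CEDE. */
static bool ok_to_yield(uint32_t lock_val, int yield_cpu)
{
	if (!(lock_val & Q_LOCKED_VAL))
		return false;		/* lock was released; sample is stale */
	return get_owner_cpu(lock_val) == yield_cpu;
}
```

The patch itself performs this check as
`get_owner_cpu(READ_ONCE(lock->val)) != yield_cpu` before the yield path.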
Thanks to a bunch of people who reported this and tracked down the
exact problem using tracepoints and dispatch trace logs.
Fixes: 28db61e207ea ("powerpc/qspinlock: allow propagation of yield CPU down the queue")
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Laurent Dufour <ldufour@linux.ibm.com>
Reported-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Debugged-by: Nysal Jan K.A <nysal@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/lib/qspinlock.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 253620979d0c..6dd2f46bd3ef 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -406,6 +406,9 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
if ((yield_count & 1) == 0)
goto yield_prev; /* owner vcpu is running */
+ if (get_owner_cpu(READ_ONCE(lock->val)) != yield_cpu)
+ goto yield_prev; /* re-sample lock owner */
+
spin_end();
preempted = true;
--
2.42.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/6] powerpc/qspinlock: stop queued waiters trying to set lock sleepy
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
2023-10-16 12:43 ` [PATCH 1/6] powerpc/qspinlock: Fix stale propagated yield_cpu Nicholas Piggin
@ 2023-10-16 12:43 ` Nicholas Piggin
2023-10-20 9:50 ` Nysal Jan K.A.
2023-10-16 12:43 ` [PATCH 3/6] powerpc/qspinlock: propagate owner preemptedness rather than CPU number Nicholas Piggin
` (7 subsequent siblings)
9 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2023-10-16 12:43 UTC (permalink / raw)
To: linuxppc-dev
Cc: Shrikanth Hegde, Srikar Dronamraju, Nicholas Piggin, Nysal Jan K . A
If a queued waiter notices the lock owner or the previous waiter has
been preempted, it attempts to mark the lock sleepy, but it does this
as a try-set operation using the original lock value it got when
queueing, which will become stale as the queue progresses, and the
try-set will fail. Drop this and just set the sleepy seen clock.
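The failure mode can be sketched with a plain compare-and-swap, a simplified
stand-in for the kernel's try_set_sleepy() with an invented bit layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define Q_SLEEPY_VAL	(1u << 24)	/* illustrative bit, not the real layout */

/* A try-set only succeeds if the lock word still equals the value the
 * waiter sampled when it queued. As other CPUs lock and unlock and the
 * queue progresses, the word changes, so a stale expected value makes
 * the operation a guaranteed no-op. */
static bool try_set_sleepy(uint32_t *lock, uint32_t expected)
{
	if (*lock != expected)
		return false;		/* stale sample: try-set fails */
	*lock = expected | Q_SLEEPY_VAL;
	return true;
}
```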
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/lib/qspinlock.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 6dd2f46bd3ef..75608ced14c2 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -247,22 +247,18 @@ static __always_inline void seen_sleepy_lock(void)
this_cpu_write(sleepy_lock_seen_clock, sched_clock());
}
-static __always_inline void seen_sleepy_node(struct qspinlock *lock, u32 val)
+static __always_inline void seen_sleepy_node(void)
{
if (pv_sleepy_lock) {
if (pv_sleepy_lock_interval_ns)
this_cpu_write(sleepy_lock_seen_clock, sched_clock());
- if (val & _Q_LOCKED_VAL) {
- if (!(val & _Q_SLEEPY_VAL))
- try_set_sleepy(lock, val);
- }
+ /* Don't set sleepy because we likely have a stale val */
}
}
-static struct qnode *get_tail_qnode(struct qspinlock *lock, u32 val)
+static struct qnode *get_tail_qnode(struct qspinlock *lock, int prev_cpu)
{
- int cpu = decode_tail_cpu(val);
- struct qnodes *qnodesp = per_cpu_ptr(&qnodes, cpu);
+ struct qnodes *qnodesp = per_cpu_ptr(&qnodes, prev_cpu);
int idx;
/*
@@ -381,9 +377,8 @@ static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, int
}
/* Called inside spin_begin() */
-static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *node, u32 val, bool paravirt)
+static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *node, int prev_cpu, bool paravirt)
{
- int prev_cpu = decode_tail_cpu(val);
u32 yield_count;
int yield_cpu;
bool preempted = false;
@@ -412,7 +407,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
spin_end();
preempted = true;
- seen_sleepy_node(lock, val);
+ seen_sleepy_node();
smp_rmb();
@@ -436,7 +431,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
spin_end();
preempted = true;
- seen_sleepy_node(lock, val);
+ seen_sleepy_node();
smp_rmb(); /* See __yield_to_locked_owner comment */
@@ -587,7 +582,8 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
* head of the waitqueue.
*/
if (old & _Q_TAIL_CPU_MASK) {
- struct qnode *prev = get_tail_qnode(lock, old);
+ int prev_cpu = decode_tail_cpu(old);
+ struct qnode *prev = get_tail_qnode(lock, prev_cpu);
/* Link @node into the waitqueue. */
WRITE_ONCE(prev->next, node);
@@ -597,7 +593,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
while (!READ_ONCE(node->locked)) {
spec_barrier();
- if (yield_to_prev(lock, node, old, paravirt))
+ if (yield_to_prev(lock, node, prev_cpu, paravirt))
seen_preempted = true;
}
spec_barrier();
--
2.42.0
* Re: [PATCH 2/6] powerpc/qspinlock: stop queued waiters trying to set lock sleepy
2023-10-16 12:43 ` [PATCH 2/6] powerpc/qspinlock: stop queued waiters trying to set lock sleepy Nicholas Piggin
@ 2023-10-20 9:50 ` Nysal Jan K.A.
0 siblings, 0 replies; 12+ messages in thread
From: Nysal Jan K.A. @ 2023-10-20 9:50 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: Shrikanth Hegde, linuxppc-dev, Srikar Dronamraju
On Mon, Oct 16, 2023 at 10:43:01PM +1000, Nicholas Piggin wrote:
> If a queued waiter notices the lock owner or the previous waiter has
> been preempted, it attempts to mark the lock sleepy, but it does this
> as a try-set operation using the original lock value it got when
> queueing, which will become stale as the queue progresses, and the
> try-set will fail. Drop this and just set the sleepy seen clock.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/lib/qspinlock.c | 24 ++++++++++--------------
> 1 file changed, 10 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> index 6dd2f46bd3ef..75608ced14c2 100644
> --- a/arch/powerpc/lib/qspinlock.c
> +++ b/arch/powerpc/lib/qspinlock.c
> @@ -247,22 +247,18 @@ static __always_inline void seen_sleepy_lock(void)
> this_cpu_write(sleepy_lock_seen_clock, sched_clock());
> }
>
> -static __always_inline void seen_sleepy_node(struct qspinlock *lock, u32 val)
> +static __always_inline void seen_sleepy_node(void)
> {
> if (pv_sleepy_lock) {
> if (pv_sleepy_lock_interval_ns)
> this_cpu_write(sleepy_lock_seen_clock, sched_clock());
> - if (val & _Q_LOCKED_VAL) {
> - if (!(val & _Q_SLEEPY_VAL))
> - try_set_sleepy(lock, val);
> - }
> + /* Don't set sleepy because we likely have a stale val */
> }
> }
seen_sleepy_lock() and seen_sleepy_node() are now the same
>
> -static struct qnode *get_tail_qnode(struct qspinlock *lock, u32 val)
> +static struct qnode *get_tail_qnode(struct qspinlock *lock, int prev_cpu)
> {
> - int cpu = decode_tail_cpu(val);
> - struct qnodes *qnodesp = per_cpu_ptr(&qnodes, cpu);
> + struct qnodes *qnodesp = per_cpu_ptr(&qnodes, prev_cpu);
> int idx;
>
> /*
> @@ -381,9 +377,8 @@ static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, int
> }
>
> /* Called inside spin_begin() */
> -static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *node, u32 val, bool paravirt)
> +static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *node, int prev_cpu, bool paravirt)
> {
> - int prev_cpu = decode_tail_cpu(val);
> u32 yield_count;
> int yield_cpu;
> bool preempted = false;
> @@ -412,7 +407,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
> spin_end();
>
> preempted = true;
> - seen_sleepy_node(lock, val);
> + seen_sleepy_node();
>
> smp_rmb();
>
> @@ -436,7 +431,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
> spin_end();
>
> preempted = true;
> - seen_sleepy_node(lock, val);
> + seen_sleepy_node();
>
> smp_rmb(); /* See __yield_to_locked_owner comment */
>
> @@ -587,7 +582,8 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
> * head of the waitqueue.
> */
> if (old & _Q_TAIL_CPU_MASK) {
> - struct qnode *prev = get_tail_qnode(lock, old);
> + int prev_cpu = decode_tail_cpu(old);
> + struct qnode *prev = get_tail_qnode(lock, prev_cpu);
>
> /* Link @node into the waitqueue. */
> WRITE_ONCE(prev->next, node);
> @@ -597,7 +593,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
> while (!READ_ONCE(node->locked)) {
> spec_barrier();
>
> - if (yield_to_prev(lock, node, old, paravirt))
> + if (yield_to_prev(lock, node, prev_cpu, paravirt))
> seen_preempted = true;
> }
> spec_barrier();
> --
> 2.42.0
>
* [PATCH 3/6] powerpc/qspinlock: propagate owner preemptedness rather than CPU number
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
2023-10-16 12:43 ` [PATCH 1/6] powerpc/qspinlock: Fix stale propagated yield_cpu Nicholas Piggin
2023-10-16 12:43 ` [PATCH 2/6] powerpc/qspinlock: stop queued waiters trying to set lock sleepy Nicholas Piggin
@ 2023-10-16 12:43 ` Nicholas Piggin
2023-10-16 12:43 ` [PATCH 4/6] powerpc/qspinlock: don't propagate the not-sleepy state Nicholas Piggin
` (6 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2023-10-16 12:43 UTC (permalink / raw)
To: linuxppc-dev
Cc: Shrikanth Hegde, Srikar Dronamraju, Nicholas Piggin, Nysal Jan K . A
Rather than propagating the CPU number of the preempted lock owner,
just propagate whether the owner was preempted. Waiters must read the
lock value when yielding to it to prevent races anyway, so might as
well always load the owner CPU from the lock.
To further simplify the code, also don't propagate the -1 (or
sleepy=false in the new scheme) down the queue. Instead, have the
waiters clear it themselves when finding the lock owner is not
preempted.
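The idea can be sketched as follows (simplified node structure and hypothetical
helper name, not the kernel code): only a one-bit hint travels down the queue,
and the owner CPU is always re-read from the lock word when yielding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct qnode_sketch {
	struct qnode_sketch *next;
	uint8_t sleepy;		/* 1 if the lock owner was seen preempted */
};

/* Pass only the boolean hint to the next waiter; no CPU number is
 * stored in the queue, so there is nothing in the hint itself that
 * can go stale. */
static void propagate_sleepy_sketch(struct qnode_sketch *node, bool owner_preempted)
{
	if (owner_preempted && node->next)
		node->next->sleepy = 1;
}
```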
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/lib/qspinlock.c | 80 ++++++++++++++++--------------------
1 file changed, 36 insertions(+), 44 deletions(-)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 75608ced14c2..0932d24a6b07 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -16,7 +16,8 @@ struct qnode {
struct qnode *next;
struct qspinlock *lock;
int cpu;
- int yield_cpu;
+ u8 sleepy; /* 1 if the previous vCPU was preempted or
+ * if the previous node was sleepy */
u8 locked; /* 1 if lock acquired */
};
@@ -349,7 +350,7 @@ static __always_inline bool yield_head_to_locked_owner(struct qspinlock *lock, u
return __yield_to_locked_owner(lock, val, paravirt, mustq);
}
-static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, int *set_yield_cpu, bool paravirt)
+static __always_inline void propagate_sleepy(struct qnode *node, u32 val, bool *set_sleepy, bool paravirt)
{
struct qnode *next;
int owner;
@@ -358,21 +359,17 @@ static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, int
return;
if (!pv_yield_propagate_owner)
return;
-
- owner = get_owner_cpu(val);
- if (*set_yield_cpu == owner)
+ if (*set_sleepy)
return;
next = READ_ONCE(node->next);
if (!next)
return;
+ owner = get_owner_cpu(val);
if (vcpu_is_preempted(owner)) {
- next->yield_cpu = owner;
- *set_yield_cpu = owner;
- } else if (*set_yield_cpu != -1) {
- next->yield_cpu = owner;
- *set_yield_cpu = owner;
+ next->sleepy = 1;
+ *set_sleepy = true;
}
}
@@ -380,7 +377,6 @@ static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, int
static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *node, int prev_cpu, bool paravirt)
{
u32 yield_count;
- int yield_cpu;
bool preempted = false;
if (!paravirt)
@@ -389,36 +385,32 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
if (!pv_yield_propagate_owner)
goto yield_prev;
- yield_cpu = READ_ONCE(node->yield_cpu);
- if (yield_cpu == -1) {
- /* Propagate back the -1 CPU */
- if (node->next && node->next->yield_cpu != -1)
- node->next->yield_cpu = yield_cpu;
+ if (!READ_ONCE(node->sleepy)) {
+ /* Propagate back sleepy==false */
+ if (node->next && node->next->sleepy)
+ node->next->sleepy = 0;
goto yield_prev;
- }
-
- yield_count = yield_count_of(yield_cpu);
- if ((yield_count & 1) == 0)
- goto yield_prev; /* owner vcpu is running */
-
- if (get_owner_cpu(READ_ONCE(lock->val)) != yield_cpu)
- goto yield_prev; /* re-sample lock owner */
-
- spin_end();
-
- preempted = true;
- seen_sleepy_node();
-
- smp_rmb();
+ } else {
+ u32 val = READ_ONCE(lock->val);
+
+ if (val & _Q_LOCKED_VAL) {
+ if (node->next && !node->next->sleepy) {
+ /*
+ * Propagate sleepy to next waiter. Only if
+ * owner is preempted, which allows the queue
+ * to become "non-sleepy" if vCPU preemption
+ * ceases to occur, even if the lock remains
+ * highly contended.
+ */
+ if (vcpu_is_preempted(get_owner_cpu(val)))
+ node->next->sleepy = 1;
+ }
- if (yield_cpu == node->yield_cpu) {
- if (node->next && node->next->yield_cpu != yield_cpu)
- node->next->yield_cpu = yield_cpu;
- yield_to_preempted(yield_cpu, yield_count);
- spin_begin();
- return preempted;
+ preempted = yield_to_locked_owner(lock, val, paravirt);
+ if (preempted)
+ return preempted;
+ }
}
- spin_begin();
yield_prev:
if (!pv_yield_prev)
@@ -541,7 +533,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
bool sleepy = false;
bool mustq = false;
int idx;
- int set_yield_cpu = -1;
+ bool set_sleepy = false;
int iters = 0;
BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
@@ -565,7 +557,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
node->next = NULL;
node->lock = lock;
node->cpu = smp_processor_id();
- node->yield_cpu = -1;
+ node->sleepy = 0;
node->locked = 0;
tail = encode_tail_cpu(node->cpu);
@@ -599,9 +591,9 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
spec_barrier();
spin_end();
- /* Clear out stale propagated yield_cpu */
- if (paravirt && pv_yield_propagate_owner && node->yield_cpu != -1)
- node->yield_cpu = -1;
+ /* Clear out stale propagated sleepy */
+ if (paravirt && pv_yield_propagate_owner && node->sleepy)
+ node->sleepy = 0;
smp_rmb(); /* acquire barrier for the mcs lock */
@@ -644,7 +636,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
}
}
- propagate_yield_cpu(node, val, &set_yield_cpu, paravirt);
+ propagate_sleepy(node, val, &set_sleepy, paravirt);
preempted = yield_head_to_locked_owner(lock, val, paravirt);
if (!maybe_stealers)
continue;
--
2.42.0
* [PATCH 4/6] powerpc/qspinlock: don't propagate the not-sleepy state
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
` (2 preceding siblings ...)
2023-10-16 12:43 ` [PATCH 3/6] powerpc/qspinlock: propagate owner preemptedness rather than CPU number Nicholas Piggin
@ 2023-10-16 12:43 ` Nicholas Piggin
2023-10-16 12:43 ` [PATCH 5/6] powerpc/qspinlock: Propagate sleepy if previous waiter is preempted Nicholas Piggin
` (5 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2023-10-16 12:43 UTC (permalink / raw)
To: linuxppc-dev
Cc: Shrikanth Hegde, Srikar Dronamraju, Nicholas Piggin, Nysal Jan K . A
To simplify things, don't propagate the not-sleepy condition back down
the queue. Instead, have the waiters clear their own node->sleepy when
finding the lock owner is not preempted.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/lib/qspinlock.c | 26 ++++++++------------------
1 file changed, 8 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 0932d24a6b07..6bb627e90a32 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -350,7 +350,7 @@ static __always_inline bool yield_head_to_locked_owner(struct qspinlock *lock, u
return __yield_to_locked_owner(lock, val, paravirt, mustq);
}
-static __always_inline void propagate_sleepy(struct qnode *node, u32 val, bool *set_sleepy, bool paravirt)
+static __always_inline void propagate_sleepy(struct qnode *node, u32 val, bool paravirt)
{
struct qnode *next;
int owner;
@@ -359,18 +359,17 @@ static __always_inline void propagate_sleepy(struct qnode *node, u32 val, bool *
return;
if (!pv_yield_propagate_owner)
return;
- if (*set_sleepy)
- return;
next = READ_ONCE(node->next);
if (!next)
return;
+ if (next->sleepy)
+ return;
+
owner = get_owner_cpu(val);
- if (vcpu_is_preempted(owner)) {
+ if (vcpu_is_preempted(owner))
next->sleepy = 1;
- *set_sleepy = true;
- }
}
/* Called inside spin_begin() */
@@ -385,12 +384,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
if (!pv_yield_propagate_owner)
goto yield_prev;
- if (!READ_ONCE(node->sleepy)) {
- /* Propagate back sleepy==false */
- if (node->next && node->next->sleepy)
- node->next->sleepy = 0;
- goto yield_prev;
- } else {
+ if (node->sleepy) {
u32 val = READ_ONCE(lock->val);
if (val & _Q_LOCKED_VAL) {
@@ -410,6 +404,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
if (preempted)
return preempted;
}
+ node->sleepy = false;
}
yield_prev:
@@ -533,7 +528,6 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
bool sleepy = false;
bool mustq = false;
int idx;
- bool set_sleepy = false;
int iters = 0;
BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
@@ -591,10 +585,6 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
spec_barrier();
spin_end();
- /* Clear out stale propagated sleepy */
- if (paravirt && pv_yield_propagate_owner && node->sleepy)
- node->sleepy = 0;
-
smp_rmb(); /* acquire barrier for the mcs lock */
/*
@@ -636,7 +626,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
}
}
- propagate_sleepy(node, val, &set_sleepy, paravirt);
+ propagate_sleepy(node, val, paravirt);
preempted = yield_head_to_locked_owner(lock, val, paravirt);
if (!maybe_stealers)
continue;
--
2.42.0
* [PATCH 5/6] powerpc/qspinlock: Propagate sleepy if previous waiter is preempted
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
` (3 preceding siblings ...)
2023-10-16 12:43 ` [PATCH 4/6] powerpc/qspinlock: don't propagate the not-sleepy state Nicholas Piggin
@ 2023-10-16 12:43 ` Nicholas Piggin
2023-10-16 12:43 ` [PATCH 6/6] powerpc/qspinlock: Rename yield_propagate_owner tunable Nicholas Piggin
` (4 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2023-10-16 12:43 UTC (permalink / raw)
To: linuxppc-dev
Cc: Shrikanth Hegde, Srikar Dronamraju, Nicholas Piggin, Nysal Jan K . A
The sleepy (aka lock-owner-is-preempted) condition is propagated down
the queue by each waiter. If a waiter becomes preempted, it can no
longer propagate sleepy. To allow subsequent waiters to yield to the
lock owner, also check the lock owner in this case.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/lib/qspinlock.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 6bb627e90a32..c68c2bd7b853 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -384,7 +384,11 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
if (!pv_yield_propagate_owner)
goto yield_prev;
- if (node->sleepy) {
+ /*
+ * If the previous waiter was preempted it might not be able to
+ * propagate sleepy to us, so check the lock in that case too.
+ */
+ if (node->sleepy || vcpu_is_preempted(prev_cpu)) {
u32 val = READ_ONCE(lock->val);
if (val & _Q_LOCKED_VAL) {
--
2.42.0
* [PATCH 6/6] powerpc/qspinlock: Rename yield_propagate_owner tunable
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
` (4 preceding siblings ...)
2023-10-16 12:43 ` [PATCH 5/6] powerpc/qspinlock: Propagate sleepy if previous waiter is preempted Nicholas Piggin
@ 2023-10-16 12:43 ` Nicholas Piggin
2023-10-18 7:41 ` [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Shrikanth Hegde
` (3 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2023-10-16 12:43 UTC (permalink / raw)
To: linuxppc-dev
Cc: Shrikanth Hegde, Srikar Dronamraju, Nicholas Piggin, Nysal Jan K . A
Rename yield_propagate_owner to yield_sleepy_owner, which better
describes what it does (what, not how).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/lib/qspinlock.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index c68c2bd7b853..5de4dd549f6e 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -44,7 +44,7 @@ static bool pv_sleepy_lock_sticky __read_mostly = false;
static u64 pv_sleepy_lock_interval_ns __read_mostly = 0;
static int pv_sleepy_lock_factor __read_mostly = 256;
static bool pv_yield_prev __read_mostly = true;
-static bool pv_yield_propagate_owner __read_mostly = true;
+static bool pv_yield_sleepy_owner __read_mostly = true;
static bool pv_prod_head __read_mostly = false;
static DEFINE_PER_CPU_ALIGNED(struct qnodes, qnodes);
@@ -357,7 +357,7 @@ static __always_inline void propagate_sleepy(struct qnode *node, u32 val, bool p
if (!paravirt)
return;
- if (!pv_yield_propagate_owner)
+ if (!pv_yield_sleepy_owner)
return;
next = READ_ONCE(node->next);
@@ -381,7 +381,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
if (!paravirt)
goto relax;
- if (!pv_yield_propagate_owner)
+ if (!pv_yield_sleepy_owner)
goto yield_prev;
/*
@@ -934,21 +934,21 @@ static int pv_yield_prev_get(void *data, u64 *val)
DEFINE_SIMPLE_ATTRIBUTE(fops_pv_yield_prev, pv_yield_prev_get, pv_yield_prev_set, "%llu\n");
-static int pv_yield_propagate_owner_set(void *data, u64 val)
+static int pv_yield_sleepy_owner_set(void *data, u64 val)
{
- pv_yield_propagate_owner = !!val;
+ pv_yield_sleepy_owner = !!val;
return 0;
}
-static int pv_yield_propagate_owner_get(void *data, u64 *val)
+static int pv_yield_sleepy_owner_get(void *data, u64 *val)
{
- *val = pv_yield_propagate_owner;
+ *val = pv_yield_sleepy_owner;
return 0;
}
-DEFINE_SIMPLE_ATTRIBUTE(fops_pv_yield_propagate_owner, pv_yield_propagate_owner_get, pv_yield_propagate_owner_set, "%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(fops_pv_yield_sleepy_owner, pv_yield_sleepy_owner_get, pv_yield_sleepy_owner_set, "%llu\n");
static int pv_prod_head_set(void *data, u64 val)
{
@@ -980,7 +980,7 @@ static __init int spinlock_debugfs_init(void)
debugfs_create_file("qspl_pv_sleepy_lock_interval_ns", 0600, arch_debugfs_dir, NULL, &fops_pv_sleepy_lock_interval_ns);
debugfs_create_file("qspl_pv_sleepy_lock_factor", 0600, arch_debugfs_dir, NULL, &fops_pv_sleepy_lock_factor);
debugfs_create_file("qspl_pv_yield_prev", 0600, arch_debugfs_dir, NULL, &fops_pv_yield_prev);
- debugfs_create_file("qspl_pv_yield_propagate_owner", 0600, arch_debugfs_dir, NULL, &fops_pv_yield_propagate_owner);
+ debugfs_create_file("qspl_pv_yield_sleepy_owner", 0600, arch_debugfs_dir, NULL, &fops_pv_yield_sleepy_owner);
debugfs_create_file("qspl_pv_prod_head", 0600, arch_debugfs_dir, NULL, &fops_pv_prod_head);
}
--
2.42.0
* Re: [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
` (5 preceding siblings ...)
2023-10-16 12:43 ` [PATCH 6/6] powerpc/qspinlock: Rename yield_propagate_owner tunable Nicholas Piggin
@ 2023-10-18 7:41 ` Shrikanth Hegde
2023-10-20 10:04 ` Nysal Jan K.A.
` (2 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Shrikanth Hegde @ 2023-10-18 7:41 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Srikar Dronamraju, Nysal Jan K . A
On 10/16/23 6:12 PM, Nicholas Piggin wrote:
> This fixes a long-standing latency bug in the powerpc qspinlock
> implementation that quite a few people have reported and helped
> out with debugging.
>
> The first patch is a minimal fix that avoids the problem. The
> other patches are streamlining and improvements after the fix.
>
Hi Nick, Thanks for the fix. This issue has been happening in various
scenarios when there was vCPU contention.
Tested this on a Power10 shared processor LPAR (SPLPAR) based on PowerVM.
The system has two SPLPARs: LPAR1 runs the various scenarios below,
while LPAR2 runs constant stress-ng threads consuming 100% of its CPUs.
LPAR1 is 96VP, 64EC and LPAR2 is 32VP, 32EC.
lscpu of LPAR1:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 768
On-line CPU(s) list: 0-767
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Scenarios tried on LPAR1:
1. Run ppc64_cpu --smt=1 and ppc64_cpu --smt=8 to switch between SMT=1
and SMT=8.
2. Create a cgroup, assign a 5% quota to it, and run the same number of
stress-ng threads as CPUs within that cgroup.
3. Run a suite of microbenchmarks such as unixbench, schbench, hackbench
and stress-ng with perf enabled.
The baseline was tip/master at 84ab57184ff4 ("Merge branch into
tip/master: 'x86/tdx'").
A hard lockup was SEEN in each of the above scenarios with the baseline.
With this patch series applied, a hard lockup was NOT SEEN in any of
the above scenarios.
So,
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> Thanks,
> Nick
>
> Nicholas Piggin (6):
> powerpc/qspinlock: Fix stale propagated yield_cpu
> powerpc/qspinlock: stop queued waiters trying to set lock sleepy
> powerpc/qspinlock: propagate owner preemptedness rather than CPU
> number
> powerpc/qspinlock: don't propagate the not-sleepy state
> powerpc/qspinlock: Propagate sleepy if previous waiter is preempted
> powerpc/qspinlock: Rename yield_propagate_owner tunable
>
> arch/powerpc/lib/qspinlock.c | 119 +++++++++++++++--------------------
> 1 file changed, 52 insertions(+), 67 deletions(-)
>
* Re: [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
` (6 preceding siblings ...)
2023-10-18 7:41 ` [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Shrikanth Hegde
@ 2023-10-20 10:04 ` Nysal Jan K.A.
2023-10-20 12:01 ` (subset) " Michael Ellerman
2023-10-27 9:59 ` Michael Ellerman
9 siblings, 0 replies; 12+ messages in thread
From: Nysal Jan K.A. @ 2023-10-20 10:04 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: Shrikanth Hegde, linuxppc-dev, Srikar Dronamraju
On Mon, Oct 16, 2023 at 10:42:59PM +1000, Nicholas Piggin wrote:
> This fixes a long-standing latency bug in the powerpc qspinlock
> implementation that quite a few people have reported and helped
> out with debugging.
>
> The first patch is a minimal fix that avoids the problem. The
> other patches are streamlining and improvements after the fix.
>
> Thanks,
> Nick
>
> Nicholas Piggin (6):
> powerpc/qspinlock: Fix stale propagated yield_cpu
> powerpc/qspinlock: stop queued waiters trying to set lock sleepy
> powerpc/qspinlock: propagate owner preemptedness rather than CPU
> number
> powerpc/qspinlock: don't propagate the not-sleepy state
> powerpc/qspinlock: Propagate sleepy if previous waiter is preempted
> powerpc/qspinlock: Rename yield_propagate_owner tunable
>
> arch/powerpc/lib/qspinlock.c | 119 +++++++++++++++--------------------
> 1 file changed, 52 insertions(+), 67 deletions(-)
>
> --
> 2.42.0
>
Just a minor comment regarding patch 2.
For the series:
Reviewed-by: Nysal Jan K.A <nysal@linux.ibm.com>
* Re: (subset) [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
` (7 preceding siblings ...)
2023-10-20 10:04 ` Nysal Jan K.A.
@ 2023-10-20 12:01 ` Michael Ellerman
2023-10-27 9:59 ` Michael Ellerman
9 siblings, 0 replies; 12+ messages in thread
From: Michael Ellerman @ 2023-10-20 12:01 UTC (permalink / raw)
To: linuxppc-dev, Nicholas Piggin
Cc: Shrikanth Hegde, Srikar Dronamraju, Nysal Jan K . A
On Mon, 16 Oct 2023 22:42:59 +1000, Nicholas Piggin wrote:
> This fixes a long-standing latency bug in the powerpc qspinlock
> implementation that quite a few people have reported and helped
> out with debugging.
>
> The first patch is a minimal fix that avoids the problem. The
> other patches are streamlining and improvements after the fix.
>
> [...]
Patch 1 applied to powerpc/fixes.
[1/6] powerpc/qspinlock: Fix stale propagated yield_cpu
https://git.kernel.org/powerpc/c/f9bc9bbe8afdf83412728f0b464979a72a3b9ec2
cheers
* Re: (subset) [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other
2023-10-16 12:42 [PATCH 0/6] powerpc/qspinlock: Fix yield latency bug and other Nicholas Piggin
` (8 preceding siblings ...)
2023-10-20 12:01 ` (subset) " Michael Ellerman
@ 2023-10-27 9:59 ` Michael Ellerman
9 siblings, 0 replies; 12+ messages in thread
From: Michael Ellerman @ 2023-10-27 9:59 UTC (permalink / raw)
To: linuxppc-dev, Nicholas Piggin
Cc: Shrikanth Hegde, Srikar Dronamraju, Nysal Jan K . A
On Mon, 16 Oct 2023 22:42:59 +1000, Nicholas Piggin wrote:
> This fixes a long-standing latency bug in the powerpc qspinlock
> implementation that quite a few people have reported and helped
> out with debugging.
>
> The first patch is a minimal fix that avoids the problem. The
> other patches are streamlining and improvements after the fix.
>
> [...]
Applied to powerpc/next.
[2/6] powerpc/qspinlock: stop queued waiters trying to set lock sleepy
https://git.kernel.org/powerpc/c/f6568647382c667912245c8d07aa26c9c6d4d0c8
[3/6] powerpc/qspinlock: propagate owner preemptedness rather than CPU number
https://git.kernel.org/powerpc/c/fd8fae50c9c6117c4e05954deab2cc515508666b
[4/6] powerpc/qspinlock: don't propagate the not-sleepy state
https://git.kernel.org/powerpc/c/fcf77d44274b96a55cc74043561a4b3151b9ad24
[5/6] powerpc/qspinlock: Propagate sleepy if previous waiter is preempted
https://git.kernel.org/powerpc/c/1e6d5f725738fff83e1c3cbdb8547142eb8820c2
[6/6] powerpc/qspinlock: Rename yield_propagate_owner tunable
https://git.kernel.org/powerpc/c/b629b541702b082d161c307c9d7d9867911fc933
cheers