* [PATCHv2 0/3] improve wait logic of stop_machine
@ 2019-06-13 10:35 Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 1/3] processor: remove spin_cpu_yield Heiko Carstens
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Heiko Carstens @ 2019-06-13 10:35 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner, Vasily Gorbik,
Christian Borntraeger, Michael Ellerman, Paul Mackerras
Cc: linux-arch, linux-kernel, linux-s390
FWIW, it would be nice to get some Acks.
Given that this patch set touches common code only in a way that
shouldn't have any effect on any architecture except s390, I'll add
this to the s390 tree, unless somebody objects.
RFC->v2:
Use cpumask_next_wrap as suggested by Peter Zijlstra.
RFC:
The stop_machine loop to advance the state machine and to wait for all
affected CPUs to check-in calls cpu_relax_yield in a tight loop until
the last missing CPUs acknowledged the state transition.
On a virtual system where not all logical CPUs are backed by real CPUs
all the time it can take a while for all CPUs to check-in. With the
current definition of cpu_relax_yield on s390 a diagnose 0x44 is done
which tells the hypervisor to schedule *some* other CPU. That can be
any CPU and not necessarily one of the CPUs that need to run in order
to advance the state machine. This can lead to a pretty bad diagnose
0x44 storm until the last missing CPU finally checked-in.
Replace the undirected cpu_relax_yield based on diagnose 0x44 with an
architecture specific directed yield. Each CPU in the wait loop will
pick up the next CPU in the cpumask of stop_machine. The diagnose 0x9c
is used to tell the hypervisor to run this next CPU instead of the
current one. If there is only a limited number of real CPUs backing
the virtual CPUs we end up with the real CPUs passed around in a
round-robin fashion.
Patches 1 and 3 are just possible cleanups; the interesting part is
patch 2.
Heiko Carstens (2):
processor: remove spin_cpu_yield
processor: get rid of cpu_relax_yield
Martin Schwidefsky (1):
s390: improve wait logic of stop_machine
arch/powerpc/include/asm/processor.h | 2 --
arch/s390/include/asm/processor.h | 7 +------
arch/s390/kernel/processor.c | 19 +++++++++++++------
arch/s390/kernel/smp.c | 2 +-
include/linux/processor.h | 9 ---------
include/linux/sched.h | 4 ----
include/linux/stop_machine.h | 1 +
kernel/stop_machine.c | 19 ++++++++++++++-----
8 files changed, 30 insertions(+), 33 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCHv2 1/3] processor: remove spin_cpu_yield
2019-06-13 10:35 [PATCHv2 0/3] improve wait logic of stop_machine Heiko Carstens
@ 2019-06-13 10:35 ` Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 2/3] s390: improve wait logic of stop_machine Heiko Carstens
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Heiko Carstens @ 2019-06-13 10:35 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner, Vasily Gorbik,
Christian Borntraeger, Michael Ellerman, Paul Mackerras
Cc: linux-arch, linux-kernel, linux-s390
spin_cpu_yield is unused, therefore remove it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/powerpc/include/asm/processor.h | 2 --
include/linux/processor.h | 9 ---------
2 files changed, 11 deletions(-)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ef573fe9873e..a9993e7a443b 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -346,8 +346,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
#define spin_cpu_relax() barrier()
-#define spin_cpu_yield() spin_cpu_relax()
-
#define spin_end() HMT_medium()
#define spin_until_cond(cond) \
diff --git a/include/linux/processor.h b/include/linux/processor.h
index dbc952eec869..dc78bdc7079a 100644
--- a/include/linux/processor.h
+++ b/include/linux/processor.h
@@ -32,15 +32,6 @@
#define spin_cpu_relax() cpu_relax()
#endif
-/*
- * spin_cpu_yield may be called to yield (undirected) to the hypervisor if
- * necessary. This should be used if the wait is expected to take longer
- * than context switch overhead, but we can't sleep or do a directed yield.
- */
-#ifndef spin_cpu_yield
-#define spin_cpu_yield() cpu_relax_yield()
-#endif
-
#ifndef spin_end
#define spin_end()
#endif
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCHv2 2/3] s390: improve wait logic of stop_machine
2019-06-13 10:35 [PATCHv2 0/3] improve wait logic of stop_machine Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 1/3] processor: remove spin_cpu_yield Heiko Carstens
@ 2019-06-13 10:35 ` Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 3/3] processor: get rid of cpu_relax_yield Heiko Carstens
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Heiko Carstens @ 2019-06-13 10:35 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner, Vasily Gorbik,
Christian Borntraeger, Michael Ellerman, Paul Mackerras
Cc: linux-arch, linux-kernel, linux-s390
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
The stop_machine loop to advance the state machine and to wait for all
affected CPUs to check-in calls cpu_relax_yield in a tight loop until
the last missing CPUs acknowledged the state transition.
On a virtual system where not all logical CPUs are backed by real CPUs
all the time it can take a while for all CPUs to check-in. With the
current definition of cpu_relax_yield a diagnose 0x44 is done which
tells the hypervisor to schedule *some* other CPU. That can be any
CPU and not necessarily one of the CPUs that need to run in order to
advance the state machine. This can lead to a pretty bad diagnose 0x44
storm until the last missing CPU finally checked-in.
Replace the undirected cpu_relax_yield based on diagnose 0x44 with a
directed yield. Each CPU in the wait loop will pick up the next CPU
in the cpumask of stop_machine. The diagnose 0x9c is used to tell the
hypervisor to run this next CPU instead of the current one. If there
is only a limited number of real CPUs backing the virtual CPUs we
end up with the real CPUs passed around in a round-robin fashion.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/s390/include/asm/processor.h | 3 ++-
arch/s390/kernel/processor.c | 17 ++++++++++++-----
arch/s390/kernel/smp.c | 2 +-
include/linux/sched.h | 2 +-
kernel/stop_machine.c | 14 +++++++++-----
5 files changed, 25 insertions(+), 13 deletions(-)
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index b0fcbc37b637..445ce9ee4404 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -36,6 +36,7 @@
#ifndef __ASSEMBLY__
+#include <linux/cpumask.h>
#include <linux/linkage.h>
#include <linux/irqflags.h>
#include <asm/cpu.h>
@@ -225,7 +226,7 @@ static __no_kasan_or_inline unsigned short stap(void)
* Give up the time slice of the virtual PU.
*/
#define cpu_relax_yield cpu_relax_yield
-void cpu_relax_yield(void);
+void cpu_relax_yield(const struct cpumask *cpumask);
#define cpu_relax() barrier()
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 5de13307b703..4cdaefec1b7c 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -31,6 +31,7 @@ struct cpu_info {
};
static DEFINE_PER_CPU(struct cpu_info, cpu_info);
+static DEFINE_PER_CPU(int, cpu_relax_retry);
static bool machine_has_cpu_mhz;
@@ -58,13 +59,19 @@ void s390_update_cpu_mhz(void)
on_each_cpu(update_cpu_mhz, NULL, 0);
}
-void notrace cpu_relax_yield(void)
+void notrace cpu_relax_yield(const struct cpumask *cpumask)
{
- if (!smp_cpu_mtid && MACHINE_HAS_DIAG44) {
- diag_stat_inc(DIAG_STAT_X044);
- asm volatile("diag 0,0,0x44");
+ int cpu, this_cpu;
+
+ this_cpu = smp_processor_id();
+ if (__this_cpu_inc_return(cpu_relax_retry) >= spin_retry) {
+ __this_cpu_write(cpu_relax_retry, 0);
+ cpu = cpumask_next_wrap(this_cpu, cpumask, this_cpu, false);
+ if (cpu >= nr_cpu_ids)
+ return;
+ if (arch_vcpu_is_preempted(cpu))
+ smp_yield_cpu(cpu);
}
- barrier();
}
EXPORT_SYMBOL(cpu_relax_yield);
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 35fafa2b91a8..a8eef7b7770a 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -418,7 +418,7 @@ void smp_yield_cpu(int cpu)
diag_stat_inc_norecursion(DIAG_STAT_X09C);
asm volatile("diag %0,0,0x9c"
: : "d" (pcpu_devices[cpu].address));
- } else if (MACHINE_HAS_DIAG44) {
+ } else if (MACHINE_HAS_DIAG44 && !smp_cpu_mtid) {
diag_stat_inc_norecursion(DIAG_STAT_X044);
asm volatile("diag 0,0,0x44");
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..1f9f3160da7e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1519,7 +1519,7 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p, const struct cpuma
#endif
#ifndef cpu_relax_yield
-#define cpu_relax_yield() cpu_relax()
+#define cpu_relax_yield(cpumask) cpu_relax()
#endif
extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 2b5a6754646f..b8b0c5ff8da9 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -183,6 +183,7 @@ static int multi_cpu_stop(void *data)
struct multi_stop_data *msdata = data;
enum multi_stop_state curstate = MULTI_STOP_NONE;
int cpu = smp_processor_id(), err = 0;
+ const struct cpumask *cpumask;
unsigned long flags;
bool is_active;
@@ -192,15 +193,18 @@ static int multi_cpu_stop(void *data)
*/
local_save_flags(flags);
- if (!msdata->active_cpus)
- is_active = cpu == cpumask_first(cpu_online_mask);
- else
- is_active = cpumask_test_cpu(cpu, msdata->active_cpus);
+ if (!msdata->active_cpus) {
+ cpumask = cpu_online_mask;
+ is_active = cpu == cpumask_first(cpumask);
+ } else {
+ cpumask = msdata->active_cpus;
+ is_active = cpumask_test_cpu(cpu, cpumask);
+ }
/* Simple state machine */
do {
/* Chill out and ensure we re-read multi_stop_state. */
- cpu_relax_yield();
+ cpu_relax_yield(cpumask);
if (msdata->state != curstate) {
curstate = msdata->state;
switch (curstate) {
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCHv2 3/3] processor: get rid of cpu_relax_yield
2019-06-13 10:35 [PATCHv2 0/3] improve wait logic of stop_machine Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 1/3] processor: remove spin_cpu_yield Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 2/3] s390: improve wait logic of stop_machine Heiko Carstens
@ 2019-06-13 10:35 ` Heiko Carstens
2019-06-13 13:48 ` [PATCHv2 0/3] improve wait logic of stop_machine Peter Zijlstra
2019-06-13 19:57 ` Thomas Gleixner
4 siblings, 0 replies; 6+ messages in thread
From: Heiko Carstens @ 2019-06-13 10:35 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner, Vasily Gorbik,
Christian Borntraeger, Michael Ellerman, Paul Mackerras
Cc: linux-arch, linux-kernel, linux-s390
stop_machine is the only user left of cpu_relax_yield. Given that it
now has special semantics which are tied to stop_machine introduce a
weak stop_machine_yield function which architectures can override, and
get rid of the generic cpu_relax_yield implementation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/s390/include/asm/processor.h | 6 ------
arch/s390/kernel/processor.c | 4 ++--
include/linux/sched.h | 4 ----
include/linux/stop_machine.h | 1 +
kernel/stop_machine.c | 7 ++++++-
5 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 445ce9ee4404..14883b1562e0 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -222,12 +222,6 @@ static __no_kasan_or_inline unsigned short stap(void)
return cpu_address;
}
-/*
- * Give up the time slice of the virtual PU.
- */
-#define cpu_relax_yield cpu_relax_yield
-void cpu_relax_yield(const struct cpumask *cpumask);
-
#define cpu_relax() barrier()
#define ECAG_CACHE_ATTRIBUTE 0
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 4cdaefec1b7c..6ebc2117c66c 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -7,6 +7,7 @@
#define KMSG_COMPONENT "cpu"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+#include <linux/stop_machine.h>
#include <linux/cpufeature.h>
#include <linux/bitops.h>
#include <linux/kernel.h>
@@ -59,7 +60,7 @@ void s390_update_cpu_mhz(void)
on_each_cpu(update_cpu_mhz, NULL, 0);
}
-void notrace cpu_relax_yield(const struct cpumask *cpumask)
+void notrace stop_machine_yield(const struct cpumask *cpumask)
{
int cpu, this_cpu;
@@ -73,7 +74,6 @@ void notrace cpu_relax_yield(const struct cpumask *cpumask)
smp_yield_cpu(cpu);
}
}
-EXPORT_SYMBOL(cpu_relax_yield);
/*
* cpu_init - initializes state that is per-CPU.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1f9f3160da7e..911675416b05 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1518,10 +1518,6 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p, const struct cpuma
}
#endif
-#ifndef cpu_relax_yield
-#define cpu_relax_yield(cpumask) cpu_relax()
-#endif
-
extern int yield_to(struct task_struct *p, bool preempt);
extern void set_user_nice(struct task_struct *p, long nice);
extern int task_prio(const struct task_struct *p);
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 6d3635c86dbe..f9a0c6189852 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -36,6 +36,7 @@ int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
void stop_machine_park(int cpu);
void stop_machine_unpark(int cpu);
+void stop_machine_yield(const struct cpumask *cpumask);
#else /* CONFIG_SMP */
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index b8b0c5ff8da9..b4f83f7bdf86 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -177,6 +177,11 @@ static void ack_state(struct multi_stop_data *msdata)
set_state(msdata, msdata->state + 1);
}
+void __weak stop_machine_yield(const struct cpumask *cpumask)
+{
+ cpu_relax();
+}
+
/* This is the cpu_stop function which stops the CPU. */
static int multi_cpu_stop(void *data)
{
@@ -204,7 +209,7 @@ static int multi_cpu_stop(void *data)
/* Simple state machine */
do {
/* Chill out and ensure we re-read multi_stop_state. */
- cpu_relax_yield(cpumask);
+ stop_machine_yield(cpumask);
if (msdata->state != curstate) {
curstate = msdata->state;
switch (curstate) {
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCHv2 0/3] improve wait logic of stop_machine
2019-06-13 10:35 [PATCHv2 0/3] improve wait logic of stop_machine Heiko Carstens
` (2 preceding siblings ...)
2019-06-13 10:35 ` [PATCHv2 3/3] processor: get rid of cpu_relax_yield Heiko Carstens
@ 2019-06-13 13:48 ` Peter Zijlstra
2019-06-13 19:57 ` Thomas Gleixner
4 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2019-06-13 13:48 UTC (permalink / raw)
To: Heiko Carstens
Cc: Thomas Gleixner, Vasily Gorbik, Christian Borntraeger,
Michael Ellerman, Paul Mackerras, linux-arch, linux-kernel,
linux-s390
On Thu, Jun 13, 2019 at 12:35:07PM +0200, Heiko Carstens wrote:
> Heiko Carstens (2):
> processor: remove spin_cpu_yield
> processor: get rid of cpu_relax_yield
>
> Martin Schwidefsky (1):
> s390: improve wait logic of stop_machine
>
> arch/powerpc/include/asm/processor.h | 2 --
> arch/s390/include/asm/processor.h | 7 +------
> arch/s390/kernel/processor.c | 19 +++++++++++++------
> arch/s390/kernel/smp.c | 2 +-
> include/linux/processor.h | 9 ---------
> include/linux/sched.h | 4 ----
> include/linux/stop_machine.h | 1 +
> kernel/stop_machine.c | 19 ++++++++++++++-----
> 8 files changed, 30 insertions(+), 33 deletions(-)
Seems sensible to me.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2 0/3] improve wait logic of stop_machine
2019-06-13 10:35 [PATCHv2 0/3] improve wait logic of stop_machine Heiko Carstens
` (3 preceding siblings ...)
2019-06-13 13:48 ` [PATCHv2 0/3] improve wait logic of stop_machine Peter Zijlstra
@ 2019-06-13 19:57 ` Thomas Gleixner
4 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2019-06-13 19:57 UTC (permalink / raw)
To: Heiko Carstens
Cc: Peter Zijlstra, Vasily Gorbik, Christian Borntraeger,
Michael Ellerman, Paul Mackerras, linux-arch, linux-kernel,
linux-s390
On Thu, 13 Jun 2019, Heiko Carstens wrote:
>
> Heiko Carstens (2):
> processor: remove spin_cpu_yield
> processor: get rid of cpu_relax_yield
>
> Martin Schwidefsky (1):
> s390: improve wait logic of stop_machine
>
> arch/powerpc/include/asm/processor.h | 2 --
> arch/s390/include/asm/processor.h | 7 +------
> arch/s390/kernel/processor.c | 19 +++++++++++++------
> arch/s390/kernel/smp.c | 2 +-
> include/linux/processor.h | 9 ---------
> include/linux/sched.h | 4 ----
> include/linux/stop_machine.h | 1 +
> kernel/stop_machine.c | 19 ++++++++++++++-----
> 8 files changed, 30 insertions(+), 33 deletions(-)
Acked-by: Thomas Gleixner <tglx@linutronix.de>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-06-13 19:58 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-13 10:35 [PATCHv2 0/3] improve wait logic of stop_machine Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 1/3] processor: remove spin_cpu_yield Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 2/3] s390: improve wait logic of stop_machine Heiko Carstens
2019-06-13 10:35 ` [PATCHv2 3/3] processor: get rid of cpu_relax_yield Heiko Carstens
2019-06-13 13:48 ` [PATCHv2 0/3] improve wait logic of stop_machine Peter Zijlstra
2019-06-13 19:57 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).